TL;DR
Build knowledge graphs with Neo4j for LLM apps. Entity extraction, schema design, Cypher queries, and production integration patterns with Python.
Vector search finds similar text. Knowledge graphs find connected meaning. When your documents reference each other (contracts linking to amendments, people connected to projects, policies tied to regulations), you need a graph to capture these relationships. Neo4j is one of the most widely used graph databases for LLM knowledge graphs. Here's how to build one.
What is a Knowledge Graph?
A knowledge graph is a database that stores information as entities (nodes) and relationships (edges). Unlike relational databases with rigid tables, knowledge graphs naturally represent how things connect. For LLMs, knowledge graphs enable multi-hop reasoning: "Find the manager of the person who approved this contract" requires traversing relationships, not just matching keywords.
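To make the multi-hop idea concrete, here is a minimal in-memory sketch that does not use Neo4j at all. The node names and the `EDGES` list are invented for illustration; the point is only that the question is answered by following relationships one hop at a time.

```python
# Toy graph: each edge is (source, relationship, target).
# All names and IDs below are hypothetical sample data.
EDGES = [
    ("CONTRACT-001", "APPROVED_BY", "Alice"),
    ("Alice", "REPORTS_TO", "Bob"),
    ("Bob", "REPORTS_TO", "Carol"),
]


def follow(node: str, relationship: str) -> list[str]:
    """Return every node reachable from `node` over one `relationship` edge."""
    return [dst for src, rel, dst in EDGES if src == node and rel == relationship]


def traverse(start: str, path: list[str]) -> list[str]:
    """Follow a sequence of relationship types, one hop per entry in `path`."""
    frontier = [start]
    for relationship in path:
        frontier = [dst for node in frontier for dst in follow(node, relationship)]
    return frontier


# "Find the manager of the person who approved this contract" is two hops.
managers = traverse("CONTRACT-001", ["APPROVED_BY", "REPORTS_TO"])
print(managers)  # ['Bob']
```

A graph database does the same thing, but with indexes and native adjacency storage so that each hop stays cheap as the data grows.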
1. Why Neo4j for LLMs
Neo4j Advantages
| Feature | Benefit for LLMs |
|---|---|
| Native graph storage | Fast traversal, no JOINs |
| Cypher query language | Intuitive relationship queries |
| Built-in visualization | Debug and explore data |
| LLM integrations | LangChain, LlamaIndex connectors |
| Managed cloud (Aura) | No DevOps overhead |
Alternatives Comparison
| Database | Strengths | Weaknesses |
|---|---|---|
| Neo4j | Best tooling, largest community | Higher cost at scale |
| Amazon Neptune | AWS native, managed | Less intuitive query language |
| TigerGraph | Fastest at extreme scale | Steeper learning curve |
| Memgraph | In-memory, very fast | Smaller ecosystem |
For most LLM applications, Neo4j Aura (managed cloud) is the best starting point.
2. Designing Your Graph Schema
Schema design determines which questions your graph can answer, so it accounts for much of a GraphRAG system's success.
Schema Design Principles
1. Start with Questions
What will users ask?
"Who approved this contract?"
→ Need: Contract, Person, APPROVED_BY relationship
"What policies reference this regulation?"
→ Need: Policy, Regulation, REFERENCES relationship
"Show all documents related to Project Atlas"
→ Need: Document, Project, RELATED_TO relationship
2. Keep Node Types Focused
❌ Bad: Generic "Entity" node for everything
✅ Good: Specific types (Person, Contract, Project, Policy)
3. Relationships Should Have Meaning
❌ Bad: -[:RELATED]- (too vague)
✅ Good: -[:APPROVED_BY]-, -[:REFERENCES]-, -[:WORKS_FOR]-
Example Schema: Contract Analysis
// Node Types
(:Contract {id, title, effective_date, value, status})
(:Party {name, type, jurisdiction})
(:Person {name, title, email, department})
(:Clause {id, type, text, section_number})
(:Amendment {id, date, description})
(:Document {id, content, created_at})
// Relationship Types
(:Contract)-[:BETWEEN {role: "buyer"|"seller"}]->(:Party)
(:Contract)-[:APPROVED_BY {date}]->(:Person)
(:Contract)-[:CONTAINS]->(:Clause)
(:Contract)-[:AMENDED_BY]->(:Amendment)
(:Clause)-[:REFERENCES]->(:Clause)
(:Person)-[:WORKS_FOR]->(:Party)
(:Person)-[:REPORTS_TO]->(:Person)
(:Document)-[:MENTIONS]->(:Contract)
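One way to keep ingestion code honest about this schema is to mirror it in Python and reject edges it does not allow. The sketch below is an assumption about how you might do that; `SCHEMA` and `is_valid_edge` are illustrative names, not part of Neo4j or any library.

```python
# Allowed (source label, relationship type, target label) triples,
# copied from the example contract-analysis schema.
SCHEMA = {
    ("Contract", "BETWEEN", "Party"),
    ("Contract", "APPROVED_BY", "Person"),
    ("Contract", "CONTAINS", "Clause"),
    ("Contract", "AMENDED_BY", "Amendment"),
    ("Clause", "REFERENCES", "Clause"),
    ("Person", "WORKS_FOR", "Party"),
    ("Person", "REPORTS_TO", "Person"),
    ("Document", "MENTIONS", "Contract"),
}


def is_valid_edge(source_label: str, rel_type: str, target_label: str) -> bool:
    """True if the schema permits this relationship in this direction."""
    return (source_label, rel_type, target_label) in SCHEMA


print(is_valid_edge("Contract", "APPROVED_BY", "Person"))  # True
print(is_valid_edge("Person", "APPROVED_BY", "Contract"))  # False (wrong direction)
```

Checking direction as well as type catches a common extraction error, where the model reports the right relationship with source and target swapped.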
3. Setting Up Neo4j
Option A: Neo4j Aura (Recommended)
# 1. Create account at https://neo4j.com/cloud/aura/
# 2. Create a new database (Free tier available)
# 3. Note connection details:
# - URI: neo4j+s://xxxxx.databases.neo4j.io
# - Username: neo4j
# - Password: (generated)
Option B: Local Docker
docker run \
  --name neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/your-password \
  -e NEO4J_PLUGINS='["apoc"]' \
  neo4j:5.15.0
Python Connection
from neo4j import GraphDatabase


class KnowledgeGraph:
    def __init__(self, uri: str, user: str, password: str):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def query(self, cypher: str, params: dict = None):
        with self.driver.session() as session:
            result = session.run(cypher, params or {})
            return [record.data() for record in result]


# Initialize
graph = KnowledgeGraph(
    uri="neo4j+s://xxxxx.databases.neo4j.io",
    user="neo4j",
    password="your-password"
)
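In practice you will probably not want the password hard-coded. A minimal sketch, assuming the connection details live in environment variables; the variable names `NEO4J_URI`, `NEO4J_USER`, and `NEO4J_PASSWORD` are a convention chosen here, not something the driver reads on its own.

```python
import os
from typing import Mapping


def load_neo4j_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read connection settings from the environment, failing fast if any is missing."""
    # Maps KnowledgeGraph constructor arguments to (assumed) variable names.
    names = {"uri": "NEO4J_URI", "user": "NEO4J_USER", "password": "NEO4J_PASSWORD"}
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {arg: env[var] for arg, var in names.items()}
```

The returned keys match the constructor arguments, so the wrapper can be created with `KnowledgeGraph(**load_neo4j_config())`.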
4. Entity Extraction with LLMs
The quality of your graph depends on extraction quality.
LLM-Based Extraction
import json

from anthropic import Anthropic

client = Anthropic()

EXTRACTION_PROMPT = """Extract entities and relationships from this document.
Document:
{document}
Return JSON with exactly this structure:
{{
  "entities": [
    {{"name": "entity name", "type": "Person|Contract|Party|Clause|Project", "properties": {{}}}}
  ],
  "relationships": [
    {{"source": "entity name", "target": "entity name", "type": "APPROVED_BY|REFERENCES|WORKS_FOR|CONTAINS", "properties": {{}}}}
  ]
}}
JSON:"""


def extract_entities(document: str) -> dict:
    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=2000,
        messages=[{"role": "user", "content": EXTRACTION_PROMPT.format(document=document)}]
    )
    text = response.content[0].text
    # The model sometimes wraps its answer in a ```json fence; strip it if present
    if "```json" in text:
        text = text.split("```json")[1].split("```")[0]
    return json.loads(text.strip())
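Because extraction quality drives graph quality, it can help to validate the model's output before anything is written to the database. The following is a sketch under stated assumptions: `parse_extraction` is a hypothetical helper, and the allowed types are simply the ones listed in the extraction prompt.

```python
import json

# Types copied from the extraction prompt; adjust to match your own schema.
ALLOWED_ENTITY_TYPES = {"Person", "Contract", "Party", "Clause", "Project"}
ALLOWED_REL_TYPES = {"APPROVED_BY", "REFERENCES", "WORKS_FOR", "CONTAINS"}


def _items(value) -> list:
    """Return `value` if it is a list, otherwise an empty list."""
    return value if isinstance(value, list) else []


def _in(value, allowed: set) -> bool:
    """Membership test that tolerates non-string (possibly unhashable) values."""
    return isinstance(value, str) and value in allowed


def parse_extraction(text: str) -> dict:
    """Parse model output and drop entities/relationships that fail validation."""
    # Take the outermost {...} span so code fences or stray prose are ignored.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        return {"entities": [], "relationships": []}
    try:
        data = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return {"entities": [], "relationships": []}

    entities = [
        e for e in _items(data.get("entities"))
        if isinstance(e, dict)
        and isinstance(e.get("name"), str) and e["name"].strip()
        and _in(e.get("type"), ALLOWED_ENTITY_TYPES)
    ]
    names = {e["name"] for e in entities}
    # Keep only relationships whose endpoints both survived entity validation.
    relationships = [
        r for r in _items(data.get("relationships"))
        if isinstance(r, dict)
        and _in(r.get("type"), ALLOWED_REL_TYPES)
        and _in(r.get("source"), names)
        and _in(r.get("target"), names)
    ]
    return {"entities": entities, "relationships": relationships}
```

Silently dropping invalid items is a design choice that favors a clean graph over completeness; logging what was dropped would make extraction problems easier to spot.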
5. Ingesting Documents
Full Ingestion Pipeline
class GraphIngestion:
    def __init__(self, graph: KnowledgeGraph):
        self.graph = graph

    def ingest_document(self, doc_id: str, content: str):
        extracted = extract_entities(content)

        # Create document node
        self.graph.query("""
            MERGE (d:Document {id: $id})
            SET d.content = $content, d.created_at = datetime()
        """, {"id": doc_id, "content": content[:5000]})

        # Create entity nodes and relationships
        for entity in extracted["entities"]:
            self._create_entity(entity, doc_id)
        for rel in extracted["relationships"]:
            self._create_relationship(rel)
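The pipeline calls `_create_entity` and `_create_relationship` without showing them. One possible shape is sketched below as two query builders. Several things here are assumptions rather than the author's implementation: entities are merged on `name`, each entity is linked to its source document with `MENTIONS`, and the allowed labels are taken from the extraction prompt. Labels and relationship types cannot be passed as ordinary Cypher parameters, so they are checked against an allow-list before being interpolated into the query text.

```python
# Labels and relationship types taken from the extraction prompt.
ENTITY_LABELS = {"Person", "Contract", "Party", "Clause", "Project"}
REL_TYPES = {"APPROVED_BY", "REFERENCES", "WORKS_FOR", "CONTAINS"}


def entity_query(entity: dict, doc_id: str) -> tuple[str, dict]:
    """Build a MERGE statement for one extracted entity, linked to its document."""
    label = entity["type"]
    if label not in ENTITY_LABELS:
        raise ValueError(f"Unexpected entity type: {label!r}")
    # Safe to interpolate: `label` is one of our own constants at this point.
    cypher = (
        f"MERGE (e:{label} {{name: $name}}) "
        "SET e += $properties "
        "WITH e "
        "MATCH (d:Document {id: $doc_id}) "
        "MERGE (d)-[:MENTIONS]->(e)"
    )
    params = {
        "name": entity["name"],
        "properties": entity.get("properties") or {},
        "doc_id": doc_id,
    }
    return cypher, params


def relationship_query(rel: dict) -> tuple[str, dict]:
    """Build a MERGE statement connecting two already-created entities by name."""
    rel_type = rel["type"]
    if rel_type not in REL_TYPES:
        raise ValueError(f"Unexpected relationship type: {rel_type!r}")
    # Matching without a label scans all nodes; add labels if you know them.
    cypher = (
        "MATCH (a {name: $source}), (b {name: $target}) "
        f"MERGE (a)-[r:{rel_type}]->(b) "
        "SET r += $properties"
    )
    params = {
        "source": rel["source"],
        "target": rel["target"],
        "properties": rel.get("properties") or {},
    }
    return cypher, params
```

Inside `GraphIngestion`, `_create_entity` could then be a one-liner such as `self.graph.query(*entity_query(entity, doc_id))`, and `_create_relationship` likewise with `relationship_query(rel)`.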
6. Essential Cypher Queries
Basic Traversal
// Find all contracts approved by a person
MATCH (p:Person {name: "John Smith"})<-[:APPROVED_BY]-(c:Contract)
RETURN c.title, c.effective_date
// Find the approval chain for a contract
MATCH (c:Contract {id: "CONTRACT-001"})-[:APPROVED_BY]->(approver:Person)
OPTIONAL MATCH path = (approver)-[:REPORTS_TO*1..3]->(manager:Person)
RETURN approver.name, [node in nodes(path) | node.name] as chain
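When these traversals are run from application code, the literal values should become query parameters rather than being formatted into the Cypher string. A small sketch, assuming a wrapper object with a `query(cypher, params)` method like the `KnowledgeGraph` class from the setup section; `contracts_approved_by` is a hypothetical helper name.

```python
# The person's name is bound as $name instead of being pasted into the query.
APPROVED_CONTRACTS = """
MATCH (p:Person {name: $name})<-[:APPROVED_BY]-(c:Contract)
RETURN c.title AS title, c.effective_date AS effective_date
ORDER BY c.effective_date
"""


def contracts_approved_by(graph, name: str) -> list[dict]:
    """Run the traversal with `name` passed as a parameter, not interpolated."""
    return graph.query(APPROVED_CONTRACTS, {"name": name})
```

Parameters avoid Cypher injection and let the database reuse the query plan across different names.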
Multi-Hop Queries
// Find contracts that reference a specific clause from another contract
MATCH (c1:Contract)-[:CONTAINS]->(clause1:Clause)-[:REFERENCES]->(clause2:Clause)<-[:CONTAINS]-(c2:Contract)
WHERE c1 <> c2
RETURN c1.title as source_contract, c2.title as referenced_contract
7. Integrating with Your LLM
LangChain Integration
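from langchain_community.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain
from langchain_anthropic import ChatAnthropic

graph = Neo4jGraph(url="neo4j+s://xxxxx.databases.neo4j.io", username="neo4j", password="your-password")
llm = ChatAnthropic(model="claude-3-sonnet-20240229")
# The chain runs LLM-generated Cypher against your database. Recent
# langchain-community releases require opting in explicitly; drop the
# flag on older versions that do not accept it.
chain = GraphCypherQAChain.from_llm(llm=llm, graph=graph, verbose=True, allow_dangerous_requests=True)
response = chain.invoke({"query": "Who approved the contract with Acme Corp?"})
print(response["result"])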
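Since the chain executes Cypher written by the model, you may want a guard that rejects anything that looks like a write. The check below is a rough sketch, not a security boundary: `is_read_only` is a hypothetical helper, the keyword list is an assumption, it does not cover procedures invoked with `CALL`, and how you hook it into a chain depends on your LangChain version. A database user with read-only privileges, where your Neo4j edition supports it, is the more robust option.

```python
import re

# Clauses that modify data or schema. Deliberately conservative: a query that
# merely contains one of these words inside a string literal is also rejected.
_WRITE_CLAUSE = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|FOREACH|LOAD\s+CSV)\b",
    re.IGNORECASE,
)


def is_read_only(cypher: str) -> bool:
    """Return True if no write-style clause appears anywhere in the query."""
    return _WRITE_CLAUSE.search(cypher) is None


print(is_read_only("MATCH (c:Contract) RETURN c.title"))  # True
print(is_read_only("MATCH (n) DETACH DELETE n"))          # False
```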
8. Performance Optimization
Essential Indexes
CREATE CONSTRAINT contract_id FOR (c:Contract) REQUIRE c.id IS UNIQUE;
CREATE CONSTRAINT person_name FOR (p:Person) REQUIRE p.name IS UNIQUE;
CREATE INDEX contract_date FOR (c:Contract) ON (c.effective_date);
CREATE FULLTEXT INDEX clause_text FOR (c:Clause) ON EACH [c.text];
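If these statements run as part of application startup or a deployment script, they need to be safe to repeat. A minimal sketch using Cypher's `IF NOT EXISTS` clause, assuming a wrapper object with a `query(cypher)` method like the `KnowledgeGraph` class from the setup section; `apply_schema` is an illustrative name.

```python
# Same constraints and indexes as above, made idempotent with IF NOT EXISTS.
SCHEMA_STATEMENTS = [
    "CREATE CONSTRAINT contract_id IF NOT EXISTS "
    "FOR (c:Contract) REQUIRE c.id IS UNIQUE",
    "CREATE CONSTRAINT person_name IF NOT EXISTS "
    "FOR (p:Person) REQUIRE p.name IS UNIQUE",
    "CREATE INDEX contract_date IF NOT EXISTS "
    "FOR (c:Contract) ON (c.effective_date)",
    "CREATE FULLTEXT INDEX clause_text IF NOT EXISTS "
    "FOR (c:Clause) ON EACH [c.text]",
]


def apply_schema(graph) -> int:
    """Run each schema statement once and return how many were issued."""
    for statement in SCHEMA_STATEMENTS:
        graph.query(statement)
    return len(SCHEMA_STATEMENTS)
```

Creating the uniqueness constraints before ingestion also keeps `MERGE` on those properties fast, because a constraint is backed by an index.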
9. Common Mistakes
Mistake 1: Overly Generic Schema
❌ Bad: (:Entity {type: "Person", name: "John"})
✅ Good: (:Person {name: "John"})
Mistake 2: Missing Relationship Direction
❌ Bad: (a)-[:RELATED]-(b)
✅ Good: (contract)-[:APPROVED_BY]->(person)
Mistake 3: Storing Full Documents in Graph
❌ Bad: (:Document {content: "... 50,000 characters ..."})
✅ Good: (:Document {id: "doc-001", snippet: "...", s3_key: "documents/doc-001.pdf"})
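The "good" version of Mistake 3 can be produced with a small helper that keeps only a short snippet and a pointer to the full file. This is a sketch: `document_properties` and the 500-character default are assumptions chosen for illustration, and the storage key format follows the example above.

```python
def document_properties(doc_id: str, content: str, s3_key: str,
                        snippet_chars: int = 500) -> dict:
    """Build Document node properties that reference the file instead of embedding it."""
    snippet = content[:snippet_chars]
    if len(content) > snippet_chars:
        # Mark the cut so readers know the snippet is not the whole document.
        snippet = snippet.rstrip() + "..."
    return {"id": doc_id, "snippet": snippet, "s3_key": s3_key}


props = document_properties("doc-001", "Full contract text " * 1000,
                            "documents/doc-001.pdf")
print(len(props["snippet"]) <= 503)  # True
```

The resulting dict can be passed as a parameter map to a `MERGE (d:Document {id: $id}) SET d += $props` style query, keeping large text out of the graph store.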
Next Steps
- GraphRAG Implementation Guide: complete architecture with Neo4j
- Hybrid Search Implementation: combine graph with vector search
- RAG vs GraphRAG: when to use each approach
Need help building knowledge graphs?
At Cognilium, we built Legal Lens AI with 4.2M nodes and 8.7M relationships.
Muhammad Mudassir
Founder & CEO, Cognilium AI | 10+ years experience
