Building Knowledge Graphs for LLMs with Neo4j

Muhammad Mudassir

Founder & CEO, Cognilium AI

[Image: Neo4j knowledge graph visualization showing entities and relationships connected to an LLM for retrieval]
Build production knowledge graphs with Neo4j for LLM retrieval. Entity extraction, schema design, Cypher queries, and integration patterns with code.
Tags: knowledge graph construction, Neo4j LLM integration, entity extraction graph, Cypher queries LLM, graph database AI

Vector search finds similar text. Knowledge graphs find connected meaning. When your documents reference each other (contracts linking to amendments, people connected to projects, policies tied to regulations), you need a graph to capture these relationships. Neo4j is one of the most widely used graph databases for LLM knowledge graphs. Here's how to build one.

What is a Knowledge Graph?

A knowledge graph is a database that stores information as entities (nodes) and relationships (edges). Unlike relational databases with rigid tables, knowledge graphs naturally represent how things connect. For LLMs, knowledge graphs enable multi-hop reasoning: "Find the manager of the person who approved this contract" requires traversing relationships, not just matching keywords.
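
To make multi-hop reasoning concrete, here is a minimal in-memory sketch in plain Python rather than Neo4j. The node names and edges are invented for illustration; the point is only that the question is answered by following relationship types in sequence, not by matching text.

```python
# Toy in-memory graph: (source node, relationship type) -> target nodes.
# All names and data are made up for this illustration.
EDGES = {
    ("CONTRACT-001", "APPROVED_BY"): ["Alice"],
    ("Alice", "REPORTS_TO"): ["Bob"],
    ("Bob", "REPORTS_TO"): ["Carol"],
}


def traverse(start: str, path: list[str]) -> list[str]:
    """Follow a sequence of relationship types, one hop per entry in path."""
    frontier = [start]
    for rel_type in path:
        next_frontier = []
        for node in frontier:
            next_frontier.extend(EDGES.get((node, rel_type), []))
        frontier = next_frontier
    return frontier


# "Find the manager of the person who approved this contract"
managers = traverse("CONTRACT-001", ["APPROVED_BY", "REPORTS_TO"])
print(managers)  # ['Bob']
```

A graph database performs the same kind of hop-by-hop expansion, but over indexed, persistent storage and with a query language instead of hand-written loops.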

1. Why Neo4j for LLMs

Neo4j Advantages

Feature | Benefit for LLMs
Native graph storage | Fast traversal, no JOINs
Cypher query language | Intuitive relationship queries
Built-in visualization | Debug and explore data
LLM integrations | LangChain, LlamaIndex connectors
Managed cloud (Aura) | No DevOps overhead

Alternatives Comparison

Database | Strengths | Weaknesses
Neo4j | Best tooling, largest community | Higher cost at scale
Amazon Neptune | AWS native, managed | Less intuitive query language
TigerGraph | Fastest at extreme scale | Steeper learning curve
Memgraph | In-memory, very fast | Smaller ecosystem

For most LLM applications, Neo4j Aura (managed cloud) is the best starting point.

2. Designing Your Graph Schema

Schema design does much of the work in GraphRAG: retrieval can only traverse the node types and relationships your schema defines.

Schema Design Principles

1. Start with Questions

What will users ask?

"Who approved this contract?"
→ Need: Contract, Person, APPROVED_BY relationship

"What policies reference this regulation?"
→ Need: Policy, Regulation, REFERENCES relationship

"Show all documents related to Project Atlas"
→ Need: Document, Project, RELATED_TO relationship

2. Keep Node Types Focused

❌ Bad: Generic "Entity" node for everything
✅ Good: Specific types (Person, Contract, Project, Policy)

3. Relationships Should Have Meaning

❌ Bad: -[:RELATED]- (too vague)
✅ Good: -[:APPROVED_BY]-, -[:REFERENCES]-, -[:WORKS_FOR]-

Example Schema: Contract Analysis

// Node Types
(:Contract {id, title, effective_date, value, status})
(:Party {name, type, jurisdiction})
(:Person {name, title, email, department})
(:Clause {id, type, text, section_number})
(:Amendment {id, date, description})
(:Document {id, content, created_at})

// Relationship Types
(:Contract)-[:BETWEEN {role: "buyer"|"seller"}]->(:Party)
(:Contract)-[:APPROVED_BY {date}]->(:Person)
(:Contract)-[:CONTAINS]->(:Clause)
(:Contract)-[:AMENDED_BY]->(:Amendment)
(:Clause)-[:REFERENCES]->(:Clause)
(:Person)-[:WORKS_FOR]->(:Party)
(:Person)-[:REPORTS_TO]->(:Person)
(:Document)-[:MENTIONS]->(:Contract)
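
One way to put this schema to work is to encode it as data and check candidate edges against it before writing to the graph. The sketch below is an assumption about how you might do that, not part of Neo4j itself; `ALLOWED_TRIPLES` and `is_valid_edge` are hypothetical names, and the triples are transcribed from the relationship types listed above.

```python
# Allowed (source label, relationship type, target label) triples,
# transcribed from the example contract-analysis schema.
ALLOWED_TRIPLES = {
    ("Contract", "BETWEEN", "Party"),
    ("Contract", "APPROVED_BY", "Person"),
    ("Contract", "CONTAINS", "Clause"),
    ("Contract", "AMENDED_BY", "Amendment"),
    ("Clause", "REFERENCES", "Clause"),
    ("Person", "WORKS_FOR", "Party"),
    ("Person", "REPORTS_TO", "Person"),
    ("Document", "MENTIONS", "Contract"),
}


def is_valid_edge(source_label: str, rel_type: str, target_label: str) -> bool:
    """Return True only if the schema defines this edge in this direction."""
    return (source_label, rel_type, target_label) in ALLOWED_TRIPLES


print(is_valid_edge("Contract", "APPROVED_BY", "Person"))  # True
print(is_valid_edge("Person", "APPROVED_BY", "Contract"))  # False (reversed)
```

A check like this catches reversed or invented relationships early, which matters once an LLM is producing the edges.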

3. Setting Up Neo4j

Option A: Neo4j Aura (Recommended)

# 1. Create account at https://neo4j.com/cloud/aura/
# 2. Create a new database (Free tier available)
# 3. Note connection details:
#    - URI: neo4j+s://xxxxx.databases.neo4j.io
#    - Username: neo4j
#    - Password: (generated)

Option B: Local Docker

docker run \
    --name neo4j \
    -p 7474:7474 -p 7687:7687 \
    -e NEO4J_AUTH=neo4j/your-password \
    -e NEO4J_PLUGINS='["apoc"]' \
    neo4j:5.15.0

Python Connection

from typing import Optional

from neo4j import GraphDatabase

class KnowledgeGraph:
    """Thin wrapper around the Neo4j driver."""

    def __init__(self, uri: str, user: str, password: str):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def query(self, cypher: str, params: Optional[dict] = None):
        # Open a short-lived session, run one statement, and materialize
        # the records before the session closes.
        with self.driver.session() as session:
            result = session.run(cypher, params or {})
            return [record.data() for record in result]

# Initialize
graph = KnowledgeGraph(
    uri="neo4j+s://xxxxx.databases.neo4j.io",
    user="neo4j",
    password="your-password"
)
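
In practice you will probably not want the password hard-coded in source. The following is a minimal sketch of loading connection settings from environment variables; the variable names `NEO4J_URI`, `NEO4J_USER`, and `NEO4J_PASSWORD` and the helper `load_neo4j_config` are choices made for this example, not names the driver requires.

```python
import os


def load_neo4j_config(env=None) -> dict:
    """Read connection settings from environment variables.

    The variable names are assumptions for this sketch; use whatever
    your deployment defines.
    """
    env = os.environ if env is None else env
    missing = [key for key in ("NEO4J_URI", "NEO4J_PASSWORD") if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "uri": env["NEO4J_URI"],
        "user": env.get("NEO4J_USER", "neo4j"),  # Aura and Docker default user
        "password": env["NEO4J_PASSWORD"],
    }


# Usage with the wrapper class defined earlier:
# graph = KnowledgeGraph(**load_neo4j_config())
```

The returned keys match the `KnowledgeGraph` constructor arguments, so the dictionary can be unpacked directly.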

4. Entity Extraction with LLMs

The quality of your graph depends on extraction quality.

LLM-Based Extraction

from anthropic import Anthropic

client = Anthropic()

EXTRACTION_PROMPT = """Extract entities and relationships from this document.

Document:
{document}

Return JSON with exactly this structure:
{{
    "entities": [
        {{"name": "entity name", "type": "Person|Contract|Party|Clause|Project", "properties": {{}}}}
    ],
    "relationships": [
        {{"source": "entity name", "target": "entity name", "type": "APPROVED_BY|REFERENCES|WORKS_FOR|CONTAINS", "properties": {{}}}}
    ]
}}

JSON:"""

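import json

def extract_entities(document: str) -> dict:
    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=2000,
        messages=[{"role": "user", "content": EXTRACTION_PROMPT.format(document=document)}]
    )
    text = response.content[0].text
    # The model may wrap the JSON in a code fence or add surrounding prose,
    # so keep only the outermost {...} span before parsing.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in model response")
    return json.loads(text[start:end + 1])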
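
Because extraction quality drives graph quality, it can help to filter the model's output before ingesting it. The sketch below is one possible check, under the assumption that the only valid types are those named in the prompt; `clean_extraction` is a hypothetical helper, not part of any library.

```python
# Vocabulary copied from the extraction prompt.
ENTITY_TYPES = {"Person", "Contract", "Party", "Clause", "Project"}
RELATIONSHIP_TYPES = {"APPROVED_BY", "REFERENCES", "WORKS_FOR", "CONTAINS"}


def clean_extraction(extracted: dict) -> dict:
    """Drop entities and relationships that fall outside the prompt's vocabulary."""
    entities = [
        entity
        for entity in extracted.get("entities", [])
        if entity.get("name") and entity.get("type") in ENTITY_TYPES
    ]
    known_names = {entity["name"] for entity in entities}
    # Keep a relationship only if its type is known and both endpoints
    # refer to entities that survived the filter above.
    relationships = [
        rel
        for rel in extracted.get("relationships", [])
        if rel.get("type") in RELATIONSHIP_TYPES
        and rel.get("source") in known_names
        and rel.get("target") in known_names
    ]
    return {"entities": entities, "relationships": relationships}
```

Calling `clean_extraction(extract_entities(content))` before ingestion keeps hallucinated types and dangling relationships out of the graph, at the cost of silently discarding them; logging the dropped items is a reasonable extension.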

5. Ingesting Documents

Full Ingestion Pipeline

class GraphIngestion:
    def __init__(self, graph: KnowledgeGraph):
        self.graph = graph
    
    def ingest_document(self, doc_id: str, content: str):
        extracted = extract_entities(content)
        
        # Create document node
        self.graph.query("""
            MERGE (d:Document {id: $id})
            SET d.content = $content, d.created_at = datetime()
        """, {"id": doc_id, "content": content[:5000]})
        
        # Create entity nodes and relationships
        for entity in extracted["entities"]:
            self._create_entity(entity, doc_id)
        for rel in extracted["relationships"]:
            self._create_relationship(rel)
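
The pipeline above calls `_create_entity` and `_create_relationship` without showing them. What follows is a hypothetical sketch of what those helpers could look like, written as standalone functions that accept any object with a `query(cypher, params)` method. It assumes every extracted entity is keyed by a `name` property and that property values are primitives; the `MENTIONS` link from the document to each entity is also an assumption of this sketch.

```python
ALLOWED_LABELS = {"Person", "Contract", "Party", "Clause", "Project"}
ALLOWED_REL_TYPES = {"APPROVED_BY", "REFERENCES", "WORKS_FOR", "CONTAINS"}


def create_entity(graph, entity: dict, doc_id: str) -> None:
    """MERGE one entity node and link it to its source document."""
    label = entity["type"]
    if label not in ALLOWED_LABELS:
        raise ValueError(f"Unexpected entity type: {label!r}")
    # Cypher cannot take a label as a query parameter, so the label is
    # interpolated only after the allow-list check. Values stay parameters.
    graph.query(
        f"""
        MERGE (e:{label} {{name: $name}})
        SET e += $props
        WITH e
        MATCH (d:Document {{id: $doc_id}})
        MERGE (d)-[:MENTIONS]->(e)
        """,
        {"name": entity["name"], "props": entity.get("properties", {}), "doc_id": doc_id},
    )


def create_relationship(graph, rel: dict) -> None:
    """MERGE one relationship between two existing nodes, matched by name."""
    rel_type = rel["type"]
    if rel_type not in ALLOWED_REL_TYPES:
        raise ValueError(f"Unexpected relationship type: {rel_type!r}")
    graph.query(
        f"""
        MATCH (a {{name: $source}}), (b {{name: $target}})
        MERGE (a)-[r:{rel_type}]->(b)
        SET r += $props
        """,
        {"source": rel["source"], "target": rel["target"], "props": rel.get("properties", {})},
    )
```

Using `MERGE` rather than `CREATE` keeps re-ingestion of the same document from duplicating nodes. Matching endpoints by name without a label cannot use a label-specific index, so on a large graph you would likely carry the entity types through and match on label as well.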

6. Essential Cypher Queries

Basic Traversal

// Find all contracts approved by a person
MATCH (p:Person {name: "John Smith"})<-[:APPROVED_BY]-(c:Contract)
RETURN c.title, c.effective_date

// Find the approval chain for a contract
MATCH (c:Contract {id: "CONTRACT-001"})-[:APPROVED_BY]->(approver:Person)
OPTIONAL MATCH path = (approver)-[:REPORTS_TO*1..3]->(manager:Person)
RETURN approver.name, [node in nodes(path) | node.name] as chain

Multi-Hop Queries

// Find contracts that reference a specific clause from another contract
MATCH (c1:Contract)-[:CONTAINS]->(clause1:Clause)-[:REFERENCES]->(clause2:Clause)<-[:CONTAINS]-(c2:Contract)
WHERE c1 <> c2
RETURN c1.title as source_contract, c2.title as referenced_contract
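
When running queries like these from application code, pass user-supplied values as query parameters instead of formatting them into the Cypher string. The sketch below assumes a `graph` object exposing `query(cypher, params)`, like the wrapper class from the setup section; `contracts_approved_by` is a hypothetical helper name.

```python
# The first traversal query, with the person's name as a parameter
# instead of a string literal.
APPROVED_CONTRACTS = """
MATCH (p:Person {name: $name})<-[:APPROVED_BY]-(c:Contract)
RETURN c.title AS title, c.effective_date AS effective_date
"""


def contracts_approved_by(graph, name: str) -> list:
    """Return one dict per contract approved by the named person."""
    return graph.query(APPROVED_CONTRACTS, {"name": name})


# Usage:
# rows = contracts_approved_by(graph, "John Smith")
```

Parameters avoid Cypher injection and let the database reuse the cached query plan across different names.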

7. Integrating with Your LLM

LangChain Integration

from langchain_community.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain
from langchain_anthropic import ChatAnthropic

graph = Neo4jGraph(url="neo4j+s://xxxxx.databases.neo4j.io", username="neo4j", password="your-password")
llm = ChatAnthropic(model="claude-3-sonnet-20240229")

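# The chain asks the LLM to write Cypher, runs it against the graph, and
# summarizes the returned rows. Recent langchain-community releases also
# require allow_dangerous_requests=True here because the generated Cypher is
# executed as-is; connect with a read-only database user where possible.
chain = GraphCypherQAChain.from_llm(llm=llm, graph=graph, verbose=True)
response = chain.invoke({"query": "Who approved the contract with Acme Corp?"})
print(response["result"])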
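
If you would rather not let the LLM generate Cypher, an alternative pattern is to run a fixed, hand-written query and hand the rows to the model as context. The sketch below shows only the prompt-building step and is an assumption about how you might structure it; `rows_to_context` and `build_prompt` are hypothetical helpers, and the prompt wording is illustrative.

```python
def rows_to_context(rows: list, max_rows: int = 20) -> str:
    """Serialize query result rows (dicts) as a bulleted text block."""
    lines = [
        "- " + ", ".join(f"{key}: {value}" for key, value in row.items())
        for row in rows[:max_rows]
    ]
    return "\n".join(lines) if lines else "(no matching graph records)"


def build_prompt(question: str, rows: list) -> str:
    """Combine graph records and the user's question into one prompt."""
    return (
        "Answer the question using only the graph records below.\n\n"
        f"Graph records:\n{rows_to_context(rows)}\n\n"
        f"Question: {question}"
    )


rows = [{"approver": "Jane Doe", "contract": "Acme MSA"}]
print(build_prompt("Who approved the contract with Acme Corp?", rows))
```

This trades flexibility for predictability: the model can only answer from rows your own queries returned, and capping `max_rows` keeps the prompt within the context budget.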

8. Performance Optimization

Essential Indexes

CREATE CONSTRAINT contract_id FOR (c:Contract) REQUIRE c.id IS UNIQUE;
CREATE CONSTRAINT person_name FOR (p:Person) REQUIRE p.name IS UNIQUE;
CREATE INDEX contract_date FOR (c:Contract) ON (c.effective_date);
CREATE FULLTEXT INDEX clause_text FOR (c:Clause) ON EACH [c.text];
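
To apply these from a deployment script, each statement needs to be sent on its own, and adding `IF NOT EXISTS` makes the script safe to re-run. This is a minimal sketch assuming a `graph` object with a `query(cypher)` method like the wrapper class from the setup section; `apply_schema` is a hypothetical helper.

```python
# The same constraints and indexes, made idempotent with IF NOT EXISTS.
SCHEMA_STATEMENTS = [
    "CREATE CONSTRAINT contract_id IF NOT EXISTS FOR (c:Contract) REQUIRE c.id IS UNIQUE",
    "CREATE CONSTRAINT person_name IF NOT EXISTS FOR (p:Person) REQUIRE p.name IS UNIQUE",
    "CREATE INDEX contract_date IF NOT EXISTS FOR (c:Contract) ON (c.effective_date)",
    "CREATE FULLTEXT INDEX clause_text IF NOT EXISTS FOR (c:Clause) ON EACH [c.text]",
]


def apply_schema(graph) -> int:
    """Run each schema statement separately and return how many were sent."""
    for statement in SCHEMA_STATEMENTS:
        graph.query(statement)
    return len(SCHEMA_STATEMENTS)
```

Note that the uniqueness constraint on `Person.name` treats two people with the same name as one node; if that is a risk in your data, a stable identifier such as email is a safer key.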

9. Common Mistakes

Mistake 1: Overly Generic Schema

❌ Bad: (:Entity {type: "Person", name: "John"})
✅ Good: (:Person {name: "John"})

Mistake 2: Missing Relationship Direction

❌ Bad: (a)-[:RELATED]-(b)
✅ Good: (contract)-[:APPROVED_BY]->(person)

Mistake 3: Storing Full Documents in Graph

❌ Bad: (:Document {content: "... 50,000 characters ..."})
✅ Good: (:Document {id: "doc-001", snippet: "...", s3_key: "documents/doc-001.pdf"})
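
As a sketch of the "good" pattern in Mistake 3, the helper below builds the properties for a Document node from the full text, keeping only a short snippet plus a pointer to external storage. The function name, the 500-character default, and the idea that you have already uploaded the file and know its `s3_key` are all assumptions for illustration.

```python
def document_node_properties(
    doc_id: str, content: str, s3_key: str, snippet_chars: int = 500
) -> dict:
    """Build Document node properties without embedding the full text."""
    return {
        "id": doc_id,
        "snippet": content[:snippet_chars],  # short preview for display/debugging
        "s3_key": s3_key,                    # where the full document lives
    }


props = document_node_properties(
    "doc-001", "This Master Services Agreement ...", "documents/doc-001.pdf"
)
# Example write, assuming a graph wrapper with query(cypher, params):
# graph.query("MERGE (d:Document {id: $id}) SET d += $props",
#             {"id": props["id"], "props": props})
```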

Next Steps

  1. GraphRAG Implementation Guide: Complete architecture with Neo4j
  2. Hybrid Search Implementation: Combine graph with vector search
  3. RAG vs GraphRAG: When to use each approach

Need help building knowledge graphs?

At Cognilium, we built Legal Lens AI with 4.2M nodes and 8.7M relationships. Let's discuss your graph.
