Basic RAG retrieves chunks. GraphRAG retrieves understanding. When your enterprise knowledge spans 1.2 million documents with complex relationships—contracts referencing other contracts, policies linking to regulations, employees connected to projects—basic vector search fails. GraphRAG uses knowledge graphs to capture these relationships and deliver accurate, contextual answers.
What is GraphRAG?
GraphRAG (Graph Retrieval-Augmented Generation) is an architecture that combines knowledge graphs with vector embeddings for LLM retrieval. Unlike basic RAG, which retrieves text chunks based on semantic similarity alone, GraphRAG models entity relationships, traverses connections, and retrieves contextually relevant information even when the exact words don't match. Microsoft popularized the term in 2024, and the approach has since become a common choice for enterprise knowledge systems.
What is Basic RAG?
Basic RAG (Retrieval-Augmented Generation) embeds documents into vectors and retrieves the most semantically similar chunks to a query. It works well for simple lookups but struggles with multi-hop reasoning, relationship queries, and large document collections where context matters more than keyword matching.
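To make the retrieval step concrete, here is a minimal sketch of similarity-based chunk retrieval. The three-dimensional vectors and chunk texts are invented for illustration; a real system would get its vectors from an embedding model and store them in a vector database.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query_vec: list[float], chunks: list[tuple[str, list[float]]], top_k: int = 2) -> list[str]:
    """Return the texts of the top_k chunks most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]

# Toy "embeddings", made up for this example
chunks = [
    ("Refunds are issued within 30 days.", [0.9, 0.1, 0.0]),
    ("Reset your password from the login page.", [0.1, 0.9, 0.0]),
    ("The product supports SSO.", [0.0, 0.2, 0.9]),
]
query_vec = [0.8, 0.2, 0.0]  # stands in for the embedding of a refund question
top = retrieve(query_vec, chunks, top_k=1)
```

Each chunk is scored independently of every other chunk, which is why this approach has no way to follow a reference from one document to another.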
1. Why Basic RAG Fails at Scale
Basic RAG works great for simple use cases:
- "What's our refund policy?"
- "How do I reset my password?"
- "What are the product features?"
But it breaks down when queries require:
Multi-Hop Reasoning
Query: "Which contracts with Acme Corp reference the 2023 pricing amendment?"
Basic RAG: Searches for "Acme Corp" + "2023 pricing amendment"
Result: Returns chunks mentioning these terms, but misses contracts that
reference the amendment indirectly through clause numbers
GraphRAG: Traverses: Acme Corp → Contracts → References → Amendment 2023-P
Result: Returns all 7 contracts, including 3 with indirect references
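The indirect-reference case can be sketched with a tiny in-memory graph. The edge list, node names, and relationship types below are hypothetical stand-ins, not the article's production schema; the point is only that a bounded breadth-first traversal finds a contract that reaches the amendment through a clause.

```python
from collections import deque

# Hypothetical edges: (source, relationship, target)
EDGES = [
    ("Acme Corp", "PARTY_TO", "Contract-A"),
    ("Acme Corp", "PARTY_TO", "Contract-B"),
    ("Acme Corp", "PARTY_TO", "Contract-C"),
    ("Contract-A", "REFERENCES", "Amendment 2023-P"),   # direct reference
    ("Contract-B", "CONTAINS", "Clause 7.3"),
    ("Clause 7.3", "REFERENCES", "Amendment 2023-P"),   # indirect, via a clause
]

def neighbors(node: str) -> list[str]:
    """Nodes reachable from `node` by following one outgoing edge."""
    return [target for source, _, target in EDGES if source == node]

def reaches(start: str, goal: str, max_hops: int) -> bool:
    """Breadth-first search from start, following at most max_hops edges."""
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        node, hops = frontier.popleft()
        if node == goal:
            return True
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return False

contracts = neighbors("Acme Corp")
matching = [c for c in contracts if reaches(c, "Amendment 2023-P", max_hops=2)]
```

Contract-B never mentions the amendment by name, so a term-matching search would miss it, while the two-hop traversal finds it.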
Relationship Queries
Query: "Who approved the budget for Project Atlas and what's their reporting chain?"
Basic RAG: Finds mentions of "Project Atlas" and "budget" and "approved"
Result: Scattered chunks, no clear answer
GraphRAG: Traverses: Project Atlas → Budget → Approval → Sarah Chen → Reports To → ...
Result: "Sarah Chen (VP Engineering) approved. Reports to: Mike Johnson (CTO) → CEO"
Temporal Context
Query: "What changed in our security policy after the 2024 audit?"
Basic RAG: Returns current security policy and audit report
Result: No diff, no timeline
GraphRAG: Traverses: Security Policy → Versions → Changes After → Audit 2024
Result: "3 changes: MFA requirement added, password rotation increased,
new vendor review process. Effective March 2024."
The Numbers
| Scenario | Basic RAG Accuracy | GraphRAG Accuracy |
|---|---|---|
| Simple lookup | 92% | 94% |
| Multi-hop reasoning | 54% | 89% |
| Relationship queries | 41% | 87% |
| Temporal queries | 38% | 82% |
| Average (complex queries) | 44% | 86% |
GraphRAG delivers ~2x better accuracy on complex enterprise queries.
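The averages and the "~2x" figure follow directly from the three complex-query rows of the table. A quick check, using only the numbers above:

```python
# Accuracy (%) for the three complex query types in the table
basic = {"multi_hop": 54, "relationship": 41, "temporal": 38}
graph = {"multi_hop": 89, "relationship": 87, "temporal": 82}

basic_avg = sum(basic.values()) / len(basic)   # 133 / 3
graph_avg = sum(graph.values()) / len(graph)   # 258 / 3
ratio = graph_avg / basic_avg

print(f"Basic RAG: {basic_avg:.1f}%  GraphRAG: {graph_avg:.1f}%  ratio: {ratio:.2f}x")
```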
2. GraphRAG Architecture Overview
The Three Layers
Component Breakdown
| Component | Purpose | Technology Options |
|---|---|---|
| Vector Store | Semantic similarity search | Pinecone, Weaviate, Qdrant, pgvector |
| Graph Database | Relationship traversal | Neo4j, Amazon Neptune, TigerGraph |
| Keyword Search | Exact match, filters | OpenSearch, Elasticsearch, PostgreSQL |
| Fusion Layer | Combine and re-rank results | Custom logic, Cohere Rerank |
| LLM | Generate final response | Claude, GPT-4, Gemini |
3. When to Use GraphRAG vs Basic RAG
Use Basic RAG When:
- ✅ Documents are independent (no relationships matter)
- ✅ Queries are simple lookups ("What is X?")
- ✅ Document collection is small (<10,000 docs)
- ✅ Speed is critical (graph traversal adds latency)
- ✅ Budget is limited (graphs add infrastructure cost)
Examples: FAQ bots, simple documentation search, customer support for straightforward issues.
Use GraphRAG When:
- ✅ Documents reference each other (contracts, policies, regulations)
- ✅ Queries involve relationships ("Who approved what?")
- ✅ Multi-hop reasoning is common
- ✅ Accuracy on complex queries is critical
- ✅ Document collection is large and interconnected
Examples: Legal document analysis, enterprise knowledge bases, compliance systems, research databases.
Decision Matrix
| Factor | Choose Basic RAG | Choose GraphRAG |
|---|---|---|
| Query complexity | Simple lookups | Multi-hop reasoning |
| Document relationships | Independent | Interconnected |
| Accuracy requirement | Good enough (85%+) | Must be high (95%+) |
| Latency tolerance | <500ms required | 1-3s acceptable |
| Infrastructure budget | Limited | Available |
| Team expertise | Basic ML | Graph + ML |
4. Building Your Knowledge Graph
Step 1: Entity Extraction
Extract entities from your documents:
import json

from langchain.chat_models import ChatAnthropic
from langchain.prompts import ChatPromptTemplate

llm = ChatAnthropic(model="claude-3-sonnet-20240229")

extraction_prompt = ChatPromptTemplate.from_template("""
Extract all entities and relationships from this document.
Document:
{document}
Return JSON with:
- entities: list of {{name, type, properties}}
- relationships: list of {{source, target, type, properties}}
Entity types: Person, Organization, Contract, Policy, Project, Date, Amount
Relationship types: WORKS_FOR, APPROVED_BY, REFERENCES, EFFECTIVE_DATE, AMOUNT
JSON:
""")

def extract_entities(document: str) -> dict:
    # Render the prompt, call the model, and parse its JSON reply
    response = llm.invoke(extraction_prompt.format(document=document))
    return json.loads(response.content)
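Model output is not guaranteed to be clean JSON, so it can help to validate it before it reaches the graph. The sketch below is one possible approach, not part of the original pipeline: `parse_extraction` and `strip_fence` are hypothetical helper names, and the allowed entity types are copied from the prompt above.

```python
import json

ALLOWED_ENTITY_TYPES = {
    "Person", "Organization", "Contract", "Policy", "Project", "Date", "Amount",
}
FENCE = "`" * 3  # models sometimes wrap JSON in a Markdown code fence
EMPTY = {"entities": [], "relationships": []}

def strip_fence(text: str) -> str:
    """Remove a surrounding Markdown code fence and optional language tag."""
    text = text.strip()
    if text.startswith(FENCE) and text.endswith(FENCE):
        text = text[len(FENCE):-len(FENCE)]
        first_line, _, rest = text.partition("\n")
        if first_line.strip() == "" or first_line.strip().isalpha():
            text = rest  # drop the "json" tag line
    return text.strip()

def parse_extraction(raw: str) -> dict:
    """Parse model output; keep only well-formed entities and relationships."""
    try:
        data = json.loads(strip_fence(raw))
    except json.JSONDecodeError:
        return dict(EMPTY)
    if not isinstance(data, dict):
        return dict(EMPTY)
    entities = [
        e for e in data.get("entities", [])
        if isinstance(e, dict) and e.get("name") and e.get("type") in ALLOWED_ENTITY_TYPES
    ]
    names = {e["name"] for e in entities}
    # Drop relationships whose endpoints were not extracted as valid entities
    relationships = [
        r for r in data.get("relationships", [])
        if isinstance(r, dict) and r.get("source") in names and r.get("target") in names
    ]
    return {"entities": entities, "relationships": relationships}
```

Returning an empty result on a parse failure lets the caller decide whether to retry, fall back to another extractor, or index the document without graph enrichment.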
Step 2: Build the Graph in Neo4j
from neo4j import GraphDatabase

class KnowledgeGraph:
    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def add_entity(self, entity: dict):
        # The label is interpolated into the query text, so entity['type']
        # must come from a trusted allow-list, never from raw user input
        query = """
        MERGE (e:{type} {{name: $name}})
        SET e += $properties
        RETURN e
        """.format(type=entity['type'])
        with self.driver.session() as session:
            session.run(query, name=entity['name'], properties=entity.get('properties', {}))

    def add_relationship(self, rel: dict):
        # Same caveat as above: the relationship type is interpolated
        query = """
        MATCH (source {{name: $source}})
        MATCH (target {{name: $target}})
        MERGE (source)-[r:{type}]->(target)
        SET r += $properties
        """.format(type=rel['type'])
        with self.driver.session() as session:
            session.run(query,
                        source=rel['source'],
                        target=rel['target'],
                        properties=rel.get('properties', {}))

    def ingest_document(self, document: str, doc_id: str):
        # Extract entities and relationships
        extracted = extract_entities(document)
        # Add document node; `id` is set alongside `name` so that queries
        # matching on Document.id also find it
        self.add_entity({
            'type': 'Document',
            'name': doc_id,
            'properties': {'id': doc_id, 'content': document[:1000]}  # Store snippet
        })
        # Add entities
        for entity in extracted['entities']:
            self.add_entity(entity)
            # Link entity to document
            self.add_relationship({
                'source': doc_id,
                'target': entity['name'],
                'type': 'CONTAINS'
            })
        # Add relationships
        for rel in extracted['relationships']:
            self.add_relationship(rel)
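Because node labels and relationship types end up inside the Cypher text rather than in query parameters, extracted values are worth checking before they are used. The following is a minimal sketch of such a guard; `safe_identifier` is a hypothetical helper, and the allow-lists simply mirror the types named in the extraction prompt plus `Document` and `CONTAINS`.

```python
import re

ALLOWED_LABELS = {
    "Document", "Person", "Organization", "Contract",
    "Policy", "Project", "Date", "Amount",
}
ALLOWED_REL_TYPES = {
    "CONTAINS", "WORKS_FOR", "APPROVED_BY", "REFERENCES",
    "EFFECTIVE_DATE", "AMOUNT",
}
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def safe_identifier(value: str, allowed: set[str]) -> str:
    """Return value unchanged if it is an allow-listed plain identifier."""
    if value not in allowed or not _IDENTIFIER.match(value):
        raise ValueError(f"Unexpected label or relationship type: {value!r}")
    return value

# Usage: validate before formatting the value into a query string
label = safe_identifier("Contract", ALLOWED_LABELS)
rel_type = safe_identifier("APPROVED_BY", ALLOWED_REL_TYPES)
```

Rejecting unknown types also keeps the schema from drifting when the model invents a label that was not in the prompt.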
Step 3: Create Vector Embeddings
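import os

from langchain.embeddings import BedrockEmbeddings
from pinecone import Pinecone

embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1")

# Assumes the API key is in the PINECONE_API_KEY environment variable and
# that an index named "graphrag-prod" already exists
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
pinecone_index = pc.Index("graphrag-prod")

def embed_and_store(document: str, doc_id: str, metadata: dict):
    # Generate embedding
    vector = embeddings.embed_query(document)
    # Store in Pinecone
    pinecone_index.upsert([
        {
            'id': doc_id,
            'values': vector,
            'metadata': {
                **metadata,
                'text': document[:8000]  # Store text for retrieval
            }
        }
    ])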
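Embedding a whole document as one vector works for short texts, but long contracts are usually split into overlapping chunks first. The sketch below shows one simple character-based way to do that; the chunk size, overlap, and the `doc_id#chunk-n` ID scheme are illustrative assumptions, not values from the system described here.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already covers the end of the text
    return chunks

def chunk_ids(doc_id: str, count: int) -> list[str]:
    """Derive one vector ID per chunk so chunks map back to their document."""
    return [f"{doc_id}#chunk-{i}" for i in range(count)]

pieces = chunk_text("a" * 2500)
ids = chunk_ids("doc-001", len(pieces))
```

Each chunk can then be embedded and upserted under its own ID, with the parent `doc_id` kept in the metadata so results can be joined back to the graph.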
Graph Schema Example
// Schema for a contract analysis system
// Nodes
(:Contract {id, title, effective_date, status})
(:Party {name, type, jurisdiction})
(:Clause {id, type, text})
(:Amendment {id, date, description})
(:Person {name, title, email})
// Relationships
(:Contract)-[:BETWEEN]->(:Party)
(:Contract)-[:CONTAINS]->(:Clause)
(:Contract)-[:AMENDED_BY]->(:Amendment)
(:Clause)-[:REFERENCES]->(:Clause)
(:Contract)-[:APPROVED_BY]->(:Person)
(:Person)-[:WORKS_FOR]->(:Party)
5. Implementing Hybrid Search
The power of GraphRAG comes from combining multiple search methods.
Hybrid Query Flow
class HybridSearch:
    def __init__(self, vector_store, graph_db, keyword_search):
        self.vector = vector_store
        self.graph = graph_db
        self.keyword = keyword_search

    def search(self, query: str, top_k: int = 10) -> list:
        # 1. Vector search (semantic similarity)
        vector_results = self.vector.similarity_search(query, k=top_k)
        # 2. Graph traversal (relationship-based)
        # extract_entities_from_query and graph.traverse are helpers not shown here
        entities = self.extract_entities_from_query(query)
        graph_results = self.graph.traverse(entities, max_hops=2)
        # 3. Keyword search (exact match)
        keyword_results = self.keyword.search(query, k=top_k)
        # 4. Fuse results using Reciprocal Rank Fusion
        fused = self.reciprocal_rank_fusion([
            vector_results,
            graph_results,
            keyword_results
        ], weights=[0.4, 0.4, 0.2])
        return fused[:top_k]

    def reciprocal_rank_fusion(self, result_lists: list, weights: list, k=60) -> list:
        """
        Combine multiple ranked lists using weighted RRF.
        Score = sum(weight / (k + rank)) over the lists, with rank starting at 1.
        Each result is expected to be a dict with an 'id' key.
        """
        scores = {}
        for results, weight in zip(result_lists, weights):
            for rank, doc in enumerate(results):
                doc_id = doc['id']
                if doc_id not in scores:
                    scores[doc_id] = {'doc': doc, 'score': 0}
                # enumerate() is 0-based, so rank + 1 is the 1-based rank
                scores[doc_id]['score'] += weight / (k + rank + 1)
        # Sort by fused score
        ranked = sorted(scores.values(), key=lambda x: x['score'], reverse=True)
        return [item['doc'] for item in ranked]
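A small worked example may make the fusion step easier to follow. This standalone sketch applies the same weighted RRF formula to three short ranked lists of document IDs; the IDs and rankings are invented for illustration.

```python
def weighted_rrf(ranked_lists: list[list[str]], weights: list[float], k: int = 60) -> list[tuple[str, float]]:
    """Weighted Reciprocal Rank Fusion over lists of document IDs."""
    scores: dict[str, float] = {}
    for ids, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ids, start=1):  # 1-based rank
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

vector_hits = ["doc-a", "doc-b", "doc-c"]
graph_hits = ["doc-b", "doc-d"]
keyword_hits = ["doc-c", "doc-b"]

fused = weighted_rrf([vector_hits, graph_hits, keyword_hits], weights=[0.4, 0.4, 0.2])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Here `doc-b` ranks first even though it tops only one list, because it appears in all three. That is the behavior RRF is chosen for: agreement across retrievers outweighs a single high rank.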
Graph Traversal Queries
// Find all contracts related to a party and their amendments
MATCH (p:Party {name: $party_name})<-[:BETWEEN]-(c:Contract)
OPTIONAL MATCH (c)-[:AMENDED_BY]->(a:Amendment)
OPTIONAL MATCH (c)-[:CONTAINS]->(clause:Clause)-[:REFERENCES]->(ref:Clause)
RETURN c, collect(DISTINCT a) as amendments, collect(DISTINCT ref) as references
// Find approval chain for a document
MATCH (d:Document {id: $doc_id})-[:APPROVED_BY]->(approver:Person)
MATCH path = (approver)-[:REPORTS_TO*0..3]->(manager:Person)
RETURN path
// Find related documents through shared entities
MATCH (d1:Document {id: $doc_id})-[:CONTAINS]->(e:Entity)<-[:CONTAINS]-(d2:Document)
WHERE d1 <> d2
RETURN d2, count(e) as shared_entities
ORDER BY shared_entities DESC
LIMIT 10
6. Production Architecture
Recommended Stack
AWS Implementation
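# Infrastructure as Code (CDK)
from aws_cdk import (
    Duration,
    Stack,
    aws_apigateway as apigw,
    aws_ec2 as ec2,
    aws_lambda as lambda_,
    aws_s3 as s3,
)
# The L2 DatabaseCluster construct ships in the separate alpha module
# (pip install aws-cdk.aws-neptune-alpha). Or use Neo4j Aura instead.
import aws_cdk.aws_neptune_alpha as neptune

class GraphRAGStack(Stack):
    def __init__(self, scope, id, *, opensearch_endpoint: str, **kwargs):
        super().__init__(scope, id, **kwargs)
        # Network shared by the graph database and the query handler
        vpc = ec2.Vpc(self, "Vpc", max_azs=2)
        # Document bucket
        self.doc_bucket = s3.Bucket(self, "DocumentBucket")
        # Graph database (Neptune or Neo4j Aura)
        self.graph_db = neptune.DatabaseCluster(self, "GraphDB",
            instance_type=neptune.InstanceType.R5_LARGE,
            vpc=vpc
        )
        # Query handler (assumes the handler code lives in ./lambda/query.py)
        self.query_handler = lambda_.Function(self, "QueryHandler",
            runtime=lambda_.Runtime.PYTHON_3_11,
            handler="query.handler",
            code=lambda_.Code.from_asset("lambda"),
            vpc=vpc,
            timeout=Duration.seconds(30),
            memory_size=1024,
            environment={
                "GRAPH_ENDPOINT": self.graph_db.cluster_endpoint.hostname,
                "PINECONE_INDEX": "graphrag-prod",
                # Endpoint of an existing OpenSearch domain, passed in by the caller
                "OPENSEARCH_ENDPOINT": opensearch_endpoint
            }
        )
        # Allow the handler to reach the graph database on its default port
        self.graph_db.connections.allow_default_port_from(self.query_handler)
        # API Gateway
        self.api = apigw.RestApi(self, "GraphRAGAPI")
        self.api.root.add_resource("query").add_method(
            "POST",
            apigw.LambdaIntegration(self.query_handler)
        )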
Scaling Considerations
| Component | Scaling Strategy |
|---|---|
| Neo4j | Read replicas, causal clustering |
| Pinecone | Managed scaling, pod selection |
| OpenSearch | Horizontal sharding |
| Lambda | Concurrency limits, provisioned concurrency |
| LLM | Request queuing, fallback models |
7. Performance Optimization
Query Latency Breakdown
| Stage | Typical Latency | Optimization |
|---|---|---|
| Intent classification | 50-100ms | Cache common patterns |
| Vector search | 50-150ms | Reduce k, use approximate NN |
| Graph traversal | 100-500ms | Index hot paths, limit hops |
| Keyword search | 20-50ms | Optimize index settings |
| Result fusion | 10-30ms | Pre-compute common fusions |
| LLM generation | 1-3s | Stream response, use smaller context |
| Total | 1.5-4s | |
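The per-stage ranges can be added up to see where the time goes. In the sketch below the stage figures are taken from the table; running the three retrievers concurrently is an assumption for illustration, not something the table states. The itemized stages sum to a little less than the quoted total, which presumably also covers network and orchestration overhead not listed separately.

```python
# (low, high) latency per stage in milliseconds, from the table above
STAGES_MS = {
    "intent_classification": (50, 100),
    "vector_search": (50, 150),
    "graph_traversal": (100, 500),
    "keyword_search": (20, 50),
    "result_fusion": (10, 30),
    "llm_generation": (1000, 3000),
}
RETRIEVAL_STAGES = ("vector_search", "graph_traversal", "keyword_search")

def sequential_total(stages: dict) -> tuple[int, int]:
    """Total latency if every stage runs one after another."""
    return (sum(lo for lo, _ in stages.values()),
            sum(hi for _, hi in stages.values()))

def parallel_retrieval_total(stages: dict) -> tuple[int, int]:
    """Total latency if the three retrievers run concurrently (assumed)."""
    others = {name: v for name, v in stages.items() if name not in RETRIEVAL_STAGES}
    lo, hi = sequential_total(others)
    lo += max(stages[name][0] for name in RETRIEVAL_STAGES)
    hi += max(stages[name][1] for name in RETRIEVAL_STAGES)
    return (lo, hi)

seq = sequential_total(STAGES_MS)
par = parallel_retrieval_total(STAGES_MS)
print(f"sequential: {seq[0]}-{seq[1]} ms, parallel retrieval: {par[0]}-{par[1]} ms")
```

Either way, LLM generation dominates, so streaming the response and trimming context tend to matter more than shaving milliseconds off retrieval.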
Optimization Techniques
1. Graph Indexing
// Create indexes for common traversal patterns
CREATE INDEX contract_party FOR (c:Contract) ON (c.party_name);
CREATE INDEX clause_type FOR (cl:Clause) ON (cl.type);
CREATE INDEX person_name FOR (p:Person) ON (p.name);
// Create full-text index for text search in graph
CREATE FULLTEXT INDEX contract_text FOR (c:Contract) ON EACH [c.title, c.summary];
2. Caching Strategy
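import hashlib
import json

import redis

cache = redis.Redis(host='localhost', port=6379)

def cached_search(query: str, cache_ttl: int = 3600):
    # Built-in hash() is salted per process for strings, so use a stable digest
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
    cache_key = f"graphrag:{digest}"
    # Check cache
    cached = cache.get(cache_key)
    if cached is not None:
        return json.loads(cached)
    # Execute search (hybrid_search is a HybridSearch instance created elsewhere)
    results = hybrid_search.search(query)
    # Cache results (they must be JSON-serializable)
    cache.setex(cache_key, cache_ttl, json.dumps(results))
    return results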
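Cache hit rates depend heavily on how keys are built. The sketch below shows one way to derive a key that is stable across processes and insensitive to case and spacing; `make_cache_key` is a hypothetical helper, and whether to fold case or include `top_k` in the key is a design choice that depends on your queries.

```python
import hashlib

def make_cache_key(query: str, top_k: int = 10, prefix: str = "graphrag") -> str:
    """Build a deterministic cache key from a normalized query."""
    # Collapse whitespace and fold case so trivially different queries share a key
    normalized = " ".join(query.lower().split())
    digest = hashlib.sha256(f"{top_k}:{normalized}".encode("utf-8")).hexdigest()
    return f"{prefix}:{digest}"

key_a = make_cache_key("  What is  our Refund policy? ")
key_b = make_cache_key("what is our refund policy?")
key_c = make_cache_key("what is our refund policy?", top_k=5)
```

Including `top_k` in the hashed string keeps a cached top-5 result from being served for a top-10 request.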
3. Query Planning
def plan_query(query: str) -> dict:
    """
    Analyze query and determine optimal search strategy.
    """
    # Classify query type
    if is_relationship_query(query):
        return {
            'strategy': 'graph_first',
            'weights': {'vector': 0.2, 'graph': 0.7, 'keyword': 0.1}
        }
    elif is_exact_match_query(query):
        return {
            'strategy': 'keyword_first',
            'weights': {'vector': 0.2, 'graph': 0.2, 'keyword': 0.6}
        }
    else:
        return {
            'strategy': 'balanced',
            'weights': {'vector': 0.4, 'graph': 0.4, 'keyword': 0.2}
        }
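The two classifier functions are not defined in this article. As a rough illustration only, they could start as keyword heuristics like the ones below; the cue phrases and the ID pattern are assumptions chosen to match the example queries in this article, and a production system might use an LLM or a trained classifier instead.

```python
import re

# Hypothetical cue phrases that suggest the query is about connections
RELATIONSHIP_CUES = (
    "who approved", "reports to", "reporting chain",
    "references", "referencing", "related to", "amended by",
)
# Quoted phrases or IDs shaped like "Contract-2024-001" suggest exact matching
_QUOTED = re.compile(r'"[^"]+"')
_DOC_ID = re.compile(r"\b[A-Z][A-Za-z]*-\d{4}-\d+\b")

def is_relationship_query(query: str) -> bool:
    """True if the query contains any relationship cue phrase."""
    lowered = query.lower()
    return any(cue in lowered for cue in RELATIONSHIP_CUES)

def is_exact_match_query(query: str) -> bool:
    """True if the query contains a quoted phrase or a document-style ID."""
    return bool(_QUOTED.search(query) or _DOC_ID.search(query))
```

Heuristics like these are cheap to run before retrieval, and misclassification only shifts the fusion weights rather than excluding a retriever.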
8. Real Implementation: Legal Lens AI
At Cognilium, we built Legal Lens AI using GraphRAG architecture. Here's how it performs in production.
The Challenge
- 1.2 million contracts to analyze
- Contracts reference other contracts, amendments, clauses
- Users ask complex queries: "Find all contracts with Acme Corp that have non-standard termination clauses referencing the 2022 master agreement"
- 95% accuracy required for enterprise compliance
Our Architecture
Results
| Metric | Basic RAG | GraphRAG (Legal Lens) |
|---|---|---|
| Simple query accuracy | 91% | 95% |
| Multi-hop query accuracy | 52% | 94% |
| Relationship query accuracy | 43% | 96% |
| Average latency | 1.2s | 2.8s |
| User satisfaction | 72% | 94% |
Key Learnings
- Entity extraction quality matters most: Garbage in, garbage out. We spent 40% of development time on extraction prompts.
- Graph schema design is critical: Bad schema = slow queries. We redesigned twice before production.
- Hybrid search beats single-method: Neither vector nor graph alone matched hybrid performance.
- Caching reduces latency significantly: 60% of queries hit cache, reducing average latency from 2.8s to 1.1s.
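The caching figure can be sanity-checked with a weighted average. The 60% hit rate and the 2.8s uncached latency come from the text above; the 10 ms cost of serving a cache hit is an assumed value for illustration.

```python
def blended_latency(hit_rate: float, hit_latency_s: float, miss_latency_s: float) -> float:
    """Expected latency when a fraction of queries is served from cache."""
    return hit_rate * hit_latency_s + (1 - hit_rate) * miss_latency_s

# 60% hits at an assumed 10 ms, 40% misses at the uncached 2.8 s
avg = blended_latency(hit_rate=0.60, hit_latency_s=0.01, miss_latency_s=2.8)
print(f"expected average latency: {avg:.2f}s")
```

Under that assumption the blended average comes out at roughly 1.1s, in line with the reported number. It also shows that nearly all remaining latency comes from the 40% of queries that miss the cache.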
9. Common GraphRAG Mistakes
Mistake 1: Over-Extracting Entities
# ❌ Bad: Extract everything
entities: ["the", "contract", "shall", "be", "effective", ...]
# ✅ Good: Extract meaningful entities
entities: ["Acme Corp", "Master Agreement 2024", "Section 5.2", "$500,000"]
Mistake 2: Ignoring Relationship Direction
// ❌ Bad: Undirected relationships lose meaning
(Contract)-[:RELATED]-(Amendment)
// ✅ Good: Directed relationships preserve semantics
(Contract)-[:AMENDED_BY]->(Amendment)
(Amendment)-[:AMENDS]->(Contract)
Mistake 3: No Graph Indexes
// ❌ Bad: Substring filter checks every Contract node on every query
MATCH (c:Contract) WHERE c.title CONTAINS "Acme" RETURN c
// ✅ Good: Create a full-text index once, then query through it
CREATE FULLTEXT INDEX contract_title FOR (c:Contract) ON EACH [c.title];
CALL db.index.fulltext.queryNodes("contract_title", "Acme") YIELD node RETURN node
Mistake 4: Ignoring Edge Cases
# ❌ Bad: Assumes extraction always works
entities = extract_entities(document)
for entity in entities:
    graph.add(entity)
# ✅ Good: Handle extraction failures
try:
    entities = extract_entities(document)
    if not entities:
        # Fall back to keyword extraction
        entities = keyword_extract(document)
    for entity in entities:
        graph.add(entity)
except ExtractionError as e:
    logger.warning(f"Extraction failed for {doc_id}: {e}")
    # Store document without graph enrichment
    vector_store.add(document, doc_id)
Mistake 5: No Evidence Mapping
# ❌ Bad: Return answer without sources
return {"answer": "The contract was approved by John Smith."}
# ✅ Good: Include evidence trail
return {
    "answer": "The contract was approved by John Smith.",
    "evidence": [
        {
            "source": "Contract-2024-001, Section 12",
            "text": "Approved by: John Smith, VP Legal",
            "confidence": 0.95
        }
    ],
    "graph_path": ["Contract-2024-001", "APPROVED_BY", "John Smith"]
}
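One way to enforce evidence mapping is to make the response builder refuse to return an unsupported answer. The sketch below assumes a simple confidence threshold; `Evidence`, `build_response`, and the 0.5 cutoff are hypothetical names and values, not part of the system described here.

```python
from dataclasses import asdict, dataclass

@dataclass
class Evidence:
    source: str        # where the supporting text was found
    text: str          # the supporting passage itself
    confidence: float  # retrieval or extraction confidence in [0, 1]

def build_response(answer: str, evidence: list[Evidence],
                   graph_path: list[str], min_confidence: float = 0.5) -> dict:
    """Attach evidence to an answer; withhold the answer if none is strong enough."""
    supported = sorted(
        (e for e in evidence if e.confidence >= min_confidence),
        key=lambda e: e.confidence,
        reverse=True,
    )
    if not supported:
        return {"answer": None, "evidence": [], "graph_path": [],
                "note": "No sufficiently supported answer found."}
    return {
        "answer": answer,
        "evidence": [asdict(e) for e in supported],
        "graph_path": list(graph_path),
    }

response = build_response(
    "The contract was approved by John Smith.",
    [Evidence("Contract-2024-001, Section 12", "Approved by: John Smith, VP Legal", 0.95)],
    ["Contract-2024-001", "APPROVED_BY", "John Smith"],
)
```

Returning `None` instead of a best guess makes missing evidence visible to the caller, which matters in compliance settings where an unsupported answer is worse than no answer.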
10. Getting Started
Quick Start: 30 Minutes to First GraphRAG Query
Prerequisites:
- Python 3.9+
- Neo4j Aura account (free tier available)
- Pinecone account (free tier available)
- AWS Bedrock access
Step 1: Install Dependencies
pip install neo4j pinecone-client langchain boto3
Step 2: Set Up Neo4j
from neo4j import GraphDatabase

driver = GraphDatabase.driver(
    "neo4j+s://your-instance.neo4j.io",
    auth=("neo4j", "your-password")
)
# Create schema
with driver.session() as session:
    session.run("""
        CREATE CONSTRAINT doc_id IF NOT EXISTS FOR (d:Document) REQUIRE d.id IS UNIQUE
    """)
Step 3: Ingest a Document
# See code examples in Section 4 above
graph = KnowledgeGraph(uri, user, password)
graph.ingest_document(document_text, "doc-001")
Step 4: Query
hybrid = HybridSearch(vector_store, graph, keyword_search)
results = hybrid.search("Find contracts referencing Section 5.2")
Next Steps
- RAG vs GraphRAG Comparison: Detailed benchmarks and decision framework
- Building Knowledge Graphs with Neo4j: Deep dive on graph construction
- Enterprise RAG Security: RBAC, audit trails, compliance
- Hybrid Search Implementation: Vector + Keyword + Graph fusion
- Evidence-Mapped Retrieval: Why citations matter in enterprise
Need help building GraphRAG for your enterprise?
At Cognilium, we built Legal Lens AI analyzing 1.2M contracts with 95% accuracy. Let's discuss your knowledge system →
Muhammad Mudassir
Founder & CEO, Cognilium AI
