What's the performance impact of hybrid search vs single-method?

Hybrid search adds 50-200ms latency (parallel execution + fusion) but improves accuracy by 15-25%. The accuracy gain usually justifies the latency cost. Use caching to reduce repeat query latency.

Do I need all three search methods?

Start with vector + keyword (most common hybrid). Add graph search only if you have knowledge graph data and relationship queries. Two-way hybrid captures 80% of the benefit.

How do I choose fusion weights?

Start with equal weights, then tune using grid search on a held-out test set. Weights depend on your query distribution: more exact matches means higher keyword weight; more relationships means higher graph weight.

Should I re-rank after fusion?

Yes, if quality is critical. Cross-encoder re-ranking adds 100-300ms but improves NDCG by 5-10%. LLM re-ranking is slower but even better for complex queries.
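To check whether re-ranking pays off on your own data, you can measure NDCG before and after. The sketch below uses the linear-gain form of DCG; the function names and the shape of the `relevance` mapping are assumptions made for illustration.

```python
import math


def dcg(relevances: list[float]) -> float:
    # Discounted cumulative gain: each relevance is divided by
    # log2(position + 1), with positions starting at 1.
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(relevances, start=1))


def ndcg_at_k(ranked_ids: list[str], relevance: dict[str, float], k: int = 10) -> float:
    """NDCG@k for one query; `relevance` maps doc id -> graded relevance."""
    gains = [relevance.get(doc_id, 0.0) for doc_id in ranked_ids[:k]]
    # The ideal ranking lists the most relevant documents first.
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Averaging `ndcg_at_k` over a held-out query set, once on the fused ranking and once on the re-ranked one, gives a direct comparison against the added latency.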

How do I handle cases where graph search returns nothing?

This is normal when queries don't contain entities. Fall back to vector + keyword fusion when graph returns empty. Use adaptive weights that reduce graph weight to 0 when no entities are detected.
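One way to express "reduce graph weight to 0" is a small helper that zeroes the graph weight and renormalizes the rest. The helper name and the `has_entities` flag are hypothetical; the weight keys match the dictionaries used later in this article.

```python
def weights_without_graph(weights: dict[str, float], has_entities: bool) -> dict[str, float]:
    """Drop the graph weight when no entities were detected, then renormalize."""
    adjusted = dict(weights)  # copy so the caller's dict is left untouched
    if not has_entities:
        adjusted["graph"] = 0.0
    total = sum(adjusted.values())
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return {name: w / total for name, w in adjusted.items()}
```

With rank-based fusion, an empty result list adds nothing to any document's score, so the fallback to vector + keyword happens even without this step; zeroing the weight mainly keeps the reported weights honest.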

Hybrid Search: Vector + Keyword + Graph RAG Implementation

Vector search finds semantically similar content but misses exact matches. Keyword search finds exact matches but misses synonyms. Graph search finds relationships but misses context. Hybrid search combines all three—and outperforms any single method by 23% in our benchmarks. Here's how to implement it.

What is Hybrid Search?

Hybrid search combines multiple retrieval methods—typically vector (semantic), keyword (lexical), and graph (relational)—and fuses their results into a single ranked list. The fusion algorithm (usually Reciprocal Rank Fusion) weights and combines results so the strengths of each method compensate for others' weaknesses.

1. Why Single-Method Search Fails

Vector Search Weaknesses

Query: "MSA-2024-001"
Vector Search: Returns contracts about "master service agreements" (semantic match)
Expected: The specific contract with ID MSA-2024-001 (exact match)
Result: ❌ Wrong documents

Keyword Search Weaknesses

Query: "unauthorized termination consequences"
Keyword Search: No exact phrase match
Document has: "breach of contract penalties" (same meaning)
Result: ❌ Missed relevant document

Graph Search Weaknesses

Query: "What are the security requirements?"
Graph Search: No entity to start traversal
Needed: Semantic understanding of "security requirements"
Result: ❌ Can't start without entities

Hybrid Search Wins

Query Type	Vector	Keyword	Graph	Hybrid
Semantic	✅	❌	⚠️	✅
Exact match	❌	✅	⚠️	✅
Relationships	⚠️	❌	✅	✅
Average Accuracy	72%	68%	71%	89%

2. The Three Search Methods

Method 1: Vector Search (Semantic)

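from pinecone import Pinecone
import voyageai

pc = Pinecone(api_key="your-key")
index = pc.Index("documents")
# voyage-3 is a Voyage AI model, served through the voyageai client rather than
# the Anthropic SDK. The client reads VOYAGE_API_KEY from the environment.
vo = voyageai.Client()

def vector_search(query: str, top_k: int = 10) -> list:
    # Embed the query; input_type="query" marks it as a search query
    response = vo.embed([query], model="voyage-3", input_type="query")
    query_embedding = response.embeddings[0]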
    
    results = index.query(
        vector=query_embedding,
        top_k=top_k,
        include_metadata=True
    )
    
    return [
        {
            "id": match.id,
            "score": match.score,
            "content": match.metadata.get("content", ""),
            "source": "vector"
        }
        for match in results.matches
    ]

Method 2: Keyword Search (Lexical)

from opensearchpy import OpenSearch

os_client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

def keyword_search(query: str, top_k: int = 10) -> list:
    response = os_client.search(
        index="documents",
        body={
            "query": {
                "multi_match": {
                    "query": query,
                    "fields": ["title^2", "content", "summary"],
                    "type": "best_fields"
                }
            },
            "size": top_k
        }
    )
    
    return [
        {
            "id": hit["_id"],
            "score": hit["_score"],
            "content": hit["_source"].get("content", ""),
            "source": "keyword"
        }
        for hit in response["hits"]["hits"]
    ]

Method 3: Graph Search (Relational)

from neo4j import GraphDatabase

driver = GraphDatabase.driver("neo4j+s://xxx.neo4j.io", auth=("neo4j", "password"))

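def graph_search(query: str, top_k: int = 10) -> list:
    # extract_entities is not defined in this article; it is assumed to
    # return a list of dicts with a "name" key (e.g. from an NER step)
    entities = extract_entities(query)
    
    # Traversal needs a starting node, so return early when none is found
    if not entities:
        return []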
    
    entity_names = [e["name"] for e in entities]
    
    with driver.session() as session:
        result = session.run("""
            MATCH (e)-[r*1..2]-(related:Document)
            WHERE e.name IN $names
            RETURN DISTINCT related.id as id, 
                   related.content as content,
                   count(r) as connection_strength
            ORDER BY connection_strength DESC
            LIMIT $limit
        """, {"names": entity_names, "limit": top_k})
        
        return [
            {
                "id": record["id"],
                "score": record["connection_strength"],
                "content": record["content"],
                "source": "graph"
            }
            for record in result
        ]
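The `extract_entities` helper used above is not shown in the article. A minimal sketch, assuming a small hand-maintained vocabulary plus an ID pattern, could look like the following; the vocabulary entries and the `type` field are hypothetical, and a production system would more likely use an NER model or the node names already stored in the graph.

```python
import re

# Hypothetical entity vocabulary mapping lowercase surface forms to
# canonical node names.
KNOWN_ENTITIES = {"acme corp": "Acme Corp", "globex": "Globex"}

# Identifiers such as "MSA-2024-001": uppercase prefix, then digit groups.
ID_PATTERN = re.compile(r"\b[A-Z]{2,}-\d+(?:-\d+)*\b")


def extract_entities(query: str) -> list[dict]:
    """Return entities as dicts with a "name" key, as graph_search expects."""
    entities = []
    lowered = query.lower()
    for key, canonical in KNOWN_ENTITIES.items():
        # Whole-word match so "globex" does not fire inside another word.
        if re.search(rf"\b{re.escape(key)}\b", lowered):
            entities.append({"name": canonical, "type": "organization"})
    for match in ID_PATTERN.finditer(query):
        entities.append({"name": match.group(0), "type": "identifier"})
    return entities
```

A query with no known names or IDs returns an empty list, which is the case where `graph_search` returns early.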

3. Reciprocal Rank Fusion

RRF combines ranked lists by considering rank positions, not raw scores.

The Formula

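RRF_score(doc) = Σ (1 / (k + rank_in_list))

The sum runs over every result list that contains the document, and rank_in_list is the document's 1-based position in that list. k is a smoothing constant, typically 60, that keeps top-ranked documents from dominating the fused score. The implementation below also multiplies each term by a per-list weight.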
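A short worked example of the formula, using the unweighted form with 1-based ranks. The two documents and their ranks are made up for illustration.

```python
def rrf_score(ranks: list[int], k: int = 60) -> float:
    """Sum 1 / (k + rank) over the lists a document appears in (1-based ranks)."""
    return sum(1.0 / (k + rank) for rank in ranks)


# Document A: rank 1 in vector search and rank 3 in keyword search.
score_a = rrf_score([1, 3])  # 1/61 + 1/63
# Document B: rank 2 in vector search only.
score_b = rrf_score([2])     # 1/62

print(f"A: {score_a:.5f}  B: {score_b:.5f}")
```

Document A wins because it appears in two lists. The constant also flattens the gap between adjacent ranks: with k = 0 a rank-1 hit scores twice a rank-2 hit, while with k = 60 the ratio is only 62/61.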

Implementation

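from typing import Optional

def reciprocal_rank_fusion(
    result_lists: list[list[dict]], 
    weights: Optional[list[float]] = None,
    k: int = 60
) -> list[dict]:
    if weights is None:
        weights = [1.0] * len(result_lists)
    if len(weights) != len(result_lists):
        raise ValueError("weights must have one entry per result list")
    
    # Normalize so the weights sum to 1
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weights = [w / total_weight for w in weights]
    
    doc_scores = {}
    
    for results, weight in zip(result_lists, weights):
        for rank, doc in enumerate(results):
            doc_id = doc["id"]
            
            if doc_id not in doc_scores:
                doc_scores[doc_id] = {
                    "doc": doc,
                    "rrf_score": 0,
                    "sources": []
                }
            
            # enumerate() is 0-based, so rank + 1 is the 1-based rank
            doc_scores[doc_id]["rrf_score"] += weight / (k + rank + 1)
            doc_scores[doc_id]["sources"].append(doc["source"])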
    
    ranked = sorted(
        doc_scores.values(), 
        key=lambda x: x["rrf_score"], 
        reverse=True
    )
    
    return [
        {
            **item["doc"],
            "rrf_score": item["rrf_score"],
            "sources": item["sources"]
        }
        for item in ranked
    ]

4. Implementation: Vector + Keyword

Start with the most common hybrid: vector + keyword.

class HybridSearchV1:
    def __init__(self, vector_store, keyword_store):
        self.vector = vector_store
        self.keyword = keyword_store
    
    def search(
        self, 
        query: str, 
        top_k: int = 10,
        vector_weight: float = 0.6,
        keyword_weight: float = 0.4
    ) -> list:
        vector_results = self.vector.search(query, top_k=top_k * 2)
        keyword_results = self.keyword.search(query, top_k=top_k * 2)
        
        fused = reciprocal_rank_fusion(
            [vector_results, keyword_results],
            weights=[vector_weight, keyword_weight]
        )
        
        return fused[:top_k]
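The class above expects store objects with a `.search(query, top_k=...)` method, while section 2 defines plain functions. A thin adapter is one way to bridge the two; the `FunctionStore` name and the stub retriever are assumptions made for illustration.

```python
from typing import Callable


class FunctionStore:
    """Adapts a plain search function to the `.search(query, top_k)` interface."""

    def __init__(self, search_fn: Callable[..., list]):
        self._search_fn = search_fn

    def search(self, query: str, top_k: int = 10) -> list:
        return self._search_fn(query, top_k=top_k)


# Stub retriever standing in for vector_search or keyword_search.
def fake_vector_search(query: str, top_k: int = 10) -> list:
    return [
        {"id": f"v{i}", "score": 1.0 / (i + 1), "content": query, "source": "vector"}
        for i in range(top_k)
    ]


store = FunctionStore(fake_vector_search)
results = store.search("security requirements", top_k=3)
```

With the real functions, the same pattern would read `HybridSearchV1(FunctionStore(vector_search), FunctionStore(keyword_search))`.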

5. Implementation: Adding Graph Search

Full three-way hybrid for maximum recall.

import concurrent.futures

class HybridSearchV2:
    def __init__(self, vector_store, keyword_store, graph_store):
        self.vector = vector_store
        self.keyword = keyword_store
        self.graph = graph_store
    
    def search(self, query: str, top_k: int = 10, weights: dict = None) -> list:
        if weights is None:
            weights = {"vector": 0.4, "keyword": 0.3, "graph": 0.3}
        
        # Run the three retrievers in parallel; each over-fetches 2x top_k
        with concurrent.futures.ThreadPoolExecutor() as executor:
            vector_future = executor.submit(self.vector.search, query, top_k * 2)
            keyword_future = executor.submit(self.keyword.search, query, top_k * 2)
            graph_future = executor.submit(self.graph.search, query, top_k * 2)
            
            vector_results = vector_future.result()
            keyword_results = keyword_future.result()
            graph_results = graph_future.result()
        
        fused = reciprocal_rank_fusion(
            [vector_results, keyword_results, graph_results],
            weights=[weights["vector"], weights["keyword"], weights["graph"]]
        )
        
        return fused[:top_k]

Adaptive Weights

import re

# Method of HybridSearchV2, shown on its own for brevity
def adaptive_search(self, query: str, top_k: int = 10) -> list:
    # Heuristics: IDs such as "MSA-2024" favor keyword search, detected
    # entities favor graph search, longer queries favor vector search
    has_exact_ids = bool(re.search(r'[A-Z]{2,}-\d+', query))
    has_entities = len(extract_entities(query)) > 0
    is_semantic = len(query.split()) > 5
    
    if has_exact_ids:
        weights = {"vector": 0.2, "keyword": 0.7, "graph": 0.1}
    elif has_entities:
        weights = {"vector": 0.3, "keyword": 0.2, "graph": 0.5}
    elif is_semantic:
        weights = {"vector": 0.6, "keyword": 0.3, "graph": 0.1}
    else:
        weights = {"vector": 0.4, "keyword": 0.3, "graph": 0.3}
    
    return self.search(query, top_k, weights)

6. Re-ranking for Final Quality

After fusion, re-rank with a cross-encoder or LLM for best results.

Cross-Encoder Re-ranking

from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-12-v2")

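def rerank(query: str, results: list, top_k: int = 5) -> list:
    if not results:
        return []
    
    # Score each (query, document) pair jointly with the cross-encoder
    pairs = [(query, r["content"]) for r in results]
    scores = reranker.predict(pairs)
    
    # Copy each result so the caller's list is not mutated
    scored = [
        {**result, "rerank_score": float(score)}
        for result, score in zip(results, scores)
    ]
    
    reranked = sorted(scored, key=lambda x: x["rerank_score"], reverse=True)
    return reranked[:top_k]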

7. Tuning Fusion Weights

Optimal weights depend on your data and queries.

Grid Search Approach

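def tune_weights(test_queries: list, ground_truth: dict) -> dict:
    # evaluate() is not defined in this article; it is assumed to return
    # a quality metric where higher is better
    best_weights = None
    best_score = float("-inf")
    
    for v in [0.2, 0.3, 0.4, 0.5, 0.6]:
        for kw in [0.2, 0.3, 0.4]:
            # Graph gets the remainder; round to avoid floating-point drift
            g = round(1.0 - v - kw, 2)
            if g < 0:
                continue
            
            weights = {"vector": v, "keyword": kw, "graph": g}
            score = evaluate(test_queries, ground_truth, weights)
            
            if score > best_score:
                best_score = score
                best_weights = weights
    
    return best_weights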
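The grid search above relies on an `evaluate` function that the article does not define. One possible sketch scores each weight setting by mean reciprocal rank. The extra `search_fn` parameter is an assumption added here so the function is self-contained; it is expected to accept `(query, top_k, weights)`, which matches the `search` method of the three-way class in section 5.

```python
from typing import Callable


def evaluate(
    test_queries: list[str],
    ground_truth: dict[str, set[str]],
    weights: dict[str, float],
    search_fn: Callable[[str, int, dict], list[dict]],
    top_k: int = 10,
) -> float:
    """Mean reciprocal rank of the first relevant document per query."""
    if not test_queries:
        return 0.0
    total = 0.0
    for query in test_queries:
        relevant = ground_truth.get(query, set())
        results = search_fn(query, top_k, weights)
        for rank, doc in enumerate(results, start=1):
            if doc["id"] in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts toward MRR
    return total / len(test_queries)
```

To match the three-argument call in the grid search, the search function can be bound ahead of time with `functools.partial(evaluate, search_fn=searcher.search)`. Keep the tuning queries separate from the ones used to report final accuracy.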

Recommended Starting Weights

Use Case	Vector	Keyword	Graph
General purpose	0.4	0.3	0.3
Technical docs (exact IDs matter)	0.3	0.5	0.2
Legal/contracts (relationships matter)	0.3	0.2	0.5
Knowledge base (semantic)	0.5	0.3	0.2

8. Production Architecture

[Figure: architecture diagram]

Next Steps

GraphRAG Implementation Guide: Full architecture for graph-enhanced RAG
RAG vs GraphRAG: When to add graph search
Evidence-Mapped Retrieval: Traceable citations in search results



Muhammad Mudassir
