"The AI said it, so it must be true." That doesn't work in enterprises. When your legal team asks where an answer came from, when auditors need proof, when users demand verification—you need traceable citations. Evidence-mapped retrieval makes every AI answer provable. Here's how to implement it.
What is Evidence-Mapped Retrieval?
Evidence-mapped retrieval is a RAG pattern where every claim in an AI response is explicitly linked to source documents. Instead of just generating an answer, the system extracts citations, calculates confidence scores, and provides an audit trail showing exactly which documents support each statement.
1. Why Citations Matter
The Trust Problem
User: "Can we terminate this contract early?"
AI: "Yes, you can terminate with 30 days notice."
User: "Where does it say that?"
AI: "..." 😬
Without citations, users can't verify. Auditors can't audit. Legal can't rely on it.
Enterprise Requirements
| Stakeholder | Requirement |
|---|---|
| Legal | Every claim traceable to source |
| Compliance | Audit trail for all queries |
| Users | Confidence in answers |
| Auditors | Verifiable evidence chain |
The Business Case
- Reduced escalations: Users verify without asking humans
- Audit readiness: Pass SOC 2 / HIPAA inspections
- User adoption: Trust drives usage
- Error detection: Wrong citations reveal hallucinations
2. The Evidence-Mapping Pattern
Standard RAG Response
```json
{
  "answer": "The contract can be terminated with 30 days written notice to the other party."
}
```
Evidence-Mapped Response
```json
{
  "answer": "The contract can be terminated with 30 days written notice to the other party.",
  "citations": [
    {
      "claim": "terminated with 30 days written notice",
      "source": {
        "document_id": "CONTRACT-2024-001",
        "section": "Section 8.2 - Termination",
        "page": 12,
        "text": "Either party may terminate this Agreement upon thirty (30) days prior written notice to the other party."
      },
      "confidence": 0.95,
      "match_type": "exact"
    }
  ],
  "metadata": {
    "sources_consulted": 5,
    "sources_cited": 1,
    "overall_confidence": 0.95
  }
}
```
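If you want this shape enforced at runtime rather than trusted on faith, a Pydantic model is a natural fit. A minimal sketch, with field names mirroring the JSON above (the model names themselves are illustrative):

```python
from typing import Optional
from pydantic import BaseModel, Field

class Source(BaseModel):
    document_id: str
    section: Optional[str] = None
    page: Optional[int] = None
    text: str

class Citation(BaseModel):
    claim: str
    source: Optional[Source] = None
    confidence: float = Field(ge=0.0, le=1.0)  # reject out-of-range scores
    match_type: str = "semantic"

class EvidenceMappedResponse(BaseModel):
    answer: str
    citations: list[Citation]
    metadata: dict
```

Validating LLM output against a schema like this catches malformed responses before they reach users or the audit log.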
3. Citation Extraction Methods
Method 1: Inline Citation (LLM-Generated)
CITATION_PROMPT = """Answer the question using only the provided documents.
For each claim, include a citation in brackets like [Doc1, Section 2.3].
Documents:
{documents}
Question: {question}
Answer with inline citations:"""
def generate_with_citations(query: str, documents: list) -> str:
docs_text = "\n\n".join([
f"[Doc{i+1}] {doc['title']}\n{doc['content']}"
for i, doc in enumerate(documents)
])
response = anthropic.messages.create(
model="claude-3-sonnet-20240229",
max_tokens=1000,
messages=[{
"role": "user",
"content": CITATION_PROMPT.format(
documents=docs_text,
question=query
)
}]
)
return response.content[0].text
Output:
```
The contract can be terminated with 30 days written notice [Doc1, Section 8.2].
Both parties must provide notice in writing [Doc1, Section 8.2].
```
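To make those inline markers machine-readable, you can parse them back out with a regex. A minimal sketch; the bracket format matches `CITATION_PROMPT` above, and `parse_inline_citations` is a hypothetical helper name:

```python
import re

# Matches markers like [Doc1, Section 8.2] produced by CITATION_PROMPT
CITATION_PATTERN = re.compile(r"\[Doc(\d+),\s*([^\]]+)\]")

def parse_inline_citations(answer: str) -> list[dict]:
    """Extract (doc index, section) pairs from an answer with inline markers."""
    citations = []
    for sentence in answer.split(". "):  # crude split; use a real tokenizer in production
        for doc_num, section in CITATION_PATTERN.findall(sentence):
            citations.append({
                "claim": CITATION_PATTERN.sub("", sentence).strip(),
                "doc_index": int(doc_num) - 1,  # back to 0-based list index
                "section": section.strip(),
            })
    return citations
```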
Method 2: Post-Generation Attribution
```python
def attribute_claims(answer: str, documents: list) -> list:
    """Map each sentence of the answer back to its best-supporting document.

    Assumes helpers: split_into_sentences (e.g. nltk), compute_similarity
    (e.g. cosine similarity over embeddings), and find_matching_excerpt.
    """
    sentences = split_into_sentences(answer)
    citations = []
    for sentence in sentences:
        # Find the single document that best supports this sentence
        best_match = None
        best_score = 0.0
        for doc in documents:
            score = compute_similarity(sentence, doc["content"])
            if score > best_score:
                best_score = score
                best_match = doc
        if best_score > 0.7:
            citations.append({
                "claim": sentence,
                "source": {
                    "document_id": best_match["id"],
                    "text": find_matching_excerpt(sentence, best_match["content"]),
                },
                "confidence": best_score,
            })
        else:
            # No document clears the threshold: flag it, don't fabricate a source
            citations.append({
                "claim": sentence,
                "source": None,
                "confidence": best_score,
                "warning": "Low confidence - may be model knowledge",
            })
    return citations
```
Method 3: Structured Extraction (Best for Enterprise)
STRUCTURED_PROMPT = """Answer the question using the provided documents.
Return your response as JSON with this exact structure:
{
"answer": "Your complete answer here",
"claims": [
{
"statement": "A specific claim from your answer",
"source_doc": "Document ID",
"source_section": "Section/page reference",
"source_quote": "Exact quote supporting this claim",
"confidence": 0.0-1.0
}
]
}
Documents:
{documents}
Question: {question}
JSON Response:"""
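A minimal sketch of driving this prompt end to end. It reuses the `client` and document formatting from Method 1, and assumes the model returns valid JSON; production code should validate against a schema and retry on parse failure:

```python
import json

def generate_structured(query: str, documents: list) -> dict:
    docs_text = "\n\n".join(
        f"[{doc['id']}] {doc['title']}\n{doc['content']}" for doc in documents
    )
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=1500,
        messages=[{
            "role": "user",
            "content": STRUCTURED_PROMPT.format(documents=docs_text, question=query),
        }],
    )
    raw = response.content[0].text
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to an uncited answer rather than crashing the pipeline
        return {"answer": raw, "claims": []}
```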
4. Confidence Scoring
Confidence Factors
| Factor | Weight | Description |
|---|---|---|
| Semantic similarity | 40% | How closely claim matches source text |
| Lexical overlap | 20% | Exact word matches |
| Source authority | 20% | Document type, recency, official status |
| Retrieval rank | 20% | Higher-ranked docs = higher confidence |
Implementation
```python
def calculate_confidence(
    claim: str,
    source_text: str,
    document_metadata: dict,
    retrieval_rank: int,
) -> float:
    # Semantic similarity (40%): embedding-based match between claim and source
    semantic_score = compute_similarity(claim, source_text)

    # Lexical overlap (20%): Jaccard similarity over word sets
    claim_words = set(claim.lower().split())
    source_words = set(source_text.lower().split())
    lexical_score = len(claim_words & source_words) / len(claim_words | source_words)

    # Source authority (20%): official documents outrank drafts and email
    authority_scores = {
        "official_policy": 1.0,
        "contract": 0.95,
        "internal_doc": 0.8,
        "email": 0.6,
        "draft": 0.4,
    }
    authority_score = authority_scores.get(document_metadata.get("type"), 0.5)

    # Retrieval rank (20%): rank 0 scores 1.0, decaying for lower-ranked docs
    rank_score = 1.0 / (retrieval_rank + 1)

    confidence = (
        0.4 * semantic_score +
        0.2 * lexical_score +
        0.2 * authority_score +
        0.2 * rank_score
    )
    return min(1.0, max(0.0, confidence))
```
Confidence Thresholds
| Confidence | Action |
|---|---|
| > 0.9 | High confidence citation |
| 0.7 - 0.9 | Medium confidence, show citation |
| 0.5 - 0.7 | Low confidence, warn user |
| < 0.5 | No citation, flag as uncertain |
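The thresholds above translate directly into a dispatch step. A minimal sketch; the label strings and return shape are illustrative:

```python
def classify_citation(confidence: float) -> dict:
    """Map a confidence score to a handling decision per the table above."""
    if confidence > 0.9:
        return {"level": "high", "show_citation": True, "warn": False}
    if confidence > 0.7:
        return {"level": "medium", "show_citation": True, "warn": False}
    if confidence > 0.5:
        return {"level": "low", "show_citation": True, "warn": True}
    return {"level": "uncertain", "show_citation": False, "warn": True}
```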
5. Building Audit Trails
Audit Log Schema
```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class AuditEntry:
    timestamp: datetime
    query_id: str
    user_id: str
    query_text: str
    documents_retrieved: List[str]
    documents_cited: List[str]
    answer_generated: str
    citations: List[dict]
    overall_confidence: float
    model_used: str
    latency_ms: int

class AuditTrail:
    def __init__(self, storage):
        self.storage = storage  # any append-only store: S3, BigQuery, Postgres...

    def log_query(self, entry: AuditEntry):
        self.storage.write({
            "timestamp": entry.timestamp.isoformat(),
            "query_id": entry.query_id,
            "user_id": entry.user_id,
            "query_text": entry.query_text,
            "documents_retrieved": entry.documents_retrieved,
            "documents_cited": entry.documents_cited,
            "answer": entry.answer_generated,
            "citations": entry.citations,
            "confidence": entry.overall_confidence,
            "model": entry.model_used,
            "latency_ms": entry.latency_ms,
        })
```
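For auditors, the write path is only half the story. A minimal read-path sketch, assuming the storage backend exposes a `query` method that filters on stored fields (adapt to whatever your actual store provides):

```python
class AuditReader:
    def __init__(self, storage):
        self.storage = storage

    def trace_answer(self, query_id: str) -> dict:
        """Reconstruct the full evidence chain for a single answer."""
        entry = self.storage.query({"query_id": query_id})[0]
        return {
            "question": entry["query_text"],
            "answer": entry["answer"],
            "evidence": entry["citations"],  # claim -> source mapping
            "considered_but_uncited": sorted(
                set(entry["documents_retrieved"]) - set(entry["documents_cited"])
            ),
        }
```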
6. Implementation Code
Complete Evidence-Mapped RAG
```python
import time
import uuid
from datetime import datetime

import numpy as np

class EvidenceMappedRAG:
    def __init__(self, retriever, llm, audit_trail):
        self.retriever = retriever
        self.llm = llm
        self.audit_trail = audit_trail

    def query(self, query: str, user_id: str) -> dict:
        query_id = str(uuid.uuid4())
        start_time = time.time()

        # 1. Retrieve candidate documents
        documents = self.retriever.search(query, top_k=10)
        doc_map = {d["id"]: d for d in documents}  # look up docs by ID, not list index

        # 2. Generate an answer with structured claims (Method 3, via self.llm)
        result = self.generate_with_citations(query, documents)

        # 3. Re-score each claim's confidence independently of the LLM
        for citation in result["claims"]:
            citation["confidence"] = calculate_confidence(
                citation["statement"],
                citation["source_quote"],
                doc_map.get(citation["source_doc"], {}),
                citation.get("retrieval_rank", 0),
            )

        # 4. Aggregate per-claim scores into an overall confidence
        confidences = [c["confidence"] for c in result["claims"] if c.get("source_doc")]
        result["overall_confidence"] = float(np.mean(confidences)) if confidences else 0.0

        # 5. Log the full evidence chain for auditors
        self.audit_trail.log_query(AuditEntry(
            timestamp=datetime.now(),
            query_id=query_id,
            user_id=user_id,
            query_text=query,
            documents_retrieved=[d["id"] for d in documents],
            documents_cited=[c["source_doc"] for c in result["claims"] if c.get("source_doc")],
            answer_generated=result["answer"],
            citations=result["claims"],
            overall_confidence=result["overall_confidence"],
            model_used="claude-3-sonnet",
            latency_ms=int((time.time() - start_time) * 1000),
        ))
        return result
```
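Wiring it together might look like this; the retriever, LLM client, and storage backend shown are placeholders for whatever your stack provides:

```python
rag = EvidenceMappedRAG(
    retriever=my_vector_store,  # anything exposing .search(query, top_k)
    llm=client,                 # the Anthropic client from Method 1
    audit_trail=AuditTrail(storage=my_append_only_store),
)

result = rag.query("Can we terminate this contract early?", user_id="u-4821")
print(result["answer"])
for claim in result["claims"]:
    print(f'  [{claim["confidence"]:.2f}] {claim["statement"]} -> {claim["source_doc"]}')
```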
7. User Experience Design
Citation Display
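One way to surface a citation inline, using the evidence-mapped response from Section 2 (the layout is illustrative):

```
The contract can be terminated with 30 days written notice [1].

[1] CONTRACT-2024-001 · Section 8.2 - Termination · p. 12    ████████████ High
    "Either party may terminate this Agreement upon thirty (30) days
     prior written notice to the other party."
```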
Confidence Indicators
| Score | Display | Color |
|---|---|---|
| > 0.9 | ████████████ High | Green |
| 0.7-0.9 | ████████░░░░ Medium | Yellow |
| 0.5-0.7 | ████░░░░░░░░ Low | Orange |
| < 0.5 | ⚠️ Uncertain | Red |
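A throwaway helper for rendering those indicators in a text UI; the thresholds and bar lengths mirror the table, and the colors would come from your frontend:

```python
def render_confidence(score: float) -> str:
    """Text rendering of the indicator buckets from the table above."""
    if score > 0.9:
        return "████████████ High"    # green in the UI
    if score > 0.7:
        return "████████░░░░ Medium"  # yellow
    if score > 0.5:
        return "████░░░░░░░░ Low"     # orange
    return "⚠️ Uncertain"             # red
```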
Next Steps
- GraphRAG Implementation Guide → Full architecture for enterprise knowledge systems
- Enterprise RAG Security → RBAC and compliance controls
- Hybrid Search Implementation → Better retrieval for better citations
Need help implementing evidence-mapped retrieval?
At Cognilium, we built Legal Lens AI with 95% citation accuracy on 1.2M contracts. Let's discuss your requirements →
Muhammad Mudassir
Founder & CEO, Cognilium AI