Gemini-Driven Entity Disambiguation With Post-Creation Mislink Detection

Muhammad Mudassir

Founder & CEO, Cognilium AI

TL;DR

Auto-merging "Acme Corp" with "Acme Corporation" is the easy half. The hard half is catching the merges that should not have happened: a re-check pass after creation flags roughly 15% of auto-merges as suspect, and roughly 3% of auto-merges turn out to be real mislinks.
Tags: entity resolution, knowledge graph linking, Neo4j, name disambiguation, Gemini extraction, mislink detection, document AI

Two PPMs reference "Acme Corp." A cap table references "Acme Corporation." A side letter references "Acme Holdings LLC." Are these the same entity? Probably the first two are. The third is more interesting — it might be the parent company. Getting this right matters because everything downstream (financial roll-ups, ownership tracking, compliance reporting) depends on the entity graph being correct.

Why one-pass linking fails

Conservative linker: merges only when surface form is near-identical. Misses "Acme Corp" / "Acme Corporation" merges that should have happened. Low recall.

Aggressive linker: merges on partial-match heuristics. Merges "Acme Corp" with "Acme Capital" when they are different companies. Low precision.

There is no threshold that gets both. The two-pass approach — aggressive merge with a precision-recovery pass — gets both at the cost of a second pipeline stage.
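To make the threshold problem concrete, here is a minimal sketch that scores names by character similarity and counts the errors at a few cutoffs. Several things here are assumptions for illustration: the use of Python's difflib as the matcher (the article does not say which heuristic the linkers use), the helper names, and "Acme Corp II", a hypothetical distinct entity that does not appear in the source documents.

```python
from difflib import SequenceMatcher


def surface_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a real name matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# (existing name, should_merge) pairs judged against the candidate "Acme Corp".
# "Acme Corp II" is a hypothetical distinct entity added for illustration.
CANDIDATE = "Acme Corp"
LABELLED = [
    ("Acme Corporation", True),
    ("Acme Capital", False),
    ("Acme Corp II", False),
]


def errors_at(threshold: float) -> tuple[int, int]:
    """Return (missed_merges, wrong_merges) for a single similarity cutoff."""
    missed = wrong = 0
    for name, should_merge in LABELLED:
        merged = surface_similarity(CANDIDATE, name) >= threshold
        if should_merge and not merged:
            missed += 1
        if merged and not should_merge:
            wrong += 1
    return missed, wrong


for t in (0.6, 0.7, 0.8, 0.9):
    print(t, errors_at(t))
```

With these three names, the wrong pair "Acme Corp II" scores higher than the right pair "Acme Corporation", so every cutoff either misses the true merge or admits a false one. That is the gap a second pass is meant to close.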

Pass 1: Gemini-driven disambiguation

At entity ingestion, the candidate entity is shown to Gemini alongside up to 10 existing graph neighbors whose names match. Gemini sees: surface form, jurisdiction, EIN if present, registered address, top relationships, source document type. Output: {action: "merge_with_X" | "create_new", confidence}.
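Whatever the prompt looks like, the reply should be validated before it touches the graph. The sketch below parses the {action, confidence} output described above. It assumes the reply arrives as a JSON string and that the "X" in merge_with_X is the id of one of the neighbors shown to the model. LinkDecision and parse_decision are hypothetical names, and no Gemini client call is shown.

```python
import json
from dataclasses import dataclass
from typing import Optional

MERGE_PREFIX = "merge_with_"


@dataclass(frozen=True)
class LinkDecision:
    action: str        # "create_new" or "merge_with_<neighbor id>"
    confidence: float  # model-reported confidence, expected in [0, 1]

    @property
    def merge_target(self) -> Optional[str]:
        """Neighbor id to merge into, or None when the action is not a merge."""
        if self.action.startswith(MERGE_PREFIX):
            return self.action[len(MERGE_PREFIX):]
        return None


def parse_decision(raw: str, neighbor_ids: set[str]) -> LinkDecision:
    """Validate the model's JSON reply against the neighbors it was shown."""
    data = json.loads(raw)
    decision = LinkDecision(str(data["action"]), float(data["confidence"]))
    if not 0.0 <= decision.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {decision.confidence}")
    # Reject anything that is neither create_new nor a merge into a known neighbor.
    if decision.action != "create_new" and decision.merge_target not in neighbor_ids:
        raise ValueError(f"unknown action or merge target: {decision.action}")
    return decision
```

Rejecting merge targets that were never offered to the model is a cheap guard against a hallucinated id creating a link nobody asked for.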

Confident merges (>0.85) auto-merge. Uncertain ones (0.6-0.85) queue for human review with the model's reasoning attached. Below 0.6, default to create_new.
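The routing rule is small enough to write down directly. This is a sketch using the thresholds quoted above. The function name and return labels are hypothetical, and it assumes the thresholds gate merge decisions only, so a create_new action passes straight through regardless of confidence.

```python
AUTO_MERGE_ABOVE = 0.85  # from the article: confidence > 0.85 auto-merges
REVIEW_FLOOR = 0.60      # from the article: 0.6 to 0.85 goes to human review


def route(action: str, confidence: float) -> str:
    """Map a Pass 1 decision to 'auto_merge', 'human_review', or 'create_new'."""
    if action == "create_new":
        return "create_new"  # assumption: thresholds apply to merges only
    if confidence > AUTO_MERGE_ABOVE:
        return "auto_merge"
    if confidence >= REVIEW_FLOOR:
        return "human_review"
    return "create_new"  # low-confidence merges default to a new entity
```

Note the boundary handling: a merge at exactly 0.85 goes to review rather than auto-merging, which matches the strict ">0.85" in the text.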

Pass 2: post-creation mislink detection

After auto-merge, a check pass compares the merged entity's attributes across all source documents. If "Acme Corp" in document A has a post-money valuation of $50M and "Acme Corp" in document B has $52M as of the same date, the merge is suspect: one entity should have one valuation on a given date, and a 4% gap is well outside tolerance.

  • Numerical fields compared with tolerance (1% on valuations, exact on share counts).
  • Legal-entity fields compared exact (jurisdiction, EIN, registered address).
  • Relational structure compared loosely (≥80% officer overlap).

Disagreement above tolerance flags the merge. The flag goes to a human review queue with both source documents linked and the conflicting attributes highlighted.
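The comparison rules above can be sketched as a pairwise check over the documents that mention a merged entity. The Mention fields and function names are illustrative, not the production schema. Two further assumptions: missing values are treated as "no evidence" rather than a conflict, and officer overlap is measured against the smaller of the two officer sets (the article does not define the denominator).

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass
class Mention:
    """One document's view of a merged entity (field names are illustrative)."""
    doc_id: str
    valuation: Optional[float] = None
    share_count: Optional[int] = None
    jurisdiction: Optional[str] = None
    ein: Optional[str] = None
    registered_address: Optional[str] = None
    officers: frozenset = frozenset()


def conflicts(a: Mention, b: Mention) -> list[str]:
    """Return the attributes on which two mentions disagree beyond tolerance."""
    found = []
    # Valuations: 1% relative tolerance, measured against the larger value.
    if a.valuation is not None and b.valuation is not None:
        limit = 0.01 * max(abs(a.valuation), abs(b.valuation))
        if abs(a.valuation - b.valuation) > limit:
            found.append("valuation")
    # Share counts and legal-entity fields: exact match required.
    for name in ("share_count", "jurisdiction", "ein", "registered_address"):
        x, y = getattr(a, name), getattr(b, name)
        if x is not None and y is not None and x != y:
            found.append(name)
    # Officers: at least 80% overlap, relative to the smaller set (assumption).
    if a.officers and b.officers:
        shared = len(a.officers & b.officers)
        if shared / min(len(a.officers), len(b.officers)) < 0.80:
            found.append("officers")
    return found


def flag_merge(mentions: list[Mention]) -> list[tuple[str, str, list[str]]]:
    """List every document pair that disagrees, with the conflicting attributes."""
    return [
        (a.doc_id, b.doc_id, c)
        for a, b in combinations(mentions, 2)
        if (c := conflicts(a, b))
    ]
```

Returning the document ids together with the conflicting attribute names gives the review queue what it needs: both sources linked and the disagreement highlighted. A production version would also need the "as of" date on each valuation so that time-shifted figures are not compared directly.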

What this catches

  • ~15% of auto-merges are flagged by the mislink check
  • ~3% of auto-merges turn out to be actual mislinks (entities the aggressive linker over-merged)
  • ~12% of auto-merges are tolerable inconsistencies (typos, time-shifted valuations), where a human confirms the merge
  • Net precision after the mislink check: >99% of merges
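A quick back-of-envelope calculation shows what these rates mean for the review queue. The function below is a hypothetical helper that uses only the percentages quoted above. The batch of 1,000 auto-merges in the usage note is an illustrative number, not a figure from the article.

```python
def review_breakdown(
    auto_merges: int,
    flagged_rate: float = 0.15,  # share of auto-merges the check flags
    mislink_rate: float = 0.03,  # share of auto-merges that are real mislinks
) -> dict:
    """Turn the quoted rates into counts for a batch of auto-merges."""
    flagged = round(auto_merges * flagged_rate)
    mislinks = round(auto_merges * mislink_rate)
    return {
        "flagged": flagged,
        "true_mislinks": mislinks,
        "confirmed_merges": flagged - mislinks,
        # Fraction of flags that are real mislinks.
        "flag_precision": mislinks / flagged if flagged else 0.0,
        # Known mislinks alone cap the precision of Pass 1 on its own.
        "max_precision_before_check": 1 - mislink_rate,
    }


print(review_breakdown(1000))
```

For 1,000 auto-merges this gives 150 flags, of which 30 are real mislinks and 120 are confirmations, so about one flag in five is a true catch. It also shows that Pass 1 alone cannot be better than about 97% precise, since at least 3% of its merges were wrong.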

Cost

Pass 1 runs at ingestion: ~50 entities/second on cached inputs, ~$0.001 per entity. Pass 2 runs in batch overnight: ~5 entities/second (graph queries are the bottleneck), $0.005 per entity. The mislink check overhead is small relative to the document-extraction cost it backstops.
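To put those rates in context, here is a small worked calculation. The corpus size N is hypothetical, and the sketch assumes Pass 2 runs over every entity, which is an upper bound if only merged entities are re-checked.

```python
def pass_cost(entities: int, rate_per_sec: float, usd_per_entity: float) -> tuple[float, float]:
    """Return (hours, usd) to process `entities` at a given throughput and unit cost."""
    return entities / rate_per_sec / 3600, entities * usd_per_entity


N = 100_000  # hypothetical corpus size, not a figure from the article

p1_hours, p1_usd = pass_cost(N, rate_per_sec=50, usd_per_entity=0.001)
p2_hours, p2_usd = pass_cost(N, rate_per_sec=5, usd_per_entity=0.005)

print(f"Pass 1: {p1_hours:.1f} h, ${p1_usd:,.0f}")
print(f"Pass 2: {p2_hours:.1f} h, ${p2_usd:,.0f}")
```

At this size Pass 1 takes a little over half an hour and about $100, while Pass 2 takes roughly five and a half hours and about $500, which still fits an overnight batch window.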

