What is entity resolution in a knowledge graph?

It is the process of deciding which extracted records refer to the same real-world thing, then merging them into one node. Extraction creates a node per mention; resolution collapses the mentions that describe the same entity, so one company is one node instead of eleven.

Why can't an LLM just do entity resolution by itself?

Because extraction and identity are different problems. An LLM reading one document is good at naming what's in it, but it has no view across all the other documents and tends to merge confidently on thin evidence. Reliable resolution uses the LLM only for the ambiguous cases, on top of normalization, candidate matching, and confidence thresholds.

What signals are most reliable for matching entities?

Identifiers and relationships beat names. A shared tax ID, registered address, or set of officers is hard to share by coincidence, so it carries far more weight than a name match, which is noisy because names are reused, abbreviated, and spelled inconsistently.

What is the difference between deduplication and entity resolution?

Deduplication usually means removing identical copies. Entity resolution is harder: it decides that records which look different (“Acme LLC” vs “ACME, L.L.C.” vs “Acme”) are the same entity, and that records which look similar (two people named Daniel Chen) are not.

How is resolution different on a graph that updates continuously?

It never finishes. New documents keep introducing new spellings and new entities, so resolution has to run on every ingestion, not once at launch. A graph that is resolved only once is clean briefly and then rots as fresh documents arrive.

Can you fix duplicate entities without rebuilding the graph?

Usually yes. Duplicates and wrong merges live in the identity layer, which can be re-resolved in place with a fresh matching pass and a reconciliation step. A full rebuild is only needed when the extraction schema itself was wrong.

One Company, Eleven Names: How a Knowledge Graph Learns Identity

In the last post I described finding the same portfolio company in a client's knowledge graph under eleven different names, and called that kind of silent decay graph rot. This post is about the fix: entity resolution, the part of the pipeline that decides eleven names are one company.

It is the least glamorous problem in knowledge graph engineering and the one that breaks the most systems. Here is how we handle it on the document-intelligence platform we run for a family office managing hundreds of millions in assets.

Why does one company end up as eleven nodes?

Because documents don't agree on names, and extraction copies whatever it reads. “Acme Holdings LLC” in a PPM, “Acme Holdings” in a cap table, “ACME HOLDINGS, L.L.C.” in a K-1, and “Acme” in an email are four strings describing one company. An extraction model reads each document on its own and faithfully creates a node for each spelling. Across six document types (PPMs, SPAs, SAFEs, K-1s, cap tables, and operating agreements), one company can easily pick up a dozen aliases before anyone looks.

The aliases are not random noise. Each one is correct in its own context. A K-1 uses the legal name the way the IRS wants it. A pitch deck uses the short marketing name. An email uses whatever the sender typed in a hurry. Multiply that across years of documents and dozens of holdings, and the graph accumulates a sprawl of near-duplicates that each look authoritative on their own page.

The graph isn't wrong about any single document. It is wrong about the world, because it never decided which names point to the same thing.

What is the difference between extraction and entity resolution?

Extraction reads the words. Entity resolution decides who the words are about. They are two separate jobs, and conflating them is the root cause of duplicate-entity rot.

A name is a string. An identity is a decision.

In our pipeline, extraction runs first: Gemini 2.5 Pro pulls structured fields out of each document with a confidence score on every value. That step is good at “this paragraph names a company called X.” It has no opinion on whether company X already exists in the graph. That opinion comes from a dedicated resolution pass that runs after extraction, looking across every document at once instead of one at a time, and writing the result into a Neo4j graph of five node types (company, person, investment, vehicle, document).

What does resolving one entity actually look like?

Take a real shape of the problem. A new SPA arrives naming “Meridian Capital Partners LLC.” The graph already holds “Meridian Capital,” “MERIDIAN CAPITAL, L.L.C.,” and a fourth node that is just “Meridian” with no suffix at all.

Normalization collapses the first three immediately: strip the legal suffix and generic designators such as “Partners,” lowercase, drop punctuation, and the three reduce to the same root. The bare “Meridian” is the hard one. It could be the same firm, or it could be a different Meridian entirely, because the name is common.
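A minimal sketch of that normalization step, in Python. The designator list and the function name are illustrative assumptions, not the platform's actual rules; in particular, “partners” is treated as a strippable designator here only so that the three Meridian spellings reduce to one key.

```python
import re

# Hypothetical designator list. "partners" is included only so the example
# from the text collapses; a real list would be tuned to the corpus.
LEGAL_DESIGNATORS = {"llc", "lp", "llp", "inc", "corp", "ltd", "partners"}

def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal designators."""
    lowered = name.lower()
    # Remove periods first so "l.l.c." becomes "llc" rather than "l l c".
    lowered = lowered.replace(".", "")
    # Turn any remaining punctuation into spaces, then split into tokens.
    tokens = re.sub(r"[^a-z0-9\s]", " ", lowered).split()
    # Strip designators from the end, but never strip the name down to nothing.
    while len(tokens) > 1 and tokens[-1] in LEGAL_DESIGNATORS:
        tokens.pop()
    return " ".join(tokens)

names = [
    "Meridian Capital Partners LLC",
    "Meridian Capital",
    "MERIDIAN CAPITAL, L.L.C.",
    "Meridian",
]
keys = {name: normalize_name(name) for name in names}
```

The first three names map to the key `meridian capital`, while the bare “Meridian” maps to `meridian` and stays a separate question, which is the point: normalization settles the easy cases and leaves the ambiguous one for the evidence checks.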

So the system does not guess. It checks what else the two nodes share: the same principal listed as a signatory, the same registered address, the same fund referenced across two documents. Three shared signals push the match above threshold, and the nodes become one. Without those signals, the bare “Meridian” stays separate and gets flagged for a person to decide. That restraint is the whole difference between a clean graph and a confidently wrong one.

How do you resolve entities at scale without merging the wrong ones?

You resolve in stages, and you treat merging as a decision that needs evidence, not a string match. Our cross-document linker runs four steps:

1. Normalize. Strip legal suffixes, casing, and punctuation so “ACME HOLDINGS, L.L.C.” and “Acme Holdings LLC” reduce to the same comparable form. This alone collapses the easy duplicates and shrinks the work that follows.
2. Find candidates. For each entity, pull the small set of existing nodes it could plausibly match, rather than comparing it against the whole graph. Comparing everything to everything does not scale and is not necessary; almost every pair is obviously unrelated and never needs a second look.
3. Score the match. Compare candidates on more than the name: shared people, shared addresses, shared identifiers, the documents they appear in. A name match with no supporting signal is weaker than a partial name match that shares a tax ID and three directors.
4. Decide by confidence. High-confidence matches merge automatically. Low-confidence matches are flagged for review instead of guessed. The threshold is the whole game, and it is deliberately cautious, because the cost of a wrong merge is higher than the cost of a missed one.

Flowchart: many name strings pass through normalize, candidate matching, and evidence scoring into a confidence gate; matches at 0.85 or higher merge into one node, lower-confidence pairs are flagged for human review, and a reconciler verifies merges against the cap table.
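The candidate and decision stages can be sketched roughly as follows. The 0.85 gate comes from the flowchart description; the blocking rule (grouping by the first token of the normalized name) and all function names are assumptions chosen for illustration, not a description of the actual linker.

```python
from collections import defaultdict

MERGE_THRESHOLD = 0.85  # gate value given in the flowchart description

def blocking_key(normalized_name: str) -> str:
    """Hypothetical blocking rule: group names by their first token."""
    parts = normalized_name.split()
    return parts[0] if parts else ""

def build_index(nodes: dict[str, str]) -> dict[str, list[str]]:
    """Map blocking key -> node ids so lookups avoid scanning the whole graph."""
    index: dict[str, list[str]] = defaultdict(list)
    for node_id, normalized_name in nodes.items():
        index[blocking_key(normalized_name)].append(node_id)
    return index

def candidates(index: dict[str, list[str]], normalized_name: str) -> list[str]:
    """Return only the existing nodes that share a blocking key with the name."""
    return list(index.get(blocking_key(normalized_name), []))

def decide(score: float) -> str:
    """Merge at or above the threshold; otherwise flag the pair for review."""
    return "merge" if score >= MERGE_THRESHOLD else "review"

# Node ids and names below are made up for the example.
nodes = {"n1": "meridian capital", "n2": "meridian", "n3": "acme holdings"}
index = build_index(nodes)
shortlist = candidates(index, "meridian capital")
```

Here the shortlist for a new “meridian capital” record contains `n1` and `n2` but not `n3`, so the Acme node is never scored against it. A first-token rule is deliberately crude; it is only meant to show that candidate selection is a cheap filter that runs before any scoring.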

What signals actually decide a match?

The name is the weakest signal in the set, even though it is the one people reach for first. Names are shared by accident (two Meridians), spelled inconsistently on purpose (legal versus marketing), and truncated to fit a field. A resolver that leans on the name alone inherits all of that noise.

The signals that actually carry weight are the ones that are hard to share by coincidence: a tax identifier, a registered agent or address, the same people attached as officers or signatories, and co-occurrence in the same documents. Each is a vote. We weight them by how discriminating they are, so a matching tax ID counts for far more than a matching city, and sum the evidence into a single score. A pair that shares a name and nothing else sits low; a pair that shares an identifier and three officers sits high even when the names look different. This is the same evidence-first habit we bring to all of our data engineering work: trust the signal that is expensive to fake.
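As a rough illustration of weighted evidence, the sketch below sums per-signal weights into one score. Every weight, field name, and sample value is invented for the example; the text only says that a tax ID counts for far more than a city and that the evidence is summed.

```python
# Hypothetical weights: more discriminating signals get larger values.
WEIGHTS = {
    "tax_id": 0.60,
    "registered_address": 0.25,
    "name": 0.10,
    "city": 0.05,
}
OFFICER_WEIGHT = 0.10  # assumed weight per shared officer or signatory

def match_score(a: dict, b: dict) -> float:
    """Sum the weights of the signals two records share, capped at 1.0."""
    score = 0.0
    for field, weight in WEIGHTS.items():
        # A signal only votes when both records carry the same non-empty value.
        if a.get(field) and a.get(field) == b.get(field):
            score += weight
    shared_officers = set(a.get("officers", [])) & set(b.get("officers", []))
    score += OFFICER_WEIGHT * len(shared_officers)
    return min(score, 1.0)

# Sample records with made-up values.
name_only_a = {"name": "meridian", "city": "Austin"}
name_only_b = {"name": "meridian", "city": "Boston"}

strong_a = {"name": "meridian capital", "tax_id": "00-0000001",
            "officers": ["A. Rivera", "B. Osei", "C. Lind"]}
strong_b = {"name": "mcp holdings", "tax_id": "00-0000001",
            "officers": ["A. Rivera", "B. Osei", "C. Lind"]}

weak = match_score(name_only_a, name_only_b)
strong = match_score(strong_a, strong_b)
```

With these assumed weights the name-only pair scores about 0.10, while the pair with different names but a shared identifier and three shared officers scores about 0.90, which matches the ordering described above.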

When do you let an LLM make the identity call?

Only on the ambiguous middle, and only with the evidence in front of it. Normalization and scoring resolve most pairs cleanly. What's left is the genuinely hard set: two entities with similar names, partial overlap, and no single decisive field.

For those, we hand the LLM the two candidate records plus the source passages they came from and ask it to judge whether they are the same entity, with a reason. The model is not free-associating from a name. It is reading the evidence we already extracted and making a call we can audit later. Confident pairs never reach this step, which keeps the cost down and keeps the model's reasoning focused on the cases that actually need judgment.
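One way to picture that hand-off is sketched below. The prompt wording, the 0.40 lower bound of the “ambiguous middle,” and the `ask_model` callable are all assumptions; `ask_model` stands in for whatever client sends a prompt to the model and returns its text reply, and no specific vendor API is implied.

```python
import json
from typing import Callable

AUTO_MERGE = 0.85       # gate value from the flowchart description
AMBIGUOUS_FLOOR = 0.40  # hypothetical lower edge of the "ambiguous middle"

def needs_llm(score: float) -> bool:
    """True only for pairs the scoring stage could not settle on its own."""
    return AMBIGUOUS_FLOOR <= score < AUTO_MERGE

def build_prompt(record_a: dict, record_b: dict, passages: list[str]) -> str:
    """Put both records and their source passages in front of the model."""
    quoted = "\n".join(f"- {p}" for p in passages)
    return (
        "Decide whether these two records describe the same real-world entity.\n"
        f"Record A: {json.dumps(record_a, sort_keys=True)}\n"
        f"Record B: {json.dumps(record_b, sort_keys=True)}\n"
        f"Source passages:\n{quoted}\n"
        'Reply with JSON only: {"same_entity": true or false, "reason": "..."}'
    )

def judge(record_a: dict, record_b: dict, passages: list[str],
          ask_model: Callable[[str], str]) -> dict:
    """Ask the model for a verdict and keep its stated reason for auditing."""
    reply = json.loads(ask_model(build_prompt(record_a, record_b, passages)))
    return {"same_entity": bool(reply["same_entity"]),
            "reason": str(reply["reason"])}

# A stand-in model so the sketch runs without any network call.
def fake_model(prompt: str) -> str:
    return '{"same_entity": false, "reason": "different principals"}'

verdict = judge({"name": "meridian"}, {"name": "meridian capital"},
                ["Signed by A. Rivera on behalf of Meridian."], fake_model)
```

Passing the model in as a plain function keeps the judgment step testable and makes it easy to log the exact prompt and reply for each decision.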

How do you avoid the opposite mistake: merging two things that aren't the same?

You watch for it on purpose, because it is the hardest rot to detect. A silent merge fuses two different entities into one. Two different people named Daniel Chen become a single person with two careers, and the evidence of the mistake is destroyed by the mistake.

We run a reconciler that cross-checks merged entities against the cap table and the source documents, looking for the contradictions a wrong merge produces: a single “person” controlling stakes that don't add up, a company with two conflicting formation dates. After the graph is built, a separate mislink-detection pass looks for edges and merges that the individual steps each thought were fine but that don't survive a second look. These are the same failure modes from the manifesto on the seven ways a knowledge graph rots, caught before an agent ever queries them.
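A simplified version of one such contradiction check might look like this. The list of single-valued fields and the fact layout are assumptions for the example; the idea is only that a merged entity should not carry two different values for a field that can have one.

```python
from collections import defaultdict

# Fields assumed to have exactly one true value per entity (illustrative).
SINGLE_VALUED = ("formation_date", "tax_id")

def find_contradictions(merged_facts: list[dict]) -> dict[str, set]:
    """Return each single-valued field that has more than one distinct value.

    merged_facts holds the facts contributed by every source record that was
    folded into one merged node.
    """
    seen: dict[str, set] = defaultdict(set)
    for fact in merged_facts:
        for field in SINGLE_VALUED:
            value = fact.get(field)
            if value is not None:
                seen[field].add(value)
    return {field: values for field, values in seen.items() if len(values) > 1}

# Two source records that were merged into one company node (made-up data).
facts = [
    {"source": "operating_agreement.pdf", "formation_date": "2014-03-01"},
    {"source": "k1_2021.pdf", "formation_date": "2019-07-15"},
]
conflicts = find_contradictions(facts)
```

A non-empty result does not prove the merge was wrong, since one document may simply contain an error, but it marks the node as one that deserves a second look.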

What happens when a human has to decide?

The flagged pairs do not vanish into a queue nobody reads. Each one is presented as a decision with the evidence attached: the two records side by side, the documents each came from, the signals that matched and the ones that did not. The reviewer answers a specific question, “are these the same entity, yes or no,” instead of hunting through raw files.

That framing matters. A vague “clean up the duplicates” task is open-ended and never gets done. A concrete “these two share an address but not a name, same or not” question takes seconds and produces a decision the graph can record. We store the human verdicts, so the same ambiguous pair is never asked twice, and the pattern of those verdicts tells us whether the automatic threshold is set too high or too low.
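Storing verdicts so a pair is never asked twice can be as simple as a lookup keyed by the unordered pair of node ids. The class below is a hypothetical in-memory sketch; a real store would persist the verdicts alongside the graph.

```python
from typing import Optional

class VerdictStore:
    """Remembers reviewer decisions, keyed by the unordered pair of node ids."""

    def __init__(self) -> None:
        self._verdicts: dict[frozenset, bool] = {}

    def record(self, node_a: str, node_b: str, same: bool) -> None:
        """Save the reviewer's answer; the order of the two ids does not matter."""
        self._verdicts[frozenset((node_a, node_b))] = same

    def lookup(self, node_a: str, node_b: str) -> Optional[bool]:
        """Return the stored verdict, or None if the pair was never reviewed."""
        return self._verdicts.get(frozenset((node_a, node_b)))

    def same_rate(self) -> Optional[float]:
        """Share of reviewed pairs that were judged to be the same entity."""
        if not self._verdicts:
            return None
        return sum(self._verdicts.values()) / len(self._verdicts)

store = VerdictStore()
store.record("n1", "n2", True)
store.record("n3", "n4", False)
```

If nearly every flagged pair comes back “same,” the automatic threshold may be stricter than it needs to be; if reviewers keep rejecting pairs that scored just under the gate, the caution is earning its keep. How to read `same_rate` is a judgment call, not a rule.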

Why does any of this matter to the AI on top?

Because the agent querying the graph cannot tell a clean node from a dirty one. It trusts whatever the graph says. If Meridian exists as four nodes, a question like “what is our total exposure to Meridian” returns one node's slice and silently omits the other three. The answer is specific, confident, and wrong, which is the worst combination for a system people are starting to make decisions on.

Clean entity resolution is what lets a retrieval system actually retrieve. When we build an enterprise search and retrieval layer on top of a graph, its accuracy is capped by the identity layer underneath it. No amount of clever retrieval recovers from a graph that thinks one company is eleven. Resolution is not cleanup that happens after the real work. It is the real work.

Where does this get hardest? The cap table.

A cap table is where entity resolution and arithmetic meet, and where a wrong merge stops being abstract. Every row ties an owner to a number of shares or a percentage, and the percentages have to sum to a whole. That makes the cap table double as a checksum on resolution.

If we merge two owners who are actually distinct, the combined stake can exceed what the company issued, and the reconciler catches it. If we leave one owner split across two nodes, their position looks smaller than it is and the totals come up short. Either error surfaces as numbers that do not reconcile, which is far easier to catch than a quietly duplicated company buried in a graph of thousands of nodes.

We use that to our advantage. The cap table parser pulls the ownership rows, the matcher aligns them to resolved entities, and the reconciler checks the math. When the totals balance and every owner maps to exactly one node, resolution on that company has a second, independent confirmation. When they don't, the break points straight at the entity that needs another look.
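The two checks named here, balanced totals and every owner mapped to a node, can be sketched as below. The row layout, the tolerance, and the node id format are assumptions for illustration only.

```python
from collections import defaultdict

def reconcile_cap_table(rows, owner_to_node, tolerance=0.01):
    """Check that every owner resolves to a node and ownership sums to 100%.

    rows: list of (owner_name, percent) pairs from the parsed cap table.
    owner_to_node: resolved mapping from owner name to graph node id.
    Returns (stake per node, list of human-readable issues).
    """
    issues = []
    stake_by_node = defaultdict(float)
    for owner, percent in rows:
        node = owner_to_node.get(owner)
        if node is None:
            issues.append(f"unresolved owner: {owner}")
            continue
        stake_by_node[node] += percent
    total = sum(percent for _, percent in rows)
    if abs(total - 100.0) > tolerance:
        issues.append(f"ownership sums to {total:.2f}%, expected 100%")
    return dict(stake_by_node), issues

# Made-up cap table: "D. Chen" was never linked to a node, and a row is missing.
rows = [("Acme Holdings LLC", 60.0), ("Daniel Chen", 25.0), ("D. Chen", 10.0)]
owner_to_node = {"Acme Holdings LLC": "company:1", "Daniel Chen": "person:7"}
stakes, issues = reconcile_cap_table(rows, owner_to_node)
```

In this example both checks fail: one owner has no resolved node and the table sums to 95%, so the company would be routed back for another look rather than counted as confirmed.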

How do you know it actually worked?

You measure the graph against the documents, not against itself. Our nodes carry confidence scores from 0.0 to 1.0, so we can ask the graph directly which entities are low-confidence and route those to human review. The test that matters is simple: ask “how many companies do we hold?” and get a number the team trusts. When that number is stable and defensible, resolution is working. When it drifts every time someone uploads a document, it isn't.
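As a small illustration of those two questions, the sketch below lists low-confidence nodes and counts companies over an in-memory list. The node layout and the review cutoff are assumptions; in practice this would be a query against the graph itself.

```python
REVIEW_CUTOFF = 0.85  # hypothetical; reuses the merge gate value for illustration

def low_confidence(nodes: list[dict], cutoff: float = REVIEW_CUTOFF) -> list[str]:
    """Ids of nodes whose confidence falls below the review cutoff."""
    return [n["id"] for n in nodes if n["confidence"] < cutoff]

def company_count(nodes: list[dict]) -> int:
    """Answer 'how many companies do we hold?' from the resolved nodes."""
    return sum(1 for n in nodes if n["type"] == "company")

# Made-up nodes for the example.
graph_nodes = [
    {"id": "company:1", "type": "company", "confidence": 0.97},
    {"id": "company:2", "type": "company", "confidence": 0.62},
    {"id": "person:7", "type": "person", "confidence": 0.91},
]
to_review = low_confidence(graph_nodes)
held = company_count(graph_nodes)
```

Tracking `held` across ingestions gives a simple drift signal: if re-uploading a document that names only known companies changes the count, resolution let a duplicate through.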

What we got wrong the first time

Our early version merged too eagerly. We set the threshold where it caught all the obvious duplicates, shipped it, and then spent days untangling the silent merges it had created: two different people fused because they happened to share a common name and a city. The fix was not a smarter matching algorithm. It was moving the threshold up and accepting that some duplicates would survive to be caught by a person. A missed merge is a visible duplicate you can fix later. A wrong merge is invisible damage that corrupts every query touching it.

The second thing we got wrong was treating resolution as a one-time batch job. Documents keep arriving, so identity has to be re-decided continuously. A graph that was clean at launch and never re-resolved is just a graph that rots more slowly.

We build and fix knowledge graphs for AI systems, including a document-intelligence platform for a family office managing hundreds of millions in assets. If your graph is full of duplicates, book a 15-minute call.


Muhammad Mudassir

