Does a multi-agent AI system need a knowledge graph?

Not by default. Agents that work on independent items in parallel, like scoring, classifying, or summarizing, coordinate well through a shared state table and a message queue. You need a knowledge graph when the relationships between entities are themselves the answer, such as multi-hop questions or provenance chains. A 23-agent contract reviewer we built uses no graph database on purpose, while a smaller family-office platform we built is graph-native, because its value is the connections between companies, investments, and people.

What is the difference between an orchestration graph and a knowledge graph?

An orchestration graph, sometimes called a routing graph, describes control flow: which agent calls which, and under what condition. A knowledge graph describes data: entities and the explicit relationships between them. They are easy to confuse because both are "graphs," but one is about how computation flows and the other is about what your facts mean. A system can have a rich orchestration graph and no knowledge graph at all, which is exactly the case in our contract reviewer.

How does agent routing reduce LLM cost?

By not running every agent on every input. In our contract reviewer, a first wave of scoring agents rates each clause against each legal category, and a router fires only the specialist analysts whose category scored above a threshold. Pruning those routing edges cut model calls by roughly 75% against a brute-force baseline that runs every analyst on everything. The savings come entirely from the edges you choose not to traverse.

Do the 23 agents share memory or talk to each other?

No. They never communicate directly. Each agent reads and writes a single record per chunk, holding the category scores and the list of analysts to run, in a key-value store, and pulls work from a queue. That is shared state, not shared memory. The design tolerates partial failure: scoring succeeds if at least six of the twelve scorers return, and analysis proceeds if at least half of the routed analysts do, with a circuit breaker that trips after three consecutive failures.

When would you add a knowledge graph to a contract-review system?

At the cross-clause dependency layer. Clauses are not independent: an indemnity modifies a liability cap which is qualified by a warranty, so a redline that is correct in isolation can be wrong in context. Modeling clauses as nodes and their dependencies as edges turns conflict detection into a graph traversal instead of a deduplication heuristic. We currently handle this with a consolidation step rather than a graph, and that dependency graph is the frontier we would build next.

What 23 Agents Taught Us About Knowledge Graphs

We built a contract-review system that runs 23 AI agents over every contract a client sends it. It reads each clause, scores it against twelve legal categories, and hands lawyers risk-rated redlines inside Microsoft Word. When engineers hear "23 agents" and "legal documents" in the same sentence, they assume there is a knowledge graph underneath, because that is the architecture the industry has been told to reach for. There is not. We looked at the problem, decided a graph would be expensive decoration, and shipped without one.

That decision, and the reasoning behind it, taught us more about when a system actually needs a knowledge graph than most of the graphs we have built. This is the seventh post in our series on graph rot, and it is the counterweight to the rest. The other posts are about graphs that earn their place. This one is about a genuinely complex agent system that, on purpose, has none.

The graph that mattered was the one between the agents, not the one in the data.

That line is the whole post. There are two completely different things people mean by "graph" in an AI system, and conflating them is how teams end up paying for a knowledge graph they never needed.

What does a 23-agent contract reviewer actually do?

It reviews a contract clause by clause against a company's own legal playbook, and it does the reading in two waves of agents.

A contract arrives, gets parsed, and is cut into chunks of roughly five hundred tokens each, so a typical contract becomes a few dozen pieces. Then the first wave runs: twelve scoring agents, one per legal category, covering scope of supply, commercial terms, delivery, warranty and liability, intellectual property, regulatory compliance, confidentiality, insurance, termination, force majeure, dispute resolution, and general provisions. Every scoring agent reads every chunk and rates it from 0.0 to 1.0 for how strongly that chunk belongs to its category. The second wave is eleven domain analyst agents, one per specialty, and each one produces a risk assessment and a suggested revision for the clauses that land in its lane.
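The chunking step can be sketched in a few lines. This is a minimal illustration, not the production parser: it assumes whitespace-separated words as a rough stand-in for model tokens, and the function name `chunk_by_tokens` is hypothetical.

```python
from typing import List


def chunk_by_tokens(text: str, max_tokens: int = 500) -> List[str]:
    """Split text into consecutive chunks of at most max_tokens words.

    Assumption: whitespace-separated words approximate model tokens.
    A real pipeline would use the model's tokenizer and would likely
    respect clause boundaries.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + max_tokens])
        for start in range(0, len(words), max_tokens)
    ]


# A 1,200-word document becomes three chunks of 500, 500, and 200 words.
chunks = chunk_by_tokens("word " * 1200)
print([len(chunk.split()) for chunk in chunks])  # [500, 500, 200]
```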

Twelve scorers plus eleven analysts is the twenty-three. It is worth being precise about that count, because our Paralegent product page talks about eleven specialists. Those are the eleven analysts, the agents that actually write the redlines a lawyer reads. The twelve scoring agents run upstream and never produce a redline. Their only job is to decide which analysts get called, which is the part of the story this post is about.

The system is a custom multi-agent design, not a LangGraph or CrewAI build, running on a fast and inexpensive model (Claude 3 Haiku on AWS Bedrock) because the call volume is high. End to end, a contract takes about five to ten minutes. One real run in our logs processed twenty-two clause chunks with one hundred and sixteen model calls in about a hundred and fifty-four seconds. Across the full pipeline a single contract can fire anywhere from one thousand to over three thousand model calls. That number is the reason everything below matters.

So where is the graph in a system that has no graph database?

It is in the orchestration. The agents themselves form a graph, even though no graph is ever stored.

Figure: An agent routing graph. Each contract clause is scored by 12 scoring agents, then a router fires only the analyst agents whose category scored above the threshold. The pruned edges represent roughly 75% fewer model calls.

Picture the twelve scorers, the router, and the eleven analysts as nodes. It is a directed, roughly bipartite graph: every scored category can, in principle, hand work to an analyst, and an edge lights up only when a chunk's score for that category crosses a threshold, which we default to 0.5. Drawn out, the system is scatter, route, gather. Scatter each chunk to all twelve scorers. Route on the scores. Gather the analysts that the scores selected. A consolidation step at the end reconciles what the analysts produced.
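The scatter, route, gather shape described above can be sketched as three small functions. This is an illustrative sketch under stated assumptions: the keyword scorers are toy stand-ins for LLM scoring agents, the one-to-one mapping from category to analyst is hypothetical, and treating the threshold as strictly "above 0.5" is a guess about the boundary case.

```python
from typing import Callable, Dict, List

Scorer = Callable[[str], float]   # category score in [0.0, 1.0]
Analyst = Callable[[str], str]    # returns a suggested redline


def keyword_scorer(keyword: str) -> Scorer:
    """Toy stand-in for an LLM scoring agent (assumption for this sketch)."""
    return lambda chunk: 0.9 if keyword in chunk.lower() else 0.1


def scatter(chunk: str, scorers: Dict[str, Scorer]) -> Dict[str, float]:
    """Send the chunk to every scorer and collect one score per category."""
    return {category: score(chunk) for category, score in scorers.items()}


def route(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Keep only the categories whose score is above the threshold."""
    return [category for category, value in scores.items() if value > threshold]


def gather(chunk: str, routed: List[str], analysts: Dict[str, Analyst]) -> Dict[str, str]:
    """Run only the routed analysts; every skipped analyst is a pruned edge."""
    return {name: analysts[name](chunk) for name in routed if name in analysts}


scorers = {
    "confidentiality": keyword_scorer("confidential"),
    "insurance": keyword_scorer("insurance"),
    "termination": keyword_scorer("terminate"),
}
# Hypothetical analysts, one per category, that just describe their input.
analysts = {
    name: (lambda chunk, name=name: f"{name} review of {len(chunk)} characters")
    for name in scorers
}

chunk = "Each party shall keep the other party's Confidential Information secret."
routed = route(scatter(chunk, scorers))
results = gather(chunk, routed, analysts)
print(routed)  # ['confidentiality']
```

Only one of the three analysts runs for this chunk. The other two edges are never traversed, which is the pruning the rest of the post is about.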

So there is a graph here, and it is a real one. It is a graph of control flow, not a graph of facts. It describes which agent calls which, under what condition, not what any clause means or how clauses relate. Once you see the routing as a graph, the single most valuable engineering decision in the system stops being "what database" and becomes "which edges do we refuse to traverse."

What did treating routing as a graph problem actually buy us?

About a 75% cut in model calls, which is most of what makes the system affordable to run at all.

The brute-force version of this design is a fully connected graph: run all eleven analysts on every chunk, no matter what the chunk contains. A confidentiality clause gets analyzed by the insurance analyst, the force-majeure analyst, the intellectual-property analyst, and the rest, almost all of them producing nothing useful. With dozens of chunks and a thousand-plus model calls already in flight, that waste compounds into real latency and real cost.

Smart routing prunes the graph. The scores from the first wave decide which analysts a given chunk actually reaches, so a confidentiality clause goes to the confidentiality analyst and stops there. Pruning those edges cuts model calls by roughly 75% against the brute-force baseline. Read that back: the largest efficiency win in a 23-agent system came from a graph that stored no knowledge at all. It simply decided what not to compute.
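A little arithmetic makes the pruning concrete. The sketch below counts analyst invocations only; it does not model the cost of the scoring wave or how many model calls a single invocation makes. The average of 2.75 routed analysts per chunk is an illustrative value picked to reproduce a 75% reduction, not a measured figure.

```python
from typing import Tuple


def analyst_edge_counts(
    num_chunks: int, num_analysts: int, avg_routed_per_chunk: float
) -> Tuple[int, float, float]:
    """Compare a fully connected analyst graph with a pruned one.

    Returns (brute_force_edges, routed_edges, fraction_pruned).
    """
    brute_force = num_chunks * num_analysts
    routed = num_chunks * avg_routed_per_chunk
    return brute_force, routed, 1 - routed / brute_force


# 22 chunks and 11 analysts, with a hypothetical 2.75 analysts routed per chunk.
brute, routed_edges, pruned = analyst_edge_counts(22, 11, 2.75)
print(brute, routed_edges, f"{pruned:.0%}")  # 242 60.5 75%
```

Put differently, a reduction of about 75% on eleven analysts means a typical chunk reaches fewer than three of them.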

The same graph view is what makes the system tolerate failure gracefully. Scoring is treated as good enough if at least six of the twelve scorers return, analysis proceeds if at least half of the routed analysts succeed, and a circuit breaker trips after three consecutive failures and recovers after two minutes. A graph of independent agents degrades node by node. One analyst timing out does not sink the contract; it just dims one edge. This is the discipline behind our multi-agent and agent-orchestration work, and almost none of it requires a stored graph.
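The failure rules above translate into very little code. The sketch below is a simplified illustration, not the production implementation: the class and function names are hypothetical, the breaker has no half-open probing state, and letting analysis proceed when no analysts were routed is an assumption.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after N consecutive failures and closes again after a cool-down."""

    def __init__(
        self,
        failure_threshold: int = 3,
        recovery_seconds: float = 120.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self.clock = clock
        self.consecutive_failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if a call may proceed right now."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.recovery_seconds:
            # Cool-down elapsed: close the breaker and reset the counter.
            self.opened_at = None
            self.consecutive_failures = 0
            return True
        return False

    def record(self, success: bool) -> None:
        """Track the outcome of a call; trip on the Nth consecutive failure."""
        if success:
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.opened_at = self.clock()


def scoring_quorum_met(returned: int, required: int = 6) -> bool:
    """Scoring is good enough if at least `required` scorers returned."""
    return returned >= required


def analysis_quorum_met(succeeded: int, routed: int) -> bool:
    """Analysis proceeds if at least half of the routed analysts succeeded."""
    return routed == 0 or 2 * succeeded >= routed
```

Injecting the clock keeps the two-minute recovery testable without sleeping, and each agent can hold its own breaker so one failing node never blocks the others.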

Why didn't we store any of it in a graph database?

Because none of the data has relationships worth traversing, and the coordination that looked like it needed a graph did not.

There are two reasons people assume a system this size needs a knowledge graph, and neither one held here. The first is the data. The unit of work is a single clause, scored against a single category, judged on its own content. There is no chain to walk, no "this clause controls that entity which appears in that document." Compare the family-office investment platform we described in our post on whether you actually need a knowledge graph: there the entire value was a path, from a holding vehicle to an investment to a company to the person who controls it, and a plain retrieval system could not assemble it. Here the answer to every question lives inside one clause. When the value is in the content rather than the connections, you are not in graph territory, and we said exactly that in the buyer post.

The second reason is coordination. Twenty-three agents sound like they need a shared brain. They do not talk to each other at all. Each agent reads and writes one record per chunk, a row that carries the category scores and the list of analysts to run, in an ordinary key-value store, and pulls its work off a queue. That is shared state, not shared memory, and it is certainly not a graph. The lesson we keep relearning: agents that score independent things in parallel coordinate fine through a table and a router. You reach for a graph when the relationships between the things are the point, not when you are judging the things one at a time.
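To show how little machinery "shared state" needs, here is a minimal sketch of one record per chunk plus a work queue. The field names, the in-memory dict standing in for the key-value store, and the `chunk_id:analyst` message format are all assumptions made for illustration.

```python
import queue
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChunkRecord:
    """One row per chunk: the category scores and the analysts to run."""
    chunk_id: str
    scores: Dict[str, float] = field(default_factory=dict)
    analysts_to_run: List[str] = field(default_factory=list)


store: Dict[str, ChunkRecord] = {}   # stand-in for the key-value store
work_queue: queue.Queue = queue.Queue()  # stand-in for the message queue


def write_score(chunk_id: str, category: str, score: float) -> None:
    """A scorer writes its result into the chunk's record, nothing else."""
    record = store.setdefault(chunk_id, ChunkRecord(chunk_id))
    record.scores[category] = score


def route_record(chunk_id: str, threshold: float = 0.5) -> None:
    """The router reads the scores, records the analysts, and enqueues work."""
    record = store[chunk_id]
    record.analysts_to_run = sorted(
        category for category, score in record.scores.items() if score > threshold
    )
    for analyst in record.analysts_to_run:
        work_queue.put(f"{chunk_id}:{analyst}")


write_score("chunk-1", "confidentiality", 0.9)
write_score("chunk-1", "insurance", 0.1)
route_record("chunk-1")
print(store["chunk-1"].analysts_to_run)  # ['confidentiality']
```

No agent holds a reference to another agent. Each one only reads or writes the record and takes the next message, which is why a table and a queue are enough.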

When would a knowledge graph have earned its place here?

At exactly one frontier: the dependencies between clauses. It is the thing we would model with a graph next.

Contract clauses are not as independent as a scoring grid treats them. An indemnity clause modifies a limitation-of-liability cap, which interacts with the warranty, which is qualified by a force-majeure clause three sections away. A redline that is correct in isolation can be wrong once you account for the clause that overrides it. Today the system handles this with a consolidation step that deduplicates and reconciles overlapping recommendations across analysts. Publicly we describe it as an orchestrator that resolves conflicts when clauses overlap and keeps only the most accurate redlines. It is the pragmatic, non-graph way to catch the obvious collisions.

A consolidation pass flattens those relationships into a cleanup heuristic. A clause-dependency graph would model them directly: clauses as nodes, relationships like "modifies," "caps," and "is qualified by" as edges, and conflict detection as a traversal rather than a deduplication. That is a genuine GraphRAG problem hiding inside a system that, for its core job of scoring and routing, correctly has no graph. Naming that frontier honestly matters, because "where would a graph help next" is a completely different question from "should the whole system have been a graph." The first answer is yes, in one specific layer. The second is no.
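As a sketch of what that unbuilt layer might look like, the example below models the clauses from the paragraph above as nodes and their relationships as labeled edges, then finds every clause that can affect a given clause by walking the edges backwards. The edge list, the clause identifiers, and the function names are hypothetical, and a plain in-memory structure stands in for a graph database.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source clause, relationship, target clause)

# Hypothetical edges taken from the example in the text.
EDGES: List[Edge] = [
    ("indemnity", "modifies", "liability_cap"),
    ("liability_cap", "interacts_with", "warranty"),
    ("force_majeure", "qualifies", "warranty"),
]


def build_incoming(edges: List[Edge]) -> Dict[str, List[str]]:
    """Index the edges by target so they can be walked backwards."""
    incoming: Dict[str, List[str]] = defaultdict(list)
    for source, _relationship, target in edges:
        incoming[target].append(source)
    return incoming


def clauses_affecting(clause: str, edges: List[Edge]) -> List[str]:
    """Breadth-first walk over incoming edges: every clause that can change
    how `clause` should be read, directly or through a chain."""
    incoming = build_incoming(edges)
    seen = set()
    frontier = deque([clause])
    while frontier:
        current = frontier.popleft()
        for source in incoming.get(current, []):
            if source != clause and source not in seen:
                seen.add(source)
                frontier.append(source)
    return sorted(seen)


# A redline on the warranty clause should be checked against all three.
print(clauses_affecting("warranty", EDGES))
# ['force_majeure', 'indemnity', 'liability_cap']
```

The indemnity clause only reaches the warranty through the liability cap, a two-hop dependency that a traversal finds directly.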

So when do we actually reach for a knowledge graph?

When the relationships between entities are the product, not a side effect. We have built both kinds of system, and the line between them is clean.

On the family-office platform, the deliverable was the connections. It has six entity types joined by explicit edges, a managed Neo4j graph for the structure and provenance, a separate retrieval index for the unstructured text, and a small set of agents wired together by a router that decides which to call. We reached for a graph there because the questions were paths, and because identity and provenance had to hold up under scrutiny. That is the system in the buyer post, and the graph is the right shape because relationships are the answer.

On the contract reviewer, the deliverable is a judgment on each clause. Twenty-three agents, a scores table, a router, no stored graph. We did not reach for a graph because there was no relationship in the data that a client was paying us to traverse. Same firm, same toolbox, opposite call. The deciding question was identical to the one from the buyer post: is the value in the connections, or in the content of each piece? That judgment, made before a line of code, is most of what our legal AI and data engineering work actually delivers.

What did 23 agents teach us about graphs?

Four lessons, and they apply far outside legal contracts.

First, tell the two graphs apart. A graph of agents, meaning control flow and who calls whom, is not a graph of facts, meaning entities and the relationships between them. A system can be built from agents arranged in an elaborate graph and still need no knowledge graph at all. Most "do we need a graph" confusion is this single conflation.

Second, the cheapest computation is the edge you do not traverse. The biggest efficiency win in the system, roughly 75% fewer model calls, came from treating routing as a graph and pruning it, not from any database. If your agents are expensive, look at your orchestration graph before you look at your data model.

Third, coordination is not a reason to build a graph. Independent agents scoring things in parallel coordinate fine through a shared table and a queue. Shared state is not shared memory, and shared memory is not a graph.

Fourth, know the frontier where a graph would help, and be honest that you have not crossed it yet. For us that frontier is cross-clause dependency, a real GraphRAG problem we have scoped and not built, sitting beside a core that correctly has none.

The pattern under all four is the same one this whole series keeps arriving at. A knowledge graph is a tool for relationships between entities, and the discipline is refusing to use it for anything else, no matter how complex the system gets. The 23-agent reviewer is complex. It still did not need a knowledge graph. Knowing why is the same skill as knowing when you do, which is the test we laid out in our post on scoring a graph before you trust it and the reason graph rot is the name of this series.

We build multi-agent systems and knowledge graphs, and a fair amount of our work is telling clients which of the two their problem actually needs. If you are designing an agent system and are not sure whether a graph belongs in it, book a 15-minute call.

Muhammad Mudassir
