Graph Rot & Knowledge Graph Quality, Chapter 6

What 23 Agents Taught Us About Knowledge Graphs

Muhammad Mudassir

Founder & CEO, Cognilium AI

TL;DR

We built a 23-agent contract reviewer with no knowledge graph, on purpose. This post covers when agent routing is enough, when you truly need a graph, and the routing trick that cut model calls by about 75%.

We built a 23-agent contract-review system and deliberately gave it no knowledge graph. From a team that ships both: how to tell an orchestration graph from a knowledge graph, and when each one actually earns its place.

We built a contract-review system that runs 23 AI agents over every contract a client sends it. It reads each clause, scores it against twelve legal categories, and hands lawyers risk-rated redlines inside Microsoft Word. When engineers hear "23 agents" and "legal documents" in the same sentence, they assume there is a knowledge graph underneath, because that is the architecture the industry has been told to reach for. There is not. We looked at the problem, decided a graph would be expensive decoration, and shipped without one.

That decision, and the reasoning behind it, taught us more about when a system actually needs a knowledge graph than most of the graphs we have built. This is the seventh post in our series on graph rot, and it is the counterweight to the rest. The other posts are about graphs that earn their place. This one is about a genuinely complex agent system that, on purpose, has none.

The graph that mattered was the one between the agents, not the one in the data.

That line is the whole post. There are two completely different things people mean by "graph" in an AI system, and conflating them is how teams end up paying for a knowledge graph they never needed.

What does a 23-agent contract reviewer actually do?

It reviews a contract clause by clause against a company's own legal playbook, and it does the reading in two waves of agents.

A contract arrives, gets parsed, and is cut into chunks of roughly five hundred tokens each, so a typical contract becomes a few dozen pieces. Then the first wave runs: twelve scoring agents, one per legal category, covering scope of supply, commercial terms, delivery, warranty and liability, intellectual property, regulatory compliance, confidentiality, insurance, termination, force majeure, dispute resolution, and general provisions. Every scoring agent reads every chunk and rates it from 0.0 to 1.0 for how strongly that chunk belongs to its category. The second wave is eleven domain analyst agents, one per specialty, and each one produces a risk assessment and a suggested revision for the clauses that land in its lane.
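The first wave can be pictured with a small sketch. This is an illustration only, not the production code: the whitespace "tokens", the `keyword_scorer` stand-in for a model call, and every function name here are assumptions made for the example. Only the twelve category names, the roughly five-hundred-token chunk size, and the 0.0 to 1.0 score range come from the description above.

```python
from typing import Callable

# The twelve categories named in the post; the snake_case identifiers are ours.
CATEGORIES = [
    "scope_of_supply", "commercial_terms", "delivery", "warranty_and_liability",
    "intellectual_property", "regulatory_compliance", "confidentiality",
    "insurance", "termination", "force_majeure", "dispute_resolution",
    "general_provisions",
]

Scorer = Callable[[list[str]], float]


def chunk_tokens(tokens: list[str], size: int = 500) -> list[list[str]]:
    """Cut a token list into consecutive chunks of at most `size` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def keyword_scorer(keyword: str) -> Scorer:
    """Hypothetical stand-in for a model call: 1.0 if the keyword appears, else 0.0."""
    def score(chunk: list[str]) -> float:
        return 1.0 if keyword in chunk else 0.0
    return score


def score_chunk(chunk: list[str], scorers: dict[str, Scorer]) -> dict[str, float]:
    """Run every scorer on one chunk and clamp each result to [0.0, 1.0]."""
    return {cat: min(1.0, max(0.0, fn(chunk))) for cat, fn in scorers.items()}


# Toy usage: whitespace "tokens" and one keyword per category.
scorers = {cat: keyword_scorer(cat.split("_")[0]) for cat in CATEGORIES}
tokens = ("the supplier shall keep confidentiality " * 300).split()
chunks = chunk_tokens(tokens)
first_scores = score_chunk(chunks[0], scorers)
```

The output of this wave is one row of twelve scores per chunk, which is all the router needs.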

Twelve scorers plus eleven analysts is the twenty-three. It is worth being precise about that count, because our Paralegent product page talks about eleven specialists. Those are the eleven analysts, the agents that actually write the redlines a lawyer reads. The twelve scoring agents run upstream and never produce a redline. Their only job is to decide which analysts get called, which is the part of the story this post is about.

The system is a custom multi-agent design, not a LangGraph or CrewAI build, running on a fast and inexpensive model (Claude 3 Haiku on AWS Bedrock) because the call volume is high. End to end, a contract takes about five to ten minutes. One real run in our logs processed 22 clause chunks with 116 model calls in about 154 seconds. Across the full pipeline, a single contract can fire anywhere from 1,000 to over 3,000 model calls. That number is the reason everything below matters.

So where is the graph in a system that has no graph database?

It is in the orchestration. The agents themselves form a graph, even though no graph is ever stored.

[Figure: an agent routing graph. Each contract clause is scored by 12 scoring agents, then a router fires only the analyst agents whose category scored above the threshold; the pruned edges represent roughly 75% fewer model calls.]

Picture the twelve scorers, the router, and the eleven analysts as nodes. It is a directed, roughly bipartite graph: every scored category can, in principle, hand work to an analyst, and an edge lights up only when a chunk's score for that category crosses a threshold, which we default to 0.5. Drawn out, the system is scatter, route, gather. Scatter each chunk to all twelve scorers. Route on the scores. Gather the analysts that the scores selected. A consolidation step at the end reconciles what the analysts produced.
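The route step is small enough to sketch in a few lines. Treat this as a hedged illustration rather than the actual router: the `route` function, the analyst names, and the three-entry mapping are hypothetical, and the post does not say how the twelve categories map onto the eleven analysts. The 0.5 default is the one stated above; whether a score exactly at the threshold fires is not specified, so the sketch assumes it does.

```python
THRESHOLD = 0.5  # default threshold stated in the post

# Hypothetical, partial mapping from scored category to analyst agent.
CATEGORY_TO_ANALYST = {
    "confidentiality": "confidentiality_analyst",
    "insurance": "insurance_analyst",
    "termination": "termination_analyst",
}


def route(scores: dict[str, float],
          category_to_analyst: dict[str, str],
          threshold: float = THRESHOLD) -> list[str]:
    """Return the analysts whose category score reaches the threshold.

    An edge "lights up" only when score >= threshold (an assumption; the
    source only says the score must cross the threshold). Analysts are
    returned once each, in the order their categories appear in `scores`.
    """
    selected: list[str] = []
    for category, score in scores.items():
        analyst = category_to_analyst.get(category)
        if analyst is not None and score >= threshold and analyst not in selected:
            selected.append(analyst)
    return selected


chunk_scores = {"confidentiality": 0.9, "insurance": 0.1, "termination": 0.5}
analysts = route(chunk_scores, CATEGORY_TO_ANALYST)
```

Here the confidentiality and termination analysts are selected and the insurance edge is never traversed.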

So there is a graph here, and it is a real one. It is a graph of control flow, not a graph of facts. It describes which agent calls which, under what condition, not what any clause means or how clauses relate. Once you see the routing as a graph, the single most valuable engineering decision in the system stops being "what database" and becomes "which edges do we refuse to traverse."

What did treating routing as a graph problem actually buy us?

About a 75% cut in model calls, which is most of what makes the system affordable to run at all.

The brute-force version of this design is a fully connected graph: run all eleven analysts on every chunk, no matter what the chunk contains. A confidentiality clause gets analyzed by the insurance analyst, the force-majeure analyst, the intellectual-property analyst, and the rest, almost all of them producing nothing useful. With dozens of chunks and a thousand-plus model calls already in flight, that waste compounds into real latency and real cost.

Smart routing prunes the graph. The scores from the first wave decide which analysts a given chunk actually reaches, so a confidentiality clause goes to the confidentiality analyst and stops there. Pruning those edges cuts model calls by roughly 75% against the brute-force baseline. Read that back: the largest efficiency win in a 23-agent system came from a graph that stored no knowledge at all. It simply decided what not to compute.
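The arithmetic behind that comparison is simple. The sketch below is a toy calculation under one loud assumption: that the reduction is counted on analyst calls only, which the post does not specify. Under that reading, a 75% cut against eleven analysts per chunk corresponds to an average of 2.75 analysts firing per chunk. The function name and the sample routing lists are invented for the example and are not figures from the production logs.

```python
def call_reduction(routed_per_chunk: list[list[str]], n_analysts: int) -> float:
    """Fraction of analyst calls avoided versus running every analyst on every chunk."""
    if not routed_per_chunk or n_analysts <= 0:
        raise ValueError("need at least one chunk and one analyst")
    brute_force = len(routed_per_chunk) * n_analysts   # fully connected graph
    routed = sum(len(analysts) for analysts in routed_per_chunk)  # pruned graph
    return 1 - routed / brute_force


# Toy data: four chunks, eleven possible analysts, 3 + 3 + 3 + 2 = 11 calls made.
toy_routing = [
    ["a1", "a2", "a3"],
    ["a4", "a5", "a6"],
    ["a7", "a8", "a9"],
    ["a10", "a11"],
]
saved = call_reduction(toy_routing, n_analysts=11)  # 1 - 11/44
```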

The same graph view is what makes the system tolerate failure gracefully. Scoring is treated as good enough if at least six of the twelve scorers return, analysis proceeds if at least half of the routed analysts succeed, and a circuit breaker trips after three consecutive failures and recovers after two minutes. A graph of independent agents degrades node by node. One analyst timing out does not sink the contract; it just dims one edge. This is the discipline behind our multi-agent and agent-orchestration work, and almost none of it requires a stored graph.
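Those three rules can be written down compactly. The following is a minimal sketch, assuming a simple breaker that closes fully once the recovery window has passed; the post does not describe a half-open state, and the class and function names are hypothetical. The thresholds (six of twelve, at least half, three consecutive failures, two minutes) are the ones quoted above.

```python
import time


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures, closes after `recovery_seconds`."""

    def __init__(self, max_failures=3, recovery_seconds=120.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.recovery_seconds = recovery_seconds
        self._clock = clock            # injectable so the breaker can be tested
        self._consecutive_failures = 0
        self._opened_at = None         # None means the breaker is closed

    def allow(self) -> bool:
        """Return True if a call may be attempted right now."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.recovery_seconds:
            # Recovery window elapsed: close the breaker and start fresh.
            self._opened_at = None
            self._consecutive_failures = 0
            return True
        return False

    def record_success(self) -> None:
        self._consecutive_failures = 0

    def record_failure(self) -> None:
        self._consecutive_failures += 1
        if self._consecutive_failures >= self.max_failures:
            self._opened_at = self._clock()


def scoring_quorum_met(returned: int, minimum: int = 6) -> bool:
    """Scoring is good enough if at least six of the twelve scorers returned."""
    return returned >= minimum


def analysis_quorum_met(succeeded: int, routed: int) -> bool:
    """Analysis proceeds if at least half of the routed analysts succeeded."""
    return routed == 0 or 2 * succeeded >= routed
```

Each rule is local to one node or one wave, which is why a single timeout stays a single dim edge.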

Why didn't we store any of it in a graph database?

Because none of the data has relationships worth traversing, and the coordination that looked like it needed a graph did not.

There are two reasons people assume a system this size needs a knowledge graph, and neither one held here. The first is the data. The unit of work is a single clause, scored against a single category, judged on its own content. There is no chain to walk, no "this clause controls that entity which appears in that document." Compare the family-office investment platform we described in "do you actually need a knowledge graph": there the entire value was a path, from a holding vehicle to an investment to a company to the person who controls it, and a plain retrieval system could not assemble it. Here the answer to every question lives inside one clause. When the value is in the content rather than the connections, you are not in graph territory, and we said exactly that in the buyer post.

The second reason is coordination. Twenty-three agents sound like they need a shared brain. They do not talk to each other at all. Each agent reads and writes one record per chunk, a row that carries the category scores and the list of analysts to run, in an ordinary key-value store, and pulls its work off a queue. That is shared state, not shared memory, and it is certainly not a graph. The lesson we keep relearning: agents that score independent things in parallel coordinate fine through a table and a router. You reach for a graph when the relationships between the things are the point, not when you are judging the things one at a time.
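To make "a table and a router" concrete, here is a rough sketch of that shared state. It is an assumption-laden stand-in: a Python dict plays the key-value store, `queue.Queue` plays the work queue, and the `ChunkRecord` fields and function names are invented for illustration. The source only says that each chunk has one record carrying the category scores and the list of analysts to run.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class ChunkRecord:
    """One row per chunk: the scores and the analysts selected for it."""
    chunk_id: str
    scores: dict = field(default_factory=dict)
    analysts_to_run: list = field(default_factory=list)


store = {}            # stand-in for an ordinary key-value store
work = queue.Queue()  # stand-in for the work queue


def publish_routing(chunk_id: str, scores: dict, analysts: list) -> None:
    """Write the chunk's record, then enqueue one job per selected analyst."""
    store[chunk_id] = ChunkRecord(chunk_id, dict(scores), list(analysts))
    for analyst in analysts:
        work.put((analyst, chunk_id))


def next_job():
    """Pull the next (analyst, chunk_id) job, or None if the queue is empty."""
    try:
        return work.get_nowait()
    except queue.Empty:
        return None


publish_routing("chunk-1", {"confidentiality": 0.9}, ["confidentiality_analyst"])
job = next_job()
```

No agent holds a reference to another agent. Everything they share is a keyed row and a queue entry.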

When would a knowledge graph have earned its place here?

At exactly one frontier: the dependencies between clauses. It is the thing we would model with a graph next.

Contract clauses are not as independent as a scoring grid treats them. An indemnity clause modifies a limitation-of-liability cap, which interacts with the warranty, which is qualified by a force-majeure clause three sections away. A redline that is correct in isolation can be wrong once you account for the clause that overrides it. Today the system handles this with a consolidation step that deduplicates and reconciles overlapping recommendations across analysts. Publicly we describe it as an orchestrator that resolves conflicts when clauses overlap and keeps only the most accurate redlines. It is the pragmatic, non-graph way to catch the obvious collisions.

A consolidation pass flattens those relationships into a cleanup heuristic. A clause-dependency graph would model them directly: clauses as nodes, relationships like "modifies," "caps," and "is qualified by" as edges, and conflict detection as a traversal rather than a deduplication. That is a genuine GraphRAG problem hiding inside a system that, for its core job of scoring and routing, correctly has no graph. Naming that frontier honestly matters, because "where would a graph help next" is a completely different question from "should the whole system have been a graph." The first answer is yes, in one specific layer. The second is no.
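What "conflict detection as a traversal" might look like can be sketched with a plain adjacency list. This is a hypothetical illustration of a layer that has been scoped and not built: the edge list mirrors the indemnity example above, the relation labels are ours, and treating edges as undirected is a deliberate simplification so that a redline on any clause surfaces every clause connected to it.

```python
from collections import defaultdict, deque

# (source clause, relation, target clause), following the example in the text.
EDGES = [
    ("indemnity", "modifies", "limitation_of_liability"),
    ("limitation_of_liability", "interacts_with", "warranty"),
    ("warranty", "is_qualified_by", "force_majeure"),
]


def related_clauses(edges, start):
    """Breadth-first walk from `start`, returning (clause, relation) pairs to re-check.

    Edges are followed in both directions, so the result is every clause
    connected to `start`, nearest first, with the relation that reached it.
    """
    neighbours = defaultdict(list)
    for source, relation, target in edges:
        neighbours[source].append((relation, target))
        neighbours[target].append((relation, source))

    seen = {start}
    order = []
    frontier = deque([start])
    while frontier:
        clause = frontier.popleft()
        for relation, other in neighbours[clause]:
            if other not in seen:
                seen.add(other)
                order.append((other, relation))
                frontier.append(other)
    return order


to_recheck = related_clauses(EDGES, "indemnity")
```

A redline on the indemnity clause would flag the liability cap, the warranty, and the force-majeure clause for review, which a per-clause scoring grid cannot express.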

So when do we actually reach for a knowledge graph?

When the relationships between entities are the product, not a side effect. We have built both kinds of system, and the line between them is clean.

On the family-office platform, the deliverable was the connections. It has six entity types joined by explicit edges, a managed Neo4j graph for the structure and provenance, a separate retrieval index for the unstructured text, and a small set of agents wired together by a router that decides which to call. We reached for a graph there because the questions were paths, and because identity and provenance had to hold up under scrutiny. That is the system in the buyer post, and the graph is the right shape because relationships are the answer.

On the contract reviewer, the deliverable is a judgment on each clause. Twenty-three agents, a scores table, a router, no stored graph. We did not reach for a graph because there was no relationship in the data that a client was paying us to traverse. Same firm, same toolbox, opposite call. The deciding question was identical to the one from the buyer post: is the value in the connections, or in the content of each piece. That judgment, made before a line of code, is most of what our legal AI and data engineering work actually delivers.

What did 23 agents teach us about graphs?

Four lessons, and they apply far outside legal contracts.

First, tell the two graphs apart. A graph of agents, meaning control flow and who calls whom, is not a graph of facts, meaning entities and the relationships between them. A system can be built from agents arranged in an elaborate graph and still need no knowledge graph at all. Most "do we need a graph" confusion is this single conflation.

Second, the cheapest computation is the edge you do not traverse. The biggest efficiency win in the system, roughly 75% fewer model calls, came from treating routing as a graph and pruning it, not from any database. If your agents are expensive, look at your orchestration graph before you look at your data model.

Third, coordination is not a reason to build a graph. Independent agents scoring things in parallel coordinate fine through a shared table and a queue. Shared state is not shared memory, and shared memory is not a graph.

Fourth, know the frontier where a graph would help, and be honest that you have not crossed it yet. For us that frontier is cross-clause dependency, a real GraphRAG problem we have scoped and not built, sitting beside a core that correctly has none.

The pattern under all four is the same one this whole series keeps arriving at. A knowledge graph is a tool for relationships between entities, and the discipline is refusing to use it for anything else, no matter how complex the system gets. The 23-agent reviewer is complex. It still did not need a knowledge graph. Knowing why is the same skill as knowing when you do, which is the test we laid out in "scoring a graph before you trust it" and the reason graph rot is the name of this series.

We build multi-agent systems and knowledge graphs, and a fair amount of our work is telling clients which of the two their problem actually needs. If you are designing an agent system and are not sure whether a graph belongs in it, book a 15-minute call.

Founder & CEO, Cognilium AI | 10+ years

Mudassir Marwat is the Founder & CEO of Cognilium AI. He has shipped 100+ production AI systems acro...

Founder & CEO of Cognilium AI; 50+ projects delivered with 96% client satisfaction; 4 production AI products built and operated; multi-cloud AI architecture (AWS, GCP, Azure)
Agentic AI, RAG → GraphRAG retrieval, Voice AI, Multi-Agent Orchestration

