Enterprise Document AI · Chapter 2

Smart Category Routing for Contract Review

6 min read
1,300 words
Muhammad Mudassir


Founder & CEO, Cognilium AI


TL;DR

A focused application of the LLMOps routing pattern to legal contract analysis — the analyst-selection logic that ships fewer clauses to fewer agents and finishes a 3,300-call review in 154 seconds.
AI contract review, legal document AI, clause analysis, smart routing, multi-agent legal, ChromaDB, HyDE retrieval

Contract review is a domain where the LLMOps routing pattern earns its complexity. A typical contract has 50-100 chunks, and each chunk is potentially relevant to one or two of 11 specialist analysts (compliance, indemnity, IP, payment, termination, etc.). Running every analyst on every chunk takes 1,100+ LLM calls on a 100-chunk contract (100 chunks × 11 analysts); routing cuts that to ~250.
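The arithmetic behind these figures is easy to check. This sketch counts one LLM call per analyst invocation, as the 1,100 figure implies. It uses 2.5 analysts per chunk only because that is the average the ~250 figure implies for a 100-chunk contract; neither value comes from the production router.

```python
def analyst_calls(num_chunks: int, analysts_per_chunk: float) -> float:
    """Analyst invocations when every chunk is sent to `analysts_per_chunk` analysts."""
    return num_chunks * analysts_per_chunk


NUM_ANALYSTS = 11
CHUNKS = 100  # upper end of the 50-100 chunk range

naive = analyst_calls(CHUNKS, NUM_ANALYSTS)  # every analyst on every chunk
routed = analyst_calls(CHUNKS, 2.5)          # assumed average after routing (~250 / 100)
reduction = 1 - routed / naive

print(f"naive={naive:.0f} routed={routed:.0f} reduction={reduction:.0%}")
# naive=1100 routed=250 reduction=77%
```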

Playbook-driven configuration

Each customer has a playbook in S3 — categories that matter to them, severity weights, and party-specific clauses (one customer cares deeply about IP indemnity; another cares about data-residency clauses). At job start, the system loads the playbook and configures analysts accordingly.

  • Categories: 12 standard, 1-3 customer-specific
  • Severity weights: how much to escalate findings in each category
  • Party-specific clauses: customer-defined patterns the analyst should specifically look for
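Here is a minimal sketch of the job-start configuration step, assuming the playbook is stored as JSON. The field names (`custom_categories`, `severity_weights`, `party_clauses`), the `AnalystConfig` type, and the default weight of 1.0 are hypothetical, because the article does not describe the playbook schema. The S3 fetch is also left out, so the function takes the JSON text directly.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class AnalystConfig:
    """Per-category settings handed to one specialist analyst (hypothetical shape)."""
    category: str
    severity_weight: float = 1.0  # how strongly findings in this category escalate
    watch_patterns: list[str] = field(default_factory=list)  # party-specific clauses


def configure_analysts(playbook_json: str, standard_categories: list[str]) -> dict[str, AnalystConfig]:
    """Build one AnalystConfig per standard and customer-specific category."""
    playbook = json.loads(playbook_json)
    weights = playbook.get("severity_weights", {})
    patterns = playbook.get("party_clauses", {})
    categories = standard_categories + playbook.get("custom_categories", [])

    return {
        cat: AnalystConfig(
            category=cat,
            severity_weight=float(weights.get(cat, 1.0)),
            watch_patterns=list(patterns.get(cat, [])),
        )
        for cat in categories
    }


# Usage: five of the 12 standard categories, for brevity, plus one custom category.
playbook_text = json.dumps({
    "custom_categories": ["data-residency"],
    "severity_weights": {"indemnity": 2.0, "data-residency": 1.5},
    "party_clauses": {"indemnity": ["IP indemnity"]},
})
analysts = configure_analysts(
    playbook_text, ["compliance", "indemnity", "ip", "payment", "termination"]
)
```

Categories the playbook does not weight fall back to the neutral default, so a customer only has to list what they care about.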

Per-chunk scoring

Each chunk runs through 12 category scorers (a cheap model at $0.25 per million tokens). Each scorer emits a 0-100 score answering "is this chunk relevant to my category?" The router then picks analysts from those scores: an above-threshold category triggers its analyst, and a below-threshold category is skipped.
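The routing step can be sketched as a threshold filter over the scorer outputs. The threshold of 60 and the `analyst_for` mapping are assumptions for illustration. The article does not give the threshold or say how 12 categories map onto 11 analysts, so this mapping simply lets two categories share an analyst or lets a category have none.

```python
from __future__ import annotations


def select_analysts(
    scores: dict[str, int],
    analyst_for: dict[str, str],
    threshold: int = 60,
) -> list[str]:
    """Return analysts whose category score clears `threshold`, highest score first.

    `scores` maps category -> 0-100 relevance from that category's scorer.
    `analyst_for` maps category -> analyst; a missing entry means "no analyst".
    """
    selected: list[str] = []
    for category, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if not 0 <= score <= 100:
            raise ValueError(f"score for {category!r} outside 0-100: {score}")
        analyst = analyst_for.get(category)
        # Above threshold: trigger the analyst (once, even if two categories share it).
        if analyst is not None and score >= threshold and analyst not in selected:
            selected.append(analyst)
    return selected


chunk_scores = {"indemnity": 88, "ip": 72, "compliance": 61, "termination": 40, "payment": 15}
analyst_for = {cat: f"{cat}_analyst" for cat in chunk_scores}
print(select_analysts(chunk_scores, analyst_for))
# ['indemnity_analyst', 'ip_analyst', 'compliance_analyst']
```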

HyDE-augmented retrieval

Within each analyst's context, retrieval pulls in related chunks. HyDE (Hypothetical Document Embeddings) generates a hypothetical ideal answer to the analyst's question, embeds that answer, and retrieves the real chunks most similar to it. This gives better recall than embedding the literal question, especially when the analyst's question uses legal jargon and the contract uses plain English (or vice versa).
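Below is a self-contained sketch of the HyDE idea with two clearly stubbed pieces. `embed` is a toy bag-of-words vector standing in for a real embedding model, and `draft` is a hypothetical function standing in for the LLM that writes the hypothetical answer. The retrieval step itself (embed the draft, then rank real chunks by cosine similarity) is the technique described above. A production setup would query a vector store rather than score every chunk in a loop.

```python
import math
import re
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())  # missing tokens count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_retrieve(
    question: str,
    chunks: list[str],
    draft_answer: Callable[[str], str],
    k: int = 1,
) -> list[str]:
    """HyDE: embed a drafted answer instead of the question, then rank real chunks."""
    query_vec = embed(draft_answer(question))
    return sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)[:k]


def draft(question: str) -> str:
    """Hypothetical stand-in for the LLM that drafts an ideal answer."""
    return "The supplier will pay for losses if a third party sues the customer."


chunks = [
    "The supplier will pay for any losses the customer suffers if a third party sues over the software.",
    "Invoices are due within 30 days of receipt.",
]
question = "Does the agreement contain an indemnification obligation?"

top = hyde_retrieve(question, chunks, draft)
print(top[0])
```

The jargon question shares almost no words with the plain-English clause, while the drafted answer shares most of them. That vocabulary gap is what HyDE is meant to bridge.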

LLM reranking after HyDE

HyDE retrieves 30 candidates; an LLM reranker (cheap model, scores each candidate 0-100) picks the top 5 to actually include in the analyst context. Reranking buys ~10-15% F1 over pure embedding similarity at modest cost.
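The rerank stage reduces to "score every candidate, keep the best five." In this sketch `score_fn` is a hypothetical callable standing in for the cheap LLM reranker. Breaking ties by original retrieval order is a choice made for this sketch, not something the article specifies.

```python
from __future__ import annotations

from typing import Callable


def rerank(
    question: str,
    candidates: list[str],
    score_fn: Callable[[str, str], int],
    keep: int = 5,
) -> list[str]:
    """Score each retrieved candidate 0-100 and keep the `keep` best for the analyst context."""
    scores = [score_fn(question, cand) for cand in candidates]
    for cand, score in zip(candidates, scores):
        if not 0 <= score <= 100:
            raise ValueError(f"reranker score for {cand!r} outside 0-100: {score}")
    # Highest score first; ties fall back to the original retrieval order.
    order = sorted(range(len(candidates)), key=lambda i: (-scores[i], i))
    return [candidates[i] for i in order[:keep]]


# Usage with 30 HyDE candidates and a hypothetical stand-in scorer.
candidates = [f"chunk-{i:02d}" for i in range(30)]
fake_scores = {"chunk-03": 90, "chunk-17": 85, "chunk-08": 85, "chunk-25": 70, "chunk-11": 60, "chunk-02": 55}


def stub_score(question: str, cand: str) -> int:
    return fake_scores.get(cand, 10)


print(rerank("Who bears IP indemnity?", candidates, stub_score))
# ['chunk-03', 'chunk-08', 'chunk-17', 'chunk-25', 'chunk-11']
```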

Numbers from production

  • Misleading top-line estimate: 22 chunks, each hitting 12 scorers plus ~3 routed analysts at ~30 LLM calls apiece, which works out to 12 + 3 × 30 = 102 calls per chunk, or ~2,250 per contract
  • Actual: routed analysts make ~8 calls each, so a typical contract needs 22 × 12 scorer calls + 22 × 3 × 8 analyst calls = 264 + 528 = ~800 calls
  • P50 review time: 154 seconds end-to-end
  • Per-contract cost: $0.50-2.00 depending on contract length and customer playbook
  • Reduction vs. naive fan-out: ~75%
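The per-contract call budget above can be reproduced from its inputs. The function below sketches that bookkeeping only. Its parameter names are illustrative, and its defaults are the figures quoted in the list (12 scorers, ~3 routed analysts per chunk, ~8 calls per analyst).

```python
def calls_per_contract(
    chunks: int,
    scorers: int = 12,
    analysts_per_chunk: int = 3,
    calls_per_analyst: int = 8,
) -> tuple[int, int, int]:
    """Return (scorer calls, analyst calls, total) for one contract."""
    scorer_calls = chunks * scorers  # every chunk goes through every scorer
    analyst_calls = chunks * analysts_per_chunk * calls_per_analyst  # only routed analysts run
    return scorer_calls, analyst_calls, scorer_calls + analyst_calls


print(calls_per_contract(22))  # (264, 528, 792)
```

With ~30 calls per analyst, the figure behind the misleading top-line, `calls_per_contract(22, calls_per_analyst=30)` returns a total of 2,244. The budget is dominated by the per-analyst call count, not the scorers.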

Where this fails

Routing fails on customer playbooks with overlapping categories. For example, the "compliance" category overlaps with "regulatory" and "data-handling" 70% of the time, so a chunk that triggers one of these analysts usually triggers the others too, and routing collapses to "run everyone." Mitigation: a routing analytics dashboard shows per-category overlap rates, which surfaces the problem and encourages the customer to tighten the playbook.
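One way such a dashboard might compute overlap assumes the router logs which categories cleared the threshold for each chunk. For every ordered pair of categories (A, B), it reports the share of chunks routed to A that were also routed to B. The log format and function name are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from itertools import permutations


def overlap_rates(routed: list[set[str]]) -> dict[tuple[str, str], float]:
    """For each ordered pair (a, b): share of chunks routed to a that were also routed to b."""
    triggered: Counter = Counter()
    together: Counter = Counter()
    for categories in routed:
        triggered.update(categories)                   # chunks routed to each category
        together.update(permutations(categories, 2))   # ordered co-routed pairs
    return {(a, b): n / triggered[a] for (a, b), n in together.items()}


# Hypothetical routing log: one set of above-threshold categories per chunk.
log = (
    [{"compliance", "regulatory", "data-handling"}] * 7
    + [{"compliance"}] * 3
    + [{"payment"}] * 2
)
rates = overlap_rates(log)
print(rates[("compliance", "regulatory")])  # 0.7
print(rates[("regulatory", "compliance")])  # 1.0
```

Because the rate is asymmetric, it can also hint that one category is a near-subset of another (here every "regulatory" chunk is also a "compliance" chunk), which is a concrete cue for tightening the playbook.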
