TL;DR
A focused application of the LLMOps routing pattern to legal contract analysis — the analyst-selection logic that ships fewer clauses to fewer agents and fini
Contract review is a domain where the LLMOps routing pattern earns its complexity. A typical contract has 50-100 chunks. Each chunk is potentially relevant to one or two of 11 specialist analysts (compliance, indemnity, IP, payment, termination, etc.). Running every analyst on every chunk of a 100-chunk contract is 1,100 LLM calls (100 chunks × 11 analysts). Routing cuts that to ~250.
Playbook-driven configuration
Each customer has a playbook in S3 — categories that matter to them, severity weights, and party-specific clauses (one customer cares deeply about IP indemnity; another cares about data-residency clauses). At job start, the system loads the playbook and configures analysts accordingly.
- Categories: 12 standard, 1-3 customer-specific
- Severity weights: how much to escalate findings in each category
- Party-specific clauses: customer-defined patterns the analyst should specifically look for
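The playbook fields listed above could be modelled roughly as follows. This is a minimal sketch: the JSON schema, the field names, the `Playbook` class, and the `load_playbook` helper are all assumptions for illustration, not the production format. The S3 fetch is left out, so the function takes the object body as a string.

```python
import json
from dataclasses import dataclass, field


# Hypothetical playbook schema; field names are illustrative only.
@dataclass
class Playbook:
    customer_id: str
    categories: list[str]
    severity_weights: dict[str, float] = field(default_factory=dict)
    party_clauses: dict[str, list[str]] = field(default_factory=dict)

    def weight_for(self, category: str) -> float:
        # Categories without an explicit weight fall back to a neutral 1.0.
        return self.severity_weights.get(category, 1.0)


def load_playbook(raw: str) -> Playbook:
    """Parse a playbook JSON document (for example, the body of an S3 object)."""
    data = json.loads(raw)
    # Reject weights that refer to categories the playbook does not define.
    unknown = set(data.get("severity_weights", {})) - set(data["categories"])
    if unknown:
        raise ValueError(f"weights for unknown categories: {sorted(unknown)}")
    return Playbook(
        customer_id=data["customer_id"],
        categories=list(data["categories"]),
        severity_weights=dict(data.get("severity_weights", {})),
        party_clauses=dict(data.get("party_clauses", {})),
    )


raw = json.dumps({
    "customer_id": "acme",
    "categories": ["ip", "indemnity", "data_residency"],
    "severity_weights": {"ip": 2.0},
    "party_clauses": {"data_residency": ["data must remain in the EU"]},
})
playbook = load_playbook(raw)
```

Validating weights against the category list at load time is one way to catch playbook typos before any analyst runs.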
Per-chunk scoring
Each chunk runs through 12 category scorers (cheap model, $0.25/M tokens). Each scorer emits a 0-100 score for "is this chunk relevant to my category?" The router selects analysts to run based on the scores: above-threshold categories trigger their analyst; below-threshold categories skip.
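The selection step can be sketched as a simple threshold filter. The threshold value of 60, the function name `select_analysts`, and the sample scores are assumptions for illustration; the article does not state the real threshold or whether it differs per category.

```python
def select_analysts(scores: dict[str, int], threshold: int = 60) -> list[str]:
    """Return the categories whose 0-100 relevance score is at or above the threshold.

    The result is ordered from highest to lowest score.
    """
    selected = [cat for cat, score in scores.items() if score >= threshold]
    return sorted(selected, key=lambda cat: scores[cat], reverse=True)


# Hypothetical scorer output for a single chunk.
chunk_scores = {"indemnity": 91, "ip": 72, "payment": 18, "termination": 5}
routed = select_analysts(chunk_scores)
```

With these sample scores only the indemnity and IP analysts would run on the chunk; the other categories are skipped.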
HyDE-augmented retrieval
Within each analyst's context, retrieval pulls related chunks. HyDE (Hypothetical Document Embeddings) generates a hypothetical ideal answer to the analyst's question, embeds that answer, and retrieves the real chunks most similar to it. This gives better recall than embedding the literal question, especially when the analyst question uses legal jargon and the contract uses plain English (or vice versa).
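A toy version of the HyDE flow shows why the hypothetical answer helps when vocabularies differ. Everything here is a stand-in: `embed` is a bag-of-words counter rather than a real embedding model, and `generate_hypothetical` returns a canned string where a production system would call an LLM. The sample chunks and question are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an embedding model.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def generate_hypothetical(question: str) -> str:
    # Stand-in for the LLM call that drafts an ideal answer (canned, hypothetical output).
    return "Each party will pay the other party for losses caused by claims from third parties."


def hyde_retrieve(question: str, chunks: list[str], k: int) -> list[str]:
    # Embed the hypothetical answer, not the question, and rank real chunks against it.
    query_vec = embed(generate_hypothetical(question))
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "The supplier will pay the customer for losses caused by claims from third parties.",
    "Invoices are due within thirty days of receipt.",
    "Either side may end this agreement with ninety days written notice.",
]
question = "What are the indemnification obligations?"
top = hyde_retrieve(question, chunks, k=1)
```

The question says "indemnification" while the contract says "pay ... for losses", so the literal question shares almost no vocabulary with the relevant chunk. The hypothetical answer is phrased like contract text, so it lands much closer.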
LLM reranking after HyDE
HyDE retrieves 30 candidates; an LLM reranker (a cheap model that scores each candidate 0-100) picks the top 5 to include in the analyst context. Reranking buys ~10-15% F1 over pure embedding similarity at modest cost.
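The 30-to-5 rerank step might look like the sketch below. The `rerank` function and its signature are assumptions, and `keyword_score` is a deterministic stand-in for the cheap LLM scorer so that the example runs without a model.

```python
from typing import Callable


def rerank(
    question: str,
    candidates: list[str],
    score_fn: Callable[[str, str], int],
    top_k: int = 5,
) -> list[str]:
    """Score every candidate 0-100 and keep the top_k, breaking ties by retrieval order."""
    scored = [(score_fn(question, c), i, c) for i, c in enumerate(candidates)]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [c for _, _, c in scored[:top_k]]


def keyword_score(question: str, candidate: str) -> int:
    # Stand-in for the LLM reranker: percent of question words present in the candidate.
    q_words = set(question.lower().split())
    c_words = set(candidate.lower().split())
    return round(100 * len(q_words & c_words) / len(q_words)) if q_words else 0


# 30 hypothetical candidates: 27 irrelevant, 3 relevant to varying degrees.
candidates = [f"boilerplate clause {i}" for i in range(27)] + [
    "late payment interest applies",
    "payment is due in thirty days",
    "interest accrues on late payment",
]
top = rerank("late payment interest", candidates, keyword_score)
```

Breaking ties by the original retrieval order keeps the result deterministic when the scorer returns equal scores, which is common with a coarse 0-100 scale.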
Numbers from production
- 22 chunks → 116 LLM calls per chunk (12 scorers + ~3 routed analysts × ~30 LLM calls each) = ~660 calls per chunk on the misleading top-line
- Actually: (22 chunks × 12 scorers) + (22 chunks × ~3 selected analysts per chunk × 8 calls) = 264 + 528 = 792, or ~800 calls for a typical contract
- P50 review time: 154 seconds end-to-end
- Per-contract cost: $0.50-2.00 depending on contract length and customer playbook
- Reduction vs. naive fan-out: ~75%
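The per-contract call count in the list above can be reproduced with a few lines of arithmetic. The function name and parameter names are illustrative; the inputs are the figures quoted above (22 chunks, 12 scorers, ~3 selected analysts per chunk, 8 calls per analyst).

```python
def routed_calls(chunks: int, scorers: int, analysts_per_chunk: int, calls_per_analyst: int) -> int:
    """Total LLM calls for one contract: the scoring pass plus the routed analyst calls."""
    scoring = chunks * scorers
    analysis = chunks * analysts_per_chunk * calls_per_analyst
    return scoring + analysis


scoring_calls = 22 * 12      # every chunk goes through every scorer
analysis_calls = 22 * 3 * 8  # only the routed analysts run, 8 calls each
total = routed_calls(chunks=22, scorers=12, analysts_per_chunk=3, calls_per_analyst=8)
```

Changing `analysts_per_chunk` shows how sensitive the total is to routing precision: each extra analyst selected per chunk adds 22 × 8 = 176 calls at this contract size.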
Where this fails
Routing fails on customer playbooks with overlapping categories: the "compliance" category overlaps with "regulatory" and "data-handling" 70% of the time. When categories overlap this heavily, routing collapses to "everyone." Mitigation: a routing analytics dashboard shows per-category overlap rates, which surfaces the problem and encourages playbook tightening.
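One way to compute a per-category overlap rate from routing logs is sketched below. The definition used here (for each category pair, the share of chunks selecting either category that selected both) is an assumption, as are the function name and the log format; the article does not say how the dashboard defines overlap.

```python
from collections import Counter
from itertools import combinations


def overlap_rates(routing_log: list[set[str]]) -> dict[tuple[str, str], float]:
    """Map each category pair to (chunks selecting both) / (chunks selecting either)."""
    both: Counter = Counter()
    either: Counter = Counter()
    categories = sorted(set().union(*routing_log)) if routing_log else []
    for a, b in combinations(categories, 2):
        for selected in routing_log:
            if a in selected or b in selected:
                either[(a, b)] += 1
                if a in selected and b in selected:
                    both[(a, b)] += 1
    return {pair: both[pair] / either[pair] for pair in either}


# Hypothetical routing log: one set of selected categories per chunk.
log = [
    {"compliance", "regulatory"},
    {"compliance", "regulatory"},
    {"compliance"},
    {"payment"},
]
rates = overlap_rates(log)
```

Pairs with a rate near 1.0 are candidates for merging or tightening in the playbook, since their analysts almost always fire together.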
Muhammad Mudassir
Founder & CEO, Cognilium AI | 10+ years
