How does this differ from the LLMOps routing chapter?

It is the same routing pattern, applied specifically to contract analysis. This chapter walks through the legal-domain choices: how categories are defined, how the playbook is loaded per job, and how HyDE-augmented retrieval seeds the per-chunk context.

What is HyDE-augmented retrieval?

Hypothetical Document Embeddings: generate a hypothetical "ideal answer" to the query, embed that answer, and retrieve real chunks similar to it. This often retrieves better than embedding the literal query, particularly when the query language differs from the corpus language.

Why ChromaDB instead of Pinecone or Qdrant?

Per-job ephemeral storage. Each contract review creates a fresh ChromaDB instance holding only that job's chunks and tears it down at job end. That means no tenant cross-contamination and no long-running index to maintain. Pinecone would work, but the cost per job is lower with ephemeral Chroma.
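The per-job lifecycle described above can be sketched with a context manager that guarantees teardown even when a review fails midway. This is a minimal sketch, not the production code: `InMemoryStore` and `job_store` are hypothetical names, and the in-memory dict stands in for whatever ChromaDB client and collection the real system creates and deletes.

```python
from contextlib import contextmanager


class InMemoryStore:
    """Hypothetical stand-in for a per-job vector store (e.g. a ChromaDB collection)."""

    def __init__(self, job_id: str) -> None:
        self.job_id = job_id
        self.chunks: dict[str, str] = {}
        self.closed = False

    def add(self, chunk_id: str, text: str) -> None:
        if self.closed:
            raise RuntimeError("store already torn down")
        self.chunks[chunk_id] = text

    def teardown(self) -> None:
        # Drop every chunk so nothing from this job outlives it.
        self.chunks.clear()
        self.closed = True


@contextmanager
def job_store(job_id: str):
    """Create a fresh store for one job and always tear it down at the end."""
    store = InMemoryStore(job_id)
    try:
        yield store
    finally:
        store.teardown()
```

Because the store is created inside the `with` block and destroyed in `finally`, one job's chunks cannot leak into another job's retrieval.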

How is the playbook loaded?

Each customer has a playbook in S3 (categories, severity weights, party-specific clauses). At job start, the playbook is loaded into the analyst configuration. Different customers analyze the same contract differently, and playbook-driven configuration handles that.

Where does this break?

When the playbook is poorly defined (categories overlap, severity weights are uniform), routing collapses to "everyone": every analyst is selected for every chunk. The mitigation is routing analytics that show where overlaps cause routing failures, which encourages teams to invest in playbook precision.

Smart Category Routing for Contract Review

Contract review is a domain where the LLMOps routing pattern earns its complexity. A typical contract has 50-100 chunks. Each chunk is potentially relevant to one or two of 11 specialist analysts (compliance, indemnity, IP, payment, termination, etc.). At 100 chunks, running every analyst on every chunk means 1,100+ LLM calls. Routing cuts that to ~250.

Playbook-driven configuration

Each customer has a playbook in S3 — categories that matter to them, severity weights, and party-specific clauses (one customer cares deeply about IP indemnity; another cares about data-residency clauses). At job start, the system loads the playbook and configures analysts accordingly.

Categories: 12 standard, 1-3 customer-specific
Severity weights: how much to escalate findings in each category
Party-specific clauses: customer-defined patterns the analyst should specifically look for
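As a rough illustration of the three playbook fields listed above, the sketch below parses a playbook from JSON into a typed object. The JSON keys, the `Playbook` class, and `parse_playbook` are all assumptions made for illustration; the article does not specify the playbook schema, and the S3 fetch itself is left out.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Playbook:
    categories: list[str]                      # standard plus customer-specific
    severity_weights: dict[str, float]         # escalation weight per category
    party_clauses: list[str] = field(default_factory=list)

    def weight(self, category: str) -> float:
        # Assumption: categories without an explicit weight default to 1.0.
        return self.severity_weights.get(category, 1.0)


def parse_playbook(raw: str) -> Playbook:
    """Parse a playbook JSON document (hypothetical schema) and sanity-check it."""
    data = json.loads(raw)
    categories = list(data["categories"])
    weights = {k: float(v) for k, v in data.get("severity_weights", {}).items()}
    unknown = set(weights) - set(categories)
    if unknown:
        raise ValueError(f"severity weights for unknown categories: {sorted(unknown)}")
    return Playbook(categories, weights, list(data.get("party_clauses", [])))
```

Rejecting weights for categories the playbook never declares is one cheap way to catch a sloppy playbook at load time rather than during routing.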

Per-chunk scoring

Each chunk runs through 12 category scorers (cheap model, $0.25/M tokens). Each scorer emits a 0-100 score for "is this chunk relevant to my category?" The router selects analysts to run based on the scores: above-threshold categories trigger their analyst; below-threshold categories skip.
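The selection step can be sketched as a simple threshold filter over the scorer outputs. The function name `route` and the default threshold of 60 are assumptions; the article does not state the threshold used in production.

```python
def route(scores: dict[str, int], threshold: int = 60) -> list[str]:
    """Return the categories whose 0-100 relevance score is above the threshold.

    Categories are ordered from most to least relevant so that callers can
    cap the number of analysts if they want to.
    """
    selected = [category for category, score in scores.items() if score > threshold]
    return sorted(selected, key=lambda category: scores[category], reverse=True)
```

For example, a chunk scored `{"ip": 85, "payment": 20, "indemnity": 72}` would trigger only the IP and indemnity analysts, and the payment analyst would be skipped.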

HyDE-augmented retrieval

Within each analyst's context, retrieval pulls related chunks. HyDE generates a hypothetical ideal answer to the analyst's question, embeds it, and retrieves real chunks similar to it. This gives better recall than embedding the literal question, especially when the analyst question uses legal jargon and the contract uses plain English (or vice versa).
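A minimal sketch of the HyDE step follows, with the LLM and the embedding model passed in as plain callables so the flow is visible without tying it to a vendor. `hyde_retrieve`, `generate`, and `embed` are hypothetical names, and the brute-force cosine search stands in for the vector store query.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hyde_retrieve(
    question: str,
    chunks: Sequence[str],
    generate: Callable[[str], str],           # LLM call: question -> hypothetical answer
    embed: Callable[[str], Sequence[float]],  # embedding model
    k: int = 30,
) -> list[str]:
    # 1. Ask the model for a hypothetical ideal answer to the question.
    hypothetical = generate(question)
    # 2. Embed the hypothetical answer instead of the literal question.
    query_vec = embed(hypothetical)
    # 3. Rank the real chunks by similarity to that embedding.
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]
```

The only difference from plain embedding retrieval is step 2: the query vector comes from text that is phrased like the corpus, which is where the recall gain is expected to come from.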

LLM reranking after HyDE

HyDE retrieves 30 candidates; an LLM reranker (cheap model, scores each candidate 0-100) picks the top 5 to actually include in the analyst context. Reranking buys ~10-15% F1 over pure embedding similarity at modest cost.
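The reranking step reduces to "score every candidate, keep the best five". In this sketch the `score` callable is a hypothetical wrapper around the cheap-model call, and clamping its output to 0-100 is a defensive assumption rather than something the article describes.

```python
from typing import Callable, Sequence


def rerank(
    question: str,
    candidates: Sequence[str],
    score: Callable[[str, str], int],  # (question, candidate) -> 0-100 relevance
    top_n: int = 5,
) -> list[str]:
    """Keep the top_n candidates by LLM-assigned relevance score."""
    scored = [(max(0, min(100, score(question, c))), c) for c in candidates]
    # The sort is stable, so ties keep their original retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_n]]
```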

Numbers from production

Misleading top-line estimate: 12 scorers + ~3 routed analysts × ~30 LLM calls each ≈ 100 calls per chunk
Actual, for a typical 22-chunk contract: 22 chunks × 12 scorers = 264 scorer calls, plus 22 chunks × ~3 selected analysts × 8 calls = 528 analyst calls, for ~800 calls per contract
P50 review time: 154 seconds end-to-end
Per-contract cost: $0.50-2.00 depending on contract length and customer playbook
Reduction vs. naive fan-out: ~75%
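The call counts above can be checked with a few lines of arithmetic. One assumption is needed that the list does not state: that naive fan-out runs all 12 analysts on every chunk at the same 8 calls each. Under that assumption the ~75% figure matches the reduction in analyst calls alone; once the scorer calls are counted as overhead, the reduction works out to about 62%.

```python
CHUNKS = 22
SCORERS = 12
ANALYSTS_TOTAL = 12       # assumption: one analyst per standard category
ANALYSTS_ROUTED = 3       # "~3 selected analysts per chunk"
CALLS_PER_ANALYST = 8

scorer_calls = CHUNKS * SCORERS                                   # 264
routed_analyst_calls = CHUNKS * ANALYSTS_ROUTED * CALLS_PER_ANALYST  # 528
routed_total = scorer_calls + routed_analyst_calls                # 792, i.e. ~800

# Naive fan-out needs no scorers, but runs every analyst on every chunk.
naive_analyst_calls = CHUNKS * ANALYSTS_TOTAL * CALLS_PER_ANALYST  # 2112

analyst_reduction = 1 - routed_analyst_calls / naive_analyst_calls  # 0.75
total_reduction = 1 - routed_total / naive_analyst_calls            # 0.625
```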

Where this fails

The main failure mode is customer playbooks with overlapping categories (for example, the "compliance" category overlaps with "regulatory" and "data-handling" 70% of the time). Routing then collapses to "everyone": every analyst is selected for every chunk. Mitigation: a routing analytics dashboard shows per-category overlap rates, which surfaces the problem and encourages playbook tightening.
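One way such a dashboard could compute per-category overlap is as a conditional co-firing rate: of the chunks routed to category A, what fraction were also routed to category B. The sketch below is an assumption about the metric, not the dashboard's actual definition, and `overlap_rates` is a hypothetical name.

```python
from collections import Counter
from itertools import permutations


def overlap_rates(routed: list[set[str]]) -> dict[tuple[str, str], float]:
    """For each ordered pair (a, b), the share of a's chunks that also fired b.

    `routed` holds, per chunk, the set of categories the router selected.
    Pairs that never co-fire are omitted from the result.
    """
    fired: Counter = Counter()
    both: Counter = Counter()
    for categories in routed:
        fired.update(categories)
        both.update(permutations(sorted(categories), 2))
    return {(a, b): count / fired[a] for (a, b), count in both.items()}
```

A pair whose rate sits near 1.0 in both directions is a strong hint that the two categories are not distinct enough in the playbook to route on.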


Muhammad Mudassir

