
Anti-Hallucination via Runtime Grounding Against a Domain Vocabulary

6 min read · 1,300 words

Muhammad Mudassir
Founder & CEO, Cognilium AI

TL;DR


A startup-loaded domain vocabulary the generator must match against, plus framework rules baked into every prompt — a low-cost pattern that catches hallucinated terminology before the user sees it.
hallucination detection, structured output, LLM grounding, domain-faithful generation, validation pass, Pydantic

Domain-specific generation has a recurring failure mode: the LLM produces output that is fluent and confident but uses terminology that does not exist in the domain. A K-12 writing methodology has 298 specific terms; the generator may produce "voiceful sentence" or "stylistic figure", terms that sound right but are not in the framework. End users notice, and trust evaporates.

A validator that runs at output time and compares generated terminology against a startup-loaded vocabulary catches this before the user sees it, at 2-5ms of latency. The pattern is cheap and underused.

Vocabulary loading

At process startup, the validator loads the domain vocabulary from a versioned source — for our K-12 system it is a 298-term file extracted from the framework documentation. The vocab is parsed into a set with synonyms and inflection variants pre-computed. Total size: ~2KB in memory.
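
A minimal sketch of that startup step, assuming a plain one-term-per-line file; the file name, format, and the plural-only inflection handling are illustrative assumptions, not necessarily what the production loader does.

    # Startup-time vocabulary loading (sketch). File name, one-term-per-line format,
    # and plural-only inflection handling are assumptions for illustration.
    from pathlib import Path

    def load_vocabulary(path: str = "framework_terms_v1.txt") -> frozenset[str]:
        """Parse the versioned term file into a lowercase set with simple variants pre-computed."""
        terms: set[str] = set()
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            term = line.strip().lower()
            if not term or term.startswith("#"):
                continue
            terms.add(term)
            # Pre-compute a trivial inflection variant so the runtime check stays a plain set lookup.
            terms.add(term[:-1] if term.endswith("s") else term + "s")
        return frozenset(terms)

    # Loaded once per process and shared by every validation pass;
    # ~300 short strings is a couple of kilobytes resident.
    VOCAB = load_vocabulary()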

Validation flow

Every generated output gets one validation pass. The pass runs the structured output through a regex-based extractor that finds named-entity-style terms (capitalized, multi-word, or matching domain patterns). Each extracted term is checked against the vocabulary set.

  • Match: pass.
  • No match, fuzzy-match within edit distance 1: log + auto-correct (covers typos in generation).
  • No match, no fuzzy match: validation fails.
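
Here is a compact sketch of that pass. The regex is a deliberately crude stand-in for the real extractor, and the fuzzy check is a plain edit-distance-1 comparison; both are assumptions, not the production implementation.

    import re

    # Crude stand-in for the extractor: capitalized single- or multi-word phrases.
    TERM_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")

    def within_one_edit(a: str, b: str) -> bool:
        """True if a and b differ by at most one insertion, deletion, or substitution."""
        if abs(len(a) - len(b)) > 1:
            return False
        i = j = edits = 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                i += 1; j += 1
                continue
            edits += 1
            if edits > 1:
                return False
            if len(a) > len(b):
                i += 1            # extra character in a
            elif len(a) < len(b):
                j += 1            # extra character in b
            else:
                i += 1; j += 1    # substitution
        return edits + (len(a) - i) + (len(b) - j) <= 1

    def validate(output_text: str, vocab: frozenset[str]) -> tuple[bool, str]:
        """One validation pass: returns (passed, possibly auto-corrected text)."""
        corrected = output_text
        for raw in set(TERM_PATTERN.findall(output_text)):
            term = raw.lower()
            if term in vocab:
                continue                                           # exact match: pass
            fuzzy = next((v for v in vocab if within_one_edit(term, v)), None)
            if fuzzy is not None:
                corrected = corrected.replace(raw, fuzzy)          # edit distance 1: log + auto-correct
                continue
            return False, corrected                                # no match, no fuzzy match: fail
        return True, corrected

The whole pass is a regex scan plus set lookups, which is where the single-digit-millisecond latency figure below comes from.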

What happens on failure

Failure triggers a retry with a stricter prompt that lists the allowed vocabulary inline ("Use only these terms for craft elements: ..."). After two retries the system falls back to the closest valid term by embedding similarity and flags the output for human review. The flag goes to a queue; an editor sees the original prompt, the failed generation, and the auto-correction within hours.
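
Sketching the retry-then-fallback loop in the same vein: generate() and closest_vocab_term() are hypothetical placeholders for the real LLM client and embedding-similarity lookup, the queue stands in for the editor flag queue, and validate(), TERM_PATTERN, and VOCAB are reused from the sketches above.

    from queue import Queue

    MAX_RETRIES = 2
    review_queue: Queue = Queue()   # stand-in for the editor flag queue

    def generate(prompt: str) -> str:
        """Placeholder for the real LLM call."""
        raise NotImplementedError

    def closest_vocab_term(term: str, vocab: frozenset[str]) -> str:
        """Placeholder: return the vocab term with the highest embedding similarity to `term`."""
        raise NotImplementedError

    def grounded_generate(prompt: str, vocab: frozenset[str]) -> str:
        attempt_prompt = prompt
        output = ""
        for _ in range(1 + MAX_RETRIES):            # first attempt plus two retries
            output = generate(attempt_prompt)
            ok, corrected = validate(output, vocab)
            if ok:
                return corrected
            # Retry with a stricter prompt that lists the allowed vocabulary inline.
            attempt_prompt = (prompt
                + "\nUse only these terms for craft elements: "
                + ", ".join(sorted(vocab)))
        # Both retries failed: swap each unknown term for its closest valid neighbour
        # by embedding similarity, then flag the case for human review.
        fallback = output
        for raw in set(TERM_PATTERN.findall(output)):
            if raw.lower() not in vocab:
                fallback = fallback.replace(raw, closest_vocab_term(raw, vocab))
        review_queue.put({"prompt": prompt, "failed": output, "auto_correction": fallback})
        return fallback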

Why not just put the vocabulary in the system prompt?

You can. It costs tokens on every call. With 298 terms (roughly 4KB of text), that is roughly 1,000 extra tokens added to every single request. The validator approach keeps the vocabulary in process memory and keeps system prompts short. The cost trade-off is real and worth measuring.
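
As a rough back-of-the-envelope using the article's own numbers: ~1,000 extra tokens per request at the ~30K queries/month cited below works out to roughly 30 million additional input tokens per month spent re-sending the same 298 terms.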

What we measured

  • Validation latency: 2-5ms (regex + set lookup, no LLM call)
  • Hard-fail rate after two retries: <0.5%
  • Hallucinated terms caught (terms the LLM invented): ~3-5% of raw generations
  • Vocab size: 298 terms, ~2KB resident memory
  • Editor flag queue: ~10-20 items per day at 30K queries/month, manageable for a part-time reviewer

