Post-filter or prefilter — what is the actual difference?

Post-filter: retrieve the top K, then drop chunks whose metadata fails the filter. The surviving set can shrink below your reranker's input size, and answer quality suffers. Prefilter: restrict the candidate set to filter-passing chunks first, then retrieve the top K from that smaller set. K stays full, at some extra compute cost.
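A toy comparison makes the difference concrete. This is a minimal sketch, not the article's retriever. It assumes a hypothetical in-memory corpus where a single `score` stands in for retrieval relevance and `grade` is the only filter field.

```python
from typing import Callable

Chunk = dict  # hypothetical shape: {"id": int, "score": float, "grade": int}
Predicate = Callable[[Chunk], bool]


def by_score(chunks: list[Chunk]) -> list[Chunk]:
    """Highest-scoring chunks first."""
    return sorted(chunks, key=lambda c: c["score"], reverse=True)


def post_filter(chunks: list[Chunk], k: int, passes: Predicate) -> list[Chunk]:
    """Take the top k first, then drop failures: can return far fewer than k."""
    return [c for c in by_score(chunks)[:k] if passes(c)]


def prefilter(chunks: list[Chunk], k: int, passes: Predicate) -> list[Chunk]:
    """Restrict to passing chunks first, then take the top k of that pool."""
    return by_score([c for c in chunks if passes(c)])[:k]


def is_grade_4(chunk: Chunk) -> bool:
    return chunk["grade"] == 4


# Hypothetical toy corpus: 200 chunks, every fifth tagged grade 4; lower id = higher score.
corpus = [{"id": i, "score": 1.0 / (i + 1), "grade": 4 if i % 5 == 0 else 3} for i in range(200)]

print(len(post_filter(corpus, 30, is_grade_4)))  # 6
print(len(prefilter(corpus, 30, is_grade_4)))    # 30
```

With K = 30, post-filtering keeps only the 6 grade-4 chunks that happened to land in the global top 30, while prefiltering fills all 30 slots from the 40 grade-4 chunks available.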

Why progressive relaxation instead of just widening filters upfront?

Most queries return enough results under strict filters. Widening the filters upfront would pollute those queries with irrelevant chunks. Relaxation triggers only on the narrow queries that need it, which keeps precision high in the common case.

What gets dropped and in what order?

Domain-specific traits go first (they are the most specific and the most likely to be over-narrow), then mode (active vs. passive lesson), then grade level. Every level is tried before the retriever gives up; if the query still returns nothing with all three filters dropped, it is genuinely empty.

How does RRF fusion interact with the filter?

Fusion happens after both the dense and the sparse retriever return. With post-filtering, each retriever returns 100 chunks, fusion keeps 30, the filter drops 25, and you are left with 5. With prefiltering, the candidate pool is only the filter-passing chunks (say, 800 of 5,000), each retriever returns 100 from that pool, and fusion keeps 30: a full reranker input.
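The same arithmetic can be reproduced with reciprocal rank fusion, where each document scores the sum of 1 / (k + rank) over the rankings it appears in. This is a minimal sketch under stated assumptions: k = 60 is the constant commonly used for RRF (the article does not state its value), the filter `passes` is hypothetical and admits 800 of 5,000 ids, and the two "retrievers" are stand-ins that rank by ascending and descending id rather than real dense and BM25 scoring.

```python
from collections import defaultdict

CORPUS = range(5000)


def passes(doc_id: int) -> bool:
    """Hypothetical metadata filter: 4 of every 25 ids pass (800 of 5,000)."""
    return doc_id % 25 < 4


def rrf(rankings: list[list[int]], k: int = 60, limit: int = 30) -> list[int]:
    """Reciprocal rank fusion over 1-based ranks; ties broken by id for determinism."""
    scores: dict[int, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))[:limit]


def dense_search(pool, n: int = 100) -> list[int]:
    return sorted(pool)[:n]  # stand-in: lowest ids rank best


def sparse_search(pool, n: int = 100) -> list[int]:
    return sorted(pool, reverse=True)[:n]  # stand-in: highest ids rank best


def post_filter_pipeline() -> list[int]:
    # Retrieve and fuse over the whole corpus, filter afterwards.
    fused = rrf([dense_search(CORPUS), sparse_search(CORPUS)])
    return [d for d in fused if passes(d)]


def prefilter_pipeline() -> list[int]:
    # Restrict the pool first; both retrievers only ever see passing ids.
    pool = [d for d in CORPUS if passes(d)]
    return rrf([dense_search(pool), sparse_search(pool)])


print(len(post_filter_pipeline()))  # 4
print(len(prefilter_pipeline()))    # 30
```

The exact survivor count under post-filtering depends on how the filter correlates with rank; the point is that it is unbounded below, while the prefiltered pipeline always hands 30 passing chunks to the reranker as long as the pool is large enough.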

What does this cost on Qdrant?

Qdrant supports filtering during search natively, so the filter is not a separate query; the cost is the index lookup with the filter pushed down. Filters on indexed metadata fields stay fast (under 50 ms for our 584-chunk corpus). Filters on non-indexed fields degrade to a scan, and you feel it.
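Qdrant's REST API creates a payload index with `PUT /collections/{collection_name}/index` and expresses filter conditions as `must` clauses using `match`. A minimal sketch that only builds those request bodies (send them with any HTTP client); the field names `grade`, `mode`, and `trait` and their schema types are assumptions, not the article's actual payload schema.

```python
# Hypothetical filter fields mapped to Qdrant payload index schema types.
FILTER_FIELDS = {"grade": "integer", "mode": "keyword", "trait": "keyword"}


def payload_index_requests(collection: str) -> list[tuple[str, str, dict]]:
    """One (method, path, body) per filter field, so filtered searches use an index."""
    return [
        ("PUT", f"/collections/{collection}/index", {"field_name": field, "field_schema": schema})
        for field, schema in FILTER_FIELDS.items()
    ]


def build_filter(conditions: dict) -> dict:
    """Translate {"grade": 4, "mode": "active"} into a Qdrant `must` filter (all must hold)."""
    return {"must": [{"key": key, "match": {"value": value}} for key, value in conditions.items()]}


for method, path, body in payload_index_requests("lessons"):  # "lessons" is a hypothetical name
    print(method, path, body)
print(build_filter({"grade": 4, "mode": "active"}))
```

Creating the indexes once, up front, is what keeps the pushed-down filter an index lookup instead of a per-point payload check.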

Hybrid Retrieval With Prefetch-Time Metadata Filtering

A hybrid retriever combines a dense embedding model with a sparse BM25 index, fuses results with reciprocal rank fusion, and reranks. Adding metadata filtering on top of this — "only chunks tagged grade=4 and mode=active" — looks like a one-line change. It is not. Where the filter applies decides whether your retrieval quality survives narrow queries.

Post-filter loses chunks before the reranker sees them

The naive integration: retrieve the top K from each retriever, fuse, then drop chunks whose metadata fails the filter. For broad queries this is fine, because most chunks pass. For narrow queries (a specific grade and mode in a small corpus), 80% or more of the fused top 30 may fail the filter. Now the reranker has 6 chunks or fewer to work with instead of 30, and the answer goes from "evidence-grounded" to "best of a poor pool."

Prefilter keeps the candidate pool full

The fix: push the filter down into both retrievers. Qdrant supports filtering during search natively, so the dense side retrieves only filter-passing chunks. The sparse side runs BM25 over the same prefiltered set. Fusion sees 200 candidates that all pass the filter, instead of 200 of which 160 fail. The reranker gets a full 30-chunk input regardless of how narrow the filter is.
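In Qdrant's Query API, a hybrid search can be written as two `prefetch` stages fused with `{"fusion": "rrf"}`, and each prefetch accepts its own `filter`. A minimal sketch that builds such a request body for `POST /collections/{collection_name}/points/query`, assuming both retrievers live in one collection as named vectors `dense` and `sparse` (hypothetical names, with BM25-style weights stored in the sparse vector). The article does not say where its BM25 index runs; if it runs outside Qdrant, the same filter has to be applied there.

```python
def hybrid_query_body(
    dense_vec: list[float],
    sparse_indices: list[int],
    sparse_values: list[float],
    qfilter: dict,
    per_retriever: int = 100,
    limit: int = 30,
) -> dict:
    """Query API body: the filter is pushed into BOTH prefetches, then RRF fuses them."""
    return {
        "prefetch": [
            # Dense candidates, restricted to filter-passing points during the search.
            {"query": dense_vec, "using": "dense", "filter": qfilter, "limit": per_retriever},
            # Sparse (BM25-style) candidates over the same filtered set.
            {
                "query": {"indices": sparse_indices, "values": sparse_values},
                "using": "sparse",
                "filter": qfilter,
                "limit": per_retriever,
            },
        ],
        "query": {"fusion": "rrf"},
        "limit": limit,  # fused output size = reranker input size
    }


grade4_active = {"must": [{"key": "grade", "match": {"value": 4}},
                          {"key": "mode", "match": {"value": "active"}}]}
body = hybrid_query_body([0.12, -0.40, 0.33], [17, 204], [1.3, 0.8], grade4_active)
print(body["query"], body["limit"])
```

Putting the filter on each prefetch makes the push-down explicit: every one of the 200 fused candidates already satisfies it.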

Progressive relaxation handles the empty-set case

Narrow filters sometimes return zero candidates: for example, the corpus has no grade-4 active-mode chunk for "synonym practice for adjectives." A retrieval that returns zero chunks is worse than one that returns slightly off-target chunks, because the LLM answers "I do not have material on this" instead of generating from analogous content.

The relaxation ladder: drop the most specific trait (the writing-trait tag) first and retry; if still empty, drop mode (active/passive); if still empty, drop grade. Each step is one Qdrant call. Every query that needed relaxation is logged with the level it reached, so editors can see which trait/mode/grade combinations are sparse and decide whether to add content or merge tags.
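The ladder can be sketched as a loop over progressively smaller filter sets. This is a minimal illustration, not the article's code: the payload field names `trait`, `mode`, and `grade`, the log format, and the in-memory `stub_search` (standing in for one filtered Qdrant call per level) are all hypothetical.

```python
import logging
from typing import Callable, Iterator, Optional

log = logging.getLogger("retrieval.relaxation")

# Drop order from the article: writing trait, then mode, then grade (field names hypothetical).
DROP_ORDER = ("trait", "mode", "grade")


def relaxed_levels(filters: dict) -> Iterator[tuple[int, dict]]:
    """Yield (level, filters) from strict (level 0) to every ladder field dropped."""
    current = dict(filters)
    level = 0
    yield level, dict(current)
    for field in DROP_ORDER:
        if field in current:
            del current[field]
            level += 1
            yield level, dict(current)


def search_with_relaxation(
    query: str, filters: dict, search: Callable[[str, dict], list]
) -> tuple[list, Optional[int]]:
    for level, active in relaxed_levels(filters):
        hits = search(query, active)  # one filtered search call per level
        if hits:
            if level > 0:
                # Logged so editors can spot sparse trait/mode/grade combinations.
                log.info("relaxed query=%r level=%d original=%s used=%s", query, level, filters, active)
            return hits, level
    return [], None  # genuinely empty, even with every ladder filter dropped


# Hypothetical usage with an in-memory stand-in for the filtered search.
CHUNKS = [
    {"grade": 4, "mode": "passive", "trait": "voice"},
    {"grade": 3, "mode": "active", "trait": "word-choice"},
]


def stub_search(query: str, active: dict) -> list:
    return [c for c in CHUNKS if all(c.get(k) == v for k, v in active.items())]


hits, level = search_with_relaxation(
    "synonym practice for adjectives",
    {"grade": 4, "mode": "active", "trait": "word-choice"},
    stub_search,
)
print(level, hits)  # 2 [{'grade': 4, 'mode': 'passive', 'trait': 'voice'}]
```

Here the strict filter and the trait-dropped filter both come back empty, and the query succeeds at level 2 once mode is dropped as well.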

What this looks like in practice

Strict-filter queries: ~85% (relaxation never triggers)
Relaxed queries: ~12% relax once (drop trait), ~3% relax twice (drop mode), <0.5% relax three times (drop grade)
Reranker input size: stays at 30 chunks regardless of filter narrowness
Corpus: 584 chunks across 188 catalogued lessons
P50 retrieval latency: ~80ms strict, +40ms per relaxation level

When this hurts

Push-down filters need indexed metadata fields to stay fast. If your filter dimensions change weekly, every change means a reindex. Pick filter dimensions that are part of your domain model (grade, content type, language), not transient experiment flags.
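One way to enforce that choice is a guard at the query boundary that rejects filters on fields without a payload index, so a transient flag fails loudly instead of quietly degrading to a scan. A minimal sketch; the allow-list contents and the `UnindexedFilterError` name are hypothetical.

```python
# Hypothetical allow-list: fields that have a payload index and belong to the
# domain model. Transient experiment flags are deliberately absent.
INDEXED_FIELDS = frozenset({"grade", "mode", "trait", "content_type", "language"})


class UnindexedFilterError(ValueError):
    """Raised when a filter would run against a field with no payload index."""


def check_filter_fields(filters: dict) -> dict:
    """Return the filters unchanged if every key is indexed, otherwise raise."""
    unknown = sorted(set(filters) - INDEXED_FIELDS)
    if unknown:
        raise UnindexedFilterError(f"filter uses non-indexed fields: {unknown}")
    return filters


print(check_filter_fields({"grade": 4, "mode": "active"}))
```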


Muhammad Mudassir
