TL;DR
FIFO for chunk ordering, Standard for parallel analysis fan-out. Why a single queue type for the whole pipeline is the wrong default, with the dead-letter and retention settings that go with each boundary.
An agent pipeline with multiple stages tends to default to one queue type for the whole topology. That works for tutorials and breaks at production scale. The pipeline this writeup describes has three queue boundaries and uses three different topology choices for them — and the reasoning is worth writing down because most teams hit this and pick the wrong default.
Stage 1: chunk extraction (FIFO)
A document is split into 50–100 chunks. The assembler downstream stitches them into a structured contract analysis. Chunks must arrive in order — chunk 7 cannot land before chunk 6, otherwise the assembler either reorders (expensive) or skips and waits (slow). A FIFO queue with message group ID = job_id keeps order inside one job while different jobs run in parallel. FIFO throughput is 300 messages/sec per queue without batching, 3,000 with — more than enough for a job that emits 100 chunks in a couple of seconds.
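Roughly what the enqueue looks like: a minimal boto3 sketch, with the queue URL and payload shape as placeholders. The load-bearing parts are MessageGroupId and MessageDeduplicationId.

```python
import json
import boto3

sqs = boto3.client("sqs")
# Placeholder URL; any FIFO queue works as long as the name ends in ".fifo".
CHUNK_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/chunks.fifo"

def enqueue_chunks(job_id: str, chunks: list) -> None:
    """Send chunks in emit order. MessageGroupId scopes the ordering
    guarantee to one job; different job_ids interleave freely."""
    for seq, chunk in enumerate(chunks):
        sqs.send_message(
            QueueUrl=CHUNK_QUEUE_URL,
            MessageBody=json.dumps({"job_id": job_id, "seq": seq, "chunk": chunk}),
            MessageGroupId=job_id,                     # ordering boundary = one job
            MessageDeduplicationId=f"{job_id}-{seq}",  # rejects retried sends of the same chunk
        )
```

Batching via send_message_batch (up to 10 messages per call) is what moves the queue from the 300/sec ceiling to 3,000.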
Stage 2: analysis fan-out (Standard)
Each chunk fans out to 11 specialist analyst agents. There is no ordering relationship — bias, severity, ambiguity, party, and the rest can finish in any order; the merger just collects them. FIFO here would force per-group serialization and tank parallelism. Standard queue with at-least-once delivery + idempotent worker is the right call.
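The fan-out is one Standard-queue message per (chunk, agent) pair. A sketch under the same assumptions; only four of the eleven agent names appear in this writeup, so the list below is abbreviated:

```python
import json
import boto3

sqs = boto3.client("sqs")
ANALYSIS_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/analysis"  # placeholder

# Four of the eleven specialists are named in this writeup; the rest are elided.
AGENTS = ["bias", "severity", "ambiguity", "party"]

def fan_out(chunk_id: str, chunk_text: str) -> None:
    """One message per (chunk, agent) pair. No MessageGroupId: delivery
    order is irrelevant because the merger keys results on (chunk_id, agent_id)."""
    entries = [
        {
            "Id": agent,  # batch entry id, unique within one request
            "MessageBody": json.dumps(
                {"chunk_id": chunk_id, "agent_id": agent, "text": chunk_text}
            ),
        }
        for agent in AGENTS
    ]
    # send_message_batch accepts at most 10 entries per call.
    for i in range(0, len(entries), 10):
        sqs.send_message_batch(QueueUrl=ANALYSIS_QUEUE_URL, Entries=entries[i:i + 10])
```

No dedup ID on the send side either: Standard queues tolerate duplicates, and the merge step below is what makes that safe.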
Stage 3: result assembly (Standard with deduplication)
Results from 11 agents per chunk merge back. Standard works because the merge step is idempotent — writing the same agent result twice produces the same output. The trick: each merge writes to DynamoDB with a conditional update on (chunk_id, agent_id). Duplicate deliveries hit the conditional and short-circuit.
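One way to express that conditional write with boto3, assuming a table keyed on chunk_id (partition) and agent_id (sort); the table name is a placeholder:

```python
import boto3
from botocore.exceptions import ClientError

# Placeholder table, keyed on chunk_id (partition) + agent_id (sort).
results = boto3.resource("dynamodb").Table("merge_results")

def record_result(chunk_id: str, agent_id: str, payload: dict) -> bool:
    """Write one agent result exactly once. Returns False on a duplicate
    delivery instead of re-writing."""
    try:
        results.put_item(
            Item={"chunk_id": chunk_id, "agent_id": agent_id, "result": payload},
            # put_item targets the exact primary key, so conditioning on the
            # partition key alone fails iff this (chunk_id, agent_id) exists.
            ConditionExpression="attribute_not_exists(chunk_id)",
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # duplicate SQS delivery: ack the message and move on
        raise
```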
Dead-letter settings per queue
- FIFO chunk queue: maxReceiveCount = 3, DLQ retention 14 days. A wedged chunk blocks its group — fail fast and alert.
- Standard analysis queue: maxReceiveCount = 10, retention 14 days. Leaf-level retries are cheap, model-side rate limits resolve naturally.
- Standard merge queue: maxReceiveCount = 5, retention 7 days. Failures here usually mean a chunk_id has been deleted from DynamoDB — short DLQ, fast triage. Wiring for all three is sketched below.
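These settings map to the SQS RedrivePolicy attribute on the source queue plus MessageRetentionPeriod on the DLQ. A wiring sketch with placeholder URLs and ARNs:

```python
import json
import boto3

sqs = boto3.client("sqs")

def attach_dlq(queue_url: str, dlq_url: str, dlq_arn: str,
               max_receive_count: int, retention_days: int) -> None:
    """Point a source queue at its DLQ, then set retention on the DLQ.
    Note: a FIFO source queue requires a FIFO DLQ."""
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes={"RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": str(max_receive_count),
        })},
    )
    sqs.set_queue_attributes(
        QueueUrl=dlq_url,
        Attributes={"MessageRetentionPeriod": str(retention_days * 24 * 3600)},
    )

# The three boundaries above, with placeholder identifiers:
# attach_dlq(chunk_queue_url, chunk_dlq_url, chunk_dlq_arn, 3, 14)
# attach_dlq(analysis_queue_url, analysis_dlq_url, analysis_dlq_arn, 10, 14)
# attach_dlq(merge_queue_url, merge_dlq_url, merge_dlq_arn, 5, 7)
```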
When to ignore this advice
Pipelines with fewer than ~50 messages per request rarely justify the topology split — the operational overhead of three queues and three DLQ alerts costs more than reordering at the assembler. The split earns its keep when one job emits 1,000+ messages and a single bad message must not block the rest.
What we measured
- 22 chunks per contract → 22 FIFO messages → ~5 sec to drain
- 22 chunks × 11 agents = 242 Standard messages → ~25 sec parallel processing
- DLQ rate steady-state: <0.1% of messages
- P95 end-to-end: 154 sec from upload to assembled report
Muhammad Mudassir
Founder & CEO, Cognilium AI | 10+ years experience
