TL;DR
FIFO for chunk ordering, Standard for parallel analysis fan-out. Why a single queue type for the whole pipeline is the wrong default, with the dead-letter and retention settings that go with each boundary.
An agent pipeline with multiple stages tends to default to one queue type for the whole topology. That works for tutorials and breaks at production scale. The pipeline this writeup describes has three queue boundaries and uses three different topology choices for them — and the reasoning is worth writing down because most teams hit this and pick the wrong default.
Stage 1: chunk extraction (FIFO)
A document is split into 50–100 chunks. The assembler downstream stitches them into a structured contract analysis. Chunks must arrive in order — chunk 7 cannot land before chunk 6, otherwise the assembler either reorders (expensive) or skips and waits (slow). A FIFO queue with message group ID = job_id keeps order inside one job while different jobs run in parallel. FIFO throughput is 300 messages/sec per queue without batching, 3,000 with — more than enough for a job that emits 100 chunks in a couple of seconds.
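Roughly what the enqueue looks like: a minimal boto3 sketch, with the queue URL and payload shape as placeholders. The load-bearing parts are MessageGroupId and MessageDeduplicationId.

```python
import json
import boto3

sqs = boto3.client("sqs")
# Placeholder URL; any FIFO queue works as long as the name ends in ".fifo".
CHUNK_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/chunks.fifo"

def enqueue_chunks(job_id: str, chunks: list) -> None:
    """Send chunks in emit order. MessageGroupId scopes the ordering
    guarantee to one job; different job_ids interleave freely."""
    for seq, chunk in enumerate(chunks):
        sqs.send_message(
            QueueUrl=CHUNK_QUEUE_URL,
            MessageBody=json.dumps({"job_id": job_id, "seq": seq, "chunk": chunk}),
            MessageGroupId=job_id,                     # ordering boundary = one job
            MessageDeduplicationId=f"{job_id}-{seq}",  # rejects retried sends of the same chunk
        )
```

Batching via send_message_batch (up to 10 messages per call) is what moves the queue from the 300/sec ceiling to 3,000.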
Stage 2: analysis fan-out (Standard)
Each chunk fans out to 11 specialist analyst agents. There is no ordering relationship — bias, severity, ambiguity, party, and the rest can finish in any order; the merger just collects them. FIFO here would force per-group serialization and tank parallelism. Standard queue with at-least-once delivery + idempotent worker is the right call.
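The fan-out is one Standard-queue message per (chunk, agent) pair. A sketch under the same assumptions; only four of the eleven agent names appear in this writeup, so the list below is abbreviated:

```python
import json
import boto3

sqs = boto3.client("sqs")
ANALYSIS_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/analysis"  # placeholder

# Four of the eleven specialists are named in this writeup; the rest are elided.
AGENTS = ["bias", "severity", "ambiguity", "party"]

def fan_out(chunk_id: str, chunk_text: str) -> None:
    """One message per (chunk, agent) pair. No MessageGroupId: delivery
    order is irrelevant because the merger keys results on (chunk_id, agent_id)."""
    entries = [
        {
            "Id": agent,  # batch entry id, unique within one request
            "MessageBody": json.dumps(
                {"chunk_id": chunk_id, "agent_id": agent, "text": chunk_text}
            ),
        }
        for agent in AGENTS
    ]
    # send_message_batch accepts at most 10 entries per call.
    for i in range(0, len(entries), 10):
        sqs.send_message_batch(QueueUrl=ANALYSIS_QUEUE_URL, Entries=entries[i:i + 10])
```

No dedup ID on the send side either: Standard queues tolerate duplicates, and the merge step below is what makes that safe.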
Stage 3: result assembly (Standard with deduplication)
Results from 11 agents per chunk merge back. Standard works because the merge step is idempotent — writing the same agent result twice produces the same output. The trick: each merge writes to DynamoDB with a conditional update on (chunk_id, agent_id). Duplicate deliveries hit the conditional and short-circuit.
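One way to express that conditional write with boto3, assuming a table keyed on chunk_id (partition) and agent_id (sort); the table name is a placeholder:

```python
import boto3
from botocore.exceptions import ClientError

# Placeholder table, keyed on chunk_id (partition) + agent_id (sort).
results = boto3.resource("dynamodb").Table("merge_results")

def record_result(chunk_id: str, agent_id: str, payload: dict) -> bool:
    """Write one agent result exactly once. Returns False on a duplicate
    delivery instead of re-writing."""
    try:
        results.put_item(
            Item={"chunk_id": chunk_id, "agent_id": agent_id, "result": payload},
            # put_item targets the exact primary key, so conditioning on the
            # partition key alone fails iff this (chunk_id, agent_id) exists.
            ConditionExpression="attribute_not_exists(chunk_id)",
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # duplicate SQS delivery: ack the message and move on
        raise
```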
Dead-letter settings per queue
- FIFO chunk queue: maxReceiveCount = 3, DLQ retention 14 days. A wedged chunk blocks its group — fail fast and alert.
- Standard analysis queue: maxReceiveCount = 10, retention 14 days. Leaf-level retries are cheap, model-side rate limits resolve naturally.
- Standard merge queue: maxReceiveCount = 5, retention 7 days. Failures here usually mean a chunk_id has been deleted from DynamoDB — short DLQ, fast triage. Wiring for all three is sketched below.
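These settings map to the SQS RedrivePolicy attribute on the source queue plus MessageRetentionPeriod on the DLQ. A wiring sketch with placeholder URLs and ARNs:

```python
import json
import boto3

sqs = boto3.client("sqs")

def attach_dlq(queue_url: str, dlq_url: str, dlq_arn: str,
               max_receive_count: int, retention_days: int) -> None:
    """Point a source queue at its DLQ, then set retention on the DLQ.
    Note: a FIFO source queue requires a FIFO DLQ."""
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes={"RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": str(max_receive_count),
        })},
    )
    sqs.set_queue_attributes(
        QueueUrl=dlq_url,
        Attributes={"MessageRetentionPeriod": str(retention_days * 24 * 3600)},
    )

# The three boundaries above, with placeholder identifiers:
# attach_dlq(chunk_queue_url, chunk_dlq_url, chunk_dlq_arn, 3, 14)
# attach_dlq(analysis_queue_url, analysis_dlq_url, analysis_dlq_arn, 10, 14)
# attach_dlq(merge_queue_url, merge_dlq_url, merge_dlq_arn, 5, 7)
```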
When to ignore this advice
Pipelines with fewer than ~50 messages per request rarely justify the topology split — the operational overhead of three queues and three DLQ alerts costs more than reordering at the assembler. The split earns its keep when one job emits 1,000+ messages and a single bad message must not block the rest.
What we measured
- 22 chunks per contract → 22 FIFO messages → ~5 sec to drain
- 22 chunks × 11 agents = 242 Standard messages → ~25 sec parallel processing
- DLQ rate steady-state: <0.1% of messages
- P95 end-to-end: 154 sec from upload to assembled report
Muhammad Mudassir
Founder & CEO, Cognilium AI | 10+ years experience
