TL;DR
Real-time sentiment scoring drives the handoff decision; full conversation context, transcript, and detected intent travel with it. The escalation that does not carry this context forces the customer to re-explain everything from scratch.
A 22-language voice support agent handles 10,000+ tickets per month per deployment. Most resolve fully — the agent finds the answer, the customer accepts it, the call ends. The 5-15% that do not resolve are the calls that matter. How they are handed off to a human determines whether the customer walks away angry or merely inconvenienced.
The escalation triggers
Three signals fire escalation; any one of them is sufficient.
- Rolling sentiment score below -0.3 over a 30-second window. One negative turn does not count — sustained negative does.
- Explicit user request — phrases that map to "I want a human" in any of the 22 supported languages. Detected at the LLM layer, not on raw text, so paraphrases work.
- Self-rated complexity above 0.7 — the agent rates its confidence on each turn. Repeatedly low confidence on the same issue means the agent is past its competence.
The handoff packet
What the human agent gets when they pick up the call:
- Full transcript, with sentiment annotations per turn (so the human sees where the conversation went sideways)
- Detected intent + sub-intent (e.g., "billing dispute > duplicate charge")
- Customer profile: name, account ID, last interaction summary, lifetime value tier — fetched from the CRM at handoff
- What the agent already attempted (e.g., "offered refund of duplicate charge, system rejected — possibly stale data")
- A 2-3 sentence summary the model generates: "Customer has been billed twice for the same order. They have called twice in the last week about this. Refund tool returned an error. They are frustrated."
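The packet above maps naturally to a small data structure. A minimal sketch, with field names assumed since the article does not publish a schema:

```python
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str          # "agent" or "customer"
    text: str
    sentiment: float      # per-turn sentiment annotation

@dataclass
class HandoffPacket:
    transcript: list[Turn]
    intent: str                   # e.g. "billing dispute"
    sub_intent: str               # e.g. "duplicate charge"
    customer_profile: dict        # name, account ID, last interaction, LTV tier (from CRM)
    attempted_actions: list[str]  # what the agent already tried, with outcomes
    summary: str                  # 2-3 sentence model-generated summary

    def worst_turns(self, n: int = 3) -> list[Turn]:
        # Surface where the conversation went sideways, so the human
        # agent can skim the low points instead of the full transcript.
        return sorted(self.transcript, key=lambda t: t.sentiment)[:n]
```

The `worst_turns` helper is an assumption, included to show one way the per-turn sentiment annotations earn their place in the packet.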
What changes vs. cold escalation
Without the packet, the human starts with "Hi, can you tell me what is going on?" and the customer re-explains for 90 seconds. With the packet, the human starts with "I see you have been billed twice — let me get that fixed." The customer hears recognition, and resolution starts immediately. Resolution time drops, sentiment recovers, and the conversation does not have to relitigate the journey.
What we measured
- 67% to 92% first-call resolution rate after introducing the agent
- 24-agent team replaced with 8 in 4 months (the remaining 8 handle escalations + complex cases)
- 60% faster human-resolution time post-handoff vs. cold-transfer baseline
- 22 languages supported — single generator, eight regional sentiment models
Where this gets hard
The complexity self-rating is unreliable on novel issues. The model rates itself confident on questions it has never seen. A heuristic helps: if the agent has consulted the same KB article more than three times in one call without resolution, force-trigger escalation. Self-rating + heuristic catches more than either alone.
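The combined check can be sketched like this. The class and method names are hypothetical; the article specifies only the two rules: self-rated complexity above 0.7, or the same KB article consulted more than three times in one call without resolution.

```python
from collections import Counter

MAX_KB_CONSULTS = 3        # force-trigger past this many consults of one article
COMPLEXITY_THRESHOLD = 0.7

class CompetenceGuard:
    def __init__(self):
        self.kb_consults = Counter()  # article_id -> consults this call

    def record_kb_consult(self, article_id: str) -> None:
        self.kb_consults[article_id] += 1

    def should_escalate(self, self_rated_complexity: float) -> bool:
        # Self-rating catches issues the model knows it cannot handle;
        # the KB-loop heuristic catches novel issues it misjudges as
        # easy. Either signal alone forces escalation.
        looping = any(c > MAX_KB_CONSULTS for c in self.kb_consults.values())
        return looping or self_rated_complexity > COMPLEXITY_THRESHOLD
```

The design point is that the heuristic overrides confidence: a model stuck rereading the same article escalates even while rating itself certain.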
Muhammad Mudassir
Founder & CEO, Cognilium AI | 10+ years
