
Twilio + ElevenLabs: Building Natural Voice Interfaces


Muhammad Mudassir

Founder & CEO, Cognilium AI

Build natural voice AI with Twilio and ElevenLabs. Complete integration guide with streaming audio, latency optimization, and production code examples.

Twilio handles the phone call. ElevenLabs makes it sound human. Together, they're the foundation of modern voice AI. But connecting them isn't plug-and-play—you need streaming audio, proper encoding, and latency optimization. This guide covers the complete integration with production-ready code.

Why Twilio + ElevenLabs?

Twilio provides programmable telephony—the ability to make and receive phone calls via API. ElevenLabs provides the most natural-sounding text-to-speech available. Combined with an LLM for conversation and STT for speech recognition, they form the voice AI stack that powers production systems handling millions of calls.

1. Architecture Overview

[Architecture diagram: caller → Twilio Media Streams (WebSocket) → Deepgram STT → Claude LLM → ElevenLabs TTS → Twilio → caller]

The flow: Twilio answers the call and streams the caller's audio over a WebSocket to your server. Deepgram transcribes it in real time, Claude generates a reply, and ElevenLabs synthesizes that reply as mulaw audio, which is streamed straight back to Twilio.

2. Prerequisites

# Python 3.9+
pip install twilio fastapi uvicorn websockets httpx deepgram-sdk anthropic

# Accounts needed:
# - Twilio: twilio.com (phone number + Media Streams)
# - ElevenLabs: elevenlabs.io (API key + voice ID)
# - Deepgram: deepgram.com (API key)
# - Anthropic: anthropic.com (API key)

Environment Variables

export TWILIO_ACCOUNT_SID="your_account_sid"
export TWILIO_AUTH_TOKEN="your_auth_token"
export TWILIO_PHONE_NUMBER="+1234567890"
export DEEPGRAM_API_KEY="your_deepgram_key"
export ELEVENLABS_API_KEY="your_elevenlabs_key"
export ELEVENLABS_VOICE_ID="your_voice_id"
export ANTHROPIC_API_KEY="your_anthropic_key"
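It's worth validating these at startup rather than failing mid-call. A minimal sketch (the `load_settings` helper is an illustration, not part of any SDK):

```python
import os

REQUIRED_VARS = [
    "TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN", "TWILIO_PHONE_NUMBER",
    "DEEPGRAM_API_KEY", "ELEVENLABS_API_KEY", "ELEVENLABS_VOICE_ID",
    "ANTHROPIC_API_KEY",
]

def load_settings() -> dict:
    """Read the required environment variables, failing fast if any are missing."""
    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_VARS}
```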

3. Step 1: Twilio Setup

Buy a Phone Number

from twilio.rest import Client

client = Client(TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN)

number = client.incoming_phone_numbers.create(
    phone_number="+1234567890",
    voice_url="https://your-server.com/voice",
    voice_method="POST"
)

print(f"Phone number: {number.phone_number}")
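Twilio expects numbers in E.164 format (`+` followed by a country code and up to 15 digits total). A quick shape check before calling the API can save a round trip; this regex is an illustrative approximation, not Twilio's own validator:

```python
import re

# "+", a non-zero first digit, then up to 14 more digits (15 digits max total)
E164_RE = re.compile(r"^\+[1-9]\d{1,14}$")

def looks_like_e164(number: str) -> bool:
    """Rough E.164 sanity check; Twilio performs the authoritative validation."""
    return bool(E164_RE.match(number))
```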

Configure Webhook

In Twilio Console:

  1. Go to Phone Numbers → Manage → Active Numbers
  2. Click your number
  3. Set Voice Configuration: Webhook → https://your-server.com/voice

4. Step 2: ElevenLabs Configuration

Select a Voice

import httpx

async def list_voices():
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "https://api.elevenlabs.io/v1/voices",
            headers={"xi-api-key": ELEVENLABS_API_KEY}
        )
        voices = response.json()["voices"]
        for voice in voices:
            print(f"{voice['voice_id']}: {voice['name']}")

# Recommended for phone calls:
# - "Rachel" (professional, clear)
# - "Josh" (conversational, warm)

Test TTS

async def test_tts(text: str) -> bytes:
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"https://api.elevenlabs.io/v1/text-to-speech/{ELEVENLABS_VOICE_ID}",
            headers={
                "xi-api-key": ELEVENLABS_API_KEY,
                "Content-Type": "application/json"
            },
            json={
                "text": text,
                "model_id": "eleven_turbo_v2",
                "voice_settings": {
                    "stability": 0.5,
                    "similarity_boost": 0.75
                }
            }
        )
        return response.content

5. Step 3: Webhook Server

from fastapi import FastAPI, Request, Response
from twilio.twiml.voice_response import VoiceResponse, Connect, Stream

app = FastAPI()

@app.post("/voice")
async def handle_incoming_call(request: Request):
    form = await request.form()  # Twilio posts call details as form data; form() is async
    response = VoiceResponse()
    response.say("Hello! I'm connecting you now.", voice="alice")
    
    connect = Connect()
    stream = Stream(url="wss://your-server.com/stream")
    stream.parameter(name="caller_id", value=form.get("From", "unknown"))
    connect.append(stream)
    response.append(connect)
    
    return Response(content=str(response), media_type="application/xml")

6. Step 4: WebSocket Handler

import asyncio
import websockets
import json
import base64
from anthropic import AsyncAnthropic
from deepgram import Deepgram

deepgram = Deepgram(DEEPGRAM_API_KEY)
anthropic = AsyncAnthropic(api_key=ANTHROPIC_API_KEY)  # async client, awaited below

class CallHandler:
    def __init__(self, websocket):
        self.websocket = websocket
        self.stream_sid = None
        self.conversation_history = []
    
    async def handle(self):
        dg_connection = await deepgram.transcription.live({
            "encoding": "mulaw",
            "sample_rate": 8000,
            "channels": 1,
            "model": "nova-2",
            "punctuate": True,
            "interim_results": True
        })
        
        dg_connection.register_handler(
            dg_connection.event.TRANSCRIPT_RECEIVED,
            self.handle_transcript
        )
        
        try:
            async for message in self.websocket:
                data = json.loads(message)
                
                if data["event"] == "start":
                    self.stream_sid = data["start"]["streamSid"]
                elif data["event"] == "media":
                    audio = base64.b64decode(data["media"]["payload"])
                    await dg_connection.send(audio)
                elif data["event"] == "stop":
                    break
        finally:
            await dg_connection.finish()
    
    async def handle_transcript(self, transcript):
        if transcript.get("is_final"):
            text = transcript["channel"]["alternatives"][0]["transcript"]
            if text.strip():
                response_text = await self.generate_response(text)
                await self.speak(response_text)
    
    async def generate_response(self, user_input: str) -> str:
        self.conversation_history.append({"role": "user", "content": user_input})
        
        response = await anthropic.messages.create(
            model="claude-3-haiku-20240307",
            max_tokens=150,
            system="You are a helpful voice assistant. Keep responses brief, 1-3 sentences.",
            messages=self.conversation_history
        )
        
        assistant_message = response.content[0].text
        self.conversation_history.append({"role": "assistant", "content": assistant_message})
        return assistant_message
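The Media Streams messages handled above are plain JSON over WebSocket, with the audio base64-encoded in the `media.payload` field. A standalone sketch of the decode step (the sample payload values are invented for illustration):

```python
import base64
import json

def decode_media_event(raw: str) -> bytes:
    """Extract the raw mulaw audio bytes from a Twilio 'media' event."""
    data = json.loads(raw)
    assert data["event"] == "media"
    return base64.b64decode(data["media"]["payload"])

# A message shaped like Twilio's (streamSid and audio invented):
sample = json.dumps({
    "event": "media",
    "streamSid": "MZ0123",
    "media": {"payload": base64.b64encode(b"\xff\x7f" * 80).decode()},
})
audio = decode_media_event(sample)  # 160 bytes = 20 ms of 8 kHz mulaw
```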

7. Step 5: Streaming TTS Integration

For lower latency, stream TTS as it generates:

async def stream_tts(self, text: str):
    async with httpx.AsyncClient() as client:
        async with client.stream(
            "POST",
            f"https://api.elevenlabs.io/v1/text-to-speech/{ELEVENLABS_VOICE_ID}/stream",
            params={"output_format": "ulaw_8000"},  # direct mulaw output, no conversion
            headers={
                "xi-api-key": ELEVENLABS_API_KEY,
                "Content-Type": "application/json"
            },
            json={
                "text": text,
                "model_id": "eleven_turbo_v2"
            }
        ) as response:
            async for chunk in response.aiter_bytes(chunk_size=160):
                message = {
                    "event": "media",
                    "streamSid": self.stream_sid,
                    "media": {
                        "payload": base64.b64encode(chunk).decode()
                    }
                }
                await self.websocket.send(json.dumps(message))

Key optimization: ElevenLabs supports ulaw_8000 output format—no conversion needed!
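Because `ulaw_8000` is one byte per sample at 8 kHz, chunk duration is simple arithmetic. A quick helper for sizing buffers and pacing (illustrative, not part of either SDK):

```python
def ulaw_chunk_ms(num_bytes: int, sample_rate: int = 8000) -> float:
    """Duration of a mulaw chunk in milliseconds: one byte per sample."""
    return num_bytes / sample_rate * 1000
```

The 160-byte chunks used in the streaming loop above are therefore 20 ms of audio each.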

8. Step 6: Complete Call Flow

class ProductionCallHandler:
    def __init__(self, websocket):
        self.websocket = websocket
        self.stream_sid = None
        self.is_speaking = False
        self.interrupt_requested = False
    
    async def handle_transcript(self, transcript):
        if transcript.get("is_final"):
            text = transcript["channel"]["alternatives"][0]["transcript"].strip()
            if not text:
                return
            
            if self.is_speaking:
                self.interrupt_requested = True
                await self.stop_audio()
            
            await self.process_and_respond(text)
    
    async def stop_audio(self):
        clear_message = {
            "event": "clear",
            "streamSid": self.stream_sid
        }
        await self.websocket.send(json.dumps(clear_message))
        self.is_speaking = False

9. Latency Optimization

Optimization Checklist

| Technique | Latency Saved | Implementation |
|---|---|---|
| Use eleven_turbo_v2 | 100-200ms | Model selection |
| Use ulaw_8000 output | 50-100ms | No conversion |
| Stream TTS | 200-400ms | Async streaming |
| Use Deepgram Nova-2 | 50-100ms | Faster STT |
| Claude Haiku | 100-200ms | Faster LLM |
| Regional endpoints | 20-50ms | Closest region |
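Added up, the checklist takes roughly 520-1050 ms off the round trip. A small budget calculation (the ranges are copied from the checklist; the structure is just for illustration):

```python
# Savings from the optimization checklist, as (low, high) ms ranges per technique
savings = {
    "eleven_turbo_v2": (100, 200),
    "ulaw_8000 output": (50, 100),
    "stream TTS": (200, 400),
    "Deepgram Nova-2": (50, 100),
    "Claude Haiku": (100, 200),
    "regional endpoints": (20, 50),
}

low = sum(lo for lo, _ in savings.values())
high = sum(hi for _, hi in savings.values())
print(f"Total savings: {low}-{high} ms")  # Total savings: 520-1050 ms
```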

Latency Monitoring

import time

class LatencyTracker:
    def __init__(self):
        self.metrics = []
    
    async def timed_operation(self, name: str, coro):
        start = time.perf_counter()
        result = await coro
        elapsed = (time.perf_counter() - start) * 1000
        self.metrics.append({"operation": name, "latency_ms": elapsed})
        print(f"{name}: {elapsed:.0f}ms")
        return result
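The recorded metrics can be rolled up into percentiles per operation. A self-contained sketch that works on the same `{"operation", "latency_ms"}` records the tracker collects (the `summarize` helper and the sample values are additions for illustration):

```python
import statistics
from collections import defaultdict

def summarize(metrics: list[dict]) -> dict:
    """Group latency records by operation and report median and p95."""
    by_op = defaultdict(list)
    for m in metrics:
        by_op[m["operation"]].append(m["latency_ms"])
    return {
        op: {
            "p50": statistics.median(vals),
            # statistics.quantiles needs 2+ points; fall back to the single value
            "p95": statistics.quantiles(vals, n=20)[-1] if len(vals) > 1 else vals[0],
        }
        for op, vals in by_op.items()
    }

# Records shaped like LatencyTracker.metrics (values invented):
sample = [{"operation": "tts", "latency_ms": ms} for ms in (80, 100, 120, 90, 110)]
report = summarize(sample)
```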

10. Production Deployment

Dockerfile

FROM python:3.11-slim
RUN apt-get update && apt-get install -y ffmpeg && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Deploy with Docker Compose

version: "3.8"
services:
  voice-ai:
    build: .
    ports:
      - "8000:8000"
    environment:
      - TWILIO_ACCOUNT_SID
      - TWILIO_AUTH_TOKEN
      - DEEPGRAM_API_KEY
      - ELEVENLABS_API_KEY
      - ELEVENLABS_VOICE_ID
      - ANTHROPIC_API_KEY
    deploy:
      replicas: 3

Next Steps

  1. Enterprise Voice AI Guide → - Complete architecture patterns
  2. Voice AI for Sales → - Outbound call automation
  3. Voice AI Compliance → - Recording and consent

Need help with Twilio + ElevenLabs integration?

At Cognilium, we've built production voice systems handling thousands of concurrent calls. Let's discuss your project →

