Twilio Technology Stack

Confirmed Technologies from Architecture Research

Technology Stack Overview

This document catalogs the specific technologies used at Twilio based on engineering blog posts, documentation, and architecture research. Technologies are categorized by function and include their specific use cases within Twilio's platform.

Sources: Segment Engineering Blog, Twilio Engineering Blog, AWS Case Studies, Public Documentation, SIGNAL Conference Presentations
TWILIO TECHNOLOGY STACK

  LANGUAGES & FRAMEWORKS
    Go (Tracking API, Dedup Workers) │ Node.js (APIs) │ Python │ Java │ React

  EVENT STREAMING
    Apache Kafka (AWS MSK) │ NSQ (local buffer) │ Kinesis (Event Streams)

  DATABASES
    DynamoDB (Global Tables) │ MySQL/RDS (Centrifuge) │ Aurora PostgreSQL
    RocksDB (Embedded) │ Redis (ElastiCache) │ S3 (Object Storage)

  COMPUTE
    Kubernetes (EKS) │ Lambda │ EC2 │ Fargate

  NETWORKING
    VPC Lattice │ API Gateway │ CloudFront │ Route 53 │ ALB/NLB

  OPERATIONS
    Consul (Locking) │ Terraform │ ArgoCD │ CloudWatch │ DataDog

  PROTOCOLS
    SMPP │ SIP │ WebRTC │ OAuth 2.1 │ MCP (AI Agents)

Event Streaming & Messaging

Apache Kafka DATA

Distributed event streaming platform. Core backbone for Segment CDP and event-driven architecture.

Twilio Usage:
  • Segment Tracking API event backbone
  • Multi-tier failover (primary + secondary clusters per shard)
  • Partitioning by messageId for deduplication locality
  • Nearly 1M messages/second throughput
AWS: Amazon MSK (Amazon Managed Streaming for Apache Kafka)

NSQ DATA

Real-time distributed messaging platform. Lightweight local buffer before Kafka.

Twilio Usage:
  • Local buffer on each TAPI (Tracking API) server
  • Absorbs burst traffic before writing to Kafka
  • Decouples API response from Kafka write latency
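
The buffering idea in the list above can be sketched in a few lines. This is a minimal illustration, not Segment's implementation: an in-process `queue.Queue` stands in for the local NSQ daemon, a plain list stands in for the Kafka topic, and the names `handle_request`, `forwarder`, and the 1 ms simulated write latency are all hypothetical.

```python
import queue
import threading
import time

# Stand-in for the local NSQ daemon on a TAPI server (assumption for this sketch).
local_buffer: "queue.Queue[dict]" = queue.Queue(maxsize=10_000)
# Stand-in for the Kafka topic the events eventually reach.
delivered: list = []


def handle_request(event: dict) -> int:
    """Accept an event: enqueue it locally and acknowledge right away."""
    local_buffer.put_nowait(event)
    return 200


def forwarder() -> None:
    """Drain the local buffer into the slower downstream sink."""
    while True:
        event = local_buffer.get()
        time.sleep(0.001)  # simulated downstream write latency
        delivered.append(event)
        local_buffer.task_done()


threading.Thread(target=forwarder, daemon=True).start()

# A burst of requests is acknowledged without waiting on the downstream write.
statuses = [handle_request({"messageId": f"m{i}"}) for i in range(50)]
local_buffer.join()  # block until the forwarder has drained the burst
```

The request handler never touches the slow sink, so its response time depends only on the local enqueue. A real deployment would also need a policy for a full buffer, which this sketch leaves out.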

Amazon Kinesis AWS

Real-time data streaming service. Used for customer-facing event delivery.

Twilio Usage:
  • Event Streams product (customer-facing)
  • Delivers message status updates to customer Kinesis streams
  • Alternative to webhooks for high-volume customers

Kafka Architecture at Segment

Key Insight: Each TAPI shard has its own primary AND secondary Kafka cluster. This provides cluster-level failover, not just broker-level replication. The "Replicated" service monitors broker health and routes traffic.

Aspect       | Configuration                         | Rationale
-------------+---------------------------------------+--------------------------------------
Partitioning | By messageId                          | Same ID → same consumer → local dedup
Replication  | Multi-cluster (not just multi-broker) | Survives entire cluster failures
Retention    | 7 days minimum                        | Enables replay for recovery
Throughput   | ~1M messages/second                   | Peak load across all shards
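
Partitioning by messageId can be illustrated with a stable hash. The sketch below is an assumption-laden example: `NUM_PARTITIONS` is a made-up value, and MD5 is used only as a convenient stable hash (Kafka's default partitioner hashes the key bytes with murmur2).

```python
import hashlib

NUM_PARTITIONS = 16  # hypothetical partition count, not a figure from the source


def partition_for(message_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a messageId to a partition using a stable hash of the key.

    MD5 is a stand-in chosen for availability in the standard library;
    it is not the hash Kafka itself uses.
    """
    digest = hashlib.md5(message_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# A retried event carries the same messageId, so it maps to the same
# partition and is therefore seen by the same deduplication worker.
first = partition_for("msg-123")
retry = partition_for("msg-123")
```

Because the mapping depends only on the key, each worker needs to remember only the IDs of its own partition, which is what makes a local, embedded index workable.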

Databases & Storage

RocksDB STORAGE

Embedded key-value store based on LSM trees. Developed at Facebook and widely used in large-scale systems.

Twilio Usage:
  • Deduplication index in Segment CDP
  • 60 billion keys, 1.5TB per partition
  • 4-week deduplication window
  • Bloom filters for fast "not seen" checks
  • Replaced Memcached (100x improvement)
Why Embedded: No network hop, local disk, rebuildable from Kafka
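
The role of the Bloom filter in the "not seen" check can be sketched as follows. This is a toy model under stated assumptions: a Python `set` stands in for the on-disk RocksDB index, the filter sizes are arbitrary, the 4-week window is not modelled, and the class names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key: str):
        # Derive several bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


class Deduplicator:
    def __init__(self) -> None:
        self.bloom = BloomFilter()
        self.seen = set()       # stand-in for the on-disk key index
        self.index_lookups = 0  # how often the slower index had to be consulted

    def is_duplicate(self, message_id: str) -> bool:
        # Only consult the index when the filter says the key may exist.
        if self.bloom.might_contain(message_id):
            self.index_lookups += 1
            if message_id in self.seen:
                return True
        self.bloom.add(message_id)
        self.seen.add(message_id)
        return False
```

Most incoming events are new, so the filter answers "definitely not seen" for them and the index lookup is skipped entirely.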

MySQL / Amazon RDS STORAGE

Relational database. Used in Centrifuge in a "database-as-queue" pattern.

Twilio Usage:
  • Centrifuge job queue (not traditional queue)
  • Immutable rows design (no UPDATEs)
  • jobs and job_state_transitions tables
  • KSUID primary keys (time-sortable)
  • DROP TABLE instead of row-level DELETE for cleanup
  • 400K outbound requests/second
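
The immutable-rows design can be sketched with two append-only tables. The table names `jobs` and `job_state_transitions` come from the list above; everything else is an assumption for illustration: SQLite stands in for MySQL, the column names and state names are hypothetical, and `new_sortable_id` is a simplified KSUID-like ID (real KSUIDs use a custom epoch and base62 encoding).

```python
import os
import sqlite3
import time


def new_sortable_id() -> str:
    """Simplified KSUID-like ID: 4-byte timestamp prefix plus 16 random bytes, hex-encoded."""
    return int(time.time()).to_bytes(4, "big").hex() + os.urandom(16).hex()


db = sqlite3.connect(":memory:")  # SQLite used here only as a stand-in for MySQL
db.executescript("""
    CREATE TABLE jobs (
        id TEXT PRIMARY KEY,
        payload TEXT NOT NULL
    );
    CREATE TABLE job_state_transitions (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        job_id TEXT NOT NULL REFERENCES jobs(id),
        to_state TEXT NOT NULL
    );
""")


def record_transition(job_id: str, to_state: str) -> None:
    """State changes are appended as new rows; no row is updated in place."""
    db.execute(
        "INSERT INTO job_state_transitions (job_id, to_state) VALUES (?, ?)",
        (job_id, to_state),
    )


def enqueue(payload: str) -> str:
    job_id = new_sortable_id()
    db.execute("INSERT INTO jobs (id, payload) VALUES (?, ?)", (job_id, payload))
    record_transition(job_id, "awaiting_scheduling")
    return job_id


def current_state(job_id: str) -> str:
    """The current state is simply the most recent transition row."""
    row = db.execute(
        "SELECT to_state FROM job_state_transitions "
        "WHERE job_id = ? ORDER BY seq DESC LIMIT 1",
        (job_id,),
    ).fetchone()
    return row[0]
```

With insert-only writes, cleanup can discard a whole table once its jobs are finished, which is cheaper than deleting rows one by one.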

Amazon DynamoDB AWS

Fully managed NoSQL database. Key-value with global replication.

Twilio Usage:
  • Customer → Cell routing table (Global Tables)
  • API key and identity lookups
  • Session state for identity service
  • Multi-region with ~5-15ms latency
Global Tables: Multi-master replication across regions

Amazon Aurora PostgreSQL AWS

PostgreSQL-compatible edition of Amazon Aurora, a managed relational database with enhanced performance.

Twilio Usage:
  • Primary transactional database per cell
  • Customer account data, configuration
  • Multi-AZ deployment with read replicas
  • Enterprise cells: 6 read replicas
  • SMB cells: 2 read replicas

Redis / Amazon ElastiCache AWS

In-memory data store. Used for caching and session management.

Twilio Usage:
  • Cell Router cache (customer → cell mapping)
  • 95% cache hit rate, ~5ms latency
  • 1-hour TTL for routing entries
  • Session cache for identity service
  • Rate limiting counters
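
The routing cache described above follows a cache-aside pattern with a TTL. The sketch below is an in-memory model rather than Redis client code: a dictionary stands in for Redis, the `lookup` callable stands in for the authoritative routing table, and `CellRouterCache` is a hypothetical name. Only the 1-hour TTL is taken from the list.

```python
import time
from typing import Callable, Dict, Tuple

ROUTING_TTL_SECONDS = 3600  # the 1-hour TTL listed above


class CellRouterCache:
    """Cache-aside lookup of customer -> cell with per-entry expiry."""

    def __init__(
        self,
        lookup: Callable[[str], str],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup  # authoritative source, consulted on a miss
        self._clock = clock    # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[str, float]] = {}
        self.hits = 0
        self.misses = 0

    def cell_for(self, customer_id: str) -> str:
        now = self._clock()
        entry = self._entries.get(customer_id)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired entry: read the source of truth and refresh the cache.
        self.misses += 1
        cell = self._lookup(customer_id)
        self._entries[customer_id] = (cell, now + ROUTING_TTL_SECONDS)
        return cell
```

The TTL bounds how long a stale mapping can be served after a customer moves to another cell, at the cost of one extra lookup per customer per hour.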

Amazon S3 AWS

Object storage. Durable, scalable storage for any data type.

Twilio Usage:
  • Centrifuge archival (undelivered messages after 4 hours)
  • Media storage (MMS, recordings)
  • Log archival
  • Segment warehouse sync destinations

Database Selection Philosophy

Principle: Use the simplest tool that meets requirements. Don't over-engineer.

Use Case                 | Technology             | Why This Choice
-------------------------+------------------------+-------------------------------------------------
Deduplication (60B keys) | RocksDB                | Embedded, no network, Bloom filters, disk-backed
Job queue (88K queues)   | MySQL                  | SQL flexibility for QoS changes, immutable rows
Global routing           | DynamoDB Global Tables | Multi-master, multi-region, ~15ms latency
Fast cache               | Redis                  | Sub-5ms, in-memory, TTL support
Transactional            | Aurora PostgreSQL      | ACID, read replicas, AWS-managed
Archival                 | S3                     | Cheap, durable, queryable with Athena

Compute & Runtime

Go (Golang) COMPUTE

Statically typed, compiled language. Excellent for concurrent, high-throughput systems.

Twilio Usage:
  • Segment Tracking API (TAPI) servers
  • Deduplication workers
  • Custom JSON parser (zero-allocation)
  • High-throughput services requiring low latency
Why Go: 800K RPS, 30ms latency, garbage collector tuned for throughput

Amazon EKS (Kubernetes) AWS

Managed Kubernetes service. Container orchestration at scale.

Twilio Usage:
  • Primary compute platform per cell
  • 100-200 nodes per cell (varies by tier)
  • Runs all microservices
  • Auto-scaling based on CPU/memory
  • IRSA for IAM authentication

AWS Lambda AWS

Serverless compute. Event-driven, auto-scaling, pay-per-use.

Twilio Usage:
  • Cell Router (Lambda@Edge)
  • API Gateway integrations
  • Event-driven processing
  • Control Plane automation tasks

Consul COMPUTE

HashiCorp service mesh and distributed locking. Service discovery and coordination.

Twilio Usage:
  • Centrifuge Director locking
  • One Director per JobDB (exclusive lock)
  • Session-based TTL for automatic release
  • Service discovery

Language Choices

Language | Use Cases                                  | Rationale
---------+--------------------------------------------+------------------------------------------
Go       | High-throughput data plane (TAPI, workers) | Performance, concurrency, low GC pause
Node.js  | APIs, TwiML parsing, webhooks              | Async I/O, JavaScript ecosystem
Python   | Data pipelines, ML/AI, scripting           | Data science libraries, rapid development
Java     | Enterprise services, Android SDK           | Mature ecosystem, strong typing

AWS Services

Cloud Strategy: Twilio runs primarily on AWS with a hybrid model. The "Super Network" (carrier connections) uses dedicated infrastructure, but all compute, storage, and managed services are AWS-native.

Networking

Amazon VPC Lattice NETWORK

Application networking service. Routes by service name, not IP address.

Twilio Usage:
  • Cross-cell service mesh
  • Enables overlapping VPC CIDRs (all cells use 10.0.0.0/16)
  • Routes based on X-Twilio-Cell-ID header
  • Eliminates IP address coordination
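
Header-based routing by name, rather than by IP, can be sketched as a simple lookup. This is an illustration only, not VPC Lattice configuration: the header name comes from the list above, while the `CELL_SERVICES` mapping, the service names, and `resolve_target` are hypothetical.

```python
from typing import Mapping

CELL_HEADER = "X-Twilio-Cell-ID"

# Hypothetical service names. Every cell reuses the same CIDR, so an IP address
# alone cannot identify the destination; the cell ID selects a named service.
CELL_SERVICES = {
    "cell-001": "messaging.cell-001.internal",
    "cell-002": "messaging.cell-002.internal",
}


def resolve_target(headers: Mapping[str, str]) -> str:
    """Pick the destination service from the cell ID header."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    cell_id = normalized.get(CELL_HEADER.lower())
    if cell_id is None:
        raise ValueError(f"missing {CELL_HEADER} header")
    try:
        return CELL_SERVICES[cell_id]
    except KeyError:
        raise LookupError(f"unknown cell: {cell_id}") from None
```

Since the routing key is a name carried in the request, two cells can both contain a host at the same private address without any conflict.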

Amazon API Gateway AWS

Managed API service. REST and WebSocket APIs at scale.

Twilio Usage:
  • Public API endpoint (api.twilio.com)
  • API versioning (/v1/*, /v2/*)
  • Rate limiting
  • Lambda integration for Cell Router

Amazon CloudFront AWS

Content delivery network. Global edge locations.

Twilio Usage:
  • Static asset delivery
  • API acceleration for global customers
  • Lambda@Edge for Cell Router logic

Amazon Route 53 AWS

Managed DNS service. Global traffic management.

Twilio Usage:
  • Global DNS for api.twilio.com
  • Health check-based failover
  • Latency-based routing to nearest region
  • Private hosted zones per cell

Management & Operations

AWS Control Tower AWS

Multi-account governance. Landing zone for AWS Organizations.

Twilio Usage:
  • Cell = AWS Account (via Account Factory)
  • Automated account provisioning
  • Service Control Policies (SCPs)
  • Centralized CloudTrail/Config

Terraform COMPUTE

Infrastructure as Code. HashiCorp's declarative provisioning tool.

Twilio Usage:
  • Cell infrastructure provisioning
  • VPC, EKS, RDS, MSK setup
  • Terragrunt for multi-cell management
  • ~30 min to provision new cell

ArgoCD COMPUTE

GitOps continuous delivery for Kubernetes.

Twilio Usage:
  • Application deployment to cells
  • One ArgoCD application per cell
  • Git as source of truth
  • Automated sync and drift detection

AWS Step Functions AWS

Serverless workflow orchestration. State machines for complex workflows.

Twilio Usage:
  • Control Plane orchestration
  • Cell provisioning workflow
  • Customer migration workflows
  • Long-running async operations

Protocols & Standards

SMPP PROTOCOL

Short Message Peer-to-Peer. Industry standard for SMS carrier connectivity.

Twilio Usage:
  • 4,800 carrier connections worldwide
  • Super Network SMPP gateways
  • PDU encoding, segmentation, concatenation
  • Delivery receipt (DLR) handling
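
The segmentation arithmetic behind concatenated SMS can be worked through in code. The limits below are the common GSM values, assuming a 6-byte user data header on each part of a concatenated message: 160 characters for a single GSM-7 message and 153 per part, 70 and 67 for UCS-2. The function names are hypothetical, and GSM-7 extension characters (which take two septets) are not handled.

```python
import math

# encoding -> (single-message limit, per-part limit once a concatenation header is added)
LIMITS = {"gsm7": (160, 153), "ucs2": (70, 67)}


def segment_count(length: int, encoding: str) -> int:
    """Number of SMS parts needed for a message of the given length."""
    single, per_part = LIMITS[encoding]
    if length <= single:
        return 1
    return math.ceil(length / per_part)


def ucs2_length(text: str) -> int:
    """Length in UTF-16 code units; characters outside the BMP count as two."""
    return len(text.encode("utf-16-be")) // 2
```

For example, a 161-character GSM-7 message needs two parts because each part gives up seven characters of space to the header, and a single emoji occupies two of the 70 UCS-2 units.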

SIP PROTOCOL

Session Initiation Protocol. Standard for voice/video session setup.

Twilio Usage:
  • Programmable Voice connections
  • BYOC (Bring Your Own Carrier) trunking
  • Enterprise PBX integration
  • SIP domains for custom endpoints

WebRTC PROTOCOL

Web Real-Time Communication. Browser/mobile real-time media.

Twilio Usage:
  • Programmable Voice (browser SDK)
  • Programmable Video
  • Global Low Latency (GLL) Edge - 9 locations
  • STUN/TURN handled by Twilio
  • SFU for group video rooms

OAuth 2.1 PROTOCOL

Authorization framework. Industry standard for delegated access.

Twilio Usage:
  • Stytch Connected Apps (AI agent auth)
  • MCP compliance (Anthropic's open standard)
  • Scoped tokens for fine-grained access
  • Human-in-the-loop step-up authentication

TwiML PROTOCOL

Twilio Markup Language. XML-based instructions for voice/messaging.

Twilio Usage:
  • Voice call control (<Say>, <Dial>, <Gather>)
  • Messaging responses (<Message>)
  • Webhook response format
  • Declarative call flow definition
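
A webhook response in TwiML is plain XML, so it can be built with the standard library. The sketch below uses only verbs named above (`<Gather>` and `<Say>`) with the `numDigits` and `action` attributes of `<Gather>`; the function name, prompt text, and URL are hypothetical.

```python
import xml.etree.ElementTree as ET


def voice_response(prompt: str, action_url: str) -> str:
    """Build a TwiML document that reads a prompt and collects one keypress."""
    response = ET.Element("Response")
    gather = ET.SubElement(
        response, "Gather", {"numDigits": "1", "action": action_url}
    )
    ET.SubElement(gather, "Say").text = prompt
    # Reached only if the caller enters nothing and the Gather times out.
    ET.SubElement(response, "Say").text = "We did not receive any input. Goodbye."
    return ET.tostring(response, encoding="unicode")


twiml = voice_response("Press 1 for sales.", "https://example.com/menu")
```

Building the document through an XML library rather than string concatenation means characters such as `&` in the prompt are escaped correctly.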

MCP (Model Context Protocol) PROTOCOL

Anthropic's open standard for AI tool integration.

Twilio Usage:
  • Stytch AI agent authentication
  • Claude/ChatGPT connector support
  • OAuth 2.1 mandate for agent auth
  • Scoped, ephemeral credentials for AI agents

Technology Summary by Function

Function                  | Technology             | Scale / Notes
--------------------------+------------------------+------------------------------------------
Event Streaming
Event backbone            | Apache Kafka (AWS MSK) | ~1M messages/sec, multi-cluster failover
Local buffer              | NSQ                    | Per-server buffer before Kafka
Customer event delivery   | Kinesis                | Event Streams product
Databases
Deduplication             | RocksDB (embedded)     | 60B keys, 1.5TB, Bloom filters
Job queue                 | MySQL/RDS              | Centrifuge, 400K req/sec, immutable rows
Global routing            | DynamoDB Global Tables | Multi-master, ~15ms latency
Caching                   | Redis (ElastiCache)    | 95% hit rate, ~5ms latency
Transactional             | Aurora PostgreSQL      | Per-cell primary database
Archival                  | S3                     | Long-term storage, analytics
Compute
Container orchestration   | Amazon EKS             | 100-200 nodes per cell
Serverless                | AWS Lambda             | Cell Router, event processing
High-performance services | Go                     | TAPI: 800K RPS, 30ms latency
Distributed locking       | Consul                 | Director locks in Centrifuge
Networking
Service mesh              | VPC Lattice            | Overlapping VPC CIDRs, service routing
API management            | API Gateway            | Rate limiting, versioning
CDN                       | CloudFront             | Lambda@Edge for Cell Router
DNS                       | Route 53               | Latency-based routing
Operations
Infrastructure as Code    | Terraform              | Cell provisioning (~30 min)
GitOps                    | ArgoCD                 | Application deployment
Multi-account governance  | AWS Control Tower      | Cell = AWS Account
Workflow orchestration    | Step Functions         | Control Plane automation
Protocols
SMS carrier               | SMPP                   | 4,800 carrier connections
Voice signaling           | SIP                    | BYOC, PBX integration
Real-time media           | WebRTC                 | 9 GLL Edge locations
AI agent auth             | OAuth 2.1 + MCP        | Stytch Connected Apps

Interview Quick Reference

When asked "What technologies does Twilio use?"

"Twilio runs on AWS with a cell-based architecture. Each cell is a separate AWS account with its own EKS cluster, Aurora PostgreSQL, and MSK (Kafka). For the Segment CDP specifically, they use Go for high-throughput services, RocksDB for embedded deduplication at 60 billion keys, and MySQL as a 'database-as-queue' for Centrifuge which handles 400K outbound requests/second. DynamoDB Global Tables provide multi-region routing, Redis caches for 95% hit rates, and VPC Lattice enables service mesh across cells with overlapping IP spaces. For carrier connectivity, they maintain 4,800 SMPP connections through their Super Network."