Service Mesh: Istio & Envoy
Modern microservices networking: traffic management, security, observability
What is a Service Mesh?
The Problem
In microservices architectures, services need to:
- Find and communicate with each other (service discovery)
- Handle failures gracefully (retries, timeouts, circuit breakers)
- Secure communications (mTLS, authentication)
- Route traffic intelligently (canary deployments, A/B testing)
- Observe what's happening (tracing, metrics, logs)
Traditional Approach: Each service implements this logic (messy, duplicated, error-prone)
Service Mesh Approach: Move networking logic OUT of the app, INTO the infrastructure
The Solution: Service Mesh
A service mesh is a dedicated infrastructure layer for handling service-to-service communication.
- Sidecar Proxy: Each service gets a proxy (Envoy) deployed alongside it
- Control Plane: Manages and configures the proxies (Istio does this)
- Zero Code Changes: Applications don't know about the mesh
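With Istio on Kubernetes, the sidecar is usually added by automatic injection rather than by editing each Deployment. A minimal sketch, assuming Istio's default injection webhook is installed and a namespace called `production` (the name used in the security examples in this document):

```yaml
# Label the namespace so Istio's admission webhook injects an Envoy sidecar
# into every pod created in it. "production" is an assumed namespace name.
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    istio-injection: enabled
```

Pods that already exist are not modified; they pick up the sidecar the next time they are recreated, for example after `kubectl rollout restart`.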
Architecture: Without vs With Service Mesh
graph TB
subgraph WITHOUT["WITHOUT Service Mesh"]
A1["Service A
(handles retries,
circuit breaking,
TLS, metrics)"]
B1["Service B
(handles retries,
circuit breaking,
TLS, metrics)"]
C1["Service C
(handles retries,
circuit breaking,
TLS, metrics)"]
A1 -->|"Complex networking
code in app"| B1
B1 -->|"Complex networking
code in app"| C1
end
subgraph WITH["WITH Service Mesh (Istio + Envoy)"]
subgraph POD_A["Pod A"]
APP_A["Service A
(pure business logic)"]
ENVOY_A["Envoy Proxy
(sidecar)"]
end
subgraph POD_B["Pod B"]
APP_B["Service B
(pure business logic)"]
ENVOY_B["Envoy Proxy
(sidecar)"]
end
subgraph POD_C["Pod C"]
APP_C["Service C
(pure business logic)"]
ENVOY_C["Envoy Proxy
(sidecar)"]
end
CONTROL["Istio Control Plane
(configures all proxies)"]
APP_A -->|"localhost"| ENVOY_A
ENVOY_A -->|"mTLS, retries,
load balancing"| ENVOY_B
APP_B -->|"localhost"| ENVOY_B
ENVOY_B -->|"mTLS, retries,
load balancing"| ENVOY_C
APP_C -->|"localhost"| ENVOY_C
CONTROL -.->|"Config"| ENVOY_A
CONTROL -.->|"Config"| ENVOY_B
CONTROL -.->|"Config"| ENVOY_C
end
style WITHOUT fill:#ffcdd2,stroke:#c62828,stroke-width:2px,color:#2e3440
style WITH fill:#c8e6c9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style POD_A fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style POD_B fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style POD_C fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style ENVOY_A fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#2e3440
style ENVOY_B fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#2e3440
style ENVOY_C fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#2e3440
style CONTROL fill:#bbdefb,stroke:#1976d2,stroke-width:2px,color:#2e3440
Envoy Proxy - The Data Plane
What is Envoy?
Envoy is a high-performance C++ proxy originally built by Lyft. It's the "worker" in the service mesh - handling actual network traffic.
Key Features
- Layer 7 Proxy: Understands HTTP/1.1, HTTP/2, gRPC
- Load Balancing: Round robin, least request, random, ring hash
- Service Discovery: Dynamic endpoint discovery
- Health Checking: Active/passive health checks
- Retries & Timeouts: Automatic retry with backoff
- Circuit Breaking: Prevent cascading failures
- Observability: Rich metrics, distributed tracing
- Performance: Can handle 100k+ requests/sec per instance on suitable hardware; actual throughput depends on the filters in use, payload size, and TLS
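In an Istio-managed mesh, these Envoy features are normally switched on through Istio resources rather than raw Envoy configuration. As an illustration, the sketch below selects the least-request load-balancing algorithm for one destination. The host `reviews` and the resource name are assumptions chosen to match the other examples in this document.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-load-balancing   # hypothetical name
spec:
  host: reviews                  # assumed Kubernetes service name
  trafficPolicy:
    loadBalancer:
      simple: LEAST_REQUEST      # other values include ROUND_ROBIN and RANDOM
```

Hash-based balancing (ring hash) is configured through `loadBalancer.consistentHash` instead of `simple`.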
How Envoy Works (Sidecar Pattern)
graph LR
CLIENT["Client App
(Service A)"]
ENVOY_OUT["Envoy Sidecar
(Outbound)"]
NETWORK["Network"]
ENVOY_IN["Envoy Sidecar
(Inbound)"]
SERVER["Server App
(Service B)"]
CLIENT -->|"1. localhost:8080"| ENVOY_OUT
ENVOY_OUT -->|"2. Intercepts
Applies routing,
retries, mTLS"| NETWORK
NETWORK -->|"3. mTLS connection"| ENVOY_IN
ENVOY_IN -->|"4. Validates mTLS
Checks policies
Forwards to localhost"| SERVER
style CLIENT fill:#bbdefb,stroke:#1976d2,stroke-width:2px,color:#2e3440
style ENVOY_OUT fill:#fff9c4,stroke:#f57f17,stroke-width:3px,color:#2e3440
style ENVOY_IN fill:#fff9c4,stroke:#f57f17,stroke-width:3px,color:#2e3440
style SERVER fill:#c8e6c9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style NETWORK fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px,color:#2e3440
Traffic Flow:
- App calls http://service-b:8080
- Envoy intercepts (iptables rules redirect traffic)
- Envoy applies routing rules, retries, load balancing, mTLS
- Envoy sends to destination Envoy sidecar
- Destination Envoy validates mTLS, checks policies
- Destination Envoy forwards to app on localhost
Istio - The Control Plane
What is Istio?
Istio is the "brain" of the service mesh. It configures and manages all the Envoy proxies.
Components
- istiod: Control plane daemon (configuration, certificates, service discovery)
- Ingress Gateway: Entry point for external traffic
- Egress Gateway: Exit point for outbound traffic
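To make the ingress gateway concrete, here is a hedged sketch of how external HTTPS traffic might be admitted and routed into the mesh. The hostname `shop.example.com`, the secret `shop-tls`, the resource names, and the `productpage` port are all illustrative assumptions; the `istio: ingressgateway` selector matches the label on Istio's default ingress gateway deployment.

```yaml
# Gateway: tells the ingress gateway's Envoy which ports and hosts to accept.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: shop-gateway              # hypothetical name
spec:
  selector:
    istio: ingressgateway         # label on the default ingress gateway pods
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE                # terminate TLS at the gateway
      credentialName: shop-tls    # hypothetical Kubernetes TLS secret
    hosts:
    - "shop.example.com"          # hypothetical external hostname
---
# VirtualService: binds to the Gateway and routes accepted traffic inward.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: shop-ingress              # hypothetical name
spec:
  hosts:
  - "shop.example.com"
  gateways:
  - shop-gateway
  http:
  - route:
    - destination:
        host: productpage         # assumed in-mesh service
        port:
          number: 9080            # assumed service port
```

The Gateway only opens the port; without a VirtualService bound to it, accepted requests have no route and are rejected.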
Istio Architecture
graph TB
KUBECTL["kubectl apply
(VirtualService, DestinationRule)"]
subgraph CONTROL["Istio Control Plane"]
ISTIOD["istiod
- Service discovery
- Configuration
- Certificate Authority"]
end
KUBECTL -->|"Config"| ISTIOD
subgraph K8S["Kubernetes Cluster"]
subgraph POD1["Pod 1"]
APP1["Service A"]
PROXY1["Envoy Proxy"]
end
subgraph POD2["Pod 2"]
APP2["Service B"]
PROXY2["Envoy Proxy"]
end
subgraph POD3["Pod 3"]
APP3["Service C"]
PROXY3["Envoy Proxy"]
end
INGRESS["Istio Ingress Gateway
(Envoy-based)"]
end
EXTERNAL["External Traffic"]
EXTERNAL -->|"HTTPS"| INGRESS
INGRESS --> PROXY1
ISTIOD -.->|"xDS APIs
(config push)"| PROXY1
ISTIOD -.->|"xDS APIs"| PROXY2
ISTIOD -.->|"xDS APIs"| PROXY3
ISTIOD -.->|"xDS APIs"| INGRESS
APP1 -->|"localhost"| PROXY1
PROXY1 -->|"mTLS"| PROXY2
APP2 -->|"localhost"| PROXY2
PROXY2 -->|"mTLS"| PROXY3
APP3 -->|"localhost"| PROXY3
style CONTROL fill:#bbdefb,stroke:#1976d2,stroke-width:3px,color:#2e3440
style K8S fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px,color:#2e3440
style POD1 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style POD2 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style POD3 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style PROXY1 fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#2e3440
style PROXY2 fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#2e3440
style PROXY3 fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#2e3440
style ISTIOD fill:#90caf9,stroke:#0277bd,stroke-width:2px,color:#2e3440
style INGRESS fill:#ffccbc,stroke:#e64a19,stroke-width:2px,color:#2e3440
Key Service Mesh Capabilities
1. Traffic Management
Canary Deployments (Gradual Rollout)
# Beta testers always get v2; everyone else is split 90% v1 / 10% v2 (canary)
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-route
spec:
  hosts:
  - reviews
  http:
  # Rules are evaluated in order; the first match wins
  - match:
    - headers:
        user-type:
          exact: "beta-tester"
    route:
    - destination:
        host: reviews
        subset: v2
      weight: 100
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90
    - destination:
        host: reviews
        subset: v2
      weight: 10
---
# Define subsets (versions)
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-destination
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
Circuit Breaking
# Prevent cascading failures
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-circuit-breaker
spec:
  host: reviews
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        http2MaxRequests: 100
        maxRequestsPerConnection: 2
    outlierDetection:
      consecutive5xxErrors: 5   # replaces the deprecated consecutiveErrors field
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
Retries & Timeouts
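apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-retries
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
    timeout: 10s           # overall deadline for the request, retries included
    retries:
      attempts: 3          # up to 3 retries after the initial try
      perTryTimeout: 2s    # deadline for each individual try
      retryOn: 5xx,reset,connect-failure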
2. Security (mTLS)
Mutual TLS: Automatic encryption + authentication between services
# Enable strict mTLS for entire namespace
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT
---
# Authorization: only the productpage service account may send GET requests to reviews
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: reviews-viewer
  namespace: production
spec:
  selector:
    matchLabels:
      app: reviews
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/production/sa/productpage"]
    to:
    - operation:
        methods: ["GET"]
graph LR
A_APP["Service A"]
A_PROXY["Envoy A"]
B_PROXY["Envoy B"]
B_APP["Service B"]
A_APP -->|"HTTP"| A_PROXY
A_PROXY -->|"mTLS encrypted
+ Service A cert"| B_PROXY
B_PROXY -->|"Validates cert
Decrypts
Forwards HTTP"| B_APP
style A_PROXY fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#2e3440
style B_PROXY fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#2e3440
style A_APP fill:#bbdefb,stroke:#1976d2,stroke-width:2px,color:#2e3440
style B_APP fill:#c8e6c9,stroke:#388e3c,stroke-width:2px,color:#2e3440
Benefits:
- Zero code changes - apps use plain HTTP
- Automatic certificate rotation (Istio CA)
- Identity-based security (not IP-based)
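Identity-based rules are often paired with a default-deny baseline, so that every permitted call is an explicit ALLOW policy like the one above. A minimal sketch, assuming the same `production` namespace; the policy name is a hypothetical choice:

```yaml
# An AuthorizationPolicy with an empty spec contains no rules, so it matches
# no requests and therefore allows nothing in the namespace.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-nothing      # hypothetical name
  namespace: production
spec: {}
```

With this in place, a workload in the namespace receives traffic only from callers that a separate ALLOW policy names.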
3. Observability
What you get automatically:
- Metrics: Request rate, latency, error rate (RED metrics)
- Distributed Tracing: See request path across services (Jaeger, Zipkin)
- Access Logs: Detailed logs of all requests
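Whether access logs are actually emitted depends on how Istio was installed, so it can be worth enabling them explicitly. The sketch below uses Istio's Telemetry API and assumes the root namespace is the default `istio-system`; the resource name is a hypothetical choice.

```yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default        # hypothetical name
  namespace: istio-system   # assumed root namespace, so this applies mesh-wide
spec:
  accessLogging:
  - providers:
    - name: envoy           # built-in provider that writes to the proxy's stdout
```

The log lines then appear in each sidecar's container output, for example via `kubectl logs <pod> -c istio-proxy`.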
Example: Prometheus Metrics
# Istio automatically exports metrics:
# Request count (counter)
istio_requests_total{
  source_app="productpage",
  destination_app="reviews",
  response_code="200"
}
# Request duration (histogram, exposed as _bucket, _sum and _count series)
istio_request_duration_milliseconds_bucket{
  source_app="productpage",
  destination_app="reviews"
}
# Query example: requests per second to reviews over the last 5 minutes
rate(istio_requests_total{destination_app="reviews"}[5m])
Distributed Tracing
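Envoy generates spans and trace headers (B3, W3C Trace Context), but each application must copy the incoming trace headers onto its outbound requests so that the spans join into a single trace: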
# Trace shows entire request flow:
1. Frontend → Gateway (2ms)
2. Gateway → Product Page (5ms)
3. Product Page → Reviews (10ms)
4. Reviews → Ratings (8ms)
Total: 25ms
# Identify bottlenecks visually!
Istio vs Linkerd vs Consul
| Feature | Istio | Linkerd | Consul Connect |
|---|---|---|---|
| Proxy | Envoy (C++) | Linkerd2-proxy (Rust) | Envoy |
| Complexity | High (many features) | Low (minimalist) | Medium |
| Performance | Good (~0.5ms latency) | Excellent (~0.2ms latency) | Good |
| Features | Most comprehensive | Core features only | Service discovery + mesh |
| mTLS | Yes | Yes (automatic) | Yes |
| Traffic Management | Extensive (canary, A/B, etc) | Basic | Good |
| Adoption | Most popular | Growing (CNCF graduated) | HashiCorp ecosystem |
| Best For | Large orgs, complex requirements | Simplicity, performance-critical | Multi-cloud, VM + K8s |
When to Use a Service Mesh
✅ Use Service Mesh When:
- Many microservices: 10+ services, complex interactions
- Security requirements: Need mTLS, zero-trust networking
- Advanced traffic control: Canary deployments, A/B testing, traffic splitting
- Multi-language: Services in Java, Python, Go - mesh is language-agnostic
- Observability: Need detailed metrics and distributed tracing with little or no code change (tracing still requires apps to forward trace headers)
- Compliance: Regulations require encryption in transit
❌ Don't Use Service Mesh When:
- Simple architecture: 2-3 services, monolith, no complexity
- Resource constraints: Sidecars add ~50-100MB RAM overhead per pod
- Early stage: MVP/prototype - adds operational complexity
- Performance critical: Every request adds ~0.5ms latency
- No K8s: Service meshes work best with Kubernetes
Common Interview Questions
Q: How does Envoy intercept traffic?
A: Istio uses iptables rules to redirect all inbound/outbound traffic to the Envoy sidecar. The app thinks it's connecting directly, but traffic flows through Envoy transparently.
Q: What's the performance impact?
A: Typically adds ~0.5-1ms latency per hop and ~50-100MB RAM per sidecar. For most apps this is acceptable given the benefits.
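The sidecar's resource footprint can be tuned per workload with pod annotations. The sketch below is illustrative only: the workload name, image, and the specific CPU and memory values are assumptions, not recommendations.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reviews-v1                       # hypothetical workload
spec:
  selector:
    matchLabels:
      app: reviews
  template:
    metadata:
      labels:
        app: reviews
        version: v1
      annotations:
        sidecar.istio.io/proxyCPU: "100m"          # sidecar CPU request
        sidecar.istio.io/proxyMemory: "64Mi"       # sidecar memory request
        sidecar.istio.io/proxyMemoryLimit: "128Mi" # sidecar memory limit
    spec:
      containers:
      - name: reviews
        image: example.com/reviews:v1    # hypothetical image
```

Setting a limit that is too low for the mesh size can get the proxy OOM-killed, so values like these are best derived from observed usage.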
Q: How does mTLS work without code changes?
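A: The app sends plain HTTP to the destination service as usual → the local Envoy intercepts the connection, wraps it in mTLS, and sends it to the remote Envoy → the remote Envoy verifies the certificate, decrypts, and forwards plain HTTP to its app over localhost. Apps never see the encryption.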
Q: Can you use Envoy without Istio?
A: Yes! Envoy is standalone. Istio is one way to configure Envoy at scale. You can configure Envoy directly for simpler setups.
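As a rough illustration of standalone use, the static Envoy configuration below proxies HTTP from one listener to a single upstream. The listener port `10000` and the cluster name are assumptions; the upstream `service-b:8080` is borrowed from the traffic-flow example earlier in this document.

```yaml
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }  # assumed port
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: backend
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: service_b }   # send everything upstream
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: service_b                 # hypothetical cluster name
    connect_timeout: 1s
    type: STRICT_DNS                # resolve the upstream hostname via DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: service_b
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: service-b, port_value: 8080 }
```

Saved as `envoy.yaml`, this can be started with `envoy -c envoy.yaml`. Everything here is static; Istio's contribution is generating and pushing the equivalent configuration dynamically over the xDS APIs.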
Q: Istio vs API Gateway?
A:
- API Gateway: External traffic → internal services (north-south)
- Service Mesh: Service-to-service traffic (east-west)
- Often use both: Gateway for ingress, Istio for internal mesh
Q: How does circuit breaking prevent cascading failures?
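A: If Service B is slow or failing, Envoy's outlier detection notices the pattern (for example, 5 consecutive 5xx errors) and temporarily ejects the failing instances from the load-balancing pool, while connection-pool limits make excess requests fail fast instead of queueing. This keeps Service A from tying up threads and connections while it waits on B.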
Quick Reference
| Concept | Description | Example |
|---|---|---|
| Envoy | High-performance proxy (data plane) | Sidecar that intercepts all traffic |
| Istio | Control plane for service mesh | Configures all Envoy proxies |
| VirtualService | Traffic routing rules | 90% to v1, 10% to v2 |
| DestinationRule | Policies after routing decision | Circuit breaker, load balancing |
| mTLS | Mutual TLS authentication | Automatic encryption between services |
| Sidecar | Container deployed alongside app | Envoy proxy in same pod |
| Canary Deployment | Gradual rollout to subset of users | 10% traffic to new version |