Service Mesh: Istio & Envoy

Modern microservices networking: traffic management, security, observability

What is a Service Mesh?

The Problem

In microservices architectures, services need to:

Find and communicate with each other (service discovery)
Handle failures gracefully (retries, timeouts, circuit breakers)
Secure communications (mTLS, authentication)
Route traffic intelligently (canary deployments, A/B testing)
Observe what's happening (tracing, metrics, logs)

Traditional Approach: Each service implements this logic (messy, duplicated, error-prone)

Service Mesh Approach: Move networking logic OUT of the app, INTO the infrastructure

The Solution: Service Mesh

A service mesh is a dedicated infrastructure layer for handling service-to-service communication.

Sidecar Proxy: Each service gets a proxy (Envoy) deployed alongside it
Control Plane: Manages and configures the proxies (Istio does this)
Near-Zero Code Changes: Applications don't know about the mesh (the main exception is forwarding trace headers for distributed tracing)

Architecture: Without vs With Service Mesh

graph TB
    subgraph WITHOUT["WITHOUT Service Mesh"]
        A1["Service A<br/>(handles retries,<br/>circuit breaking,<br/>TLS, metrics)"]
        B1["Service B<br/>(handles retries,<br/>circuit breaking,<br/>TLS, metrics)"]
        C1["Service C<br/>(handles retries,<br/>circuit breaking,<br/>TLS, metrics)"]
        A1 -->|"Complex networking<br/>code in app"| B1
        B1 -->|"Complex networking<br/>code in app"| C1
    end
    subgraph WITH["WITH Service Mesh (Istio + Envoy)"]
        subgraph POD_A["Pod A"]
            APP_A["Service A<br/>(pure business logic)"]
            ENVOY_A["Envoy Proxy<br/>(sidecar)"]
        end
        subgraph POD_B["Pod B"]
            APP_B["Service B<br/>(pure business logic)"]
            ENVOY_B["Envoy Proxy<br/>(sidecar)"]
        end
        subgraph POD_C["Pod C"]
            APP_C["Service C<br/>(pure business logic)"]
            ENVOY_C["Envoy Proxy<br/>(sidecar)"]
        end
        CONTROL["Istio Control Plane<br/>(configures all proxies)"]
        APP_A -->|"localhost"| ENVOY_A
        ENVOY_A -->|"mTLS, retries,<br/>load balancing"| ENVOY_B
        APP_B -->|"localhost"| ENVOY_B
        ENVOY_B -->|"mTLS, retries,<br/>load balancing"| ENVOY_C
        APP_C -->|"localhost"| ENVOY_C
        CONTROL -.->|"Config"| ENVOY_A
        CONTROL -.->|"Config"| ENVOY_B
        CONTROL -.->|"Config"| ENVOY_C
    end
    style WITHOUT fill:#ffcdd2,stroke:#c62828,stroke-width:2px,color:#2e3440
    style WITH fill:#c8e6c9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style POD_A fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style POD_B fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style POD_C fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style ENVOY_A fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#2e3440
    style ENVOY_B fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#2e3440
    style ENVOY_C fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#2e3440
    style CONTROL fill:#bbdefb,stroke:#1976d2,stroke-width:2px,color:#2e3440

Envoy Proxy - The Data Plane

What is Envoy?

Envoy is a high-performance C++ proxy originally built by Lyft. It's the "worker" in the service mesh - handling actual network traffic.

Key Features

Layer 7 Proxy: Understands HTTP/1.1, HTTP/2, gRPC
Load Balancing: Round robin, least request, random, ring hash
Service Discovery: Dynamic endpoint discovery
Health Checking: Active/passive health checks
Retries & Timeouts: Automatic retry with backoff
Circuit Breaking: Prevent cascading failures
Observability: Rich metrics, distributed tracing
Performance: Handles 100k+ requests/sec per instance
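
To make two of the load-balancing policies listed above concrete, here is a minimal Python sketch. It is an illustration only, not Envoy's implementation: the function names and the endpoint addresses are made up for the example, and the least-request variant assumes the "pick two at random, keep the less busy one" approach that Envoy documents for equally weighted hosts.

```python
import itertools
import random


def round_robin(endpoints):
    """Return an iterator that walks the endpoints in a fixed rotating order."""
    return itertools.cycle(endpoints)


def least_request_p2c(active_requests, rng):
    """Pick two distinct endpoints at random and return the less loaded one.

    active_requests maps endpoint -> number of in-flight requests (hypothetical data).
    """
    first, second = rng.sample(sorted(active_requests), 2)
    return first if active_requests[first] <= active_requests[second] else second


endpoints = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
rr = round_robin(endpoints)
print([next(rr) for _ in range(4)])  # wraps around to the first endpoint

load = {"10.0.0.1": 12, "10.0.0.2": 0, "10.0.0.3": 7}
print(least_request_p2c(load, random.Random(0)))  # never the busiest endpoint
```

Round robin ignores load entirely, while the two-choice variant steers traffic away from busy endpoints without scanning every host on each request.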

How Envoy Works (Sidecar Pattern)

graph LR
    CLIENT["Client App<br/>(Service A)"]
    ENVOY_OUT["Envoy Sidecar<br/>(Outbound)"]
    NETWORK["Network"]
    ENVOY_IN["Envoy Sidecar<br/>(Inbound)"]
    SERVER["Server App<br/>(Service B)"]
    CLIENT -->|"1. localhost:8080"| ENVOY_OUT
    ENVOY_OUT -->|"2. Intercepts<br/>Applies routing,<br/>retries, mTLS"| NETWORK
    NETWORK -->|"3. mTLS connection"| ENVOY_IN
    ENVOY_IN -->|"4. Validates mTLS<br/>Checks policies<br/>Forwards to localhost"| SERVER
    style CLIENT fill:#bbdefb,stroke:#1976d2,stroke-width:2px,color:#2e3440
    style ENVOY_OUT fill:#fff9c4,stroke:#f57f17,stroke-width:3px,color:#2e3440
    style ENVOY_IN fill:#fff9c4,stroke:#f57f17,stroke-width:3px,color:#2e3440
    style SERVER fill:#c8e6c9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style NETWORK fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px,color:#2e3440

Traffic Flow:

1. App calls http://service-b:8080
2. Envoy intercepts (iptables rules redirect traffic)
3. Envoy applies routing rules, retries, load balancing, mTLS
4. Envoy sends to destination Envoy sidecar
5. Destination Envoy validates mTLS, checks policies
6. Destination Envoy forwards to app on localhost
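
The six steps above can be modelled as a toy pipeline in Python. This is only a sketch of the flow, not of how Envoy works internally: the Request class, the two sidecar functions, the endpoint table, and the identity strings are all hypothetical, and the "mTLS" step is reduced to carrying an identity string.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Request:
    host: str                                   # e.g. "service-b:8080"
    path: str
    peer_identity: Optional[str] = None         # stands in for the mTLS client cert
    trail: list = field(default_factory=list)   # hops visited, for inspection


def outbound_sidecar(req, endpoints, identity):
    """Steps 2-4: intercept the call, pick an endpoint, attach the caller identity."""
    req.trail.append("outbound-envoy")
    target = endpoints[req.host][0]   # a real proxy would load-balance and retry here
    req.peer_identity = identity
    return target


def inbound_sidecar(req, allowed_identities, app):
    """Steps 5-6: validate the peer identity, then forward to the local app."""
    req.trail.append("inbound-envoy")
    if req.peer_identity not in allowed_identities:
        return 403, "access denied"
    req.trail.append("app")
    return app(req)


def service_b(req):
    """The destination app: plain business logic, no networking concerns."""
    return 200, f"hello from {req.path}"


# Step 1: the app addresses the service by name, unaware of any proxy.
req = Request(host="service-b:8080", path="/reviews")
target = outbound_sidecar(req, {"service-b:8080": ["10.0.0.7:8080"]}, "sa/service-a")
print(target, inbound_sidecar(req, {"sa/service-a"}, service_b), req.trail)
```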

Istio - The Control Plane

What is Istio?

Istio is the "brain" of the service mesh. It configures and manages all the Envoy proxies.

Components

istiod: Control plane daemon (configuration, certificates, service discovery)
Ingress Gateway: Entry point for external traffic
Egress Gateway: Exit point for outbound traffic

Istio Architecture

graph TB
    KUBECTL["kubectl apply<br/>(VirtualService, DestinationRule)"]
    subgraph CONTROL["Istio Control Plane"]
        ISTIOD["istiod<br/>- Service discovery<br/>- Configuration<br/>- Certificate Authority"]
    end
    KUBECTL -->|"Config"| ISTIOD
    subgraph K8S["Kubernetes Cluster"]
        subgraph POD1["Pod 1"]
            APP1["Service A"]
            PROXY1["Envoy Proxy"]
        end
        subgraph POD2["Pod 2"]
            APP2["Service B"]
            PROXY2["Envoy Proxy"]
        end
        subgraph POD3["Pod 3"]
            APP3["Service C"]
            PROXY3["Envoy Proxy"]
        end
        INGRESS["Istio Ingress Gateway<br/>(Envoy-based)"]
    end
    EXTERNAL["External Traffic"]
    EXTERNAL -->|"HTTPS"| INGRESS
    INGRESS --> PROXY1
    ISTIOD -.->|"xDS APIs<br/>(config push)"| PROXY1
    ISTIOD -.->|"xDS APIs"| PROXY2
    ISTIOD -.->|"xDS APIs"| PROXY3
    ISTIOD -.->|"xDS APIs"| INGRESS
    APP1 -->|"localhost"| PROXY1
    PROXY1 -->|"mTLS"| PROXY2
    APP2 -->|"localhost"| PROXY2
    PROXY2 -->|"mTLS"| PROXY3
    APP3 -->|"localhost"| PROXY3
    style CONTROL fill:#bbdefb,stroke:#1976d2,stroke-width:3px,color:#2e3440
    style K8S fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px,color:#2e3440
    style POD1 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style POD2 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style POD3 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style PROXY1 fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#2e3440
    style PROXY2 fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#2e3440
    style PROXY3 fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#2e3440
    style ISTIOD fill:#90caf9,stroke:#0277bd,stroke-width:2px,color:#2e3440
    style INGRESS fill:#ffccbc,stroke:#e64a19,stroke-width:2px,color:#2e3440

Key Service Mesh Capabilities

1. Traffic Management

Canary Deployments (Gradual Rollout)

# Beta testers (matched by header) always get v2;
# everyone else is split 90% to v1, 10% to v2 (canary)
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-route
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        user-type:
          exact: "beta-tester"
    route:
    - destination:
        host: reviews
        subset: v2
      weight: 100
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90
    - destination:
        host: reviews
        subset: v2
      weight: 10

---
# Define subsets (versions)
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-destination
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
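
The routing decision these two resources describe can be sketched in a few lines of Python. This is a simplified model rather than Envoy's actual routing code: pick_subset is a hypothetical helper, and it only mirrors the two rules from the VirtualService above (header match first, then the 90/10 weighted split).

```python
import random


def pick_subset(headers, rng):
    """Return the subset a request would be routed to under the rules above."""
    # Rule 1: the header match is evaluated first and wins outright.
    if headers.get("user-type") == "beta-tester":
        return "v2"
    # Rule 2: everyone else gets a weighted split, 90% v1 and 10% v2.
    return rng.choices(["v1", "v2"], weights=[90, 10], k=1)[0]


rng = random.Random(42)
counts = {"v1": 0, "v2": 0}
for _ in range(10_000):
    counts[pick_subset({}, rng)] += 1
print(counts)  # roughly 9000 / 1000; exact numbers depend on the seed

print(pick_subset({"user-type": "beta-tester"}, rng))  # v2
```

Because rules are checked in order, moving the weighted route above the header match would make the beta-tester rule unreachable.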

Circuit Breaking

# Prevent cascading failures
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-circuit-breaker
spec:
  host: reviews
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        http2MaxRequests: 100
        maxRequestsPerConnection: 2
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
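
The outlierDetection part of this rule can be pictured with a small Python model. It is a sketch under simplifying assumptions, not Envoy's algorithm: the OutlierDetector class is hypothetical, it ejects a host as soon as the error run is reached instead of on a periodic sweep, and it does not lengthen the ejection time for repeat offenders.

```python
class OutlierDetector:
    """Toy model of per-host ejection after consecutive errors."""

    def __init__(self, hosts, consecutive_errors=5, base_ejection_time=30.0,
                 max_ejection_percent=50):
        self.hosts = list(hosts)
        self.consecutive_errors = consecutive_errors
        self.base_ejection_time = base_ejection_time
        # Upper bound on how many hosts may be ejected at the same time.
        self.max_ejected = len(self.hosts) * max_ejection_percent // 100
        self.error_streak = {h: 0 for h in self.hosts}
        self.ejected_until = {}  # host -> time at which it may receive traffic again

    def healthy(self, now):
        """Hosts currently eligible for load balancing."""
        return [h for h in self.hosts if self.ejected_until.get(h, 0.0) <= now]

    def record(self, host, ok, now):
        """Record one response; eject the host after too many errors in a row."""
        if ok:
            self.error_streak[host] = 0
            return
        self.error_streak[host] += 1
        currently_ejected = len(self.hosts) - len(self.healthy(now))
        if (self.error_streak[host] >= self.consecutive_errors
                and currently_ejected < self.max_ejected):
            self.ejected_until[host] = now + self.base_ejection_time
            self.error_streak[host] = 0


detector = OutlierDetector(["a", "b", "c", "d"])
for _ in range(5):
    detector.record("a", ok=False, now=0.0)
print(detector.healthy(now=1.0))   # ['b', 'c', 'd']
print(detector.healthy(now=31.0))  # ['a', 'b', 'c', 'd']
```

The ejection cap is what keeps a widespread failure from emptying the pool: with four hosts and a 50% limit, at most two can be ejected at once.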

Retries & Timeouts

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-retries
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
    timeout: 10s
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: 5xx,reset,connect-failure
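
How the three settings interact is easier to see in code. The Python sketch below is a simplified model, not Envoy's retry logic: call_with_retries and is_retriable are hypothetical names, back-off between tries is ignored, and it assumes that attempts counts retries on top of the initial try and that a per-try timeout is treated like a retriable 5xx.

```python
def is_retriable(outcome):
    """Mirror retryOn: 5xx,reset,connect-failure for this toy model."""
    if isinstance(outcome, int):
        return 500 <= outcome <= 599
    return outcome in {"reset", "connect-failure"}


def call_with_retries(send, retries=3, per_try_timeout=2.0, timeout=10.0):
    """Run send() once plus up to `retries` retries within the overall timeout.

    send(limit) must return (outcome, elapsed_seconds); outcome is an HTTP
    status code or one of the strings "reset" / "connect-failure".
    Returns (final_outcome, tries_made, total_elapsed_seconds).
    """
    elapsed = 0.0
    outcome = None
    tries = 0
    while tries <= retries and elapsed < timeout:
        # Each try gets perTryTimeout, capped by what is left of the overall budget.
        limit = min(per_try_timeout, timeout - elapsed)
        outcome, took = send(limit)
        tries += 1
        if took >= limit:  # the try ran out of time
            outcome, took = 504, limit
        elapsed += took
        if not is_retriable(outcome):
            break
    return outcome, tries, elapsed


results = iter([(503, 0.25), ("reset", 0.5), (200, 0.25)])
print(call_with_retries(lambda limit: next(results)))  # (200, 3, 1.0)
```

With these values the worst case is four tries of 2s each, which fits inside the 10s overall timeout; a larger retry count would be cut off by the overall budget instead.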

2. Security (mTLS)

Mutual TLS: Automatic encryption + authentication between services

# Enable strict mTLS for entire namespace
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT

---
# Authorization: only the productpage service account may call reviews, and only with GET
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: reviews-viewer
  namespace: production
spec:
  selector:
    matchLabels:
      app: reviews
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/production/sa/productpage"]
    to:
    - operation:
        methods: ["GET"]
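
The ALLOW semantics of this policy can be sketched as a small Python check. This is an illustration only: is_allowed and the RULES structure are hypothetical, and the model ignores DENY and CUSTOM actions, wildcards, and every match field other than principals and methods.

```python
RULES = [
    {
        "principals": ["cluster.local/ns/production/sa/productpage"],
        "methods": ["GET"],
    },
]


def is_allowed(rules, principal, method):
    """ALLOW policy: permit the request only if at least one rule matches."""
    for rule in rules:
        if principal in rule["principals"] and method in rule["methods"]:
            return True
    # Once an ALLOW policy selects a workload, anything unmatched is rejected.
    return False


productpage = "cluster.local/ns/production/sa/productpage"
print(is_allowed(RULES, productpage, "GET"))   # True
print(is_allowed(RULES, productpage, "POST"))  # False
print(is_allowed(RULES, "cluster.local/ns/production/sa/ratings", "GET"))  # False
```

The principal is the caller's identity taken from its mTLS certificate, which is why this kind of rule depends on mTLS being in place.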

graph LR
    A_APP["Service A"]
    A_PROXY["Envoy A"]
    B_PROXY["Envoy B"]
    B_APP["Service B"]
    A_APP -->|"HTTP"| A_PROXY
    A_PROXY -->|"mTLS encrypted<br/>+ Service A cert"| B_PROXY
    B_PROXY -->|"Validates cert<br/>Decrypts<br/>Forwards HTTP"| B_APP
    style A_PROXY fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#2e3440
    style B_PROXY fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#2e3440
    style A_APP fill:#bbdefb,stroke:#1976d2,stroke-width:2px,color:#2e3440
    style B_APP fill:#c8e6c9,stroke:#388e3c,stroke-width:2px,color:#2e3440

Benefits:

Zero code changes - apps use plain HTTP
Automatic certificate rotation (Istio CA)
Identity-based security (not IP-based)

3. Observability

What you get automatically:

Metrics: Request rate, latency, error rate (RED metrics)
Distributed Tracing: See request path across services (Jaeger, Zipkin)
Access Logs: Detailed logs of all requests

Example: Prometheus Metrics

# Istio automatically exports metrics:

# Request rate
istio_requests_total{
  source_app="productpage",
  destination_app="reviews",
  response_code="200"
}

# Request duration (a histogram, exported as _bucket, _sum and _count series)
istio_request_duration_milliseconds_bucket{
  source_app="productpage",
  destination_app="reviews"
}

# Query example
rate(istio_requests_total{destination_app="reviews"}[5m])
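
What rate() computes over a counter can be approximated in a few lines of Python. This is a deliberately simplified sketch: simple_rate and error_ratio are hypothetical helpers, the sample values are made up, and unlike PromQL's rate() the code does not handle counter resets or extrapolate to the window edges.

```python
def simple_rate(samples):
    """Average per-second increase across (timestamp_s, counter_value) samples."""
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    if t_last == t_first:
        return 0.0
    return (v_last - v_first) / (t_last - t_first)


def error_ratio(total_samples, error_samples):
    """Share of requests that failed over the same window."""
    total = simple_rate(total_samples)
    return simple_rate(error_samples) / total if total else 0.0


# Hypothetical scrapes of istio_requests_total over a 5-minute window.
window_total = [(0, 1000), (150, 1320), (300, 1600)]  # all responses
window_5xx = [(0, 10), (150, 19), (300, 40)]          # responses with a 5xx code
print(simple_rate(window_total))              # 2.0 requests per second
print(error_ratio(window_total, window_5xx))  # 0.05
```

Dividing the two rates gives the error ratio, the "E" in the RED metrics mentioned above.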

Distributed Tracing

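Envoy generates trace spans and trace headers (B3, W3C Trace Context), but each application must copy those headers from the incoming request onto its outgoing requests so that the spans join into a single trace: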

# Trace shows entire request flow:
1. Frontend → Gateway (2ms)
2. Gateway → Product Page (5ms)
3. Product Page → Reviews (10ms)
4. Reviews → Ratings (8ms)
Total: 25ms

# Identify bottlenecks visually!
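
Sidecars can only stitch spans into one trace when the application copies the trace headers from each inbound request onto its outbound calls. A minimal Python sketch of that step follows; propagate is a hypothetical helper, the header values are sample data, and the header list reflects the B3 and W3C names that Istio's tracing documentation asks applications to forward (the exact set depends on the tracing backend in use).

```python
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "traceparent",
    "tracestate",
)


def propagate(incoming_headers):
    """Return only the trace headers, ready to attach to an outbound request."""
    lowered = {name.lower(): value for name, value in incoming_headers.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}


inbound = {
    "X-Request-Id": "req-123",
    "X-B3-TraceId": "463ac35c9f6413ad48485a3953bb6124",
    "X-B3-SpanId": "a2fb4a1d1a96d312",
    "Authorization": "Bearer example",  # not a trace header, so not forwarded
}
print(propagate(inbound))
```

In a real service this usually lives in HTTP client middleware or comes from a tracing library, so individual handlers do not have to remember it.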

Istio vs Linkerd vs Consul

Feature	Istio	Linkerd	Consul Connect
Proxy	Envoy (C++)	Linkerd2-proxy (Rust)	Envoy
Complexity	High (many features)	Low (minimalist)	Medium
Performance	Good (~0.5ms latency)	Excellent (~0.2ms latency)	Good
Features	Most comprehensive	Core features only	Service discovery + mesh
mTLS	Yes	Yes (automatic)	Yes
Traffic Management	Extensive (canary, A/B, etc)	Basic	Good
Adoption	Most popular (CNCF graduated)	Growing (CNCF graduated)	HashiCorp ecosystem
Best For	Large orgs, complex requirements	Simplicity, performance-critical	Multi-cloud, VM + K8s

When to Use a Service Mesh

✅ Use Service Mesh When:

Many microservices: 10+ services, complex interactions
Security requirements: Need mTLS, zero-trust networking
Advanced traffic control: Canary deployments, A/B testing, traffic splitting
Multi-language: Services in Java, Python, Go - mesh is language-agnostic
Observability: Need distributed tracing, detailed metrics without code changes
Compliance: Regulations require encryption in transit

❌ Don't Use Service Mesh When:

Simple architecture: 2-3 services, monolith, no complexity
Resource constraints: Sidecars add ~50-100MB RAM overhead per pod
Early stage: MVP/prototype - adds operational complexity
Performance critical: Each proxied hop adds roughly 0.5-1ms of latency
No K8s: Service meshes work best with Kubernetes

Common Interview Questions

Q: How does Envoy intercept traffic?

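A: Istio installs iptables rules in each pod's network namespace (set up by the istio-init container or the Istio CNI plugin) that redirect all inbound and outbound traffic to the Envoy sidecar. The app thinks it's connecting directly, but traffic flows through Envoy transparently.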

Q: What's the performance impact?

A: Typically adds ~0.5-1ms latency per hop and ~50-100MB RAM per sidecar. For most apps this is acceptable given the benefits.
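
To turn those per-sidecar figures into a cluster-level estimate, a quick back-of-the-envelope calculation helps. The Python sketch below simply multiplies the ranges quoted in the answer above; mesh_overhead is a hypothetical helper, and the pod and hop counts are made-up inputs, so treat the result as a rough bound rather than a measurement.

```python
def mesh_overhead(pods, hops_per_request,
                  ram_mb_per_sidecar=(50, 100),
                  latency_ms_per_hop=(0.5, 1.0)):
    """Return (low, high) estimates for total sidecar RAM and added latency."""
    ram_mb = tuple(pods * mb for mb in ram_mb_per_sidecar)
    latency_ms = tuple(hops_per_request * ms for ms in latency_ms_per_hop)
    return {"ram_mb": ram_mb, "latency_ms": latency_ms}


# Hypothetical cluster: 200 pods, and a request that crosses 4 service hops.
print(mesh_overhead(pods=200, hops_per_request=4))
# {'ram_mb': (10000, 20000), 'latency_ms': (2.0, 4.0)}
```

For this example the sidecars would claim roughly 10-20GB of RAM across the cluster and add about 2-4ms to a four-hop request.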

Q: How does mTLS work without code changes?

A: App sends plain HTTP to the destination service → iptables redirects it to the local Envoy, which encrypts with mTLS and sends to the remote Envoy → Remote Envoy decrypts, forwards plain HTTP to the app on localhost. Apps never see encryption.

Q: Can you use Envoy without Istio?

A: Yes! Envoy is standalone. Istio is one way to configure Envoy at scale. You can configure Envoy directly for simpler setups.

Q: Istio vs API Gateway?

A: They handle different traffic directions:

API Gateway: External traffic → internal services (north-south)
Service Mesh: Service-to-service traffic (east-west)
Often use both: Gateway for ingress, Istio for internal mesh

Q: How does circuit breaking prevent cascading failures?

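A: Envoy combines two mechanisms. Connection pool limits cap the connections and pending requests to Service B; once they are exceeded, Envoy fails further requests immediately instead of queueing them. Outlier detection watches each instance of B and, after a run of errors (5 consecutive errors in the example above), temporarily ejects that instance from the load-balancing pool. Together these prevent Service A from queueing requests and running out of resources.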

Quick Reference

Concept	Description	Example
Envoy	High-performance proxy (data plane)	Sidecar that intercepts all traffic
Istio	Control plane for service mesh	Configures all Envoy proxies
VirtualService	Traffic routing rules	90% to v1, 10% to v2
DestinationRule	Policies after routing decision	Circuit breaker, load balancing
mTLS	Mutual TLS authentication	Automatic encryption between services
Sidecar	Container deployed alongside app	Envoy proxy in same pod
Canary Deployment	Gradual rollout to subset of users	10% traffic to new version