☸️ Kubernetes Architecture - Complete Guide
Kubernetes (K8s) is an open-source container orchestration platform that automates deployment, scaling, and management of containerized applications.
Key Benefits:
- Automated rollouts and rollbacks
- Self-healing (restarts failed containers)
- Horizontal scaling
- Service discovery and load balancing
- Secret and configuration management
- Storage orchestration
Kubernetes Cluster Architecture
graph LR
KUBECTL["kubectl"] -.->|"Commands"| API
subgraph CP["CONTROL PLANE (Master)"]
API["API Server
(kube-apiserver)"]
SCHED["Scheduler
(kube-scheduler)"]
CTRL["Controller Manager
(kube-controller-manager)"]
ETCD["etcd
(Distributed key-value store)"]
API --> ETCD
SCHED --> API
CTRL --> API
end
subgraph WN["WORKER NODES"]
subgraph N1["Node 1"]
KUB1["kubelet"]
PROXY1["kube-proxy"]
RT1["Container Runtime
(containerd)"]
subgraph PODS1["Pods"]
POD1["Pod1"]
POD2["Pod2"]
POD3["Pod3"]
POD4["Pod4"]
end
KUB1 --> PODS1
PROXY1 --> PODS1
RT1 --> PODS1
end
N2["Node 2, Node 3, ...
(same structure)"]
end
API -.->|"Manages"| KUB1
API -.->|"Manages"| PROXY1
style CP fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#2e3440
style WN fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px,color:#2e3440
style N1 fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#2e3440
style PODS1 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style API fill:#bbdefb,stroke:#1976d2,stroke-width:2px,color:#2e3440
style SCHED fill:#bbdefb,stroke:#1976d2,stroke-width:2px,color:#2e3440
style CTRL fill:#bbdefb,stroke:#1976d2,stroke-width:2px,color:#2e3440
style ETCD fill:#ffccbc,stroke:#e64a19,stroke-width:2px,color:#2e3440
style KUBECTL fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#2e3440
Kubernetes Resource Relationships
1. Nodes and Pods Relationship
Concept: Nodes are physical/virtual machines. Pods run on Nodes.
graph TB
subgraph CLUSTER["Kubernetes Cluster"]
subgraph NODE1["Node 1 (Worker Machine)
IP: 10.0.1.5"]
POD1A["Pod: web-app-1
IP: 192.168.1.10
Containers: nginx"]
POD1B["Pod: web-app-2
IP: 192.168.1.11
Containers: nginx"]
POD1C["Pod: cache-1
IP: 192.168.1.12
Containers: redis"]
end
subgraph NODE2["Node 2 (Worker Machine)
IP: 10.0.1.6"]
POD2A["Pod: web-app-3
IP: 192.168.2.10
Containers: nginx"]
POD2B["Pod: db-1
IP: 192.168.2.11
Containers: postgres"]
end
subgraph NODE3["Node 3 (Worker Machine)
IP: 10.0.1.7"]
POD3A["Pod: web-app-4
IP: 192.168.3.10
Containers: nginx"]
POD3B["Pod: worker-1
IP: 192.168.3.11
Containers: python"]
end
end
style NODE1 fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#2e3440
style NODE2 fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#2e3440
style NODE3 fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#2e3440
style POD1A fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style POD1B fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style POD1C fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style POD2A fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style POD2B fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style POD3A fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style POD3B fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
Key Points:
- Each Node can run multiple Pods
- Pods get unique IP addresses (pod network)
- Scheduler decides which Node runs which Pod
- Nodes have resources (CPU, Memory) that Pods consume
2. ReplicaSet: Managing Pod Replicas
Concept: ReplicaSet ensures N identical Pods are always running.
graph TB
RS["ReplicaSet: web-app
Desired: 3 replicas
Selector: app=web"]
RS -->|"Creates & Manages"| POD1["Pod: web-app-abc123
Labels: app=web
Status: Running"]
RS -->|"Creates & Manages"| POD2["Pod: web-app-def456
Labels: app=web
Status: Running"]
RS -->|"Creates & Manages"| POD3["Pod: web-app-ghi789
Labels: app=web
Status: Running"]
DEAD["Pod: web-app-xyz
Status: Failed ❌"]
RS -.->|"Detects failure
Creates replacement"| POD3
DEAD -.->|"Was managed by"| RS
NODE1["Node 1"] -.->|"Runs"| POD1
NODE2["Node 2"] -.->|"Runs"| POD2
NODE3["Node 3"] -.->|"Runs"| POD3
style RS fill:#bbdefb,stroke:#1976d2,stroke-width:3px,color:#2e3440
style POD1 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style POD2 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style POD3 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style DEAD fill:#ffcdd2,stroke:#c62828,stroke-width:2px,color:#2e3440
style NODE1 fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#2e3440
style NODE2 fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#2e3440
style NODE3 fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#2e3440
How it works:
- ReplicaSet continuously monitors Pod count
- If Pod crashes → ReplicaSet creates replacement
- If you delete a Pod → ReplicaSet creates new one
- Pods are matched by labels (app=web)
- All Pods are identical (same container image/config)
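The reconciliation described above can be sketched as a single control-loop pass (hypothetical function and Pod-dict shape, not the real controller code):

```python
import uuid

def reconcile_replicaset(desired: int, pods: list) -> list:
    """One reconcile pass: return the Pods that should exist afterwards.

    A Pod counts toward the replica total only if it matches the selector
    (label app=web here) and is not Failed.
    """
    healthy = [p for p in pods
               if p["labels"].get("app") == "web" and p["status"] != "Failed"]
    # Scale up: create replacements until the desired count is met.
    while len(healthy) < desired:
        healthy.append({"name": "web-app-" + uuid.uuid4().hex[:6],
                        "labels": {"app": "web"}, "status": "Running"})
    # Scale down: any surplus Pods would be deleted.
    return healthy[:desired]

pods = [
    {"name": "web-app-abc123", "labels": {"app": "web"}, "status": "Running"},
    {"name": "web-app-xyz", "labels": {"app": "web"}, "status": "Failed"},
]
result = reconcile_replicaset(3, pods)
print(len(result))  # 3: the failed Pod is dropped and two replacements created
```

The real ReplicaSet controller does the same comparison continuously, triggered by watch events from the API server rather than by polling.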
3. DaemonSet: One Pod Per Node
Concept: A DaemonSet ensures exactly ONE copy of a Pod runs on every Node (or on every Node matching its node selector).
graph TB
DS["DaemonSet: log-collector
Runs on: ALL nodes"]
subgraph CLUSTER["Cluster"]
subgraph NODE1["Node 1"]
POD1["Pod: log-collector-node1
Collects logs from Node 1"]
APP1A["App Pod 1"]
APP1B["App Pod 2"]
end
subgraph NODE2["Node 2"]
POD2["Pod: log-collector-node2
Collects logs from Node 2"]
APP2A["App Pod 3"]
end
subgraph NODE3["Node 3"]
POD3["Pod: log-collector-node3
Collects logs from Node 3"]
APP3A["App Pod 4"]
APP3B["App Pod 5"]
end
end
DS -->|"Ensures 1 Pod on"| NODE1
DS -->|"Ensures 1 Pod on"| NODE2
DS -->|"Ensures 1 Pod on"| NODE3
POD1 -.->|"Monitors"| APP1A
POD1 -.->|"Monitors"| APP1B
POD2 -.->|"Monitors"| APP2A
POD3 -.->|"Monitors"| APP3A
POD3 -.->|"Monitors"| APP3B
style DS fill:#ce93d8,stroke:#8e24aa,stroke-width:3px,color:#2e3440
style NODE1 fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#2e3440
style NODE2 fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#2e3440
style NODE3 fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#2e3440
style POD1 fill:#e1bee7,stroke:#8e24aa,stroke-width:2px,color:#2e3440
style POD2 fill:#e1bee7,stroke:#8e24aa,stroke-width:2px,color:#2e3440
style POD3 fill:#e1bee7,stroke:#8e24aa,stroke-width:2px,color:#2e3440
style APP1A fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style APP1B fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style APP2A fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style APP3A fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style APP3B fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
Common DaemonSet Use Cases:
- Log Collection: Fluentd, Logstash on every node
- Monitoring: Node exporters, monitoring agents
- Storage: Storage drivers (Ceph, GlusterFS)
- Networking: Network plugins (kube-proxy, Calico)
4. Sidecar Pattern: Multiple Containers in One Pod
Concept: Pods can have multiple containers that share resources.
graph TB
subgraph POD["Pod: web-app-with-sidecar
IP: 192.168.1.10"]
subgraph SHARED["Shared Resources"]
NETWORK["Shared Network
(localhost)"]
VOLUME["Shared Volume
(/var/log)"]
end
MAIN["Main Container
nginx:1.21
Port: 80
Writes logs to /var/log/nginx/"]
SIDECAR["Sidecar Container
fluentd
Reads logs from /var/log/nginx/
Sends to Elasticsearch"]
MAIN -->|"Shares"| NETWORK
MAIN -->|"Writes to"| VOLUME
SIDECAR -->|"Shares"| NETWORK
SIDECAR -->|"Reads from"| VOLUME
end
EXTERNAL["External Log Storage
(Elasticsearch)"]
SIDECAR -.->|"Forwards logs"| EXTERNAL
style POD fill:#e8f5e9,stroke:#388e3c,stroke-width:3px,color:#2e3440
style MAIN fill:#bbdefb,stroke:#1976d2,stroke-width:2px,color:#2e3440
style SIDECAR fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#2e3440
style NETWORK fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px,color:#2e3440
style VOLUME fill:#ffe0b2,stroke:#ff6f00,stroke-width:2px,color:#2e3440
style SHARED fill:#fafafa,stroke:#757575,stroke-width:2px,color:#2e3440
style EXTERNAL fill:#c5e1a5,stroke:#558b2f,stroke-width:2px,color:#2e3440
Sidecar Benefits:
- Shared Network: Containers communicate via localhost
- Shared Storage: Containers can share volumes
- Same Lifecycle: Started/stopped together
- Co-located: Always on same Node
Common Sidecar Patterns:
- Logging: Sidecar collects/forwards logs
- Proxying: Envoy/Istio sidecar for service mesh
- Monitoring: Metrics collection sidecar
- Security: Authentication/authorization proxy
5. Complete Hierarchy: Deployment → ReplicaSet → Pods
Concept: Deployments manage ReplicaSets, which manage Pods.
graph TB
DEP["Deployment: web-app
Replicas: 3
Image: nginx:1.21"]
RS_NEW["ReplicaSet: web-app-v2
Replicas: 3
Current"]
RS_OLD["ReplicaSet: web-app-v1
Replicas: 0
Kept for rollback"]
DEP -->|"Creates/Manages"| RS_NEW
DEP -.->|"Keeps for rollback"| RS_OLD
RS_NEW -->|"Manages"| POD1["Pod: web-app-v2-abc
nginx:1.21"]
RS_NEW -->|"Manages"| POD2["Pod: web-app-v2-def
nginx:1.21"]
RS_NEW -->|"Manages"| POD3["Pod: web-app-v2-ghi
nginx:1.21"]
RS_OLD -.->|"Previously managed"| POD_OLD["Pod: web-app-v1-xyz
nginx:1.20
(terminated)"]
subgraph NODES["Distributed Across Nodes"]
NODE1["Node 1"] -.-> POD1
NODE2["Node 2"] -.-> POD2
NODE3["Node 3"] -.-> POD3
end
style DEP fill:#90caf9,stroke:#0277bd,stroke-width:3px,color:#2e3440
style RS_NEW fill:#bbdefb,stroke:#1976d2,stroke-width:3px,color:#2e3440
style RS_OLD fill:#e0e0e0,stroke:#616161,stroke-width:2px,stroke-dasharray: 5 5,color:#2e3440
style POD1 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style POD2 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style POD3 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
style POD_OLD fill:#ffcdd2,stroke:#c62828,stroke-width:2px,stroke-dasharray: 5 5,color:#2e3440
style NODE1 fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#2e3440
style NODE2 fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#2e3440
style NODE3 fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#2e3440
style NODES fill:#fafafa,stroke:#757575,stroke-width:2px,color:#2e3440
Why this hierarchy?
- Deployment: Declarative updates, rollbacks, versioning
- ReplicaSet: Ensures Pod count, self-healing
- Pod: Runs actual containers
Update Process:
- You update Deployment (change image nginx:1.20 → nginx:1.21)
- Deployment creates NEW ReplicaSet (web-app-v2)
- New ReplicaSet scales UP (creates 3 new Pods)
- Old ReplicaSet scales DOWN (terminates old Pods)
- Old ReplicaSet kept with 0 replicas (for rollback)
Control Plane Components
1. API Server (kube-apiserver)
Purpose: The front-end of the Kubernetes control plane. All components communicate through the API server.
How it works:
- Exposes Kubernetes API (RESTful)
- Validates and processes API requests
- Updates etcd with cluster state
- Only component that talks directly to etcd
- Authenticates and authorizes requests (RBAC)
- Serves as the gateway for kubectl, controllers, scheduler
Request Flow:
- kubectl sends request to API server
- API server authenticates and authorizes
- API server validates the request
- API server writes to etcd
- API server returns response
Interview Tip: The API server is stateless and horizontally scalable. It's the only component that directly accesses etcd. All cluster state changes go through the API server.
2. etcd
Purpose: Distributed, consistent key-value store that holds the entire cluster state.
How it works:
- Stores all cluster configuration and state
- Uses Raft consensus algorithm for consistency
- Provides watch mechanism for state changes
- Highly available (typically 3 or 5 instances)
- Strong consistency (CP in CAP theorem)
Stored data includes:
- Cluster configuration
- Resource definitions (Pods, Services, etc.)
- Secrets and ConfigMaps
- Node status
- Current state vs desired state
Interview Tip: etcd is critical - if etcd fails, the cluster can't function. Regular backups are essential. Uses Raft consensus (see raft_consensus.py).
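Raft requires a majority (quorum) of members to acknowledge each write, which is why etcd clusters use odd sizes. A quick illustration of the arithmetic:

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster."""
    return members // 2 + 1

def fault_tolerance(members: int) -> int:
    """Members that can fail while the cluster stays writable."""
    return members - quorum(members)

for n in (1, 3, 4, 5):
    print(n, quorum(n), fault_tolerance(n))
# A 3-member cluster tolerates 1 failure; 5 members tolerate 2.
# A 4-member cluster still tolerates only 1 - even sizes add cost, not safety.
```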
3. Scheduler (kube-scheduler)
Purpose: Assigns Pods to Nodes based on resource requirements and constraints.
How it works:
- Watches for newly created Pods with no assigned Node
- Filters nodes (eliminates unsuitable nodes)
- Scores nodes (ranks remaining nodes)
- Selects best node and binds Pod to it
Scheduling Process:
- Filtering: Remove nodes that don't meet requirements
- Insufficient CPU/memory
- Node selectors don't match
- Taints/tolerations conflicts
- Volume constraints
- Scoring: Rank remaining nodes
- Resource availability
- Pod spreading (balance across nodes)
- Affinity/anti-affinity rules
- Binding: Assign Pod to highest-scoring node
Interview Tip: Scheduler only assigns Pods to Nodes. kubelet actually runs the Pod. You can write custom schedulers if needed.
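The filter-then-score flow can be sketched like this (simplified and hypothetical; the real scheduler runs many filter and score plugins):

```python
def schedule(pod, nodes):
    """Pick a node for a Pod: filter out unsuitable nodes, score the rest."""
    # Filtering: drop nodes without enough free CPU/memory.
    feasible = [n for n in nodes
                if n["free_cpu"] >= pod["cpu"] and n["free_mem"] >= pod["mem"]]
    if not feasible:
        return None  # Pod stays Pending until a node frees up
    # Scoring: prefer the node with the most free resources (spreads load).
    best = max(feasible, key=lambda n: n["free_cpu"] + n["free_mem"])
    return best["name"]

nodes = [
    {"name": "node-1", "free_cpu": 2.0, "free_mem": 4.0},
    {"name": "node-2", "free_cpu": 0.2, "free_mem": 1.0},
    {"name": "node-3", "free_cpu": 3.0, "free_mem": 8.0},
]
print(schedule({"cpu": 0.5, "mem": 1.0}, nodes))  # node-3
```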
4. Controller Manager (kube-controller-manager)
Purpose: Runs controller processes that regulate the cluster state.
How it works:
- Watches the cluster state via API server
- Makes changes to move current state → desired state
- Runs many controllers in a single process
Key Controllers:
- Node Controller: Monitors node health, marks unavailable
- Replication Controller: Maintains correct number of Pods
- Endpoints Controller: Populates Endpoints (Services + Pods)
- Service Account Controller: Creates default service accounts
- Namespace Controller: Manages namespace lifecycle
- Deployment Controller: Manages ReplicaSets for Deployments
- StatefulSet Controller: Manages StatefulSets
- Job Controller: Manages Jobs and CronJobs
Control Loop (Reconciliation):
- Read desired state from API server
- Read current state from API server
- Compare desired vs current
- Take action to reconcile (create, update, delete resources)
- Update status in API server
- Repeat
Interview Tip: Controllers implement the "reconciliation loop" - continuously working to make actual state match desired state. This is Kubernetes' core operating principle.
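Reduced to its essence, the reconciliation loop is a diff between desired and current state (a sketch with hypothetical state maps, not real controller-manager code):

```python
def reconcile(desired: dict, current: dict) -> dict:
    """Diff desired vs current state and return the actions to take."""
    return {
        "create": sorted(set(desired) - set(current)),
        "delete": sorted(set(current) - set(desired)),
        "update": sorted(k for k in desired.keys() & current.keys()
                         if desired[k] != current[k]),
    }

# Desired state (from etcd, via the API server) vs observed state:
desired = {"web": "nginx:1.21", "cache": "redis:7"}
current = {"web": "nginx:1.20", "old-job": "perl"}
print(reconcile(desired, current))
# {'create': ['cache'], 'delete': ['old-job'], 'update': ['web']}
```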
Node (Worker) Components
5. kubelet
Purpose: Agent that runs on each worker node, ensuring containers are running in Pods.
How it works:
- Registers node with the API server
- Watches for Pod assignments to its node
- Pulls container images
- Starts/stops containers via container runtime
- Reports Pod and node status to API server
- Runs liveness/readiness probes
- Mounts volumes
kubelet Workflow:
- API server assigns Pod to node
- kubelet receives Pod spec
- kubelet tells container runtime to pull images
- kubelet creates volumes if needed
- kubelet tells container runtime to start containers
- kubelet monitors container health
- kubelet reports status back to API server
Interview Tip: kubelet is the "node agent". It doesn't manage containers that weren't created by Kubernetes. It communicates with the container runtime via CRI (Container Runtime Interface).
6. kube-proxy
Purpose: Network proxy that maintains network rules for Pod communication.
How it works:
- Runs on every node
- Watches API server for Service and Endpoint changes
- Maintains network rules (iptables or IPVS)
- Enables communication to Services (load balancing)
- Performs connection forwarding
Modes:
- iptables mode (default): Uses iptables rules for load balancing
- IPVS mode: Uses IPVS (Linux Virtual Server) for better performance
- userspace mode (legacy): Proxies connections in userspace
Service Access Flow:
- Client Pod sends request to Service IP (ClusterIP)
- kube-proxy intercepts via iptables/IPVS rules
- kube-proxy load balances to backend Pod
- Traffic forwarded to selected Pod
Interview Tip: kube-proxy doesn't actually proxy traffic in most modes. It programs iptables/IPVS rules, and the kernel handles the actual routing.
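In iptables mode, each new connection to a Service IP is DNAT-ed to a randomly selected ready backend. Functionally it behaves like this sketch (hypothetical ClusterIP and endpoint addresses):

```python
import random

# Simplified view of what kube-proxy programs into iptables:
# Service ClusterIP:port -> set of ready backend Pod addresses (Endpoints).
endpoints = {
    "10.96.0.10:80": ["192.168.1.10:8080",
                      "192.168.2.10:8080",
                      "192.168.3.10:8080"],
}

def route(service_addr: str) -> str:
    """Pick one backend Pod for a new connection (random, like iptables mode)."""
    return random.choice(endpoints[service_addr])

# Every new connection may land on a different Pod:
picks = {route("10.96.0.10:80") for _ in range(100)}
print(picks <= set(endpoints["10.96.0.10:80"]))  # True
```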
7. Container Runtime
Purpose: Software responsible for running containers.
How it works:
- Pulls container images from registries
- Unpacks images
- Runs containers
- Implements CRI (Container Runtime Interface)
Supported Runtimes:
- containerd (most common): Industry standard; Docker itself runs containers through it
- CRI-O: Lightweight, OCI-compliant
- Docker (removed): Dockershim was deprecated in 1.20 and removed in K8s 1.24
Interview Tip: Docker support was dropped because Kubernetes talks to CRI-compliant runtimes (containerd, CRI-O) directly, and Docker runs containers via containerd internally anyway. Your Docker images still work!
8. CNI (Container Network Interface)
Purpose: Plugin interface for configuring network interfaces in containers.
How it works:
- Assigns IP addresses to Pods
- Sets up network routing between Pods
- Enables Pod-to-Pod communication across nodes
- Implements network policies (firewalling)
Popular CNI Plugins:
- Calico: L3 networking, network policies, BGP routing
- Flannel: Simple overlay network (VXLAN)
- Weave Net: Simple setup, encrypts traffic
- Cilium: eBPF-based, advanced observability
- AWS VPC CNI: Native AWS networking
Pod Networking:
- kubelet calls CNI plugin when Pod starts
- CNI assigns IP from Pod CIDR range
- CNI sets up virtual network interface
- CNI configures routes for Pod communication
- Pod can now communicate with other Pods
Interview Tip: Kubernetes network model requires: 1) All Pods can communicate without NAT, 2) All nodes can communicate with all Pods, 3) Each Pod has its own IP.
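IP assignment from a per-node Pod CIDR can be sketched with the standard ipaddress module (a toy host-local IPAM with hypothetical CIDRs, not a real CNI plugin):

```python
import ipaddress

class NodeIPAM:
    """Hand out Pod IPs from a node's Pod CIDR (toy host-local IPAM)."""
    def __init__(self, pod_cidr: str):
        self.pool = ipaddress.ip_network(pod_cidr).hosts()
        next(self.pool)  # reserve the first host IP (.1) for the node's bridge

    def allocate(self) -> str:
        """Return the next free Pod IP in this node's range."""
        return str(next(self.pool))

# Each node gets a disjoint slice of the cluster's Pod network,
# so Pod IPs are unique cluster-wide without NAT.
node1 = NodeIPAM("192.168.1.0/24")
node2 = NodeIPAM("192.168.2.0/24")
print(node1.allocate())  # 192.168.1.2
print(node1.allocate())  # 192.168.1.3
print(node2.allocate())  # 192.168.2.2
```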
kubectl - The Kubernetes CLI
kubectl
Purpose: Command-line tool for interacting with Kubernetes clusters.
Common Commands:
# Get resources
kubectl get pods
kubectl get nodes
kubectl get services
kubectl get deployments
# Describe (detailed info)
kubectl describe pod my-pod
kubectl describe node node-1
# Create resources
kubectl create -f deployment.yaml
kubectl apply -f service.yaml
# Update resources
kubectl edit deployment my-app
kubectl scale deployment my-app --replicas=5
# Delete resources
kubectl delete pod my-pod
kubectl delete -f deployment.yaml
# Logs and debugging
kubectl logs my-pod
kubectl logs -f my-pod # follow
kubectl exec -it my-pod -- /bin/bash
# Port forwarding
kubectl port-forward pod/my-pod 8080:80
# Labels and selectors
kubectl get pods -l app=nginx
kubectl label pods my-pod env=prod
Interview Tip: kubectl talks to the API server. It reads config from ~/.kube/config which contains cluster info, credentials, and context.
Kubernetes Resource Types
Pod
Smallest deployable unit. One or more containers that share network and storage.
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  containers:
  - name: nginx
    image: nginx:1.21
    ports:
    - containerPort: 80
Use case: Basic unit, but usually managed by higher-level resources.
ReplicaSet
Maintains a stable set of replica Pods. Ensures specified number of Pods are running.
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx-rs
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    # Pod template here
Use case: Rarely used directly; Deployments manage ReplicaSets.
Deployment
Manages ReplicaSets and provides declarative updates. Most common way to run stateless apps.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
Features: Rolling updates, rollback, scaling, self-healing.
StatefulSet
For stateful applications. Provides stable network identity and persistent storage.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: "mysql"
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    # Pod template
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
Use case: Databases, distributed systems (Kafka, Cassandra).
Features: Ordered deployment/scaling, stable network IDs (pod-0, pod-1), persistent volumes.
DaemonSet
Runs a copy of a Pod on every node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    # Pod template
Use case: Logging agents (Fluentd), monitoring (Prometheus node exporter), CNI plugins.
Job
Runs a task to completion. For batch processing.
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-calculation
spec:
  completions: 5
  parallelism: 2
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
Use case: Data processing, migrations, batch jobs.
CronJob
Runs Jobs on a schedule.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-job
spec:
  schedule: "0 2 * * *"  # 2 AM daily
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: backup:latest
          restartPolicy: OnFailure
Use case: Backups, report generation, cleanup tasks.
Service Types
| Service Type | Description | Use Case | Access Method |
| --- | --- | --- | --- |
| ClusterIP (default) | Exposes Service on cluster-internal IP. Only reachable from within cluster. | Internal microservices communication | ClusterIP:Port (e.g., 10.96.0.1:80) |
| NodePort | Exposes Service on each Node's IP at a static port (30000-32767). | Development, testing, quick external access | NodeIP:NodePort (e.g., 192.168.1.10:30080) |
| LoadBalancer | Creates external load balancer (cloud provider). Assigns external IP. | Production external access on cloud platforms | External IP provided by cloud (e.g., AWS ELB) |
| ExternalName | Maps Service to external DNS name (CNAME). | Access external services (RDS, external APIs) | DNS name (e.g., database.example.com) |
Service Examples
ClusterIP Service
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: ClusterIP  # default
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80          # Service port
    targetPort: 8080  # Container port
LoadBalancer Service
apiVersion: v1
kind: Service
metadata:
  name: my-lb-service
spec:
  type: LoadBalancer  # Cloud provider provisions external LB
  selector:
    app: web
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
Headless Service (for StatefulSet)
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  clusterIP: None  # Headless!
  selector:
    app: mysql
  ports:
  - port: 3306
# Provides DNS for each Pod: mysql-0.mysql, mysql-1.mysql, etc.
Additional Important Resources
ConfigMap
Store non-sensitive configuration data.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  database_url: "postgres://db:5432"
  log_level: "info"
Usage: Environment variables or mounted as files.
Secret
Store sensitive data (passwords, tokens).
apiVersion: v1
kind: Secret
metadata:
  name: db-secret
type: Opaque
data:
  password: cGFzc3dvcmQ=  # base64
Note: Base64 encoded, not encrypted. Use external secret managers for production.
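You can verify that base64 is only an encoding, not encryption:

```python
import base64

# What the Secret above stores for password: cGFzc3dvcmQ=
encoded = base64.b64encode(b"password").decode()
print(encoded)  # cGFzc3dvcmQ=

# Anyone who can read the Secret can trivially decode it:
print(base64.b64decode("cGFzc3dvcmQ=").decode())  # password
```

This is why RBAC on Secrets, encryption at rest for etcd, or an external secret manager matters in production.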
PersistentVolume (PV)
Cluster resource representing storage.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-1
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: slow
  hostPath:
    path: /mnt/data
PersistentVolumeClaim (PVC)
Request for storage by a user.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-1
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: slow
Workflow: User creates PVC → K8s binds to matching PV → Pod uses PVC.
Ingress
HTTP/HTTPS routing to Services.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
Requires: Ingress Controller (Nginx, Traefik, HAProxy).
Namespace
Virtual cluster for resource isolation.
apiVersion: v1
kind: Namespace
metadata:
  name: production
Use case: Separate dev/staging/prod, multi-tenancy, resource quotas.
Default namespaces: default, kube-system, kube-public, kube-node-lease.
Node Labels and Selectors
Node Labels
Key-value pairs attached to nodes for organization and scheduling.
# Label a node
kubectl label nodes node-1 disktype=ssd
kubectl label nodes node-2 environment=production
# View labels
kubectl get nodes --show-labels
Node Selector (Simple)
Schedule Pods only on nodes with specific labels.
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    disktype: ssd  # Only schedule on nodes with this label
Node Affinity (Advanced)
More expressive than nodeSelector, with soft/hard requirements.
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
            - nvme
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: environment
            operator: In
            values:
            - production
  containers:
  - name: nginx
    image: nginx
Taints and Tolerations
Prevent Pods from scheduling on nodes unless they tolerate the taint.
# Taint a node (repel Pods)
kubectl taint nodes node-1 key=value:NoSchedule

# Pod with toleration (allows scheduling on tainted node)
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
  containers:
  - name: nginx
    image: nginx
Use cases: Dedicated nodes (GPU, high-memory), node maintenance, workload isolation.
Key Interview Concepts
How does Kubernetes achieve high availability?
- Control plane: Multiple API servers, schedulers, controllers (leader election)
- etcd: Clustered (3 or 5 instances) with Raft consensus
- Worker nodes: Multiple nodes, Pods distributed across nodes
- Self-healing: Controllers restart failed Pods, reschedule from failed nodes
How does a Pod get created? (End-to-end flow)
- User runs kubectl create -f pod.yaml
- kubectl sends request to API server
- API server validates, authenticates, authorizes
- API server writes Pod spec to etcd
- Scheduler watches for unassigned Pods
- Scheduler selects a node and binds Pod to it (updates etcd)
- kubelet on that node watches for new Pod assignments
- kubelet tells container runtime to pull image and start containers
- Container runtime starts containers
- kubelet reports Pod status to API server
- kube-proxy updates network rules for Service discovery
How does Service discovery work?
- DNS: CoreDNS provides DNS resolution (my-service.namespace.svc.cluster.local)
- Environment variables: K8s injects Service IPs as env vars
- ClusterIP: Virtual IP for Services, load balanced by kube-proxy
Deployment vs StatefulSet vs DaemonSet
| Aspect | Deployment | StatefulSet | DaemonSet |
| --- | --- | --- | --- |
| Use case | Stateless apps (web servers, APIs) | Stateful apps (databases, Kafka) | Node-level services (logging, monitoring) |
| Pod identity | Interchangeable, random names | Stable, ordered (pod-0, pod-1) | One per node |
| Scaling | Unordered, parallel | Ordered (pod-0 before pod-1) | Auto-scales with cluster |
| Storage | Ephemeral or shared volumes | Persistent, per-Pod storage | Usually host volumes |
Rolling Update Process
- User updates Deployment (new image version)
- Deployment controller creates new ReplicaSet
- New ReplicaSet scales up (creates new Pods)
- Old ReplicaSet scales down (terminates old Pods)
- Process continues until all Pods are new version
- Old ReplicaSet kept for rollback (history)
Parameters: maxSurge (extra Pods during update), maxUnavailable (Pods down during update)
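A toy simulation of how those two bounds shape the rollout (a sketch with hypothetical function names; the real controller drives this through the API server, gated on Pod readiness):

```python
def rolling_update_steps(replicas: int, max_surge: int, max_unavailable: int):
    """Return (old, new) Pod counts as a rolling update progresses.

    Invariants the Deployment controller maintains:
      old + new <= replicas + max_surge          (never too many Pods)
      old + new >= replicas - max_unavailable    (never too few available)
    Note: maxSurge and maxUnavailable cannot both be 0 in Kubernetes.
    """
    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0:
        # Scale up the new ReplicaSet as far as the surge budget allows.
        new = min(replicas, replicas + max_surge - old)
        # Scale down the old ReplicaSet as far as availability allows.
        old = max(0, replicas - max_unavailable - new) if new < replicas else 0
        steps.append((old, new))
    return steps

# maxSurge=1, maxUnavailable=0: one extra Pod at a time, zero downtime.
for old, new in rolling_update_steps(replicas=3, max_surge=1, max_unavailable=0):
    print("old=%d new=%d" % (old, new))
# old=3 new=0 -> old=2 new=1 -> old=1 new=2 -> old=0 new=3
```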
Summary
Key Takeaways for Interviews
- Architecture: Control plane (API server, etcd, scheduler, controller manager) + Worker nodes (kubelet, kube-proxy, container runtime)
- API Server: Central hub, all communication goes through it, only component that talks to etcd
- etcd: Stores cluster state, uses Raft consensus, critical for cluster operation
- Scheduler: Assigns Pods to Nodes based on resources and constraints
- Controllers: Reconciliation loops that make actual state match desired state
- kubelet: Node agent that runs Pods, reports status
- kube-proxy: Network proxy for Service load balancing
- CNI: Network plugin for Pod networking
- Pods: Smallest unit, usually managed by Deployments/StatefulSets
- Services: Load balancing and service discovery (ClusterIP, NodePort, LoadBalancer)
- Deployments: Manage stateless apps with rolling updates
- StatefulSets: Manage stateful apps with stable identities
- Self-healing: Controllers restart failed Pods automatically