Netflix Distributed Systems Engineer (Compute Runtime) — Study Program
Target Role: L5/L6 Distributed Systems Engineer, Compute Runtime Team
Compensation Range: $499,000 – $900,000
Location: USA Remote
Date Created: April 6, 2026
What This Team Does: The Compute Runtime team owns Netflix's Kubernetes data plane — the kubelet, container runtime (containerd),
and the entire node-level stack that runs every Netflix workload on AWS. You'd be building and customizing the software that actually
executes containers, not just deploying apps on top of K8s.
1. Role Analysis & Gap Assessment
What They Want (Prioritized)
| Requirement | Priority | Your Status |
| Container runtime internals (containerd, runc, NRI plugins, kubelet) | Critical | Gap |
| Go proficiency | Critical | Gap |
| Linux system performance debugging | Critical | Partial (have linux_sysadmin) |
| Large-scale distributed systems design | High | Strong |
| Kubernetes architecture & operations | High | Strong |
| Networking (TCP, IPv4, sockets, container networking) | High | Strong |
| Docker / container fundamentals | High | Strong |
| Operational troubleshooting at scale | High | Have |
| Open source contributions | Preferred | Gap |
| Linux kernel development | Preferred | Gap |
| AI/ML compute infrastructure | Preferred | Gap |
Good News: Your existing study library covers distributed systems, K8s architecture, Docker, networking, and system design very well.
The gaps are deep but narrow: Go + container runtime internals + Linux perf — all learnable in a focused program.
2. Study Program Overview
Phase 1 (Weeks 1-4) Phase 2 (Weeks 5-10) Phase 3 (Weeks 11-16) Phase 4 (Weeks 17-22)
┌─────────────────┐ ┌──────────────────────────┐ ┌──────────────────────────┐ ┌──────────────────────────┐
│ Go Language │────▶│ Container Runtime │────▶│ KaaS Portal Project │────▶│ Open Source + Interview │
│ Fundamentals │ │ Internals & K8s Data │ │ (Multi-Cloud K8s) │ │ Prep & Contributions │
│ + Concurrency │ │ Plane Deep Dive │ │ Go + K8s API + Clouds │ │ │
└─────────────────┘ └──────────────────────────┘ └──────────────────────────┘ └──────────────────────────┘
│ │ │ │
Linux Perf ──────────── Linux Perf ──────────────── Linux Perf ──────────────────── Linux Perf
(ongoing thread) (ongoing) (ongoing) (ongoing)
3. Phase 1 — Go Language & Linux Performance (Weeks 1-4)
Go is the language of Kubernetes, containerd, and much of the CNCF ecosystem. This isn't about learning syntax — it's about thinking in Go's concurrency model and understanding how K8s itself is written.
Week 1: Go Fundamentals
- Install Go, set up workspace and IDE (GoLand or VS Code + gopls)
- Types, structs, interfaces, embedding — Go's composition model
- Pointers, slices, maps — understand memory layout
- Error handling patterns (no exceptions — error interface, fmt.Errorf, wrapping)
- Packages, modules, go mod, dependency management
- Build: CLI tool that parses /proc filesystem to show container resource usage
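The Week 1 build can start small. This sketch reads a process's /proc/[pid]/status file and pulls out numeric fields such as VmRSS; the parsing is split into its own function so it can be tested without a live /proc. The field names are real /proc keys, but the tool itself is just an illustrative starting point.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseStatus extracts numeric fields (e.g. VmRSS in kB, Threads) from the
// contents of a /proc/<pid>/status file. Non-numeric fields are skipped.
func parseStatus(contents string) map[string]int64 {
	fields := map[string]int64{}
	for _, line := range strings.Split(contents, "\n") {
		key, rest, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		parts := strings.Fields(rest)
		if len(parts) == 0 {
			continue
		}
		if v, err := strconv.ParseInt(parts[0], 10, 64); err == nil {
			fields[key] = v
		}
	}
	return fields
}

func main() {
	// On Linux, read this process's own status file; the same code works
	// for any PID visible under /proc (including containerized ones).
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, "/proc unavailable (not Linux?):", err)
		return
	}
	fields := parseStatus(string(data))
	fmt.Printf("VmRSS: %d kB, Threads: %d\n", fields["VmRSS"], fields["Threads"])
}
```

Extending this to walk every PID in a container's cgroup is a natural next step.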
Week 2: Concurrency Deep Dive
- Goroutines and the Go scheduler (M:N threading, GOMAXPROCS)
- Channels — buffered, unbuffered, directional, select
- sync package — Mutex, RWMutex, WaitGroup, Once, Pool
- Context package — cancellation, timeouts, value propagation
- Race detector (go run -race)
- Build: Concurrent container health checker that monitors multiple containers with timeouts
Week 3: Systems Programming in Go
- syscall and x/sys/unix packages
- Working with Linux namespaces from Go (clone, unshare, setns)
- cgroups v2 manipulation from Go
- File I/O, os/exec for process management
- net package — TCP/UDP servers, Unix domain sockets
- Build: Minimal container runtime in Go (namespaces + cgroups + chroot)
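Before manipulating namespaces, it helps to read where a process already sits in the cgroup hierarchy. This sketch (Linux paths assumed) parses the cgroups v2 entry from /proc/[pid]/cgroup and shows where the accounting files live under /sys/fs/cgroup:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// cgroupV2Path extracts the unified-hierarchy path from the contents of
// /proc/<pid>/cgroup. Under cgroups v2 the file holds a line of the form
// "0::/some/path"; v1 controller lines ("12:cpu:/...") are ignored.
func cgroupV2Path(contents string) (string, bool) {
	for _, line := range strings.Split(strings.TrimSpace(contents), "\n") {
		parts := strings.SplitN(line, ":", 3)
		if len(parts) == 3 && parts[0] == "0" && parts[1] == "" {
			return parts[2], true
		}
	}
	return "", false
}

func main() {
	data, err := os.ReadFile("/proc/self/cgroup")
	if err != nil {
		fmt.Fprintln(os.Stderr, "/proc unavailable (not Linux?):", err)
		return
	}
	path, ok := cgroupV2Path(string(data))
	if !ok {
		fmt.Println("not on a cgroups v2 (unified) hierarchy")
		return
	}
	// Resource accounting lives in files under /sys/fs/cgroup/<path>;
	// memory.current holds current memory usage in bytes.
	if mem, err := os.ReadFile(filepath.Join("/sys/fs/cgroup", path, "memory.current")); err == nil {
		fmt.Printf("cgroup %s memory.current = %s", path, mem)
	}
}
```

Writing to files like cpu.max in a cgroup you create is the enforcement half of the same mechanism.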
Week 4: Go Patterns for K8s Development
- client-go library — informers, listers, workqueues
- Controller pattern and reconciliation loops
- Code generation — deepcopy, client, informer generators
- Testing in Go — table-driven tests, mocks, integration tests
- Profiling — pprof, trace, benchmarks
- Build: Simple K8s controller that watches Pods and logs lifecycle events
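client-go's informer/workqueue machinery boils down to one idea worth internalizing before touching the real library: reconcile by key against desired state, never by event payload. A deliberately simplified, dependency-free sketch — the two maps stand in for the API server and the cluster:

```go
package main

import "fmt"

// reconcile drives actual toward desired for one object key. Real
// controllers re-read desired state from the API server here. Events carry
// only the key, never a payload, so the loop is level-triggered: replaying
// or reordering events is safe.
func reconcile(key string, desired, actual map[string]string) {
	want, ok := desired[key]
	if !ok {
		delete(actual, key) // object was deleted: clean up
		return
	}
	actual[key] = want // create or update toward desired state
}

func main() {
	desired := map[string]string{"default/web": "running"}
	actual := map[string]string{}
	// Duplicate and stale events are harmless — reconcile is idempotent.
	for _, key := range []string{"default/web", "default/web", "default/gone"} {
		reconcile(key, desired, actual)
	}
	fmt.Println(len(actual), actual["default/web"]) // 1 running
}
```

The real controller adds informers (cached watches), a rate-limited workqueue, and retries around exactly this loop.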
Key Resources
| Resource | Type | Focus |
| The Go Programming Language (Donovan & Kernighan) | Book | Comprehensive foundation |
| Concurrency in Go (Katherine Cox-Buday) | Book | Deep concurrency patterns |
| Let's Go & Let's Go Further (Alex Edwards) | Book | Production Go patterns |
| Go by Example (gobyexample.com) | Web | Quick reference |
| Effective Go & Go Blog | Web | Idiomatic patterns |
| Kubernetes source code (k8s.io/kubernetes) | Code | Real-world Go at scale |
Linux Performance — Ongoing Thread
Netflix explicitly requires "Linux system performance debugging capability." This is Brendan Gregg territory — he spent years doing exactly this work at Netflix and literally wrote the book on it.
Core Skills to Build
| Tool / Area | What to Learn | Why It Matters for This Role |
| perf | CPU profiling, flame graphs, perf stat, perf record | Profiling container workloads, finding hot paths in kubelet |
| eBPF / bpftrace | Tracing syscalls, kernel functions, custom probes | Dynamic tracing of container runtime behavior without restarting |
| strace / ltrace | Syscall tracing, latency analysis | Debugging container startup failures, I/O issues |
| cgroups v2 | Resource accounting, limits, CPU/memory/IO controllers | This is literally how containers enforce resource limits |
| /proc and /sys | Process info, cgroup hierarchies, network stats | Diagnosing container resource issues at the source |
| Flame graphs | Generating and reading CPU, off-CPU, memory flame graphs | Visual performance analysis of container workloads |
| Networking tools | ss, ip, tc, conntrack, tcpdump | Debugging container networking, CNI issues, service mesh overhead |
Key Resources
- Systems Performance, 2nd Ed (Brendan Gregg) — The bible
- BPF Performance Tools (Brendan Gregg) — eBPF deep dive
- brendangregg.com — Free articles, methodologies, checklists
- Linux Observability with BPF (Calavera & Fontana)
Practice Method: Spin up containers with deliberate performance issues (CPU hog, memory leak, I/O contention) and use these tools to diagnose them. Document your debugging process as you go.
4. Phase 2 — Container Runtime Internals & K8s Data Plane (Weeks 5-10)
This is the core of the Netflix role. You need to understand not just how to use containers, but how they actually work at every layer.
Kubernetes Node (Data Plane)
┌──────────────────────────────────────────────────────────────────────┐
│ │
│ ┌──────────┐ CRI gRPC ┌────────────┐ │
│ │ kubelet │──────────────────▶│ containerd │ │
│ │ │ │ │ │
│ │ - Pod │ │ - Images │ OCI Runtime │
│ │ lifecycle │ - Snapshots│───────────────┐ │
│ │ - Volume │ NRI Interface │ - Content │ │ │
│ │ mgmt │ │ │ store │ ┌─────▼──┐ │
│ │ - Device │ ▼ │ - Tasks │ │ runc │ │
│ │ plugins│ ┌─────────┐ │ - NRI host │ │ │ │
│ │ - cAdvisor│ │NRI Plugin│ └────────────┘ │ creates│ │
│ └──────────┘ │(your │ │ & runs │ │
│ │ code) │ │ the │ │
│ └─────────┘ │container│ │
│ └────────┘ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Linux Kernel │ │
│ │ namespaces | cgroups v2 | seccomp | AppArmor | eBPF │ │
│ └───────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
Weeks 5-6: OCI & containerd
- OCI Runtime Specification — config.json, rootfs, lifecycle hooks
- OCI Image Specification — layers, manifests, image indexes
- runc source code walkthrough — how it creates namespaces, sets up cgroups, pivots root, execs the process
- containerd architecture — GRPC API, plugins, shim v2, snapshotter, content store
- CRI (Container Runtime Interface) — how kubelet talks to containerd
- Hands-on: Use ctr and crictl to pull images, create containers, inspect namespaces directly
- Code Read: Walk through containerd's container creation path in the source
Weeks 7-8: Kubelet Internals
- Kubelet source code structure — pkg/kubelet/
- Pod lifecycle management — SyncPod, pod workers, status manager
- Container runtime manager (genericRuntimeManager)
- Volume management — attach, mount, unmount, detach lifecycle
- Device plugin framework — GPU, FPGA, custom device allocation
- Resource management — CPU manager, memory manager, topology manager
- cAdvisor integration — how resource metrics are collected
- Build: Custom kubelet plugin or admission webhook in Go
Week 9: NRI (Node Resource Interface)
- NRI specification and purpose — runtime-level hooks for resource management
- NRI plugin API — container create, update, stop events
- Existing NRI plugins — topology-aware scheduling, memory tiering, balloon
- How Netflix likely uses NRI — custom resource policies, workload isolation
- How NRI differs from admission webhooks (node-level vs API-level)
- Build: NRI plugin that enforces custom CPU pinning policy
Week 10: Container Networking & Security
- CNI (Container Network Interface) — how containers get network interfaces
- veth pairs, network namespaces, bridge networking
- iptables/nftables rules that kube-proxy creates
- Container security — seccomp profiles, AppArmor, SELinux, capabilities
- Image security — signing, scanning, admission policies
- Hands-on: Trace a packet from pod A to pod B, documenting every hop
Key Resources
| Resource | Type | Focus |
| containerd source (github.com/containerd/containerd) | Code | The runtime Netflix customizes |
| runc source (github.com/opencontainers/runc) | Code | The OCI reference runtime |
| Kubernetes kubelet source (k8s.io/kubernetes/pkg/kubelet) | Code | Data plane brain |
| NRI repo (github.com/containerd/nri) | Code | Plugin interface you'd extend |
| Container Security (Liz Rice) | Book | Practical container security from the ground up |
| Kubernetes in Action, 2nd Ed (Marko Lukša) | Book | Excellent K8s internals coverage |
| Programming Kubernetes (Hausenblas & Schimanski) | Book | Building K8s-native Go applications |
Netflix-Specific Context: Netflix runs entirely on AWS. Their Titus platform was their original container orchestrator.
They've been migrating to/integrating with Kubernetes. This team specifically maintains the data plane — the node-level
software. Read Netflix's tech blog posts about Titus and their K8s journey. Understanding their migration context will set you apart in interviews.
5. Phase 3 — Capstone Project: KaaS Portal (Weeks 11-16)
This project directly demonstrates the skills Netflix wants: Go proficiency, K8s API mastery, infrastructure automation, and systems thinking. It's also genuinely useful and fun.
Architecture
┌──────────────────────────────────────────────────────┐
│ KaaS Web Portal │
│ (Go backend + HTMX frontend) │
│ │
│ ┌─────────┐ ┌──────────┐ ┌──────────────────┐ │
│ │ Cluster │ │ Workload │ │ Observability │ │
│ │ Manager │ │ Deployer │ │ Dashboard │ │
│ └────┬─────┘ └────┬─────┘ └────────┬─────────┘ │
│ │ │ │ │
│ ┌────▼──────────────▼─────────────────▼──────────┐ │
│ │ Unified K8s Abstraction Layer │ │
│ │ (client-go, controller-runtime, dynamic) │ │
│ └─────┬──────────────┬────────────────┬──────────┘ │
│ │ │ │ │
└────────┼──────────────┼────────────────┼──────────────┘
│ │ │
┌────▼────┐ ┌─────▼─────┐ ┌─────▼─────┐
│ GKE │ │ EKS │ │ AKS │
│ (GCP) │ │ (AWS) │ │ (Azure) │
└─────────┘ └───────────┘ └───────────┘
Feature Roadmap
Weeks 11-12: Foundation
- Go HTTP server with chi or echo router
- Multi-cloud provider interface (Go interfaces for GKE/EKS/AKS)
- Cluster CRUD via cloud provider SDKs
- Kubeconfig management and secure storage
- HTMX frontend for dynamic UI without JS framework
- Authentication (OAuth2 / OIDC)
Weeks 13-14: K8s Integration
- Dynamic client for multi-cluster resource management
- Deploy workloads across clusters (Deployments, Services, Ingress)
- Real-time log streaming via K8s API
- Namespace and RBAC management
- Cost estimation per cluster/workload
- Custom K8s controller for managing KaaS resources (CRDs)
Weeks 15-16: Advanced Features
- Node pool management — scale up/down, instance types
- Cluster health monitoring and alerting (Prometheus metrics)
- Multi-cluster service mesh or federation
- Custom containerd configuration per cluster
- GPU node pool support (relevant to AI/ML compute)
- Terraform/Pulumi provider under the hood for IaC
Stretch Goals
- NRI plugin deployment management
- Custom kubelet configuration profiles
- eBPF-based node diagnostics dashboard
- Cluster upgrade orchestration
- Spot/preemptible instance workload scheduling
- Disaster recovery: cross-cloud cluster failover
Why This Project Is Perfect:
- Demonstrates Go proficiency in a real systems project
- Shows K8s API mastery via client-go and controllers
- Proves multi-cloud infrastructure experience
- The custom containerd/NRI features show runtime-level understanding
- It's a portfolio piece that doubles as a useful tool
- Open-source it to check the "OSS contributions" box
6. Phase 4 — Open Source Contributions & Interview Prep (Weeks 17-22)
Open Source Contribution Strategy
The job posting lists "open source project contribution history" as preferred. Target these repos:
| Project | Why | Entry Points |
| containerd/containerd | Directly relevant to the role | Bug fixes, test improvements, documentation, small features tagged "good first issue" |
| containerd/nri | NRI is explicitly in the job requirements | Example plugins, test coverage, documentation improvements |
| kubernetes/kubernetes | Direct relevance, especially sig-node | sig-node issues, kubelet improvements, test-infra |
| opencontainers/runc | OCI runtime they'd expect you to understand | Bug fixes, test improvements |
Contribution Approach: Start by joining Kubernetes Slack (sig-node channel). Attend sig-node meetings. Pick up issues
labeled good-first-issue or help-wanted. Even small, well-crafted PRs (test fixes, doc improvements) show
you understand the development workflow and codebase. Quality over quantity.
Interview Preparation
System Design Topics (Netflix Focus)
- Design a container orchestration platform (Titus-like)
- Design a multi-tenant Kubernetes cluster with resource isolation
- Design a container image distribution system for 100K+ nodes
- Design a node health monitoring and auto-remediation system
- Design a GPU scheduling system for ML workloads
- Design a container runtime with custom security policies
- Design a zero-downtime node upgrade system for K8s
Coding Interview Topics (Go)
- Implement a basic container runtime (namespaces, cgroups, rootfs)
- Implement a K8s controller with leader election
- Implement a rate limiter with workqueue
- Concurrent systems problems — producer/consumer, fan-out/fan-in
- Data structures and algorithms in Go (standard LeetCode + systems flavor)
- Debug a goroutine leak, deadlock, or race condition
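For the rate-limiter question, a from-scratch token bucket is a common expected answer. This is a sketch of that pattern, not client-go's workqueue limiter:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// TokenBucket is a minimal rate limiter: capacity tokens, refilled at
// rate tokens/second; Allow spends one token if available.
type TokenBucket struct {
	mu       sync.Mutex
	tokens   float64
	capacity float64
	rate     float64 // tokens added per second
	last     time.Time
}

func NewTokenBucket(capacity, rate float64) *TokenBucket {
	return &TokenBucket{tokens: capacity, capacity: capacity, rate: rate, last: time.Now()}
}

// Allow refills lazily based on elapsed time, then tries to spend a token.
func (b *TokenBucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	// Burst of 2 allowed immediately, then refill at 1 request/second.
	tb := NewTokenBucket(2, 1)
	fmt.Println(tb.Allow(), tb.Allow(), tb.Allow()) // true true false
}
```

Lazy refill (computing tokens from elapsed time on each call) avoids a background goroutine, which is worth calling out in an interview.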
Behavioral / Culture (Netflix Specifics)
Netflix's culture is "context not control" — they emphasize:
- Freedom & Responsibility — autonomous decision-making
- High performance — "adequate performance gets a generous severance"
- Radical candor — direct feedback culture
- Context over control — explain why, not how
Prepare stories that demonstrate you thriving with autonomy, making high-judgment calls, and giving/receiving direct feedback.
7. Weekly Schedule Template
Assuming ~15-20 hours/week of study time:
| Day | Focus (2-3 hrs) | Activity |
| Monday | Go Programming | Read + exercises from current chapter/topic. Write code. |
| Tuesday | Container Runtime / K8s Internals | Source code reading, documentation, hands-on labs |
| Wednesday | Go Programming | Build project feature or solve problems in Go |
| Thursday | Linux Performance | Tool practice, debugging exercises, Brendan Gregg material |
| Friday | Project Work (KaaS Portal) | Implement features, write tests, push code |
| Saturday | Project Work + Open Source | Continue project or work on OSS contribution |
| Sunday | Review & System Design | Review week's material, practice one system design problem |
8. Netflix Tech Blog — Required Reading
These posts give you insight into how this team thinks and what they've built:
| Topic | What to Search For | Why |
| Titus | "Titus, the Netflix container management platform" | Their original container orchestrator — understand what they're migrating from |
| Container Runtime | "Netflix container runtime" and "Titus Executor" | Their custom runtime work — direct context for this role |
| Compute | "Auto Scaling Production Services on Titus" | How they think about compute at scale |
| Networking | "Networking in Titus" and container networking posts | Their approach to container networking on AWS |
| Performance | Brendan Gregg's Netflix posts on performance | Performance culture and tools they use |
| Linux | "Netflix and Linux" and kernel-related posts | Their Linux kernel customization approach |
9. Success Metrics — How to Know You're Ready
| Skill | You're Ready When You Can... |
| Go | Write a K8s controller from scratch, debug goroutine leaks with pprof, read K8s source code fluently |
| Container Runtime | Explain the full path from kubectl run to a process executing in a container, including every component involved. Write an NRI plugin. |
| Kubelet | Explain pod lifecycle from kubelet's perspective, how it communicates with containerd via CRI, and how the resource managers work |
| Linux Perf | Given a "containers are slow" report, systematically diagnose whether the issue is CPU, memory, I/O, or network using perf/eBPF/bpftrace |
| System Design | Design a container orchestration platform on a whiteboard, covering scheduling, networking, storage, security, and monitoring |
| Netflix Culture | Articulate how you work with high autonomy; give examples of high-impact decisions you made with incomplete information |
Timeline: 22 weeks (~5.5 months) at 15-20 hrs/week. You can compress this if you dedicate more time, or extend it if
you want to go deeper on any area. The KaaS project is the crown jewel — it gives you a concrete artifact to talk about in interviews
and doubles as a portfolio piece on GitHub.