Distinguished Engineer Study Guide - How the Internet Really Works
The Context: You earned your CCNA at 17 and worked at ISPs during the golden age of internet infrastructure (late 90s/early 2000s). That era built the foundation of today's internet - BGP peering, dialup to broadband transition, cable modem head-ends, and critical services like DNS and email that we take for granted.
For Distinguished Engineer Interviews: They're not testing if you remember ip route 0.0.0.0 0.0.0.0. They want to know:
Your Advantage: You've configured frame relay, BGP peering, and cable modem infrastructure. You understand the why behind these protocols because you've debugged them in production. This guide refreshes that knowledge at the architectural level.
| OSI Layer | TCP/IP Layer | Protocols | Key Function | Addressing |
|---|---|---|---|---|
| 7. Application | Application | HTTP, SMTP, DNS, FTP, TFTP, DHCP | User services & data | URLs, email addresses |
| 6. Presentation | Application | SSL/TLS, ASCII, JPEG | Encryption, encoding | - |
| 5. Session | Application | NetBIOS, PPTP | Session management | - |
| 4. Transport | Transport | TCP, UDP | End-to-end delivery (reliable with TCP) | Port numbers (0-65535) |
| 3. Network | Internet | IP, ICMP, IGMP, IPsec | Routing between networks | IP addresses |
| 2. Data Link | Network Access | Ethernet, PPP, Frame Relay | Hop-to-hop delivery | MAC addresses (48-bit) |
| 1. Physical | Network Access | Cables, fiber, wireless | Bits on the wire | - |
How it works: When you browse to https://example.com:
Why layering matters: Each layer is independent. You can swap Ethernet for WiFi (Layer 2) without changing IP (Layer 3). This modularity enables the internet to evolve - we added IPv6 at Layer 3 without rewriting all applications.
Key Concept: As data moves down the protocol stack (from application to wire), each layer adds its own header with control information. At the receiving end, headers are removed layer by layer (de-encapsulation).
You browse to https://example.com/page.html
┌─────────────────────────────────────────────────────────────────────┐
│ Layer 7: Application Layer │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ HTTP Request: │ │
│ │ GET /page.html HTTP/1.1 │ │
│ │ Host: example.com │ │
│ │ User-Agent: Mozilla/5.0... │ │
│ └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
↓ Add TCP Header
┌─────────────────────────────────────────────────────────────────────┐
│ Layer 4: Transport Layer (TCP Segment) │
│ ┌──────────────────┬────────────────────────────────────────────┐ │
│ │ TCP Header │ HTTP Request │ │
│ │ - Src Port: 54321│ │ │
│ │ - Dst Port: 443 │ │ │
│ │ - Seq: 1000 │ │ │
│ │ - Ack: 5000 │ │ │
│ │ - Flags: PSH,ACK │ │ │
│ │ - Window: 65535 │ │ │
│ │ - Checksum │ │ │
│ └──────────────────┴────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
↓ Add IP Header
┌─────────────────────────────────────────────────────────────────────┐
│ Layer 3: Network Layer (IP Packet) │
│ ┌────────────────┬──────────────────────────────────────────────┐ │
│ │ IP Header │ TCP Segment │ │
│ │ - Ver: 4 │ │ │
│ │ - Src: 10.0.1.5│ │ │
│ │ - Dst: 93.184..│ │ │
│ │ - TTL: 64 │ │ │
│ │ - Protocol: 6 │ (TCP) │ │
│ │ - Checksum │ │ │
│ └────────────────┴──────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
↓ Add Ethernet Header & Trailer
┌─────────────────────────────────────────────────────────────────────┐
│ Layer 2: Data Link Layer (Ethernet Frame) │
│ ┌───────────┬────────────────────────────────────┬──────────────┐ │
│ │ Ethernet │ IP Packet │ Ethernet FCS │ │
│ │ Header │ │ (CRC-32) │ │
│ │ - Dst MAC │ │ │ │
│ │ - Src MAC │ │ │ │
│ │ - Type: │ │ │ │
│ │ 0x0800 │ (IPv4) │ │ │
│ └───────────┴────────────────────────────────────┴──────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
↓ Convert to bits
┌─────────────────────────────────────────────────────────────────────┐
│ Layer 1: Physical Layer │
│ 101010101011110000111100001111... (electrical/optical signals) │
└─────────────────────────────────────────────────────────────────────┘
Byte Position:
0                   6                   12         14
┌───────────────────┬───────────────────┬──────────┬─────────────────────┬──────┐
│ Destination MAC   │ Source MAC        │ EtherType│ Payload (46-1500)   │ FCS  │
│ (6 bytes)         │ (6 bytes)         │ (2 bytes)│                     │ (4B) │
└───────────────────┴───────────────────┴──────────┴─────────────────────┴──────┘
Example:
Dst MAC: ff:ff:ff:ff:ff:ff (broadcast) or specific host MAC
Src MAC: 00:1a:2b:3c:4d:5e (sender's MAC)
EtherType: 0x0800 = IPv4, 0x0806 = ARP, 0x86DD = IPv6
FCS: CRC-32 checksum for error detection
What changes at each hop:
✓ Source & Destination MACs change (rewritten by each router for the next hop; switches forward frames without rewriting them)
✓ FCS recalculated
✗ EtherType stays same (identifies payload type)
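The field layout above maps directly onto a fixed 14-byte unpack. This is a minimal sketch, not a full frame parser: the function name `parse_ethernet_header` and the sample frame are illustrative, and the preamble, 802.1Q tags, and the FCS are not handled.

```python
import struct

ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x86DD: "IPv6"}

def parse_ethernet_header(frame: bytes):
    """Split the first 14 bytes into (dst MAC, src MAC, EtherType name)."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])

    def fmt(mac: bytes) -> str:
        return ":".join(f"{b:02x}" for b in mac)

    return fmt(dst), fmt(src), ETHERTYPES.get(ethertype, hex(ethertype))

# Illustrative frame: broadcast destination, the example source MAC, IPv4 payload.
frame = bytes.fromhex("ffffffffffff" "001a2b3c4d5e" "0800") + b"\x00" * 46
print(parse_ethernet_header(frame))
```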
Bit Position:
0       4       8               16                              32
┌───────┬───────┬───────────────┬───────────────────────────────┐
│Version│  IHL  │  DSCP   │ ECN │         Total Length          │
├───────┴───────┴───────────────┼───────────────┬───────────────┤
│        Identification         │     Flags     │Fragment Offset│
├───────────────┬───────────────┼───────────────────────────────┤
│ Time to Live  │   Protocol    │        Header Checksum        │
├───────────────┴───────────────┼───────────────────────────────┤
│                       Source IP Address                       │
├───────────────────────────────────────────────────────────────┤
│                    Destination IP Address                     │
├───────────────────────────────────────────────────────────────┤
│                       Options (if any)                        │
└───────────────────────────────────────────────────────────────┘
Key Fields:
- Version: 4 (IPv4) or 6 (IPv6)
- IHL: Header length (5 = 20 bytes, can be up to 15 = 60 bytes with options)
- DSCP: Quality of Service marking
- Total Length: Entire packet size (header + data), max 65,535 bytes
- TTL: Decremented by each router, prevents loops
- Protocol: 6 = TCP, 17 = UDP, 1 = ICMP
- Checksum: Covers header only (NOT data)
- Source/Dest IP: 32-bit addresses
What changes at each hop:
✓ TTL decremented by 1 at each router
✓ Header Checksum recalculated (because TTL changed)
✓ Fragmentation fields (if packet fragmented)
✗ Source & Destination IPs unchanged (end-to-end addressing)
✗ Protocol field unchanged
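To make the key fields concrete, here is a small sketch that unpacks a 20-byte IPv4 header (no options). The helper name `parse_ipv4_header` and the sample bytes are illustrative assumptions; the sample is a made-up UDP packet from 192.168.0.1 to 192.168.0.199.

```python
import struct

PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}

def parse_ipv4_header(header: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header."""
    (ver_ihl, _dscp_ecn, total_len, _ident, _flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", header[:20])
    return {
        "version": ver_ihl >> 4,                 # high nibble
        "header_bytes": (ver_ihl & 0x0F) * 4,    # IHL counts 32-bit words
        "total_length": total_len,
        "ttl": ttl,
        "protocol": PROTOCOLS.get(proto, str(proto)),
        "checksum": hex(checksum),
        "src": ".".join(map(str, src)),
        "dst": ".".join(map(str, dst)),
    }

sample = bytes.fromhex("4500007300004000" "4011b861" "c0a80001" "c0a800c7")
print(parse_ipv4_header(sample))
```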
Bit Position:
0                               16                              32
┌───────────────────────────────┬───────────────────────────────┐
│          Source Port          │       Destination Port        │
├───────────────────────────────┴───────────────────────────────┤
│                        Sequence Number                        │
├───────────────────────────────────────────────────────────────┤
│                     Acknowledgment Number                     │
├────┬────┬─────────┬───────────────────────────────────────────┤
│Data│Rsvd│  Flags  │                Window Size                │
│Off │    │         │                                           │
├────┴────┴─────────┼───────────────────────────────────────────┤
│     Checksum      │              Urgent Pointer               │
├───────────────────┴───────────────────────────────────────────┤
│                       Options (if any)                        │
└───────────────────────────────────────────────────────────────┘
Key Fields:
- Source/Dest Port: 16-bit (0-65535), identifies application
  - Well-known: 0-1023 (HTTP=80, HTTPS=443, SSH=22)
  - Registered: 1024-49151
  - Dynamic/Private: 49152-65535
- Sequence Number: Byte position in stream (for ordering, duplicate detection)
- Acknowledgment: Next expected byte (cumulative ACK)
- Flags: SYN, ACK, FIN, RST, PSH, URG
- Window Size: Flow control (how much receiver can accept)
- Checksum: Covers header AND data (pseudo-header includes IP addresses)
What changes at each hop:
✗ NOTHING! TCP is end-to-end
✗ Routers don't modify TCP headers (they don't even look at them)
✓ NAT routers modify source/dest port (exception for address translation)
Bit Position:
0                               16                              32
┌───────────────────────────────┬───────────────────────────────┐
│          Source Port          │       Destination Port        │
├───────────────────────────────┼───────────────────────────────┤
│            Length             │           Checksum            │
└───────────────────────────────┴───────────────────────────────┘
(then data follows)
Key Fields:
- Source/Dest Port: Same as TCP
- Length: Size of UDP header + data (min 8 bytes)
- Checksum: Optional in IPv4 (mandatory in IPv6)
Why so simple?
UDP = "User Datagram Protocol" = thin wrapper around IP
No connection state, no reliability, no ordering
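Because the header is just four 16-bit fields, building one takes a single `struct.pack` call. A minimal sketch, assuming IPv4 where a checksum of 0 means "no checksum computed"; `build_udp_datagram` is a hypothetical helper name.

```python
import struct

def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prepend the 8-byte UDP header; Length covers header + data."""
    length = 8 + len(payload)
    checksum = 0  # assumption: IPv4, checksum left unset
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload

datagram = build_udp_datagram(54321, 53, b"hello")
print(len(datagram), struct.unpack("!HHHH", datagram[:8]))
```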
Network Topology:
[Your Laptop] --- [Home Router] --- [ISP Router] --- [Internet] --- [Web Server]
10.0.1.5 10.0.1.1/ 203.0.113.1 Multiple 93.184.216.34
203.0.113.5 hops
Step-by-Step Header Changes:
1. Your Laptop (10.0.1.5) creates HTTP request:
Layer 7: HTTP GET /page.html
Layer 4: TCP Src=54321 Dst=443 Seq=1000 Ack=5000
Layer 3: IP Src=10.0.1.5 Dst=93.184.216.34 TTL=64 Protocol=6
Layer 2: Eth Src=laptop_MAC Dst=router_MAC Type=0x0800
→ Sends to home router (default gateway)
2. Home Router receives frame:
- Layer 2: Strips Ethernet header (arrived at destination MAC)
- Layer 3: Checks IP dest (93.184.216.34) - not local, must forward
- Decrements TTL: 64 → 63
- Recalculates IP checksum
- NAT Translation: Changes IP Src 10.0.1.5 → 203.0.113.5 (router's public IP)
Changes TCP Src port 54321 → 50001 (NAT mapping)
Recalculates TCP checksum (pseudo-header includes IP)
- Layer 2: Looks up next hop MAC (ARP for ISP router)
- Adds new Ethernet header: Src=router_MAC Dst=ISP_router_MAC
→ Sends to ISP router
3. ISP Router receives frame:
- Layer 2: Strips Ethernet header
- Layer 3: Checks IP dest (93.184.216.34)
- Decrements TTL: 63 → 62
- Recalculates IP checksum
- Consults routing table, finds next hop
- Layer 2: Adds new Ethernet header with next hop's MAC
→ Forwards toward destination
[Multiple routers repeat this process...]
- Each router: TTL decremented, IP checksum recalculated
- Each router: Ethernet header completely replaced (new Src/Dst MACs)
- TCP and Application data: UNTOUCHED
4. Web Server (93.184.216.34) receives frame:
- Layer 2: Strips Ethernet header
- Layer 3: Checks IP dest - it's me! Remove IP header
- Layer 4: Checks TCP dest port 443 - HTTPS process listening
- Removes TCP header, passes HTTP data to web server
- Web server processes: GET /page.html
Response follows reverse path:
- Server → Client
- NAT router translates destination back: 203.0.113.5:50001 → 10.0.1.5:54321
| Layer | Header | Changes Each Hop? | Why |
|---|---|---|---|
| Layer 2 | Ethernet | ✓ Completely replaced | MAC addresses are link-local (next hop only). Each segment has different Src/Dst MACs. |
| Layer 3 | IP | ✓ TTL, Checksum modified | TTL prevents loops (decremented each hop). Checksum recalculated because TTL changed. |
| Layer 4 | TCP/UDP | ✗ Unchanged (mostly) | End-to-end protocol. Routers don't modify. Exception: NAT changes ports for address translation. |
| Layer 7 | Application | ✗ Unchanged | Delivered intact from source to destination. |
Outbound (Internal → External):
Original Packet:
┌─────────────────────────────────────────────────┐
│ IP: Src=10.0.1.5 Dst=93.184.216.34 TTL=64 │
│ TCP: Src=54321 Dst=443 Checksum=0xABCD │
│ HTTP: GET /page.html │
└─────────────────────────────────────────────────┘
After NAT Router:
┌─────────────────────────────────────────────────┐
│ IP: Src=203.0.113.5 Dst=93.184.216.34 TTL=63 │ ← Src IP changed
│ TCP: Src=50001 Dst=443 Checksum=0xXYZ │ ← Src port changed, checksum recalc
│ HTTP: GET /page.html │ ← Unchanged
└─────────────────────────────────────────────────┘
NAT Translation Table:
Internal ←→ External
10.0.1.5:54321 ←→ 203.0.113.5:50001
10.0.1.5:54322 ←→ 203.0.113.5:50002
10.0.1.8:33445 ←→ 203.0.113.5:50003
Inbound (External → Internal):
Router looks up 203.0.113.5:50001 → 10.0.1.5:54321
Rewrites IP dest and TCP dest port, recalculates checksums
Why TCP checksum must be recalculated: TCP checksum includes a pseudo-header with source/destination IP addresses. When NAT changes IP address, TCP checksum becomes invalid and must be recalculated.
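The translation table can be sketched as two dictionaries, one per direction. This is an illustrative model only: the class name `NatTable`, the sequential port allocation starting at 50001, and the absence of timeouts or per-protocol state are assumptions, not how any particular router implements NAT.

```python
class NatTable:
    """Toy port-address-translation table for one public IP."""

    def __init__(self, public_ip: str, first_port: int = 50001):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}  # (internal_ip, internal_port) -> external_port
        self.inbound = {}   # external_port -> (internal_ip, internal_port)

    def translate_outbound(self, src_ip: str, src_port: int):
        key = (src_ip, src_port)
        if key not in self.outbound:          # first packet of a new flow
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def translate_inbound(self, dst_port: int):
        # None means no mapping exists, so the packet would be dropped.
        return self.inbound.get(dst_port)

nat = NatTable("203.0.113.5")
print(nat.translate_outbound("10.0.1.5", 54321))
print(nat.translate_inbound(50001))
```

The inbound lookup returning `None` is also why unsolicited inbound connections fail behind NAT: there is no table entry to translate them.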
A: When TTL reaches 0, the router drops the packet and sends ICMP "Time Exceeded" (Type 11) back to the source. This is how traceroute works: it sends packets with increasing TTL values and receives an ICMP error from each hop.
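A: Incremental checksum update. The IP checksum is a one's-complement sum, so a router can patch it from the old and new values of the one 16-bit word that changed instead of summing the entire header (RFC 1624). Decrementing TTL by 1 lowers the TTL/Protocol word by 0x0100, so the stored checksum rises by 0x0100 (with end-around carry). Hardware is optimized for this.
// Pseudo-code for incremental IP checksum update (RFC 1624: HC' = ~(~HC + ~m + m'))
// TTL is the high byte of a 16-bit header word shared with Protocol
uint16_t old_word = (old_ttl << 8) | protocol;
uint16_t new_word = (new_ttl << 8) | protocol;
uint32_t sum = (uint16_t)~checksum + (uint16_t)~old_word + new_word;
sum = (sum & 0xFFFF) + (sum >> 16);   // fold carries (one's-complement add)
sum = (sum & 0xFFFF) + (sum >> 16);
checksum = (uint16_t)~sum;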
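A quick way to convince yourself the incremental update is equivalent to a full recalculation is to run both on the same header. The sketch below follows the RFC 1624 formula HC' = ~(~HC + ~m + m'); the function names and the sample header (a made-up UDP packet with TTL 64) are illustrative assumptions.

```python
import struct

def fold16(total: int) -> int:
    """Fold carries back into the low 16 bits (one's-complement addition)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def full_checksum(header: bytes) -> int:
    """Recompute from scratch, treating the checksum field (bytes 10-11) as zero."""
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    words = struct.unpack(f"!{len(zeroed) // 2}H", zeroed)
    return ~fold16(sum(words)) & 0xFFFF

def decrement_ttl(header: bytes) -> bytes:
    """Decrement TTL and patch the checksum without re-summing the header."""
    if header[8] == 0:
        raise ValueError("TTL already 0; a router would drop this packet")
    old_word, old_csum = struct.unpack("!HH", header[8:12])  # TTL|Protocol, checksum
    new_word = old_word - 0x0100                              # TTL is the high byte
    new_csum = ~fold16((~old_csum & 0xFFFF) + (~old_word & 0xFFFF) + new_word) & 0xFFFF
    return header[:8] + struct.pack("!HH", new_word, new_csum) + header[12:]

sample = bytes.fromhex("4500007300004000" "4011b861" "c0a80001" "c0a800c7")
forwarded = decrement_ttl(sample)
patched = struct.unpack("!H", forwarded[10:12])[0]
print(forwarded[8], hex(patched), hex(full_checksum(forwarded)))
```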
A: Defense in depth. Each layer protects against different error sources:
The Problem: Early networks (ARPANET) used unreliable protocols. Packets could be lost, duplicated, or arrive out of order. Applications had to handle this complexity.
The Solution: TCP provides reliable, ordered, error-checked delivery of a stream of bytes. Applications can treat the network as a reliable pipe.
How it works:
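Why three-way? Two-way is insufficient. Each side must have its initial sequence number acknowledged: with only two messages, the server would never learn whether the client received its SYN-ACK. The third message also lets the server discard connections opened by old duplicate SYNs.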
Window Size = Receiver's buffer space
Sender can transmit Window Size bytes before waiting for ACK
Receiver advertises window size in every ACK
Example:
Window Size = 4 KB
Sender transmits bytes 1000-4999 (4 KB)
Must wait for ACK before sending more
Receiver ACKs byte 5000, advertises window=8KB
Sender can now send bytes 5000-12999
How it works: TCP uses a sliding window for flow control. Receiver advertises how much buffer space it has. Sender doesn't overflow receiver by respecting window size. As receiver consumes data and frees buffer space, window "slides" forward, allowing more data.
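The sender-side rule boils down to one subtraction: advertised window minus bytes already in flight. A minimal sketch of that rule, replaying the example above; `sendable_bytes` is a hypothetical helper, and congestion control is deliberately ignored here.

```python
def sendable_bytes(next_seq: int, last_ack: int, window: int) -> int:
    """Bytes the sender may still transmit without overrunning the receiver."""
    in_flight = next_seq - last_ack        # sent but not yet acknowledged
    return max(0, window - in_flight)

# Replay of the example (the text treats 4 KB as 4000 bytes).
print(sendable_bytes(next_seq=1000, last_ack=1000, window=4000))  # 4000: may send bytes 1000-4999
print(sendable_bytes(next_seq=5000, last_ack=1000, window=4000))  # 0: must wait for an ACK
print(sendable_bytes(next_seq=5000, last_ack=5000, window=8000))  # 8000: may send bytes 5000-12999
```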
Problem: Flow control prevents overwhelming the receiver, but what about the network? If sender blasts packets faster than routers can forward, queues overflow, packets drop.
Solution: TCP congestion control (invented 1988 after "congestion collapse"):
Why it matters: TCP's congestion control is why the internet doesn't collapse under load. Every TCP connection "backs off" when it detects congestion (packet loss), sharing bandwidth fairly.
Four-Way Termination:
Client: FIN (I'm done sending)
Server: ACK (OK, I received your FIN)
Server: FIN (I'm done sending too)
Client: ACK (OK, I received your FIN)
TIME_WAIT State:
After sending final ACK, client waits 2 x MSL (Maximum Segment Lifetime)
Why? In case final ACK is lost - server will retransmit FIN
Prevents old duplicate segments from interfering with new connections
Interview Question: "Why does client have to wait in TIME_WAIT?"
Answer: If the final ACK is lost, the server will retransmit its FIN. If the client closed immediately and a new connection reused the same port, the old FIN might confuse the new connection. TIME_WAIT ensures old segments die before port is reused.
Why UDP exists: TCP's reliability has overhead: handshakes, ACKs, retransmissions, ordering. Some applications prefer speed over reliability:
UDP Header (8 bytes vs TCP's 20+ bytes):
Source Port (16 bits) | Dest Port (16 bits)
Length (16 bits) | Checksum (16 bits)
Data...
How it works: UDP is a thin wrapper around IP. It adds port numbers (for multiplexing) and an optional checksum. No handshake, no acknowledgments, no retransmissions. Send-and-hope.
The Problem: Classful networking (Class A, B, C) wasted address space:
The Solution: CIDR eliminated classes. Use any prefix length needed. Aggregate routes to reduce table size.
192.168.1.0/24 means:
Network: 192.168.1.0
Subnet Mask: 255.255.255.0 (24 ones, 8 zeros)
Usable IPs: 192.168.1.1 - 192.168.1.254 (254 hosts)
Network Address: 192.168.1.0 (first address)
Broadcast: 192.168.1.255 (last address)
10.0.0.0/8 means:
Network: 10.0.0.0
Mask: 255.0.0.0
Usable: 10.0.0.1 - 10.255.255.254 (16,777,214 hosts!)
172.16.50.0/23 means:
Mask: 255.255.254.0 (23 ones)
Covers: 172.16.50.0 - 172.16.51.255 (510 hosts)
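These values can be checked with Python's standard `ipaddress` module. A small sketch; the `describe` helper is an illustrative name, and the "usable hosts" arithmetic assumes an ordinary subnet (prefix /30 or shorter).

```python
import ipaddress

def describe(cidr: str) -> dict:
    """Summarize a CIDR block the way the examples above do."""
    net = ipaddress.ip_network(cidr)
    return {
        "network": str(net.network_address),
        "mask": str(net.netmask),
        "broadcast": str(net.broadcast_address),
        "first_host": str(net.network_address + 1),
        "last_host": str(net.broadcast_address - 1),
        "usable_hosts": net.num_addresses - 2,  # minus network + broadcast
    }

for block in ("192.168.1.0/24", "10.0.0.0/8", "172.16.50.0/23"):
    print(block, describe(block))
```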
Problem: You have a large network (e.g., 192.168.0.0/16 = 65,536 addresses). Broadcasting to all hosts would flood the network. Need to divide into smaller broadcast domains.
Solution: Subnetting divides a large network into smaller, manageable networks.
Benefits:
/24 = 255.255.255.0 = 256 addresses (254 usable)
/25 = 255.255.255.128 = 128 addresses (126 usable)
/26 = 255.255.255.192 = 64 addresses (62 usable)
/27 = 255.255.255.224 = 32 addresses (30 usable)
/28 = 255.255.255.240 = 16 addresses (14 usable)
/29 = 255.255.255.248 = 8 addresses (6 usable)
/30 = 255.255.255.252 = 4 addresses (2 usable - point-to-point links)
/31 = 255.255.255.254 = 2 addresses (2 usable - RFC 3021, point-to-point)
/32 = 255.255.255.255 = 1 address (host route)
Formula:
- Total addresses = 2^(32 - prefix length)
- Usable hosts = Total - 2 (network address + broadcast address)
Exception: /31 has 2 usable (no network/broadcast for point-to-point)
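The formula and its exceptions fit in a few lines. A sketch, with `address_counts` as an illustrative name; the /32 case is treated as a single usable host address, matching the table above.

```python
def address_counts(prefix_len: int):
    """Return (total addresses, usable host addresses) for an IPv4 prefix."""
    total = 2 ** (32 - prefix_len)
    if prefix_len == 31:      # RFC 3021 point-to-point: both addresses usable
        usable = 2
    elif prefix_len == 32:    # host route: the single address itself
        usable = 1
    else:
        usable = total - 2    # minus network and broadcast addresses
    return total, usable

for p in (24, 26, 30, 31, 32):
    print(f"/{p}", address_counts(p))
```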
Scenario: Divide 192.168.1.0/24 into 4 equal subnets
Step 1: Determine how many bits needed
- 4 subnets = 2^2, need 2 bits
- New prefix = /24 + 2 = /26
Step 2: Calculate subnet size
- /26 = 255.255.255.192
- 2^(32-26) = 64 addresses per subnet
- 64 - 2 = 62 usable hosts per subnet
Step 3: Calculate subnet ranges
Increment: 64 (subnet size)
Subnet 1: 192.168.1.0/26
Network: 192.168.1.0
First host: 192.168.1.1
Last host: 192.168.1.62
Broadcast: 192.168.1.63
Subnet 2: 192.168.1.64/26
Network: 192.168.1.64
First host: 192.168.1.65
Last host: 192.168.1.126
Broadcast: 192.168.1.127
Subnet 3: 192.168.1.128/26
Network: 192.168.1.128
First host: 192.168.1.129
Last host: 192.168.1.190
Broadcast: 192.168.1.191
Subnet 4: 192.168.1.192/26
Network: 192.168.1.192
First host: 192.168.1.193
Last host: 192.168.1.254
Broadcast: 192.168.1.255
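The same split can be reproduced with `ipaddress`. A minimal sketch: `split_equal` is a hypothetical helper that rounds the requested subnet count up to a power of two, just as Step 1 does by hand.

```python
import ipaddress

def split_equal(cidr: str, count: int):
    """Divide a network into at least `count` equal subnets."""
    net = ipaddress.ip_network(cidr)
    extra_bits = (count - 1).bit_length()   # 4 subnets -> 2 borrowed bits
    return list(net.subnets(prefixlen_diff=extra_bits))

for subnet in split_equal("192.168.1.0/24", 4):
    print(subnet, "broadcast", subnet.broadcast_address)
```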
Scenario: Given 192.168.10.0/24, create subnets for:
- Department A: 100 hosts
- Department B: 50 hosts
- Department C: 25 hosts
- 3 point-to-point links: 2 hosts each
Step 1: List requirements largest to smallest
1. Department A: 100 hosts → need 128 addresses → /25 (126 usable)
2. Department B: 50 hosts → need 64 addresses → /26 (62 usable)
3. Department C: 25 hosts → need 32 addresses → /27 (30 usable)
4. Link 1-3: 2 hosts each → need 4 addresses → /30 (2 usable)
Step 2: Allocate sequentially (start from base address)
Dept A: 192.168.10.0/25
Range: 192.168.10.0 - 192.168.10.127
Usable: 192.168.10.1 - 192.168.10.126
Broadcast: 192.168.10.127
Dept B: 192.168.10.128/26 (next available: 128)
Range: 192.168.10.128 - 192.168.10.191
Usable: 192.168.10.129 - 192.168.10.190
Broadcast: 192.168.10.191
Dept C: 192.168.10.192/27 (next available: 192)
Range: 192.168.10.192 - 192.168.10.223
Usable: 192.168.10.193 - 192.168.10.222
Broadcast: 192.168.10.223
Link 1: 192.168.10.224/30 (next available: 224)
Range: 192.168.10.224 - 192.168.10.227
Usable: 192.168.10.225 - 192.168.10.226
Broadcast: 192.168.10.227
Link 2: 192.168.10.228/30
Range: 192.168.10.228 - 192.168.10.231
Usable: 192.168.10.229 - 192.168.10.230
Link 3: 192.168.10.232/30
Range: 192.168.10.232 - 192.168.10.235
Usable: 192.168.10.233 - 192.168.10.234
Remaining: 192.168.10.236 - 192.168.10.255 (available for growth)
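The largest-first allocation above can be automated. This is a sketch under simple assumptions: requirements are (name, host count) pairs, every subnet reserves network and broadcast addresses (so point-to-point links get a /30, not a /31), and the helper names `prefix_for_hosts` and `vlsm_allocate` are illustrative.

```python
import ipaddress

def prefix_for_hosts(hosts: int) -> int:
    """Longest prefix whose subnet still has `hosts` usable addresses."""
    host_bits = 2
    while 2 ** host_bits - 2 < hosts:
        host_bits += 1
    return 32 - host_bits

def vlsm_allocate(base_cidr: str, requirements):
    """Allocate subnets sequentially from the base address, largest first."""
    base = ipaddress.ip_network(base_cidr)
    cursor = int(base.network_address)
    plan = []
    for name, hosts in sorted(requirements, key=lambda r: r[1], reverse=True):
        prefix = prefix_for_hosts(hosts)
        size = 2 ** (32 - prefix)
        cursor = (cursor + size - 1) // size * size   # align to the block size
        subnet = ipaddress.ip_network((cursor, prefix))
        if not subnet.subnet_of(base):
            raise ValueError(f"{base_cidr} is too small for {name}")
        plan.append((name, str(subnet)))
        cursor += size
    return plan

needs = [("Dept A", 100), ("Dept B", 50), ("Dept C", 25),
         ("Link 1", 2), ("Link 2", 2), ("Link 3", 2)]
for name, subnet in vlsm_allocate("192.168.10.0/24", needs):
    print(name, subnet)
```

Sorting largest first is what keeps every block naturally aligned; allocating small subnets first can leave gaps that the larger ones cannot use.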
Quick host calculation:
Hosts needed → Prefix length
2 hosts → /30 (or /31 for point-to-point)
6 hosts → /29
14 hosts → /28
30 hosts → /27
62 hosts → /26
126 hosts → /25
254 hosts → /24
Magic number method (for /25-/30):
1. Find the interesting octet (where subnet mask isn't 0 or 255)
2. Magic number = 256 - subnet mask value
3. Subnets increment by magic number
Example: /26 (255.255.255.192)
Magic number = 256 - 192 = 64
Subnets: 0, 64, 128, 192
Example: /27 (255.255.255.224)
Magic number = 256 - 224 = 32
Subnets: 0, 32, 64, 96, 128, 160, 192, 224
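The magic number method is a one-liner once the mask octet is known. A small sketch; `subnet_starts` is an illustrative name and takes the value of the interesting octet.

```python
def subnet_starts(mask_octet: int):
    """Subnet boundaries in the interesting octet: multiples of 256 - mask."""
    magic = 256 - mask_octet
    return list(range(0, 256, magic))

print(subnet_starts(192))  # /26
print(subnet_starts(224))  # /27
```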
Determine if IP is in subnet:
Question: Is 192.168.1.75 in 192.168.1.64/26?
Method 1: Calculate range
192.168.1.64/26 → 192.168.1.64 - 192.168.1.127
75 is between 64 and 127 → YES
Method 2: Binary AND
192.168.1.75 = 11000000.10101000.00000001.01001011
255.255.255.192 = 11111111.11111111.11111111.11000000 (/26 mask)
Result = 11000000.10101000.00000001.01000000 = 192.168.1.64
Matches network address → YES
Find broadcast address:
Given: 192.168.1.64/26
Method: Network address + (total addresses - 1)
192.168.1.64 + 63 = 192.168.1.127
Or: Next network - 1
Next network = 192.168.1.128
Broadcast = 192.168.1.128 - 1 = 192.168.1.127
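Both calculations reduce to integer arithmetic on the 32-bit address. A minimal sketch of the binary-AND membership test and the "network + size - 1" broadcast rule; the helper names are illustrative.

```python
import ipaddress

def network_of(ip: str, prefix_len: int) -> ipaddress.IPv4Address:
    """AND the address with the mask to get its network address."""
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return ipaddress.IPv4Address(int(ipaddress.IPv4Address(ip)) & mask)

def broadcast_of(ip: str, prefix_len: int) -> ipaddress.IPv4Address:
    """Network address + (total addresses - 1)."""
    return network_of(ip, prefix_len) + (2 ** (32 - prefix_len) - 1)

def in_subnet(ip: str, network: str, prefix_len: int) -> bool:
    return network_of(ip, prefix_len) == ipaddress.IPv4Address(network)

print(in_subnet("192.168.1.75", "192.168.1.64", 26))
print(broadcast_of("192.168.1.64", 26))
```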
Definition: Combine multiple contiguous networks into a single, larger network.
Why: Reduces routing table size, improves routing efficiency.
Example 1: Simple aggregation
Given: Four consecutive /24 networks
192.168.0.0/24
192.168.1.0/24
192.168.2.0/24
192.168.3.0/24
Can summarize as: 192.168.0.0/22
How to calculate:
1. Convert to binary:
192.168.0.0 = 11000000.10101000.00000000.00000000
192.168.1.0 = 11000000.10101000.00000001.00000000
192.168.2.0 = 11000000.10101000.00000010.00000000
192.168.3.0 = 11000000.10101000.00000011.00000000
2. Find common prefix (left to right):
First 22 bits are identical → /22
3. Summary: 192.168.0.0/22
Covers: 192.168.0.0 - 192.168.3.255
Example 2: Non-power-of-2
Can we aggregate?
10.1.0.0/24
10.1.1.0/24
10.1.2.0/24
Binary:
10.1.0.0 = 00001010.00000001.00000000.00000000
10.1.1.0 = 00001010.00000001.00000001.00000000
10.1.2.0 = 00001010.00000001.00000010.00000000
Common bits: 22 bits
But 10.1.0.0/22 also includes 10.1.3.0/24 (which we don't have)
Solution:
10.1.0.0/23 (covers 10.1.0.0 + 10.1.1.0)
10.1.2.0/24 (separate)
Or accept over-summarization: 10.1.0.0/22 (if 10.1.3.0 not used elsewhere)
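Both styles of summarization can be checked in code. In this sketch, `summarize` uses the standard library's `ipaddress.collapse_addresses` for exact aggregation, while `covering_supernet` (an illustrative helper) finds the common prefix by XOR-ing the lowest and highest addresses, which may over-summarize.

```python
import ipaddress

def summarize(cidrs):
    """Exact aggregation: never covers address space that was not listed."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

def covering_supernet(cidrs):
    """Smallest single prefix covering every input (may include extra space)."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    lo = min(int(n.network_address) for n in nets)
    hi = max(int(n.broadcast_address) for n in nets)
    prefix = 32 - (lo ^ hi).bit_length()    # count of identical leading bits
    return str(ipaddress.ip_network((lo, prefix), strict=False))

four = ["192.168.0.0/24", "192.168.1.0/24", "192.168.2.0/24", "192.168.3.0/24"]
three = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24"]
print(summarize(four), covering_supernet(four))
print(summarize(three), covering_supernet(three))
```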
Point-to-Point Links (Router to Router):
Use /30 (2 usable IPs) or /31 (RFC 3021)
Example: 10.0.0.0/30
10.0.0.0 - Network (can't use on /30)
10.0.0.1 - Router A
10.0.0.2 - Router B
10.0.0.3 - Broadcast (can't use on /30)
Or /31: 10.0.0.0/31
10.0.0.0 - Router A (no network address on /31)
10.0.0.1 - Router B (no broadcast on /31)
Loopback Interfaces:
Use /32 (single host)
Example: 10.1.1.1/32 (host-only route)
DMZ (Demilitarized Zone):
Small subnet for public-facing servers
Example: 192.168.100.0/28 (14 hosts)
Web server: 192.168.100.1
Mail server: 192.168.100.2
DNS server: 192.168.100.3
Office Departments:
Size based on employee count + growth
Engineering (200 people): 10.10.1.0/24 (254 hosts)
Sales (50 people): 10.10.2.0/26 (62 hosts)
Admin (20 people): 10.10.2.64/27 (30 hosts)
Why it matters: Efficient subnetting is critical for network design. Proper CIDR/VLSM reduces waste and simplifies routing. ISPs use route aggregation to keep global routing tables manageable.
10.0.0.0/8 (10.0.0.0 - 10.255.255.255) - 16.7M addresses
172.16.0.0/12 (172.16.0.0 - 172.31.255.255) - 1M addresses
192.168.0.0/16 (192.168.0.0 - 192.168.255.255) - 65K addresses
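A membership test against these three blocks (the RFC 1918 ranges) is easy to write with `ipaddress`. This sketch checks the three ranges explicitly; `is_rfc1918` is an illustrative name.

```python
import ipaddress

RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_rfc1918(ip: str) -> bool:
    """True if the address falls inside one of the three private blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918)

for ip in ("10.0.1.5", "172.31.255.255", "172.32.0.1", "93.184.216.34"):
    print(ip, is_rfc1918(ip))
```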
Why private addresses exist: IPv4 has only about 4.3 billion addresses (2^32). Private addresses + NAT let many internal devices share a few public IPs.
How NAT works:
Why NAT matters: Postponed the IPv4 exhaustion crisis. Downside: breaks end-to-end connectivity, complicates peer-to-peer apps, gaming, VoIP (requires NAT traversal techniques like STUN).
| Characteristic | Distance Vector (RIP, EIGRP) | Link State (OSPF, IS-IS) |
|---|---|---|
| Information Shared | Distance & direction to networks ("I can reach 10.1.0.0 in 3 hops via Router B") | Complete network topology (every router's links and costs) |
| Algorithm | Bellman-Ford | Dijkstra's Shortest Path |
| What Routers Know | Next hop & distance, not full path | Full topology map, calculates best path |
| Convergence | Slower (routing by rumor) | Faster (everyone has same database) |
| CPU/Memory | Lower | Higher (stores entire topology) |
| Scalability | Limited (RIP max 15 hops) | High (with areas/hierarchy) |
| Count to Infinity Problem | Yes (mitigated by split horizon, poison reverse) | No |
Distance Vector: Simpler, less overhead. Good for small networks. RIP was standardized in 1988 (RFC 1058), when routers had minimal CPU/memory.
Link State: Better for large networks. OSPF (1989) was designed as routers became more powerful and networks grew larger. Faster convergence critical for enterprise/ISP networks.
How RIP works:
Count to Infinity Problem:
Network: A -- B -- C
C has network 10.1.0.0
C goes down
B's route to 10.1.0.0 expires (timeout)
A still thinks B can reach 10.1.0.0 (hasn't updated yet)
A advertises to B: "I can reach 10.1.0.0 in 2 hops"
B believes it, updates: "I can reach 10.1.0.0 in 3 hops via A"
Loop! Counts up to 16, then declares unreachable
Mitigation: Split horizon, poison reverse, hold-down timers
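The loop is easy to reproduce in a few lines. This is a deliberately simplified sketch: only routers A and B are modeled, updates alternate strictly, and no split horizon or hold-down timer is applied. The function name and starting metrics are illustrative.

```python
INFINITY = 16  # RIP's "unreachable" metric

def count_to_infinity(a_metric: int = 2, b_metric: int = INFINITY):
    """Replay A and B trading stale routes for 10.1.0.0 after C fails."""
    rounds = []
    while a_metric < INFINITY or b_metric < INFINITY:
        b_metric = min(INFINITY, a_metric + 1)  # B believes A's stale route
        a_metric = min(INFINITY, b_metric + 1)  # A's route is via B, so it follows
        rounds.append((a_metric, b_metric))
    return rounds

for a, b in count_to_infinity():
    print(f"A={a:2d}  B={b:2d}")
```

With split horizon, A would never advertise the route back to B, so the first round would not happen at all.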
Why RIP is obsolete: 15-hop limit, slow convergence (minutes), inefficient (broadcasts full table every 30 sec). OSPF and EIGRP replaced it.
How OSPF works:
OSPF Areas:
Area 0 (Backbone)
├─ Area 1
├─ Area 2
└─ Area 3
Why areas? Scalability. Instead of every router knowing about every link in the entire network, routers only know details about their area. ABRs (Area Border Routers) summarize between areas.
OSPF Metrics: Cost = 100 Mbps / Interface Bandwidth
Serial 1.544 Mbps (T1): Cost = 64
Ethernet 10 Mbps: Cost = 10
Fast Ethernet 100 Mbps: Cost = 1
Gigabit 1000 Mbps: Cost = 1 (need to adjust reference bandwidth)
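The cost formula, including the floor at 1 that makes Gigabit indistinguishable from Fast Ethernet, can be sketched as follows. `ospf_cost` is an illustrative helper; the 10,000 Mbps reference in the last line is just an example of an adjusted value, not a recommended setting.

```python
def ospf_cost(bandwidth_mbps: float, reference_mbps: float = 100) -> int:
    """Cost = reference bandwidth / interface bandwidth, truncated, minimum 1."""
    return max(1, int(reference_mbps / bandwidth_mbps))

for name, bw in (("T1", 1.544), ("Ethernet", 10), ("Fast Ethernet", 100), ("Gigabit", 1000)):
    print(name, ospf_cost(bw), ospf_cost(bw, reference_mbps=10_000))
```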
Why OSPF is used: Fast convergence (typically seconds, sub-second with tuning), no hop limit, efficient updates (only changes are flooded), supports VLSM/CIDR. Industry standard for enterprise networks.
Cisco's hybrid protocol (distance vector with link-state features):
Why EIGRP matters: Combines best of distance vector (simplicity) and link state (fast convergence). Proprietary to Cisco until 2013, when Cisco opened the specification (later published as informational RFC 7868). Still widely deployed in Cisco-heavy environments.
Interior vs Exterior Routing:
Why BGP is path-vector, not distance-vector: Can't just use "shortest path." Need to implement routing policies, prevent loops across autonomous systems, and allow political/business decisions to override technical optimality.
BGP Session Establishment:
TCP port 179 (BGP runs over TCP for reliability)
Exchange OPEN messages (AS number, BGP version, hold time)
Exchange full routing table (all known prefixes)
Send incremental updates as changes occur
Send KEEPALIVE every 60 seconds (hold time 180 sec)
| Attribute | Purpose | Example |
|---|---|---|
| AS_PATH | List of ASes the route traversed. Loop prevention & path selection. | 65001 65002 65003 (originated in AS 65003) |
| NEXT_HOP | IP address of next router | 192.0.2.1 |
| LOCAL_PREF | Preference within AS (higher = better) | 200 (prefer customer routes over peers) |
| MED | Multi-Exit Discriminator. Suggest which entry point neighbor should use | 100 |
| ORIGIN | How route was learned (IGP > EGP > Incomplete) | IGP |
| COMMUNITY | Tags for policy application | NO_EXPORT (don't advertise to eBGP peers) |
BGP chooses best path using these criteria (in order):
Why this order matters: Business policy (LOCAL_PREF) trumps technical metrics. ISPs prefer sending traffic out the cheapest link, not the fastest.
Why iBGP full mesh? iBGP routers don't re-advertise routes learned from other iBGP peers (loop prevention). So every iBGP router must peer with every other. N routers = N(N-1)/2 sessions. Doesn't scale - use route reflectors.
Instead of full mesh:
RR (Route Reflector)
├─ Client 1
├─ Client 2
├─ Client 3
└─ Client 4
RR reflects routes learned from one client to other clients
Reduces sessions from N(N-1)/2 (full mesh) to roughly N (one per client)
Scenario: A router or link is unstable - goes up, down, up, down (circuit flapping). Each change triggers BGP updates across the entire internet. In the 1990s, a single flapping route could cause global instability - routers spending all CPU processing updates, unable to forward traffic.
How Route Dampening Works:
Example (T in minutes):
T=0: Route flaps, penalty = 1000
T=1: Flaps again, penalty = 2000 → SUPPRESSED
T=5: No flaps, penalty decays to ~1700
T=15: No flaps, penalty decays to ~1000 (half-life)
T=30: Penalty decays to 500 → Un-suppressed (below 750 threshold)
If route continues flapping while suppressed, penalty keeps growing, extends suppression time.
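The decay in the example is a plain exponential with a half-life. A sketch, assuming a 15-minute half-life and a reuse threshold of 750 as in the example; the function names are illustrative, and real implementations differ in timer granularity and maximum suppress time.

```python
import math

def decayed_penalty(penalty: float, minutes: float, half_life: float = 15.0) -> float:
    """Penalty remaining after `minutes` with no further flaps."""
    return penalty * 0.5 ** (minutes / half_life)

def minutes_until_reuse(penalty: float, reuse: float = 750.0,
                        half_life: float = 15.0) -> float:
    """Time for the penalty to decay to the reuse threshold."""
    if penalty <= reuse:
        return 0.0
    return half_life * math.log2(penalty / reuse)

for t in (0, 4, 15, 29):
    print(t, round(decayed_penalty(2000, t)))
print(round(minutes_until_reuse(2000), 1))
```

Each additional flap adds to the penalty before it has decayed, which is why a route that keeps flapping stays suppressed much longer.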
Why it matters: Route dampening prevents unstable routes from destabilizing the entire internet. Trade-off: Legitimate route changes might be delayed. Modern dampening is more conservative than 1990s implementations.
Definition: Time from network change (link down, new route) until all routers have consistent routing information.
Convergence Components:
Typical Convergence Times:
Techniques to Improve Convergence:
Why convergence matters: During convergence, traffic is black-holed (routers have inconsistent routing tables, send packets in loops or to down interfaces). Fast convergence = less downtime.
Preamble (7 bytes) | SFD (1 byte) | Dest MAC (6) | Source MAC (6) | Type/Len (2) | Data (46-1500) | FCS (4)
Preamble: 10101010... (clock synchronization)
SFD (Start Frame Delimiter): 10101011 (marks the start of the frame)
Type/Length: 0x0800 = IPv4, 0x0806 = ARP, 0x86DD = IPv6
FCS (Frame Check Sequence): CRC-32 for error detection
MAC Address Format: 48 bits = 12 hex digits (e.g., 00:1A:2B:3C:4D:5E)
Why CRCs exist: Detect errors introduced during transmission (electrical noise, interference, bad cables).
How CRC works:
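Why not just checksum? CRCs catch more errors: all single-bit errors, all double-bit errors (for messages shorter than the polynomial's cycle length, which covers any Ethernet frame), all burst errors no longer than the polynomial degree (32 bits for CRC-32), and, when the polynomial has (x + 1) as a factor, any odd number of bit errors.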
CRC-32 Polynomial: x³² + x²⁶ + x²³ + ... + x + 1
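Python's `zlib.crc32` uses this same generator polynomial, so it can demonstrate the single-bit-error guarantee. This is a sketch only: the on-the-wire bit ordering of the Ethernet FCS is not modeled, and `fcs` and `flip_bit` are illustrative helper names.

```python
import zlib

def fcs(data: bytes) -> int:
    """CRC-32 over the given bytes (same polynomial as the Ethernet FCS)."""
    return zlib.crc32(data) & 0xFFFFFFFF

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with one bit inverted."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)

payload = b"GET /page.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
good = fcs(payload)
undetected = sum(fcs(flip_bit(payload, i)) == good for i in range(len(payload) * 8))
print(hex(good), "undetected single-bit errors:", undetected)
```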
Interview question: "Why doesn't Ethernet retransmit on CRC error?"
Answer: Ethernet is Layer 2 - best-effort delivery only. TCP (Layer 4) handles retransmissions end-to-end. Separating concerns allows Ethernet to be simple and fast.
Why VLANs exist:
802.1Q VLAN Tagging:
Original Ethernet Header: Dest MAC | Source MAC | Type | Data | FCS
802.1Q Tagged: Dest MAC | Source MAC | 0x8100 | Priority (3 bits) + DEI (1 bit) + VLAN ID (12 bits) | Type | Data | FCS
VLAN IDs: 1-4094 (VLAN 1 = default; 0 and 4095 reserved)
Trunk vs Access Ports:
How it works: Switch receives frame on access port, tags it with VLAN ID, forwards on trunk. Receiving switch checks tag, only forwards to ports in same VLAN. Removes tag before sending to access port.
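Tagging amounts to inserting four bytes after the source MAC. A minimal sketch of what a switch does when a frame leaves on a trunk: the helper names are illustrative, the input frame is assumed to be untagged and without its FCS, and a real switch would recompute the FCS afterwards.

```python
import struct

def dot1q_tag(vlan_id: int, priority: int = 0, dei: int = 0) -> bytes:
    """Build the 4-byte tag: TPID 0x8100 + priority(3) | DEI(1) | VLAN ID(12)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be 1-4094")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", 0x8100, tci)

def tag_frame(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert the tag between the source MAC and the original EtherType."""
    return frame[:12] + dot1q_tag(vlan_id, priority) + frame[12:]

untagged = bytes.fromhex("ffffffffffff" "001a2b3c4d5e" "0800") + b"\x00" * 46
tagged = tag_frame(untagged, vlan_id=100, priority=5)
print(tagged[12:18].hex())
```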
Scenario: Three switches in a triangle, all connected. PC sends broadcast. Each switch floods broadcast out all ports. Frame goes in a loop forever, exponentially multiplying. Network melts down in seconds (broadcast storm).
Why loops exist: Redundancy. If one link fails, want alternate path. But Ethernet has no TTL (unlike IP) - loops are catastrophic.
How STP Works:
STP Port States:
Blocking (up to 20 sec, max age): Discards frames, listens to BPDUs
Listening (15 sec): Builds topology, no forwarding
Learning (15 sec): Learns MAC addresses, no forwarding
Forwarding: Normal operation
Disabled: Administratively down
Convergence time: ~50 seconds (too slow for modern networks)
RSTP (Rapid STP): Converges in < 6 seconds using proposal/agreement mechanism.
Why STP matters: Prevents catastrophic loops while maintaining redundancy. Every enterprise network uses it. Understanding STP is crucial for troubleshooting "why is my network slow?" (answer: often spanning tree reconvergence).
| Layer | Mechanism | Coverage | Action on Error |
|---|---|---|---|
| Layer 2 | CRC-32 (Ethernet FCS) | Single frame | Discard silently |
| Layer 3 | IP Header Checksum | IP header only | Discard silently (no ICMP error is generated) |
| Layer 4 | TCP/UDP Checksum | Header + data | TCP: discard; the sender retransmits. UDP: discard (checksum is optional over IPv4) |
| Application | MD5, SHA hashes | Entire file/message | Application-specific |
Why multiple layers? Defense in depth. Each layer catches different error types:
MTU (Maximum Transmission Unit): Largest packet size a link can carry
IP Fragmentation:
Large packet (3000 bytes of data + 20-byte IP header) encounters link with MTU 1500:
Router fragments into:
Fragment 1: 1480 bytes data + 20 byte IP header (offset=0, MF=1)
Fragment 2: 1480 bytes data + 20 byte IP header (offset=1480, MF=1)
Fragment 3: 40 bytes data + 20 byte IP header (offset=2960, MF=0)
Offsets are shown in bytes; the header field counts 8-byte units (1480 → 185, 2960 → 370).
Reassembled at destination.
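The fragment sizes follow mechanically from the MTU. A sketch of the arithmetic, assuming a 20-byte header with no options; `fragment` is an illustrative name. Every fragment except the last must carry a multiple of 8 data bytes because the Fragment Offset field counts 8-byte units.

```python
def fragment(payload_len: int, mtu: int = 1500, header_len: int = 20):
    """Split `payload_len` bytes of IP data into per-fragment descriptions."""
    max_data = (mtu - header_len) // 8 * 8   # round down to a multiple of 8
    fragments = []
    offset = 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        more = offset + size < payload_len
        fragments.append({
            "data": size,
            "offset_bytes": offset,
            "offset_field": offset // 8,     # value carried in the header
            "MF": int(more),                 # More Fragments flag
        })
        offset += size
    return fragments

for f in fragment(3000):
    print(f)
```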
Problems with fragmentation:
Path MTU Discovery:
Why it matters: Fragmentation can cause mysterious performance problems. "Ping works but SSH doesn't" often means PMTUD is broken (ICMP blocked).