ISP Services & Internet Infrastructure

How Critical Internet Services Actually Work

The ISP Services You Built (Late 90s/Early 2000s)

Your Experience: Working at ISPs during the dialup-to-broadband transition, you configured the infrastructure that connected millions to the internet. These services haven't fundamentally changed - they've just scaled massively.

What This Guide Covers:

DNS: Domain Name System - The Internet's Phonebook

Why DNS Was Invented (1983)

The Problem

Pre-DNS (1970s-1983): HOSTS.TXT file maintained by SRI-NIC (Stanford Research Institute). Every computer downloaded this file to resolve names.

Problems:

  • Single point of failure (SRI server)
  • Doesn't scale (file grew too large, downloads every night)
  • No namespace management (name conflicts)
  • Manual updates (email changes to SRI admin)

Paul Mockapetris' Solution (1983): Hierarchical, distributed database. No single server knows everything. Delegation enables scalability.

How DNS Works - The Complete Flow

sequenceDiagram participant User as Browser participant Resolver as ISP DNS Resolver participant Root as Root Server (.) participant TLD as TLD Server (.com) participant Auth as Authoritative (example.com) User->>Resolver: Resolve www.example.com Note over Resolver: Check cache (miss) Resolver->>Root: Query www.example.com? Root->>Resolver: Ask .com server (192.5.6.30) Resolver->>TLD: Query www.example.com? TLD->>Resolver: Ask example.com NS (ns1.example.com, 93.184.216.34) Resolver->>Auth: Query www.example.com? Auth->>Resolver: A record: 93.184.216.34, TTL=3600 Resolver->>User: 93.184.216.34 (cache for 1 hour)

Step-by-Step Explanation:

  1. User types www.example.com: Browser asks OS resolver (or configured DNS like 8.8.8.8)
  2. Recursive resolver: Your ISP's DNS server (or public DNS). Does the heavy lifting.
  3. Query Root Server: 13 root servers (a.root-servers.net through m.root-servers.net, actually hundreds of anycast instances). Root doesn't know example.com, but knows who handles .com
  4. Query TLD Server: .com nameservers (managed by Verisign). Don't know example.com, but know its authoritative nameservers
  5. Query Authoritative NS: example.com's nameservers (might be AWS Route53, Cloudflare, or self-hosted). Has the actual record
  6. Return Answer: Resolver caches result (TTL=3600 = 1 hour), returns to user

DNS Record Types

Record Type Purpose Example
A IPv4 address example.com → 93.184.216.34
AAAA IPv6 address example.com → 2606:2800:220:1:248:1893:25c8:1946
CNAME Canonical name (alias) www.example.com → example.com
MX Mail exchanger example.com → mail.example.com (priority 10)
NS Nameserver example.com → ns1.example.com
TXT Arbitrary text (SPF, DKIM, verification) "v=spf1 include:_spf.google.com ~all"
PTR Reverse DNS (IP → name) 34.216.184.93.in-addr.arpa → example.com
SOA Start of Authority (zone metadata) Primary NS, admin email, serial, refresh times
SRV Service location _sip._tcp.example.com → sipserver.example.com:5060

DNS Caching & TTL

TTL (Time To Live) = How long to cache a record

Example: example.com A record, TTL=3600 (1 hour)

  T=0: First query, resolver asks authoritative, caches result
  T=30min: Second query, resolver returns cached result (fast!)
  T=70min: TTL expired, resolver queries authoritative again

Short TTL (60-300 sec): For records that change often (CDNs, failover)
Long TTL (86400 = 1 day): For stable records (reduces query load)

Trade-off: Short TTL = more queries but faster updates
           Long TTL = fewer queries but slow propagation of changes

DNS Protocol

UDP Port 53 (queries/responses)
TCP Port 53 (zone transfers, responses > 512 bytes)

Why UDP? Fast for small queries. Single request/response.
Why TCP fallback? Large responses (DNSSEC, many records), zone transfers (AXFR).

DNS Message Format:
  Header: ID, flags (query/response, recursion desired, authoritative)
  Question: What are you asking? (www.example.com, type A)
  Answer: The actual records
  Authority: NS records for the domain
  Additional: "Glue records" (A records for NS servers)

Glue Records - Solving the Chicken-and-Egg Problem

Problem:
  example.com NS → ns1.example.com
  To resolve example.com, need ns1.example.com's IP
  But ns1.example.com is IN example.com (circular dependency!)

Solution: Glue Records
  Parent (.com server) includes A record for ns1.example.com
  in "Additional" section when returning NS records

  Query: example.com
  Answer from .com server:
    Authority: ns1.example.com
    Additional: ns1.example.com → 93.184.216.1 (glue record)

DNS Security Issues & DNSSEC

DNS Vulnerabilities:

DNSSEC (DNS Security Extensions):

Modern DNS: DoH & DoT

DHCP: Dynamic Host Configuration Protocol

Why DHCP Was Invented (1993)

The Problem: Manually configuring IP, subnet mask, gateway, DNS on every device doesn't scale. Dialup/broadband ISPs needed to assign IPs dynamically to thousands of users.

Predecessor: BOOTP (Bootstrap Protocol) - simpler, no automatic reclamation of addresses.

How DHCP Works: DORA Process

sequenceDiagram participant Client participant Server as DHCP Server Note over Client,Server: DORA Process Client->>Server: DISCOVER (broadcast: who has IPs?) Server->>Client: OFFER (I have 192.168.1.100 for you) Client->>Server: REQUEST (I want 192.168.1.100) Server->>Client: ACK (OK, it's yours for 24 hours) Note over Client: Configured! IP, Gateway, DNS, etc.

Step-by-Step:

  1. DISCOVER: Client broadcasts (255.255.255.255) "I need an IP!" Uses UDP port 67 (server), 68 (client)
  2. OFFER: DHCP server(s) respond with available IP and config
  3. REQUEST: Client chooses one offer (if multiple servers), broadcasts acceptance
  4. ACK: Server confirms, client configures interface

DHCP Options - More Than Just IP

Option 1: Subnet Mask (255.255.255.0)
Option 3: Default Gateway (192.168.1.1)
Option 6: DNS Servers (8.8.8.8, 8.8.4.4)
Option 15: Domain Name (example.com)
Option 42: NTP Servers
Option 66: TFTP Server (for phone/cable modem config)
Option 150: Cisco TFTP Server
Option 51: Lease Time (86400 seconds = 24 hours)

You used Option 66 for cable modem head-ends!
  Modem boots, DHCP gives IP + TFTP server
  Modem downloads config file from TFTP
  Modem registers with CMTS

DHCP Relay Agent

Problem: DHCP uses broadcast, doesn't cross routers

Solution: DHCP Relay (ip helper-address)
  Client broadcasts DISCOVER
  Router receives it, converts to unicast
  Router forwards to DHCP server on different subnet
  Server's response relayed back to client

Why it matters: One central DHCP server can serve many subnets

Lease Management

Lease Time: How long client can use IP (typical: 24 hours - 7 days)

T1 (50% of lease): Client tries to renew with original server
T2 (87.5% of lease): If no response, client broadcasts renewal to any server
Lease expires: Client must stop using IP, restart DORA

Why leases matter: Reclaims IPs from devices that left network
  Without leases, IP pool exhaustion (especially dialup era)

Email: SMTP, POP3, IMAP

How Email Works - The Complete Journey

sequenceDiagram participant Sender as alice@company.com participant SendMTA as company.com MTA participant DNS participant RecvMTA as gmail.com MTA participant Mailbox as Gmail Mailbox participant Recipient as bob@gmail.com Sender->>SendMTA: SMTP (port 587): Send email to bob@gmail.com SendMTA->>DNS: MX lookup for gmail.com DNS->>SendMTA: MX: gmail-smtp-in.l.google.com (priority 5) SendMTA->>RecvMTA: SMTP (port 25): Deliver email RecvMTA->>Mailbox: Store in bob's mailbox Recipient->>Mailbox: POP3/IMAP: Retrieve email Mailbox->>Recipient: Email delivered

SMTP: Simple Mail Transfer Protocol

How SMTP Works:

Sender → Sending MTA (Mail Transfer Agent) → Receiving MTA → Mailbox

SMTP Conversation:
  Client: EHLO company.com
  Server: 250-gmail-smtp-in.l.google.com
  Server: 250-SIZE 35882577
  Server: 250 STARTTLS

  Client: MAIL FROM:
  Server: 250 OK

  Client: RCPT TO:
  Server: 250 OK

  Client: DATA
  Server: 354 Start mail input

  Client: From: alice@company.com
  Client: To: bob@gmail.com
  Client: Subject: Meeting tomorrow
  Client:
  Client: Hi Bob, let's meet at 10am.
  Client: .
  Server: 250 OK Message accepted

  Client: QUIT
  Server: 221 Bye

SMTP Ports:

MX Records & Mail Routing

MX Record: Specifies mail server for domain

$ dig gmail.com MX
gmail.com.  3600  IN  MX  5 gmail-smtp-in.l.google.com.
gmail.com.  3600  IN  MX  10 alt1.gmail-smtp-in.l.google.com.
gmail.com.  3600  IN  MX  20 alt2.gmail-smtp-in.l.google.com.

Lower priority number = higher priority
Try 5 first, if down, try 10, then 20 (fallback/redundancy)

SPF, DKIM, DMARC - Fighting Spam & Spoofing

SPF (Sender Policy Framework): TXT record listing authorized sending IPs

example.com TXT "v=spf1 ip4:192.0.2.0/24 include:_spf.google.com ~all"

Meaning: Emails from example.com should come from:
  - 192.0.2.0/24
  - Google's servers (G Suite/Workspace)
  - ~all = softfail (suspicious but don't reject)

DKIM (DomainKeys Identified Mail): Cryptographic signature in email headers

Sending server signs email with private key
Receiving server verifies with public key (in DNS TXT record)
Proves email hasn't been tampered with

DMARC (Domain-based Message Authentication): Policy for SPF/DKIM failures

example.com TXT "v=DMARC1; p=reject; rua=mailto:dmarc@example.com"

p=reject: Reject emails that fail SPF and DKIM
p=quarantine: Mark as spam
p=none: Just monitor (rua = aggregate reports)

POP3 vs IMAP

Feature POP3 (Port 110/995) IMAP (Port 143/993)
Email Storage Downloads to client, deletes from server (default) Stays on server, synced to clients
Multiple Devices Poor (email on one device only) Excellent (sync across all devices)
Folders Local only Server-side folders, synced
Offline Access Yes (email is local) Depends on client caching
Bandwidth Downloads entire mailbox Downloads headers first, body on demand
Server Storage Minimal (client stores email) High (server stores all email)

Modern Usage: IMAP dominates (Gmail, Outlook, etc.). POP3 mostly obsolete except for legacy systems.

FTP & TFTP: File Transfer

FTP: File Transfer Protocol

How FTP Works:

Active FTP (Original, Problematic)

Control: Client → Server port 21
Data: Server → Client (server initiates!)

Problem: Firewalls block incoming connections to clients
  Client behind NAT/firewall can't receive server's data connection

Passive FTP (Modern, Firewall-Friendly)

Control: Client → Server port 21
Client: PASV command
Server: Responds with IP:Port (e.g., 192.0.2.1:51234)
Data: Client → Server port 51234 (client initiates)

Why it matters: Works through firewalls/NAT (client initiates both connections)

FTP Security Issues

TFTP: Trivial File Transfer Protocol

Why TFTP Exists:

How TFTP Works

UDP Port 69

Read Request (RRQ): Client requests file
Data: Server sends 512-byte blocks
ACK: Client acknowledges each block
Last block < 512 bytes signals end

No authentication, no encryption
Used for: PXE boot, network device configs (routers, switches, cable modems)

Cable Modem Head-End Workflow (Your Experience):

1. Cable modem boots, no config
2. DHCP: Modem gets IP, gateway, DNS, TFTP server (Option 66)
3. TFTP: Modem downloads config file from TFTP server
   - Contains: upload/download speeds, QoS settings, etc.
4. Registration: Modem registers with CMTS (Cable Modem Termination System)
5. Online: Modem ready for customer use

TFTP perfect for this: Simple, fast, doesn't require complex TCP stack in modem firmware

LDAP & Active Directory

LDAP: Lightweight Directory Access Protocol

Why LDAP Exists:

LDAP Structure (DIT: Directory Information Tree)

dc=example,dc=com (root)
  ├─ ou=people
  │   ├─ cn=John Doe,ou=people,dc=example,dc=com
  │   └─ cn=Jane Smith,ou=people,dc=example,dc=com
  └─ ou=groups
      ├─ cn=engineers,ou=groups,dc=example,dc=com
      └─ cn=sales,ou=groups,dc=example,dc=com

Components:
  dc = domain component (example.com → dc=example,dc=com)
  ou = organizational unit (departments, containers)
  cn = common name (users, groups)
  dn = distinguished name (full path to object)

LDAP Operations

Active Directory - Microsoft's LDAP Implementation

AD = LDAP + Kerberos + DNS + SMB + Group Policy

Why AD Dominates Enterprise: Integrates authentication, authorization, and configuration management. Single pane of glass for IT admins.

Caching Techniques

Why Caching Matters

The Problem: Databases, APIs, and origin servers are slow and expensive. Serving every request from source doesn't scale.

The Solution: Cache frequently accessed data closer to users. Trade-off: Freshness vs Performance.

Caching Hierarchy

Layer Location Latency Use Case
Browser Cache Client ~1ms Static assets (CSS, JS, images)
CDN (Edge Cache) Globally distributed 10-50ms Static content, streaming video
Reverse Proxy (Varnish, Nginx) In front of app servers 1-5ms Full page cache, API responses
Application Cache (Redis, Memcached) Same datacenter as app 1-10ms Session data, query results
Database Query Cache Database server 10-100ms Repeated queries

Cache Strategies

1. Cache-Aside (Lazy Loading):

Application checks cache first:
  Hit: Return cached data
  Miss: Query database, store in cache, return data

Best for: Read-heavy workloads, data that changes infrequently

Trade-off: First request always hits database (cold cache)

2. Write-Through:

Write to cache AND database simultaneously:
  Application writes data
  → Write to cache
  → Write to database
  → Return success

Best for: Data that's read immediately after write

Trade-off: Write latency (waiting for both cache and DB)

3. Write-Behind (Write-Back):

Write to cache immediately, database later:
  Application writes data
  → Write to cache (fast!)
  → Async job writes to database later
  → Return success

Best for: Write-heavy workloads, acceptable data loss risk

Trade-off: Data loss if cache crashes before DB write

4. Refresh-Ahead:

Proactively refresh cache before expiration:
  Cache entry has TTL
  Before expiration, background job refreshes from DB
  Avoids cache miss latency

Best for: Predictable access patterns, expensive queries

Trade-off: Wastes resources refreshing unused data

When to Use Which Technology

Technology Best For Not Good For
Redis Session storage, pub/sub, leaderboards, real-time analytics. Supports complex data structures (lists, sets, sorted sets). Large objects (> 1MB), durable storage (primarily in-memory)
Memcached Simple key-value cache, multi-threaded (better CPU utilization), lower memory overhead Complex data structures, persistence, pub/sub
CDN (Cloudflare, AWS CloudFront) Static assets, images, videos, API responses (with proper cache headers), global users User-specific data (unless you use edge computing), real-time data
Varnish HTTP reverse proxy cache, full page cache, handling traffic spikes Complex application logic, user-specific content (without ESI)
Browser Cache Immutable assets (versioned CSS/JS), rarely-changing content Dynamic content, personalized data

Cache Invalidation - The Hard Problem

"There are only two hard things in Computer Science: cache invalidation and naming things." - Phil Karlton

Strategies:

HTTP Cache Headers

Cache-Control: max-age=3600, public
  Browser/CDN can cache for 1 hour, shareable between users

Cache-Control: max-age=3600, private
  Only browser can cache (user-specific data)

Cache-Control: no-store
  Don't cache at all (sensitive data)

ETag: "abc123"
  Content hash. Browser sends If-None-Match, server returns 304 Not Modified if unchanged

Rate Limiting

Why Rate Limiting Exists

The Problem:

Rate Limiting Algorithms

1. Token Bucket (Most Common)

Bucket holds N tokens, refills at rate R tokens/second
Each request consumes 1 token
If tokens available: Allow request, decrement counter
If tokens = 0: Reject request (429 Too Many Requests)

Example: 100 tokens, refill 10/sec
  → Allows bursts of 100 requests
  → Sustained rate of 10 requests/sec

Implementation (Redis):
  INCR user:123:requests
  EXPIRE user:123:requests 60  (reset every minute)
  If counter > limit: Reject

2. Leaky Bucket

Requests enter bucket (queue) at any rate
Requests "leak" out at constant rate

Smooths traffic, no bursts allowed
Good for: QoS, traffic shaping
Bad for: Legitimate bursts (e.g., page load)

3. Fixed Window

Count requests in fixed time window (e.g., per minute)

Example: 100 requests per minute
  00:00 - 00:59 → 100 requests allowed
  01:00 - 01:59 → Counter resets

Problem: Burst at window boundary
  00:59 → 100 requests
  01:00 → 100 requests (200 in 1 second!)

Simplest to implement but least fair

4. Sliding Window

Track requests with timestamps, count within rolling window

Example: 100 requests per minute
  At 01:30, count requests from 00:30 - 01:30

More accurate than fixed window
More expensive (store timestamps, not just counter)

When to Use Which Algorithm

Algorithm Best For Trade-offs
Token Bucket APIs, web services (allows bursts) Most common, good balance
Leaky Bucket Traffic shaping, QoS, video streaming No bursts, can queue requests
Fixed Window Simple rate limiting, low precision OK Burst at boundaries, easy to implement
Sliding Window High-precision rate limiting More memory/CPU, better accuracy

Implementation Technologies

Redis (Recommended):

# Token bucket with Redis
SCRIPT:
  local key = KEYS[1]
  local limit = tonumber(ARGV[1])
  local window = tonumber(ARGV[2])

  local current = redis.call('INCR', key)
  if current == 1 then
    redis.call('EXPIRE', key, window)
  end

  if current > limit then
    return 0  -- Rate limited
  else
    return 1  -- Allowed
  end

# Sliding window with sorted sets
ZADD user:123:requests  
ZREMRANGEBYSCORE user:123:requests 0 
ZCARD user:123:requests  (if > limit: reject)

Application-Level (In-Memory):

API Gateway (Kong, AWS API Gateway):

Best Practices

Firewalls & Access Control

Types of Firewalls

1. Packet Filtering Firewall (Layer 3/4)

How it works: Examines IP header (source/dest IP) and TCP/UDP header (source/dest port). Allows or denies based on rules.

Example ACL (Access Control List):
  Rule 1: Allow TCP from 10.0.0.0/8 to any port 80 (HTTP)
  Rule 2: Allow TCP from any to 192.168.1.100 port 443 (HTTPS to web server)
  Rule 3: Allow UDP from any to 8.8.8.8 port 53 (DNS)
  Rule 4: Deny all

Pros: Fast, low overhead
Cons: No application awareness, easy to spoof source IP

2. Stateful Firewall

How it works: Tracks connection state (TCP handshake, established connections). Automatically allows reply traffic.

Connection Table:
  Source IP:Port | Dest IP:Port | State     | Timeout
  10.0.1.5:5000  | 93.184.2.1:80| ESTABLISHED | 3600
  10.0.1.6:5001  | 1.1.1.1:443  | SYN_SENT    | 60

Outbound SYN → Automatically allow SYN-ACK, ACK (return traffic)
Don't need explicit "allow inbound" rule for replies

Pros: More secure (tracks state), fewer rules needed
Cons: More memory/CPU (state table)

3. Application Layer Firewall (Layer 7)

How it works: Deep packet inspection (DPI). Understands HTTP, SMTP, FTP protocols. Can block based on URL, SQL injection patterns, etc.

Examples:
  Block HTTP requests with "SELECT * FROM" in URL (SQL injection)
  Block access to *.facebook.com
  Allow SMTP but block attachments > 10MB
  Inspect SSL/TLS traffic (decrypt, inspect, re-encrypt)

Pros: Blocks application-specific attacks
Cons: High CPU (decrypt, inspect), privacy concerns (TLS inspection)

4. Web Application Firewall (WAF)

Specialized for HTTP/HTTPS:

Examples: Cloudflare WAF, AWS WAF, ModSecurity

Access Control Lists (ACLs)

Standard ACL (Source IP Only):

access-list 10 permit 10.0.0.0 0.255.255.255
access-list 10 deny any

Applied to interface:
  interface GigabitEthernet0/0
  ip access-group 10 in

Extended ACL (Source/Dest IP, Ports, Protocol):

access-list 100 permit tcp 10.0.0.0 0.255.255.255 any eq 80
access-list 100 permit tcp 10.0.0.0 0.255.255.255 any eq 443
access-list 100 deny ip any any log

More granular control, can specify:
  - Protocol (TCP, UDP, ICMP)
  - Source/dest IP and subnet
  - Source/dest ports
  - Flags (SYN, ACK, etc.)

Firewall Best Practices

NAT vs Firewall

Common misconception: NAT provides security

Reality: NAT hides internal IPs but isn't a firewall. Once a connection is established (port forwarding, UPnP), NAT allows all traffic through. Need firewall rules for actual security.

SSL/TLS: Secure Communications

Why SSL/TLS Was Invented

The Problem (Early Internet - 1990s)

HTTP was plaintext: Anyone on the network path could intercept usernames, passwords, credit cards, emails - everything.

Evolution:

  • 1994: Netscape creates SSL 1.0 (never released due to security flaws)
  • 1995: SSL 2.0 released (flawed, quickly deprecated)
  • 1996: SSL 3.0 (widely adopted, but vulnerable to POODLE attack)
  • 1999: TLS 1.0 (upgrade of SSL 3.0, standardized by IETF)
  • 2006: TLS 1.1 (fixes CBC attacks)
  • 2008: TLS 1.2 (modern standard, SHA-256, better cipher suites)
  • 2018: TLS 1.3 (current standard, faster handshake, removed weak ciphers)

Today: TLS 1.2 and 1.3 are standard. SSL is deprecated but name stuck ("SSL certificate" really means TLS).

What SSL/TLS Provides

Security Goal How TLS Achieves It
Encryption Symmetric encryption (AES-256) for data transfer. Keys exchanged via asymmetric crypto (RSA, ECDHE).
Authentication Server proves identity with certificate signed by trusted CA (Certificate Authority).
Integrity HMAC (Hash-based Message Authentication Code) prevents tampering. Each message authenticated.

TLS Handshake (TLS 1.2) - How It Works

sequenceDiagram participant Client as Browser participant Server as Web Server Note over Client,Server: TLS 1.2 Handshake (4 round trips) Client->>Server: 1. ClientHello
(Supported cipher suites, TLS version, random) Server->>Client: 2. ServerHello
(Chosen cipher, server random, certificate) Note over Client: Verify certificate
(Check CA signature, expiry, hostname) Client->>Server: 3. ClientKeyExchange
(Pre-master secret, encrypted with server's public key) Note over Client,Server: Both derive session key from:
client random + server random + pre-master secret Client->>Server: 4. ChangeCipherSpec + Finished
(Switch to encrypted, verify handshake) Server->>Client: 5. ChangeCipherSpec + Finished Note over Client,Server: Encrypted Application Data Client->>Server: HTTP Request (encrypted) Server->>Client: HTTP Response (encrypted)

Step-by-Step Explanation:

  1. ClientHello: Browser says "I support TLS 1.2, TLS 1.3, cipher suites X, Y, Z" + sends random nonce
  2. ServerHello: Server picks TLS 1.2, cipher suite (e.g., TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384), sends certificate (contains server's public key) + random nonce
  3. Certificate Verification: Browser checks:
    • Is certificate signed by trusted CA? (Chain of trust)
    • Is certificate expired?
    • Does certificate match hostname? (example.com in cert vs URL)
    • Has certificate been revoked? (OCSP/CRL check)
  4. Key Exchange: Browser generates "pre-master secret", encrypts with server's public key (from certificate), sends to server. Only server's private key can decrypt it.
  5. Session Key Derivation: Both sides derive symmetric encryption key from: client random + server random + pre-master secret (same on both sides, but never sent over network!)
  6. Finished: Both send encrypted "Finished" message with hash of all handshake messages (proves nothing was tampered with)
  7. Encrypted Communication: All HTTP data now encrypted with AES-256 using session key

TLS 1.3 Handshake - Faster (1-RTT)

TLS 1.3 improvements:
  - 1 round trip instead of 2 (faster)
  - Removed weak ciphers (RC4, SHA-1, MD5)
  - Forward secrecy required (ECDHE)
  - 0-RTT resumption for returning clients (instant)

Handshake:
  Client → Server: ClientHello + KeyShare (send public key immediately)
  Server → Client: ServerHello + Certificate + KeyShare + Finished
  (Encrypted application data can start immediately)

Why it's faster: Client sends key material in first message (speculative)
instead of waiting for server's certificate first

Certificate Chain of Trust

How browsers trust certificates:

Root CA (e.g., DigiCert Global Root)
  ↓ Signs
Intermediate CA (e.g., DigiCert TLS RSA SHA256 2020 CA1)
  ↓ Signs
Leaf Certificate (www.example.com)

Browser has ~100 Root CAs built-in (hardcoded trust store)
Server sends: Leaf + Intermediate certificates
Browser verifies:
  1. Leaf signed by Intermediate? ✓
  2. Intermediate signed by Root? ✓
  3. Root in browser's trust store? ✓
  → Chain validated, connection trusted

Why intermediates? Root CA private keys are kept offline (air-gapped, HSMs). Compromising root = disaster (every cert issued by that root becomes untrustworthy). Intermediates handle day-to-day signing.

Certificate Components

X.509 Certificate contains:
  - Subject: CN=www.example.com (who owns it)
  - Issuer: CN=DigiCert TLS RSA SHA256 2020 CA1 (who signed it)
  - Public Key: RSA 2048-bit or ECDSA P-256
  - Validity: Not Before / Not After (expiry date)
  - Serial Number: Unique identifier
  - Signature: Issuer's signature (proves certificate hasn't been tampered)
  - SAN (Subject Alternative Names): www.example.com, example.com, api.example.com
  - Key Usage: Digital Signature, Key Encipherment
  - Extended Validation: Organization details (for EV certs, green bar in old browsers)

View certificate:
  $ openssl s_client -connect example.com:443 -showcerts
  $ openssl x509 -in cert.pem -text -noout

Certificate Validation - OCSP & CRL

Problem: What if a certificate is compromised before expiry?

Solution 1: CRL (Certificate Revocation List):

CA publishes list of revoked certificates (serial numbers)
Browser downloads CRL periodically
Problem: Lists get huge, slow to download, privacy leak (who you're connecting to)

Solution 2: OCSP (Online Certificate Status Protocol):

Browser asks CA: "Is certificate serial# XYZ still valid?"
CA responds: Good / Revoked / Unknown
Problem: Privacy leak (CA sees every site you visit), latency, OCSP server downtime

Solution 3: OCSP Stapling:

Server queries OCSP, caches signed response
Server "staples" OCSP response to TLS handshake
Browser validates stapled response (signed by CA, recent timestamp)
Benefits: No client→CA query (privacy!), faster, CA downtime doesn't break sites

Cipher Suites Explained

Example: TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384

Component Meaning
TLS Protocol
ECDHE Key Exchange: Elliptic Curve Diffie-Hellman Ephemeral (forward secrecy - session keys not recoverable even if private key compromised)
RSA Authentication: Server's certificate uses RSA signature
AES_256_GCM Encryption: AES 256-bit in Galois/Counter Mode (authenticated encryption)
SHA384 Hash: SHA-384 for PRF (Pseudo-Random Function) and HMAC

Weak ciphers to avoid: RC4, DES, 3DES, MD5, SHA-1, Export ciphers (512-bit keys)

Modern strong ciphers: AES-256-GCM, ChaCha20-Poly1305, ECDHE/DHE for forward secrecy

Common SSL/TLS Issues

Getting a Certificate

Free: Let's Encrypt (automated, 90-day certs, auto-renewal via certbot)

$ certbot certonly --webroot -w /var/www/html -d example.com -d www.example.com
Certificate saved: /etc/letsencrypt/live/example.com/fullchain.pem
Private key: /etc/letsencrypt/live/example.com/privkey.pem

Auto-renewal: certbot renew (cron job every 12 hours)

Paid: DigiCert, Sectigo, GlobalSign (EV certs, wildcard certs, support, insurance)

Testing TLS Configuration

JWT: JSON Web Tokens for API Authentication

Why JWT Was Invented

The Problem (Pre-2010s)

Session-based authentication: Server stores session data (user ID, roles, etc.) in memory or database. Client gets session ID cookie.

Issues with sessions in distributed systems:

  • State on server: Doesn't scale horizontally (sticky sessions or shared session store required)
  • Mobile apps: Cookies don't work well with native apps
  • Microservices: Every service needs access to session store (tight coupling)
  • Cross-domain: Sessions don't work across different domains (api.example.com vs app.example.com)

JWT Solution (RFC 7519, 2015): Self-contained tokens. All user info in token itself. Stateless - server doesn't store anything.

What is JWT?

JWT = JSON Web Token: A compact, URL-safe token format for securely transmitting information between parties. Digitally signed to prevent tampering.

JWT Structure: Three Parts (Header.Payload.Signature)

Example JWT:
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c

Decoded:

Part 1: HEADER (Base64URL encoded)
{
  "alg": "HS256",    // Algorithm: HMAC-SHA256
  "typ": "JWT"       // Type: JWT
}

Part 2: PAYLOAD (Base64URL encoded)
{
  "sub": "1234567890",           // Subject (user ID)
  "name": "John Doe",            // Custom claim
  "email": "john@example.com",   // Custom claim
  "role": "admin",               // Custom claim
  "iat": 1516239022,             // Issued At (Unix timestamp)
  "exp": 1516242622              // Expiration (1 hour later)
}

Part 3: SIGNATURE
HMACSHA256(
  base64UrlEncode(header) + "." + base64UrlEncode(payload),
  secret  // Server's secret key
)

The signature proves:
  1. Token hasn't been tampered with
  2. Token was issued by someone who knows the secret

Why Signing Exists - The Core Concept

Critical Understanding: JWT is NOT Encrypted

Anyone can decode and read a JWT (it's just Base64, not encryption). The signature doesn't hide the contents - it prevents tampering.

What signing achieves:

  • Integrity: If attacker changes payload (e.g., change role: "user" to role: "admin"), signature becomes invalid. Server detects tampering.
  • Authentication: Only someone with the secret key can create valid signatures. Proves token was issued by your server, not an attacker.

What signing does NOT do:

  • ❌ Confidentiality: Payload is readable by anyone. Don't put passwords, SSNs, credit cards in JWT.
  • ✓ Solution: Use JWE (JSON Web Encryption) if you need encrypted tokens, or just don't put sensitive data in JWT.

Analogy: JWT signature is like a tamper-evident seal on a glass bottle. You can see what's inside (it's not hidden), but if someone opens it and changes the contents, the seal breaks.

How JWT Signing Works

Symmetric Signing (HS256 - HMAC with SHA-256):

Server has secret key: "my-super-secret-key-12345"

Creating JWT:
1. Create header: {"alg":"HS256","typ":"JWT"}
2. Create payload: {"sub":"123","role":"admin","exp":1700000000}
3. Encode both as Base64URL
4. Compute signature:
   signature = HMAC-SHA256(header + "." + payload, secret)
5. Concatenate: header.payload.signature

Verifying JWT:
1. Split token into header, payload, signature
2. Recompute signature using header + payload + secret
3. Compare recomputed signature with token's signature
4. If match: Token is valid ✓
5. If different: Token was tampered ✗

Only someone with the secret can create valid signatures.

Asymmetric Signing (RS256 - RSA with SHA-256):

Server has:
  - Private key (signs tokens, kept secret)
  - Public key (verifies tokens, can be shared)

Creating JWT:
signature = RSA-Sign(header + "." + payload, privateKey)

Verifying JWT:
valid = RSA-Verify(header + "." + payload, signature, publicKey)

Advantage: API servers can verify tokens without knowing signing key
  Auth server: Signs with private key
  API servers: Verify with public key (can't create tokens, only verify)

Use case: Microservices - only auth service has private key

Standard JWT Claims (Payload)

Claim Meaning Example
iss Issuer (who created token) "https://auth.example.com"
sub Subject (user ID) "user123"
aud Audience (who should accept token) "https://api.example.com"
exp Expiration (Unix timestamp) 1700000000 (Nov 14, 2023)
nbf Not Before (token not valid until) 1699999000
iat Issued At (when token created) 1699999000
jti JWT ID (unique identifier) "abc-123-def"

Custom claims: Add anything you need (role, permissions, email, etc.)

Webserver to Backend API Authentication with JWT

The Complete Flow

Architecture: Frontend (React/Vue) → Backend API (Node/Python/Go)

sequenceDiagram participant User as Browser participant Frontend as Web Server
(Frontend App) participant Auth as Auth API
(Login Service) participant API as Backend API
(Protected Resource) Note over User,API: 1. Login Flow User->>Frontend: Navigate to /login Frontend->>User: Display login form User->>Frontend: Submit credentials
(username, password) Frontend->>Auth: POST /api/auth/login
{username, password} Note over Auth: Verify credentials
(check database) Auth->>Auth: Generate JWT
(sign with secret) Auth->>Frontend: 200 OK
{token: "eyJhbG...", user: {...}} Frontend->>Frontend: Store token
(localStorage or httpOnly cookie) Frontend->>User: Redirect to dashboard Note over User,API: 2. Accessing Protected API User->>Frontend: Click "View Profile" Frontend->>API: GET /api/user/profile
Authorization: Bearer eyJhbG... Note over API: Extract token from header Note over API: Verify signature
(using secret key) Note over API: Check expiration Note over API: Extract user ID from payload API->>API: Fetch user data from DB API->>Frontend: 200 OK
{user: {...}} Frontend->>User: Display profile Note over User,API: 3. Token Expired User->>Frontend: Request after 1 hour Frontend->>API: GET /api/data
Authorization: Bearer eyJhbG... Note over API: Verify token
(exp claim expired) API->>Frontend: 401 Unauthorized
{error: "Token expired"} Frontend->>Auth: POST /api/auth/refresh
{refreshToken} Auth->>Frontend: 200 OK
{token: "new-token"} Frontend->>API: Retry GET /api/data
(with new token) API->>Frontend: 200 OK
{data: [...]}

Implementation Example

1. Login Endpoint (Auth Service)

# Python (Flask) - Auth Service
from flask import Flask, request, jsonify
import jwt
import datetime
from werkzeug.security import check_password_hash

app = Flask(__name__)
SECRET_KEY = "your-secret-key-keep-this-safe"  # Store in env var!

@app.route('/api/auth/login', methods=['POST'])
def login():
    data = request.get_json()
    username = data.get('username')
    password = data.get('password')

    # Verify credentials (pseudo-code)
    user = db.query("SELECT * FROM users WHERE username = ?", username)
    if not user or not check_password_hash(user.password_hash, password):
        return jsonify({"error": "Invalid credentials"}), 401

    # Generate JWT
    payload = {
        "sub": str(user.id),           # User ID
        "email": user.email,
        "role": user.role,             # "admin" or "user"
        "iat": datetime.datetime.utcnow(),
        "exp": datetime.datetime.utcnow() + datetime.timedelta(hours=1)
    }

    token = jwt.encode(payload, SECRET_KEY, algorithm="HS256")

    return jsonify({
        "token": token,
        "user": {
            "id": user.id,
            "email": user.email,
            "role": user.role
        }
    }), 200

2. Protected API Endpoint (Backend Service)

# Python (Flask) - Backend API
from flask import Flask, request, jsonify
from functools import wraps
import jwt

app = Flask(__name__)
SECRET_KEY = "your-secret-key-keep-this-safe"  # Same secret!

def require_jwt(f):
    """Decorator to protect routes with JWT"""
    @wraps(f)
    def decorated(*args, **kwargs):
        # Extract token from Authorization header
        auth_header = request.headers.get('Authorization')
        if not auth_header:
            return jsonify({"error": "Missing token"}), 401

        # Expected format: "Bearer eyJhbGciOiJ..."
        try:
            token = auth_header.split(" ")[1]  # Get token after "Bearer "
        except IndexError:
            return jsonify({"error": "Invalid token format"}), 401

        # Verify JWT
        try:
            payload = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])

            # Token is valid, attach user info to request
            request.user_id = payload['sub']
            request.user_role = payload.get('role')

        except jwt.ExpiredSignatureError:
            return jsonify({"error": "Token expired"}), 401
        except jwt.InvalidTokenError:
            return jsonify({"error": "Invalid token"}), 401

        return f(*args, **kwargs)

    return decorated

@app.route('/api/user/profile', methods=['GET'])
@require_jwt  # This route requires valid JWT
def get_profile():
    user_id = request.user_id  # From JWT payload

    # Fetch user from database
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)

    return jsonify({
        "id": user.id,
        "email": user.email,
        "name": user.name,
        "role": user.role
    }), 200

@app.route('/api/admin/users', methods=['GET'])
@require_jwt
def get_all_users():
    # Check if user has admin role
    if request.user_role != 'admin':
        return jsonify({"error": "Forbidden - admin only"}), 403

    users = db.query("SELECT * FROM users")
    return jsonify({"users": users}), 200

3. Frontend Implementation (JavaScript)

// React/Vue/Vanilla JS - Frontend

// Login function
async function login(username, password) {
  const response = await fetch('https://auth.example.com/api/auth/login', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ username, password })
  });

  if (response.ok) {
    const data = await response.json();

    // Store token (Option 1: localStorage)
    localStorage.setItem('token', data.token);

    // Store token (Option 2: httpOnly cookie - more secure)
    // Server sets: Set-Cookie: token=...; HttpOnly; Secure; SameSite=Strict

    return data;
  } else {
    throw new Error('Login failed');
  }
}

// Make authenticated API request
async function fetchUserProfile() {
  const token = localStorage.getItem('token');

  const response = await fetch('https://api.example.com/api/user/profile', {
    method: 'GET',
    headers: {
      'Authorization': `Bearer ${token}`  // Send JWT in header
    }
  });

  if (response.status === 401) {
    // Token expired or invalid, redirect to login
    window.location.href = '/login';
  }

  if (response.ok) {
    const profile = await response.json();
    return profile;
  }
}

// Axios interceptor (automatic token attachment)
axios.interceptors.request.use(config => {
  const token = localStorage.getItem('token');
  if (token) {
    config.headers.Authorization = `Bearer ${token}`;
  }
  return config;
});

// Handle 401 responses globally
axios.interceptors.response.use(
  response => response,
  error => {
    if (error.response?.status === 401) {
      localStorage.removeItem('token');
      window.location.href = '/login';
    }
    return Promise.reject(error);
  }
);

JWT Security Best Practices

Critical Security Considerations

1. Short Expiration Times

  • Access tokens: 15 minutes - 1 hour
  • Refresh tokens: 7 days - 30 days (stored securely, revocable)
  • Why: Stolen JWT can be used until expiration. Short expiry limits damage.

2. Never Store Sensitive Data in JWT

❌ BAD: {"password": "secret123", "ssn": "123-45-6789"}
✓ GOOD: {"sub": "user123", "role": "admin"}

JWT is Base64-encoded, not encrypted. Anyone can decode and read it.

3. Use Strong Secret Keys

❌ BAD: SECRET_KEY = "secret"
✓ GOOD: SECRET_KEY = "a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6q7r8s9t0u1v2w3x4y5z6"

Use: openssl rand -base64 32
Store in environment variables, never hardcode

4. Validate Everything

  • Signature (always)
  • Expiration (exp claim)
  • Issuer (iss claim - prevent token from dev env used in prod)
  • Audience (aud claim - prevent token for API A used for API B)

5. Storage Location

Storage Pros Cons
localStorage Easy to use, persists across tabs Vulnerable to XSS (JavaScript can read it)
sessionStorage Cleared on tab close Still vulnerable to XSS
httpOnly Cookie Not accessible to JavaScript (XSS protection) Vulnerable to CSRF (need CSRF tokens), can't access from different domain
Memory (Redux/Vuex) Cleared on page refresh, XSS resistant User logs out on refresh (bad UX)

Recommendation: httpOnly cookie with SameSite=Strict for web apps. localStorage for mobile/SPA if you trust your XSS protection.

6. HTTPS Only

Always use HTTPS. JWT in HTTP = plaintext password.

7. Token Revocation

Problem: JWT is stateless, can't be revoked before expiry.

Solutions:

  • Short expiry + refresh tokens: Refresh token stored in DB, can be revoked
  • Blacklist: Store revoked token IDs (jti claim) in Redis, check on verify
  • Whitelist: Store active sessions in Redis (defeats stateless purpose, but gives control)

Refresh Token Flow

Why refresh tokens?
  Access token: Short-lived (15 min), sent with every request
  Refresh token: Long-lived (30 days), only sent to refresh endpoint

If access token stolen: Expires in 15 min (limited damage)
If refresh token stolen: Can revoke in database (kill all sessions)

Flow:
1. Login: Get access token (15 min) + refresh token (30 days)
2. API requests: Send access token
3. Access token expires: Frontend gets 401
4. Frontend sends refresh token to /api/auth/refresh
5. Server checks refresh token in database (not revoked?)
6. Server issues new access token
7. Frontend retries request with new access token

Refresh token storage: Database with user_id, token_hash, expires_at
Revoke on logout: DELETE FROM refresh_tokens WHERE user_id = ?

Debugging JWTs

📺 Video Resources

Recommended YouTube Channels & Videos

DNS & Internet Infrastructure

Email (SMTP, POP3, IMAP)

Caching Strategies

Rate Limiting

System Design & Distributed Systems