When to Use What

A Decision Guide for Data Structures, Algorithms & System Design Patterns

Data Structures

Hash Table / HashMap

IF YOU SEE: Need O(1) lookup by key, frequency counting, deduplication, grouping by key, two-sum style problems, caching key-value pairs
USE: Hash Table / HashMap / Dictionary
WHY: O(1) average insert, lookup, delete. Perfect for key-based access when you don't need ordering.
TRADE-OFF: No ordering, O(n) worst case with collisions, more memory than arrays
EXAMPLES: LRU cache, symbol tables, database indexes (hash index), counting word frequency, detecting duplicates
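Two of the patterns above (two-sum and frequency counting) reduce to a single dictionary pass; a minimal Python sketch:

```python
from collections import Counter

def two_sum(nums, target):
    """One pass with a value -> index map: for each element, check
    whether its complement was already seen. O(n) time, O(n) space."""
    seen = {}
    for i, x in enumerate(nums):
        if target - x in seen:
            return [seen[target - x], i]
        seen[x] = i
    return None

# Frequency counting: Counter is a dict subclass built for exactly this.
freq = Counter("mississippi")
```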

Array / Dynamic Array

IF YOU SEE: Need O(1) access by index, sequential iteration, cache-friendly processing, storing ordered elements, sliding window problems
USE: Array / ArrayList / Vector
WHY: O(1) random access, excellent cache locality, simple and memory-efficient
TRADE-OFF: O(n) insert/delete in middle, fixed size (or amortized resize cost)
EXAMPLES: Pixel buffers, lookup tables, ring buffers, sliding window algorithms

Linked List

IF YOU SEE: Frequent insertions/deletions at arbitrary positions, need to maintain insertion order with O(1) removal, implementing LRU cache (with hash map)
USE: Doubly Linked List
WHY: O(1) insert/delete when you have a pointer to the node, no shifting of elements
TRADE-OFF: O(n) search, poor cache locality, extra memory for pointers
EXAMPLES: LRU cache (HashMap + DLL), undo history, music playlist navigation
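A sketch of the HashMap + DLL combination, using Python's `OrderedDict` as a stand-in (it maintains insertion order and supports O(1) move-to-end and pop-oldest, which is exactly what the linked list provides):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```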

Stack (LIFO)

IF YOU SEE: Matching parentheses, undo/redo, DFS traversal, expression evaluation, backtracking, "most recent" access pattern, nested structures
USE: Stack
WHY: O(1) push/pop, naturally handles nested/recursive structures, last-in-first-out
TRADE-OFF: Only access top element, O(n) to search
EXAMPLES: Function call stack, browser back button, parsing expressions, valid parentheses check
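The valid-parentheses check is the canonical stack example: push openers, pop and match on closers. A short Python sketch:

```python
def is_valid(s):
    """Return True if every bracket in s is properly nested and matched."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in s:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack  # leftover openers mean an unmatched bracket
```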

Queue (FIFO)

IF YOU SEE: BFS traversal, task scheduling, maintaining order of arrival, producer-consumer pattern, level-order tree traversal
USE: Queue / Deque
WHY: O(1) enqueue/dequeue, first-in-first-out guarantees fairness
TRADE-OFF: Only access front/back, O(n) to search
EXAMPLES: BFS, job queues (Celery, SQS), print queue, request buffering

Heap / Priority Queue

IF YOU SEE: "K largest/smallest", scheduling by priority, median finding, merge K sorted lists, Dijkstra's algorithm, "top K" problems
USE: Min-Heap or Max-Heap (Priority Queue)
WHY: O(log n) insert, O(1) peek min/max, O(log n) extract min/max. Efficient for maintaining extremes.
TRADE-OFF: O(n) search for arbitrary element, not sorted (only root is min/max)
EXAMPLES: Task scheduler, Dijkstra's shortest path, find K closest points, running median
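For "K largest" problems, the standard trick is a min-heap of size K: the root is the smallest of the current top K, so any larger element replaces it. A sketch with `heapq`:

```python
import heapq

def k_largest(nums, k):
    """Return the k largest values in descending order, O(n log k)."""
    heap = []  # min-heap holding the current top-k candidates
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # pop root, push x in one step
    return sorted(heap, reverse=True)
```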

Binary Search Tree (BST) / Balanced Tree

IF YOU SEE: Need sorted order + fast lookup, range queries, predecessor/successor queries, sorted iteration
USE: Balanced BST (Red-Black Tree, AVL Tree) or TreeMap/TreeSet
WHY: O(log n) insert/search/delete + maintains sorted order. Supports range queries.
TRADE-OFF: Slower than hash table for simple lookups, more complex implementation
EXAMPLES: Database indexes (B-Tree), in-memory sorted collections, interval trees

B-Tree / B+ Tree

IF YOU SEE: Disk-based storage, database indexes, need to minimize disk I/O, range scans on large datasets
USE: B-Tree or B+ Tree
WHY: High branching factor minimizes tree height → fewer disk reads. Leaf linking enables efficient range scans.
TRADE-OFF: More complex than BST, overhead for small datasets
EXAMPLES: PostgreSQL/MySQL indexes, filesystem directories (ext4, NTFS), key-value stores

Trie (Prefix Tree)

IF YOU SEE: Autocomplete, prefix matching, spell checking, IP routing (longest prefix match), word search in dictionary
USE: Trie
WHY: O(k) lookup where k = key length. Shared prefixes save space. Natural for prefix queries.
TRADE-OFF: High memory overhead (many pointers), only useful for string/sequence data
EXAMPLES: Search autocomplete, DNS lookup, phone directory, word games
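A minimal trie sketch in Python (class and method names are illustrative): each edge consumes one character, so lookup cost depends only on key length, not on how many words are stored.

```python
class TrieNode:
    __slots__ = ("children", "is_word")
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def starts_with(self, prefix):
        """O(k) in the prefix length: walk one edge per character."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return True
```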

Graph (Adjacency List)

IF YOU SEE: Relationships between entities, networks, paths/routes, dependencies, social connections, sparse connections
USE: Adjacency List
WHY: O(V + E) space, efficient for sparse graphs (most real-world graphs), fast neighbor iteration
TRADE-OFF: O(degree) to check if edge exists, slower than matrix for dense graphs
EXAMPLES: Social networks, road maps, web crawling, dependency resolution

Bloom Filter

IF YOU SEE: "Probably exists" checks, avoiding expensive lookups, membership testing at scale, can tolerate false positives but NOT false negatives
USE: Bloom Filter
WHY: O(k) lookup, extremely space-efficient (~10 bits per element for 1% false positive). "No" is definitive.
TRADE-OFF: False positives possible, cannot delete elements, cannot retrieve stored values
EXAMPLES: Database query optimization (avoid disk reads), malicious URL checking, CDN cache checks, spell checkers
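A toy Bloom filter sketch (sizes are illustrative, not tuned): the k bit positions are derived from one SHA-256 digest via double hashing, a common trick for simulating k independent hash functions.

```python
import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=7):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit for simplicity

    def _positions(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item):
        # False is definitive ("definitely absent"); True is only "probably".
        return all(self.bits[p] for p in self._positions(item))
```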

Skip List

IF YOU SEE: Need sorted data with simpler implementation than balanced tree, concurrent access, probabilistic balance
USE: Skip List
WHY: O(log n) average operations, simpler than red-black trees, easy to make lock-free
TRADE-OFF: Probabilistic (not guaranteed O(log n)), more space than BST
EXAMPLES: Redis sorted sets, LevelDB/RocksDB memtables, concurrent data structures

Data Structure Quick Reference

Problem Pattern             | Data Structure      | Key Operations
O(1) key lookup             | Hash Table          | get, put, delete
Sorted data + range queries | BST / TreeMap       | floor, ceiling, range
Min/max tracking            | Heap                | insert, extractMin/Max
LIFO / nested structures    | Stack               | push, pop, peek
FIFO / level-by-level       | Queue               | enqueue, dequeue
Prefix matching             | Trie                | insert, startsWith
Relationships / paths       | Graph               | addEdge, neighbors, BFS/DFS
Membership at scale         | Bloom Filter        | add, mightContain
Disk-based indexes          | B-Tree              | search, insert, range
O(1) insert + O(1) remove   | HashMap + DLL (LRU) | get, put, evict

Algorithms

Binary Search

IF YOU SEE: Sorted array, "find position", "find first/last", monotonic function, "minimum/maximum that satisfies condition"
USE: Binary Search
WHY: O(log n) - eliminates half the search space each iteration
TRADE-OFF: Requires sorted data or monotonic property
EXAMPLES: Search in sorted array, find insert position, search rotated array, find peak element, capacity scheduling
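A "find first position" variant in Python (the same half-open-interval template covers insert position and first/last occurrence):

```python
def lower_bound(a, target):
    """First index i with a[i] >= target; equals len(a) if none.
    Invariant: the answer always lies in [lo, hi]."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < target:
            lo = mid + 1   # answer is strictly to the right of mid
        else:
            hi = mid       # mid itself may be the answer
    return lo
```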

Two Pointers

IF YOU SEE: Sorted array, find pairs summing to target, remove duplicates, partition array, palindrome check, container with most water
USE: Two Pointers (start/end or slow/fast)
WHY: O(n) single pass, O(1) space, exploits sorted property or positional relationships
TRADE-OFF: Usually requires sorted input or specific structure
EXAMPLES: Two sum (sorted), 3sum, remove duplicates, linked list cycle detection, merge sorted arrays

Sliding Window

IF YOU SEE: Contiguous subarray/substring, "maximum/minimum of size K", "longest substring with condition", stream processing
USE: Sliding Window (fixed or variable size)
WHY: O(n) - each element enters and exits window once, avoids recomputation
TRADE-OFF: Only works for contiguous sequences
EXAMPLES: Max sum subarray of size K, longest substring without repeating chars, minimum window substring
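A variable-size window sketch for "longest substring without repeating characters": grow the right edge, and jump the left edge past any repeated character. Each index enters and leaves the window once.

```python
def longest_unique_substring(s):
    """Length of the longest substring of s with all-distinct characters."""
    last_seen = {}       # char -> most recent index
    left = best = 0
    for right, ch in enumerate(s):
        if ch in last_seen and last_seen[ch] >= left:
            left = last_seen[ch] + 1   # shrink past the duplicate
        last_seen[ch] = right
        best = max(best, right - left + 1)
    return best
```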

BFS (Breadth-First Search)

IF YOU SEE: Shortest path (unweighted), level-order traversal, "minimum steps", nearest neighbor, spreading/infection simulation
USE: BFS with Queue
WHY: Explores level by level, guarantees shortest path in unweighted graphs
TRADE-OFF: O(V + E) time, O(V) space for queue (can be large for wide graphs)
EXAMPLES: Shortest path in maze, word ladder, binary tree level order, rotten oranges, social network degrees
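A BFS sketch over an adjacency-list dict: the first time the goal is dequeued, it was reached via a minimum-edge path, because BFS explores strictly level by level.

```python
from collections import deque

def shortest_path_length(graph, start, goal):
    """Number of edges on a shortest path from start to goal, or -1."""
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)           # mark on enqueue, not dequeue
                queue.append((nxt, dist + 1))
    return -1  # unreachable
```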

DFS (Depth-First Search)

IF YOU SEE: Explore all paths, detect cycles, topological sort, connected components, backtracking, tree traversals (in/pre/post-order)
USE: DFS with Stack/Recursion
WHY: Natural for exhaustive search, uses O(depth) stack space, simple recursive implementation
TRADE-OFF: Doesn't guarantee shortest paths, recursion can overflow the stack on deep graphs
EXAMPLES: Detect cycle, topological sort, number of islands, path sum, generate permutations

Dijkstra's Algorithm

IF YOU SEE: Shortest path with non-negative weights, single-source shortest path, network routing
USE: Dijkstra's Algorithm with Min-Heap
WHY: O((V + E) log V) with heap, optimal for graphs with non-negative weights
TRADE-OFF: Doesn't work with negative weights (use Bellman-Ford instead)
EXAMPLES: GPS navigation, network routing (OSPF), cheapest flights
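A compact lazy-deletion Dijkstra sketch with `heapq` (stale heap entries are skipped rather than decreased in place, which is the usual Python idiom):

```python
import heapq

def dijkstra(graph, source):
    """graph: node -> list of (neighbor, weight), weights >= 0.
    Returns a dict of shortest distances from source."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path was already found
        for nxt, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist
```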

Dynamic Programming

IF YOU SEE: Optimal substructure + overlapping subproblems, "minimum/maximum cost", "number of ways", "is it possible", sequence alignment
USE: Dynamic Programming (top-down memoization or bottom-up tabulation)
WHY: Avoids recomputation of subproblems, reduces exponential to polynomial time
TRADE-OFF: Extra space for memoization table, need to identify state and transitions
EXAMPLES: Fibonacci, knapsack, longest common subsequence, edit distance, coin change
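A bottom-up tabulation sketch for coin change, showing the two things DP requires you to identify: the state (the amount) and the transition (which coin was used last).

```python
def coin_change(coins, amount):
    """Fewest coins summing to amount, or -1 if impossible.
    dp[a] = min coins for amount a; O(amount * len(coins))."""
    INF = float("inf")
    dp = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and dp[a - c] + 1 < dp[a]:
                dp[a] = dp[a - c] + 1
    return dp[amount] if dp[amount] != INF else -1
```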

Topological Sort

IF YOU SEE: Dependencies, prerequisites, build order, task scheduling with constraints, course schedule
USE: Topological Sort (Kahn's BFS or DFS)
WHY: O(V + E), produces valid ordering respecting dependencies
TRADE-OFF: Only works on DAGs (directed acyclic graphs)
EXAMPLES: Build systems (Make, Gradle), course scheduling, package dependencies
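A sketch of Kahn's BFS variant: repeatedly emit nodes whose in-degree has dropped to zero. If some nodes are never emitted, the graph has a cycle and no valid ordering exists.

```python
from collections import deque

def topo_sort(graph):
    """graph: node -> list of nodes that depend on it.
    Returns a valid ordering, or None if the graph has a cycle."""
    indegree = {n: 0 for n in graph}
    for deps in graph.values():
        for d in deps:
            indegree[d] = indegree.get(d, 0) + 1
    queue = deque(n for n, deg in indegree.items() if deg == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for d in graph.get(n, []):
            indegree[d] -= 1
            if indegree[d] == 0:
                queue.append(d)
    return order if len(order) == len(indegree) else None
```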

Union-Find (Disjoint Set)

IF YOU SEE: "Are X and Y connected?", grouping/clustering, connected components, cycle detection in undirected graph, Kruskal's MST
USE: Union-Find with path compression + union by rank
WHY: Nearly O(1) per operation (amortized), efficient for dynamic connectivity
TRADE-OFF: Can't efficiently split groups, only tracks connectivity (not paths)
EXAMPLES: Number of connected components, accounts merge, redundant connection
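A Union-Find sketch with both optimizations; note how `union` returning False doubles as undirected-cycle detection (the two endpoints were already connected):

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path compression
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False              # already connected: adding a-b makes a cycle
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra           # union by rank: attach shorter under taller
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True
```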

Greedy

IF YOU SEE: Local optimal choice leads to global optimal, interval scheduling, Huffman coding, "minimum number of X to cover Y"
USE: Greedy Algorithm
WHY: Simple, often O(n log n), makes locally optimal choices
TRADE-OFF: Must prove greedy choice property, doesn't always give optimal solution
EXAMPLES: Activity selection, Huffman encoding, minimum platforms, jump game
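Activity selection is the textbook case where the greedy choice property holds: always take the earliest-finishing compatible activity (an exchange argument shows this never hurts). A sketch:

```python
def max_activities(intervals):
    """Max number of non-overlapping (start, end) intervals.
    Sort by finish time, then greedily take each compatible one."""
    count = 0
    current_end = float("-inf")
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= current_end:
            count += 1
            current_end = end
    return count
```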

Backtracking

IF YOU SEE: Generate all combinations/permutations, constraint satisfaction (Sudoku, N-Queens), "find all solutions", decision tree exploration
USE: Backtracking (DFS + pruning)
WHY: Systematically explores solution space, prunes invalid branches early
TRADE-OFF: Can be exponential, effectiveness depends on pruning quality
EXAMPLES: N-Queens, Sudoku solver, generate parentheses, word search, subsets
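The subsets problem shows the backtracking skeleton in its simplest form: make a choice, recurse, then undo the choice before trying the alternative.

```python
def subsets(nums):
    """All subsets of nums via include/exclude branching."""
    result, path = [], []

    def backtrack(i):
        if i == len(nums):
            result.append(path[:])   # record a copy of this choice path
            return
        backtrack(i + 1)             # branch 1: exclude nums[i]
        path.append(nums[i])         # branch 2: include nums[i]
        backtrack(i + 1)
        path.pop()                   # undo the choice (the "backtrack")

    backtrack(0)
    return result
```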

Algorithm Selection Quick Reference

Problem Type                           | Algorithm           | Time Complexity
Search in sorted data                  | Binary Search       | O(log n)
Shortest path (unweighted)             | BFS                 | O(V + E)
Shortest path (weighted, non-negative) | Dijkstra            | O((V + E) log V)
Shortest path (negative weights)       | Bellman-Ford        | O(V × E)
All-pairs shortest path                | Floyd-Warshall      | O(V³)
Minimum spanning tree                  | Kruskal / Prim      | O(E log E)
Contiguous subarray                    | Sliding Window      | O(n)
Optimal substructure                   | Dynamic Programming | Varies (polynomial)
Dependencies / ordering                | Topological Sort    | O(V + E)
Dynamic connectivity                   | Union-Find          | O(α(n)) ≈ O(1)
All combinations / permutations        | Backtracking        | O(2ⁿ) or O(n!)

Rate Limiting

Token Bucket

IF YOU SEE: Need to allow bursts, smooth out traffic, API rate limiting with burst capacity, network traffic shaping
USE: Token Bucket Algorithm
WHY: Allows temporary bursts up to bucket capacity while maintaining average rate. Tokens refill at constant rate.
TRADE-OFF: Allows bursts (might not want this), need to tune bucket size and refill rate
EXAMPLES: AWS API Gateway, network traffic shaping, Stripe API rate limiting
Bucket: [●●●●●○○○○○]  capacity=10, tokens=5
Request → take token → [●●●●○○○○○○]
Refill every 100ms → [●●●●●○○○○○]
Burst of 5 requests → [○○○○○○○○○○] (allowed!)
Next request → DENIED (no tokens)
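The diagram above can be sketched as a small class. One common simplification (used here) is lazy refill: tokens are topped up based on elapsed time at each call rather than by a background timer. The injectable `clock` is just for testability.

```python
import time

class TokenBucket:
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # max burst size
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Lazy refill, capped at bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```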

Leaky Bucket

IF YOU SEE: Need constant output rate, smoothing bursty input, network congestion control, queue-based processing
USE: Leaky Bucket Algorithm
WHY: Processes requests at constant rate regardless of input rate. Smooths traffic perfectly.
TRADE-OFF: Doesn't allow bursts, requests may wait in queue, can drop requests if queue full
EXAMPLES: Network traffic shaping, video streaming bitrate control
Input (bursty): ●●●●●...........●●●●●
Leaky Bucket:   ●.●.●.●.●.●.●.●.●.●.●
Output:         Constant rate processing

Fixed Window Counter

IF YOU SEE: Simple rate limiting, "X requests per minute/hour", don't need precision at window boundaries
USE: Fixed Window Counter
WHY: Simple to implement, O(1) space per user, easy to understand
TRADE-OFF: Boundary problem - 2x burst possible at window edges (100 at :59, 100 at :00)
EXAMPLES: Basic API rate limiting, simple request counters
Window: [00:00 - 01:00] limit=100
Requests at 00:59: 100 (allowed)
Requests at 01:00: 100 (allowed - new window!)
= 200 requests in 2 seconds (boundary burst)

Sliding Window Log

IF YOU SEE: Need precise rate limiting, no boundary bursts, exact "last N minutes" counting
USE: Sliding Window Log (store timestamps)
WHY: Most accurate - stores timestamp of each request, counts requests in exact sliding window
TRADE-OFF: O(n) space per user (stores all timestamps), expensive for high-volume
EXAMPLES: High-precision rate limiting, security-critical limits

Sliding Window Counter

IF YOU SEE: Need smooth rate limiting without boundary bursts, memory-efficient, good approximation
USE: Sliding Window Counter (weighted average of windows)
WHY: Combines fixed window simplicity with sliding window smoothness. O(1) space.
TRADE-OFF: Approximation (not exact), slightly more complex than fixed window
EXAMPLES: Most production rate limiters (Redis-based), Cloudflare
Current window: 70% through
Previous window count: 40
Current window count: 30
Weighted: 40 × 0.3 + 30 = 42 requests in sliding window
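The weighting above in code form: the previous window contributes in proportion to how much of it still overlaps the sliding window.

```python
def sliding_window_count(prev_count, curr_count, elapsed_fraction):
    """Approximate request count in the sliding window.
    elapsed_fraction: how far we are through the current fixed window."""
    return prev_count * (1 - elapsed_fraction) + curr_count

def allow(prev_count, curr_count, elapsed_fraction, limit):
    return sliding_window_count(prev_count, curr_count, elapsed_fraction) < limit
```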

Rate Limiting Selection

Requirement                  | Algorithm              | Memory
Allow bursts, smooth average | Token Bucket           | O(1)
Constant output rate         | Leaky Bucket           | O(queue size)
Simple, approximate          | Fixed Window           | O(1)
Precise, no bursts           | Sliding Window Log     | O(requests)
Smooth, memory-efficient     | Sliding Window Counter | O(1)

Caching Strategies

Cache-Aside (Lazy Loading)

IF YOU SEE: Read-heavy workload, can tolerate cache miss latency, data that's expensive to compute/fetch
USE: Cache-Aside Pattern
WHY: Only caches data that's actually requested, application controls caching logic
TRADE-OFF: Cache miss is slow (DB read + cache write), potential stale data, cache stampede risk
EXAMPLES: User profile caching, product catalog, session storage
Read:
1. Check cache → HIT? Return
2. MISS? Read from DB
3. Write to cache
4. Return data
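The four steps above as a function; `cache` and `db` are hypothetical stand-ins (e.g. a Redis client and a data-access object), and `ttl` bounds staleness:

```python
def get_user(user_id, cache, db, ttl=300):
    """Cache-aside read path: check cache, fall back to DB, populate cache."""
    user = cache.get(user_id)
    if user is not None:
        return user                    # 1. HIT: return
    user = db.fetch_user(user_id)      # 2. MISS: read from DB
    cache.set(user_id, user, ttl)      # 3. write to cache
    return user                        # 4. return data
```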

Write-Through

IF YOU SEE: Need strong consistency, can't tolerate stale reads, write latency acceptable
USE: Write-Through Cache
WHY: Cache always consistent with database, no stale reads
TRADE-OFF: Higher write latency (write to both cache + DB), may cache unused data
EXAMPLES: User settings, shopping cart, inventory counts
Write:
1. Write to cache
2. Cache writes to DB (synchronously)
3. Return success

Write-Behind (Write-Back)

IF YOU SEE: Write-heavy workload, can tolerate eventual consistency, need low write latency
USE: Write-Behind Cache
WHY: Very fast writes (only to cache), batches DB writes, reduces DB load
TRADE-OFF: Risk of data loss if cache fails before DB write, complexity, eventual consistency
EXAMPLES: Gaming leaderboards, analytics counters, activity feeds
Write:
1. Write to cache → Return immediately
2. Background: Batch writes to DB periodically

Read-Through

IF YOU SEE: Want cache to manage DB reads, simpler application code, read-heavy
USE: Read-Through Cache
WHY: Cache handles DB reads transparently, application only talks to cache
TRADE-OFF: First request always slow, cache must understand data model
EXAMPLES: ORM-level caching, CDN origin fetching

LRU (Least Recently Used)

IF YOU SEE: Limited cache size, recent items more likely to be accessed, general-purpose eviction
USE: LRU Eviction Policy
WHY: Keeps frequently accessed items, good for most workloads, O(1) with HashMap + DLL
TRADE-OFF: Can evict items about to be reused (not scan-resistant), doesn't consider frequency
EXAMPLES: Redis (approximated LRU), Memcached, CPU caches, browser cache

LFU (Least Frequently Used)

IF YOU SEE: Some items much more popular than others, want to keep hot items, frequency matters more than recency
USE: LFU Eviction Policy
WHY: Keeps popular items in cache, better hit rate when popularity varies significantly
TRADE-OFF: Slow to adapt to changing popularity, more complex than LRU
EXAMPLES: CDN caching for viral content, DNS caching

TTL (Time-To-Live)

IF YOU SEE: Data has known staleness tolerance, need automatic expiration, time-based invalidation
USE: TTL-based Expiration
WHY: Simple, automatic invalidation, bounds staleness
TRADE-OFF: May evict still-valid data, need to choose TTL carefully
EXAMPLES: Session tokens, API responses, DNS records

Caching Strategy Selection

Workload                       | Strategy      | Consistency
Read-heavy, occasional writes  | Cache-Aside   | Eventual
Must be consistent             | Write-Through | Strong
Write-heavy, latency-sensitive | Write-Behind  | Eventual
General eviction               | LRU           | N/A
Hot/cold data                  | LFU           | N/A
Known staleness tolerance      | TTL           | Time-bounded

Database Selection

Relational Database (PostgreSQL, MySQL)

IF YOU SEE: ACID transactions needed, complex queries with JOINs, structured data with relationships, data integrity critical
USE: Relational Database (PostgreSQL, MySQL)
WHY: ACID guarantees, powerful SQL queries, mature tooling, enforced schema
TRADE-OFF: Harder to scale horizontally, schema changes can be painful, vertical scaling limits
EXAMPLES: Financial systems, e-commerce orders, user accounts, inventory management

Document Database (MongoDB, CouchDB)

IF YOU SEE: Flexible/evolving schema, document-oriented data, rapid development, denormalized data
USE: Document Database (MongoDB)
WHY: Schema flexibility, horizontal scaling, JSON-native, fast development iteration
TRADE-OFF: Limited JOIN support (denormalize instead), eventual consistency (often), data duplication
EXAMPLES: Content management, user profiles, product catalogs, event logging

Key-Value Store (Redis, DynamoDB)

IF YOU SEE: Simple key → value access, extreme performance needed, caching, session storage, real-time data
USE: Key-Value Store (Redis, DynamoDB)
WHY: O(1) operations, sub-millisecond latency (Redis), massive horizontal scale (DynamoDB)
TRADE-OFF: No complex queries, limited data modeling, no relationships
EXAMPLES: Session storage, caching, rate limiting counters, leaderboards

Wide-Column Store (Cassandra, HBase)

IF YOU SEE: Write-heavy workload, time-series data, high availability priority, known query patterns, massive scale
USE: Wide-Column Store (Cassandra, HBase)
WHY: Optimized for writes, horizontal scaling, tunable consistency, great for time-series
TRADE-OFF: Limited query flexibility, must design for access patterns, eventual consistency
EXAMPLES: IoT sensor data, messaging apps (Discord), analytics, activity logs

Graph Database (Neo4j, Neptune)

IF YOU SEE: Highly connected data, relationship queries, traversals, "friends of friends", recommendation engines
USE: Graph Database (Neo4j)
WHY: Native graph storage, efficient traversals, intuitive for relationship-heavy domains
TRADE-OFF: Not good for non-graph queries, scaling can be complex, smaller ecosystem
EXAMPLES: Social networks, fraud detection, knowledge graphs, recommendation engines

Search Engine (Elasticsearch)

IF YOU SEE: Full-text search, fuzzy matching, faceted search, log aggregation, relevance scoring
USE: Search Engine (Elasticsearch, Solr)
WHY: Inverted index for fast text search, relevance ranking, aggregations, near real-time
TRADE-OFF: Not a primary database, eventual consistency, complex to operate
EXAMPLES: Product search, log analysis (ELK stack), autocomplete

Time-Series Database (InfluxDB, TimescaleDB)

IF YOU SEE: Metrics, monitoring data, IoT sensors, timestamps are primary key, downsampling needed
USE: Time-Series Database
WHY: Optimized for time-based writes and queries, automatic downsampling, compression
TRADE-OFF: Limited for non-time-series queries, specialized use case
EXAMPLES: Application metrics, infrastructure monitoring, stock prices, IoT telemetry

Database Selection Quick Reference

Use Case                      | Database Type       | Examples
Transactions, complex queries | Relational          | PostgreSQL, MySQL
Flexible schema, documents    | Document            | MongoDB, CouchDB
Caching, sessions             | Key-Value           | Redis, Memcached
Massive scale, simple access  | Key-Value (managed) | DynamoDB
Time-series, IoT              | Wide-Column / TSDB  | Cassandra, InfluxDB
Relationships, traversals     | Graph               | Neo4j, Neptune
Full-text search              | Search Engine       | Elasticsearch, Solr

Messaging & Queues

Message Queue (RabbitMQ, SQS)

IF YOU SEE: Task distribution, work queues, one consumer per message, decoupling services, job processing
USE: Message Queue (RabbitMQ, SQS)
WHY: Reliable delivery, message deleted after consumption, good for task distribution
TRADE-OFF: Messages consumed once, no replay, limited retention
EXAMPLES: Email sending, image processing, order fulfillment, background jobs

Event Streaming (Kafka)

IF YOU SEE: Event sourcing, multiple consumers need same events, replay needed, high throughput, log aggregation
USE: Event Streaming (Kafka, Kinesis)
WHY: Persistent log, multiple consumer groups, replay from any offset, massive throughput
TRADE-OFF: More complex to operate, consumers manage their offsets
EXAMPLES: Event sourcing, activity streams, log aggregation, real-time analytics, CDC

Pub/Sub (Redis, Google Pub/Sub)

IF YOU SEE: Broadcast to multiple subscribers, real-time notifications, fan-out pattern, fire-and-forget
USE: Pub/Sub System
WHY: Simple pub/sub semantics, broadcast to all subscribers, real-time
TRADE-OFF: No persistence (messages lost if no subscriber), at-most-once delivery
EXAMPLES: Chat room broadcasts, live updates, cache invalidation

Messaging Selection

Pattern                    | System            | Delivery      | Replay
Work queue (one consumer)  | RabbitMQ, SQS     | At-least-once | No
Event log (multi-consumer) | Kafka, Kinesis    | At-least-once | Yes
Broadcast (fan-out)        | Redis Pub/Sub     | At-most-once  | No
Managed, serverless        | SQS, SNS, Pub/Sub | Configurable  | Limited

Distributed Systems Patterns

Consistent Hashing

IF YOU SEE: Distributing data across nodes, need to add/remove nodes without reshuffling all data, load balancing
USE: Consistent Hashing
WHY: Only K/n keys move when adding/removing nodes (vs ALL keys with modulo hashing)
TRADE-OFF: More complex than modulo, need virtual nodes for even distribution
EXAMPLES: Distributed caches (Memcached), CDNs, database sharding, load balancers
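A hash-ring sketch with virtual nodes (counts and hash choice are illustrative): each physical node is hashed onto the ring many times so load spreads evenly, and a key maps to the first node hash clockwise from the key's own hash.

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each (hash, node) pair is one virtual node on the ring.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def get_node(self, key):
        # First virtual node clockwise from the key's hash (wrap with %).
        i = bisect.bisect(self.keys, self._hash(key)) % len(self.keys)
        return self.ring[i][1]
```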

Sharding

IF YOU SEE: Database too large for single server, need to scale writes, data can be partitioned by key
USE: Horizontal Sharding
WHY: Linear write scalability, each shard handles subset of data
TRADE-OFF: Cross-shard queries expensive, rebalancing complex, application complexity
EXAMPLES: Shard by user_id, tenant_id, geographic region

Leader Election

IF YOU SEE: Need single coordinator, distributed lock, scheduled job runner, avoiding split-brain
USE: Leader Election (Raft, Zookeeper, etcd)
WHY: Ensures single leader for coordination, handles leader failure
TRADE-OFF: Leader is bottleneck, failover takes time, complexity
EXAMPLES: Database primary election, job scheduler, distributed locks

Circuit Breaker

IF YOU SEE: Cascading failures, failing service calls, need to fail fast, protect downstream services
USE: Circuit Breaker Pattern
WHY: Stops calling failing service, allows recovery time, fails fast
TRADE-OFF: Need to tune thresholds, may reject valid requests during half-open state
EXAMPLES: Hystrix, Resilience4j, service mesh circuit breaking
States:
CLOSED → (failures > threshold) → OPEN
OPEN → (timeout) → HALF-OPEN
HALF-OPEN → (success) → CLOSED
HALF-OPEN → (failure) → OPEN
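The state machine above as a minimal sketch (thresholds are illustrative; `clock` is injectable for testing):

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "HALF-OPEN"   # let one probe request through
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # HALF-OPEN failure reopens immediately; CLOSED opens at threshold.
            if self.state == "HALF-OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.state = "CLOSED"
        return result
```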

Saga Pattern

IF YOU SEE: Distributed transactions across services, need compensation on failure, long-running transactions
USE: Saga Pattern (choreography or orchestration)
WHY: Maintains consistency without distributed locks, each step has compensating action
TRADE-OFF: Eventually consistent, complex failure handling, need idempotent operations
EXAMPLES: Order → Payment → Inventory → Shipping (rollback on failure)

CQRS (Command Query Responsibility Segregation)

IF YOU SEE: Different read/write patterns, need to optimize reads and writes separately, complex queries on different model
USE: CQRS Pattern
WHY: Optimize read and write models independently, scale reads/writes separately
TRADE-OFF: Eventual consistency between models, more complex architecture
EXAMPLES: E-commerce (write orders, read product views), reporting systems

Event Sourcing

IF YOU SEE: Need complete audit trail, time-travel queries, rebuild state from events, domain-driven design
USE: Event Sourcing
WHY: Complete history, can replay events, natural audit log
TRADE-OFF: Eventual consistency, event schema evolution, higher storage
EXAMPLES: Banking transactions, version control, shopping cart history

Indexing Strategies

B-Tree Index

IF YOU SEE: Range queries, ordered access, equality + comparison operators, most database queries
USE: B-Tree Index (default in most databases)
WHY: Efficient for both equality and range queries, sorted order
TRADE-OFF: Slower than hash for pure equality, write overhead to maintain
EXAMPLES: WHERE date BETWEEN x AND y, ORDER BY, WHERE status = 'active'

Hash Index

IF YOU SEE: Only equality queries, no range queries needed, exact key lookup
USE: Hash Index
WHY: O(1) lookup for equality, slightly faster than B-Tree for exact matches
TRADE-OFF: Cannot do range queries, ORDER BY, or partial matching
EXAMPLES: WHERE id = 123, primary key lookups

Inverted Index

IF YOU SEE: Full-text search, "find documents containing word X", text search
USE: Inverted Index (Elasticsearch, full-text search)
WHY: Maps terms → documents, enables fast text search
TRADE-OFF: Storage overhead, index maintenance, eventual consistency
EXAMPLES: Search engines, document search, log analysis

Bitmap Index

IF YOU SEE: Low cardinality columns, analytics queries, many AND/OR conditions
USE: Bitmap Index
WHY: Efficient for low cardinality, fast bitwise operations for complex conditions
TRADE-OFF: Poor for high cardinality, write-heavy workloads
EXAMPLES: WHERE status IN ('A', 'B') AND region = 'US', data warehouses

GIN Index (Generalized Inverted)

IF YOU SEE: JSONB queries, array contains, full-text search in PostgreSQL
USE: GIN Index
WHY: Handles composite values (arrays, JSONB), efficient containment queries
TRADE-OFF: Slower writes, larger index size
EXAMPLES: WHERE tags @> '{"featured"}', WHERE data->>'type' = 'x'

Real-Time Communication

Polling

IF YOU SEE: Simple requirements, infrequent updates, can tolerate delay, legacy systems
USE: Polling (Short Polling)
WHY: Simple to implement, works everywhere, no special server support
TRADE-OFF: Wasteful (many empty responses), higher latency, server load
EXAMPLES: Checking for email, simple status updates

Long Polling

IF YOU SEE: Need lower latency than polling, can't use WebSockets, moderate real-time needs
USE: Long Polling
WHY: Server holds request until data available, lower latency than polling
TRADE-OFF: Ties up connections, more complex than polling, still one-way initiated
EXAMPLES: Chat applications (legacy), notifications

Server-Sent Events (SSE)

IF YOU SEE: Server → client only, live feeds, notifications, simple real-time updates
USE: Server-Sent Events (SSE)
WHY: Simple, auto-reconnect, event IDs for resumption, HTTP-based
TRADE-OFF: One-way only (server to client), limited browser connections per domain
EXAMPLES: Live scores, stock tickers, social media feeds, notifications

WebSockets

IF YOU SEE: Bidirectional real-time, chat, gaming, collaborative editing, low latency critical
USE: WebSockets
WHY: Full-duplex, low overhead after handshake, true real-time
TRADE-OFF: More complex, need sticky sessions or pub/sub for scaling, firewall issues
EXAMPLES: Chat apps, multiplayer games, collaborative docs, live trading

Real-Time Selection

Requirement                | Technology   | Direction
Simple, tolerates delay    | Polling      | Client → Server
Lower latency, fallback    | Long Polling | Client → Server
Server push, simple        | SSE          | Server → Client
Bidirectional, low latency | WebSockets   | Both directions

Consistency Models

Strong Consistency

IF YOU SEE: Financial transactions, inventory counts, can't tolerate stale reads, read-after-write must work
USE: Strong Consistency (synchronous replication, consensus)
WHY: All reads see latest write, linearizable operations
TRADE-OFF: Higher latency, reduced availability during partitions (CAP theorem)
EXAMPLES: Bank transfers, ticket booking, primary database

Eventual Consistency

IF YOU SEE: High availability priority, can tolerate temporary staleness, global distribution
USE: Eventual Consistency
WHY: Higher availability, lower latency, better for geo-distribution
TRADE-OFF: May read stale data, conflict resolution needed, harder to reason about
EXAMPLES: Social media likes, DNS, shopping cart, user preferences

Causal Consistency

IF YOU SEE: Need to preserve cause-effect ordering, comments after posts, replies in order
USE: Causal Consistency (vector clocks, logical timestamps)
WHY: Preserves causality without full strong consistency overhead
TRADE-OFF: More complex than eventual, less available than eventual
EXAMPLES: Comment threads, collaborative editing, messaging

Read-Your-Writes Consistency

IF YOU SEE: User must see their own updates immediately, profile updates, settings changes
USE: Read-Your-Writes (session consistency, read from leader)
WHY: User sees their changes, acceptable for others to see eventual state
TRADE-OFF: May need sticky sessions, read from leader increases load
EXAMPLES: Profile updates, post submission, settings changes

Consistency Selection

Requirement                | Model            | CAP Trade-off
Must see latest data       | Strong (CP)      | Sacrifice availability
High availability priority | Eventual (AP)    | Sacrifice consistency
Preserve causality         | Causal           | Middle ground
User sees own writes       | Read-Your-Writes | Per-user consistency

Quick Decision Flowchart

Data Structure Selection

Need O(1) lookup by key? → Hash Table
Need sorted data? → BST / TreeMap
Need min/max quickly? → Heap
Need LIFO? → Stack
Need FIFO? → Queue
Need prefix matching? → Trie
Need relationships? → Graph
Need membership test at scale? → Bloom Filter

Database Selection

Need ACID transactions? → RDBMS (PostgreSQL)
Need flexible schema? → Document DB (MongoDB)
Need extreme speed, simple access? → Key-Value (Redis)
Need massive write throughput? → Wide-Column (Cassandra)
Need relationship queries? → Graph DB (Neo4j)
Need full-text search? → Search Engine (Elasticsearch)

Caching Strategy

Read-heavy, can tolerate miss? → Cache-Aside
Must be consistent? → Write-Through
Write-heavy, eventual OK? → Write-Behind
General eviction? → LRU
Hot/cold data? → LFU

Rate Limiting

Allow bursts? → Token Bucket
Constant rate output? → Leaky Bucket
Simple, approximate? → Fixed Window
Smooth, memory-efficient? → Sliding Window Counter

Real-Time Communication

Server → Client only? → SSE
Bidirectional needed? → WebSockets
Simple, tolerates delay? → Polling
Fallback for WebSocket? → Long Polling