Data Structures - Deep Dive
Interview Tip: For each data structure, understand: (1) Time/space complexity, (2) Real-world use cases, (3) When to use vs alternatives, (4) Implementation details. Be ready to explain trade-offs.
1. Arrays
The most fundamental data structure - contiguous block of memory storing elements of the same type.
Array in Memory:
┌───┬───┬───┬───┬───┬───┬───┬───┐
│ 5 │ 2 │ 8 │ 1 │ 9 │ 3 │ 7 │ 4 │
└───┴───┴───┴───┴───┴───┴───┴───┘
  0   1   2   3   4   5   6   7   ← indices
Base Address: 1000
Element Size: 4 bytes
Address of arr[3] = 1000 + (3 × 4) = 1012
Key Properties
- Random Access: O(1) - can access any element directly via index
- Cache Friendly: Elements stored contiguously, excellent spatial locality
- Fixed Size: Size determined at creation (in most languages)
- Memory Efficient: No overhead beyond the data itself
Time Complexity:
Access: O(1) | Search: O(n) | Insert: O(n) | Delete: O(n)
Real-World Usage
// 1. Cache-friendly processing (video/image data)
void processPixels(int* pixels, int width, int height) {
for (int i = 0; i < width * height; i++) {
pixels[i] = transform(pixels[i]); // Sequential access
}
}
// 2. Lookup tables (fast constant-time mappings)
const char* statusMessages[5] = {
"Success", "Error", "Pending", "Timeout", "Cancelled"
};
printf("%s", statusMessages[statusCode]); // O(1) lookup
// 3. Dynamic arrays (ArrayList, vector)
// Automatically resize when full (amortized O(1) append)
vector<int> v;
v.push_back(42); // Amortized O(1)
Dynamic Arrays: When array is full, allocate new array (typically 2x size), copy elements, free old array. This gives amortized O(1) append despite occasional O(n) resize.
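The growth strategy just described can be sketched in a few lines of Java; DynamicIntArray and grow are illustrative names for this note, not a standard API:

```java
// Minimal dynamic array sketch (what ArrayList / vector do internally).
class DynamicIntArray {
    private int[] data = new int[2]; // Small initial capacity
    private int size = 0;

    void append(int val) {
        if (size == data.length) grow(); // Occasional O(n) resize
        data[size++] = val;              // Usual O(1) append
    }

    private void grow() {
        int[] bigger = new int[data.length * 2];    // Double capacity
        System.arraycopy(data, 0, bigger, 0, size); // Copy old elements
        data = bigger;                              // Old array freed by GC
    }

    int get(int i) { return data[i]; }
    int size() { return size; }
}
```

Doubling keeps total copy work proportional to n across all appends, which is where the amortized O(1) claim comes from.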
2. Stacks (LIFO - Last In, First Out)
A collection where elements are added and removed from the same end (top).
Stack Operations:
 push(5)      push(2)      pop() → returns 2
┌───────┐    ┌───────┐    ┌───────┐
│       │    │   2   │    │       │
├───────┤    ├───────┤    ├───────┤
│   5   │    │   5   │    │   5   │
├───────┤    ├───────┤    ├───────┤
│   3   │    │   3   │    │   3   │
└───────┘    └───────┘    └───────┘
 top = 5      top = 2      top = 5
(starting stack: [3]; each panel shows the stack after the operation)
Operations
- push(item): Add item to top - O(1)
- pop(): Remove and return top item - O(1)
- peek(): View top item without removing - O(1)
- isEmpty(): Check if stack is empty - O(1)
Time Complexity:
Push: O(1) | Pop: O(1) | Peek: O(1) | Search: O(n)
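All four operations can be backed by a plain array; a minimal sketch (IntStack is an illustrative name, and empty-stack checks are left to the caller):

```java
// Array-backed stack: push, pop, peek, isEmpty all O(1)
// (push is amortized O(1) because of the occasional resize).
class IntStack {
    private int[] data = new int[16];
    private int top = 0; // Number of elements; data[top - 1] is the top

    void push(int val) {
        if (top == data.length) { // Grow when full
            int[] bigger = new int[data.length * 2];
            System.arraycopy(data, 0, bigger, 0, top);
            data = bigger;
        }
        data[top++] = val;
    }

    int pop()  { return data[--top]; }    // Caller must check isEmpty()
    int peek() { return data[top - 1]; }  // Caller must check isEmpty()
    boolean isEmpty() { return top == 0; }
}
```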
Real-World Usage
1. Function Call Stack
void functionC() {
int z = 30;
// Return pops this frame
}
void functionB() {
int y = 20;
functionC(); // Push new frame
}
void functionA() {
int x = 10;
functionB(); // Push new frame
}
Call Stack Evolution:
┌────────────┐ ┌────────────┐ ┌────────────┐
│ │ │ │ │ functionC │
├────────────┤ ├────────────┤ ├────────────┤
│ │ │ functionB │ │ functionB │
├────────────┤ ├────────────┤ ├────────────┤
│ functionA │ │ functionA │ │ functionA │
└────────────┘ └────────────┘ └────────────┘
   After A        After B        After C
2. Undo/Redo Functionality
class TextEditor {
Stack<Command> undoStack;
Stack<Command> redoStack;
void type(String text) {
Command cmd = new TypeCommand(text);
cmd.execute();
undoStack.push(cmd);
redoStack.clear(); // Clear redo history
}
void undo() {
if (!undoStack.isEmpty()) {
Command cmd = undoStack.pop();
cmd.undo();
redoStack.push(cmd);
}
}
void redo() {
if (!redoStack.isEmpty()) {
Command cmd = redoStack.pop();
cmd.execute();
undoStack.push(cmd);
}
}
}
3. Expression Evaluation
// Evaluate postfix: 3 4 + 2 *
// (3 + 4) * 2 = 14
Stack<Integer> stack = new Stack<>();
for (String token : tokens) {
    if (isNumber(token)) {
        stack.push(Integer.parseInt(token));
    } else {
        int b = stack.pop();
        int a = stack.pop(); // Pop order matters for - and /
        stack.push(evaluate(a, token, b));
    }
}
return stack.pop();
Process:
push 3: [3]
push 4: [3, 4]
'+': pop 4, pop 3, push 7 → [7]
push 2: [7, 2]
'*': pop 2, pop 7, push 14 → [14]
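The evaluator sketched above can be made fully runnable; this version assumes space-separated tokens and only the four basic operators (ArrayDeque is Java's recommended stack class):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Runnable postfix evaluator for space-separated expressions.
class Postfix {
    static int evaluate(String expr) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String token : expr.split("\\s+")) {
            switch (token) {
                case "+": case "-": case "*": case "/": {
                    int b = stack.pop();
                    int a = stack.pop(); // Pop order matters for - and /
                    stack.push(apply(a, token, b));
                    break;
                }
                default:
                    stack.push(Integer.parseInt(token));
            }
        }
        return stack.pop();
    }

    private static int apply(int a, String op, int b) {
        switch (op) {
            case "+": return a + b;
            case "-": return a - b;
            case "*": return a * b;
            default:  return a / b;
        }
    }
}
```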
4. Browser History (Back Button)
Stack<URL> backStack;
Stack<URL> forwardStack;
void visit(URL url) {
backStack.push(currentURL);
currentURL = url;
forwardStack.clear();
}
void goBack() {
forwardStack.push(currentURL);
currentURL = backStack.pop();
}
void goForward() {
backStack.push(currentURL);
currentURL = forwardStack.pop();
}
5. Depth-First Search (DFS)
void dfs(Node start) {
Stack<Node> stack;
Set<Node> visited;
stack.push(start);
while (!stack.isEmpty()) {
Node node = stack.pop();
if (visited.contains(node)) continue;
visited.add(node);
process(node);
for (Node neighbor : node.neighbors) {
if (!visited.contains(neighbor)) {
stack.push(neighbor);
}
}
}
}
3. Queues (FIFO - First In, First Out)
A collection where elements are added at the back (enqueue) and removed from the front (dequeue).
Queue Operations:
   start           enqueue(5)          dequeue() → returns 3
┌───┬───┬───┐   ┌───┬───┬───┬───┐   ┌───┬───┬───┐
│ 7 │ 4 │ 3 │   │ 5 │ 7 │ 4 │ 3 │   │ 5 │ 7 │ 4 │
└───┴───┴───┘   └───┴───┴───┴───┘   └───┴───┴───┘
  ↑       ↑       ↑           ↑       ↑       ↑
 back   front    back       front    back   front
Operations
- enqueue(item): Add item to back - O(1)
- dequeue(): Remove and return front item - O(1)
- peek(): View front item without removing - O(1)
Time Complexity:
Enqueue: O(1) | Dequeue: O(1) | Peek: O(1) | Search: O(n)
Circular Queue Implementation
class CircularQueue {
    int[] data;
    int capacity;           // data.length
    int front, rear, size;

    void enqueue(int val) {
        // Precondition: size < capacity (queue not full)
        data[rear] = val;
        rear = (rear + 1) % capacity; // Wrap around
        size++;
    }

    int dequeue() {
        // Precondition: size > 0 (queue not empty)
        int val = data[front];
        front = (front + 1) % capacity; // Wrap around
        size--;
        return val;
    }
}
Circular Buffer (front = 4, rear = 2):
┌───┬───┬───┬───┬───┐
│ 5 │ 6 │   │   │ 4 │
└───┴───┴───┴───┴───┘
          ↑       ↑
        rear    front
Enqueue 7: stored at index 2, rear advances to index 3
┌───┬───┬───┬───┬───┐
│ 5 │ 6 │ 7 │   │ 4 │
└───┴───┴───┴───┴───┘
              ↑   ↑
           rear front
Real-World Usage
1. Task Scheduling / Job Queues
// Background job processing
class JobQueue {
    Queue<Job> pending = new LinkedList<>();

    void submitJob(Job job) {
        pending.offer(job); // Add to back
    }

    void processJobs() {
        while (!pending.isEmpty()) {
            Job job = pending.poll(); // Remove from front (FIFO order)
            job.execute();
        }
    }
}
// Examples:
// - Print queue (first submitted, first printed)
// - Email queue (send in order received)
// - HTTP request handling
// - Async task processing (Celery, RabbitMQ, Kafka)
2. Breadth-First Search (BFS)
void bfs(Node start) {
    Queue<Node> queue = new LinkedList<>();
    Set<Node> visited = new HashSet<>();
    queue.offer(start);
    visited.add(start);
    while (!queue.isEmpty()) {
        Node node = queue.poll();
        process(node);
        for (Node neighbor : node.neighbors) {
            if (!visited.contains(neighbor)) {
                visited.add(neighbor);
                queue.offer(neighbor);
            }
        }
    }
}
// Level-by-level traversal:
//
//        1
//       / \
//      2   3
//     / \
//    4   5
//
// BFS visits: 1 → 2 → 3 → 4 → 5
3. Message Queues (Distributed Systems)
// Producer-Consumer pattern
// Producer
rabbitMQ.send(queue: "orders", message: orderData);
// Consumer (processes in FIFO order)
while (true) {
message = rabbitMQ.receive(queue: "orders");
processOrder(message);
rabbitMQ.acknowledge(message);
}
// Tools: RabbitMQ, Kafka, SQS, Redis Pub/Sub
// Use cases:
// - Decoupling microservices
// - Load leveling (smooth traffic spikes)
// - Async processing
// - Event streaming
4. Request Buffering
// Rate limiting with sliding window
class RateLimiter {
Queue<Long> timestamps;
int maxRequests = 100;
long windowMs = 60000; // 1 minute
boolean allowRequest() {
long now = System.currentTimeMillis();
// Remove old timestamps
while (!timestamps.isEmpty() &&
timestamps.peek() < now - windowMs) {
timestamps.poll(); // Drop timestamps outside the window
}
if (timestamps.size() < maxRequests) {
timestamps.add(now);
return true;
}
return false;
}
}
4. Hash Tables (Hash Maps)
Key-value store with O(1) average-case lookup using a hash function.
Hash Table Structure:
Hash Function: hash(key) % tableSize
┌────┬─────────────────────────────────────┐
│ 0 │ null │
├────┼─────────────────────────────────────┤
│ 1 │ ("alice", 25) → ("charlie", 35) │ ← Collision chain
├────┼─────────────────────────────────────┤
│ 2 │ null │
├────┼─────────────────────────────────────┤
│ 3 │ ("bob", 30) │
├────┼─────────────────────────────────────┤
│ 4 │ ("dave", 40) │
└────┴─────────────────────────────────────┘
hash("alice") % 5 = 1
hash("charlie") % 5 = 1 ← Collision!
Collision Resolution
1. Chaining (Linked Lists)
class HashTable {
LinkedList<Entry>[] buckets;
void put(K key, V value) {
int index = hash(key) % buckets.length;
// Search chain for existing key
for (Entry e : buckets[index]) {
if (e.key.equals(key)) {
e.value = value; // Update
return;
}
}
// Add new entry to chain
buckets[index].add(new Entry(key, value));
}
V get(K key) {
int index = hash(key) % buckets.length;
for (Entry e : buckets[index]) {
if (e.key.equals(key)) return e.value;
}
return null;
}
}
// Worst case: All keys hash to same bucket → O(n) lookup
// Average case: Even distribution → O(1) lookup
2. Open Addressing (Linear Probing)
void put(K key, V value) {
int index = hash(key) % size;
// Find next available slot
while (table[index] != null && !table[index].key.equals(key)) {
index = (index + 1) % size; // Linear probe
}
table[index] = new Entry(key, value);
}
Insertion with Linear Probing:
Insert "alice" (hash=1), "bob" (hash=3), "charlie" (hash=1)
┌────┬─────────┬──────┬───────┬──────┐
│ 0 │ │ │ │ │
├────┼─────────┼──────┼───────┼──────┤
│ 1 │ "alice" │ │ │ │ hash("alice")=1
├────┼─────────┼──────┼───────┼──────┤
│ 2 │"charlie"│ │ │ │ hash("charlie")=1→collision→probe to 2
├────┼─────────┼──────┼───────┼──────┤
│ 3 │ "bob" │ │ │ │ hash("bob")=3
└────┴─────────┴──────┴───────┴──────┘
Load Factor: ratio = elements / buckets. Keep < 0.75 for good performance. When exceeded, rehash (create larger table, reinsert all elements).
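The rehash step can be sketched as follows; SimpleHashSet is an illustrative toy (chaining, string keys only), not a substitute for java.util.HashSet:

```java
import java.util.LinkedList;

// When load factor would exceed 0.75, double the bucket array and
// reinsert every entry (hash % length changes when length changes).
class SimpleHashSet {
    private LinkedList<String>[] buckets = newBuckets(4);
    private int size = 0;

    @SuppressWarnings("unchecked")
    private static LinkedList<String>[] newBuckets(int n) {
        LinkedList<String>[] b = new LinkedList[n];
        for (int i = 0; i < n; i++) b[i] = new LinkedList<>();
        return b;
    }

    private int indexFor(String key, int length) {
        return (key.hashCode() & 0x7fffffff) % length; // Non-negative index
    }

    void add(String key) {
        if (contains(key)) return;
        if (size + 1 > buckets.length * 0.75) rehash(); // Keep load factor < 0.75
        buckets[indexFor(key, buckets.length)].add(key);
        size++;
    }

    boolean contains(String key) {
        return buckets[indexFor(key, buckets.length)].contains(key);
    }

    private void rehash() {
        LinkedList<String>[] old = buckets;
        buckets = newBuckets(old.length * 2); // Larger table...
        for (LinkedList<String> chain : old)  // ...reinsert all elements
            for (String key : chain)
                buckets[indexFor(key, buckets.length)].add(key);
    }

    int bucketCount() { return buckets.length; }
    int size() { return size; }
}
```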
Hash Functions
// Good hash function properties:
// 1. Deterministic (same input → same output)
// 2. Uniform distribution
// 3. Fast to compute
// Simple string hash (Java's hashCode)
int hash(String s) {
int h = 0;
for (int i = 0; i < s.length(); i++) {
h = 31 * h + s.charAt(i); // 31 is prime
}
return h;
}
// Cryptographic hashes (slower, more uniform, collision-resistant):
// e.g., SHA-256. (MD5 is broken and should not be used for security.)
Time Complexity:
Average: Insert O(1) | Search O(1) | Delete O(1)
Worst: Insert O(n) | Search O(n) | Delete O(n)
Real-World Usage
1. Database Indexes (Hash Indexes)
-- PostgreSQL hash index
CREATE INDEX idx_user_email_hash ON users USING HASH (email);
-- Fast equality lookups:
SELECT * FROM users WHERE email = 'alice@example.com'; -- O(1)
-- But cannot do:
SELECT * FROM users WHERE email LIKE 'alice%'; -- No range queries
SELECT * FROM users ORDER BY email; -- No ordering
// In-memory: Redis uses hash tables for key-value storage
SET user:1000 "alice" // O(1)
GET user:1000 // O(1)
2. Caches
// LRU Cache using HashMap + Doubly Linked List
class LRUCache {
HashMap<Integer, Node> cache;
DoublyLinkedList lruList;
int capacity;
int get(int key) {
if (!cache.containsKey(key)) return -1;
Node node = cache.get(key);
lruList.moveToFront(node); // Mark as recently used
return node.value;
}
void put(int key, int value) {
if (cache.containsKey(key)) {
// Update existing
Node node = cache.get(key);
node.value = value;
lruList.moveToFront(node);
} else {
// Add new
if (cache.size() >= capacity) {
// Evict LRU
Node lru = lruList.removeLast();
cache.remove(lru.key);
}
Node node = new Node(key, value);
lruList.addToFront(node);
cache.put(key, node);
}
}
}
// Used in: Browser caches, CDNs, Redis, Memcached
3. Deduplication
// Find duplicates in array
Set<Integer> seen = new HashSet<>();
for (int num : array) {
if (seen.contains(num)) {
System.out.println("Duplicate: " + num);
}
seen.add(num);
}
// Symbol table in compiler
HashMap<String, Symbol> symbolTable;
4. Counting / Frequency Analysis
// Word frequency
HashMap<String, Integer> wordCount = new HashMap<>();
for (String word : document.split("\\s+")) {
wordCount.put(word, wordCount.getOrDefault(word, 0) + 1);
}
// Character frequency (anagram detection)
boolean areAnagrams(String s1, String s2) {
    return getFrequency(s1).equals(getFrequency(s2));
}

HashMap<Character, Integer> getFrequency(String s) {
    HashMap<Character, Integer> freq = new HashMap<>();
    for (char c : s.toCharArray()) freq.merge(c, 1, Integer::sum);
    return freq;
}
5. Trees
Hierarchical data structure with nodes connected by edges.
Binary Tree:

        1
       / \
      2   3
     / \   \
    4   5   6
Terminology:
- Root: 1
- Parent of 4: 2
- Children of 2: 4, 5
- Leaf nodes: 4, 5, 6
- Height: 2 (longest path from root to leaf)
- Depth of node 4: 2
Binary Search Tree (BST)
Left subtree < node < right subtree
BST Example:

          8
         / \
        3   10
       / \    \
      1   6    14
         / \   /
        4   7 13
Search for 6:
8 (go left, 6 < 8)
3 (go right, 6 > 3)
6 (found!)
class TreeNode {
int val;
TreeNode left, right;
}
TreeNode search(TreeNode root, int target) {
if (root == null || root.val == target) return root;
if (target < root.val) {
return search(root.left, target); // Go left
} else {
return search(root.right, target); // Go right
}
}
TreeNode insert(TreeNode root, int val) {
    if (root == null) return new TreeNode(val); // Empty spot: create node
    if (val < root.val) {
        root.left = insert(root.left, val);   // Go left
    } else {
        root.right = insert(root.right, val); // Go right
    }
    return root;
}
// Usage: root = insert(root, 5); // Also handles the empty-tree case
BST Time Complexity:
Average: Search O(log n) | Insert O(log n) | Delete O(log n)
Worst (unbalanced): O(n) - degenerates to linked list
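One property worth remembering: an in-order traversal of a BST visits keys in sorted order, which is why trees support sorted iteration while hash tables do not. A self-contained sketch with its own node type:

```java
import java.util.ArrayList;
import java.util.List;

// In-order traversal (left, node, right) of a BST yields ascending order.
class BstDemo {
    static class Node {
        int val;
        Node left, right;
        Node(int val) { this.val = val; }
    }

    static Node insert(Node root, int val) {
        if (root == null) return new Node(val);
        if (val < root.val) root.left = insert(root.left, val);
        else root.right = insert(root.right, val);
        return root;
    }

    // Visit left subtree, then the node, then right subtree
    static void inorder(Node node, List<Integer> out) {
        if (node == null) return;
        inorder(node.left, out);
        out.add(node.val);
        inorder(node.right, out);
    }

    static List<Integer> sortedKeys(int... keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        List<Integer> out = new ArrayList<>();
        inorder(root, out);
        return out;
    }
}
```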
B-Trees (Database Indexes)
Self-balancing tree optimized for disk I/O. Each node can have many children (high branching factor).
Graph Structure: A B-Tree is a specialized graph, specifically a rooted, balanced tree (a connected acyclic graph). Each circle/box represents a vertex (node), and each line connecting them is an edge. Trees are graphs in which any two vertices are connected by exactly one path.
B-Tree of order 3 (max 3 keys per node):

               [30|60]                ← Root vertex
              /   |   \               ← Edges to children
      [10|20] [40|50] [70|80|90]      ← Internal vertices
      / | \   / | \   / | | \         ← Edges
    ...leaf vertices...               ← Leaf vertices (no outgoing edges)
Graph Properties:
- Vertices: Each node containing keys
- Edges: Parent-child connections (directed, root → leaves)
- Acyclic: No cycles (it's a tree)
- Connected: All vertices reachable from root
- Balanced: All leaf vertices at same depth
- Height stays logarithmic: O(log n)
Why B-Trees for Databases?
- Disk I/O optimization: Each node = 1 disk page (4KB). High branching factor means fewer disk reads
- Balanced: All leaf nodes at same depth → predictable performance
- Range queries: Leaf nodes often linked for sequential scan
- Insertions maintain order: No need to rebuild entire index
-- PostgreSQL B-Tree index (default)
CREATE INDEX idx_user_created ON users (created_at);
-- Range query uses B-Tree efficiently:
SELECT * FROM users
WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31';
-- Scan: Navigate to '2024-01-01', then scan sequentially through leaves
// B-Tree depth calculation:
// With 200 keys per node, 1 billion records:
// log_200(1,000,000,000) ≈ 4 levels
// Only 4 disk reads to find any record!
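That depth estimate is easy to verify by repeated multiplication; levelsNeeded is an illustrative helper, and real B-trees vary in per-node fan-out:

```java
// With branching factor ~200, one billion keys fit in about 4 levels,
// because 200^4 = 1.6 × 10^9 ≥ 10^9.
class BTreeDepth {
    static int levelsNeeded(long records, int branching) {
        int levels = 0;
        long reachable = 1;
        while (reachable < records) { // Each level multiplies the fan-out
            reachable *= branching;
            levels++;
        }
        return levels;
    }
}
```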
File Systems (Tree Structure)
File System Hierarchy:
/
├── home/
│   ├── alice/
│   │   ├── documents/
│   │   │   └── report.pdf
│   │   └── photos/
│   └── bob/
│       └── code/
├── etc/
│   ├── nginx/
│   └── ssh/
└── var/
    └── log/
Stored as tree:
- Directories are internal nodes
- Files are leaf nodes
- Path traversal: O(depth)
// inode structure (Unix)
struct inode {
mode_t mode; // Permissions
uid_t owner;
off_t size;
time_t mtime;
block_t blocks[15]; // Direct + indirect pointers
};
Direct blocks: Point to data
Indirect blocks: Point to blocks of pointers (tree structure!)
// Directory lookup in modern filesystems:
// ext4 uses HTree (a hashed tree variant); btrfs uses B-trees
// Lookup /home/alice/documents/report.pdf:
1. Start at root inode
2. Read root directory, find "home" entry → inode 123
3. Read inode 123 directory, find "alice" → inode 456
4. Read inode 456 directory, find "documents" → inode 789
5. Read inode 789 directory, find "report.pdf" → inode 1000
6. Read inode 1000 → file data blocks
// Large directories use B-tree index for O(log n) lookup
// Small directories use linear search
Trie (Prefix Tree)
Graph Structure: A Trie is a specialized graph, specifically a rooted tree (a connected acyclic graph) in which each vertex represents a character and edges connect consecutive characters of a word. Like all trees, a Trie has exactly one path between any two vertices. Used for efficient prefix-based searches.
Trie for words: "cat", "car", "dog", "dodge"

        (root)          ← Root vertex (empty)
        /    \          ← Edges (labeled implicitly by child vertex)
      (c)    (d)        ← Vertices (each stores a character)
       |      |
      (a)    (o)
      / \    / \
    (t) (r) (g) (d)     ← (t), (r), (g) end words "cat", "car", "dog"
                 |
                (g)
                 |
                (e)     ← (e) ends word "dodge"
Graph Properties:
- Vertices: Each node stores a character
- Edges: Connect consecutive characters in words
- Root: Empty vertex (starting point)
- Path: Root → leaf = complete word
- Shared prefixes share vertex paths (c→a shared by "cat", "car")
class TrieNode {
Map<Character, TrieNode> children = new HashMap<>();
boolean isEndOfWord = false;
}
void insert(String word) {
TrieNode node = root;
for (char c : word.toCharArray()) {
node.children.putIfAbsent(c, new TrieNode());
node = node.children.get(c);
}
node.isEndOfWord = true;
}
boolean search(String word) {
TrieNode node = root;
for (char c : word.toCharArray()) {
if (!node.children.containsKey(c)) return false;
node = node.children.get(c);
}
return node.isEndOfWord;
}
List<String> autocomplete(String prefix) {
TrieNode node = navigateToPrefix(prefix);
return collectAllWords(node, prefix); // DFS from prefix node
}
// Used in: Autocomplete, spell checkers, IP routing tables
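The navigateToPrefix and collectAllWords helpers above are left abstract; a runnable sketch that fills them in (the order of results depends on HashMap iteration and is unspecified):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Complete trie with insert and autocomplete (prefix walk + DFS collect).
class Trie {
    static class Node {
        Map<Character, Node> children = new HashMap<>();
        boolean isEndOfWord = false;
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isEndOfWord = true;
    }

    List<String> autocomplete(String prefix) {
        List<String> results = new ArrayList<>();
        Node node = root;
        for (char c : prefix.toCharArray()) {   // Navigate to prefix node
            node = node.children.get(c);
            if (node == null) return results;   // No word has this prefix
        }
        collect(node, new StringBuilder(prefix), results); // DFS below it
        return results;
    }

    private void collect(Node node, StringBuilder path, List<String> out) {
        if (node.isEndOfWord) out.add(path.toString());
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            path.append(e.getKey());
            collect(e.getValue(), path, out);
            path.deleteCharAt(path.length() - 1); // Backtrack
        }
    }
}
```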
6. Heaps
Complete binary tree where parent is always greater (max-heap) or smaller (min-heap) than children.
Min-Heap:

        1
       / \
      3   2
     / \ / \
    7  5 8  6
Properties:
- Root is minimum element
- Parent ≤ both children
- Complete tree (filled left-to-right)
- Height: O(log n)
Array Representation:
[1, 3, 2, 7, 5, 8, 6]
0 1 2 3 4 5 6
For node at index i:
- Left child: 2i + 1
- Right child: 2i + 2
- Parent: (i - 1) / 2
Heap Operations
// Insert: Add to end, bubble up
void insert(int val) {
heap.add(val); // Add to end
bubbleUp(heap.size() - 1);
}
void bubbleUp(int i) {
while (i > 0) {
int parent = (i - 1) / 2;
if (heap[i] >= heap[parent]) break;
swap(heap[i], heap[parent]);
i = parent;
}
}
// Extract-Min: Remove root, move last to root, bubble down
int extractMin() {
int min = heap[0];
heap[0] = heap[heap.size() - 1];
heap.remove(heap.size() - 1);
bubbleDown(0);
return min;
}
void bubbleDown(int i) {
while (true) {
int left = 2 * i + 1;
int right = 2 * i + 2;
int smallest = i;
if (left < size && heap[left] < heap[smallest])
smallest = left;
if (right < size && heap[right] < heap[smallest])
smallest = right;
if (smallest == i) break;
swap(heap[i], heap[smallest]);
i = smallest;
}
}
Time Complexity:
Insert: O(log n) | Extract-Min/Max: O(log n) | Peek: O(1) | Build heap: O(n)
Real-World Usage
1. Priority Queue
// Task scheduling by priority
PriorityQueue<Task> queue = new PriorityQueue<>(
(a, b) -> Integer.compare(a.priority, b.priority) // Min-heap by priority
);
queue.add(new Task("Email", priority=3));
queue.add(new Task("Bug Fix", priority=1)); // Higher priority
queue.add(new Task("Feature", priority=5));
while (!queue.isEmpty()) {
Task task = queue.poll(); // Always gets highest priority
execute(task);
}
// Output: Bug Fix → Email → Feature
// Used in:
// - OS process scheduling
// - Dijkstra's algorithm
// - Event-driven simulation
// - Huffman coding
2. Finding K Largest/Smallest Elements
// Find K largest elements using min-heap of size K
int[] findKLargest(int[] nums, int k) {
    PriorityQueue<Integer> minHeap = new PriorityQueue<>();
    for (int num : nums) {
        minHeap.add(num);
        if (minHeap.size() > k) {
            minHeap.poll(); // Remove smallest of the k+1
        }
    }
    // Heap now holds the K largest; drain it into an int[]
    int[] result = new int[minHeap.size()];
    for (int i = 0; i < result.length; i++) result[i] = minHeap.poll();
    return result; // Ascending order
}
// Time: O(n log k) - better than sorting O(n log n) when k << n
// Example: Find top 10 scores from 1 million entries
// Heap: O(1,000,000 × log 10) vs Sort: O(1,000,000 × log 1,000,000)
3. Heap Sort
void heapSort(int[] arr) {
// 1. Build max-heap: O(n)
buildMaxHeap(arr);
// 2. Extract max n times: O(n log n)
for (int i = arr.length - 1; i > 0; i--) {
swap(arr[0], arr[i]); // Move max to end
heapSize--;
bubbleDown(0); // Restore heap
}
}
// Result: Sorted array
// Time: O(n log n), Space: O(1) - in-place
// Not stable, but good for large datasets
4. Median Finding (Running Median)
// Maintain median as elements arrive
class MedianFinder {
PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder()); // Lower half (max-heap)
PriorityQueue<Integer> minHeap = new PriorityQueue<>(); // Upper half (min-heap)
void addNum(int num) {
// Add to appropriate heap
if (maxHeap.isEmpty() || num <= maxHeap.peek()) {
maxHeap.add(num);
} else {
minHeap.add(num);
}
// Balance heaps (sizes differ by at most 1)
if (maxHeap.size() > minHeap.size() + 1) {
minHeap.add(maxHeap.poll());
} else if (minHeap.size() > maxHeap.size()) {
maxHeap.add(minHeap.poll());
}
}
double findMedian() {
if (maxHeap.size() == minHeap.size()) {
return (maxHeap.peek() + minHeap.peek()) / 2.0;
}
return maxHeap.peek();
}
}
// Used in: Real-time statistics, streaming data analysis
7. Graphs
Set of vertices (nodes) connected by edges. Can be directed or undirected, weighted or unweighted.
Trees are Graphs: Many data structures in this guide are specialized graphs. A tree is a connected acyclic graph; BSTs, B-Trees, tries, and heaps are all trees. Understanding graph theory fundamentals helps you reason about all tree-based structures.
Undirected Graph:     Directed Graph:
A --- B               A → B
|     |               ↓   ↓
C --- D               C → D
Weighted Graph:
A --5-- B
|       |
2       3
|       |
C --1-- D
Graph Representations
1. Adjacency Matrix
// Graph with 4 vertices (A, B, C, D)
// Edge A-B, A-C, B-D, C-D
int[][] adjMatrix = {
// A B C D
{0, 1, 1, 0}, // A
{1, 0, 0, 1}, // B
{1, 0, 0, 1}, // C
{0, 1, 1, 0} // D
};
// Check if edge exists: O(1)
boolean hasEdge(int u, int v) {
return adjMatrix[u][v] == 1;
}
// Get neighbors: O(V) - must scan entire row
List<Integer> getNeighbors(int v) {
List<Integer> neighbors = new ArrayList<>();
for (int i = 0; i < V; i++) {
if (adjMatrix[v][i] == 1) {
neighbors.add(i);
}
}
return neighbors;
}
// Space: O(V²)
// Good for: Dense graphs, checking if edge exists
2. Adjacency List
// Same graph as adjacency list
Map<Integer, List<Integer>> adjList = new HashMap<>();
adjList.put(A, Arrays.asList(B, C));
adjList.put(B, Arrays.asList(A, D));
adjList.put(C, Arrays.asList(A, D));
adjList.put(D, Arrays.asList(B, C));
// Check if edge exists: O(degree(v))
boolean hasEdge(int u, int v) {
return adjList.get(u).contains(v);
}
// Get neighbors: O(1)
List<Integer> getNeighbors(int v) {
return adjList.get(v);
}
// Space: O(V + E)
// Good for: Sparse graphs, iterating neighbors
Choosing Representation:
- Adjacency Matrix: Dense graphs, frequent edge existence checks, small graphs
- Adjacency List: Sparse graphs (most real-world graphs), memory-constrained, iterating neighbors
Real-World Usage
1. Social Networks
Social Network Graph:

            Alice
          /   |   \
       Bob  Carol  Dave
        |     |    /  \
       Eve Frank Grace Helen
Vertices = People
Edges = Friendship
Queries:
- Find friends: O(1) with adjacency list
- Friend recommendations: Friends of friends
- Shortest path: Degrees of separation
- Community detection: Graph clustering
- Influence propagation: BFS/DFS from influencer
// Friend recommendations (friends of friends)
Set<User> recommendFriends(User user) {
Set<User> friends = graph.getNeighbors(user);
Set<User> recommendations = new HashSet<>();
for (User friend : friends) {
Set<User> friendsOfFriend = graph.getNeighbors(friend);
for (User fof : friendsOfFriend) {
if (fof != user && !friends.contains(fof)) {
recommendations.add(fof);
}
}
}
return recommendations;
}
// Degrees of separation (BFS)
int degreesOfSeparation(User source, User target) {
Queue<User> queue = new LinkedList<>();
Map<User, Integer> distance = new HashMap<>();
queue.add(source);
distance.put(source, 0);
while (!queue.isEmpty()) {
User user = queue.poll();
if (user == target) return distance.get(user);
for (User friend : graph.getNeighbors(user)) {
if (!distance.containsKey(friend)) {
distance.put(friend, distance.get(user) + 1);
queue.add(friend);
}
}
}
return -1; // Not connected
}
2. Path Finding (Dijkstra's Algorithm)
// Find shortest path in weighted graph
// Used in: GPS navigation, network routing, game AI
int[] dijkstra(Graph graph, int source) {
int[] dist = new int[V];
Arrays.fill(dist, Integer.MAX_VALUE);
dist[source] = 0;
PriorityQueue<Node> pq = new PriorityQueue<>(
(a, b) -> Integer.compare(a.distance, b.distance) // Avoids subtraction overflow
);
pq.add(new Node(source, 0));
while (!pq.isEmpty()) {
Node node = pq.poll();
int u = node.vertex;
if (node.distance > dist[u]) continue; // Stale entry
for (Edge edge : graph.getEdges(u)) {
int v = edge.to;
int weight = edge.weight;
if (dist[u] + weight < dist[v]) {
dist[v] = dist[u] + weight;
pq.add(new Node(v, dist[v]));
}
}
}
return dist; // Shortest distance to all vertices
}
// Time: O((V + E) log V) with min-heap
// Space: O(V)
Example:
A --1-- B
|       |
4       2
|       |
C --1-- D

Shortest path A → D: A → B → D (cost 3)
Not A → C → D (cost 5)
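The example can be checked with a runnable version of the algorithm; vertices are numbered A=0, B=1, C=2, D=3, and the adjacency-list encoding here is one illustrative choice:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Dijkstra on the small weighted graph above. Edges are stored as
// adjacency lists of {to, weight} pairs.
class DijkstraDemo {
    static int[] dijkstra(List<int[]>[] adj, int source) {
        int n = adj.length;
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[source] = 0;
        // Min-heap by tentative distance; entries are {vertex, dist}
        PriorityQueue<int[]> pq =
            new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        pq.add(new int[]{source, 0});
        while (!pq.isEmpty()) {
            int[] cur = pq.poll();
            int u = cur[0];
            if (cur[1] > dist[u]) continue; // Stale entry, skip
            for (int[] edge : adj[u]) {
                int v = edge[0], w = edge[1];
                if (dist[u] + w < dist[v]) { // Relax the edge
                    dist[v] = dist[u] + w;
                    pq.add(new int[]{v, dist[v]});
                }
            }
        }
        return dist;
    }

    @SuppressWarnings("unchecked")
    static List<int[]>[] exampleGraph() {
        List<int[]>[] adj = new List[4];
        for (int i = 0; i < 4; i++) adj[i] = new ArrayList<>();
        addUndirected(adj, 0, 1, 1); // A --1-- B
        addUndirected(adj, 0, 2, 4); // A --4-- C
        addUndirected(adj, 1, 3, 2); // B --2-- D
        addUndirected(adj, 2, 3, 1); // C --1-- D
        return adj;
    }

    static void addUndirected(List<int[]>[] adj, int u, int v, int w) {
        adj[u].add(new int[]{v, w});
        adj[v].add(new int[]{u, w});
    }
}
```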
8. Advanced & Probabilistic Data Structures
Specialized data structures that trade perfect accuracy for space efficiency or provide solutions for distributed systems.
Bloom Filters
Space-efficient probabilistic data structure for testing set membership. Can have false positives, but never false negatives.
Bloom Filter Structure:
Bit Array (m bits):
┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐
│0│0│0│0│0│0│0│0│0│0│0│0│0│0│0│0│
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
Add "alice":
hash1("alice") % 16 = 3
hash2("alice") % 16 = 7
hash3("alice") % 16 = 12
┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐
│0│0│0│1│0│0│0│1│0│0│0│0│1│0│0│0│
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
       ↑       ↑         ↑
Set to 1 by hash functions (bits 3, 7, 12)
Check "alice": All 3 bits are 1 → Probably present
Check "bob": If any bit is 0 → Definitely not present
class BloomFilter {
BitSet bits;
int size;
int numHashFunctions;
void add(String item) {
for (int i = 0; i < numHashFunctions; i++) {
int hash = hash(item, i) % size;
bits.set(hash);
}
}
boolean mightContain(String item) {
for (int i = 0; i < numHashFunctions; i++) {
int hash = hash(item, i) % size;
if (!bits.get(hash)) {
return false; // Definitely not present
}
}
return true; // Probably present (might be false positive)
}
}
// False positive rate: (1 - e^(-kn/m))^k
// k = number of hash functions
// n = number of elements
// m = bit array size
Key Properties:
- False Positives: Possible (says "yes" when should say "no")
- False Negatives: Never (if says "no", definitely not present)
- Cannot Remove: Clearing a bit might affect other elements
- Space Efficient: ~10 bits per element for 1% false positive rate
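A runnable sketch of the filter above; deriving the k hashes by mixing hashCode with the function index is an illustrative shortcut, not a production-quality hash family:

```java
import java.util.BitSet;

// Bloom filter: k hash functions set/check k bits in an m-bit array.
class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int numHashFunctions;

    SimpleBloomFilter(int size, int numHashFunctions) {
        this.bits = new BitSet(size);
        this.size = size;
        this.numHashFunctions = numHashFunctions;
    }

    // i-th hash: mix the item's hashCode with the function index
    private int hash(String item, int i) {
        int h = item.hashCode() * 31 + i * 0x9e3779b9;
        return (h & 0x7fffffff) % size; // Non-negative bit index
    }

    void add(String item) {
        for (int i = 0; i < numHashFunctions; i++) {
            bits.set(hash(item, i));
        }
    }

    boolean mightContain(String item) {
        for (int i = 0; i < numHashFunctions; i++) {
            if (!bits.get(hash(item, i))) {
                return false; // Definitely not present
            }
        }
        return true; // Probably present (false positive possible)
    }
}
```

Note the asymmetry the test relies on: an added item always answers true (no false negatives), while an absent item usually, but not always, answers false.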
Real-World Usage
// 1. Database Query Optimization (avoid disk reads)
// Check Bloom filter before hitting disk
if (bloomFilter.mightContain(key)) {
// Might be on disk, check there
value = diskRead(key); // Expensive
} else {
// Definitely not on disk
return null; // Saved a disk read!
}
// Used in:
// - RocksDB, Cassandra, HBase (avoid reading non-existent keys)
// - PostgreSQL (check if value exists before index scan)
// 2. Web Caching (Chrome, Squid)
// Check if malicious URL before full lookup
if (maliciousUrlBloomFilter.mightContain(url)) {
// Do expensive full check
if (isMalicious(url)) {
block(url);
}
}
// Most URLs are safe, bloom filter eliminates them quickly
Consistent Hashing
Distributes keys across nodes minimizing reorganization when nodes are added/removed.
Traditional Hashing:
server = hash(key) % N
Problem: Adding/removing server rehashes ALL keys!
Consistent Hashing:
Hash both keys and servers to ring [0, 2^32)
     0
     |
S1 --|-- S2
     |
     S3
Keys hash to positions, assigned to next clockwise server:
K1 → S2
K2 → S3
K3 → S1
Add S4: Only keys between S3 and S4 move (not all keys!)
class ConsistentHash {
TreeMap<Long, String> ring = new TreeMap<>();
int virtualNodes = 150; // Replicas per server
void addServer(String server) {
for (int i = 0; i < virtualNodes; i++) {
long hash = hash(server + ":" + i);
ring.put(hash, server);
}
}
void removeServer(String server) {
for (int i = 0; i < virtualNodes; i++) {
long hash = hash(server + ":" + i);
ring.remove(hash);
}
}
String getServer(String key) {
long hash = hash(key);
// Find next server clockwise
Map.Entry<Long, String> entry = ring.ceilingEntry(hash);
if (entry == null) {
entry = ring.firstEntry(); // Wrap around
}
return entry.getValue();
}
}
// Virtual Nodes: Each physical server appears multiple times
// Improves load distribution (avoids hot spots)
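A runnable version of the ring above; the FNV-1a hash used here is an illustrative choice (production systems often use MurmurHash or similar):

```java
import java.util.Map;
import java.util.TreeMap;

// Consistent hash ring: TreeMap gives us ceilingEntry for the
// "next server clockwise" lookup.
class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    // FNV-1a over the string's chars (illustrative hash choice)
    static long hash(String s) {
        long h = 0xcbf29ce484222325L;      // FNV offset basis
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 0x100000001b3L;           // FNV prime
        }
        return h;
    }

    void addServer(String server) {
        for (int i = 0; i < virtualNodes; i++)
            ring.put(hash(server + ":" + i), server);
    }

    void removeServer(String server) {
        for (int i = 0; i < virtualNodes; i++)
            ring.remove(hash(server + ":" + i));
    }

    String getServer(String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        if (e == null) e = ring.firstEntry(); // Wrap around the ring
        return e.getValue();
    }
}
```

The key property to demonstrate: removing a server only remaps the keys that were on it; every other key keeps its assignment.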
Complexity Comparison
| Data Structure  | Access   | Search   | Insert   | Delete   | Space                    |
|-----------------|----------|----------|----------|----------|--------------------------|
| Array           | O(1)     | O(n)     | O(n)     | O(n)     | O(n)                     |
| Stack           | O(n)     | O(n)     | O(1)     | O(1)     | O(n)                     |
| Queue           | O(n)     | O(n)     | O(1)     | O(1)     | O(n)                     |
| Hash Table      | -        | O(1)*    | O(1)*    | O(1)*    | O(n)                     |
| BST (balanced)  | O(log n) | O(log n) | O(log n) | O(log n) | O(n)                     |
| B-Tree          | O(log n) | O(log n) | O(log n) | O(log n) | O(n)                     |
| Heap            | O(n)     | O(n)     | O(log n) | O(log n) | O(n)                     |
| Trie            | O(k)     | O(k)     | O(k)     | O(k)     | O(ALPHABET_SIZE × N × k) |
* Average case. Worst case O(n) for hash table.
k = key length for Trie
Common Interview Questions
Q: When would you use a hash table vs a balanced tree?
A: Hash table for O(1) average lookup when you only need equality checks and don't need ordering. Balanced tree (BST, B-tree) for O(log n) lookup when you need: (1) Range queries, (2) Sorted iteration, (3) Min/max operations, (4) Predecessor/successor. Example: Database uses B-tree for indexed columns to support both equality and range queries.
Q: Explain LIFO vs FIFO and give real examples.
A: LIFO (Last In, First Out) = Stack. Like a stack of plates - last one placed is first removed. Used in: function call stack, undo operations, DFS, expression evaluation. FIFO (First In, First Out) = Queue. Like a line at a store - first person in line is served first. Used in: task scheduling, message queues, BFS, printer queue.
Q: Why do databases use B-trees instead of binary search trees?
A: B-trees optimize disk I/O: (1) High branching factor (100s of keys per node) matches disk page size (4KB), (2) Fewer disk reads - O(log_200(N)) vs O(log_2(N)), (3) All leaves at same level = predictable performance, (4) Sequential leaf access for range scans. For 1 billion records: B-tree needs ~4 disk reads, BST needs ~30.
Q: How would you implement an LRU cache?
A: Use HashMap + Doubly Linked List. HashMap provides O(1) lookup (key → node). Doubly linked list maintains access order (head = most recent, tail = least recent). On access: move node to head. On insert when full: remove tail node. Both operations are O(1). The combination gives O(1) get and O(1) put.
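In Java specifically, LinkedHashMap already pairs a hash table with a doubly linked list, so a working LRU cache is only a few lines (a handy follow-up if the interviewer allows library classes):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// accessOrder=true moves an entry to the back on every get/put;
// removeEldestEntry evicts the front (least recently used) entry.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // true → iterate in access order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // Evict least recently used
    }
}
```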
Q: When would you use a graph vs a tree?
A: Tree is a special graph (connected, acyclic, N-1 edges for N nodes). Use tree when: (1) Clear hierarchy (filesystem, org chart), (2) One parent per node. Use general graph when: (1) Multiple paths between nodes (road network), (2) Cycles exist (social network - mutual friends), (3) No clear hierarchy. Trees are easier to traverse and reason about; graphs are more flexible.
Key Takeaway: Choose data structures based on access patterns. Arrays for sequential access, hash tables for key lookups, trees for ordered data, graphs for relationships. Understand trade-offs between time complexity, space complexity, and real-world constraints (disk I/O, cache locality).