Data Structures - Deep Dive
Interview Tip: For each data structure, understand: (1) Time/space complexity, (2) Real-world use cases, (3) When to use vs alternatives, (4) Implementation details. Be ready to explain trade-offs.
1. Arrays
The most fundamental data structure - contiguous block of memory storing elements of the same type.
Array in Memory:
┌───┬───┬───┬───┬───┬───┬───┬───┐
│ 5 │ 2 │ 8 │ 1 │ 9 │ 3 │ 7 │ 4 │
└───┴───┴───┴───┴───┴───┴───┴───┘
  0   1   2   3   4   5   6   7   ← indices
Base Address: 1000
Element Size: 4 bytes
Address of arr[3] = 1000 + (3 × 4) = 1012
Key Properties
- Random Access: O(1) - can access any element directly via index
- Cache Friendly: Elements stored contiguously, excellent spatial locality
- Fixed Size: Size determined at creation (in most languages)
- Memory Efficient: No overhead beyond the data itself
Time Complexity:
Access: O(1) | Search: O(n) | Insert: O(n) | Delete: O(n)
Real-World Usage
// 1. Cache-friendly processing (video/image data)
void processPixels(int* pixels, int width, int height) {
for (int i = 0; i < width * height; i++) {
pixels[i] = transform(pixels[i]); // Sequential access
}
}
// 2. Lookup tables (fast constant-time mappings)
const char* statusMessages[5] = {
"Success", "Error", "Pending", "Timeout", "Cancelled"
};
printf("%s", statusMessages[statusCode]); // O(1) lookup
// 3. Dynamic arrays (ArrayList, vector)
// Automatically resize when full (amortized O(1) append)
vector<int> v;
v.push_back(42); // Amortized O(1)
Dynamic Arrays: When array is full, allocate new array (typically 2x size), copy elements, free old array. This gives amortized O(1) append despite occasional O(n) resize.
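The growth strategy just described can be sketched in a few lines of Java; DynamicIntArray and grow are illustrative names for this note, not a standard API:

```java
// Minimal dynamic array sketch (what ArrayList / vector do internally).
class DynamicIntArray {
    private int[] data = new int[2]; // Small initial capacity
    private int size = 0;

    void append(int val) {
        if (size == data.length) grow(); // Occasional O(n) resize
        data[size++] = val;              // Usual O(1) append
    }

    private void grow() {
        int[] bigger = new int[data.length * 2];    // Double capacity
        System.arraycopy(data, 0, bigger, 0, size); // Copy old elements
        data = bigger;                              // Old array freed by GC
    }

    int get(int i) { return data[i]; }
    int size() { return size; }
}
```

Doubling keeps total copy work proportional to n across all appends, which is where the amortized O(1) claim comes from.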
2. Stacks (LIFO - Last In, First Out)
A collection where elements are added and removed from the same end (top).
Stack Operations:
 push(5)      push(2)      pop() → returns 2
┌───────┐    ┌───────┐    ┌───────┐
│       │    │   2   │    │       │
├───────┤    ├───────┤    ├───────┤
│   5   │    │   5   │    │   5   │
├───────┤    ├───────┤    ├───────┤
│   3   │    │   3   │    │   3   │
└───────┘    └───────┘    └───────┘
 top = 5      top = 2      top = 5
(starting stack: [3]; each panel shows the stack after the operation)
Operations
- push(item): Add item to top - O(1)
- pop(): Remove and return top item - O(1)
- peek(): View top item without removing - O(1)
- isEmpty(): Check if stack is empty - O(1)
Time Complexity:
Push: O(1) | Pop: O(1) | Peek: O(1) | Search: O(n)
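All four operations can be backed by a plain array; a minimal sketch (IntStack is an illustrative name, and empty-stack checks are left to the caller):

```java
// Array-backed stack: push, pop, peek, isEmpty all O(1)
// (push is amortized O(1) because of the occasional resize).
class IntStack {
    private int[] data = new int[16];
    private int top = 0; // Number of elements; data[top - 1] is the top

    void push(int val) {
        if (top == data.length) { // Grow when full
            int[] bigger = new int[data.length * 2];
            System.arraycopy(data, 0, bigger, 0, top);
            data = bigger;
        }
        data[top++] = val;
    }

    int pop()  { return data[--top]; }    // Caller must check isEmpty()
    int peek() { return data[top - 1]; }  // Caller must check isEmpty()
    boolean isEmpty() { return top == 0; }
}
```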
Real-World Usage
1. Function Call Stack
void functionC() {
int z = 30;
// Return pops this frame
}
void functionB() {
int y = 20;
functionC(); // Push new frame
}
void functionA() {
int x = 10;
functionB(); // Push new frame
}
Call Stack Evolution:
┌────────────┐ ┌────────────┐ ┌────────────┐
│ │ │ │ │ functionC │
├────────────┤ ├────────────┤ ├────────────┤
│ │ │ functionB │ │ functionB │
├────────────┤ ├────────────┤ ├────────────┤
│ functionA │ │ functionA │ │ functionA │
└────────────┘ └────────────┘ └────────────┘
   After A        After B        After C
2. Undo/Redo Functionality
class TextEditor {
Stack<Command> undoStack;
Stack<Command> redoStack;
void type(String text) {
Command cmd = new TypeCommand(text);
cmd.execute();
undoStack.push(cmd);
redoStack.clear(); // Clear redo history
}
void undo() {
if (!undoStack.isEmpty()) {
Command cmd = undoStack.pop();
cmd.undo();
redoStack.push(cmd);
}
}
void redo() {
if (!redoStack.isEmpty()) {
Command cmd = redoStack.pop();
cmd.execute();
undoStack.push(cmd);
}
}
}
3. Expression Evaluation
// Evaluate postfix: 3 4 + 2 *
// (3 + 4) * 2 = 14
Stack<Integer> stack = new Stack<>();
for (String token : tokens) {
    if (isNumber(token)) {
        stack.push(Integer.parseInt(token));
    } else {
        int b = stack.pop();
        int a = stack.pop(); // Pop order matters for - and /
        stack.push(evaluate(a, token, b));
    }
}
return stack.pop();
Process:
push 3: [3]
push 4: [3, 4]
'+': pop 4, pop 3, push 7 → [7]
push 2: [7, 2]
'*': pop 2, pop 7, push 14 → [14]
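The evaluator sketched above can be made fully runnable; this version assumes space-separated tokens and only the four basic operators (ArrayDeque is Java's recommended stack class):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Runnable postfix evaluator for space-separated expressions.
class Postfix {
    static int evaluate(String expr) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String token : expr.split("\\s+")) {
            switch (token) {
                case "+": case "-": case "*": case "/": {
                    int b = stack.pop();
                    int a = stack.pop(); // Pop order matters for - and /
                    stack.push(apply(a, token, b));
                    break;
                }
                default:
                    stack.push(Integer.parseInt(token));
            }
        }
        return stack.pop();
    }

    private static int apply(int a, String op, int b) {
        switch (op) {
            case "+": return a + b;
            case "-": return a - b;
            case "*": return a * b;
            default:  return a / b;
        }
    }
}
```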
4. Browser History (Back Button)
Stack<URL> backStack;
Stack<URL> forwardStack;
void visit(URL url) {
backStack.push(currentURL);
currentURL = url;
forwardStack.clear();
}
void goBack() {
forwardStack.push(currentURL);
currentURL = backStack.pop();
}
void goForward() {
backStack.push(currentURL);
currentURL = forwardStack.pop();
}
5. Depth-First Search (DFS)
void dfs(Node start) {
Stack<Node> stack;
Set<Node> visited;
stack.push(start);
while (!stack.isEmpty()) {
Node node = stack.pop();
if (visited.contains(node)) continue;
visited.add(node);
process(node);
for (Node neighbor : node.neighbors) {
if (!visited.contains(neighbor)) {
stack.push(neighbor);
}
}
}
}
3. Queues (FIFO - First In, First Out)
A collection where elements are added at the back (enqueue) and removed from the front (dequeue).
Queue Operations:
   start           enqueue(5)          dequeue() → returns 3
┌───┬───┬───┐   ┌───┬───┬───┬───┐   ┌───┬───┬───┐
│ 7 │ 4 │ 3 │   │ 5 │ 7 │ 4 │ 3 │   │ 5 │ 7 │ 4 │
└───┴───┴───┘   └───┴───┴───┴───┘   └───┴───┴───┘
  ↑       ↑       ↑           ↑       ↑       ↑
 back   front    back       front    back   front
Operations
- enqueue(item): Add item to back - O(1)
- dequeue(): Remove and return front item - O(1)
- peek(): View front item without removing - O(1)
Time Complexity:
Enqueue: O(1) | Dequeue: O(1) | Peek: O(1) | Search: O(n)
Circular Queue Implementation
class CircularQueue {
    int[] data;
    int capacity;           // data.length
    int front, rear, size;

    void enqueue(int val) {
        // Precondition: size < capacity (queue not full)
        data[rear] = val;
        rear = (rear + 1) % capacity; // Wrap around
        size++;
    }

    int dequeue() {
        // Precondition: size > 0 (queue not empty)
        int val = data[front];
        front = (front + 1) % capacity; // Wrap around
        size--;
        return val;
    }
}
Circular Buffer (front = 4, rear = 2):
┌───┬───┬───┬───┬───┐
│ 5 │ 6 │   │   │ 4 │
└───┴───┴───┴───┴───┘
          ↑       ↑
        rear    front
Enqueue 7: stored at index 2, rear advances to index 3
┌───┬───┬───┬───┬───┐
│ 5 │ 6 │ 7 │   │ 4 │
└───┴───┴───┴───┴───┘
              ↑   ↑
           rear front
Real-World Usage
1. Task Scheduling / Job Queues
// Background job processing
class JobQueue {
    Queue<Job> pending = new LinkedList<>();

    void submitJob(Job job) {
        pending.offer(job); // Add to back
    }

    void processJobs() {
        while (!pending.isEmpty()) {
            Job job = pending.poll(); // Remove from front (FIFO order)
            job.execute();
        }
    }
}
// Examples:
// - Print queue (first submitted, first printed)
// - Email queue (send in order received)
// - HTTP request handling
// - Async task processing (Celery, RabbitMQ, Kafka)
2. Breadth-First Search (BFS)
void bfs(Node start) {
    Queue<Node> queue = new LinkedList<>();
    Set<Node> visited = new HashSet<>();
    queue.offer(start);
    visited.add(start);
    while (!queue.isEmpty()) {
        Node node = queue.poll();
        process(node);
        for (Node neighbor : node.neighbors) {
            if (!visited.contains(neighbor)) {
                visited.add(neighbor);
                queue.offer(neighbor);
            }
        }
    }
}
// Level-by-level traversal:
//
//        1
//       / \
//      2   3
//     / \
//    4   5
//
// BFS visits: 1 → 2 → 3 → 4 → 5
3. Message Queues (Distributed Systems)
// Producer-Consumer pattern
// Producer
rabbitMQ.send(queue: "orders", message: orderData);
// Consumer (processes in FIFO order)
while (true) {
message = rabbitMQ.receive(queue: "orders");
processOrder(message);
rabbitMQ.acknowledge(message);
}
// Tools: RabbitMQ, Kafka, SQS, Redis Pub/Sub
// Use cases:
// - Decoupling microservices
// - Load leveling (smooth traffic spikes)
// - Async processing
// - Event streaming
4. Request Buffering
// Rate limiting with sliding window
class RateLimiter {
Queue<Long> timestamps;
int maxRequests = 100;
long windowMs = 60000; // 1 minute
boolean allowRequest() {
long now = System.currentTimeMillis();
// Remove old timestamps
while (!timestamps.isEmpty() &&
timestamps.peek() < now - windowMs) {
timestamps.poll(); // Drop timestamps outside the window
}
if (timestamps.size() < maxRequests) {
timestamps.add(now);
return true;
}
return false;
}
}
4. Hash Tables (Hash Maps)
Key-value store with O(1) average-case lookup using a hash function.
Hash Table Structure:
Hash Function: hash(key) % tableSize
┌────┬─────────────────────────────────────┐
│ 0 │ null │
├────┼─────────────────────────────────────┤
│ 1 │ ("alice", 25) → ("charlie", 35) │ ← Collision chain
├────┼─────────────────────────────────────┤
│ 2 │ null │
├────┼─────────────────────────────────────┤
│ 3 │ ("bob", 30) │
├────┼─────────────────────────────────────┤
│ 4 │ ("dave", 40) │
└────┴─────────────────────────────────────┘
hash("alice") % 5 = 1
hash("charlie") % 5 = 1 ← Collision!
Collision Resolution
1. Chaining (Linked Lists)
class HashTable {
LinkedList<Entry>[] buckets;
void put(K key, V value) {
int index = hash(key) % buckets.length;
// Search chain for existing key
for (Entry e : buckets[index]) {
if (e.key.equals(key)) {
e.value = value; // Update
return;
}
}
// Add new entry to chain
buckets[index].add(new Entry(key, value));
}
V get(K key) {
int index = hash(key) % buckets.length;
for (Entry e : buckets[index]) {
if (e.key.equals(key)) return e.value;
}
return null;
}
}
// Worst case: All keys hash to same bucket → O(n) lookup
// Average case: Even distribution → O(1) lookup
2. Open Addressing (Linear Probing)
void put(K key, V value) {
int index = hash(key) % size;
// Find next available slot
while (table[index] != null && !table[index].key.equals(key)) {
index = (index + 1) % size; // Linear probe
}
table[index] = new Entry(key, value);
}
Insertion with Linear Probing:
Insert "alice" (hash=1), "bob" (hash=3), "charlie" (hash=1)
┌────┬─────────┬──────┬───────┬──────┐
│ 0 │ │ │ │ │
├────┼─────────┼──────┼───────┼──────┤
│ 1 │ "alice" │ │ │ │ hash("alice")=1
├────┼─────────┼──────┼───────┼──────┤
│ 2 │"charlie"│ │ │ │ hash("charlie")=1→collision→probe to 2
├────┼─────────┼──────┼───────┼──────┤
│ 3 │ "bob" │ │ │ │ hash("bob")=3
└────┴─────────┴──────┴───────┴──────┘
Load Factor: ratio = elements / buckets. Keep < 0.75 for good performance. When exceeded, rehash (create larger table, reinsert all elements).
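The rehash step can be sketched as follows; SimpleHashSet is an illustrative toy (chaining, string keys only), not a substitute for java.util.HashSet:

```java
import java.util.LinkedList;

// When load factor would exceed 0.75, double the bucket array and
// reinsert every entry (hash % length changes when length changes).
class SimpleHashSet {
    private LinkedList<String>[] buckets = newBuckets(4);
    private int size = 0;

    @SuppressWarnings("unchecked")
    private static LinkedList<String>[] newBuckets(int n) {
        LinkedList<String>[] b = new LinkedList[n];
        for (int i = 0; i < n; i++) b[i] = new LinkedList<>();
        return b;
    }

    private int indexFor(String key, int length) {
        return (key.hashCode() & 0x7fffffff) % length; // Non-negative index
    }

    void add(String key) {
        if (contains(key)) return;
        if (size + 1 > buckets.length * 0.75) rehash(); // Keep load factor < 0.75
        buckets[indexFor(key, buckets.length)].add(key);
        size++;
    }

    boolean contains(String key) {
        return buckets[indexFor(key, buckets.length)].contains(key);
    }

    private void rehash() {
        LinkedList<String>[] old = buckets;
        buckets = newBuckets(old.length * 2); // Larger table...
        for (LinkedList<String> chain : old)  // ...reinsert all elements
            for (String key : chain)
                buckets[indexFor(key, buckets.length)].add(key);
    }

    int bucketCount() { return buckets.length; }
    int size() { return size; }
}
```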
Hash Functions
// Good hash function properties:
// 1. Deterministic (same input → same output)
// 2. Uniform distribution
// 3. Fast to compute
// Simple string hash (Java's hashCode)
int hash(String s) {
int h = 0;
for (int i = 0; i < s.length(); i++) {
h = 31 * h + s.charAt(i); // 31 is prime
}
return h;
}
// Cryptographic hashes (slower, more uniform, collision-resistant):
// e.g., SHA-256. (MD5 is broken and should not be used for security.)
Time Complexity:
Average: Insert O(1) | Search O(1) | Delete O(1)
Worst: Insert O(n) | Search O(n) | Delete O(n)
Real-World Usage
1. Database Indexes (Hash Indexes)
-- PostgreSQL hash index
CREATE INDEX idx_user_email_hash ON users USING HASH (email);
-- Fast equality lookups:
SELECT * FROM users WHERE email = 'alice@example.com'; -- O(1)
-- But cannot do:
SELECT * FROM users WHERE email LIKE 'alice%'; -- No range queries
SELECT * FROM users ORDER BY email; -- No ordering
// In-memory: Redis uses hash tables for key-value storage
SET user:1000 "alice" // O(1)
GET user:1000 // O(1)
2. Caches
// LRU Cache using HashMap + Doubly Linked List
class LRUCache {
HashMap<Integer, Node> cache;
DoublyLinkedList lruList;
int capacity;
int get(int key) {
if (!cache.containsKey(key)) return -1;
Node node = cache.get(key);
lruList.moveToFront(node); // Mark as recently used
return node.value;
}
void put(int key, int value) {
if (cache.containsKey(key)) {
// Update existing
Node node = cache.get(key);
node.value = value;
lruList.moveToFront(node);
} else {
// Add new
if (cache.size() >= capacity) {
// Evict LRU
Node lru = lruList.removeLast();
cache.remove(lru.key);
}
Node node = new Node(key, value);
lruList.addToFront(node);
cache.put(key, node);
}
}
}
// Used in: Browser caches, CDNs, Redis, Memcached
3. Deduplication
// Find duplicates in array
Set<Integer> seen = new HashSet<>();
for (int num : array) {
if (seen.contains(num)) {
System.out.println("Duplicate: " + num);
}
seen.add(num);
}
// Symbol table in compiler
HashMap<String, Symbol> symbolTable;
4. Counting / Frequency Analysis
// Word frequency
HashMap<String, Integer> wordCount = new HashMap<>();
for (String word : document.split("\\s+")) {
wordCount.put(word, wordCount.getOrDefault(word, 0) + 1);
}
// Character frequency (anagram detection)
boolean areAnagrams(String s1, String s2) {
    return getFrequency(s1).equals(getFrequency(s2));
}

HashMap<Character, Integer> getFrequency(String s) {
    HashMap<Character, Integer> freq = new HashMap<>();
    for (char c : s.toCharArray()) freq.merge(c, 1, Integer::sum);
    return freq;
}
5. Trees
Hierarchical data structure with nodes connected by edges.
Binary Tree:

        1
       / \
      2   3
     / \   \
    4   5   6
Terminology:
- Root: 1
- Parent of 4: 2
- Children of 2: 4, 5
- Leaf nodes: 4, 5, 6
- Height: 2 (longest path from root to leaf)
- Depth of node 4: 2
Binary Search Tree (BST)
Left subtree < node < right subtree
BST Example:

          8
         / \
        3   10
       / \    \
      1   6    14
         / \   /
        4   7 13
Search for 6:
8 (go left, 6 < 8)
3 (go right, 6 > 3)
6 (found!)
class TreeNode {
int val;
TreeNode left, right;
}
TreeNode search(TreeNode root, int target) {
if (root == null || root.val == target) return root;
if (target < root.val) {
return search(root.left, target); // Go left
} else {
return search(root.right, target); // Go right
}
}
TreeNode insert(TreeNode root, int val) {
    if (root == null) return new TreeNode(val); // Empty spot: create node
    if (val < root.val) {
        root.left = insert(root.left, val);   // Go left
    } else {
        root.right = insert(root.right, val); // Go right
    }
    return root;
}
// Usage: root = insert(root, 5); // Also handles the empty-tree case
BST Time Complexity:
Average: Search O(log n) | Insert O(log n) | Delete O(log n)
Worst (unbalanced): O(n) - degenerates to linked list
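One property worth remembering: an in-order traversal of a BST visits keys in sorted order, which is why trees support sorted iteration while hash tables do not. A self-contained sketch with its own node type:

```java
import java.util.ArrayList;
import java.util.List;

// In-order traversal (left, node, right) of a BST yields ascending order.
class BstDemo {
    static class Node {
        int val;
        Node left, right;
        Node(int val) { this.val = val; }
    }

    static Node insert(Node root, int val) {
        if (root == null) return new Node(val);
        if (val < root.val) root.left = insert(root.left, val);
        else root.right = insert(root.right, val);
        return root;
    }

    // Visit left subtree, then the node, then right subtree
    static void inorder(Node node, List<Integer> out) {
        if (node == null) return;
        inorder(node.left, out);
        out.add(node.val);
        inorder(node.right, out);
    }

    static List<Integer> sortedKeys(int... keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        List<Integer> out = new ArrayList<>();
        inorder(root, out);
        return out;
    }
}
```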
B-Trees (Database Indexes)
Self-balancing tree optimized for disk I/O. Each node can have many children (high branching factor).
Graph Structure: A B-Tree is a specialized graph, specifically a rooted, balanced tree (a connected acyclic graph). Each circle/box represents a vertex (node), and each line connecting them is an edge. Trees are graphs in which any two vertices are connected by exactly one path.
B-Tree of order 3 (max 3 keys per node):

               [30|60]                ← Root vertex
              /   |   \               ← Edges to children
      [10|20] [40|50] [70|80|90]      ← Internal vertices
      / | \   / | \   / | | \         ← Edges
    ...leaf vertices...               ← Leaf vertices (no outgoing edges)
Graph Properties:
- Vertices: Each node containing keys
- Edges: Parent-child connections (directed, root → leaves)
- Acyclic: No cycles (it's a tree)
- Connected: All vertices reachable from root
- Balanced: All leaf vertices at same depth
- Height stays logarithmic: O(log n)
Why B-Trees for Databases?
- Disk I/O optimization: Each node = 1 disk page (4KB). High branching factor means fewer disk reads
- Balanced: All leaf nodes at same depth → predictable performance
- Range queries: Leaf nodes often linked for sequential scan
- Insertions maintain order: No need to rebuild entire index
-- PostgreSQL B-Tree index (default)
CREATE INDEX idx_user_created ON users (created_at);
-- Range query uses B-Tree efficiently:
SELECT * FROM users
WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31';
-- Scan: Navigate to '2024-01-01', then scan sequentially through leaves
// B-Tree depth calculation:
// With 200 keys per node, 1 billion records:
// log_200(1,000,000,000) ≈ 4 levels
// Only 4 disk reads to find any record!
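That depth estimate is easy to verify by repeated multiplication; levelsNeeded is an illustrative helper, and real B-trees vary in per-node fan-out:

```java
// With branching factor ~200, one billion keys fit in about 4 levels,
// because 200^4 = 1.6 × 10^9 ≥ 10^9.
class BTreeDepth {
    static int levelsNeeded(long records, int branching) {
        int levels = 0;
        long reachable = 1;
        while (reachable < records) { // Each level multiplies the fan-out
            reachable *= branching;
            levels++;
        }
        return levels;
    }
}
```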
File Systems (Tree Structure)
File System Hierarchy:
/
├── home/
│   ├── alice/
│   │   ├── documents/
│   │   │   └── report.pdf
│   │   └── photos/
│   └── bob/
│       └── code/
├── etc/
│   ├── nginx/
│   └── ssh/
└── var/
    └── log/
Stored as tree:
- Directories are internal nodes
- Files are leaf nodes
- Path traversal: O(depth)
// inode structure (Unix)
struct inode {
mode_t mode; // Permissions
uid_t owner;
off_t size;
time_t mtime;
block_t blocks[15]; // Direct + indirect pointers
};
Direct blocks: Point to data
Indirect blocks: Point to blocks of pointers (tree structure!)
// Directory lookup in modern filesystems:
// ext4 uses HTree (a hashed tree variant); btrfs uses B-trees
// Lookup /home/alice/documents/report.pdf:
1. Start at root inode
2. Read root directory, find "home" entry → inode 123
3. Read inode 123 directory, find "alice" → inode 456
4. Read inode 456 directory, find "documents" → inode 789
5. Read inode 789 directory, find "report.pdf" → inode 1000
6. Read inode 1000 → file data blocks
// Large directories use B-tree index for O(log n) lookup
// Small directories use linear search
Trie (Prefix Tree)
Graph Structure: A Trie is a specialized graph, specifically a rooted tree (a connected acyclic graph) in which each vertex represents a character and edges connect consecutive characters of a word. Like all trees, a Trie has exactly one path between any two vertices. Used for efficient prefix-based searches.
Trie for words: "cat", "car", "dog", "dodge"

        (root)          ← Root vertex (empty)
        /    \          ← Edges (labeled implicitly by child vertex)
      (c)    (d)        ← Vertices (each stores a character)
       |      |
      (a)    (o)
      / \    / \
    (t) (r) (g) (d)     ← (t), (r), (g) end words "cat", "car", "dog"
                 |
                (g)
                 |
                (e)     ← (e) ends word "dodge"
Graph Properties:
- Vertices: Each node stores a character
- Edges: Connect consecutive characters in words
- Root: Empty vertex (starting point)
- Path: Root → leaf = complete word
- Shared prefixes share vertex paths (c→a shared by "cat", "car")
class TrieNode {
Map<Character, TrieNode> children = new HashMap<>();
boolean isEndOfWord = false;
}
void insert(String word) {
TrieNode node = root;
for (char c : word.toCharArray()) {
node.children.putIfAbsent(c, new TrieNode());
node = node.children.get(c);
}
node.isEndOfWord = true;
}
boolean search(String word) {
TrieNode node = root;
for (char c : word.toCharArray()) {
if (!node.children.containsKey(c)) return false;
node = node.children.get(c);
}
return node.isEndOfWord;
}
List<String> autocomplete(String prefix) {
TrieNode node = navigateToPrefix(prefix);
return collectAllWords(node, prefix); // DFS from prefix node
}
// Used in: Autocomplete, spell checkers, IP routing tables
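The navigateToPrefix and collectAllWords helpers above are left abstract; a runnable sketch that fills them in (the order of results depends on HashMap iteration and is unspecified):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Complete trie with insert and autocomplete (prefix walk + DFS collect).
class Trie {
    static class Node {
        Map<Character, Node> children = new HashMap<>();
        boolean isEndOfWord = false;
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isEndOfWord = true;
    }

    List<String> autocomplete(String prefix) {
        List<String> results = new ArrayList<>();
        Node node = root;
        for (char c : prefix.toCharArray()) {   // Navigate to prefix node
            node = node.children.get(c);
            if (node == null) return results;   // No word has this prefix
        }
        collect(node, new StringBuilder(prefix), results); // DFS below it
        return results;
    }

    private void collect(Node node, StringBuilder path, List<String> out) {
        if (node.isEndOfWord) out.add(path.toString());
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            path.append(e.getKey());
            collect(e.getValue(), path, out);
            path.deleteCharAt(path.length() - 1); // Backtrack
        }
    }
}
```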
6. Heaps
Complete binary tree where parent is always greater (max-heap) or smaller (min-heap) than children.
Min-Heap:

        1
       / \
      3   2
     / \ / \
    7  5 8  6
Properties:
- Root is minimum element
- Parent ≤ both children
- Complete tree (filled left-to-right)
- Height: O(log n)
Array Representation:
[1, 3, 2, 7, 5, 8, 6]
0 1 2 3 4 5 6
For node at index i:
- Left child: 2i + 1
- Right child: 2i + 2
- Parent: (i - 1) / 2
Heap Operations
// Insert: Add to end, bubble up
void insert(int val) {
heap.add(val); // Add to end
bubbleUp(heap.size() - 1);
}
void bubbleUp(int i) {
while (i > 0) {
int parent = (i - 1) / 2;
if (heap[i] >= heap[parent]) break;
swap(heap[i], heap[parent]);
i = parent;
}
}
// Extract-Min: Remove root, move last to root, bubble down
int extractMin() {
int min = heap[0];
heap[0] = heap[heap.size() - 1];
heap.remove(heap.size() - 1);
bubbleDown(0);
return min;
}
void bubbleDown(int i) {
while (true) {
int left = 2 * i + 1;
int right = 2 * i + 2;
int smallest = i;
if (left < size && heap[left] < heap[smallest])
smallest = left;
if (right < size && heap[right] < heap[smallest])
smallest = right;
if (smallest == i) break;
swap(heap[i], heap[smallest]);
i = smallest;
}
}
Time Complexity:
Insert: O(log n) | Extract-Min/Max: O(log n) | Peek: O(1) | Build heap: O(n)
Real-World Usage
1. Priority Queue
// Task scheduling by priority
PriorityQueue<Task> queue = new PriorityQueue<>(
(a, b) -> Integer.compare(a.priority, b.priority) // Min-heap by priority
);
queue.add(new Task("Email", priority=3));
queue.add(new Task("Bug Fix", priority=1)); // Higher priority
queue.add(new Task("Feature", priority=5));
while (!queue.isEmpty()) {
Task task = queue.poll(); // Always gets highest priority
execute(task);
}
// Output: Bug Fix → Email → Feature
// Used in:
// - OS process scheduling
// - Dijkstra's algorithm
// - Event-driven simulation
// - Huffman coding
2. Finding K Largest/Smallest Elements
// Find K largest elements using min-heap of size K
int[] findKLargest(int[] nums, int k) {
    PriorityQueue<Integer> minHeap = new PriorityQueue<>();
    for (int num : nums) {
        minHeap.add(num);
        if (minHeap.size() > k) {
            minHeap.poll(); // Remove smallest of the k+1
        }
    }
    // Heap now holds the K largest; drain it into an int[]
    int[] result = new int[minHeap.size()];
    for (int i = 0; i < result.length; i++) result[i] = minHeap.poll();
    return result; // Ascending order
}
// Time: O(n log k) - better than sorting O(n log n) when k << n
// Example: Find top 10 scores from 1 million entries
// Heap: O(1,000,000 × log 10) vs Sort: O(1,000,000 × log 1,000,000)
3. Heap Sort
void heapSort(int[] arr) {
// 1. Build max-heap: O(n)
buildMaxHeap(arr);
// 2. Extract max n times: O(n log n)
for (int i = arr.length - 1; i > 0; i--) {
swap(arr[0], arr[i]); // Move max to end
heapSize--;
bubbleDown(0); // Restore heap
}
}
// Result: Sorted array
// Time: O(n log n), Space: O(1) - in-place
// Not stable, but good for large datasets
4. Median Finding (Running Median)
// Maintain median as elements arrive
class MedianFinder {
PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder()); // Lower half (max-heap)
PriorityQueue<Integer> minHeap = new PriorityQueue<>(); // Upper half (min-heap)
void addNum(int num) {
// Add to appropriate heap
if (maxHeap.isEmpty() || num <= maxHeap.peek()) {
maxHeap.add(num);
} else {
minHeap.add(num);
}
// Balance heaps (sizes differ by at most 1)
if (maxHeap.size() > minHeap.size() + 1) {
minHeap.add(maxHeap.poll());
} else if (minHeap.size() > maxHeap.size()) {
maxHeap.add(minHeap.poll());
}
}
double findMedian() {
if (maxHeap.size() == minHeap.size()) {
return (maxHeap.peek() + minHeap.peek()) / 2.0;
}
return maxHeap.peek();
}
}
// Used in: Real-time statistics, streaming data analysis
7. Graphs
Set of vertices (nodes) connected by edges. Can be directed or undirected, weighted or unweighted.
Trees are Graphs: Many data structures in this guide are specialized graphs. A tree is a connected acyclic graph; BSTs, B-Trees, tries, and heaps are all trees. Understanding graph theory fundamentals helps you reason about all tree-based structures.
Undirected Graph:     Directed Graph:
A --- B               A → B
|     |               ↓   ↓
C --- D               C → D
Weighted Graph:
A --5-- B
|       |
2       3
|       |
C --1-- D
Graph Representations
1. Adjacency Matrix
// Graph with 4 vertices (A, B, C, D)
// Edge A-B, A-C, B-D, C-D
int[][] adjMatrix = {
// A B C D
{0, 1, 1, 0}, // A
{1, 0, 0, 1}, // B
{1, 0, 0, 1}, // C
{0, 1, 1, 0} // D
};
// Check if edge exists: O(1)
boolean hasEdge(int u, int v) {
return adjMatrix[u][v] == 1;
}
// Get neighbors: O(V) - must scan entire row
List<Integer> getNeighbors(int v) {
List<Integer> neighbors = new ArrayList<>();
for (int i = 0; i < V; i++) {
if (adjMatrix[v][i] == 1) {
neighbors.add(i);
}
}
return neighbors;
}
// Space: O(V²)
// Good for: Dense graphs, checking if edge exists
2. Adjacency List
// Same graph as adjacency list
Map<Integer, List<Integer>> adjList = new HashMap<>();
adjList.put(A, Arrays.asList(B, C));
adjList.put(B, Arrays.asList(A, D));
adjList.put(C, Arrays.asList(A, D));
adjList.put(D, Arrays.asList(B, C));
// Check if edge exists: O(degree(v))
boolean hasEdge(int u, int v) {
return adjList.get(u).contains(v);
}
// Get neighbors: O(1)
List<Integer> getNeighbors(int v) {
return adjList.get(v);
}
// Space: O(V + E)
// Good for: Sparse graphs, iterating neighbors
Choosing Representation:
- Adjacency Matrix: Dense graphs, frequent edge existence checks, small graphs
- Adjacency List: Sparse graphs (most real-world graphs), memory-constrained, iterating neighbors
Real-World Usage
1. Social Networks
Social Network Graph:

            Alice
          /   |   \
       Bob  Carol  Dave
        |     |    /  \
       Eve Frank Grace Helen
Vertices = People
Edges = Friendship
Queries:
- Find friends: O(1) with adjacency list
- Friend recommendations: Friends of friends
- Shortest path: Degrees of separation
- Community detection: Graph clustering
- Influence propagation: BFS/DFS from influencer
// Friend recommendations (friends of friends)
Set<User> recommendFriends(User user) {
Set<User> friends = graph.getNeighbors(user);
Set<User> recommendations = new HashSet<>();
for (User friend : friends) {
Set<User> friendsOfFriend = graph.getNeighbors(friend);
for (User fof : friendsOfFriend) {
if (fof != user && !friends.contains(fof)) {
recommendations.add(fof);
}
}
}
return recommendations;
}
// Degrees of separation (BFS)
int degreesOfSeparation(User source, User target) {
Queue<User> queue = new LinkedList<>();
Map<User, Integer> distance = new HashMap<>();
queue.add(source);
distance.put(source, 0);
while (!queue.isEmpty()) {
User user = queue.poll();
if (user == target) return distance.get(user);
for (User friend : graph.getNeighbors(user)) {
if (!distance.containsKey(friend)) {
distance.put(friend, distance.get(user) + 1);
queue.add(friend);
}
}
}
return -1; // Not connected
}
2. Path Finding (Dijkstra's Algorithm)
// Find shortest path in weighted graph
// Used in: GPS navigation, network routing, game AI
int[] dijkstra(Graph graph, int source) {
int[] dist = new int[V];
Arrays.fill(dist, Integer.MAX_VALUE);
dist[source] = 0;
PriorityQueue<Node> pq = new PriorityQueue<>(
(a, b) -> Integer.compare(a.distance, b.distance) // Avoids subtraction overflow
);
pq.add(new Node(source, 0));
while (!pq.isEmpty()) {
Node node = pq.poll();
int u = node.vertex;
if (node.distance > dist[u]) continue; // Stale entry
for (Edge edge : graph.getEdges(u)) {
int v = edge.to;
int weight = edge.weight;
if (dist[u] + weight < dist[v]) {
dist[v] = dist[u] + weight;
pq.add(new Node(v, dist[v]));
}
}
}
return dist; // Shortest distance to all vertices
}
// Time: O((V + E) log V) with min-heap
// Space: O(V)
Example:
A --1-- B
|       |
4       2
|       |
C --1-- D

Shortest path A → D: A → B → D (cost 3)
Not A → C → D (cost 5)
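The example can be checked with a runnable version of the algorithm; vertices are numbered A=0, B=1, C=2, D=3, and the adjacency-list encoding here is one illustrative choice:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Dijkstra on the small weighted graph above. Edges are stored as
// adjacency lists of {to, weight} pairs.
class DijkstraDemo {
    static int[] dijkstra(List<int[]>[] adj, int source) {
        int n = adj.length;
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[source] = 0;
        // Min-heap by tentative distance; entries are {vertex, dist}
        PriorityQueue<int[]> pq =
            new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        pq.add(new int[]{source, 0});
        while (!pq.isEmpty()) {
            int[] cur = pq.poll();
            int u = cur[0];
            if (cur[1] > dist[u]) continue; // Stale entry, skip
            for (int[] edge : adj[u]) {
                int v = edge[0], w = edge[1];
                if (dist[u] + w < dist[v]) { // Relax the edge
                    dist[v] = dist[u] + w;
                    pq.add(new int[]{v, dist[v]});
                }
            }
        }
        return dist;
    }

    @SuppressWarnings("unchecked")
    static List<int[]>[] exampleGraph() {
        List<int[]>[] adj = new List[4];
        for (int i = 0; i < 4; i++) adj[i] = new ArrayList<>();
        addUndirected(adj, 0, 1, 1); // A --1-- B
        addUndirected(adj, 0, 2, 4); // A --4-- C
        addUndirected(adj, 1, 3, 2); // B --2-- D
        addUndirected(adj, 2, 3, 1); // C --1-- D
        return adj;
    }

    static void addUndirected(List<int[]>[] adj, int u, int v, int w) {
        adj[u].add(new int[]{v, w});
        adj[v].add(new int[]{u, w});
    }
}
```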
8. Advanced & Probabilistic Data Structures
Specialized data structures that trade perfect accuracy for space efficiency or provide solutions for distributed systems.
Bloom Filters
Space-efficient probabilistic data structure for testing set membership. Can have false positives, but never false negatives.
Bloom Filter Structure:
Bit Array (m bits):
┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐
│0│0│0│0│0│0│0│0│0│0│0│0│0│0│0│0│
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
Add "alice":
hash1("alice") % 16 = 3
hash2("alice") % 16 = 7
hash3("alice") % 16 = 12
┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐
│0│0│0│1│0│0│0│1│0│0│0│0│1│0│0│0│
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
       ↑       ↑         ↑
Set to 1 by hash functions (bits 3, 7, 12)
Check "alice": All 3 bits are 1 → Probably present
Check "bob": If any bit is 0 → Definitely not present
class BloomFilter {
BitSet bits;
int size;
int numHashFunctions;
void add(String item) {
for (int i = 0; i < numHashFunctions; i++) {
int hash = hash(item, i) % size;
bits.set(hash);
}
}
boolean mightContain(String item) {
for (int i = 0; i < numHashFunctions; i++) {
int hash = hash(item, i) % size;
if (!bits.get(hash)) {
return false; // Definitely not present
}
}
return true; // Probably present (might be false positive)
}
}
// False positive rate: (1 - e^(-kn/m))^k
// k = number of hash functions
// n = number of elements
// m = bit array size
Key Properties:
- False Positives: Possible (says "yes" when should say "no")
- False Negatives: Never (if says "no", definitely not present)
- Cannot Remove: Clearing a bit might affect other elements
- Space Efficient: ~10 bits per element for 1% false positive rate
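A runnable sketch of the filter above; deriving the k hashes by mixing hashCode with the function index is an illustrative shortcut, not a production-quality hash family:

```java
import java.util.BitSet;

// Bloom filter: k hash functions set/check k bits in an m-bit array.
class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int numHashFunctions;

    SimpleBloomFilter(int size, int numHashFunctions) {
        this.bits = new BitSet(size);
        this.size = size;
        this.numHashFunctions = numHashFunctions;
    }

    // i-th hash: mix the item's hashCode with the function index
    private int hash(String item, int i) {
        int h = item.hashCode() * 31 + i * 0x9e3779b9;
        return (h & 0x7fffffff) % size; // Non-negative bit index
    }

    void add(String item) {
        for (int i = 0; i < numHashFunctions; i++) {
            bits.set(hash(item, i));
        }
    }

    boolean mightContain(String item) {
        for (int i = 0; i < numHashFunctions; i++) {
            if (!bits.get(hash(item, i))) {
                return false; // Definitely not present
            }
        }
        return true; // Probably present (false positive possible)
    }
}
```

Note the asymmetry the test relies on: an added item always answers true (no false negatives), while an absent item usually, but not always, answers false.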
Real-World Usage
// 1. Database Query Optimization (avoid disk reads)
// Check Bloom filter before hitting disk
if (bloomFilter.mightContain(key)) {
// Might be on disk, check there
value = diskRead(key); // Expensive
} else {
// Definitely not on disk
return null; // Saved a disk read!
}
// Used in:
// - RocksDB, Cassandra, HBase (avoid reading non-existent keys)
// - PostgreSQL (check if value exists before index scan)
// 2. Web Caching (Chrome, Squid)
// Check if malicious URL before full lookup
if (maliciousUrlBloomFilter.mightContain(url)) {
// Do expensive full check
if (isMalicious(url)) {
block(url);
}
}
// Most URLs are safe, bloom filter eliminates them quickly
Consistent Hashing
Distributes keys across nodes minimizing reorganization when nodes are added/removed.
Traditional Hashing:
server = hash(key) % N
Problem: Adding/removing server rehashes ALL keys!
Consistent Hashing:
Hash both keys and servers to ring [0, 2^32)
     0
     |
S1 --|-- S2
     |
     S3
Keys hash to positions, assigned to next clockwise server:
K1 → S2
K2 → S3
K3 → S1
Add S4: Only keys between S3 and S4 move (not all keys!)
class ConsistentHash {
TreeMap<Long, String> ring = new TreeMap<>();
int virtualNodes = 150; // Replicas per server
void addServer(String server) {
for (int i = 0; i < virtualNodes; i++) {
long hash = hash(server + ":" + i);
ring.put(hash, server);
}
}
void removeServer(String server) {
for (int i = 0; i < virtualNodes; i++) {
long hash = hash(server + ":" + i);
ring.remove(hash);
}
}
String getServer(String key) {
long hash = hash(key);
// Find next server clockwise
Map.Entry<Long, String> entry = ring.ceilingEntry(hash);
if (entry == null) {
entry = ring.firstEntry(); // Wrap around
}
return entry.getValue();
}
}
// Virtual Nodes: Each physical server appears multiple times
// Improves load distribution (avoids hot spots)
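A runnable version of the ring above; the FNV-1a hash used here is an illustrative choice (production systems often use MurmurHash or similar):

```java
import java.util.Map;
import java.util.TreeMap;

// Consistent hash ring: TreeMap gives us ceilingEntry for the
// "next server clockwise" lookup.
class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    // FNV-1a over the string's chars (illustrative hash choice)
    static long hash(String s) {
        long h = 0xcbf29ce484222325L;      // FNV offset basis
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 0x100000001b3L;           // FNV prime
        }
        return h;
    }

    void addServer(String server) {
        for (int i = 0; i < virtualNodes; i++)
            ring.put(hash(server + ":" + i), server);
    }

    void removeServer(String server) {
        for (int i = 0; i < virtualNodes; i++)
            ring.remove(hash(server + ":" + i));
    }

    String getServer(String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        if (e == null) e = ring.firstEntry(); // Wrap around the ring
        return e.getValue();
    }
}
```

The key property to demonstrate: removing a server only remaps the keys that were on it; every other key keeps its assignment.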
Complexity Comparison
| Data Structure  | Access   | Search   | Insert   | Delete   | Space                    |
|-----------------|----------|----------|----------|----------|--------------------------|
| Array           | O(1)     | O(n)     | O(n)     | O(n)     | O(n)                     |
| Stack           | O(n)     | O(n)     | O(1)     | O(1)     | O(n)                     |
| Queue           | O(n)     | O(n)     | O(1)     | O(1)     | O(n)                     |
| Hash Table      | -        | O(1)*    | O(1)*    | O(1)*    | O(n)                     |
| BST (balanced)  | O(log n) | O(log n) | O(log n) | O(log n) | O(n)                     |
| B-Tree          | O(log n) | O(log n) | O(log n) | O(log n) | O(n)                     |
| Heap            | O(n)     | O(n)     | O(log n) | O(log n) | O(n)                     |
| Trie            | O(k)     | O(k)     | O(k)     | O(k)     | O(ALPHABET_SIZE × N × k) |
* Average case. Worst case O(n) for hash table.
k = key length for Trie
Common Interview Questions
Q: When would you use a hash table vs a balanced tree?
A: Hash table for O(1) average lookup when you only need equality checks and don't need ordering. Balanced tree (BST, B-tree) for O(log n) lookup when you need: (1) Range queries, (2) Sorted iteration, (3) Min/max operations, (4) Predecessor/successor. Example: Database uses B-tree for indexed columns to support both equality and range queries.
Q: Explain LIFO vs FIFO and give real examples.
A: LIFO (Last In, First Out) = Stack. Like a stack of plates - last one placed is first removed. Used in: function call stack, undo operations, DFS, expression evaluation. FIFO (First In, First Out) = Queue. Like a line at a store - first person in line is served first. Used in: task scheduling, message queues, BFS, printer queue.
Q: Why do databases use B-trees instead of binary search trees?
A: B-trees optimize disk I/O: (1) High branching factor (100s of keys per node) matches disk page size (4KB), (2) Fewer disk reads - O(log_200(N)) vs O(log_2(N)), (3) All leaves at same level = predictable performance, (4) Sequential leaf access for range scans. For 1 billion records: B-tree needs ~4 disk reads, BST needs ~30.
Q: How would you implement an LRU cache?
A: Use HashMap + Doubly Linked List. HashMap provides O(1) lookup (key → node). Doubly linked list maintains access order (head = most recent, tail = least recent). On access: move node to head. On insert when full: remove tail node. Both operations are O(1). The combination gives O(1) get and O(1) put.
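In Java specifically, LinkedHashMap already pairs a hash table with a doubly linked list, so a working LRU cache is only a few lines (a handy follow-up if the interviewer allows library classes):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// accessOrder=true moves an entry to the back on every get/put;
// removeEldestEntry evicts the front (least recently used) entry.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // true → iterate in access order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // Evict least recently used
    }
}
```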
Q: When would you use a graph vs a tree?
A: Tree is a special graph (connected, acyclic, N-1 edges for N nodes). Use tree when: (1) Clear hierarchy (filesystem, org chart), (2) One parent per node. Use general graph when: (1) Multiple paths between nodes (road network), (2) Cycles exist (social network - mutual friends), (3) No clear hierarchy. Trees are easier to traverse and reason about; graphs are more flexible.
Key Takeaway: Choose data structures based on access patterns. Arrays for sequential access, hash tables for key lookups, trees for ordered data, graphs for relationships. Understand trade-offs between time complexity, space complexity, and real-world constraints (disk I/O, cache locality).