Understanding the right data structure for the job - and why it matters at scale
Iterable<E>
└── Collection<E>
├── List<E> // Ordered, allows duplicates
│ ├── ArrayList
│ ├── LinkedList
│ └── Vector (legacy)
│
├── Set<E> // No duplicates
│ ├── HashSet
│ ├── LinkedHashSet
│ └── TreeSet (SortedSet)
│
└── Queue<E> // FIFO (usually)
├── LinkedList
├── PriorityQueue
└── Deque
├── ArrayDeque
└── LinkedList
Map<K,V> // Key-Value pairs (not Collection!)
├── HashMap
├── LinkedHashMap
├── TreeMap (SortedMap)
├── Hashtable (legacy)
└── ConcurrentHashMap
# Python list - dynamic array, like ArrayList names = ["Alice", "Bob", "Charlie"] names.append("Diana") # Add at end: O(1) amortized names.insert(0, "Eve") # Add at start: O(n) names[2] # Access by index: O(1) names.pop(0) # Remove first: O(n) "Alice" in names # Search: O(n) # Python list is always mutable
// ArrayList - dynamic array (most common) // Best for: random access, iteration, add at end List<String> names = new ArrayList<>(); names.add("Diana"); // Add at end: O(1) amortized names.add(0, "Eve"); // Add at start: O(n) - shifts all elements names.get(2); // Access by index: O(1) names.remove(0); // Remove first: O(n) names.contains("Alice"); // Search: O(n) // Immutable list (Java 9+) List<String> immutable = List.of("Alice", "Bob"); immutable.add("Charlie"); // UnsupportedOperationException! // Unmodifiable wrapper List<String> readOnly = Collections.unmodifiableList(names);
// LinkedList - doubly-linked list // Best for: frequent add/remove at beginning/end, queue operations LinkedList<String> names = new LinkedList<>(); names.addFirst("Eve"); // Add at start: O(1) names.addLast("Diana"); // Add at end: O(1) names.get(2); // Access by index: O(n) - must traverse! names.removeFirst(); // Remove first: O(1) names.removeLast(); // Remove last: O(1) // Also implements Deque (double-ended queue) names.push("stack item"); // Stack push names.pop(); // Stack pop names.offer("queue item"); // Queue offer names.poll(); // Queue poll
| Operation | ArrayList | LinkedList | Winner |
|---|---|---|---|
| Get by index | O(1) | O(n) | ArrayList |
| Add at end | O(1)* | O(1) | Tie |
| Add at beginning | O(n) | O(1) | LinkedList |
| Add in middle | O(n) | O(n)** | Depends |
| Memory overhead | Low | High (node objects) | ArrayList |
| Cache locality | Excellent | Poor | ArrayList |
* Amortized - occasionally O(n) for resize. ** O(1) for insert, but O(n) to find position.
Use ArrayList by default. LinkedList's theoretical advantages rarely matter in practice due to cache effects and memory overhead. Only use LinkedList when you need Deque operations or genuinely add/remove frequently at the head.
# Python set - unordered, unique elements names = {"Alice", "Bob", "Alice"} # {"Alice", "Bob"} names.add("Charlie") # Add: O(1) "Alice" in names # Contains: O(1) names.remove("Bob") # Remove: O(1) # Set operations a = {1, 2, 3} b = {2, 3, 4} a | b # Union: {1, 2, 3, 4} a & b # Intersection: {2, 3} a - b # Difference: {1}
// HashSet - unordered, backed by HashMap // Best for: fast membership testing, deduplication Set<String> names = new HashSet<>(); names.add("Alice"); names.add("Bob"); names.add("Alice"); // Ignored - already exists // names = {"Alice", "Bob"} - order not guaranteed names.add("Charlie"); // Add: O(1) names.contains("Alice"); // Contains: O(1) names.remove("Bob"); // Remove: O(1) // Set operations (from Guava or manual) Set<Integer> a = Set.of(1, 2, 3); Set<Integer> b = Set.of(2, 3, 4); // Union Set<Integer> union = new HashSet<>(a); union.addAll(b); // {1, 2, 3, 4} // Intersection Set<Integer> intersection = new HashSet<>(a); intersection.retainAll(b); // {2, 3} // Difference Set<Integer> difference = new HashSet<>(a); difference.removeAll(b); // {1}
// TreeSet - sorted, backed by Red-Black tree // Best for: sorted data, range queries Set<String> names = new TreeSet<>(); names.add("Charlie"); names.add("Alice"); names.add("Bob"); // Iteration order: Alice, Bob, Charlie (sorted!) // O(log n) for all operations (balanced tree) names.add("Diana"); // O(log n) names.contains("Alice"); // O(log n) // NavigableSet operations (TreeSet specific) TreeSet<Integer> numbers = new TreeSet<>(List.of(1, 5, 10, 15, 20)); numbers.first(); // 1 (smallest) numbers.last(); // 20 (largest) numbers.floor(12); // 10 (largest <= 12) numbers.ceiling(12); // 15 (smallest >= 12) numbers.subSet(5, 15); // [5, 10) - range view numbers.headSet(10); // [1, 5) - elements < 10 numbers.tailSet(10); // [10, 15, 20] - elements >= 10
// Maintains insertion order (unlike HashSet) Set<String> names = new LinkedHashSet<>(); names.add("Charlie"); names.add("Alice"); names.add("Bob"); // Iteration order: Charlie, Alice, Bob (insertion order preserved)
# Python dict - ordered (3.7+), unique keys ages = {"Alice": 30, "Bob": 25} ages["Charlie"] = 35 # Add/update: O(1) ages.get("Diana", 0) # Get with default: O(1) ages["Alice"] # Get (KeyError if missing): O(1) del ages["Bob"] # Remove: O(1) "Alice" in ages # Contains key: O(1) # Iteration for key in ages: print(key, ages[key]) for key, value in ages.items(): print(key, value)
// HashMap - unordered, null keys/values allowed Map<String, Integer> ages = new HashMap<>(); ages.put("Alice", 30); // Add: O(1) ages.put("Bob", 25); ages.put("Alice", 31); // Update: replaces value ages.get("Alice"); // Get (null if missing): O(1) ages.getOrDefault("Diana", 0); // Get with default: O(1) ages.remove("Bob"); // Remove: O(1) ages.containsKey("Alice"); // Contains key: O(1) ages.containsValue(30); // Contains value: O(n)! // Iteration for (String key : ages.keySet()) { System.out.println(key + ": " + ages.get(key)); } for (Map.Entry<String, Integer> entry : ages.entrySet()) { System.out.println(entry.getKey() + ": " + entry.getValue()); } // Java 8+ forEach ages.forEach((key, value) -> System.out.println(key + ": " + value)); // Compute methods (atomic update) ages.compute("Alice", (k, v) -> v == null ? 1 : v + 1); // Increment ages.computeIfAbsent("Diana", k -> 0); // Default if missing ages.merge("Alice", 1, Integer::sum); // Merge with existing
| Type | Order | Null Keys | Thread-Safe | Use Case |
|---|---|---|---|---|
HashMap | None | Yes (1) | No | General purpose |
LinkedHashMap | Insertion | Yes (1) | No | Predictable iteration |
TreeMap | Sorted | No | No | Sorted keys, range queries |
ConcurrentHashMap | None | No | Yes | Multi-threaded access |
EnumMap | Enum order | No | No | Enum keys (very fast) |
// Queue - FIFO (First In, First Out) Queue<String> queue = new LinkedList<>(); queue.offer("first"); // Add to tail queue.offer("second"); queue.peek(); // View head (null if empty) queue.poll(); // Remove head (null if empty) // Deque - Double-ended queue (also a Stack!) Deque<String> deque = new ArrayDeque<>(); // Preferred over LinkedList // Queue operations deque.offerLast("tail"); // Add to tail deque.pollFirst(); // Remove from head // Stack operations (LIFO - Last In, First Out) deque.push("top"); // Add to head (same as offerFirst) deque.pop(); // Remove from head deque.peek(); // View head // DON'T use java.util.Stack (legacy, synchronized) // Use ArrayDeque for stack operations
// PriorityQueue - elements ordered by priority // Backed by heap - not sorted, but min/max always at head // Natural ordering (min-heap) PriorityQueue<Integer> minHeap = new PriorityQueue<>(); minHeap.offer(5); minHeap.offer(1); minHeap.offer(3); minHeap.poll(); // 1 (smallest) minHeap.poll(); // 3 minHeap.poll(); // 5 // Max-heap using Comparator PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder()); maxHeap.offer(5); maxHeap.offer(1); maxHeap.offer(3); maxHeap.poll(); // 5 (largest) // Custom priority PriorityQueue<Task> taskQueue = new PriorityQueue<>( Comparator.comparing(Task::getPriority) .thenComparing(Task::getCreatedAt) ); // Time complexity: // offer(): O(log n) // poll(): O(log n) // peek(): O(1)
// Immutable collections - cannot be modified after creation // List.of() - immutable list List<String> list = List.of("a", "b", "c"); list.add("d"); // UnsupportedOperationException! // Set.of() - immutable set Set<String> set = Set.of("a", "b", "c"); // Map.of() - immutable map (up to 10 entries) Map<String, Integer> map = Map.of( "one", 1, "two", 2, "three", 3 ); // Map.ofEntries() - for more entries Map<String, Integer> bigMap = Map.ofEntries( Map.entry("one", 1), Map.entry("two", 2), // ... many more ); // Creating immutable copy of existing collection List<String> mutable = new ArrayList<>(List.of("a", "b")); List<String> immutable = List.copyOf(mutable); // Immutable copy mutable.add("c"); // OK // immutable is unchanged
numbers = [1, 2, 3, 4, 5] # Filter, map, reduce result = [x * 2 for x in numbers if x % 2 == 0] # [4, 8] # Or with functional style from functools import reduce evens = filter(lambda x: x % 2 == 0, numbers) doubled = map(lambda x: x * 2, evens) result = list(doubled) # [4, 8] # Sum total = sum(numbers) # Group by from itertools import groupby grouped = {k: list(v) for k, v in groupby(sorted(items, key=get_key), get_key)}
List<Integer> numbers = List.of(1, 2, 3, 4, 5); // Filter, map, collect List<Integer> result = numbers.stream() .filter(x -> x % 2 == 0) // Keep evens .map(x -> x * 2) // Double them .toList(); // [4, 8] // Sum int total = numbers.stream() .mapToInt(Integer::intValue) .sum(); // 15 // Find first matching Optional<Integer> first = numbers.stream() .filter(x -> x > 3) .findFirst(); // Optional[4] // Group by Map<Boolean, List<Integer>> partitioned = numbers.stream() .collect(Collectors.partitioningBy(x -> x % 2 == 0)); // {false=[1, 3, 5], true=[2, 4]} // Group by custom key Map<String, List<User>> byDepartment = users.stream() .collect(Collectors.groupingBy(User::getDepartment)); // Parallel stream (automatic parallelization) long count = numbers.parallelStream() .filter(x -> isPrime(x)) .count();
| Need | Use | Why |
|---|---|---|
| Ordered list, random access | ArrayList | O(1) access, cache-friendly |
| Frequent add/remove at ends | ArrayDeque | O(1) at both ends |
| Fast contains check | HashSet | O(1) lookup |
| Sorted unique elements | TreeSet | O(log n), sorted iteration |
| Key-value pairs | HashMap | O(1) get/put |
| Sorted key-value pairs | TreeMap | O(log n), sorted by key |
| Insertion-ordered map | LinkedHashMap | Predictable iteration order |
| Thread-safe map | ConcurrentHashMap | High concurrent throughput |
| Priority-based processing | PriorityQueue | O(log n) insert, O(1) min/max access |