Phase 1 — Programming & Data Structure Foundations
Target level: Easy → low-Medium Expected duration: 2 weeks (12-week track) / 4 weeks (6-month track) / 4 weeks (12-month track) Weekly cadence: ~5 labs/week + 30–50 problems applying every data structure under the framework
Why This Phase Exists
Phase 0 fixed your execution — how to read, communicate, derive constraints, brute force, optimize, test, and recover. Phase 1 fixes your vocabulary: the data structures and language-runtime concepts that 95% of every coding interview rests on.
If you cannot, on demand, state the amortized cost of a dynamic-array push, the worst-case behavior of a hash map under adversarial keys, why s += c in a Python loop is O(N²), what happens to your for-loop when you mutate the collection it iterates, and the difference between deep and shallow copy in your language — you do not have a foundation. You have a list of words.
This phase makes the foundation real. Every concept comes with: internal representation, complexity table, memory behavior, language-specific gotchas (Python, Java, Go, C++, JS/TS), interview traps, common bugs, and testing strategy.
What You Will Be Able To Do After This Phase
- Pick the correct data structure for any Easy/Medium problem in under 60 seconds.
- State the worst-case, average-case, and amortized complexity of every operation on every fundamental DS.
- Predict your code’s memory and cache behavior, not just its asymptotic time.
- Write idiomatic code in your interview language without falling into language-specific traps.
- Recognize when a problem is really a hash map / heap / monotonic stack problem in disguise.
- Implement, from scratch, every data structure listed below — without notes — in under 20 minutes each.
Concepts To Master
You must master every item below before moving to Phase 2. Pattern problems in Phase 2 assume fluency with these primitives.
Data Structures
- Arrays — operations, complexity, memory layout, cache behavior, dynamic resizing amortization, gotchas per language
- Strings — immutability, encoding pitfalls, concat-in-loop blowup, substring complexity
- Hash Maps — hashing, collisions, load factor, adversarial inputs, ordered vs unordered, custom hash for tuples
- Hash Sets — operations, set algebra, when to use vs map
- Linked Lists — singly/doubly, sentinels, tail pointer, common manipulation patterns
- Stacks — array-backed vs linked, monotonic stack preview
- Queues — deque, ring buffer, priority queue preview
- Heaps — binary heap, sift up/down, complexity, top-K pattern
- Sorted arrays / sorted sets — binary search, bisect, sorted-set complexity per language
- Trees — binary, BST, traversals iterative + recursive, balanced vs unbalanced
- Tries — operations, space cost, alphabet size considerations
- Graphs — adjacency list, matrix, edge list, when each
- Disjoint Set Union — path compression + union by rank
- Bitsets / Bit Manipulation — set/clear/popcount, common idioms
- Counters / Multisets —
Counter,HashMap+counterpattern, multiset alternatives
Runtime Concepts
- Stack vs heap memory
- Scope and lifetime
- Value vs reference semantics
- Mutable vs immutable
- Hash collisions and adversarial keys
- Iterator invalidation
- Garbage collection basics (refcount vs tracing)
- Memory leaks (especially in GC’d languages)
- Deep vs shallow copy
- Recursion depth and stack overflow
Why These Concepts Matter In Interviews
Most “I knew the algorithm but couldn’t get it to pass” failures aren’t algorithmic. They are foundation failures:
- “My hash map is slow” → adversarial collision pattern in the input.
- “My recursion crashed” → no idea Python’s default recursion limit is 1000.
- “I edited the list while iterating and got weird output” → iterator invalidation.
- “My BFS queue is slow” → using
list.pop(0)instead ofdeque.popleft. - “Java said
ConcurrentModificationException” → the JDK’s fail-fast iterator policy. - “Go map iteration ordered differently every run” → intentional non-determinism for hash-flooding defense.
- “C++ vector reference invalidated after push_back” → reallocation moved the storage.
- “JS object stringified my int keys to strings” → all object keys are strings; use
Map.
Every one of these is on the rubric somewhere as “implementation correctness” or “language fluency.” Phase 1 closes them.
Inline Data Structure Reference
The remainder of this README is a data-structure and runtime reference manual. Read it linearly the first time. Skim it as a reference any time you forget a complexity, language API, or gotcha.
1. Arrays
Internal Representation
A contiguous block of memory holding fixed-size elements indexed by offset. Static arrays have fixed size; dynamic arrays (Python list, Java ArrayList, Go slice, C++ vector, JS Array) wrap a static array and grow it geometrically.
Operations and Complexity
| Operation | Static | Dynamic (avg) | Dynamic (worst) |
|---|---|---|---|
Index read/write a[i] | O(1) | O(1) | O(1) |
Append push_back | n/a | O(1) amortized | O(N) (resize) |
| Prepend / insert middle | n/a | O(N) | O(N) |
| Pop end | n/a | O(1) | O(1) |
| Pop front | n/a | O(N) | O(N) |
| Search (unsorted) | O(N) | O(N) | O(N) |
| Search (sorted, binary search) | O(log N) | O(log N) | O(log N) |
| Length | O(1) | O(1) | O(1) |
Memory Layout & Cache Behavior
Contiguous memory means the CPU prefetcher can stream elements predictively, giving arrays the best cache locality of any data structure. A linear scan over an int[] is typically 5–20× faster than the same scan over a linked list of the same length, even though both are O(N). When constants matter (HFT, hot loops), this difference dominates.
Dynamic Resizing Amortization
Geometric growth (typically 2× or 1.5×) gives O(1) amortized append: doubling means total work to grow from 1 to N is 1 + 2 + 4 + … + N = 2N - 1, amortized to O(1) per push. Linear growth (+1) gives O(N) amortized — never use it.
Language-Specific Gotchas
- Python:
listoverallocates with growth factor ~1.125;arr.pop(0)is O(N); usecollections.dequefor queue. Lists are heterogeneous (each slot is aPyObject*), defeating cache locality. Usearray.arrayor NumPy for primitive-typed storage. - Java:
ArrayListisObject[]— boxing for primitives. Useint[]for hot paths.ArrayList.remove(0)is O(N).Arrays.asList(arr)returns a fixed-size view, not a realArrayList. - Go: Slices are a 3-tuple
(ptr, len, cap).appendmay or may not reallocate — appending to one slice can silently mutate another sharing storage. Always check capacity.s = s[:0]reuses storage;s = nilreleases it. - C++:
std::vector<T>reallocates onpush_backpast capacity, invalidating all iterators and references. Reserve up front when you know the size.vector<bool>is a packed bitset, notbool[]— itsoperator[]returns a proxy. - JS/TS: Arrays are dense or sparse; sparse arrays (
a[1000000] = 1) have terrible perf.arr.shift()is O(N). Holes (new Array(10)) skip duringforEachbut not duringfor.
Common Interview Traps
- “Insert in the middle” — be sure they don’t actually want a different DS.
- “In-place” — explicitly disallows the cheap copy-to-new-array solution.
- Off-by-one at boundaries:
[l, r)vs[l, r].
2. Strings
Internal Representation
An array of code units (bytes for char[], UTF-16 code units in Java/JS, variable-width in Python 3 with PEP 393 latin-1/UCS-2/UCS-4). In most languages, strings are immutable — every “modification” allocates a new string.
Operations and Complexity
| Operation | Complexity |
|---|---|
Index s[i] | O(1) for fixed-width encoding, O(i) for UTF-8 by codepoint |
| Length | O(1) (cached) |
Concat s + t | O(|s| + |t|) — new allocation |
Substring s[l:r] | O(r − l) (copy) in most langs, O(1) (view) in Go and Java pre-7u6 |
| Equality | O(min len) but fast with hash compare first |
| Find substring (naive) | O(NM) |
| Find substring (KMP / Z) | O(N + M) |
Immutability
Java/Python/JS strings are immutable. So this loop:
s = ""
for c in chars:
s += c # O(N) per iteration → O(N²) total
This bug shows up in real interviews and gets candidates dinged for not knowing language internals. Use "".join(chars) (Python), StringBuilder (Java), [].join('') (JS), strings.Builder (Go).
Encoding Pitfalls
- “String length” in characters vs bytes vs grapheme clusters can all differ.
"é"may be 1 codepoint or 2 (e + combining accent) → 2 codepoints, 1 grapheme. - Python
len("😀") == 1; Java"😀".length() == 2(UTF-16 surrogate pair). - JS
"😀".length === 2for the same reason; iterate withfor…ofto get codepoints. - Always clarify: “Are inputs ASCII?” — if no, ask whether the unit of “character” is byte, codepoint, or grapheme.
Substring Complexity
Substring extraction copies the underlying chars in most modern languages. Go strings are immutable byte slices and substring is O(1) view (but converting to/from []byte is O(N)).
Language-Specific Gotchas
- Python: strings are immutable; use lists of chars and
"".join()for building.s[::-1]is idiomatic reverse. - Java:
String s += c→ quadratic. UseStringBuilder.String.intern()exists but rarely needed in interviews. - Go:
stringis immutable bytes;[]byte(s)andstring(b)each allocate. Range over a string yields(i, rune), not(i, byte). - C++:
std::stringis mutable; SSO (small string optimization) keeps short strings on the stack.s.c_str()is null-terminated. - JS/TS: strings are UTF-16 code units; emoji and non-BMP chars are 2 units long.
3. Hash Maps
Internal Representation
A bucket array, indexed by hash(key) % capacity. Collisions resolved by either separate chaining (linked list / tree per bucket — Java since 8 promotes long chains to red-black trees) or open addressing (Python, Ruby — probe sequence in the same array).
Operations and Complexity
| Operation | Average | Worst |
|---|---|---|
| Insert | O(1) | O(N) (all collide) |
| Lookup | O(1) | O(N) |
| Delete | O(1) | O(N) |
| Iterate | O(N + capacity) | O(N + capacity) |
Load Factor
The capacity / size ratio. When load factor exceeds a threshold (~0.75 typical), the table doubles and rehashes — this is amortized O(1) per insert but O(N) for the resize itself.
Adversarial Inputs
A static, public hash function lets an attacker craft N keys that all hash to the same bucket → O(N²) insertion. Real-world history: this brought down Java web servers in 2003 and PHP in 2011. Modern languages (Python PYTHONHASHSEED, Go map randomization, Java tree fallback) defend against this.
In an interview, if the problem says “the input may be adversarial,” do not rely on hash maps for worst-case bounds — use sorting + binary search or a balanced BST.
Ordered vs Unordered
- Insertion-ordered: Python 3.7+
dict, JavaLinkedHashMap, JSMap. - Sorted (by key): Java
TreeMap, C++std::map, Pythonsortedcontainers.SortedDict. - Hash, no order guarantee: C++
std::unordered_map, JavaHashMap(unordered as of 8+), Gomap(intentionally randomized).
If the problem requires ordered iteration, do not use a plain hash map.
Custom Hash for Tuples / Composite Keys
- Python: tuples of hashable items are hashable for free.
- Java: must implement both
equalsandhashCodefor any custom key class. Forgetting one is a top-10 interview bug. - Go: map keys must be “comparable types” — structs of comparables work, slices/maps don’t.
- C++: must specialize
std::hash<T>or pass a custom hasher tounordered_map. - JS: object keys are coerced to strings; use
Mapfor object-keyed maps.
Common Interview Traps
- Mutating a key after insertion (its hash changes; the map can’t find it).
- Iterating while mutating →
ConcurrentModificationException(Java) /RuntimeError(Python). - Assuming O(1) without acknowledging worst case.
4. Hash Sets
Operations and Complexity
Same as hash map (a hash set is conceptually a hash map with null values).
| Operation | Average | Worst |
|---|---|---|
| add / contains / remove | O(1) | O(N) |
| union | O(|A| + |B|) | |
| intersection | O(min(|A|, |B|)) | |
| difference | O(|A|) |
Set Algebra
- Union:
A ∪ B— PythonA | B, JavaA.addAll(B), C++ manual. - Intersection:
A ∩ B— iterate the smaller, lookup in the larger. - Difference:
A \ B— iterate A, skip if in B. - Symmetric difference:
A △ B—(A ∪ B) \ (A ∩ B).
Set vs Map: When To Choose
Use a set when you only need presence; use a map when you need associated value (count, index, parent, etc.). Many “use a set” problems become trivially easier with a map (e.g., Two Sum needs a value → index map, not a set).
Language-Specific Gotchas
- Python:
setandfrozenset; tuples are hashable, lists are not. - Java:
HashSet,LinkedHashSet,TreeSet. - Go: no built-in set — use
map[T]struct{}(zero-byte value). - C++:
std::unordered_set,std::set(sorted). - JS/TS:
Setpreserves insertion order; objects are reference-equal.
5. Linked Lists
Singly vs Doubly
- Singly: each node has
next. Reverse, cycle detection, two-pointer dance. - Doubly: each node has
nextandprev. Required for O(1) erase given an iterator (LRU cache pattern).
Sentinels (Dummy Nodes)
A dummy head node simplifies edge cases: “what if the list is empty?” “what if we delete the first node?” become non-special. Always use a dummy when you have to return a head pointer that may change.
dummy = ListNode(0)
dummy.next = head
prev = dummy
# … operations on prev.next …
return dummy.next
Tail Pointer
Maintaining a tail pointer makes append O(1) (otherwise O(N)). Always-mark whether your manipulation invalidates the tail.
Common Manipulation Patterns
- Reverse iteratively: three pointers
prev,curr,next. - Reverse recursively:
reverse(head.next)thenhead.next.next = head; head.next = None. - Find middle: slow/fast pointers; fast moves 2× slow.
- Detect cycle: Floyd’s tortoise and hare.
- Merge two sorted lists: dummy head + zip pattern.
- Remove Nth from end: lead pointer ahead by N.
Operations and Complexity
| Operation | Singly | Doubly |
|---|---|---|
| Index | O(N) | O(N) |
| Insert at known node | O(1) | O(1) |
| Delete at known node | O(N)* | O(1) |
| Search | O(N) | O(N) |
*Singly: O(N) because you need the previous node; O(1) if you have the previous pointer.
Language-Specific Gotchas
- Python: define
class ListNode: __slots__ = ('val', 'next')for memory; default uses a__dict__. - Java:
LinkedList<T>exists but is rarely the right choice;ArrayDequebeats it on most operations. - Go: standard library has
container/list(doubly linked, generic-erased before Go 1.18). - C++:
std::list(doubly),std::forward_list(singly). - JS: typically write
class Node { constructor(v) { this.val = v; this.next = null; } }.
6. Stacks
Implementations
Array-backed (dynamic array, push/pop end) or linked (push/pop head). Array-backed is faster in practice due to cache locality.
Operations and Complexity
| Operation | Complexity |
|---|---|
| push | O(1) amortized |
| pop | O(1) |
| peek (top) | O(1) |
Monotonic Stack Preview
A monotonic stack maintains elements in increasing or decreasing order. Used for next-greater / next-smaller / largest rectangle in histogram. Preview only — full pattern in Phase 2.
for x in arr:
while stack and stack[-1] < x:
# process element being popped: x is its next-greater
stack.pop()
stack.append(x)
Language-Specific Gotchas
- Python: use
listdirectly withappend/pop. Don’t usequeue.LifoQueue(locks). - Java: prefer
ArrayDequeover the legacyStack(synchronized, slow). - Go: slice with
s = append(s, x)ands[len(s)-1]/s = s[:len(s)-1]. - C++:
std::stack<T>adapter onstd::deque; or just usevector. - JS: array
push/pop.
7. Queues
Variants
- Plain queue (FIFO): enqueue rear, dequeue front.
- Deque (double-ended): push/pop both ends in O(1).
- Ring buffer / circular buffer: fixed-capacity deque on a static array.
- Priority queue: see Heaps.
Operations and Complexity
| Operation | Linked | Array deque | Ring buffer |
|---|---|---|---|
| enqueue rear | O(1) | O(1) amortized | O(1) (if not full) |
| dequeue front | O(1) | O(1) | O(1) |
| peek front | O(1) | O(1) | O(1) |
Language-Specific Gotchas
- Python:
collections.dequefor queue; never uselist.pop(0)(O(N)). - Java:
ArrayDeque<T>for both stack and queue.LinkedListworks but slower. Avoidjava.util.Queue<T> q = new LinkedList<>();for hot paths. - Go: no built-in deque;
container/listexists. Most CP code uses a slice as a queue withq[head:](lazy popfront). - C++:
std::deque(random-access amortized O(1)) orstd::queueadapter. - JS: array
push/shiftworks butshiftis O(N); use a custom ring buffer for large queues.
8. Heaps
Binary Heap
A complete binary tree where each parent ≤ children (min-heap) or ≥ (max-heap). Stored in an array: parent of i is (i-1)/2, children are 2i+1 and 2i+2.
Operations and Complexity
| Operation | Complexity |
|---|---|
| push (sift up) | O(log N) |
| pop top (sift down) | O(log N) |
| peek top | O(1) |
| heapify (build from N elements) | O(N) |
| decrease-key | O(log N) (if you know the index) |
| arbitrary delete | O(log N) (if you know the index), else O(N) |
Top-K Pattern (Preview)
- “Top K largest” → min-heap of size K. Push every element; if size > K, pop. Final heap = top K.
- “Stream median” → max-heap (lower half) + min-heap (upper half), balanced.
Language-Specific Gotchas
- Python:
heapqis a min-heap of any orderable; for max-heap, push-x. Tuples break ties lexicographically — careful with non-orderable secondary keys. - Java:
PriorityQueue<T>is a min-heap by default; passComparator.reverseOrder()for max.peek()may return null on empty. - Go:
container/heaprequires you to implement theheap.Interface. Significant boilerplate. - C++:
std::priority_queue<T>is a max-heap by default. Usestd::priority_queue<T, vector<T>, greater<T>>for min-heap. Or usemake_heap+push_heap+pop_heapon a vector. - JS: no built-in; write your own or use a library.
9. Sorted Arrays / Sorted Sets
Binary Search
On a sorted array, find a target or its insertion point in O(log N). Three canonical variants:
- Lower bound: smallest index where
a[i] >= target. - Upper bound: smallest index where
a[i] > target. - Exact match: lower bound + check
a[i] == target.
Operations on Sorted Set / Multiset
| Operation | Complexity |
|---|---|
| Insert | O(log N) |
| Erase | O(log N) |
| Find / lower_bound / upper_bound | O(log N) |
| Iterate in order | O(N) |
| Min / max | O(1) (or O(log N)) |
| Kth element | O(log N) (with order statistics tree) or O(K) iteration |
Language-Specific Gotchas
- Python:
bisect.bisect_left/bisect_rightfor sorted lists.sortedcontainers.SortedListfor an ordered multiset (O(log N) insert). - Java:
TreeSet<T>/TreeMap<K,V>;floor,ceiling,higher,lowerare essential APIs to know. - Go: no standard sorted set — must implement or use a third-party library.
- C++:
std::set/std::multiset(red-black tree);lower_bound/upper_boundmember functions. - JS: no standard sorted set; use a sorted array with binary search or write a treap.
10. Trees
Binary Tree Definitions
- Binary tree: each node has ≤ 2 children.
- BST: left subtree < node < right subtree (in-order traversal yields sorted sequence).
- Balanced (AVL, RB): height O(log N) guaranteed.
- Unbalanced: worst case O(N) (degenerate to linked list).
Traversals (Recursive)
def inorder(n):
if not n: return
inorder(n.left)
visit(n)
inorder(n.right)
Pre-order: visit, left, right. Post-order: left, right, visit. Level-order: BFS with a queue.
Traversals (Iterative)
- In-order with stack: push left chain, pop and visit, then go right.
- Pre-order with stack: push root, pop, visit, push right then left.
- Post-order with stack: trickier — use a marker or a 2-stack trick.
- Morris traversal: O(1) extra space using threaded pointers; advanced.
Operations and Complexity (Balanced)
| Operation | Balanced | Unbalanced |
|---|---|---|
| Insert / delete / search | O(log N) | O(N) |
| Min / max | O(log N) | O(N) |
| In-order traversal | O(N) | O(N) |
Language-Specific Gotchas
- Python: sys.setrecursionlimit; CPython has no tail-call elimination.
- Java:
TreeMapis red-black; recursion depth limited by JVM stack (~1000s). - Go: no standard balanced BST.
- C++:
std::map/std::setare red-black; iterators traverse in order. - JS: no standard balanced BST; recursion depth is engine-dependent.
11. Tries
Internal Representation
Each node has up to alphabet-size children (e.g., 26 for lowercase, 256 for byte, larger for unicode). End-of-word flag per node.
Operations and Complexity
| Operation | Complexity |
|---|---|
| Insert word of length L | O(L) |
| Search word of length L | O(L) |
| Prefix search | O(L) |
| Space | O(total chars × alphabet size) |
Alphabet Size Considerations
Fixed array per node: O(σ) memory per node, fast O(1) child lookup. HashMap per node: O(actual children) memory, slightly slower lookup. For 26 letters use array; for full unicode use hash map.
Language-Specific Gotchas
- Python: dict of dicts; can use
defaultdict(dict). - Java:
Map<Character, TrieNode>orTrieNode[26]. - Go: struct with
[26]*TrieNode. - C++: struct with
TrieNode* children[26]. Manual memory management orunique_ptr. - JS: plain objects or
Map.
12. Graphs
Representations
| Representation | Space | Edge query | Iterate neighbors |
|---|---|---|---|
| Adjacency list | O(V + E) | O(deg(v)) | O(deg(v)) |
| Adjacency matrix | O(V²) | O(1) | O(V) |
| Edge list | O(E) | O(E) | O(E) |
When To Use Each
- Adjacency list: sparse graphs (E ≪ V²) — almost always the right answer for interviews.
- Adjacency matrix: dense graphs (E ≈ V²), Floyd-Warshall, when V is small (≤ 500).
- Edge list: Kruskal’s MST, when you need to sort edges, when graph is given as edges and you don’t need neighbor queries.
Common Forms in Interviews
List<List<Integer>>adjacency.Map<String, List<String>>for non-integer node IDs.- Implicit graph (grid: neighbors are
(±1, 0)and(0, ±1)).
Language-Specific Gotchas
- Python:
defaultdict(list)is ideal foradj[u].append(v). - Java:
List<List<Integer>>with explicitArrayListinitialization in a loop; primitive int adjacency lists need third-party (Eclipse Collections, fastutil). - Go:
[][]intslice-of-slices. - C++:
vector<vector<int>>; for performance, usevector<int>with offsets (CSR format). - JS: array of arrays or
Map<string, string[]>.
13. Disjoint Set Union (Union-Find)
Operations
find(x): which component isxin?union(x, y): merge components ofxandy.
Optimizations
- Path compression: during
find, set parent of every visited node to the root. - Union by rank/size: attach the shorter tree under the taller.
- Together: O(α(N)) per op (inverse Ackermann — practically constant, ≤ 4 for any realistic N).
Naive vs Optimized
| Variant | find | union |
|---|---|---|
| Naive | O(N) | O(N) |
| Path compression only | O(log N) amortized | O(log N) amortized |
| Path compression + union by rank | O(α(N)) amortized | O(α(N)) amortized |
Reference Implementation (Python)
parent = list(range(N))
rank = [0] * N
def find(x):
while parent[x] != x:
parent[x] = parent[parent[x]] # path compression
x = parent[x]
return x
def union(x, y):
rx, ry = find(x), find(y)
if rx == ry: return False
if rank[rx] < rank[ry]: rx, ry = ry, rx
parent[ry] = rx
if rank[rx] == rank[ry]: rank[rx] += 1
return True
Language-Specific Gotchas
- Python: the recursive form blows past the recursion limit for N > 1000; always iterative.
- Java: prefer
int[] parentoverMap<Integer, Integer>for primitive perf. - Go: straightforward with
[]int. - C++:
vector<int>parent and rank. - JS: typed array
Int32Arrayfor speed.
14. Bitsets / Bit Manipulation
Common Idioms
| Operation | Idiom |
|---|---|
Set bit i | x | (1 << i) |
Clear bit i | x & ~(1 << i) |
Toggle bit i | x ^ (1 << i) |
Test bit i | (x >> i) & 1 |
| Lowest set bit | x & -x |
| Pop lowest set bit | x & (x - 1) |
| Popcount | language builtin (__builtin_popcount, Integer.bitCount, bin(x).count('1')) |
| Iterate subsets of mask | s = mask; while s > 0: …; s = (s - 1) & mask |
Bitsets
A packed array of bits — 64× the density and 64× the throughput of bool[] for many ops. Use when N up to ~10⁵ and you need fast set operations.
Language-Specific Gotchas
- Python: ints are arbitrary precision; no overflow but no SIMD either.
bin(x).count('1')works butint.bit_count()(3.10+) is faster. - Java:
intis 32-bit,longis 64-bit. Negative numbers:>>is arithmetic,>>>is logical. - Go: untyped constants vs typed; explicit cast required (
int(x),uint32(x)). - C++:
std::bitset<N>for compile-time-known N;vector<bool>is a specialization (proxy reference). - JS: bitwise ops coerce to 32-bit signed int — beware truncation. Use
BigIntfor 64-bit ops.
15. Counters / Multisets
Counter Pattern
A hash map from key to count. Used for: frequency analysis, anagram detection, sliding window distinct-element count.
Operations
| Operation | Complexity |
|---|---|
| Increment count[k] | O(1) |
| Decrement / remove if zero | O(1) |
| Total count | O(distinct keys) |
| Sorted by count | O(N log N) |
Multiset
A counter is essentially a multiset (allows duplicates, remembers count). For an ordered multiset (need min/max/kth in order), use a TreeMap+count or sortedcontainers.SortedList.
Language-Specific Gotchas
- Python:
collections.Counter—Counter(s),most_common(k), arithmetic operators (c1 - c2drops zeros). - Java:
HashMap<K, Integer>withgetOrDefault(k, 0) + 1andmerge(k, 1, Integer::sum). There is no built-inCounter. - Go:
map[K]int. Manual increment. - C++:
std::unordered_map<K, int>;++m[k]works because default-constructed int is 0. - JS:
Map(preserves insertion order) or plain object (string keys only).
Inline Runtime Concepts Reference
These concepts cut across all data structures. They are interview rubric line items in their own right.
1. Stack vs Heap Memory
The call stack holds function frames: locals, args, return addresses. Fast, fixed-size (typically 1–8 MB in interview environments). Allocations are pointer-bump and bound to the frame’s lifetime.
The heap is dynamically managed memory — new, malloc, Python objects, JVM objects. Slower allocation, can be GC’d or manually freed. Survives beyond the function that created it.
Why It Matters In Interviews
- “Why does my recursion overflow at depth 10⁵?” → call stack ~1 MB / ~64 bytes per frame ≈ 16K frames before crash.
- “Why is my linked list slower than my array?” → heap-allocated nodes scattered in memory, no cache locality.
Per-Language Gotchas
- Python: all “values” except small ints are heap objects. Locals are name bindings, not stack-allocated values.
- Java: primitives in locals are stack-allocated; objects always heap. Escape analysis can sometimes elide a heap alloc.
- Go: escape analysis decides stack vs heap;
&xin a returned closure forces heap allocation. - C++: explicit (
int x;is stack,new intis heap). Stack allocation is much faster. - JS: all objects are heap; primitives may be stack-internal but the language hides it.
2. Scope and Lifetime
Scope = where a name is visible. Lifetime = how long the value lives.
These can differ! In a closure, a variable’s scope ends with the function but its lifetime extends as long as the closure references it.
Per-Language Gotchas
- Python: late binding in closures —
[lambda: i for i in range(3)]all return 2, not 0/1/2. - Java: local variables captured by lambdas must be effectively final.
- Go: loop variable capture changed in Go 1.22 — pre-1.22,
for i := range … { go func() { use(i) }() }captures the samei. - C++: dangling reference if you return
&local. Lifetime extension viaconst&is a niche rule. - JS:
varis function-scoped,let/constare block-scoped. Hoisting trap.
3. Value vs Reference Semantics
Does assigning or passing a variable copy the value or share a reference?
| Language | Primitives | Objects/Arrays |
|---|---|---|
| Python | by value (immutable) | by reference |
| Java | by value | reference passed by value |
| Go | by value (struct, array) | slice/map/chan = ref-ish |
| C++ | by value (default) | explicit & or * for ref |
| JS | by value | by reference |
Trap
“I passed my array to the function, modified it inside, and the caller saw the change!” — yes, because the array (Python list, Java int[], JS array) is passed by reference.
“I passed my int to the function, modified it inside, but the caller didn’t see the change!” — yes, because the int is by value.
4. Mutable vs Immutable
Immutable values cannot be modified after creation; “modification” returns a new value.
| Language | Immutable types |
|---|---|
| Python | str, int, tuple, frozenset, bytes |
| Java | String, all primitive wrappers, LocalDate, etc. |
| Go | string |
| C++ | const-qualified |
| JS | primitives (string, number, boolean, undefined, null, bigint, symbol) |
Trap
- Using a mutable object as a hash map key, then mutating it → key lost.
- “Why is
s += cslow?” — string is immutable, every iteration copies the whole string.
5. Hash Collisions and Adversarial Keys
A hash map’s O(1) average requires a “good” hash function and “uniform” inputs. An adversary who knows the hash function can craft keys all colliding to one bucket → O(N²) blowup.
Defenses
- Random seed (Python’s
PYTHONHASHSEED, Go map random seed) — attacker can’t predict the function. - Tree fallback — Java
HashMapsince 8 converts long collision chains into red-black trees, capping worst case at O(log N). - Cryptographic hash — overkill, but immune.
Interview Note
If the problem explicitly says “adversarial” or “competitive” inputs, do not rely on hash maps. Use a sorted structure (TreeMap, sortedcontainers, std::map).
6. Iterator Invalidation
Modifying a collection while iterating can break the iterator.
Behaviors
- Python: mutating a dict during iteration raises
RuntimeError. - Java:
ConcurrentModificationException(fail-fast iterator). - Go: map iteration order is randomized; modifying the map mid-iteration is technically allowed but the new keys may or may not be visited.
- C++:
vector::push_backinvalidates all iterators if it reallocates.unordered_map::insertinvalidates all on rehash. - JS:
MapandSetiteration sees later insertions; deletions are honored.
Safe Pattern
Collect mutations into a list during iteration; apply after the loop.
7. Garbage Collection Basics
Two main strategies:
- Reference counting (refcount): each object has a count of references to it; when 0, freed. Fast but cannot collect cycles. CPython uses refcount + cycle collector.
- Tracing GC (mark-and-sweep, generational): periodically traces from roots; unreachable objects freed. Java, Go, JS, C# use variants.
Why It Matters
- Refcount:
del xis immediate; deterministic destruction. - Tracing: deallocation happens “later” — pauses (“stop-the-world”) historically, mostly amortized in modern collectors.
Per-Language Gotchas
- Python: cycles between objects with
__del__finalizers may never collect (pre-3.4). - Java: “GC pause” is a real concern in latency-sensitive interviews; mention G1, ZGC.
- Go: GC is concurrent and non-generational; predictable sub-millisecond pauses.
- C++: no GC — must
deletewhat younew. RAII (smart pointers) automates this. - JS: V8 uses generational GC.
8. Memory Leaks (in GC’d Languages)
A “leak” in a GC’d language means: the object is unreachable from the developer’s intent, but reachable from the GC’s perspective — so it’s never freed.
Common Sources
- Listeners not removed: event handler holds a reference to the listener forever.
- Caches without eviction: map grows monotonically.
- Closure capture: closure holds reference to large enclosing object.
- Static fields: lives forever.
- ThreadLocals not cleared in pooled threads: classic JVM leak.
Per-Language Gotchas
- Python: circular refs with finalizers; module-level state.
- Java:
ThreadLocal+ thread pool; classloader leaks (PermGen/Metaspace). - Go: goroutine leak (blocking on a channel that never receives).
- C++: real memory leak via missing
delete. - JS: detached DOM nodes, timers not cleared.
9. Deep vs Shallow Copy
- Shallow copy: new container, but elements are shared references.
- Deep copy: recursively copies elements too.
When It Matters
Backtracking: if you store snapshots of a mutable list, you need a deep (or at least one-level) copy, otherwise all snapshots reflect the latest state.
result = []
path = []
def backtrack(...):
if done: result.append(path[:]) # MUST copy; otherwise all entries are the same list
Per-Language Gotchas
- Python:
list(x)/x[:]shallow;copy.deepcopy(x)deep. - Java:
clone()is shallow by default; deep requires explicit traversal. - Go:
copy(dst, src)shallow; for deep, write your own. - C++: copy constructor: shallow by default for raw pointers; smart pointers and STL containers do “value” copies.
- JS: spread
[...arr]and{...obj}are shallow; deep viastructuredClone(x)(modern) or JSON round-trip (lossy).
10. Recursion Depth and Stack Overflow
Each recursive call adds a frame to the call stack. The stack has a fixed size; exceeding it crashes (or in Python, raises RecursionError).
Default Limits
- Python:
sys.getrecursionlimit()defaults to 1000. Raise withsys.setrecursionlimit(10**6)plusthreading.stack_size(...). - Java: ~10K frames typical; tune with
-Xss. - Go: stacks start small (8KB) and grow up to 1 GB.
- C++: thread stack ~1 MB default; tune at thread creation.
- JS: ~10K frames typical, engine-dependent.
Mitigation
- Convert to iteration with an explicit stack (DFS, in-order traversal).
- Tail-call optimization is not present in most popular languages (no Python, no Java, no JS, no Go). C++ and Scheme can do it.
- For trees, balance matters: a degenerate (chain-shaped) tree blows recursion at depth N, not log N.
Recommended Problem Categories
After mastering the concepts, drill these problem categories. Each maps to a lab below.
| Category | Sample Problems |
|---|---|
| Array manipulation | Rotate Array, Move Zeros, Plus One, Spiral Matrix |
| String mechanics | Reverse Words, Implement strStr, Length of Last Word |
| Hash map / set | Two Sum, Group Anagrams, Contains Duplicate II, Intersection of Two Arrays |
| Linked list | Reverse Linked List, Merge Two Sorted Lists, Palindrome Linked List, Linked List Cycle |
| Stack / queue | Valid Parentheses, Min Stack, Implement Queue using Stacks, Daily Temperatures |
| Heap | Kth Largest in Stream, Last Stone Weight, K Closest Points to Origin |
| Binary search | First Bad Version, Search Insert Position, Find Peak Element |
| Recursion | Generate Parentheses, Letter Combinations of a Phone Number, Subsets |
| Tree traversal | Inorder Traversal (iterative + recursive), Symmetric Tree, Same Tree, Binary Tree Paths |
Aim for ~30 problems across these categories minimum (more for the 6/12-month tracks).
Mastery Checklist
- Implement, from scratch, in 20 minutes each, with passing tests: dynamic array, linked list (singly + doubly), stack, queue (deque), binary heap, hash map (open addressing), trie, union-find.
- State worst-case, average-case, and amortized complexity for every operation in the table above.
-
Explain why
s += cin a loop isO(N²)in three of {Python, Java, JS, Go}. - Write iterative in-order, pre-order, and post-order tree traversals (no recursion).
- Solve any LeetCode Easy in 15 minutes using the framework.
- Solve any LeetCode Easy-Medium hash-map problem in under 20 minutes.
- Recognize when a problem needs a heap vs sorted set vs counter without prompting.
- Articulate the iterator invalidation rules for your interview language.
- Articulate refcount vs tracing GC for your interview language.
- Predict whether a recursive solution will overflow on N = 10⁵ in your interview language.
Exit Criteria
Move to Phase 2 only when:
- Mastery checklist 100% complete.
- All 9 labs completed with passing tests.
- Solved ≥ 30 problems across the recommended categories using the framework, each reviewed via REVIEW_TEMPLATE.md.
- Implemented every primary data structure from scratch — no notes, no internet, just the language’s basic features.
- Self-mocked one Easy and one low-Medium problem with full framework execution; passed using READINESS_CHECKLIST.md section criteria for “data-structure fluency.”
- For every failure during these problems, ran the diagnosis in FAILURE_ANALYSIS.md and queued it via SPACED_REPETITION.md.
If any item fails, do not move on. Phase 2 patterns assume Phase 1 fluency. Without it, you will plateau.
Hands-On Labs
Complete in order. Each lab uses the strict 22-section format used throughout the curriculum.
- Lab 01 — Array Fundamentals
- Lab 02 — String Mechanics
- Lab 03 — Hashmap Mastery
- Lab 04 — Linked List Pointers
- Lab 05 — Stack & Queue Applications
- Lab 06 — Heap Priority
- Lab 07 — Binary Search Fundamentals
- Lab 08 — Recursion & Stack
- Lab 09 — Tree Traversal Fundamentals
Common Mistakes In Phase 1
- Memorizing complexities without understanding them — when an interviewer probes “why is dict insert O(1) on average?”, you must derive it.
- Skipping “from scratch” implementations — using
import heapqis fine for problems, but you must be able to write the heap yourself. - Treating Python list as a queue —
pop(0)is O(N); usedeque. - Forgetting to check overflow in C++/Java —
int + intoverflows silently; cast tolong. - Recursion in Python without setting the limit — N=10⁴ trees blow the default stack.
- Confusing
==withis/equals/===— identity vs equality differs by language. - Using a mutable object as a hash key.
- Not knowing your language’s iterator invalidation rules.
If any of these still trip you up, you are not done with Phase 1.