Phase 1 — Programming & Data Structure Foundations

Target level: Easy → low-Medium Expected duration: 2 weeks (12-week track) / 4 weeks (6-month track) / 4 weeks (12-month track) Weekly cadence: ~5 labs/week + 30–50 problems applying every data structure under the framework


Why This Phase Exists

Phase 0 fixed your execution — how to read, communicate, derive constraints, brute force, optimize, test, and recover. Phase 1 fixes your vocabulary: the data structures and language-runtime concepts that 95% of every coding interview rests on.

If you cannot, on demand, state the amortized cost of a dynamic-array push, the worst-case behavior of a hash map under adversarial keys, why s += c in a Python loop is O(N²), what happens to your for-loop when you mutate the collection it iterates, and the difference between deep and shallow copy in your language — you do not have a foundation. You have a list of words.

This phase makes the foundation real. Every concept comes with: internal representation, complexity table, memory behavior, language-specific gotchas (Python, Java, Go, C++, JS/TS), interview traps, common bugs, and testing strategy.


What You Will Be Able To Do After This Phase

  • Pick the correct data structure for any Easy/Medium problem in under 60 seconds.
  • State the worst-case, average-case, and amortized complexity of every operation on every fundamental DS.
  • Predict your code’s memory and cache behavior, not just its asymptotic time.
  • Write idiomatic code in your interview language without falling into language-specific traps.
  • Recognize when a problem is really a hash map / heap / monotonic stack problem in disguise.
  • Implement, from scratch, every data structure listed below — without notes — in under 20 minutes each.

Concepts To Master

You must master every item below before moving to Phase 2. Pattern problems in Phase 2 assume fluency with these primitives.

Data Structures

  1. Arrays — operations, complexity, memory layout, cache behavior, dynamic resizing amortization, gotchas per language
  2. Strings — immutability, encoding pitfalls, concat-in-loop blowup, substring complexity
  3. Hash Maps — hashing, collisions, load factor, adversarial inputs, ordered vs unordered, custom hash for tuples
  4. Hash Sets — operations, set algebra, when to use vs map
  5. Linked Lists — singly/doubly, sentinels, tail pointer, common manipulation patterns
  6. Stacks — array-backed vs linked, monotonic stack preview
  7. Queues — deque, ring buffer, priority queue preview
  8. Heaps — binary heap, sift up/down, complexity, top-K pattern
  9. Sorted arrays / sorted sets — binary search, bisect, sorted-set complexity per language
  10. Trees — binary, BST, traversals iterative + recursive, balanced vs unbalanced
  11. Tries — operations, space cost, alphabet size considerations
  12. Graphs — adjacency list, matrix, edge list, when each
  13. Disjoint Set Union — path compression + union by rank
  14. Bitsets / Bit Manipulation — set/clear/popcount, common idioms
  15. Counters / Multisets — Counter, HashMap+counter pattern, multiset alternatives

Runtime Concepts

  1. Stack vs heap memory
  2. Scope and lifetime
  3. Value vs reference semantics
  4. Mutable vs immutable
  5. Hash collisions and adversarial keys
  6. Iterator invalidation
  7. Garbage collection basics (refcount vs tracing)
  8. Memory leaks (especially in GC’d languages)
  9. Deep vs shallow copy
  10. Recursion depth and stack overflow

Why These Concepts Matter In Interviews

Most “I knew the algorithm but couldn’t get it to pass” failures aren’t algorithmic. They are foundation failures:

  • “My hash map is slow” → adversarial collision pattern in the input.
  • “My recursion crashed” → no idea Python’s default recursion limit is 1000.
  • “I edited the list while iterating and got weird output” → iterator invalidation.
  • “My BFS queue is slow” → using list.pop(0) instead of deque.popleft.
  • “Java said ConcurrentModificationException” → the JDK’s fail-fast iterator policy.
  • “Go map iteration ordered differently every run” → intentional non-determinism for hash-flooding defense.
  • “C++ vector reference invalidated after push_back” → reallocation moved the storage.
  • “JS object stringified my int keys to strings” → all object keys are strings; use Map.

Every one of these is on the rubric somewhere as “implementation correctness” or “language fluency.” Phase 1 closes them.


Inline Data Structure Reference

The remainder of this README is a data-structure and runtime reference manual. Read it linearly the first time. Skim it as a reference any time you forget a complexity, language API, or gotcha.


1. Arrays

Internal Representation

A contiguous block of memory holding fixed-size elements indexed by offset. Static arrays have fixed size; dynamic arrays (Python list, Java ArrayList, Go slice, C++ vector, JS Array) wrap a static array and grow it geometrically.

Operations and Complexity

OperationStaticDynamic (avg)Dynamic (worst)
Index read/write a[i]O(1)O(1)O(1)
Append push_backn/aO(1) amortizedO(N) (resize)
Prepend / insert middlen/aO(N)O(N)
Pop endn/aO(1)O(1)
Pop frontn/aO(N)O(N)
Search (unsorted)O(N)O(N)O(N)
Search (sorted, binary search)O(log N)O(log N)O(log N)
LengthO(1)O(1)O(1)

Memory Layout & Cache Behavior

Contiguous memory means the CPU prefetcher can stream elements predictively, giving arrays the best cache locality of any data structure. A linear scan over an int[] is typically 5–20× faster than the same scan over a linked list of the same length, even though both are O(N). When constants matter (HFT, hot loops), this difference dominates.

Dynamic Resizing Amortization

Geometric growth (typically 2× or 1.5×) gives O(1) amortized append: doubling means total work to grow from 1 to N is 1 + 2 + 4 + … + N = 2N - 1, amortized to O(1) per push. Linear growth (+1) gives O(N) amortized — never use it.

Language-Specific Gotchas

  • Python: list overallocates with growth factor ~1.125; arr.pop(0) is O(N); use collections.deque for queue. Lists are heterogeneous (each slot is a PyObject*), defeating cache locality. Use array.array or NumPy for primitive-typed storage.
  • Java: ArrayList is Object[] — boxing for primitives. Use int[] for hot paths. ArrayList.remove(0) is O(N). Arrays.asList(arr) returns a fixed-size view, not a real ArrayList.
  • Go: Slices are a 3-tuple (ptr, len, cap). append may or may not reallocate — appending to one slice can silently mutate another sharing storage. Always check capacity. s = s[:0] reuses storage; s = nil releases it.
  • C++: std::vector<T> reallocates on push_back past capacity, invalidating all iterators and references. Reserve up front when you know the size. vector<bool> is a packed bitset, not bool[] — its operator[] returns a proxy.
  • JS/TS: Arrays are dense or sparse; sparse arrays (a[1000000] = 1) have terrible perf. arr.shift() is O(N). Holes (new Array(10)) skip during forEach but not during for.

Common Interview Traps

  • “Insert in the middle” — be sure they don’t actually want a different DS.
  • “In-place” — explicitly disallows the cheap copy-to-new-array solution.
  • Off-by-one at boundaries: [l, r) vs [l, r].

2. Strings

Internal Representation

An array of code units (bytes for char[], UTF-16 code units in Java/JS, variable-width in Python 3 with PEP 393 latin-1/UCS-2/UCS-4). In most languages, strings are immutable — every “modification” allocates a new string.

Operations and Complexity

OperationComplexity
Index s[i]O(1) for fixed-width encoding, O(i) for UTF-8 by codepoint
LengthO(1) (cached)
Concat s + tO(|s| + |t|) — new allocation
Substring s[l:r]O(r − l) (copy) in most langs, O(1) (view) in Go and Java pre-7u6
EqualityO(min len) but fast with hash compare first
Find substring (naive)O(NM)
Find substring (KMP / Z)O(N + M)

Immutability

Java/Python/JS strings are immutable. So this loop:

s = ""
for c in chars:
    s += c   # O(N) per iteration → O(N²) total

This bug shows up in real interviews and gets candidates dinged for not knowing language internals. Use "".join(chars) (Python), StringBuilder (Java), [].join('') (JS), strings.Builder (Go).

Encoding Pitfalls

  • “String length” in characters vs bytes vs grapheme clusters can all differ. "é" may be 1 codepoint or 2 (e + combining accent) → 2 codepoints, 1 grapheme.
  • Python len("😀") == 1; Java "😀".length() == 2 (UTF-16 surrogate pair).
  • JS "😀".length === 2 for the same reason; iterate with for…of to get codepoints.
  • Always clarify: “Are inputs ASCII?” — if no, ask whether the unit of “character” is byte, codepoint, or grapheme.

Substring Complexity

Substring extraction copies the underlying chars in most modern languages. Go strings are immutable byte slices and substring is O(1) view (but converting to/from []byte is O(N)).

Language-Specific Gotchas

  • Python: strings are immutable; use lists of chars and "".join() for building. s[::-1] is idiomatic reverse.
  • Java: String s += c → quadratic. Use StringBuilder. String.intern() exists but rarely needed in interviews.
  • Go: string is immutable bytes; []byte(s) and string(b) each allocate. Range over a string yields (i, rune), not (i, byte).
  • C++: std::string is mutable; SSO (small string optimization) keeps short strings on the stack. s.c_str() is null-terminated.
  • JS/TS: strings are UTF-16 code units; emoji and non-BMP chars are 2 units long.

3. Hash Maps

Internal Representation

A bucket array, indexed by hash(key) % capacity. Collisions resolved by either separate chaining (linked list / tree per bucket — Java since 8 promotes long chains to red-black trees) or open addressing (Python, Ruby — probe sequence in the same array).

Operations and Complexity

OperationAverageWorst
InsertO(1)O(N) (all collide)
LookupO(1)O(N)
DeleteO(1)O(N)
IterateO(N + capacity)O(N + capacity)

Load Factor

The capacity / size ratio. When load factor exceeds a threshold (~0.75 typical), the table doubles and rehashes — this is amortized O(1) per insert but O(N) for the resize itself.

Adversarial Inputs

A static, public hash function lets an attacker craft N keys that all hash to the same bucket → O(N²) insertion. Real-world history: this brought down Java web servers in 2003 and PHP in 2011. Modern languages (Python PYTHONHASHSEED, Go map randomization, Java tree fallback) defend against this.

In an interview, if the problem says “the input may be adversarial,” do not rely on hash maps for worst-case bounds — use sorting + binary search or a balanced BST.

Ordered vs Unordered

  • Insertion-ordered: Python 3.7+ dict, Java LinkedHashMap, JS Map.
  • Sorted (by key): Java TreeMap, C++ std::map, Python sortedcontainers.SortedDict.
  • Hash, no order guarantee: C++ std::unordered_map, Java HashMap (unordered as of 8+), Go map (intentionally randomized).

If the problem requires ordered iteration, do not use a plain hash map.

Custom Hash for Tuples / Composite Keys

  • Python: tuples of hashable items are hashable for free.
  • Java: must implement both equals and hashCode for any custom key class. Forgetting one is a top-10 interview bug.
  • Go: map keys must be “comparable types” — structs of comparables work, slices/maps don’t.
  • C++: must specialize std::hash<T> or pass a custom hasher to unordered_map.
  • JS: object keys are coerced to strings; use Map for object-keyed maps.

Common Interview Traps

  • Mutating a key after insertion (its hash changes; the map can’t find it).
  • Iterating while mutating → ConcurrentModificationException (Java) / RuntimeError (Python).
  • Assuming O(1) without acknowledging worst case.

4. Hash Sets

Operations and Complexity

Same as hash map (a hash set is conceptually a hash map with null values).

OperationAverageWorst
add / contains / removeO(1)O(N)
unionO(|A| + |B|)
intersectionO(min(|A|, |B|))
differenceO(|A|)

Set Algebra

  • Union: A ∪ B — Python A | B, Java A.addAll(B), C++ manual.
  • Intersection: A ∩ B — iterate the smaller, lookup in the larger.
  • Difference: A \ B — iterate A, skip if in B.
  • Symmetric difference: A △ B(A ∪ B) \ (A ∩ B).

Set vs Map: When To Choose

Use a set when you only need presence; use a map when you need associated value (count, index, parent, etc.). Many “use a set” problems become trivially easier with a map (e.g., Two Sum needs a value → index map, not a set).

Language-Specific Gotchas

  • Python: set and frozenset; tuples are hashable, lists are not.
  • Java: HashSet, LinkedHashSet, TreeSet.
  • Go: no built-in set — use map[T]struct{} (zero-byte value).
  • C++: std::unordered_set, std::set (sorted).
  • JS/TS: Set preserves insertion order; objects are reference-equal.

5. Linked Lists

Singly vs Doubly

  • Singly: each node has next. Reverse, cycle detection, two-pointer dance.
  • Doubly: each node has next and prev. Required for O(1) erase given an iterator (LRU cache pattern).

Sentinels (Dummy Nodes)

A dummy head node simplifies edge cases: “what if the list is empty?” “what if we delete the first node?” become non-special. Always use a dummy when you have to return a head pointer that may change.

dummy = ListNode(0)
dummy.next = head
prev = dummy
# … operations on prev.next …
return dummy.next

Tail Pointer

Maintaining a tail pointer makes append O(1) (otherwise O(N)). Always-mark whether your manipulation invalidates the tail.

Common Manipulation Patterns

  • Reverse iteratively: three pointers prev, curr, next.
  • Reverse recursively: reverse(head.next) then head.next.next = head; head.next = None.
  • Find middle: slow/fast pointers; fast moves 2× slow.
  • Detect cycle: Floyd’s tortoise and hare.
  • Merge two sorted lists: dummy head + zip pattern.
  • Remove Nth from end: lead pointer ahead by N.

Operations and Complexity

OperationSinglyDoubly
IndexO(N)O(N)
Insert at known nodeO(1)O(1)
Delete at known nodeO(N)*O(1)
SearchO(N)O(N)

*Singly: O(N) because you need the previous node; O(1) if you have the previous pointer.

Language-Specific Gotchas

  • Python: define class ListNode: __slots__ = ('val', 'next') for memory; default uses a __dict__.
  • Java: LinkedList<T> exists but is rarely the right choice; ArrayDeque beats it on most operations.
  • Go: standard library has container/list (doubly linked, generic-erased before Go 1.18).
  • C++: std::list (doubly), std::forward_list (singly).
  • JS: typically write class Node { constructor(v) { this.val = v; this.next = null; } }.

6. Stacks

Implementations

Array-backed (dynamic array, push/pop end) or linked (push/pop head). Array-backed is faster in practice due to cache locality.

Operations and Complexity

OperationComplexity
pushO(1) amortized
popO(1)
peek (top)O(1)

Monotonic Stack Preview

A monotonic stack maintains elements in increasing or decreasing order. Used for next-greater / next-smaller / largest rectangle in histogram. Preview only — full pattern in Phase 2.

for x in arr:
    while stack and stack[-1] < x:
        # process element being popped: x is its next-greater
        stack.pop()
    stack.append(x)

Language-Specific Gotchas

  • Python: use list directly with append/pop. Don’t use queue.LifoQueue (locks).
  • Java: prefer ArrayDeque over the legacy Stack (synchronized, slow).
  • Go: slice with s = append(s, x) and s[len(s)-1] / s = s[:len(s)-1].
  • C++: std::stack<T> adapter on std::deque; or just use vector.
  • JS: array push/pop.

7. Queues

Variants

  • Plain queue (FIFO): enqueue rear, dequeue front.
  • Deque (double-ended): push/pop both ends in O(1).
  • Ring buffer / circular buffer: fixed-capacity deque on a static array.
  • Priority queue: see Heaps.

Operations and Complexity

OperationLinkedArray dequeRing buffer
enqueue rearO(1)O(1) amortizedO(1) (if not full)
dequeue frontO(1)O(1)O(1)
peek frontO(1)O(1)O(1)

Language-Specific Gotchas

  • Python: collections.deque for queue; never use list.pop(0) (O(N)).
  • Java: ArrayDeque<T> for both stack and queue. LinkedList works but slower. Avoid java.util.Queue<T> q = new LinkedList<>(); for hot paths.
  • Go: no built-in deque; container/list exists. Most CP code uses a slice as a queue with q[head:] (lazy popfront).
  • C++: std::deque (random-access amortized O(1)) or std::queue adapter.
  • JS: array push/shift works but shift is O(N); use a custom ring buffer for large queues.

8. Heaps

Binary Heap

A complete binary tree where each parent ≤ children (min-heap) or ≥ (max-heap). Stored in an array: parent of i is (i-1)/2, children are 2i+1 and 2i+2.

Operations and Complexity

OperationComplexity
push (sift up)O(log N)
pop top (sift down)O(log N)
peek topO(1)
heapify (build from N elements)O(N)
decrease-keyO(log N) (if you know the index)
arbitrary deleteO(log N) (if you know the index), else O(N)

Top-K Pattern (Preview)

  • “Top K largest” → min-heap of size K. Push every element; if size > K, pop. Final heap = top K.
  • “Stream median” → max-heap (lower half) + min-heap (upper half), balanced.

Language-Specific Gotchas

  • Python: heapq is a min-heap of any orderable; for max-heap, push -x. Tuples break ties lexicographically — careful with non-orderable secondary keys.
  • Java: PriorityQueue<T> is a min-heap by default; pass Comparator.reverseOrder() for max. peek() may return null on empty.
  • Go: container/heap requires you to implement the heap.Interface. Significant boilerplate.
  • C++: std::priority_queue<T> is a max-heap by default. Use std::priority_queue<T, vector<T>, greater<T>> for min-heap. Or use make_heap + push_heap + pop_heap on a vector.
  • JS: no built-in; write your own or use a library.

9. Sorted Arrays / Sorted Sets

On a sorted array, find a target or its insertion point in O(log N). Three canonical variants:

  • Lower bound: smallest index where a[i] >= target.
  • Upper bound: smallest index where a[i] > target.
  • Exact match: lower bound + check a[i] == target.

Operations on Sorted Set / Multiset

OperationComplexity
InsertO(log N)
EraseO(log N)
Find / lower_bound / upper_boundO(log N)
Iterate in orderO(N)
Min / maxO(1) (or O(log N))
Kth elementO(log N) (with order statistics tree) or O(K) iteration

Language-Specific Gotchas

  • Python: bisect.bisect_left / bisect_right for sorted lists. sortedcontainers.SortedList for an ordered multiset (O(log N) insert).
  • Java: TreeSet<T> / TreeMap<K,V>; floor, ceiling, higher, lower are essential APIs to know.
  • Go: no standard sorted set — must implement or use a third-party library.
  • C++: std::set / std::multiset (red-black tree); lower_bound / upper_bound member functions.
  • JS: no standard sorted set; use a sorted array with binary search or write a treap.

10. Trees

Binary Tree Definitions

  • Binary tree: each node has ≤ 2 children.
  • BST: left subtree < node < right subtree (in-order traversal yields sorted sequence).
  • Balanced (AVL, RB): height O(log N) guaranteed.
  • Unbalanced: worst case O(N) (degenerate to linked list).

Traversals (Recursive)

def inorder(n):
    if not n: return
    inorder(n.left)
    visit(n)
    inorder(n.right)

Pre-order: visit, left, right. Post-order: left, right, visit. Level-order: BFS with a queue.

Traversals (Iterative)

  • In-order with stack: push left chain, pop and visit, then go right.
  • Pre-order with stack: push root, pop, visit, push right then left.
  • Post-order with stack: trickier — use a marker or a 2-stack trick.
  • Morris traversal: O(1) extra space using threaded pointers; advanced.

Operations and Complexity (Balanced)

OperationBalancedUnbalanced
Insert / delete / searchO(log N)O(N)
Min / maxO(log N)O(N)
In-order traversalO(N)O(N)

Language-Specific Gotchas

  • Python: sys.setrecursionlimit; CPython has no tail-call elimination.
  • Java: TreeMap is red-black; recursion depth limited by JVM stack (~1000s).
  • Go: no standard balanced BST.
  • C++: std::map / std::set are red-black; iterators traverse in order.
  • JS: no standard balanced BST; recursion depth is engine-dependent.

11. Tries

Internal Representation

Each node has up to alphabet-size children (e.g., 26 for lowercase, 256 for byte, larger for unicode). End-of-word flag per node.

Operations and Complexity

OperationComplexity
Insert word of length LO(L)
Search word of length LO(L)
Prefix searchO(L)
SpaceO(total chars × alphabet size)

Alphabet Size Considerations

Fixed array per node: O(σ) memory per node, fast O(1) child lookup. HashMap per node: O(actual children) memory, slightly slower lookup. For 26 letters use array; for full unicode use hash map.

Language-Specific Gotchas

  • Python: dict of dicts; can use defaultdict(dict).
  • Java: Map<Character, TrieNode> or TrieNode[26].
  • Go: struct with [26]*TrieNode.
  • C++: struct with TrieNode* children[26]. Manual memory management or unique_ptr.
  • JS: plain objects or Map.

12. Graphs

Representations

RepresentationSpaceEdge queryIterate neighbors
Adjacency listO(V + E)O(deg(v))O(deg(v))
Adjacency matrixO(V²)O(1)O(V)
Edge listO(E)O(E)O(E)

When To Use Each

  • Adjacency list: sparse graphs (E ≪ V²) — almost always the right answer for interviews.
  • Adjacency matrix: dense graphs (E ≈ V²), Floyd-Warshall, when V is small (≤ 500).
  • Edge list: Kruskal’s MST, when you need to sort edges, when graph is given as edges and you don’t need neighbor queries.

Common Forms in Interviews

  • List<List<Integer>> adjacency.
  • Map<String, List<String>> for non-integer node IDs.
  • Implicit graph (grid: neighbors are (±1, 0) and (0, ±1)).

Language-Specific Gotchas

  • Python: defaultdict(list) is ideal for adj[u].append(v).
  • Java: List<List<Integer>> with explicit ArrayList initialization in a loop; primitive int adjacency lists need third-party (Eclipse Collections, fastutil).
  • Go: [][]int slice-of-slices.
  • C++: vector<vector<int>>; for performance, use vector<int> with offsets (CSR format).
  • JS: array of arrays or Map<string, string[]>.

13. Disjoint Set Union (Union-Find)

Operations

  • find(x): which component is x in?
  • union(x, y): merge components of x and y.

Optimizations

  • Path compression: during find, set parent of every visited node to the root.
  • Union by rank/size: attach the shorter tree under the taller.
  • Together: O(α(N)) per op (inverse Ackermann — practically constant, ≤ 4 for any realistic N).

Naive vs Optimized

Variantfindunion
NaiveO(N)O(N)
Path compression onlyO(log N) amortizedO(log N) amortized
Path compression + union by rankO(α(N)) amortizedO(α(N)) amortized

Reference Implementation (Python)

parent = list(range(N))
rank = [0] * N

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path compression
        x = parent[x]
    return x

def union(x, y):
    rx, ry = find(x), find(y)
    if rx == ry: return False
    if rank[rx] < rank[ry]: rx, ry = ry, rx
    parent[ry] = rx
    if rank[rx] == rank[ry]: rank[rx] += 1
    return True

Language-Specific Gotchas

  • Python: the recursive form blows past the recursion limit for N > 1000; always iterative.
  • Java: prefer int[] parent over Map<Integer, Integer> for primitive perf.
  • Go: straightforward with []int.
  • C++: vector<int> parent and rank.
  • JS: typed array Int32Array for speed.

14. Bitsets / Bit Manipulation

Common Idioms

OperationIdiom
Set bit ix | (1 << i)
Clear bit ix & ~(1 << i)
Toggle bit ix ^ (1 << i)
Test bit i(x >> i) & 1
Lowest set bitx & -x
Pop lowest set bitx & (x - 1)
Popcountlanguage builtin (__builtin_popcount, Integer.bitCount, bin(x).count('1'))
Iterate subsets of masks = mask; while s > 0: …; s = (s - 1) & mask

Bitsets

A packed array of bits — 64× the density and 64× the throughput of bool[] for many ops. Use when N up to ~10⁵ and you need fast set operations.

Language-Specific Gotchas

  • Python: ints are arbitrary precision; no overflow but no SIMD either. bin(x).count('1') works but int.bit_count() (3.10+) is faster.
  • Java: int is 32-bit, long is 64-bit. Negative numbers: >> is arithmetic, >>> is logical.
  • Go: untyped constants vs typed; explicit cast required (int(x), uint32(x)).
  • C++: std::bitset<N> for compile-time-known N; vector<bool> is a specialization (proxy reference).
  • JS: bitwise ops coerce to 32-bit signed int — beware truncation. Use BigInt for 64-bit ops.

15. Counters / Multisets

Counter Pattern

A hash map from key to count. Used for: frequency analysis, anagram detection, sliding window distinct-element count.

Operations

OperationComplexity
Increment count[k]O(1)
Decrement / remove if zeroO(1)
Total countO(distinct keys)
Sorted by countO(N log N)

Multiset

A counter is essentially a multiset (allows duplicates, remembers count). For an ordered multiset (need min/max/kth in order), use a TreeMap+count or sortedcontainers.SortedList.

Language-Specific Gotchas

  • Python: collections.CounterCounter(s), most_common(k), arithmetic operators (c1 - c2 drops zeros).
  • Java: HashMap<K, Integer> with getOrDefault(k, 0) + 1 and merge(k, 1, Integer::sum). There is no built-in Counter.
  • Go: map[K]int. Manual increment.
  • C++: std::unordered_map<K, int>; ++m[k] works because default-constructed int is 0.
  • JS: Map (preserves insertion order) or plain object (string keys only).

Inline Runtime Concepts Reference

These concepts cut across all data structures. They are interview rubric line items in their own right.


1. Stack vs Heap Memory

The call stack holds function frames: locals, args, return addresses. Fast, fixed-size (typically 1–8 MB in interview environments). Allocations are pointer-bump and bound to the frame’s lifetime.

The heap is dynamically managed memory — new, malloc, Python objects, JVM objects. Slower allocation, can be GC’d or manually freed. Survives beyond the function that created it.

Why It Matters In Interviews

  • “Why does my recursion overflow at depth 10⁵?” → call stack ~1 MB / ~64 bytes per frame ≈ 16K frames before crash.
  • “Why is my linked list slower than my array?” → heap-allocated nodes scattered in memory, no cache locality.

Per-Language Gotchas

  • Python: all “values” except small ints are heap objects. Locals are name bindings, not stack-allocated values.
  • Java: primitives in locals are stack-allocated; objects always heap. Escape analysis can sometimes elide a heap alloc.
  • Go: escape analysis decides stack vs heap; &x in a returned closure forces heap allocation.
  • C++: explicit (int x; is stack, new int is heap). Stack allocation is much faster.
  • JS: all objects are heap; primitives may be stack-internal but the language hides it.

2. Scope and Lifetime

Scope = where a name is visible. Lifetime = how long the value lives.

These can differ! In a closure, a variable’s scope ends with the function but its lifetime extends as long as the closure references it.

Per-Language Gotchas

  • Python: late binding in closures — [lambda: i for i in range(3)] all return 2, not 0/1/2.
  • Java: local variables captured by lambdas must be effectively final.
  • Go: loop variable capture changed in Go 1.22 — pre-1.22, for i := range … { go func() { use(i) }() } captures the same i.
  • C++: dangling reference if you return &local. Lifetime extension via const& is a niche rule.
  • JS: var is function-scoped, let/const are block-scoped. Hoisting trap.

3. Value vs Reference Semantics

Does assigning or passing a variable copy the value or share a reference?

LanguagePrimitivesObjects/Arrays
Pythonby value (immutable)by reference
Javaby valuereference passed by value
Goby value (struct, array)slice/map/chan = ref-ish
C++by value (default)explicit & or * for ref
JSby valueby reference

Trap

“I passed my array to the function, modified it inside, and the caller saw the change!” — yes, because the array (Python list, Java int[], JS array) is passed by reference.

“I passed my int to the function, modified it inside, but the caller didn’t see the change!” — yes, because the int is by value.


4. Mutable vs Immutable

Immutable values cannot be modified after creation; “modification” returns a new value.

LanguageImmutable types
Pythonstr, int, tuple, frozenset, bytes
JavaString, all primitive wrappers, LocalDate, etc.
Gostring
C++const-qualified
JSprimitives (string, number, boolean, undefined, null, bigint, symbol)

Trap

  • Using a mutable object as a hash map key, then mutating it → key lost.
  • “Why is s += c slow?” — string is immutable, every iteration copies the whole string.

5. Hash Collisions and Adversarial Keys

A hash map’s O(1) average requires a “good” hash function and “uniform” inputs. An adversary who knows the hash function can craft keys all colliding to one bucket → O(N²) blowup.

Defenses

  • Random seed (Python’s PYTHONHASHSEED, Go map random seed) — attacker can’t predict the function.
  • Tree fallback — Java HashMap since 8 converts long collision chains into red-black trees, capping worst case at O(log N).
  • Cryptographic hash — overkill, but immune.

Interview Note

If the problem explicitly says “adversarial” or “competitive” inputs, do not rely on hash maps. Use a sorted structure (TreeMap, sortedcontainers, std::map).


6. Iterator Invalidation

Modifying a collection while iterating can break the iterator.

Behaviors

  • Python: mutating a dict during iteration raises RuntimeError.
  • Java: ConcurrentModificationException (fail-fast iterator).
  • Go: map iteration order is randomized; modifying the map mid-iteration is technically allowed but the new keys may or may not be visited.
  • C++: vector::push_back invalidates all iterators if it reallocates. unordered_map::insert invalidates all on rehash.
  • JS: Map and Set iteration sees later insertions; deletions are honored.

Safe Pattern

Collect mutations into a list during iteration; apply after the loop.


7. Garbage Collection Basics

Two main strategies:

  • Reference counting (refcount): each object has a count of references to it; when 0, freed. Fast but cannot collect cycles. CPython uses refcount + cycle collector.
  • Tracing GC (mark-and-sweep, generational): periodically traces from roots; unreachable objects freed. Java, Go, JS, C# use variants.

Why It Matters

  • Refcount: del x is immediate; deterministic destruction.
  • Tracing: deallocation happens “later” — pauses (“stop-the-world”) historically, mostly amortized in modern collectors.

Per-Language Gotchas

  • Python: cycles between objects with __del__ finalizers may never collect (pre-3.4).
  • Java: “GC pause” is a real concern in latency-sensitive interviews; mention G1, ZGC.
  • Go: GC is concurrent and non-generational; predictable sub-millisecond pauses.
  • C++: no GC — must delete what you new. RAII (smart pointers) automates this.
  • JS: V8 uses generational GC.

8. Memory Leaks (in GC’d Languages)

A “leak” in a GC’d language means: the object is unreachable from the developer’s intent, but reachable from the GC’s perspective — so it’s never freed.

Common Sources

  • Listeners not removed: event handler holds a reference to the listener forever.
  • Caches without eviction: map grows monotonically.
  • Closure capture: closure holds reference to large enclosing object.
  • Static fields: lives forever.
  • ThreadLocals not cleared in pooled threads: classic JVM leak.

Per-Language Gotchas

  • Python: circular refs with finalizers; module-level state.
  • Java: ThreadLocal + thread pool; classloader leaks (PermGen / Metaspace).
  • Go: goroutine leak (blocking on a channel that never receives).
  • C++: real memory leak via missing delete.
  • JS: detached DOM nodes, timers not cleared.

9. Deep vs Shallow Copy

  • Shallow copy: new container, but elements are shared references.
  • Deep copy: recursively copies elements too.

When It Matters

Backtracking: if you store snapshots of a mutable list, you need a deep (or at least one-level) copy, otherwise all snapshots reflect the latest state.

result = []
path = []
def backtrack(...):
    if done: result.append(path[:])  # MUST copy; otherwise all entries are the same list

Per-Language Gotchas

  • Python: list(x) / x[:] shallow; copy.deepcopy(x) deep.
  • Java: clone() is shallow by default; deep requires explicit traversal.
  • Go: copy(dst, src) shallow; for deep, write your own.
  • C++: copy constructor: shallow by default for raw pointers; smart pointers and STL containers do “value” copies.
  • JS: spread [...arr] and {...obj} are shallow; deep via structuredClone(x) (modern) or JSON round-trip (lossy).

10. Recursion Depth and Stack Overflow

Each recursive call adds a frame to the call stack. The stack has a fixed size; exceeding it crashes (or in Python, raises RecursionError).

Default Limits

  • Python: sys.getrecursionlimit() defaults to 1000. Raise with sys.setrecursionlimit(10**6) plus threading.stack_size(...).
  • Java: ~10K frames typical; tune with -Xss.
  • Go: stacks start small (8KB) and grow up to 1 GB.
  • C++: thread stack ~1 MB default; tune at thread creation.
  • JS: ~10K frames typical, engine-dependent.

Mitigation

  • Convert to iteration with an explicit stack (DFS, in-order traversal).
  • Tail-call optimization is not present in most popular languages (no Python, no Java, no JS, no Go). C++ and Scheme can do it.
  • For trees, balance matters: a degenerate (chain-shaped) tree blows recursion at depth N, not log N.

Recommended Problem Categories

After mastering the concepts, drill these problem categories. Each maps to a lab below.

CategorySample Problems
Array manipulationRotate Array, Move Zeros, Plus One, Spiral Matrix
String mechanicsReverse Words, Implement strStr, Length of Last Word
Hash map / setTwo Sum, Group Anagrams, Contains Duplicate II, Intersection of Two Arrays
Linked listReverse Linked List, Merge Two Sorted Lists, Palindrome Linked List, Linked List Cycle
Stack / queueValid Parentheses, Min Stack, Implement Queue using Stacks, Daily Temperatures
HeapKth Largest in Stream, Last Stone Weight, K Closest Points to Origin
Binary searchFirst Bad Version, Search Insert Position, Find Peak Element
RecursionGenerate Parentheses, Letter Combinations of a Phone Number, Subsets
Tree traversalInorder Traversal (iterative + recursive), Symmetric Tree, Same Tree, Binary Tree Paths

Aim for ~30 problems across these categories minimum (more for the 6/12-month tracks).


Mastery Checklist

  • Implement, from scratch, in 20 minutes each, with passing tests: dynamic array, linked list (singly + doubly), stack, queue (deque), binary heap, hash map (open addressing), trie, union-find.
  • State worst-case, average-case, and amortized complexity for every operation in the table above.
  • Explain why s += c in a loop is O(N²) in three of {Python, Java, JS, Go}.
  • Write iterative in-order, pre-order, and post-order tree traversals (no recursion).
  • Solve any LeetCode Easy in 15 minutes using the framework.
  • Solve any LeetCode Easy-Medium hash-map problem in under 20 minutes.
  • Recognize when a problem needs a heap vs sorted set vs counter without prompting.
  • Articulate the iterator invalidation rules for your interview language.
  • Articulate refcount vs tracing GC for your interview language.
  • Predict whether a recursive solution will overflow on N = 10⁵ in your interview language.

Exit Criteria

Move to Phase 2 only when:

  1. Mastery checklist 100% complete.
  2. All 9 labs completed with passing tests.
  3. Solved ≥ 30 problems across the recommended categories using the framework, each reviewed via REVIEW_TEMPLATE.md.
  4. Implemented every primary data structure from scratch — no notes, no internet, just the language’s basic features.
  5. Self-mocked one Easy and one low-Medium problem with full framework execution; passed using READINESS_CHECKLIST.md section criteria for “data-structure fluency.”
  6. For every failure during these problems, ran the diagnosis in FAILURE_ANALYSIS.md and queued it via SPACED_REPETITION.md.

If any item fails, do not move on. Phase 2 patterns assume Phase 1 fluency. Without it, you will plateau.


Hands-On Labs

Complete in order. Each lab uses the strict 22-section format used throughout the curriculum.

  1. Lab 01 — Array Fundamentals
  2. Lab 02 — String Mechanics
  3. Lab 03 — Hashmap Mastery
  4. Lab 04 — Linked List Pointers
  5. Lab 05 — Stack & Queue Applications
  6. Lab 06 — Heap Priority
  7. Lab 07 — Binary Search Fundamentals
  8. Lab 08 — Recursion & Stack
  9. Lab 09 — Tree Traversal Fundamentals

Common Mistakes In Phase 1

  • Memorizing complexities without understanding them — when an interviewer probes “why is dict insert O(1) on average?”, you must derive it.
  • Skipping “from scratch” implementations — using import heapq is fine for problems, but you must be able to write the heap yourself.
  • Treating Python list as a queuepop(0) is O(N); use deque.
  • Forgetting to check overflow in C++/Java — int + int overflows silently; cast to long.
  • Recursion in Python without setting the limit — N=10⁴ trees blow the default stack.
  • Confusing == with is / equals / === — identity vs equality differs by language.
  • Using a mutable object as a hash key.
  • Not knowing your language’s iterator invalidation rules.

If any of these still trip you up, you are not done with Phase 1.