p19 — Convert Sorted Array to Binary Search Tree
Source: LeetCode 108 · Easy (the construction is short; the invariant is non-trivial) · Topics: Tree, BST, Divide & Conquer Companies (2024–2025 frequency): Amazon (high), Microsoft (high), Apple (medium), Meta (medium), Bloomberg (medium), Adobe (medium) Loop position: phone screen or first round; often paired with LC 109 (sorted linked list to BST) as the natural follow-up.
1. Quick Context
A sorted array IS the inorder traversal of some BST. There are exponentially many BSTs with that inorder, but the height-balanced one is uniquely (up to a left/right midpoint convention) determined by the divide-and-conquer rule:
Pick the middle element as root. Recursively build the left subtree from the left half. Recursively build the right subtree from the right half.
Three things make this problem interesting:
- The midpoint convention matters.
mid = (lo + hi) // 2(floor) andmid = (lo + hi + 1) // 2(ceiling) both give height-balanced BSTs but DIFFERENT trees. LC accepts either. State your choice. - The recurrence proves the height bound.
T(n) = T(⌊n/2⌋) + T(⌈n/2⌉) + O(1)gives O(N) build and resulting height ⌈log₂(n+1)⌉. - This is the inverse of p18’s inorder. Building from a sorted sequence ↔ reading a BST as a sorted sequence. The construction pattern (divide on middle, recurse) is the same skeleton as building a segment tree, merge sort, and balanced parenthesization DP.
What it looks like it tests: can you write a recursive constructor. What it actually tests: can you (a) state the invariant (“sorted array IS the inorder of a BST”), (b) recognize the divide-and-conquer skeleton, and (c) prove the balance property without hand-waving.
2. LeetCode Link + Attempt Gate
🔗 https://leetcode.com/problems/convert-sorted-array-to-binary-search-tree/
STOP. 5-min timer. This is a fast solve once you see the divide-and-conquer; longer if you don’t.
3. Prerequisite Concepts
- p16 — the BST invariant
- p18 — inorder of a BST is sorted (this problem is the inverse construction)
- Divide-and-conquer recursion on array indices (see phase-01-foundations/labs/lab-07-binary-search-fundamentals.md)
- Master theorem (for the T(n) = 2T(n/2) + O(1) → O(N) derivation)
4. How to Approach (FRAMEWORK Steps 1–9)
Step 1 — Restate: “Given an integer array nums sorted in strictly ascending order, build a height-balanced BST whose inorder traversal is nums. Return the root. A height-balanced BST is one where the depths of the two subtrees of every node differ by at most 1.”
Step 2 — Clarify:
- “Strictly ascending — duplicates?” (LC: strictly ascending. No duplicates.)
- “Define height-balanced?” (Yes — abs(depth(left) - depth(right)) ≤ 1 at every node.)
- “Unique answer?” (No — LC accepts any valid height-balanced BST.)
- “Empty array?” (Return None.)
- “Should I mutate
nums?” (LC: doesn’t matter; just use indices.)
Step 3 — Constraints: N up to 10^4. O(N) construction. Resulting height = ⌈log₂(N+1)⌉.
Step 4 — Examples:
[-10,-3,0,5,9]→ root 0, left {-10, -3}, right {5, 9}.[]→ None.[1]→ single node.[1,2]→ root 1 with right 2 (or root 2 with left 1, depending on midpoint convention).[1,2,3,4,5,6,7]→ perfect balanced tree with root 4.
Step 5 — Brute Force: Insert one at a time into a vanilla BST? That degrades to a linked list (skewed depth N). Wrong — does not satisfy height-balance. State this as the wrong approach to not use.
Step 6 — Complexity (optimal): O(N) time (each index processed exactly once), O(log N) stack space (recursion depth = tree height = O(log N)).
Step 7 — Pattern Recognition: “Build a balanced structure from a sorted sequence by picking the middle” → divide-and-conquer on indices. Same skeleton as merge sort, segment tree build, binary search.
Step 8 — Develop:
def build(lo, hi): # half-open [lo, hi)
if lo == hi: return None
mid = (lo + hi) // 2
root = TreeNode(nums[mid])
root.left = build(lo, mid)
root.right = build(mid + 1, hi)
return root
return build(0, len(nums))
Step 9 — Prove balance: The recurrence on the height H(n) is H(n) = 1 + max(H(⌊n/2⌋), H(⌈n/2⌉)) with H(0) = -1. Solve: H(n) = ⌈log₂(n+1)⌉. Since the two subtrees have sizes ⌊n/2⌋ and ⌈n/2⌉ differing by at most 1, their heights differ by at most 1 (height is monotone in size for balanced builds). Therefore the tree is height-balanced. ∎
5. Progressive Hints
If stuck for 5 min, open hints.md.
6. Deeper Insight — Divide-and-Conquer on Indices
This problem is the cleanest example of the divide-and-conquer on sorted indices template:
def build(lo, hi):
if lo >= hi: return EMPTY # base case
mid = pick_middle(lo, hi)
return COMBINE(nums[mid], build(lo, mid), build(mid + 1, hi))
The same skeleton, with different pick_middle and COMBINE, gives:
| Problem | pick_middle | COMBINE |
|---|---|---|
| Sorted array → balanced BST (this) | (lo + hi) // 2 | TreeNode(val, left, right) |
| Merge sort | (lo + hi) // 2 | merge(left_sorted, right_sorted) |
| Segment tree build | (lo + hi) // 2 | Node(combine(left.agg, right.agg), left, right) |
| Quick sort (random pivot) | random | partition-then-recurse |
| Binary search | (lo + hi) // 2 | conditionally descend one side only |
Recognizing this skeleton means you have a reusable mental template for half the divide-and-conquer problems you’ll see at Phase 3+.
7. Anti-Pattern Analysis
- Insert one at a time into an empty BST. Sorted insertion degrades to a left/right skew → depth N → fails height-balance. Mention as the wrong approach.
- Off-by-one in the index range. Mixing inclusive/exclusive bounds (
[lo, hi]vs[lo, hi)) is the #1 bug. Pick one convention and use it consistently. (This README uses half-open[lo, hi).) - Allocating subarrays on each recursive call (
build(nums[:mid])). That’s O(N log N) extra work and O(N log N) memory. Pass indices, not slices. Mention this trade-off if asked. - Returning the wrong midpoint (forgetting to construct the BST around it). Some candidates accidentally write
build(lo, mid - 1)andbuild(mid + 1, hi)(inclusive[lo, hi]) withmid = (lo + hi) // 2— fine. Others mix conventions and end up double-countingmid. Pick one and stick to it. - Assuming the answer is unique. Both
(lo + hi) // 2and(lo + hi + 1) // 2produce valid height-balanced BSTs that differ when N is even. State your convention upfront.
8. Skills & Takeaways
- The divide-and-conquer-on-indices template — your reusable skeleton for merge sort, segment tree build, and BST construction.
- Pass indices, not slices — performance discipline that matters in production.
- Half-open intervals
[lo, hi)— the Knuth-recommended convention; less prone to off-by-one than[lo, hi]. - Inverse-of-inorder framing — sorted ↔ inorder of BST. This biconditional unlocks LC 109, 109 follow-ups, and serialize/deserialize problems.
- Proof of balance via height recurrence — the kind of derivation Google interviewers ask for explicitly.
9. When to Move On
Move on when:
- You can write the half-open
[lo, hi)version from memory in <2 minutes with zero off-by-one bugs. - You can prove the tree is height-balanced via the height recurrence.
- You can sketch LC 109 (sorted linked list to BST) and articulate the two approaches: (a) convert list to array first (O(N) space) or (b) inorder-construction trick (advance the list pointer in inorder order, O(1) extra space — the elegant solution).
- Your stress test validates the resulting tree is both a valid BST (via p16) AND height-balanced (via height-difference check at every node) on 1000 random sorted arrays.
10. Company Context
Amazon:
- Standard SDE-II warmup. Follow-up: LC 109 (sorted linked list → BST). Some Bar Raisers ask you to write both in the same hour.
- Watch for: “what if the array doesn’t fit in memory but is on disk?” → discuss external divide-and-conquer (read midpoint, recurse on halves via disk seeks).
- Scorecard phrase: “Coding Excellence — clean half-open indices; correctly proved balance.”
Microsoft:
- Frames as “build a balanced index from a sorted column.” Will probe “what’s the height — exactly?” Answer:
⌈log₂(N+1)⌉. - Scorecard phrase: “Algorithm — divide-and-conquer; precise height analysis.”
Apple:
- Asks the in-place variant: “build the BST without allocating new TreeNodes — reuse the array as an implicit structure.” Requires implicit-tree-via-index thinking (parent at
i, children at2i+1,2i+2). Niche but Apple-friendly. - Scorecard phrase: “Strong — handled in-place variant; understood implicit-tree representation.”
Meta:
- Often pairs with LC 109. May ask: “modify so the BST is also an AVL — would your construction need rotations?” Answer: no, the divide-by-middle construction is already height-balanced (in fact, weight-balanced), which implies AVL-balanced. Free win.
- Scorecard phrase: “Algorithm — recognized weight-balance implies height-balance; no rotations needed.”
Google:
- Will demand the height proof. Also: “what changes if
numsallowed duplicates?” → answer: pick a side convention (e.g., equals go right), still height-balanced. - Scorecard phrase: “Deep Dive — full balance proof; cleanly handled duplicates side-convention.”
Bloomberg:
- Framed as “build a balanced tree over a sorted instrument list for log-time lookup.” Follow-up: “how do you handle daily insertions?” → discuss when to rebuild vs incrementally insert (rebuild when imbalance > threshold).
- Scorecard phrase: “Solid — connected to amortized index maintenance.”
11. Interviewer’s Lens
| Signal | Junior | Mid | Senior | Staff/Principal |
|---|---|---|---|---|
| First instinct | Sorted insert (skewed) | Picks middle, off-by-one | Picks middle, correct first try | Picks middle + names “divide on midpoint” pattern |
| Index convention | Mixes inclusive/exclusive | Settles on one after bug | Half-open from the start | Half-open + states convention upfront in a comment |
| Balance proof | “Trust me” | Hand-waves | States recurrence | Solves recurrence to ⌈log₂(N+1)⌉ |
| LC 109 follow-up | Stuck | “Convert to array first” | Both approaches | Implements O(1)-extra-space inorder-pointer trick |
| Generalization | None | None | Notes merge-sort similarity | Names divide-on-indices skeleton; applies to segment tree |
12. Level Delta
Mid (L4):
- Writes the recursive construction with one off-by-one debug pass.
- States “tree is balanced” without formal proof.
Senior (L5):
- Writes correct half-open construction first try.
- States the height recurrence and gives the closed-form ⌈log₂(N+1)⌉.
- Sketches LC 109 verbally.
Staff (L6):
- Frames upfront: “this is the inverse of inorder; divide-on-midpoint.” States balance proof.
- Implements LC 109 with the O(1)-extra-space pointer trick.
- Discusses when to use weight-balanced vs height-balanced vs AVL/RB (rebuild cost vs incremental cost).
Principal (L7+):
- Generalizes to the divide-and-conquer skeleton across merge sort, segment tree, and balanced parenthesization.
- Discusses external-memory variant for N too large for RAM (B-tree-style midpoint-based bulk loading).
- Connects to cache-oblivious trees (van Emde Boas layout) for memory-hierarchy performance.
13. Follow-up Q&A
Q1: How do you do this for a sorted linked list (LC 109)? A: Two approaches.
- (a) Walk the list once to build an array, then call this problem’s solution. O(N) time, O(N) extra space.
- (b) In-order construction with a list pointer: compute the length first. Recursively build the tree’s left subtree (which consumes the first half of the list), then read the current list head as this node’s value (advance pointer), then build the right subtree. The list pointer is consumed in inorder. O(N) time, O(log N) stack space, O(1) extra. The elegant answer.
Q2: How do you build the BST in-place using the array’s memory (no TreeNode allocations)?
A: Use implicit-tree-via-index representation: node i has children 2i+1, 2i+2. The construction is exactly the same divide-and-conquer; output is a heap-style array. Trades clarity for cache locality and zero allocation.
Q3: What if nums has duplicates?
A: Decide a side convention (e.g., duplicates go right). The midpoint-divide still works; height balance still holds. State the convention in a comment because callers will assume one or the other.
Q4: Could you build the BST iteratively?
A: Yes — use an explicit stack of (lo, hi, parent_node, is_left_child) quadruples. The divide-and-conquer skeleton flattens trivially. Same O(N) time, but trades stack memory for explicit-stack heap memory.
Q5: How would you parallelize this for very large nums?
A: The two recursive calls (left and right halves) are independent — pure functional divide-and-conquer. Trivially parallel with a work-stealing executor. The recursion tree has depth O(log N) so parallel speedup is bounded by available cores up to ~N/threshold.
14. Solution Walkthrough
See solution.py:
sorted_array_to_bst(nums)— half-open[lo, hi)recursive divide-and-conquer (the canonical form).sorted_array_to_bst_inclusive(nums)—[lo, hi]inclusive variant (also correct; for comparison).sorted_array_to_bst_iterative(nums)— explicit-stack iterative form for the “no recursion” follow-up.sorted_list_to_bst(head)— LC 109 with the in-order pointer trick (O(1) extra space follow-up).- Validation helpers:
is_valid_bst(root),is_height_balanced(root),inorder_values(root). stress_test()— generates 1000 random sorted arrays; for each, verifies all 3 array-based builds produce (a) a valid BST, (b) a height-balanced tree, (c) inorder == input. Also stress-tests LC 109 against the array version.edge_cases()— empty, single, two-elem, three-elem, perfect 2^k - 1 sizes.
Run: python3 solution.py → “ALL TESTS PASSED” on stdout.
15. Beyond the Problem
Principal-engineer code review comment:
“The recursion is fine, but consider whether you actually need to allocate
TreeNodeobjects at all. If this is used to build a static lookup tree that’s queried millions of times and never mutated, the heap-style array layout (children[i] = 2i+1, 2i+2) is ~2× faster on modern CPUs because it’s cache-coherent — the pointer-based tree thrashes the L1 cache after a few hundred nodes. Also: please name your indicesloandhi, notstartandend, because the next reader will spend 30 seconds figuring out ifendis inclusive or exclusive. Convention beats cleverness. Last note: themid = (lo + hi) // 2overflows in C/Java for very largehi; in Python you’re safe, but add a# (lo + (hi - lo) // 2) in non-bignum languagescomment if this might be ported. Cheap insurance against a future overflow bug that’s hellish to debug.”
Connections forward:
- p20 (BFS level order) is the output form of this tree (level-by-level read).
- Phase 3 segment trees use the same divide-on-midpoint build with
O(N)array-backed implicit tree. - Phase 5 DP problems on intervals (Matrix Chain, Burst Balloons) use the same
build(lo, hi)skeleton. - LC 109 (sorted linked list → BST) is the canonical follow-up — drill it.
- LC 1382 (Balance a BST) is the generalization: given an arbitrary BST, output a balanced one (inorder to array, then apply this algorithm).