p14 — Diameter of Binary Tree
Source: LeetCode 543 · Easy (lies — this is a Medium in disguise) · Topics: Tree, DFS Companies (2024–2025 frequency): Meta (very high — flagship), Amazon (high), Microsoft (high), Google (medium), Bloomberg (medium), Apple (medium) Loop position: mid-loop algorithmic round; the “Easy” label is misleading — most candidates fail this on first try.
1. Quick Context
This is THE most important trick in the entire tree chapter. The technique is called the two-channel pattern (or “return-one-thing-while-tracking-another”). The recursive function returns the HEIGHT of a subtree. As a side effect, at each node, it computes a candidate answer (left_height + right_height) and updates a closure-captured best if larger. The parent doesn’t need the candidate — it needs the height to continue the recursion. The candidate just passes through.
Once you internalize this, you have unlocked:
- LC 543 (this) — diameter (longest path of edges).
- LC 124 — Max Path Sum (Hard; the same pattern; return-max-gain, accumulate global-best including BOTH-sides-at-node).
- LC 687 — Longest Univalue Path (same pattern; gated by value equality).
- LC 1372 — Longest ZigZag (same pattern; track two heights per node — left-going and right-going).
- LC 1973 — Count Nodes Equal to Sum of Descendants (return sum, accumulate count).
- LC 1325 — Delete Leaves with Given Value (return new node, with side-effect deletion).
It is THE template for “find a global optimum over all paths/structures within a tree.”
What it looks like it tests: find the longest path between any two nodes. What it actually tests: can you write a recursion that returns ONE thing (height for parent’s use) while accumulating ANOTHER (best diameter as side effect)? Most candidates incorrectly try to make the recursion return the diameter.
2. LeetCode Link + Attempt Gate
🔗 https://leetcode.com/problems/diameter-of-binary-tree/
STOP. 7-min timer (this one is genuinely harder than its “Easy” label).
3. Prerequisite Concepts
- p12 — Max Depth (you already write height in your sleep)
- Recursion with side-effects via
nonlocalor a list-of-one trick - The mental shift: “the value the function RETURNS is different from the value being COMPUTED.”
4. How to Approach (FRAMEWORK Steps 1–9)
Step 1 — Restate: “Given the root of a binary tree, return the longest path between any two nodes, measured in edges. The path may or may not pass through the root.”
Step 2 — Clarify:
- “Edges or nodes?” (Edges per LC. Single-node tree → diameter 0. Two-node tree → diameter 1.)
- “Path can curve through any node, not just root?” (Yes — that’s the whole point.)
- “Single node — diameter 0?” (Yes, edges = 0.)
- “Empty tree — what?” (LC won’t pass empty; defensively return 0.)
Step 3 — Constraints: N up to 10^4. O(N) achievable; the wrong approach is O(N²).
Step 4 — Examples:
[1,2,3,4,5]→ 3 (path 4→2→1→3 or 5→2→1→3).[1,2]→ 1.[1]→ 0.- Long left-skewed chain → length-1 = N-1.
- A “T-shape” where the long path goes through a non-root node.
Step 5 — Brute Force: For every node, compute height(left) + height(right), track max. Height itself is O(N). Total O(N²). State this then optimize.
Step 6 — Complexity (optimal): O(N) time (each node visited once), O(H) stack space.
Step 7 — Pattern Recognition: “Find a global maximum over all subtree-rooted paths” → two-channel post-order (return one quantity, accumulate another).
Step 8 — Optimize from O(N²) to O(N): the insight is that we can compute heights bottom-up in a SINGLE pass and simultaneously check the candidate-diameter at each node. Each node’s candidate-diameter is height(left) + height(right) in EDGES. Capture this side-effect into a closure variable.
Step 9 — Prove correctness: Every path between two nodes in a tree has a unique HIGHEST node (the LCA — lowest-common-ancestor of the endpoints, which is the topmost node on the path). At that node, the path consists of edges going down through the left subtree to one endpoint AND edges going down through the right subtree to the other endpoint. The number of such edges is exactly height(left) + height(right) (where height = max edges from node to any descendant leaf, with leaf height = 0). So if we check left_height + right_height at EVERY node and take the maximum, we examine every possible path’s candidate exactly once at its LCA. Correct.
5. Progressive Hints
If stuck for 7 min, open hints.md. This problem is famously where candidates need hints — don’t feel bad.
6. Deeper Insight — The Two-Channel Pattern
The mental shift. Beginners try: “make diameter(node) return the diameter of the subtree rooted at node.” This is HARD because the diameter of a subtree could pass through the node OR be entirely in the left subtree OR entirely in the right subtree — and you also need the height to combine with the parent. Two-channel solves this by separating concerns:
def height(node):
if node is None:
return -1 # edge convention: height of empty = -1; height of leaf = 0
lh = height(node.left)
rh = height(node.right)
# CHANNEL A (side effect): candidate diameter at THIS node = lh + rh + 2 edges
# (the +2 accounts for the edge from node to each subtree root)
nonlocal best
best = max(best, lh + rh + 2)
# CHANNEL B (return): height for parent's use
return 1 + max(lh, rh)
Or, the more common edge-convention: height of empty = 0, height of leaf = 1, and diameter-at-node = lh + rh (the +2 is absorbed):
def height(node):
if node is None: return 0
lh = height(node.left); rh = height(node.right)
nonlocal best
best = max(best, lh + rh) # diameter through this node, in edges
return 1 + max(lh, rh)
This second form is cleaner and what we’ll use. The intuition: height here counts nodes from this node down to deepest leaf (so leaf → 1). Then lh + rh measures edges because each side contributes its node-count, and edges-between-them is one less than the total node count along the path, BUT we don’t include the LCA itself, hence lh + rh directly equals edge count.
Wait, let me re-derive carefully because this is exactly where candidates trip:
- Let
h(node) = node-count from node to deepest leaf in subtree, inclusive(so leaf is 1, empty is 0). - For a node with subtrees of node-heights
lhandrh:- Path through node:
(lh nodes on left path) + (rh nodes on right path) + 1 for the node itself=lh + rh + 1NODES. - But we want EDGES, which is
nodes - 1=lh + rh + 1 - 1=lh + rhedges. ✓
- Path through node:
So best = max(best, lh + rh) is correct for edge-count diameter.
Why nonlocal best not return-the-tuple? You can return a tuple (height, best_so_far), but each parent must MERGE the two children’s bests — best = max(left_best, right_best, this_diameter). Works, more verbose. The nonlocal (or list-of-one) trick collapses the merge into “just always write to the shared variable.” Either is fine; demonstrate awareness of both.
Why this generalizes to LC 124 (Max Path Sum, Hard): SAME structure. Return type: max-gain from this node going DOWN (always ≥ 0, since we can choose not to include a negative subtree). Side effect: best path-sum at this node = node.val + max(0, left_gain) + max(0, right_gain) (we allow both sides because the path bends here). Critical: the RETURN must NOT include both sides (because the parent can only use a single-direction descent through this node; the bent path terminates here). The asymmetry between “what to return” and “what to accumulate” IS the trick.
Why LC 687 (Longest Univalue Path): SAME structure. Return: longest single-direction same-value chain from this node down. Side effect: longest both-sides through this node. Gate each side’s contribution on node.left.val == node.val.
Why LC 1372 (Longest ZigZag): SAME structure but return TWO heights per call: (zig_left, zig_right) — longest zigzag going first-left vs first-right. Update best with max of these. The output is max(best - 1) adjusted.
Two-channel is the single most reusable tree technique. Internalize it now.
7. Anti-Pattern Analysis
Wrong-but-tempting #1 — Computing height at each node (O(N²)):
def diameter(node):
if node is None: return 0
here = height(node.left) + height(node.right)
return max(here, diameter(node.left), diameter(node.right))
Each height call is O(N); called at N nodes = O(N²). Works on small inputs, TLE on large. The two-channel pattern is the fix.
Wrong-but-tempting #2 — Returning diameter from the recursion:
def diameter(node):
if node is None: return 0
return max(diameter(node.left), diameter(node.right), 1 + ??? )
You can’t combine — the parent doesn’t know the children’s heights anymore. Either (a) return BOTH (height, best) as tuple, or (b) capture best as side effect. Plain return doesn’t work.
Wrong-but-tempting #3 — Confusing nodes vs edges:
LC counts EDGES. Diameter of [1, 2] is 1 (one edge), not 2 (two nodes). Many candidates return node-count and get off-by-one. Always ask “edges or nodes?” upfront.
Wrong-but-tempting #4 — Updating best based on returned value instead of using children’s heights:
def height(node):
if node is None: return 0
h = 1 + max(height(node.left), height(node.right))
nonlocal best
best = max(best, h) # WRONG: this measures depth, not diameter
return h
This computes max depth (which we already did in p12), not diameter. The diameter side-effect MUST be lh + rh, not h.
Wrong-but-tempting #5 — Forgetting that path may not pass through root:
def diameterOfBinaryTree(root):
return height(root.left) + height(root.right)
Wrong: the longest path might be entirely within the left subtree. Must check at EVERY node, not just root. (This is also why we use the two-channel pattern.)
8. Skills & Takeaways
Generalizable patterns:
- Two-channel post-order — return one thing, accumulate another. THE most reusable tree trick.
- LCA-of-path observation — every path has a unique highest node; check candidates there.
nonlocalfor accumulation — closure-captured mutable state in recursion.
Analogous problems:
- LC 124 — Max Path Sum (Hard): return max-gain (≥0); accumulate
node.val + max(0,lg) + max(0,rg). - LC 687 — Longest Univalue Path: return single-direction same-value chain; accumulate both-sides.
- LC 1372 — Longest ZigZag Path: return two heights (left-first, right-first); accumulate both.
- LC 1373 — Max BST Sum: return (is_BST, min, max, sum); accumulate best sum if BST.
- LC 968 — Min Cameras to Cover Tree: return state (covered/needs-camera/has-camera); accumulate camera count.
When to NOT use two-channel: if you only need a single quantity (max depth, sum, count) — direct recursion suffices. Two-channel pays off when “what parent needs” differs from “what we’re solving for globally.”
9. When to Move On
- I can write the two-channel pattern from scratch in <5 min for diameter.
- I can articulate “function returns height; side-effect accumulates best” in one sentence.
-
I solved LC 124 (Max Path Sum) using the same template — and explained why
max(0, ...)clamps negative gains. - I solved LC 687 (Longest Univalue) using the same template.
-
My
stress_test()cross-checks against the O(N²) brute force. -
I default to
nonlocal best(or list-of-one) for the side-effect; can also write the tuple-return form on request. - I always ask “edges or nodes?” before coding.
10. Company Context
Meta (FLAGSHIP — expect this in onsite)
- The framing: “Diameter of binary tree” → immediately follows with LC 124 (Max Path Sum) as Hard extension.
- Adversarial extension: “Now return the actual two endpoint nodes, not just length.” (Carry endpoint nodes alongside height.) Or: “What if edges have weights?” (
best = lw + rwwherelw = left_height + edge_weight_to_left_child.) - What they want to hear: Recognize two-channel pattern by name; pivot to LC 124 in <5 min; handle weighted-edge extension fluently.
Amazon
- The framing: Often as “longest reporting chain in an org tree (not just to/from CEO).”
- What they want to hear: Same template; clear separation of “channel A” and “channel B.”
- Adversarial extension: “Implement on an N-ary tree.” (Instead of
lh + rh, find TOP TWO heights among all children; their sum is the candidate.) - What they want to hear: N-ary generalization in <3 min after recognizing top-two-instead-of-left-right.
Microsoft / Apple / Bloomberg
- Standard problem. Two-channel pattern; mention generalization to weighted edges or N-ary.
11. Interviewer’s Lens
| Phase | Strong | Weak | Scorecard (strong) |
|---|---|---|---|
| Reading | Asks “edges vs nodes? must pass through root?” | Assumes through root | “Disambiguates ambiguity” |
| Approach | States O(N²) brute, pivots to two-channel | Codes O(N²), submits | “Recognizes inefficiency” |
| Coding | Two-channel in <5 min with nonlocal | Tries to return diameter, struggles | “Reflex two-channel — STAFF signal” |
| Vocabulary | Names “two-channel pattern” / “post-order with side-effect” | Hand-waves | “Has the vocabulary” |
| LC 124 pivot | Solves in <5 min after diameter | Stuck on negative-value handling | “Sees the template — Staff” |
This problem is THE pivot point between Mid and Senior tree skills. Crisp two-channel = Senior. Naming the pattern + offering LC 124 unprompted = Staff.
12. Level Delta
| Level | Looks like |
|---|---|
| Mid | O(N²) brute first. Coaxed to two-channel. ~15-20 min total. |
| Senior | Recognizes O(N²) immediately, pivots to two-channel in <8 min total. Explains nonlocal cleanly. |
| Staff | Senior + names “two-channel” + offers LC 124 + N-ary generalization + weighted-edge unprompted. ~7 min total. |
| Principal | Staff + discusses production: critical-path analysis in build/CI systems, network-graph diameter for distributed-system latency analysis, generalization to graphs (BFS-from-arbitrary-then-BFS-from-farthest-found, a known graph-theory trick). |
13. Follow-up Questions
Follow-up 1: “What’s the time complexity of the obvious approach?”
“O(N²). At each of N nodes, compute height (O(N) each). LC TLE on large inputs.”
Follow-up 2: “Now max path sum (LC 124, Hard).”
“Same two-channel pattern. Return type: max gain from this node going DOWN one side, clamped to ≥ 0 (we can choose NOT to include a negative subtree, since path is allowed to end at this node). Side effect: best path-through-here = node.val + max(0, left_gain) + max(0, right_gain) — both sides allowed at this node because the path BENDS here and we won’t extend further upward through it. The RETURN must not include both sides (parent can only descend one way). O(N) time.”
Follow-up 3: “What if the tree is N-ary, not binary?”
“At each node with K children, candidate diameter = sum of the TOP TWO children’s heights. Use heapq.nlargest(2, [height(c) for c in children]) or a one-pass max1/max2 tracker. Return 1 + max(heights). O(N·K) for the heap or O(N) with one-pass.”
Follow-up 4: “What if edges have weights?”
“Replace ‘height = 1 + max(child_heights)’ with ‘height = max(child_height + edge_weight)’ for each child. Diameter = max(left_h + edge_to_left, right_h + edge_to_right) summed for both sides at each node. Same template, different arithmetic.”
Follow-up 5 (Principal signal): “Generalize to an arbitrary graph (not just tree).”
“For a tree (no cycles), the double-BFS trick is classic: (1) BFS from any node, find the farthest node A; (2) BFS from A, find the farthest node B; the distance A→B is the diameter. Provable for trees. For general graphs, diameter computation is much harder — O(N·M) BFS-from-every-node in unweighted, or O(N³) Floyd-Warshall in weighted; approximation algorithms exist for huge graphs (sampling-based). For binary trees specifically, the two-channel pattern is strictly better than double-BFS because we get O(N) with a single traversal.”
14. Full Solution Walkthrough
See solution.py. Five sections:
- TreeNode + helpers
diameter_brute— O(N²) reference: at every node, computeheight(left) + height(right), track maxdiameter_two_channel— O(N) optimal withnonlocal bestdiameter_two_channel_tuple— equivalent using returned-tuple(height, best), demonstrated for completenessstress_test— brute and both optimized variants must agree on 1000 random treesedge_cases— empty, single, two-node, three-node-T, deep-skewed, balanced
15. Beyond the Problem — Production Reality
Where two-channel shows up:
- Build-system critical-path analysis — find the longest dependency chain through a build DAG. Same template applied to DAG (with memoization).
- CI test-scheduling — find the longest serial-test chain to set the parallelism upper bound.
- Network topology analysis — diameter of a network is the worst-case hop count.
- Phylogenetic trees in computational biology — longest evolutionary path between species.
- Distributed-system latency analysis — diameter of the gossip network = worst-case message propagation time.
- Game-tree analysis — longest forced sequence.
Production caveats:
- Cycles in real graphs: if the tree assumption is broken (e.g., DAGs with shared subnodes), memoize. If true cycles exist (general graphs), use BFS-based algorithms.
- Memory: the recursion-stack size matters for very deep trees. Iterative two-channel is possible (post-order with stack of
(node, state)) but ugly; convert to iterative only if you’ve seen stack overflows. - Online updates: if the tree changes (insert/delete), diameter must be recomputed. There are advanced data structures (link-cut trees, top-trees) supporting O(log N) diameter updates — well beyond interview scope but worth naming.
Principal-engineer code review comment: “When I see two-channel recursion in code review I check three things: (1) is the side-effect variable correctly scoped (closure / nonlocal / instance attr)? (2) does the RETURN type match what callers need, distinct from the accumulated answer? (3) is the recursion stack bounded for our input? In our build-dependency code, dep chains can be 500-deep; we bumped Python’s recursion limit AND added a depth-guard to detect adversarial DAGs (>10k = bail with telemetry). The pattern is correct; the operational concerns are what separate working code from production code.”