p76 — Dungeon Game

Source: LeetCode 174 · Hard · Topics: DP, Matrix Companies (2024–2025): Microsoft (high), Amazon (medium-high), Google (medium), Meta (medium) Loop position: The reverse-DP problem. Tests whether candidate recognizes that the natural forward DP (start → goal) is wrong here, because the answer depends on FUTURE costs that aren’t known at the time you’d write a forward state. Must fill from bottom-right to top-left.

1. Quick Context

Knight starts top-left, must reach bottom-right (princess). Each cell adds/subtracts health. Knight dies if health ≤ 0 at any point. Find minimum starting health so knight survives the entire path (knight chooses optimal path).

Strategy: Reverse DP. Let dp[i][j] = min health knight needs WHEN ENTERING cell (i,j) to survive from (i,j) to goal. Then dp[i][j] = max(1, min(dp[i+1][j], dp[i][j+1]) - dungeon[i][j]). Answer: dp[0][0].

O(m·n) time, O(m·n) space (reducible to O(n)).

🔗 https://leetcode.com/problems/dungeon-game/

STOP. 40-min timer. The first wrong approach (forward DP) will burn 20 min before the candidate realizes it doesn’t work.

3. Prerequisite Concepts

  • Grid DP fundamentals (Unique Paths, Min Path Sum).
  • Why some DPs require reverse fill direction.
  • The “min health on entry” formulation, not “max health accumulated.”

4. How to Approach (FRAMEWORK 1–9)

1. Restate: Min starting health so knight survives all cells on some path from (0,0) to (m-1,n-1).

2. Clarify: Knight may only move RIGHT or DOWN. Health must be ≥ 1 at all times including after entering the princess cell.

3. Examples: [[-2,-3,3],[-5,-10,1],[10,30,-5]] → 7. Path: right→right→down→down with starting health 7 keeps you alive.

5. Brute force: DFS over all paths trying each starting health (or each path’s min health requirement). Exponential.

6. Target: O(m·n).

7. Pattern: Reverse DP on grid.

8. Develop: see solution.py.

9. Test: all positive cells (need 1); all negative; 1×1; path with the “valley then peak” trap (health drops below 1 even though path total is positive).

5. Progressive Hints

See hints.md.

6. Deeper Insight

6.1 Why forward DP fails

Try dp[i][j] = “min starting health to reach (i,j) alive.” Recurrence would need min(dp[i-1][j], dp[i][j-1]). Problem: the OPTIMAL path to (i,j) might require less starting health, but force more health LATER. You can’t decide locally without knowing the future. This is the classic “future affects past” pitfall.

Concrete counterexample: at cell (i,j), path A arrives with 1 health and path B arrives with 100 health; later cells require 50 health buffer. Path A appears “optimal” locally (less starting health) but path B is actually feasible. Forward DP picks A and produces wrong answer.

6.2 Why reverse DP works

dp[i][j] = “min health needed at the MOMENT of entering (i,j) to survive from here to goal.” Now the future is encapsulated in dp[i+1][j] and dp[i][j+1] — we know exactly what minimum health each successor demands. We pick the cheaper successor and back-derive:

dp[i][j] = max(1, min(dp[i+1][j], dp[i][j+1]) - dungeon[i][j])

The max(1, ...) enforces the “health must stay ≥ 1” constraint: even if entering this cell gives huge bonus, you still need 1 HP to be alive entering.

6.3 Base case

dp[m-1][n-1] = max(1, 1 - dungeon[m-1][n-1]). The princess cell still must be entered with ≥ 1 HP. If dungeon[m-1][n-1] = -10, you need 11 HP entering.

Pad boundary with +inf (a sentinel for “no successor here”) to unify the recurrence — dp[m][j] = dp[i][n] = +inf for j < n / i < m respectively.

6.4 The max(1, ...) is the entire subtlety

Without it, a cell with dungeon[i][j] = +100 might give dp[i][j] = -50 (i.e., “you can enter with negative health, gain 100, and have positive health to survive the rest”). But you’d already be DEAD before entering. The max(1, ...) says: you need at least 1 HP entering, regardless of any gain you’d get.

6.5 Space optimization

Each row depends only on the row below + the right neighbor in the current row. Reduce to a 1D array of length n+1, iterating bottom-up and right-to-left.

6.6 Why this is “DP archetype 4: state defined relative to goal, not origin”

Most DPs define dp[i] as “best outcome ending at i” (origin-relative). Reverse DPs define dp[i] as “best outcome STARTING from i” (goal-relative). When the cost at i depends on what happens AFTER i (not before), you must reverse. Other examples: jump games where you need future reachability, suffix-array DPs, “min cost from here to goal” with constraints.

6.7 Connection to Bellman optimality

This IS Bellman’s equation in disguise: V*(s) = max_a R(s,a) + γV*(s'). The “max(1, …)” is a state constraint (health ≥ 1) imposed on the value function. The reverse fill is value iteration computed in topological order (the grid has no cycles, so one sweep suffices).

7. Anti-Pattern Analysis

  1. Forward DP from (0,0) — burns 20 min before failing on the “future affects past” trap.
  2. Track max-health-along-path instead of min-health-needed — wrong formulation; doesn’t account for required buffer at later cells.
  3. Forget max(1, ...) — silently wrong on cells with large positive bonus.
  4. Binary search on starting health + path feasibility check — works but O(mn · log(range)); worse than direct DP. Mention only as alternative.
  5. Try Dijkstra — graph isn’t well-defined here; the “cost” depends on path history.
  6. Iterate top-left to bottom-right with reverse formula — wrong order; you’d read uninitialized cells.

8. Skills & Takeaways

  • Reverse DP pattern recognition.
  • “Future affects past” diagnosis.
  • Goal-relative state definition.
  • max(1, ...) constraint enforcement in DP values.

9. When to Move On

  • Solve cold in 30 min with O(mn) reverse DP.
  • Articulate why forward DP fails with a concrete counterexample.
  • Optimize to O(n) space.
  • Recognize the reverse-DP archetype in other problems (e.g., Frog Jump variants, Min Falling Path with constraints).

10. Company Context

  • Microsoft: L65/L66; “reverse DP” probe. Common in Bing-team interviews.
  • Amazon: L6; tests pattern recognition + Bellman-equation framing.
  • Google: L5; appears in optimization rotations.
  • Meta: E5; rare but discriminating.

Scorecard for strong: “Diagnosed forward DP as wrong with a concrete counterexample, defined dp[i][j] as ‘min health entering this cell,’ wrote O(mn) reverse DP cold, enforced max(1, …) correctly, optimized to O(n) space, cited connection to Bellman.”

11. Interviewer’s Lens

SignalJuniorMidSeniorStaff+
DiagnosisForward DP, no recognitionForward DP fails → reverse after hintRecognizes reverse cold+ diagnoses via “future affects past” lens
RecurrenceConfusedCorrect after debuggingCold with max(1, ...)+ explains why constraint is enforced as max
OptimizationNone2D onlyO(n) with right-to-left scan+ Bellman framing + value iteration analogy
GeneralizationNoneNoneOther reverse-DP examples+ RL value-iteration parallel

12. Level Delta

  • Mid (L4): Tries forward DP, fails, switches to reverse after one hint. Eventually correct.
  • Senior (L5): Cold reverse DP + correct max(1, ...) + concrete forward-DP counterexample + O(n) space.
  • Staff (L6): + Bellman optimality framing + value iteration analogy + generalizes “future-affects-past” diagnostic.
  • Principal (L7+): + RL value iteration parallel + connection to dynamic-programming-on-MDPs literature + production game-AI applications.

13. Follow-up Q&A

Q1: Knight can move in 4 directions (incl. up/left). A1: Now there are cycles in the state graph. Must use value iteration (repeated sweeps until convergence) or Dijkstra-style with proper cost function. O(mn log mn) with Dijkstra on transformed graph.

Q2: What if some cells are blocked (impassable)? A2: Set dp[i][j] = +inf for blocked cells; recurrence handles propagation naturally.

Q3: Return the actual optimal path. A3: After filling dp, trace from (0,0) choosing the direction (down or right) with smaller dp value; tie-break arbitrarily.

Q4: Knight has bounded health (≤ H). Feasibility? A4: Check dp[0][0] ≤ H. If we want to also minimize NUMBER OF HEALTH POTIONS used, problem becomes multi-objective.

Q5: Connect to RL/MDP framework. A5: Each cell is a state, moves are actions, dungeon[i][j] is the reward, terminal at (m-1, n-1). The reverse DP is value iteration on this deterministic MDP. Adding stochastic transitions makes it a real MDP requiring iterative value iteration (not one sweep).

14. Solution Walkthrough

See solution.py:

  • calculate_minimum_hp_dp — iterative O(mn) reverse DP.
  • calculate_minimum_hp_1d — O(n) space.
  • calculate_minimum_hp_memo — top-down with @lru_cache.
  • calculate_minimum_hp_brute — enumerate all paths, compute min health each, take min over paths (oracle).

15. Beyond the Problem

Principal code review: “Dungeon Game is the cleanest illustration of value iteration in a deterministic MDP — every grid DP textbook should open with it. The ‘future affects past’ diagnostic generalizes: any time a local choice’s quality depends on what happens after it, you must define the state goal-relative, not origin-relative. This is exactly why Bellman’s equation V*(s) = max_a R(s,a) + γV*(s') is written with V on both sides — the future is encoded in V*(s’), not in a path history. Production game AI (chess engines, Go bots) all run variants of this: define value goal-relative, iterate from terminal states backward. The max(1, ...) here is a state constraint, the analog of which appears in constrained MDPs and safe reinforcement learning. When you write this in 20 lines, you’re writing the kernel of an entire field.”