Lab 09 — Graph Modeling (Bus Routes)
Goal
Practice the modeling skill that separates competent graph candidates from L5+ candidates: given a problem with no obvious graph, invent the right node and edge definition. After this lab you should be able to enumerate at least two valid graph models for a given problem, choose the one with the smallest state space, and justify the choice in <90 seconds.
Background Concepts
The hardest interview graph questions don’t say “graph.” Examples:
- Bus Routes (LC 815): “fewest buses to take” — model: nodes = bus routes (not stops!); edges = “two routes share a stop”; multi-source BFS from all routes containing the source stop.
- Word Ladder (LC 127): nodes = words; edges = one-character difference. (See Lab 01.)
- Open the Lock (LC 752): nodes = 4-digit states; edges = single-digit ±1; BFS.
- Sliding Puzzle (LC 773): nodes = board states (encode as string); edges = legal swap; BFS.
The modeling decision space is:
- What is a node? — Often the natural object (stop, word, board state) is wrong; a higher-order or quotient object yields a smaller state space.
- What is an edge? — Direct adjacency (one move), or “shared resource” (two routes share a stop).
- Is it weighted? — If yes → Dijkstra. If unweighted → BFS.
- Sources/targets? — Single, multiple, or “any-of-set” (multi-source BFS).
- Implicit vs explicit? — Build the adjacency upfront or compute on the fly.
The Bus Routes trap: candidates model nodes = stops, edges = “stops on the same route.” Then BFS distance is steps within a route, not number of routes taken. The whole question collapses to nonsense. Modeling nodes = routes makes the BFS distance number of routes, which is the answer ± 1.
Interview Context
Bus Routes (LC 815) is asked at Google, Meta, and Amazon at L5+. The question itself is mid-level once modeled correctly; the difficulty is the modeling. Interviewers probe modeling explicitly: “tell me how you’d represent this as a graph” — this is the test. If you flounder for 5 minutes, you’re done. The senior signal: state both modelings (stops vs routes), explain why the route-model gives the right BFS distance, then code.
Problem Statement
You are given an array routes where routes[i] is a bus route that the ith bus repeats forever. For example, routes[0] = [1, 5, 7] means the 0th bus travels in the sequence 1 → 5 → 7 → 1 → 5 → 7 → … forever. You start at source and want to reach target. You can travel between stops by buses only. Return the fewest number of buses required, or -1 if impossible.
Constraints
- 1 ≤ |routes| ≤ 500
- 1 ≤ |routes[i]| ≤ 10^5
- Σ |routes[i]| ≤ 10^5
- 0 ≤ routes[i][j] < 10^6
- 0 ≤ source, target < 10^6
Clarifying Questions
- Do
source == targetcases return 0? (Yes — no bus needed.) - Is
sourceguaranteed to be on some route? (Not necessarily — return -1 if not.) - Are routes circular as stated? (Yes — but that doesn’t matter for “buses taken”; what matters is which stops a route covers.)
- Are stop numbers unique within a route? (Per LC, yes — but treat defensively.)
- Can two routes share a stop? (Yes — that’s exactly how transfers happen.)
Examples
routes = [[1,2,7],[3,6,7]], source = 1, target = 6
→ 2 (take bus 0 from 1 to 7, then bus 1 from 7 to 6)
routes = [[7,12],[4,5,15],[6],[15,19],[9,12,13]], source = 15, target = 12
→ -1
Initial Brute Force
BFS over stops as nodes and edges between any two stops sharing a route. Build adjacency: for each route, add all (stop, stop’) pairs as edges. At Σ |routes| = 10^5, with a 10^5-stop route, that’s 5 × 10^9 edges — TLE / OOM.
Brute Force Complexity
O((Σ L)²) edges in the worst case. Infeasible at Σ L = 10^5.
Optimization Path
Switch the modeling: nodes = routes, edges = “two routes share a stop.” Build a stop → list of routes containing it index in O(Σ L). For each pair of routes sharing a stop, that’s an edge — but we never enumerate all such pairs explicitly. Instead, in BFS, when we visit route r, we expand to every other route sharing any of r’s stops, looked up via the index.
To avoid revisiting, mark routes (not stops) as visited. Also mark stops as visited (after expanding all routes through that stop) to avoid the O(L²) blowup of re-expanding a popular stop.
Final BFS distance = number of routes used. Answer is the BFS layer at which we find any route containing target.
Final Expected Approach
- If
source == target, return 0. - Build
stop_to_routes: dict of stop → set of route indices. - BFS over routes. Initialize queue with all routes containing
source, distance = 1. - For each popped
(route, d), scan all stops inroutes[route]. If any istarget, return d. - Mark each stop visited (skip if already). For each unvisited stop, enqueue all unvisited routes containing it, distance d + 1.
- If queue empties without finding
target, return -1.
Data Structures Used
defaultdict(set)forstop_to_routes.collections.dequefor BFS queue.setfor visited routes and visited stops.
Correctness Argument
Distance interpretation: Initializing the queue with routes containing source at distance 1 means: distance d = “number of routes taken so far, including the current one.” When a route at distance d covers target, the answer is d.
No double-counting: Marking stops visited prevents the O(L²) blowup; marking routes visited prevents re-expansion. Both are needed: a stop is visited once we’ve added all its routes; a route is visited once we’ve expanded its stops.
BFS optimality: Standard BFS argument on the route-graph — first time a route is popped, its distance is minimum. Therefore the first time a route containing target is popped, the answer is its distance.
Complexity
| Operation | Time | Space |
|---|---|---|
| Index build | O(Σ L) | O(Σ L) |
| BFS | O(Σ L + R²) where R = routes count | O(Σ L) |
| Total | O(Σ L + R²) | O(Σ L) |
(R² because in the worst case every pair of routes shares some stop and we add an edge.)
Implementation Requirements
from collections import defaultdict, deque
def numBusesToDestination(routes, source, target):
if source == target:
return 0
stop_to_routes = defaultdict(set)
for i, r in enumerate(routes):
for s in r:
stop_to_routes[s].add(i)
if source not in stop_to_routes:
return -1
visited_routes = set()
visited_stops = {source}
queue = deque()
for r in stop_to_routes[source]:
queue.append((r, 1))
visited_routes.add(r)
while queue:
route, d = queue.popleft()
for stop in routes[route]:
if stop == target:
return d
if stop in visited_stops:
continue
visited_stops.add(stop)
for nr in stop_to_routes[stop]:
if nr not in visited_routes:
visited_routes.add(nr)
queue.append((nr, d + 1))
return -1
Tests
- Standard:
[[1,2,7],[3,6,7]], src=1, tgt=6 → 2. - src == tgt: → 0.
- src not on any route: → -1.
- Single route covering both: → 1.
- Disconnected routes: → -1.
- Long route (10^5 stops, 1 route): src and tgt on it → 1; tgt not on it → -1.
- Many routes sharing one hub: BFS expansion through hub.
Follow-up Questions
- “What if buses have different costs?” → Dijkstra over routes with edge weight = cost.
- “What’s the actual sequence of buses taken?” → Maintain parent pointers in BFS; reconstruct.
- “What if you can walk between adjacent stops?” → Add walking edges; might or might not change the model.
- “Multi-source: any of K starting stops to any of M targets.” → Multi-source BFS on the route-graph.
- “Online: routes added/removed live.” → Recompute on each query, or use dynamic-graph techniques.
Product Extension
Real transit-routing systems (Google Maps, Citymapper, Apple Maps) model transit as a multi-modal graph: nodes are stops, walking edges connect nearby stops, transit edges represent “ride a route between two of its stops.” The route-as-node model used here is a simplification useful for “fewest transfers” queries. Production systems combine with time-dependent shortest-path (CSA, RAPTOR) for actual journey planning.
Language/Runtime Follow-ups
- Python:
defaultdict(set)anddeque. The inner loop scansroutes[route]once per route popped — bound this by visiting stops only once. - Java:
HashMap<Integer, Set<Integer>>for the index;ArrayDeque<int[]>for(route, distance). - Go:
map[int]map[int]boolormap[int][]int. Slice-as-queue. - C++:
unordered_map<int, vector<int>>.queue<pair<int,int>>. Reserve capacity if Σ L is known. - JS/TS:
Map<number, Set<number>>and array-as-queue (use a deque polyfill if N large).
Common Bugs
- Modeling stops as nodes — leads to O((Σ L)²) edge count and TLE.
- Returning d instead of d - 1 (or vice versa) — make sure the distance semantics match: d = number of buses taken.
- Not marking the source stop as visited — re-expanding routes through it.
- Not marking routes as visited — exponential queue blowup.
- Returning -1 when source == target instead of 0.
- Forgetting that
source not in stop_to_routesis a -1 case. - Stack overflow in BFS (no — BFS is iterative; this is a DFS-only problem).
Debugging Strategy
For a 3-route example, print the stop_to_routes index. Walk through BFS by hand: queue contents, visited sets after each pop. Verify d increases exactly once per layer of routes. If TLE, profile to confirm visited-stop marking is preventing repeated route expansion. If wrong answer, check whether you’re returning d or d - 1.
Mastery Criteria
- Recognized “fewest buses” as a graph problem in <30 seconds.
- Enumerated both stop-as-node and route-as-node models in <90 seconds.
- Articulated why route-as-node yields the correct distance unprompted.
- Wrote BFS with both visited-routes and visited-stops sets in <12 minutes.
- Stated O(Σ L + R²) complexity unprompted.
- Solved LC 815 in <30 minutes from cold start.
- Solved LC 752 (Open the Lock) in <20 minutes by encoding state as a string.
- Solved LC 773 (Sliding Puzzle) in <30 minutes by encoding the 2D board as a string state.
- When given a new “no obvious graph” problem, can produce a correct model in <3 minutes.