Lab 02 — Constraints Extraction

Goal

Train the discipline of converting written constraints into algorithmic targets. Given 1 ≤ N ≤ 10^5, you should immediately think “O(N log N) is the budget, O(N^2) will TLE”.

Background Concepts

  • The constraint-to-complexity mapping (FRAMEWORK.md)
  • Operations-per-second budget on competitive judges (~10^8 simple ops/sec safe)
  • Implicit constraints (memory, integer overflow, recursion depth)

Interview Context

In live interviews, the interviewer often omits explicit constraints, expecting you to ask. Constraints discipline the algorithm choice. Candidates who skip this step often write an O(N^2) algorithm that the interviewer was hoping they’d avoid.

Problem Statement

For each of the 10 problem prompts below, derive:

  1. The complexity budget
  2. At least one algorithm family that fits
  3. At least one approach that does not fit and why

Prompts:

  1. 1 ≤ N ≤ 20 — count subsets satisfying property X
  2. 1 ≤ N ≤ 200 — shortest path in weighted graph with up to N^2 edges
  3. 1 ≤ N ≤ 10^4, queries ≤ 10^5 — range sum
  4. 1 ≤ N ≤ 10^5 — find duplicate
  5. 1 ≤ N ≤ 5 × 10^5 — kth smallest in array
  6. 1 ≤ N ≤ 10^6, integers up to 10^9 — count of pairs with sum ≤ K
  7. N ≤ 10^9, queries Q ≤ 10^5 — count of integers in [1,N] divisible by K
  8. T test cases, T ≤ 10^4, each with N ≤ 10^3 — pairwise XOR maximum
  9. 1 ≤ a, b ≤ 10^18 — compute a^b mod p
  10. Stream of up to 10^7 elements — top-K running

Constraints

The point of this lab is constraints. Treat each prompt as separate.

Clarifying Questions

For each prompt, list:

  • What’s the operation budget?
  • Is the time limit explicit (e.g., 1 sec, 2 sec)?
  • Is there a memory limit (e.g., 256 MB)?
  • Are values within int32 / int64 range?

Examples

Worked example for prompt #4 (N ≤ 10^5, find duplicate):

  • Budget: ~10^7–10^8 ops → O(N), O(N log N), O(N · log_max) all fit
  • Fits: hashset O(N), sort+scan O(N log N), Floyd’s cycle if input ∈ [1,N]
  • Doesn’t fit: O(N^2) brute force (10^10 ops)

Initial Brute Force

Not applicable — meta-lab.

Brute Force Complexity

N/A.

Optimization Path

The optimization path here is constraint-to-algorithm mapping. Memorize the table from FRAMEWORK.md.

Final Expected Approach

For each prompt, write your final answer as:

Prompt #K:
  Budget: O(...)
  Fitting algorithm family: ...
  Disqualified approach: ... because budget / memory / overflow / etc.

Data Structures Used

The point is to map N range → DS choice:

  • N ≤ 20: bitmask, recursion (no DS needed)
  • N ≤ 200: 2D arrays (Floyd, etc.)
  • N ≤ 10^4: O(N^2) DP arrays, simple sort
  • N ≤ 10^5: hashmap, heap, sorted set, segment tree, Fenwick
  • N ≤ 10^6: array + linear scan, two pointers, hash, no log factors
  • N ≤ 10^9: math, binary search on answer, sieve segment

Correctness Argument

The argument here is budget: justify why your chosen algorithm fits within ~10^8 ops/sec * time-limit. Be explicit: N · log N = 10^5 · 17 ≈ 1.7 × 10^6 — comfortably fits 1-second limit.

Complexity

For each prompt, you produce both:

  • The budget
  • The justification per the table

Implementation Requirements

Written deliverable. No code.

Tests

Not applicable for this lab.

Follow-up Questions

For prompt #6 (10^6 elements, pairs with sum ≤ K):

  • “What if values can be negative?” → may need different sort + two-pointer logic
  • “What if we want to enumerate the pairs, not just count?” → output limit changes everything
  • “What if the array streams in?” → online algorithm needed

For prompt #9 (a, b ≤ 10^18):

  • “What if p is not prime?” → can’t use Fermat’s little theorem inverse
  • “What if we need a^b exactly (no mod)?” → impossible for these sizes

Product Extension

In production, “constraint” often means “expected QPS × max payload size × peak time”. A request handler that’s O(payload_size^2) is fine for size 10 but catastrophic for size 10^4. Same intuition as competitive judges, just different vocabulary.

Language/Runtime Follow-ups

  • Python: constant factor ~30–100× slower than C++. Halve your effective budget. N=10^5 with O(N^2) is risky in Python.
  • Java: ~2–4× slower than C++. Beware autoboxing in hot loops.
  • Go: ~2× slower than C++. Map operations have higher constants than unordered_map.
  • C++: baseline. Use ios_base::sync_with_stdio(false) for fast I/O.
  • JavaScript: ~3–10× slower than C++. Avoid object lookups in hot loops; prefer typed arrays.

Common Bugs

  • Forgetting Q queries multiplies your budget (10^5 queries × 10^5 per-query = 10^10!)
  • Forgetting T test cases (e.g., T = 10^4 with O(N^2) per test, N = 10^3 → 10^10)
  • Underestimating constants in Python/JS
  • Forgetting recursion depth limits (Python default 1000)
  • Forgetting integer overflow at int32 boundary (~2 × 10^9)

Debugging Strategy

When code TLEs:

  1. Recompute total operations: outer loops × inner work × test cases × queries
  2. Check the constant factor (string concat in a loop is a classic disaster)
  3. Profile to find the actual hotspot (often I/O, not algorithm)

Deliverable

For each of the 10 prompts above, write the structured answer block. Compare yours to the table at FRAMEWORK.md.

Mastery Criteria

  • Correctly mapped 10/10 prompts to budget within 30 seconds each
  • Identified at least one disqualifying approach for each (per query / total ops)
  • Recognized the multiplier effect of T test cases / Q queries
  • Identified at least 2 prompts where Python/JS would need extra care