Mock 10 — Infrastructure / Backend

Interview type: Backend / Platform / Infrastructure deep-dive coding Target role: Senior / Staff backend, distributed systems, database, storage Time limit: 75 minutes Format: Build a non-trivial backend component (KV store, log-structured index, sharded cache) Hints policy: Acceptable on the algorithm; failures on storage/concurrency fundamentals are red flags. Primary goal: Demonstrate that you can build the building blocks of real backend systems, not just consume them.


What This Mock Tests

Companies like Stripe, Snowflake, Databricks, Confluent, Cockroach Labs, and infra teams at FAANG ask coding questions that resemble small slices of their actual products. You’re expected to:

  • Understand storage primitives (logs, indexes, B-trees, LSM)
  • Reason about durability, ordering, concurrency
  • Write code that could be a starting point for a production component
  • Discuss the gap between what you built and what production would need

Scoring weights: Production Awareness (#14), Code Quality (#7), Correctness (#5), Tradeoff Reasoning (#13) are all critical.


Pick One Build

Build A — In-Memory KV Store with Snapshot Isolation

kv = KVStore()
tx = kv.begin()           # returns a transaction
tx.put("k", "v")
tx.get("k")               # returns "v" (read your writes)
tx2 = kv.begin()
tx2.get("k")              # returns None (snapshot isolation; tx1 not committed)
tx.commit()
tx2.get("k")              # still None (tx2's snapshot was taken before commit)

Required:

  • MVCC (multi-version concurrency control)
  • Snapshot reads return a consistent view
  • Concurrent writers
  • Tests for read-your-writes, isolation between tx, serialization-style conflict detection (optional)

Build B — Log-Structured Index (Mini-LSM)

db = LSMTree(memtable_threshold=1000)
db.put("k1", "v1")
db.put("k2", "v2")
db.get("k1")             # "v1"
db.delete("k1")
db.get("k1")             # None (tombstone)
# After threshold writes, memtable flushes to immutable SSTable
db.range("k1", "k9")     # iterator over keys

Required:

  • In-memory memtable (sorted, e.g., SortedList or skiplist)
  • Flush to immutable SSTable when memtable exceeds threshold
  • get checks memtable, then SSTables in reverse-time order
  • Tombstones for deletes
  • Range scan that merges across all levels
  • Tests covering: writes, reads-after-flush, range correctness, deletes

Build C — Consistent Hash Ring with Replication

ring = HashRing(replication_factor=3, virtual_nodes=128)
ring.add_node("node-A")
ring.add_node("node-B")
ring.add_node("node-C")
nodes = ring.get("user-123")    # 3 nodes responsible
ring.remove_node("node-A")       # 1/3 of keys reassign
nodes_after = ring.get("user-123")

Required:

  • Virtual nodes for load balancing
  • Replication factor enforced
  • Adding/removing a node moves only its share of keys
  • A test that verifies < 5% of keys move when a node is added (with sufficient virtual nodes)

Expected Communication Style

  1. Restate with assumptions stated upfront.
  2. Decompose before coding: which modules, what interfaces, what’s the data flow.
  3. Discuss the storage model:
    • For A: how to represent versions per key
    • For B: how to lay out SSTables; in this exercise, in-memory simulated
    • For C: ring representation; virtual nodes; lookup data structure
  4. Identify the concurrency model and discuss what fails without it.
  5. Code the core end-to-end before tackling optimization.
  6. Discuss production gaps: persistence (we’re in-memory), replication consistency (we’re local), recovery (no WAL), monitoring (none).
  7. Test the invariants, not just the happy path.

Solution Sketches

A. KV with MVCC:

class KVStore:
    def __init__(self):
        self._data = {}            # key → list of (version, value)
        self._next_version = itertools.count()
        self._lock = threading.Lock()
    
    def begin(self):
        with self._lock:
            v = next(self._next_version)
        return Transaction(self, v)

class Transaction:
    def __init__(self, store, version):
        self._store = store
        self._snapshot_version = version
        self._writes = {}     # local buffer
        self._committed = False
    
    def get(self, key):
        if key in self._writes: return self._writes[key]
        versions = self._store._data.get(key, [])
        for v, val in reversed(versions):
            if v <= self._snapshot_version: return val
        return None
    
    def put(self, key, value): self._writes[key] = value
    
    def commit(self):
        with self._store._lock:
            commit_v = next(self._store._next_version)
            for k, v in self._writes.items():
                self._store._data.setdefault(k, []).append((commit_v, v))
            self._committed = True

B. LSM Tree: SortedList for memtable; on flush, freeze into immutable sorted list (the “SSTable”). get walks memtable + SSTables newest-first. Range scan: heap-merge iterators over all levels. Tombstones represented as special sentinel.

C. Consistent Hash: sorted list of (hash(virtual_node_id), node) pairs. Lookup: hash key, binary search for next pair, walk forward to collect R distinct nodes. Add/remove: insert/delete the virtual node entries.


Common Failure Modes

  1. A: returned all versions instead of the snapshot-visible one. Snapshot isolation not implemented; just MVCC storage.
  2. A: forgot to use a local write buffer. Transactions are visible to others before commit.
  3. B: re-sorting on every read. Should sort on flush; reads are merge.
  4. B: no tombstone semantics. Deleted key still appears in older SSTable.
  5. C: hash ring without virtual nodes. Load imbalance — one node gets 60% of keys.
  6. C: re-hashing entire keyspace on node change. Defeats the purpose of consistent hashing.
  7. All: no concurrency testing. Build passes single-thread; explodes under load.

Passing Bar

  • Total score: 56/70 (average 4.0)
  • Working core implementation
  • Concurrency correct (or explicit single-threaded contract with rationale)
  • Production gaps discussed substantively
  • At least 4 tests covering invariants (not just smoke tests)
  • Code quality: production-quality

Follow-up Questions

For A:

  • Add serializable isolation. → Validation phase: on commit, abort if any key read had a newer version. (Optimistic concurrency.)
  • Garbage-collect old versions. → Track oldest active snapshot; vacuum versions older than that.
  • Persist to disk. → Write-ahead log per transaction; redo on recovery.
  • Distributed version: two-phase commit + Paxos for log replication.

For B:

  • Add bloom filters per SSTable. → Skip SSTable scan if key definitely not present.
  • Compaction strategy: leveled vs size-tiered. → Tradeoffs in write amplification.
  • Crash recovery. → WAL replay before opening memtable.
  • Range scan optimization with min/max key per SSTable.

For C:

  • Heterogeneous nodes (different capacity). → Virtual nodes proportional to capacity.
  • Read consistency across replicas. → Quorum reads (R + W > N).
  • Hinted handoff when a replica is down. → Buffer writes for offline replicas, flush on return.
  • Anti-entropy / Merkle trees. → Detect divergence between replicas.

Required Tests

  • Happy path correctness
  • Boundary case (empty store, single key, max capacity)
  • Concurrency: ≥ 8 threads, hammer the API for several seconds, verify invariants hold
  • Recovery semantics (if implemented)
  • For A: snapshot isolation — test that opens tx1, writes via tx2, commits tx2, reads via tx1 → must return the pre-commit value
  • For B: write enough to trigger flush; read works across memtable + SSTable
  • For C: load distribution test (after adding N nodes, max-min keys per node ratio ≤ 1.3)

Required Production Discussion

  • Persistence strategy and crash recovery
  • Replication model and consistency tradeoffs
  • Monitoring: latency p50/p99, throughput, queue depths, GC pauses
  • Failure modes: node loss, network partition, slow disk
  • Backpressure: what happens when writes outpace flush

Self-Evaluation Template

Mock 10 — Infrastructure / Backend
Date: _______
Build: _______
Time: ___ / 75 min

Scores (1–5):
___ Total /70

Core working? Y/N
Concurrency tests pass under stress? Y/N
Production gaps discussed unprompted? Y/N (list them: ___)

What I left out (and what it would take):

Action item:

What to Do If You Fail

  • Storage primitives unclear: Read “Designing Data-Intensive Applications” (Chapters 3 and 5).
  • Concurrency issues: Phase 9 (language/runtime) concurrency sections.
  • Code quality: A senior code review of your build.
  • Pass twice consecutively before Mock 11.