p100 — Design In-Memory File System

Source: LeetCode 588 · Hard · Topics: Design, OOP, Trie, Path Parsing Companies (2024–2025): Google (high), Amazon (high), Meta (medium), Microsoft (high) Loop position: The canonical “design + OOP” interview — operating-system-flavored, tests both your ability to model hierarchical data and to write clean code at scale.

1. Quick Context

Implement an in-memory filesystem with four operations:

  • ls(path) → if path is a directory, return its sorted child names; if a file, return [filename].
  • mkdir(path) → create the directory; intermediate directories must be auto-created.
  • addContentToFile(path, content) → create file if absent; append content.
  • readContentFromFile(path) → return file contents.

Paths are Unix-style: absolute, separated by /, no .. or ..

🔗 https://leetcode.com/problems/design-in-memory-file-system/

STOP. 35-min timer.

3. Prerequisite Concepts

  • Trie / n-ary tree where each node is a directory or file.
  • Path splitting and traversal.
  • OOP: minimal abstraction, no over-engineering.

4. How to Approach (FRAMEWORK 1–9)

1. Restate: Hierarchical FS with ls/mkdir/append/read.

2. Clarify: Are file and directory names disjoint per parent? Yes. Is path “/” valid for ls? Yes — returns root’s children. Append semantics: concat to existing content.

3. Examples:

  • ls("/")[]
  • mkdir("/a/b/c") → ok
  • addContentToFile("/a/b/c/d", "hello")
  • ls("/")["a"]
  • ls("/a/b/c")["d"]
  • readContentFromFile("/a/b/c/d")"hello"

5. Brute: Use a flat dict[str, str] for files and dict[str, list[str]] for directories. Works but messy.

6. Target: All ops O(path-length + result-size).

7. Pattern: Trie of Node objects. Each node carries children: dict[name, Node] and content: str (empty string = directory).

8. Develop: see solution.py.

9. Test: root ls; mkdir of nested path; ls of file; append accumulates; mkdir intermediate.

5. Progressive Hints

See hints.md.

6. Deeper Insight

6.1 Node design — file or directory in one type

class Node:
    children: dict[str, Node]
    content: str  # "" if directory, else file content

The empty-content-means-directory convention works because empty files aren’t a real LC test case. Cleaner alternative: separate Dir and File classes, or a single class with an is_file: bool flag. All three are acceptable; pick one and stay consistent.

6.2 Path parsing

parts = path.split("/")
# Drop the leading empty string from "/a/b/c" → ["", "a", "b", "c"]
# Keep only nonempty parts.
parts = [p for p in parts if p]

ls("/") produces parts == [], which means “operate on root”.

6.3 Traverse-or-create helper

def _walk(self, parts, create=False) -> Node:
    node = self.root
    for p in parts:
        if p not in node.children:
            if not create:
                return None  # or raise
            node.children[p] = Node()
        node = node.children[p]
    return node

This pattern unifies all four ops:

  • ls: walk without create; inspect node.
  • mkdir: walk WITH create.
  • addContentToFile: walk WITH create, append.
  • readContentFromFile: walk without create, return content.

6.4 Why this is “design Hard”

The algorithm is trivial. The Hard rating is API surface + correctness: 4 methods, several edge cases each, easy to write 100 lines of buggy code. Senior+ signal is showing you can write 30 clean lines with one _walk helper.

6.5 Family: hierarchical / trie design

  • LC 588 In-Memory File System (this).
  • LC 211 Add and Search Word (regex-trie).
  • LC 1166 Design File System (path → value mapping).
  • LC 642 Design Search Autocomplete System.
  • Real: Linux VFS, ZFS, FUSE userspace filesystems.

7. Anti-Pattern Analysis

  1. Separate flat dicts for files and directories — must keep them in sync; error-prone.
  2. isinstance checks everywhere — symptom of poor data model; unify with one node type.
  3. No _walk helper — open-coded traversal in each method — code duplication; bugs creep in.
  4. addContentToFile overwrites instead of appends — read the spec carefully.
  5. ls on file returns directory listing — must return [basename] when path is a file.
  6. Forget path.split("/") returns leading empty string — off-by-one on root.

8. Skills & Takeaways

  • Trie-of-nodes for hierarchical data.
  • Single _walk(create) helper unifies all ops.
  • “File or directory in one node” pattern (or its alternatives).

9. When to Move On

  • Implement in 25 min cold.
  • Explain why one node type is cleaner than two.
  • Articulate where a real FS goes beyond (inodes, permissions, journaling).

10. Company Context

  • Google: L5; “design + code” is the bar.
  • Amazon: L6 onsite.
  • Microsoft: Windows team uses this exact problem.

Strong: “Trie of nodes; each node has children dict and content string (empty = dir). A single _walk(parts, create) helper drives all four ops. Path: split on / and drop empties. ls returns sorted children, or [basename] if the node is a file.”

11. Interviewer’s Lens

SignalJuniorMidSeniorStaff+
Data modelFlat dictsMixedTrie coldTrie + helper
Code qualityDuplicatedSome dupClean helperSingle helper + tests
ls on fileMissesFixesColdCold
Real FSNoneNoneMentions+ inodes, journaling

12. Level Delta

  • Mid (L4): Works with some code duplication.
  • Senior (L5): Clean trie + _walk helper.
  • Staff (L6): + extends to symlinks/permissions sketch + cites real FS (Linux VFS, FUSE).
  • Principal (L7+): + discusses inode vs dentry separation (one inode can have multiple paths via hardlinks), journaling for crash recovery (ext4, XFS), CoW snapshots (ZFS, Btrfs), and FUSE for userspace FSes.

13. Follow-up Q&A

Q1: Add rm and mv. A1: rm: walk to parent, delete from children. mv: walk to source parent, remove; walk to dest parent (create=True), insert. Atomic only if no concurrent writers.

Q2: Symbolic links. A2: New node type Symlink(target_path). On traversal, dereference (with cycle detection, max depth like Linux’s 40).

Q3: Permissions. A3: Each node gets mode, owner, group. Each op checks against current process identity.

Q4: Make it crash-safe (journaling). A4: Log every metadata change (creates, deletes, renames) to a write-ahead log; replay on startup. ext4 / NTFS use this.

Q5: Multi-threaded. A5: Per-node RWLock; or copy-on-write directory snapshots; or path-prefix lock hierarchy with strict ordering to avoid deadlock.

Q6: Hardlinks. A6: Split node into inode (data + refcount) and dentry (name pointing to inode). Multiple dentries can point to the same inode.

14. Solution Walkthrough

See solution.py:

  • Node — unified file/directory node.
  • FileSystem — four ops via single _walk helper.
  • FileSystemBrute — flat-dicts oracle.
  • Stress test driving random sequences against both.

15. Beyond the Problem

Principal code review: “Design In-Memory File System is operating-systems-101 in 50 lines. The LeetCode version is toy; production filesystems add three orthogonal axes of complexity: (1) Persistence + crash safety: every metadata mutation goes through a journal (ext4’s jbd2, XFS’s metadata log, NTFS’s $LogFile) so a power loss leaves the FS recoverable. (2) Concurrency: VFS uses per-inode locks, dcache RCU readers, and i_mutex for atomic ops; getting this right is what makes a production FS hard. (3) Storage layer: extents/blocks/B-trees (ext4 uses HTree dir index, XFS uses B+ trees, ZFS uses Merkle trees for end-to-end checksumming). The Staff+ recognition is inode/dentry separation — the data is decoupled from the name. This is what enables hardlinks (multiple names, one data), bind mounts (the same data appears at multiple paths), and elegant unlink-while-open semantics (you can rm an open file in Linux; the inode persists until the last fd closes). The LeetCode ‘children + content’ model collapses this; a senior interviewer might push you toward the separation as a follow-up. The broader lesson: filesystems are the canonical example of a hierarchical naming service, and the same patterns recur in URI routing, DNS, LDAP, package namespaces (Python imports, npm scopes), and config trees (etcd, Consul KV). The trie-with-helper is the universal data structure for hierarchical naming.”