p100 — Design In-Memory File System

Source: LeetCode 588 · Hard · Topics: Design, OOP, Trie, Path Parsing Companies (2024–2025): Google (high), Amazon (high), Meta (medium), Microsoft (high) Loop position: The canonical “design + OOP” interview — operating-system-flavored, tests both your ability to model hierarchical data and to write clean code at scale.

1. Quick Context

Implement an in-memory filesystem with four operations:

ls(path) → if path is a directory, return its sorted child names; if a file, return [filename].
mkdir(path) → create the directory; intermediate directories must be auto-created.
addContentToFile(path, content) → create file if absent; append content.
readContentFromFile(path) → return file contents.

Paths are Unix-style: absolute, separated by /, no .. or ..

2. LeetCode Link + Attempt Gate

🔗 https://leetcode.com/problems/design-in-memory-file-system/

STOP. 35-min timer.

3. Prerequisite Concepts

Trie / n-ary tree where each node is a directory or file.
Path splitting and traversal.
OOP: minimal abstraction, no over-engineering.

4. How to Approach (FRAMEWORK 1–9)

1. Restate: Hierarchical FS with ls/mkdir/append/read.

2. Clarify: Are file and directory names disjoint per parent? Yes. Is path “/” valid for ls? Yes — returns root’s children. Append semantics: concat to existing content.

3. Examples:

ls("/") → []
mkdir("/a/b/c") → ok
addContentToFile("/a/b/c/d", "hello")
ls("/") → ["a"]
ls("/a/b/c") → ["d"]
readContentFromFile("/a/b/c/d") → "hello"

5. Brute: Use a flat dict[str, str] for files and dict[str, list[str]] for directories. Works but messy.

6. Target: All ops O(path-length + result-size).

7. Pattern: Trie of Node objects. Each node carries children: dict[name, Node] and content: str (empty string = directory).

8. Develop: see solution.py.

9. Test: root ls; mkdir of nested path; ls of file; append accumulates; mkdir intermediate.

6. Deeper Insight

6.1 Node design — file or directory in one type

class Node:
    children: dict[str, Node]
    content: str  # "" if directory, else file content

The empty-content-means-directory convention works because empty files aren’t a real LC test case. Cleaner alternative: separate Dir and File classes, or a single class with an is_file: bool flag. All three are acceptable; pick one and stay consistent.

6.2 Path parsing

parts = path.split("/")
# Drop the leading empty string from "/a/b/c" → ["", "a", "b", "c"]
# Keep only nonempty parts.
parts = [p for p in parts if p]

ls("/") produces parts == [], which means “operate on root”.

6.3 Traverse-or-create helper

def _walk(self, parts, create=False) -> Node:
    node = self.root
    for p in parts:
        if p not in node.children:
            if not create:
                return None  # or raise
            node.children[p] = Node()
        node = node.children[p]
    return node

This pattern unifies all four ops:

ls: walk without create; inspect node.
mkdir: walk WITH create.
addContentToFile: walk WITH create, append.
readContentFromFile: walk without create, return content.

6.4 Why this is “design Hard”

The algorithm is trivial. The Hard rating is API surface + correctness: 4 methods, several edge cases each, easy to write 100 lines of buggy code. Senior+ signal is showing you can write 30 clean lines with one _walk helper.

6.5 Family: hierarchical / trie design

LC 588 In-Memory File System (this).
LC 211 Add and Search Word (regex-trie).
LC 1166 Design File System (path → value mapping).
LC 642 Design Search Autocomplete System.
Real: Linux VFS, ZFS, FUSE userspace filesystems.

7. Anti-Pattern Analysis

Separate flat dicts for files and directories — must keep them in sync; error-prone.
isinstance checks everywhere — symptom of poor data model; unify with one node type.
No _walk helper — open-coded traversal in each method — code duplication; bugs creep in.
addContentToFile overwrites instead of appends — read the spec carefully.
ls on file returns directory listing — must return [basename] when path is a file.
Forget path.split("/") returns leading empty string — off-by-one on root.

8. Skills & Takeaways

Trie-of-nodes for hierarchical data.
Single _walk(create) helper unifies all ops.
“File or directory in one node” pattern (or its alternatives).

9. When to Move On

Implement in 25 min cold.
Explain why one node type is cleaner than two.
Articulate where a real FS goes beyond (inodes, permissions, journaling).

10. Company Context

Google: L5; “design + code” is the bar.
Amazon: L6 onsite.
Microsoft: Windows team uses this exact problem.

Strong: “Trie of nodes; each node has children dict and content string (empty = dir). A single _walk(parts, create) helper drives all four ops. Path: split on / and drop empties. ls returns sorted children, or [basename] if the node is a file.”

11. Interviewer’s Lens

Signal	Junior	Mid	Senior	Staff+
Data model	Flat dicts	Mixed	Trie cold	Trie + helper
Code quality	Duplicated	Some dup	Clean helper	Single helper + tests
ls on file	Misses	Fixes	Cold	Cold
Real FS	None	None	Mentions	+ inodes, journaling

12. Level Delta

Mid (L4): Works with some code duplication.
Senior (L5): Clean trie + _walk helper.
Staff (L6): + extends to symlinks/permissions sketch + cites real FS (Linux VFS, FUSE).
Principal (L7+): + discusses inode vs dentry separation (one inode can have multiple paths via hardlinks), journaling for crash recovery (ext4, XFS), CoW snapshots (ZFS, Btrfs), and FUSE for userspace FSes.

13. Follow-up Q&A

Q1: Add rm and mv. A1: rm: walk to parent, delete from children. mv: walk to source parent, remove; walk to dest parent (create=True), insert. Atomic only if no concurrent writers.

Q2: Symbolic links. A2: New node type Symlink(target_path). On traversal, dereference (with cycle detection, max depth like Linux’s 40).

Q3: Permissions. A3: Each node gets mode, owner, group. Each op checks against current process identity.

Q4: Make it crash-safe (journaling). A4: Log every metadata change (creates, deletes, renames) to a write-ahead log; replay on startup. ext4 / NTFS use this.

Q5: Multi-threaded. A5: Per-node RWLock; or copy-on-write directory snapshots; or path-prefix lock hierarchy with strict ordering to avoid deadlock.

Q6: Hardlinks. A6: Split node into inode (data + refcount) and dentry (name pointing to inode). Multiple dentries can point to the same inode.

14. Solution Walkthrough

See solution.py:

Node — unified file/directory node.
FileSystem — four ops via single _walk helper.
FileSystemBrute — flat-dicts oracle.
Stress test driving random sequences against both.

15. Beyond the Problem

Principal code review: “Design In-Memory File System is operating-systems-101 in 50 lines. The LeetCode version is toy; production filesystems add three orthogonal axes of complexity: (1) Persistence + crash safety: every metadata mutation goes through a journal (ext4’s jbd2, XFS’s metadata log, NTFS’s $LogFile) so a power loss leaves the FS recoverable. (2) Concurrency: VFS uses per-inode locks, dcache RCU readers, and i_mutex for atomic ops; getting this right is what makes a production FS hard. (3) Storage layer: extents/blocks/B-trees (ext4 uses HTree dir index, XFS uses B+ trees, ZFS uses Merkle trees for end-to-end checksumming). The Staff+ recognition is inode/dentry separation — the data is decoupled from the name. This is what enables hardlinks (multiple names, one data), bind mounts (the same data appears at multiple paths), and elegant unlink-while-open semantics (you can rm an open file in Linux; the inode persists until the last fd closes). The LeetCode ‘children + content’ model collapses this; a senior interviewer might push you toward the separation as a follow-up. The broader lesson: filesystems are the canonical example of a hierarchical naming service, and the same patterns recur in URI routing, DNS, LDAP, package namespaces (Python imports, npm scopes), and config trees (etcd, Consul KV). The trie-with-helper is the universal data structure for hierarchical naming.”

LeetCode Interview — Extreme Coding