p100 — Design In-Memory File System
Source: LeetCode 588 · Hard · Topics: Design, OOP, Trie, Path Parsing Companies (2024–2025): Google (high), Amazon (high), Meta (medium), Microsoft (high) Loop position: The canonical “design + OOP” interview — operating-system-flavored, tests both your ability to model hierarchical data and to write clean code at scale.
1. Quick Context
Implement an in-memory filesystem with four operations:
ls(path)→ if path is a directory, return its sorted child names; if a file, return[filename].mkdir(path)→ create the directory; intermediate directories must be auto-created.addContentToFile(path, content)→ create file if absent; append content.readContentFromFile(path)→ return file contents.
Paths are Unix-style: absolute, separated by /, no .. or ..
2. LeetCode Link + Attempt Gate
🔗 https://leetcode.com/problems/design-in-memory-file-system/
STOP. 35-min timer.
3. Prerequisite Concepts
- Trie / n-ary tree where each node is a directory or file.
- Path splitting and traversal.
- OOP: minimal abstraction, no over-engineering.
4. How to Approach (FRAMEWORK 1–9)
1. Restate: Hierarchical FS with ls/mkdir/append/read.
2. Clarify: Are file and directory names disjoint per parent? Yes. Is path “/” valid for ls? Yes — returns root’s children. Append semantics: concat to existing content.
3. Examples:
ls("/")→[]mkdir("/a/b/c")→ okaddContentToFile("/a/b/c/d", "hello")ls("/")→["a"]ls("/a/b/c")→["d"]readContentFromFile("/a/b/c/d")→"hello"
5. Brute: Use a flat dict[str, str] for files and dict[str, list[str]] for directories. Works but messy.
6. Target: All ops O(path-length + result-size).
7. Pattern: Trie of Node objects. Each node carries children: dict[name, Node] and content: str (empty string = directory).
8. Develop: see solution.py.
9. Test: root ls; mkdir of nested path; ls of file; append accumulates; mkdir intermediate.
5. Progressive Hints
See hints.md.
6. Deeper Insight
6.1 Node design — file or directory in one type
class Node:
children: dict[str, Node]
content: str # "" if directory, else file content
The empty-content-means-directory convention works because empty files aren’t a real LC test case. Cleaner alternative: separate Dir and File classes, or a single class with an is_file: bool flag. All three are acceptable; pick one and stay consistent.
6.2 Path parsing
parts = path.split("/")
# Drop the leading empty string from "/a/b/c" → ["", "a", "b", "c"]
# Keep only nonempty parts.
parts = [p for p in parts if p]
ls("/") produces parts == [], which means “operate on root”.
6.3 Traverse-or-create helper
def _walk(self, parts, create=False) -> Node:
node = self.root
for p in parts:
if p not in node.children:
if not create:
return None # or raise
node.children[p] = Node()
node = node.children[p]
return node
This pattern unifies all four ops:
ls: walk without create; inspect node.mkdir: walk WITH create.addContentToFile: walk WITH create, append.readContentFromFile: walk without create, return content.
6.4 Why this is “design Hard”
The algorithm is trivial. The Hard rating is API surface + correctness: 4 methods, several edge cases each, easy to write 100 lines of buggy code. Senior+ signal is showing you can write 30 clean lines with one _walk helper.
6.5 Family: hierarchical / trie design
- LC 588 In-Memory File System (this).
- LC 211 Add and Search Word (regex-trie).
- LC 1166 Design File System (path → value mapping).
- LC 642 Design Search Autocomplete System.
- Real: Linux VFS, ZFS, FUSE userspace filesystems.
7. Anti-Pattern Analysis
- Separate flat dicts for files and directories — must keep them in sync; error-prone.
isinstancechecks everywhere — symptom of poor data model; unify with one node type.- No
_walkhelper — open-coded traversal in each method — code duplication; bugs creep in. addContentToFileoverwrites instead of appends — read the spec carefully.lson file returns directory listing — must return[basename]when path is a file.- Forget
path.split("/")returns leading empty string — off-by-one on root.
8. Skills & Takeaways
- Trie-of-nodes for hierarchical data.
- Single
_walk(create)helper unifies all ops. - “File or directory in one node” pattern (or its alternatives).
9. When to Move On
- Implement in 25 min cold.
- Explain why one node type is cleaner than two.
- Articulate where a real FS goes beyond (inodes, permissions, journaling).
10. Company Context
- Google: L5; “design + code” is the bar.
- Amazon: L6 onsite.
- Microsoft: Windows team uses this exact problem.
Strong: “Trie of nodes; each node has children dict and content string (empty = dir). A single _walk(parts, create) helper drives all four ops. Path: split on / and drop empties. ls returns sorted children, or [basename] if the node is a file.”
11. Interviewer’s Lens
| Signal | Junior | Mid | Senior | Staff+ |
|---|---|---|---|---|
| Data model | Flat dicts | Mixed | Trie cold | Trie + helper |
| Code quality | Duplicated | Some dup | Clean helper | Single helper + tests |
| ls on file | Misses | Fixes | Cold | Cold |
| Real FS | None | None | Mentions | + inodes, journaling |
12. Level Delta
- Mid (L4): Works with some code duplication.
- Senior (L5): Clean trie +
_walkhelper. - Staff (L6): + extends to symlinks/permissions sketch + cites real FS (Linux VFS, FUSE).
- Principal (L7+): + discusses inode vs dentry separation (one inode can have multiple paths via hardlinks), journaling for crash recovery (ext4, XFS), CoW snapshots (ZFS, Btrfs), and FUSE for userspace FSes.
13. Follow-up Q&A
Q1: Add rm and mv.
A1: rm: walk to parent, delete from children. mv: walk to source parent, remove; walk to dest parent (create=True), insert. Atomic only if no concurrent writers.
Q2: Symbolic links.
A2: New node type Symlink(target_path). On traversal, dereference (with cycle detection, max depth like Linux’s 40).
Q3: Permissions.
A3: Each node gets mode, owner, group. Each op checks against current process identity.
Q4: Make it crash-safe (journaling). A4: Log every metadata change (creates, deletes, renames) to a write-ahead log; replay on startup. ext4 / NTFS use this.
Q5: Multi-threaded. A5: Per-node RWLock; or copy-on-write directory snapshots; or path-prefix lock hierarchy with strict ordering to avoid deadlock.
Q6: Hardlinks. A6: Split node into inode (data + refcount) and dentry (name pointing to inode). Multiple dentries can point to the same inode.
14. Solution Walkthrough
See solution.py:
Node— unified file/directory node.FileSystem— four ops via single_walkhelper.FileSystemBrute— flat-dicts oracle.- Stress test driving random sequences against both.
15. Beyond the Problem
Principal code review: “Design In-Memory File System is operating-systems-101 in 50 lines. The LeetCode version is toy; production filesystems add three orthogonal axes of complexity: (1) Persistence + crash safety: every metadata mutation goes through a journal (ext4’s jbd2, XFS’s metadata log, NTFS’s
$LogFile) so a power loss leaves the FS recoverable. (2) Concurrency: VFS uses per-inode locks, dcache RCU readers, andi_mutexfor atomic ops; getting this right is what makes a production FS hard. (3) Storage layer: extents/blocks/B-trees (ext4 uses HTree dir index, XFS uses B+ trees, ZFS uses Merkle trees for end-to-end checksumming). The Staff+ recognition is inode/dentry separation — the data is decoupled from the name. This is what enables hardlinks (multiple names, one data), bind mounts (the same data appears at multiple paths), and elegant unlink-while-open semantics (you canrman open file in Linux; the inode persists until the last fd closes). The LeetCode ‘children + content’ model collapses this; a senior interviewer might push you toward the separation as a follow-up. The broader lesson: filesystems are the canonical example of a hierarchical naming service, and the same patterns recur in URI routing, DNS, LDAP, package namespaces (Python imports, npm scopes), and config trees (etcd, Consul KV). The trie-with-helper is the universal data structure for hierarchical naming.”