Lab 19 — In-Memory Filesystem
Goal
Implement an in-memory filesystem that supports ls, mkdir, addContentToFile, readContentFromFile over a tree of inode-like directory and file nodes. After this lab you should be able to design and implement the filesystem from a blank screen in <30 minutes and answer the standard follow-ups (concurrency, persistence, paths/permissions) crisply.
Background Concepts
A filesystem is a tree where internal nodes are directories and leaves are files. Every Unix-style filesystem reduces to the same three operations: navigate (walk a path to a node), mutate (create / delete / write), and read (list / cat). The interview problem strips out permissions, hard links, symbolic links, journaling, and on-disk layout, leaving the core tree structure + path-walking — which is enough to test naming, encapsulation, and correctness.
The two design choices that matter:
- One node class or two? A
Directoryhas adict[name, Node]of children; aFilehas a content string. They share path-walking but diverge in storage. Use either an inheritance hierarchy (DirectoryandFileextendNode) or a singleNodewith akinddiscriminator. Inheritance is cleaner for this problem. - Path-walk encapsulation. Every operation needs
_walk(path)that returns the target node (or raises). Centralize it; do not duplicate the path-split logic in each method.
This problem appears as LC 588 — “Design In-Memory File System” — and is asked at Amazon, Google, and Bloomberg as an OOD warmup.
Interview Context
A 30-to-45-minute round at the senior level. The interviewer wants to see (a) clean class decomposition, (b) correct path-handling (absolute paths, edge cases like / and trailing slash), (c) sensible API, (d) tests. Common failure: stuffing everything into one class with one dict[str, str] keyed on full paths — works for small inputs, fails the “ls a directory” requirement, and looks like LeetCode glue rather than production code.
Problem Statement
Implement FileSystem with:
ls(path) -> list[str]: ifpathis a file, return[filename]; if a directory, return sorted list of children.mkdir(path) -> None: create the directory and any missing intermediate directories (mkdir -p semantics).addContentToFile(path, content) -> None: create the file if missing; appendcontentto it. Intermediate directories are created.readContentFromFile(path) -> str: return the file’s content. Raise if not a file.
All paths are absolute, start with /, components separated by /. The root is /.
Constraints
- Path components are non-empty alphanumeric (no
.,.., no slashes-in-names). - Directories and files in the same directory must have distinct names.
addContentToFileon an existing directory should raise.ls("/")returns the children of root.
Clarifying Questions
- Is
addContentToFileappend or overwrite? (LC 588 is append; confirm.) - Are path components case-sensitive? (Yes, like Unix.)
- What’s the error mode for missing files in
readContentFromFile? (Raise — caller should not silently get an empty string.) - Can
mkdir("/a")succeed if/aalready exists as a directory? (Yes — idempotent. As a file? No — raise.) - Does
ls("/a/b")where/a/bdoesn’t exist raise or return[]? (Raise.) - Concurrency? (Single-threaded by default; thread-safety as a follow-up.)
Examples
fs = FileSystem()
fs.ls("/") # []
fs.mkdir("/a/b/c")
fs.addContentToFile("/a/b/c/d.txt", "hello")
fs.ls("/") # ["a"]
fs.ls("/a/b/c") # ["d.txt"]
fs.readContentFromFile("/a/b/c/d.txt") # "hello"
fs.addContentToFile("/a/b/c/d.txt", " world")
fs.readContentFromFile("/a/b/c/d.txt") # "hello world"
Initial Brute Force
class FlatFS:
def __init__(self): self.store = {} # full path -> content (None for dirs)
def mkdir(self, p): self.store[p] = None
def addContentToFile(self, p, c): self.store[p] = self.store.get(p, "") or "" ; self.store[p] += c
def readContentFromFile(self, p): return self.store[p]
def ls(self, p):
if self.store.get(p) is not None: return [p.rsplit("/", 1)[1]]
prefix = p.rstrip("/") + "/"
return sorted({k[len(prefix):].split("/")[0] for k in self.store if k.startswith(prefix)})
This works on small inputs but is O(N) per ls (scans every key) and conflates the “file vs directory” type via a None-or-string trick. It also can’t represent an empty directory unambiguously.
Brute Force Complexity
ls: O(N · L) where N is total entries and L is average path length. addContentToFile: O(L) for hashing. Memory: O(total path text + content).
Optimization Path
Replace the flat dict with a tree of nodes. Each Directory has dict[str, Node] children — ls is now O(K log K) where K is the number of children of the queried directory, not the whole filesystem. mkdir and addContentToFile walk the path once, creating intermediate directories on demand. The walk is O(path_depth).
Final Expected Approach
Node is the base class. Directory(Node) has children: dict[str, Node]. File(Node) has content: list[str] (a list of chunks; append by chunks.append(c); read by "".join(chunks)). The list-of-chunks representation makes append O(1) regardless of total size. _walk(path, create_dirs=False) returns the terminal node, raising or creating intermediates as configured. Each public method is a thin wrapper around _walk.
Data Structures Used
dict[str, Node]for directory children — O(1) lookup, sorted on demand forls.list[str]for file content chunks — O(1) append.- A path-split helper that handles
/correctly.
Correctness Argument
Every node has exactly one parent (the dict entry that points to it). The root is the only node with no parent. _walk is the single source of truth for path resolution; it raises if it encounters a missing intermediate (create_dirs=False) or auto-creates one (create_dirs=True). addContentToFile resolves the parent directory, then either fetches the existing File (and asserts it’s a file, not a directory) or creates a new File. The “name collision” case (a directory exists where a file is being created) is detected at this exact step.
Complexity
| Operation | Time | Space |
|---|---|---|
mkdir | O(L) where L = path components | O(L) new nodes |
addContentToFile | O(L) for walk + O(1) append | O(content) |
readContentFromFile | O(L + content_size) for join | O(content) |
ls (directory) | O(K log K) for sort | O(K) |
ls (file) | O(L) | O(1) |
Implementation Requirements
class Node:
pass
class File(Node):
__slots__ = ("chunks",)
def __init__(self):
self.chunks: list[str] = []
def read(self) -> str:
return "".join(self.chunks)
def append(self, content: str):
if content:
self.chunks.append(content)
class Directory(Node):
__slots__ = ("children",)
def __init__(self):
self.children: dict[str, Node] = {}
def _split(path: str) -> list[str]:
if not path or path[0] != "/":
raise ValueError(f"path must be absolute: {path!r}")
return [p for p in path.split("/") if p]
class FileSystem:
def __init__(self):
self._root = Directory()
def _walk(self, parts: list[str], *, create_dirs: bool = False) -> Node:
node: Node = self._root
for i, p in enumerate(parts):
if not isinstance(node, Directory):
raise NotADirectoryError("/".join(parts[:i]) or "/")
child = node.children.get(p)
if child is None:
if not create_dirs:
raise FileNotFoundError("/" + "/".join(parts[: i + 1]))
child = Directory()
node.children[p] = child
node = child
return node
def ls(self, path: str) -> list[str]:
parts = _split(path)
node = self._walk(parts)
if isinstance(node, File):
return [parts[-1]]
return sorted(node.children.keys())
def mkdir(self, path: str) -> None:
parts = _split(path)
if not parts:
return # mkdir '/' is a no-op
# Walk-with-create, but reject a final-segment that exists as a file
parent = self._walk(parts[:-1], create_dirs=True)
if not isinstance(parent, Directory):
raise NotADirectoryError("/".join(parts[:-1]))
last = parts[-1]
existing = parent.children.get(last)
if existing is None:
parent.children[last] = Directory()
elif isinstance(existing, File):
raise FileExistsError(path + " is a file")
# else: existing directory; idempotent
def addContentToFile(self, path: str, content: str) -> None:
parts = _split(path)
if not parts:
raise IsADirectoryError("/")
parent = self._walk(parts[:-1], create_dirs=True)
if not isinstance(parent, Directory):
raise NotADirectoryError("/".join(parts[:-1]))
last = parts[-1]
node = parent.children.get(last)
if node is None:
node = File()
parent.children[last] = node
elif isinstance(node, Directory):
raise IsADirectoryError(path)
node.append(content)
def readContentFromFile(self, path: str) -> str:
parts = _split(path)
node = self._walk(parts)
if not isinstance(node, File):
raise IsADirectoryError(path)
return node.read()
Tests
def test_empty_root():
fs = FileSystem()
assert fs.ls("/") == []
def test_mkdir_p():
fs = FileSystem()
fs.mkdir("/a/b/c")
assert fs.ls("/") == ["a"]
assert fs.ls("/a") == ["b"]
assert fs.ls("/a/b") == ["c"]
assert fs.ls("/a/b/c") == []
def test_add_and_read_file():
fs = FileSystem()
fs.addContentToFile("/x/y/z.txt", "hello")
fs.addContentToFile("/x/y/z.txt", " world")
assert fs.readContentFromFile("/x/y/z.txt") == "hello world"
assert fs.ls("/x/y") == ["z.txt"]
assert fs.ls("/x/y/z.txt") == ["z.txt"]
def test_mkdir_idempotent():
fs = FileSystem()
fs.mkdir("/a")
fs.mkdir("/a")
assert fs.ls("/") == ["a"]
def test_mkdir_over_file_fails():
fs = FileSystem()
fs.addContentToFile("/a", "x")
try: fs.mkdir("/a")
except FileExistsError: pass
else: assert False
def test_read_nonexistent():
fs = FileSystem()
try: fs.readContentFromFile("/nope")
except FileNotFoundError: pass
else: assert False
def test_ls_sorts():
fs = FileSystem()
for n in ["zeta", "alpha", "mu"]: fs.mkdir(f"/{n}")
assert fs.ls("/") == ["alpha", "mu", "zeta"]
def test_root_path():
fs = FileSystem()
fs.addContentToFile("/a.txt", "x")
assert fs.ls("/a.txt") == ["a.txt"]
Follow-up Questions
- How would you make it thread-safe? Two options. (a) Single coarse
RLockon the whole FileSystem — every operation acquires it. Simple, fine for low write rates. (b) Per-directory lock; acquire locks along the path during a walk. Avoids serializing readers from disjoint subtrees, but care is needed to acquire in path-order to avoid deadlock. For an interview answer, name both and pick (a) unless write contention is the explicit follow-up. - How would you persist state across restarts? Two layers. (i) Snapshot: serialize the tree (DFS, emit
(path, kind, content_or_empty)); on boot, replay. (ii) Write-ahead log: append every mutation as a record (mkdir /a,add /a/b "hello"); periodic checkpoint. Tradeoff: pure snapshot loses recent writes; pure log replays slowly; combine for production. - What configuration knobs would you expose?
max_filesize,max_path_depth,max_filename_length. Don’t expose the lock granularity — implementation detail. Reject paths exceeding the caps with a typed error. - How would you handle a poison-pill input? A path with millions of components, or a single file with multi-gigabyte content. Cap path depth, cap filename length, cap per-file content size, and surface metric counters for rejected requests.
- How would you test it? Unit tests on each method’s contract. Property-based tests: random sequence of
mkdir/addContentoperations followed by anls/readthat asserts consistency with a simple oracle (e.g., a flat dict). Concurrency tests: many threads each writing to disjoint subtrees should produce identical state regardless of interleaving. - What metrics would you emit? Operation counters (per method), latency histograms,
total_files,total_directories,bytes_storedgauges, error counters by type.
Product Extension
Variants in real systems: S3-style flat namespace with / as a virtual delimiter; in-memory FUSE filesystems for tests; Kubernetes ConfigMap/Secret mounting (a tiny in-memory FS exposed to a pod). The data structure is the same; the API surface and persistence vary.
Language/Runtime Follow-ups
- Python: as above. Use
__slots__forFileandDirectoryto cut per-node memory. - Java:
Map<String, Node>(HashMap or TreeMap). For sortedls, TreeMap is natural and avoids the per-call sort. - Go:
type Node interface { ... }withDirectoryandFilestructs. For sortedls, sort.Strings on the keys. - C++:
std::variant<Directory, File>or a tagged union.std::map<std::string, std::unique_ptr<Node>>for ordered children. - JS/TS:
Map<string, Node>(insertion-ordered; sort onls). Use a discriminated union forNode.
Common Bugs
- Using
path.split("/")without filtering empty strings —["", "a", "b"]for"/a/b". - Treating
/differently from non-/paths inconsistently; a single_splithelper avoids this. - Not detecting directory-vs-file at the final path component —
addContentToFile("/a")where/ais a directory must raise. mkdiroverwriting an existing file silently.- Storing file content as a single growing string —
s += contentis O(N) per append. Use a list of chunks. - Returning unsorted
ls— the spec usually requires sorted output for determinism.
Debugging Strategy
When ls is wrong: print the children dict at the resolved node — almost always a wrong-path-walk bug. When append seems to overwrite: check that addContentToFile calls node.append, not node.chunks = [content]. When concurrency tests fail: log every operation in order with a thread ID; the bug is usually a missing lock around parent.children[last] = ....
Mastery Criteria
-
Decomposed
Node/File/Directorycleanly in <5 minutes. -
Wrote a single
_walkhelper used by every public method. -
Handled
mkdiridempotency and the file-vs-directory collision case correctly. - Used the list-of-chunks pattern for O(1) append.
-
Wrote tests for every error mode (
FileNotFoundError,IsADirectoryError,FileExistsError). - Articulated the snapshot+WAL persistence strategy in <60 seconds.
- Implemented from a blank screen in <30 minutes.