Lab 19 — In-Memory Filesystem

Goal

Implement an in-memory filesystem that supports ls, mkdir, addContentToFile, readContentFromFile over a tree of inode-like directory and file nodes. After this lab you should be able to design and implement the filesystem from a blank screen in <30 minutes and answer the standard follow-ups (concurrency, persistence, paths/permissions) crisply.

Background Concepts

A filesystem is a tree where internal nodes are directories and leaves are files. Every Unix-style filesystem reduces to the same three operations: navigate (walk a path to a node), mutate (create / delete / write), and read (list / cat). The interview problem strips out permissions, hard links, symbolic links, journaling, and on-disk layout, leaving the core tree structure + path-walking — which is enough to test naming, encapsulation, and correctness.

The two design choices that matter:

  1. One node class or two? A Directory has a dict[name, Node] of children; a File has a content string. They share path-walking but diverge in storage. Use either an inheritance hierarchy (Directory and File extend Node) or a single Node with a kind discriminator. Inheritance is cleaner for this problem.
  2. Path-walk encapsulation. Every operation needs _walk(path) that returns the target node (or raises). Centralize it; do not duplicate the path-split logic in each method.

This problem appears as LC 588 — “Design In-Memory File System” — and is asked at Amazon, Google, and Bloomberg as an OOD warmup.

Interview Context

A 30-to-45-minute round at the senior level. The interviewer wants to see (a) clean class decomposition, (b) correct path-handling (absolute paths, edge cases like / and trailing slash), (c) sensible API, (d) tests. Common failure: stuffing everything into one class with one dict[str, str] keyed on full paths — works for small inputs, fails the “ls a directory” requirement, and looks like LeetCode glue rather than production code.

Problem Statement

Implement FileSystem with:

  • ls(path) -> list[str]: if path is a file, return [filename]; if a directory, return sorted list of children.
  • mkdir(path) -> None: create the directory and any missing intermediate directories (mkdir -p semantics).
  • addContentToFile(path, content) -> None: create the file if missing; append content to it. Intermediate directories are created.
  • readContentFromFile(path) -> str: return the file’s content. Raise if not a file.

All paths are absolute, start with /, components separated by /. The root is /.

Constraints

  • Path components are non-empty alphanumeric (no ., .., no slashes-in-names).
  • Directories and files in the same directory must have distinct names.
  • addContentToFile on an existing directory should raise.
  • ls("/") returns the children of root.

Clarifying Questions

  1. Is addContentToFile append or overwrite? (LC 588 is append; confirm.)
  2. Are path components case-sensitive? (Yes, like Unix.)
  3. What’s the error mode for missing files in readContentFromFile? (Raise — caller should not silently get an empty string.)
  4. Can mkdir("/a") succeed if /a already exists as a directory? (Yes — idempotent. As a file? No — raise.)
  5. Does ls("/a/b") where /a/b doesn’t exist raise or return []? (Raise.)
  6. Concurrency? (Single-threaded by default; thread-safety as a follow-up.)

Examples

fs = FileSystem()
fs.ls("/")                              # []
fs.mkdir("/a/b/c")
fs.addContentToFile("/a/b/c/d.txt", "hello")
fs.ls("/")                              # ["a"]
fs.ls("/a/b/c")                         # ["d.txt"]
fs.readContentFromFile("/a/b/c/d.txt")  # "hello"
fs.addContentToFile("/a/b/c/d.txt", " world")
fs.readContentFromFile("/a/b/c/d.txt")  # "hello world"

Initial Brute Force

class FlatFS:
    def __init__(self): self.store = {}  # full path -> content (None for dirs)
    def mkdir(self, p): self.store[p] = None
    def addContentToFile(self, p, c): self.store[p] = self.store.get(p, "") or "" ; self.store[p] += c
    def readContentFromFile(self, p): return self.store[p]
    def ls(self, p):
        if self.store.get(p) is not None: return [p.rsplit("/", 1)[1]]
        prefix = p.rstrip("/") + "/"
        return sorted({k[len(prefix):].split("/")[0] for k in self.store if k.startswith(prefix)})

This works on small inputs but is O(N) per ls (scans every key) and conflates the “file vs directory” type via a None-or-string trick. It also can’t represent an empty directory unambiguously.

Brute Force Complexity

ls: O(N · L) where N is total entries and L is average path length. addContentToFile: O(L) for hashing. Memory: O(total path text + content).

Optimization Path

Replace the flat dict with a tree of nodes. Each Directory has dict[str, Node] children — ls is now O(K log K) where K is the number of children of the queried directory, not the whole filesystem. mkdir and addContentToFile walk the path once, creating intermediate directories on demand. The walk is O(path_depth).

Final Expected Approach

Node is the base class. Directory(Node) has children: dict[str, Node]. File(Node) has content: list[str] (a list of chunks; append by chunks.append(c); read by "".join(chunks)). The list-of-chunks representation makes append O(1) regardless of total size. _walk(path, create_dirs=False) returns the terminal node, raising or creating intermediates as configured. Each public method is a thin wrapper around _walk.

Data Structures Used

  • dict[str, Node] for directory children — O(1) lookup, sorted on demand for ls.
  • list[str] for file content chunks — O(1) append.
  • A path-split helper that handles / correctly.

Correctness Argument

Every node has exactly one parent (the dict entry that points to it). The root is the only node with no parent. _walk is the single source of truth for path resolution; it raises if it encounters a missing intermediate (create_dirs=False) or auto-creates one (create_dirs=True). addContentToFile resolves the parent directory, then either fetches the existing File (and asserts it’s a file, not a directory) or creates a new File. The “name collision” case (a directory exists where a file is being created) is detected at this exact step.

Complexity

OperationTimeSpace
mkdirO(L) where L = path componentsO(L) new nodes
addContentToFileO(L) for walk + O(1) appendO(content)
readContentFromFileO(L + content_size) for joinO(content)
ls (directory)O(K log K) for sortO(K)
ls (file)O(L)O(1)

Implementation Requirements

class Node:
    pass


class File(Node):
    __slots__ = ("chunks",)
    def __init__(self):
        self.chunks: list[str] = []

    def read(self) -> str:
        return "".join(self.chunks)

    def append(self, content: str):
        if content:
            self.chunks.append(content)


class Directory(Node):
    __slots__ = ("children",)
    def __init__(self):
        self.children: dict[str, Node] = {}


def _split(path: str) -> list[str]:
    if not path or path[0] != "/":
        raise ValueError(f"path must be absolute: {path!r}")
    return [p for p in path.split("/") if p]


class FileSystem:
    def __init__(self):
        self._root = Directory()

    def _walk(self, parts: list[str], *, create_dirs: bool = False) -> Node:
        node: Node = self._root
        for i, p in enumerate(parts):
            if not isinstance(node, Directory):
                raise NotADirectoryError("/".join(parts[:i]) or "/")
            child = node.children.get(p)
            if child is None:
                if not create_dirs:
                    raise FileNotFoundError("/" + "/".join(parts[: i + 1]))
                child = Directory()
                node.children[p] = child
            node = child
        return node

    def ls(self, path: str) -> list[str]:
        parts = _split(path)
        node = self._walk(parts)
        if isinstance(node, File):
            return [parts[-1]]
        return sorted(node.children.keys())

    def mkdir(self, path: str) -> None:
        parts = _split(path)
        if not parts:
            return  # mkdir '/' is a no-op
        # Walk-with-create, but reject a final-segment that exists as a file
        parent = self._walk(parts[:-1], create_dirs=True)
        if not isinstance(parent, Directory):
            raise NotADirectoryError("/".join(parts[:-1]))
        last = parts[-1]
        existing = parent.children.get(last)
        if existing is None:
            parent.children[last] = Directory()
        elif isinstance(existing, File):
            raise FileExistsError(path + " is a file")
        # else: existing directory; idempotent

    def addContentToFile(self, path: str, content: str) -> None:
        parts = _split(path)
        if not parts:
            raise IsADirectoryError("/")
        parent = self._walk(parts[:-1], create_dirs=True)
        if not isinstance(parent, Directory):
            raise NotADirectoryError("/".join(parts[:-1]))
        last = parts[-1]
        node = parent.children.get(last)
        if node is None:
            node = File()
            parent.children[last] = node
        elif isinstance(node, Directory):
            raise IsADirectoryError(path)
        node.append(content)

    def readContentFromFile(self, path: str) -> str:
        parts = _split(path)
        node = self._walk(parts)
        if not isinstance(node, File):
            raise IsADirectoryError(path)
        return node.read()

Tests

def test_empty_root():
    fs = FileSystem()
    assert fs.ls("/") == []

def test_mkdir_p():
    fs = FileSystem()
    fs.mkdir("/a/b/c")
    assert fs.ls("/") == ["a"]
    assert fs.ls("/a") == ["b"]
    assert fs.ls("/a/b") == ["c"]
    assert fs.ls("/a/b/c") == []

def test_add_and_read_file():
    fs = FileSystem()
    fs.addContentToFile("/x/y/z.txt", "hello")
    fs.addContentToFile("/x/y/z.txt", " world")
    assert fs.readContentFromFile("/x/y/z.txt") == "hello world"
    assert fs.ls("/x/y") == ["z.txt"]
    assert fs.ls("/x/y/z.txt") == ["z.txt"]

def test_mkdir_idempotent():
    fs = FileSystem()
    fs.mkdir("/a")
    fs.mkdir("/a")
    assert fs.ls("/") == ["a"]

def test_mkdir_over_file_fails():
    fs = FileSystem()
    fs.addContentToFile("/a", "x")
    try: fs.mkdir("/a")
    except FileExistsError: pass
    else: assert False

def test_read_nonexistent():
    fs = FileSystem()
    try: fs.readContentFromFile("/nope")
    except FileNotFoundError: pass
    else: assert False

def test_ls_sorts():
    fs = FileSystem()
    for n in ["zeta", "alpha", "mu"]: fs.mkdir(f"/{n}")
    assert fs.ls("/") == ["alpha", "mu", "zeta"]

def test_root_path():
    fs = FileSystem()
    fs.addContentToFile("/a.txt", "x")
    assert fs.ls("/a.txt") == ["a.txt"]

Follow-up Questions

  1. How would you make it thread-safe? Two options. (a) Single coarse RLock on the whole FileSystem — every operation acquires it. Simple, fine for low write rates. (b) Per-directory lock; acquire locks along the path during a walk. Avoids serializing readers from disjoint subtrees, but care is needed to acquire in path-order to avoid deadlock. For an interview answer, name both and pick (a) unless write contention is the explicit follow-up.
  2. How would you persist state across restarts? Two layers. (i) Snapshot: serialize the tree (DFS, emit (path, kind, content_or_empty)); on boot, replay. (ii) Write-ahead log: append every mutation as a record (mkdir /a, add /a/b "hello"); periodic checkpoint. Tradeoff: pure snapshot loses recent writes; pure log replays slowly; combine for production.
  3. What configuration knobs would you expose? max_filesize, max_path_depth, max_filename_length. Don’t expose the lock granularity — implementation detail. Reject paths exceeding the caps with a typed error.
  4. How would you handle a poison-pill input? A path with millions of components, or a single file with multi-gigabyte content. Cap path depth, cap filename length, cap per-file content size, and surface metric counters for rejected requests.
  5. How would you test it? Unit tests on each method’s contract. Property-based tests: random sequence of mkdir / addContent operations followed by an ls/read that asserts consistency with a simple oracle (e.g., a flat dict). Concurrency tests: many threads each writing to disjoint subtrees should produce identical state regardless of interleaving.
  6. What metrics would you emit? Operation counters (per method), latency histograms, total_files, total_directories, bytes_stored gauges, error counters by type.

Product Extension

Variants in real systems: S3-style flat namespace with / as a virtual delimiter; in-memory FUSE filesystems for tests; Kubernetes ConfigMap/Secret mounting (a tiny in-memory FS exposed to a pod). The data structure is the same; the API surface and persistence vary.

Language/Runtime Follow-ups

  • Python: as above. Use __slots__ for File and Directory to cut per-node memory.
  • Java: Map<String, Node> (HashMap or TreeMap). For sorted ls, TreeMap is natural and avoids the per-call sort.
  • Go: type Node interface { ... } with Directory and File structs. For sorted ls, sort.Strings on the keys.
  • C++: std::variant<Directory, File> or a tagged union. std::map<std::string, std::unique_ptr<Node>> for ordered children.
  • JS/TS: Map<string, Node> (insertion-ordered; sort on ls). Use a discriminated union for Node.

Common Bugs

  1. Using path.split("/") without filtering empty strings — ["", "a", "b"] for "/a/b".
  2. Treating / differently from non-/ paths inconsistently; a single _split helper avoids this.
  3. Not detecting directory-vs-file at the final path component — addContentToFile("/a") where /a is a directory must raise.
  4. mkdir overwriting an existing file silently.
  5. Storing file content as a single growing string — s += content is O(N) per append. Use a list of chunks.
  6. Returning unsorted ls — the spec usually requires sorted output for determinism.

Debugging Strategy

When ls is wrong: print the children dict at the resolved node — almost always a wrong-path-walk bug. When append seems to overwrite: check that addContentToFile calls node.append, not node.chunks = [content]. When concurrency tests fail: log every operation in order with a thread ID; the bug is usually a missing lock around parent.children[last] = ....

Mastery Criteria

  • Decomposed Node / File / Directory cleanly in <5 minutes.
  • Wrote a single _walk helper used by every public method.
  • Handled mkdir idempotency and the file-vs-directory collision case correctly.
  • Used the list-of-chunks pattern for O(1) append.
  • Wrote tests for every error mode (FileNotFoundError, IsADirectoryError, FileExistsError).
  • Articulated the snapshot+WAL persistence strategy in <60 seconds.
  • Implemented from a blank screen in <30 minutes.