Lab 22 — Text Editor Buffer (Gap Buffer / Piece Table)

Goal

Implement a text editor data structure that supports cursor-local insert, delete (backspace), left/right cursor movement, and substring read with O(1) amortized cursor-local edits. The reference implementation is a gap buffer; the follow-up is a piece table. After this lab you should articulate why a flat str or list[char] is wrong and produce a working gap buffer in <30 minutes.

Background Concepts

A naive editor representation — a single string — makes every insert and delete O(N): the character data after the cursor must shift. For a million-character document, every keystroke is millions of operations. Real editors avoid this with one of three data structures:

  • Gap buffer: a single contiguous array with a “gap” of unused slots positioned at the cursor. Insert at cursor = O(1) (write into the gap, shrink it). Move cursor = O(distance moved) — the gap moves with the cursor by shifting characters across it. Used by Emacs.
  • Piece table: an immutable original buffer + an append-only “added” buffer + a list of “pieces” describing the visible document as concatenated slices. Insert anywhere = O(1) amortized (append to the added buffer, splice a piece into the piece list). Used by VS Code, Word.
  • Rope / balanced tree of strings: O(log N) for all operations, the most general. Used by Xi-editor and several research editors.

The interview almost always wants the gap buffer because it is the simplest correct answer with the right asymptotics for the locality assumption (most edits happen near the cursor). The piece table is the right follow-up answer when the interviewer asks “what if edits aren’t local?” or “what if you want O(1) amortized regardless of cursor position?”.

Interview Context

A 30-to-45-minute round at Google (Docs), Microsoft (VS Code, Word), JetBrains, and any team that builds editor-like UI. Most candidates default to a list[char] and accept O(N) per insert; that’s a partial answer. Reaching for a gap buffer immediately demonstrates that you’ve thought about real editor performance.

Problem Statement

Implement TextEditor with:

  • insert(text: str): insert text at the current cursor position; cursor moves to the end of inserted text.
  • delete_left(n: int) -> int: delete up to n characters to the left of the cursor; return the actual number deleted (capped by left content).
  • move_left(n: int) -> str: move cursor n positions left (capped at start); return the last 10 characters to the left of the new cursor (or fewer if not available).
  • move_right(n: int) -> str: symmetric on the right.
  • text() -> str: return the full document (debug helper; not on the hot path).

This is the LC 2296 “Design a Text Editor” interface, with the read-back-10 affordance.

Constraints

  • Up to 10^4 calls in total.
  • Each text argument up to 40 characters; total inserted up to ~10^4 characters.
  • Insert and delete_left must be O(amortized 1) plus O(text length). Move operations are O(distance moved).

Clarifying Questions

  1. Is the cursor between characters (column-style) or at a character (cell-style)? (Between — like every editor.)
  2. Does delete_left(n) with n > available delete only what’s available and return that count? (Yes.)
  3. What does move_left return when the cursor is at the start? (Empty string.)
  4. Is the buffer Unicode-aware? (For LC 2296: ASCII suffices. For real editors: must handle code points and grapheme clusters; out of scope here.)
  5. Are inserts at any cursor position guaranteed local (i.e., does the interviewer want gap buffer or piece table)? (Default gap buffer — most edits are local.)

Examples

ed = TextEditor()
ed.insert("leetcode")     # cursor at end; text = "leetcode"
ed.delete_left(4)         # 4
ed.text()                 # "leet"
ed.insert("practice")     # text = "leetpractice"
ed.move_right(3)          # "etpractice" -> last 10 to left
ed.move_left(8)           # "leet"
ed.delete_left(10)        # 4
ed.text()                 # "practice"

Initial Brute Force

class StringEditor:
    def __init__(self): self.s = ""; self.c = 0
    def insert(self, text): self.s = self.s[:self.c] + text + self.s[self.c:]; self.c += len(text)
    def delete_left(self, n):
        d = min(n, self.c); self.s = self.s[:self.c - d] + self.s[self.c:]; self.c -= d; return d
    def move_left(self, n): self.c = max(0, self.c - n); return self.s[max(0, self.c - 10):self.c]
    def move_right(self, n): self.c = min(len(self.s), self.c + n); return self.s[max(0, self.c - 10):self.c]
    def text(self): return self.s

Correct, but every insert and delete is O(N) due to slice + concatenation. At 10^4 ops on a 10^4-char document: 10^8 char-shifts. Borderline-TLE on LC 2296.

Brute Force Complexity

insert: O(N + |text|). delete_left: O(N). move_*: O(1) for the text return (slice). Total worst case: O(N · operations).

Optimization Path

Switch to a gap buffer: a single bytearray (or list[str]) of length capacity, with two indices gap_start and gap_end. Characters before gap_start and after gap_end are real content; the range [gap_start, gap_end) is unused. The cursor position is gap_start. Insert at cursor: write into the gap, advance gap_start. Delete left: rewind gap_start (the deleted characters are now in the gap, no copying needed). Move left by k: shift k characters from before gap_start to after gap_end - 1 (the gap moves toward the start). Move right by k: symmetric. Resize when the gap shrinks to zero — double the capacity.

Final Expected Approach

A bytearray buf of size capacity. Indices gap_start (left edge of gap) and gap_end (right edge, exclusive). Invariants: 0 ≤ gap_start ≤ gap_end ≤ capacity. Document length = capacity - (gap_end - gap_start). Cursor = gap_start. Operations manipulate the indices and copy small ranges of bytes; total work for cursor-local edits is bounded by the edit size, not the document size.

Data Structures Used

  • bytearray (or list[str]) for the storage buffer.
  • Two int indices gap_start and gap_end.
  • A capacity bookkeeping value.
  • A _grow helper that doubles capacity when the gap is exhausted.

Correctness Argument

After every operation: the document is buf[:gap_start] + buf[gap_end:capacity] decoded. insert(text): ensure the gap holds len(text) slots (grow if needed); copy text into buf[gap_start:gap_start + len(text)]; advance gap_start by len(text). The document grows by exactly len(text) and the cursor moves to the end of the insertion. delete_left(n): cap n by gap_start (cursor is gap_start, so the leftmost left-deletable count is gap_start); rewind gap_start by n. The document shrinks by exactly n. move_left(k): shift min(k, gap_start) bytes from buf[gap_start - k:gap_start] to buf[gap_end - k:gap_end]; subtract k from both indices. The visible document is unchanged, only the gap moved.

Complexity

OperationTimeSpace
insert(t)O(t
delete_left(n)O(1)O(1) extra
move_left(k) / move_right(k)O(k)O(1)
text()O(N)O(N)

Implementation Requirements

class TextEditor:
    def __init__(self, initial_capacity: int = 16):
        self._buf = bytearray(initial_capacity)
        self._gap_start = 0
        self._gap_end = initial_capacity

    @property
    def _capacity(self) -> int:
        return len(self._buf)

    @property
    def _length(self) -> int:
        return self._capacity - (self._gap_end - self._gap_start)

    def _grow(self, needed: int):
        new_cap = max(self._capacity * 2, self._capacity + needed)
        new_buf = bytearray(new_cap)
        # left segment unchanged, right segment shifted to end of new buffer
        new_buf[: self._gap_start] = self._buf[: self._gap_start]
        right_size = self._capacity - self._gap_end
        new_buf[new_cap - right_size :] = self._buf[self._gap_end :]
        self._buf = new_buf
        self._gap_end = new_cap - right_size

    def insert(self, text: str):
        b = text.encode("utf-8")
        if self._gap_end - self._gap_start < len(b):
            self._grow(len(b))
        self._buf[self._gap_start : self._gap_start + len(b)] = b
        self._gap_start += len(b)

    def delete_left(self, n: int) -> int:
        d = min(n, self._gap_start)
        self._gap_start -= d
        return d

    def _move_left(self, k: int):
        k = min(k, self._gap_start)
        if k == 0:
            return
        # copy k bytes from before gap to after gap (right side)
        src_end = self._gap_start
        src_start = src_end - k
        dst_end = self._gap_end
        dst_start = dst_end - k
        # work right-to-left to handle overlap
        for i in range(k - 1, -1, -1):
            self._buf[dst_start + i] = self._buf[src_start + i]
        self._gap_start -= k
        self._gap_end -= k

    def _move_right(self, k: int):
        right_avail = self._capacity - self._gap_end
        k = min(k, right_avail)
        if k == 0:
            return
        # copy k bytes from after gap (right side) to before gap (left side)
        for i in range(k):
            self._buf[self._gap_start + i] = self._buf[self._gap_end + i]
        self._gap_start += k
        self._gap_end += k

    def _last_10_left(self) -> str:
        start = max(0, self._gap_start - 10)
        return self._buf[start : self._gap_start].decode("utf-8", errors="replace")

    def move_left(self, k: int) -> str:
        self._move_left(k)
        return self._last_10_left()

    def move_right(self, k: int) -> str:
        self._move_right(k)
        return self._last_10_left()

    def text(self) -> str:
        left = self._buf[: self._gap_start]
        right = self._buf[self._gap_end :]
        return (left + right).decode("utf-8", errors="replace")

Tests

def test_basic_insert_delete():
    ed = TextEditor()
    ed.insert("leetcode")
    assert ed.text() == "leetcode"
    assert ed.delete_left(4) == 4
    assert ed.text() == "leet"
    ed.insert("practice")
    assert ed.text() == "leetpractice"

def test_cursor_movement_returns_last_10():
    ed = TextEditor()
    ed.insert("practice")
    assert ed.move_right(3) == "practice"   # cursor at end already; last 10 left = "practice"
    assert ed.move_left(8) == ""
    assert ed.delete_left(10) == 0
    ed.insert("leet")
    assert ed.text() == "leetpractice"
    assert ed.move_left(2) == "le"

def test_lc_2296_canonical():
    ed = TextEditor()
    ed.insert("leetcode")
    assert ed.delete_left(4) == 4
    ed.insert("practice")
    assert ed.move_right(3) == "etpractice"
    assert ed.move_left(8) == "leet"
    assert ed.delete_left(10) == 4
    assert ed.move_left(2) == ""

def test_grow_buffer():
    ed = TextEditor(initial_capacity=4)
    ed.insert("a" * 100)
    assert ed.text() == "a" * 100

def test_delete_more_than_left():
    ed = TextEditor()
    ed.insert("ab")
    assert ed.delete_left(10) == 2
    assert ed.text() == ""

def test_move_clamps():
    ed = TextEditor()
    ed.insert("hello")
    ed.move_left(100)        # clamped to 0
    ed.move_right(100)       # back to end
    assert ed.text() == "hello"

Follow-up Questions

  1. What is the relationship to a piece table? Gap buffer is one contiguous buffer with one gap; piece table is two buffers (original + append-only) and a list of pieces. Insert at cursor in piece table = append to “added” buffer, splice the piece list — O(1) amortized regardless of cursor position. The downside: random-access reads are O(log P) where P is the number of pieces (binary-search the piece list). Use a piece table when edits are non-local; use a gap buffer when most edits cluster.
  2. How would you make it thread-safe? Wrap public methods with a Lock (or use a single-writer model — most editors are single-threaded on the editing buffer for exactly this reason; rendering and saving happen on background threads with snapshots).
  3. How would you persist state across restarts? On every K seconds or after every N keystrokes, write the current text to a temporary file, then atomically rename it. For more granular crash recovery, append every operation to a log; replay on boot.
  4. What configuration knobs would you expose? initial_capacity, max_document_size. The growth factor (currently 2×) is a sensible default; don’t expose unless you’ve measured.
  5. How would you handle a poison-pill input? A multi-megabyte single insert(text). Reject text longer than max_insert (e.g., 1 MiB). Total document size capped by max_document_size. Return errors, don’t OOM.
  6. What metrics would you emit? inserts_total, deletes_total, cursor_moves_total, buffer_grows_total, document_size_bytes gauge, gap_size_bytes gauge. Useful for tracking edit patterns and tuning capacity defaults.

Product Extension

Real editors layer many things on top: undo/redo (each operation pushes an inverse onto a stack), syntax highlighting (incremental tree-sitter passes), multi-cursor (a list of gap-buffer-style cursors), collaborative editing (operational transforms or CRDTs over the same buffer). The buffer is the bottom; the rest is composition.

Language/Runtime Follow-ups

  • Python: as above. bytearray is the right primitive; avoid string concatenation inside the hot path.
  • Java: char[] plus int gapStart, int gapEnd. StringBuilder is internally a char[] but lacks gap-buffer semantics.
  • Go: []byte (or []rune for Unicode-aware editors). The growth pattern matches Go’s slice append.
  • C++: std::vector<char>. For piece tables, std::vector<Piece> of (buffer_id, offset, length).
  • JS/TS: Uint8Array is the dense representation; string concatenation is O(N) and should be avoided.

Common Bugs

  1. Forgetting to grow when the gap is exhausted — silent overwrite of right-segment data.
  2. Off-by-one when shifting bytes during cursor moves — left-to-right copy on overlapping ranges loses data; copy right-to-left on left-shifts.
  3. Using len(self._buf) after grow without updating cached references — always recompute capacity post-grow.
  4. Returning the full text() on every move_* call when the spec only wants the last 10 characters left of cursor.
  5. Encoding inconsistencies — mixing str and bytearray. Pick one (here we use UTF-8 bytes; document the choice; reject mid-codepoint splits in real implementations).
  6. Initializing the buffer too small (e.g., capacity 1) — every keystroke triggers a regrow. Default capacity 16 amortizes well.

Debugging Strategy

When text() is wrong: print (buf, gap_start, gap_end, capacity) after each operation; the bug is almost always a forgotten index update. When cursor moves leak data: print the bytes copied and the index ranges; right-to-left vs left-to-right copy direction is the most common bug. When grow fails: assert len(buf) == capacity after every operation.

Mastery Criteria

  • Stated the gap-buffer invariant in <30 seconds.
  • Named when piece table is preferred (non-local edits) without prompting.
  • Implemented gap buffer with insert/delete/move/text in <30 minutes from blank screen.
  • Wrote a regrow test that exercises the doubling.
  • Articulated the overlap-direction bug for cursor moves and named the fix.
  • Solved LC 2296 unaided in <40 minutes including all tests.
  • Listed three real editors (Emacs, VS Code, Word) and which structure each uses.