Lab 02 — String Mechanics: Reverse Words In A String

Goal

Master string immutability, builder patterns, encoding gotchas, and the cost of naive concatenation. The deliverable: reverse the order of words in a sentence efficiently in your interview language, demonstrating you understand why the “obvious” solution can be O(N²) in some languages and O(N) in others.

Background Concepts

String immutability and the resulting cost of s += c loops; StringBuilder / strings.Builder / "".join() patterns; substring complexity; whitespace tokenization; Unicode pitfalls. Review the Strings section of the Phase 1 README and item 4 in the runtime concepts (mutable vs immutable).

Interview Context

A staple of Microsoft, Amazon, and Bloomberg phone screens. The trap is candidates who write result = ""; for w in reversed(words): result += w + " " and don’t realize they just shipped O(N²) code in Java or Python. Strong candidates state the immutability fact aloud and choose a builder pattern.

Problem Statement

Given a string s representing a sentence, return a new string with the order of words reversed. A “word” is a maximal run of non-space characters. Multiple spaces between words and leading/trailing spaces must be collapsed to single spaces; the output has no leading/trailing space.

Constraints

  • 1 ≤ |s| ≤ 10^4
  • s contains printable ASCII characters and spaces.
  • s contains at least one word.

Clarifying Questions

  1. Is the input ASCII or arbitrary Unicode? (Affects iteration model; ASCII is the default unless stated.)
  2. Should multiple internal spaces be preserved or collapsed? (Standard problem says collapse; confirm.)
  3. Trim leading/trailing whitespace? (Yes — output has none.)
  4. Punctuation: is "hello," one word? (Per problem: a “word” is non-space-separated; "hello," is one word.)
  5. Can I allocate a new string, or must I work in place? (For Python/Java/JS — strings are immutable, so a new string is unavoidable. For C++ std::string, in-place is feasible.)

Examples

InputOutput
"the sky is blue""blue is sky the"
" hello world ""world hello"
"a good example""example good a"
"single""single"
" "invalid per constraints

Initial Brute Force

Split on whitespace, reverse the list, join with single spaces.

def reverse_words(s):
    return " ".join(reversed(s.split()))

s.split() (no arg) collapses runs of whitespace and trims, which is exactly the spec. This is one line in Python — but the interviewer wants you to explain what it does.

Brute Force Complexity

Time: O(N) — split is one linear pass, reversed is O(k) where k is word count, join is one linear pass. Space: O(N) for the list of words and the output. This is already optimal asymptotically.

Optimization Path

For interviews where the one-liner is “too easy,” the interviewer escalates: “Do it without split/join; use only character-level operations.” Or: “Reverse in place in a char[] with O(1) extra memory.”

The classic in-place trick on a mutable buffer: reverse the entire buffer, then reverse each word, then collapse internal whitespace. This is the same three-reversal identity from Lab 01, applied to characters.

Final Expected Approach

State the one-liner first. Then offer the manual two-pointer approach for languages with mutable strings or as an “I-understand-the-internals” demonstration.

def reverse_words(s):
    # tokenize without builtins
    words = []
    i = 0
    n = len(s)
    while i < n:
        while i < n and s[i] == ' ':
            i += 1
        j = i
        while j < n and s[j] != ' ':
            j += 1
        if j > i:
            words.append(s[i:j])
        i = j
    # reverse and join via builder
    out = []
    for w in reversed(words):
        out.append(w)
    return ' '.join(out)

In Java, replace the final join with StringBuilder. In Go, with strings.Builder.

Data Structures Used

A list of word strings (or substrings); a builder for the output. No advanced structures.

Correctness Argument

Tokenization invariant: at each outer iteration, i points at the start of unscanned input; the inner loops skip whitespace and capture a word. Each character is examined O(1) times, so tokenization is O(N). Reversed iteration over words produces them in opposite order; joining with ' ' produces single-space separation; no leading/trailing space because we never push empty words and we don’t terminate with a separator (Python join handles this).

Complexity

Time: O(N) — single tokenization pass plus single output assembly pass. Space: O(N) — output and word list.

Implementation Requirements

  • Use an explicit builder (StringBuilder, strings.Builder, [].join, ''.join).
  • Never use += to build the output in a loop in Java/Python/JS.
  • Don’t rely on regex unless the interviewer is explicitly fine with it (s.split(/\s+/) works but is overkill).
  • Verify trimming works on " word ".
  • Verify multiple internal spaces collapse on "a b".

Tests

  • Smoke: "the sky is blue""blue is sky the".
  • Unit: leading/trailing spaces; multiple internal spaces; single-word input.
  • Edge: all-spaces input (per constraints invalid; handle gracefully if extended); single character; punctuation as part of word.
  • Large: N = 10^4 input with 10^3 words; assert no quadratic behavior (time it).
  • Random: randomly generate space-and-letter strings; cross-check against " ".join(reversed(s.split())).
  • Invalid: non-ASCII in extended versions (define behavior per language’s iteration model).

Follow-up Questions

  • “Reverse character order within each word as well.” (Reverse each word in place after splitting.)
  • “Reverse in O(1) extra space on a mutable char[].” (The three-reversal trick.)
  • “Handle Unicode where ‘word’ is grapheme-cluster bounded.” (Need an ICU library or equivalent.)
  • “Preserve original whitespace runs.” (Don’t collapse; keep separator tokens.)
  • “What if the string is huge and streamed?” (Process word-by-word from a buffered reader.)

Product Extension

A search-engine query normalizer. Inputs from users have inconsistent whitespace, varying word order. Reverse word order is a feature for “did-you-mean” inversion testing. In production: keep the original for display, normalize for indexing, and accept that the cost of String immutability in Java means hot paths use StringBuilder or even byte arrays directly.

Language/Runtime Follow-ups

  • Python: s.split() (no args) is the magical normalizer. "".join(...) is a single allocation; never use += on str in a loop.
  • Java: String.split("\\s+") returns an array; String.trim() separately. Output via StringBuilder. String.join(" ", parts) is the modern one-liner.
  • Go: strings.Fields(s) splits on any whitespace and trims; strings.Join(parts, " ") rebuilds. Both are O(N).
  • C++: std::stringstream for tokenizing; build via std::string += , which has small-string optimization but still amortized O(N).
  • JS/TS: s.trim().split(/\s+/).reverse().join(' '). Beware: the empty-string case "".split(/\s+/) returns [""], not [].
  • Unicode subtlety: in Java/JS, length counts UTF-16 code units; '😀' is length 2. Doesn’t matter here unless emoji-as-word.

Common Bugs

  1. Off-by-one on whitespace — leaving a trailing space after join.
  2. s += c in a loop — O(N²) in Python, Java, JS. Catastrophic on large input.
  3. Splitting by ' ' instead of by whitespace"a b".split(' ') returns ["a", "", "b"] in Java/JS; you get empty tokens.
  4. Trim missed — leading whitespace becomes a leading empty token.
  5. Mutating the input — in some languages strings are immutable so this is a type error; in C++/Go-byte-slice it’s a semantic bug.

Debugging Strategy

  • Print the tokenized list. If it has empty strings, your splitter doesn’t collapse.
  • If output has trailing space, your join builds it manually rather than via the standard library.
  • Time on a 10^4 input. If it’s > 1 second, you have a quadratic concat hidden somewhere.

Mastery Criteria

  • Wrote the one-liner in 30 seconds and explained why each piece is needed.
  • Wrote the manual tokenizer in under 5 minutes.
  • Stated aloud “in this language, strings are immutable, so I will use a builder.”
  • Identified the difference between split(' ') and split('\\s+')/split() (no-arg).
  • Acknowledged the in-place char[] three-reversal alternative without writing it (or wrote it on follow-up).
  • Tested with " a b " and confirmed clean output.