Lab 02 — String Mechanics: Reverse Words In A String

Goal

Master string immutability, builder patterns, encoding gotchas, and the cost of naive concatenation. The deliverable: reverse the order of words in a sentence efficiently in your interview language, demonstrating you understand why the “obvious” solution can be O(N²) in some languages and O(N) in others.

Background Concepts

String immutability and the resulting cost of s += c loops; StringBuilder / strings.Builder / "".join() patterns; substring complexity; whitespace tokenization; Unicode pitfalls. Review the Strings section of the Phase 1 README and item 4 in the runtime concepts (mutable vs immutable).

Interview Context

A staple of Microsoft, Amazon, and Bloomberg phone screens. The trap is candidates who write result = ""; for w in reversed(words): result += w + " " and don’t realize they just shipped O(N²) code in Java or Python. Strong candidates state the immutability fact aloud and choose a builder pattern.

Problem Statement

Given a string s representing a sentence, return a new string with the order of words reversed. A “word” is a maximal run of non-space characters. Multiple spaces between words and leading/trailing spaces must be collapsed to single spaces; the output has no leading/trailing space.

Constraints

1 ≤ |s| ≤ 10^4
s contains printable ASCII characters and spaces.
s contains at least one word.

Clarifying Questions

Is the input ASCII or arbitrary Unicode? (Affects iteration model; ASCII is the default unless stated.)
Should multiple internal spaces be preserved or collapsed? (Standard problem says collapse; confirm.)
Trim leading/trailing whitespace? (Yes — output has none.)
Punctuation: is "hello," one word? (Per problem: a “word” is non-space-separated; "hello," is one word.)
Can I allocate a new string, or must I work in place? (For Python/Java/JS — strings are immutable, so a new string is unavoidable. For C++ std::string, in-place is feasible.)

Examples

Input	Output
`"the sky is blue"`	`"blue is sky the"`
`" hello world "`	`"world hello"`
`"a good example"`	`"example good a"`
`"single"`	`"single"`
`" "`	invalid per constraints

Initial Brute Force

Split on whitespace, reverse the list, join with single spaces.

def reverse_words(s):
    return " ".join(reversed(s.split()))

s.split() (no arg) collapses runs of whitespace and trims, which is exactly the spec. This is one line in Python — but the interviewer wants you to explain what it does.

Brute Force Complexity

Time: O(N) — split is one linear pass, reversed is O(k) where k is word count, join is one linear pass. Space: O(N) for the list of words and the output. This is already optimal asymptotically.

Optimization Path

For interviews where the one-liner is “too easy,” the interviewer escalates: “Do it without split/join; use only character-level operations.” Or: “Reverse in place in a char[] with O(1) extra memory.”

The classic in-place trick on a mutable buffer: reverse the entire buffer, then reverse each word, then collapse internal whitespace. This is the same three-reversal identity from Lab 01, applied to characters.

Final Expected Approach

State the one-liner first. Then offer the manual two-pointer approach for languages with mutable strings or as an “I-understand-the-internals” demonstration.

def reverse_words(s):
    # tokenize without builtins
    words = []
    i = 0
    n = len(s)
    while i < n:
        while i < n and s[i] == ' ':
            i += 1
        j = i
        while j < n and s[j] != ' ':
            j += 1
        if j > i:
            words.append(s[i:j])
        i = j
    # reverse and join via builder
    out = []
    for w in reversed(words):
        out.append(w)
    return ' '.join(out)

In Java, replace the final join with StringBuilder. In Go, with strings.Builder.

Data Structures Used

A list of word strings (or substrings); a builder for the output. No advanced structures.

Correctness Argument

Tokenization invariant: at each outer iteration, i points at the start of unscanned input; the inner loops skip whitespace and capture a word. Each character is examined O(1) times, so tokenization is O(N). Reversed iteration over words produces them in opposite order; joining with ' ' produces single-space separation; no leading/trailing space because we never push empty words and we don’t terminate with a separator (Python join handles this).

Complexity

Time: O(N) — single tokenization pass plus single output assembly pass. Space: O(N) — output and word list.

Implementation Requirements

Use an explicit builder (StringBuilder, strings.Builder, [].join, ''.join).
Never use += to build the output in a loop in Java/Python/JS.
Don’t rely on regex unless the interviewer is explicitly fine with it (s.split(/\s+/) works but is overkill).
Verify trimming works on " word ".
Verify multiple internal spaces collapse on "a b".

Tests

Smoke: "the sky is blue" → "blue is sky the".
Unit: leading/trailing spaces; multiple internal spaces; single-word input.
Edge: all-spaces input (per constraints invalid; handle gracefully if extended); single character; punctuation as part of word.
Large: N = 10^4 input with 10^3 words; assert no quadratic behavior (time it).
Random: randomly generate space-and-letter strings; cross-check against " ".join(reversed(s.split())).
Invalid: non-ASCII in extended versions (define behavior per language’s iteration model).

Follow-up Questions

“Reverse character order within each word as well.” (Reverse each word in place after splitting.)
“Reverse in O(1) extra space on a mutable char[].” (The three-reversal trick.)
“Handle Unicode where ‘word’ is grapheme-cluster bounded.” (Need an ICU library or equivalent.)
“Preserve original whitespace runs.” (Don’t collapse; keep separator tokens.)
“What if the string is huge and streamed?” (Process word-by-word from a buffered reader.)

Product Extension

A search-engine query normalizer. Inputs from users have inconsistent whitespace, varying word order. Reverse word order is a feature for “did-you-mean” inversion testing. In production: keep the original for display, normalize for indexing, and accept that the cost of String immutability in Java means hot paths use StringBuilder or even byte arrays directly.

Language/Runtime Follow-ups

Python: s.split() (no args) is the magical normalizer. "".join(...) is a single allocation; never use += on str in a loop.
Java: String.split("\\s+") returns an array; String.trim() separately. Output via StringBuilder. String.join(" ", parts) is the modern one-liner.
Go: strings.Fields(s) splits on any whitespace and trims; strings.Join(parts, " ") rebuilds. Both are O(N).
C++: std::stringstream for tokenizing; build via std::string += , which has small-string optimization but still amortized O(N).
JS/TS: s.trim().split(/\s+/).reverse().join(' '). Beware: the empty-string case "".split(/\s+/) returns [""], not [].
Unicode subtlety: in Java/JS, length counts UTF-16 code units; '😀' is length 2. Doesn’t matter here unless emoji-as-word.

Common Bugs

Off-by-one on whitespace — leaving a trailing space after join.
s += c in a loop — O(N²) in Python, Java, JS. Catastrophic on large input.
Splitting by ' ' instead of by whitespace — "a b".split(' ') returns ["a", "", "b"] in Java/JS; you get empty tokens.
Trim missed — leading whitespace becomes a leading empty token.
Mutating the input — in some languages strings are immutable so this is a type error; in C++/Go-byte-slice it’s a semantic bug.

Debugging Strategy

Print the tokenized list. If it has empty strings, your splitter doesn’t collapse.
If output has trailing space, your join builds it manually rather than via the standard library.
Time on a 10^4 input. If it’s > 1 second, you have a quadratic concat hidden somewhere.

Mastery Criteria

Wrote the one-liner in 30 seconds and explained why each piece is needed.
Wrote the manual tokenizer in under 5 minutes.
Stated aloud “in this language, strings are immutable, so I will use a builder.”
Identified the difference between split(' ') and split('\\s+')/split() (no-arg).
Acknowledged the in-place char[] three-reversal alternative without writing it (or wrote it on follow-up).
Tested with " a b " and confirmed clean output.

LeetCode Interview — Extreme Coding