Lab 02 — String Mechanics: Reverse Words In A String
Goal
Master string immutability, builder patterns, encoding gotchas, and the cost of naive concatenation. The deliverable: reverse the order of words in a sentence efficiently in your interview language, demonstrating you understand why the “obvious” solution can be O(N²) in some languages and O(N) in others.
Background Concepts
String immutability and the resulting cost of s += c loops; StringBuilder / strings.Builder / "".join() patterns; substring complexity; whitespace tokenization; Unicode pitfalls. Review the Strings section of the Phase 1 README and item 4 in the runtime concepts (mutable vs immutable).
Interview Context
A staple of Microsoft, Amazon, and Bloomberg phone screens. The trap is candidates who write result = ""; for w in reversed(words): result += w + " " and don’t realize they just shipped O(N²) code in Java or Python. Strong candidates state the immutability fact aloud and choose a builder pattern.
Problem Statement
Given a string s representing a sentence, return a new string with the order of words reversed. A “word” is a maximal run of non-space characters. Multiple spaces between words and leading/trailing spaces must be collapsed to single spaces; the output has no leading/trailing space.
Constraints
1 ≤ |s| ≤ 10^4scontains printable ASCII characters and spaces.scontains at least one word.
Clarifying Questions
- Is the input ASCII or arbitrary Unicode? (Affects iteration model; ASCII is the default unless stated.)
- Should multiple internal spaces be preserved or collapsed? (Standard problem says collapse; confirm.)
- Trim leading/trailing whitespace? (Yes — output has none.)
- Punctuation: is
"hello,"one word? (Per problem: a “word” is non-space-separated;"hello,"is one word.) - Can I allocate a new string, or must I work in place? (For Python/Java/JS — strings are immutable, so a new string is unavoidable. For C++
std::string, in-place is feasible.)
Examples
| Input | Output |
|---|---|
"the sky is blue" | "blue is sky the" |
" hello world " | "world hello" |
"a good example" | "example good a" |
"single" | "single" |
" " | invalid per constraints |
Initial Brute Force
Split on whitespace, reverse the list, join with single spaces.
def reverse_words(s):
return " ".join(reversed(s.split()))
s.split() (no arg) collapses runs of whitespace and trims, which is exactly the spec. This is one line in Python — but the interviewer wants you to explain what it does.
Brute Force Complexity
Time: O(N) — split is one linear pass, reversed is O(k) where k is word count, join is one linear pass. Space: O(N) for the list of words and the output. This is already optimal asymptotically.
Optimization Path
For interviews where the one-liner is “too easy,” the interviewer escalates: “Do it without split/join; use only character-level operations.” Or: “Reverse in place in a char[] with O(1) extra memory.”
The classic in-place trick on a mutable buffer: reverse the entire buffer, then reverse each word, then collapse internal whitespace. This is the same three-reversal identity from Lab 01, applied to characters.
Final Expected Approach
State the one-liner first. Then offer the manual two-pointer approach for languages with mutable strings or as an “I-understand-the-internals” demonstration.
def reverse_words(s):
# tokenize without builtins
words = []
i = 0
n = len(s)
while i < n:
while i < n and s[i] == ' ':
i += 1
j = i
while j < n and s[j] != ' ':
j += 1
if j > i:
words.append(s[i:j])
i = j
# reverse and join via builder
out = []
for w in reversed(words):
out.append(w)
return ' '.join(out)
In Java, replace the final join with StringBuilder. In Go, with strings.Builder.
Data Structures Used
A list of word strings (or substrings); a builder for the output. No advanced structures.
Correctness Argument
Tokenization invariant: at each outer iteration, i points at the start of unscanned input; the inner loops skip whitespace and capture a word. Each character is examined O(1) times, so tokenization is O(N). Reversed iteration over words produces them in opposite order; joining with ' ' produces single-space separation; no leading/trailing space because we never push empty words and we don’t terminate with a separator (Python join handles this).
Complexity
Time: O(N) — single tokenization pass plus single output assembly pass. Space: O(N) — output and word list.
Implementation Requirements
- Use an explicit builder (
StringBuilder,strings.Builder,[].join,''.join). - Never use
+=to build the output in a loop in Java/Python/JS. - Don’t rely on regex unless the interviewer is explicitly fine with it (
s.split(/\s+/)works but is overkill). - Verify trimming works on
" word ". - Verify multiple internal spaces collapse on
"a b".
Tests
- Smoke:
"the sky is blue"→"blue is sky the". - Unit: leading/trailing spaces; multiple internal spaces; single-word input.
- Edge: all-spaces input (per constraints invalid; handle gracefully if extended); single character; punctuation as part of word.
- Large: N = 10^4 input with 10^3 words; assert no quadratic behavior (time it).
- Random: randomly generate space-and-letter strings; cross-check against
" ".join(reversed(s.split())). - Invalid: non-ASCII in extended versions (define behavior per language’s iteration model).
Follow-up Questions
- “Reverse character order within each word as well.” (Reverse each word in place after splitting.)
- “Reverse in O(1) extra space on a mutable
char[].” (The three-reversal trick.) - “Handle Unicode where ‘word’ is grapheme-cluster bounded.” (Need an ICU library or equivalent.)
- “Preserve original whitespace runs.” (Don’t collapse; keep separator tokens.)
- “What if the string is huge and streamed?” (Process word-by-word from a buffered reader.)
Product Extension
A search-engine query normalizer. Inputs from users have inconsistent whitespace, varying word order. Reverse word order is a feature for “did-you-mean” inversion testing. In production: keep the original for display, normalize for indexing, and accept that the cost of String immutability in Java means hot paths use StringBuilder or even byte arrays directly.
Language/Runtime Follow-ups
- Python:
s.split()(no args) is the magical normalizer."".join(...)is a single allocation; never use+=onstrin a loop. - Java:
String.split("\\s+")returns an array;String.trim()separately. Output viaStringBuilder.String.join(" ", parts)is the modern one-liner. - Go:
strings.Fields(s)splits on any whitespace and trims;strings.Join(parts, " ")rebuilds. Both are O(N). - C++:
std::stringstreamfor tokenizing; build viastd::string +=, which has small-string optimization but still amortized O(N). - JS/TS:
s.trim().split(/\s+/).reverse().join(' '). Beware: the empty-string case"".split(/\s+/)returns[""], not[]. - Unicode subtlety: in Java/JS,
lengthcounts UTF-16 code units;'😀'is length 2. Doesn’t matter here unless emoji-as-word.
Common Bugs
- Off-by-one on whitespace — leaving a trailing space after
join. s += cin a loop — O(N²) in Python, Java, JS. Catastrophic on large input.- Splitting by
' 'instead of by whitespace —"a b".split(' ')returns["a", "", "b"]in Java/JS; you get empty tokens. - Trim missed — leading whitespace becomes a leading empty token.
- Mutating the input — in some languages strings are immutable so this is a type error; in C++/Go-byte-slice it’s a semantic bug.
Debugging Strategy
- Print the tokenized list. If it has empty strings, your splitter doesn’t collapse.
- If output has trailing space, your join builds it manually rather than via the standard library.
- Time on a 10^4 input. If it’s > 1 second, you have a quadratic concat hidden somewhere.
Mastery Criteria
- Wrote the one-liner in 30 seconds and explained why each piece is needed.
- Wrote the manual tokenizer in under 5 minutes.
- Stated aloud “in this language, strings are immutable, so I will use a builder.”
- Identified the difference between
split(' ')andsplit('\\s+')/split()(no-arg). - Acknowledged the in-place
char[]three-reversal alternative without writing it (or wrote it on follow-up). - Tested with
" a b "and confirmed clean output.