Go Runtime Deep Dive
Target audience: candidates interviewing in Go for infrastructure, distributed systems, cloud-native (Kubernetes / Docker / etcd ecosystems), or any backend role where the interviewer asks “explain how goroutines actually work.”
Scope: gc (the standard Go compiler) on Go 1.22+. gccgo and TinyGo are mentioned only where they change interview-grade answers.
Go’s surface looks small. The runtime is not. The interview gap appears immediately when an interviewer asks “what’s the difference between a goroutine and an OS thread?”, “what does nil != nil even mean?”, or “why does this loop variable do that?”. This guide trains the answers.
1. Runtime Overview — M:N Scheduling
Go runs your code under a runtime linked into every binary. The runtime owns:
- The goroutine scheduler (M:N — many goroutines onto few OS threads).
- The garbage collector (concurrent tri-color mark-sweep).
- The memory allocator (TCMalloc-derived, per-P caches).
- Channel and sync primitives, network poller, timers, profilers.
A “goroutine” is not an OS thread. It’s a small (~2 KB initial stack) task that the Go runtime schedules and multiplexes onto a pool of OS threads. The runtime can have thousands of goroutines on a handful of threads.
// A million goroutines is normal.
for i := 0; i < 1_000_000; i++ {
go func() { /* ... */ }()
}
This is feasible because each goroutine starts with ~2 KB of stack (vs. a megabyte-scale default for an OS thread, commonly 1–8 MB depending on the platform) and the stack grows as needed.
Stack growth
Goroutine stacks were segmented ("split") in early releases and have been contiguous since Go 1.3: when a stack overflows, the runtime allocates a larger one (typically double the size), copies every frame across, and adjusts any pointers that point into the stack. That is why taking the address of a stack-allocated variable is safe in Go: even if the stack moves, references stay valid.
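A minimal sketch of stack growth in action, assuming escape analysis keeps total on the goroutine's stack (the names countOdd and deepCount are made up for illustration). The recursion goes far deeper than the initial stack allows, so the runtime has to copy the stack several times while acc keeps pointing at a live variable.

```go
package main

import "fmt"

// countOdd recurses n levels deep. Each frame carries some padding, so the
// goroutine stack must grow well beyond its initial size. acc points at a
// variable that lives in the caller's frame.
func countOdd(n int, acc *int) {
	var pad [128]byte
	pad[n%len(pad)] = byte(n & 1)
	*acc += int(pad[n%len(pad)])
	if n > 0 {
		countOdd(n-1, acc)
	}
}

// deepCount runs the recursion on a fresh goroutine and returns how many
// odd values were seen in [0, depth].
func deepCount(depth int) int {
	done := make(chan int)
	go func() {
		total := 0
		countOdd(depth, &total) // pointer to a local, used across stack copies
		done <- total
	}()
	return <-done
}

func main() {
	fmt.Println(deepCount(100_000)) // 50000
}
```

The result is correct regardless of how many times the stack was relocated, which is the property the paragraph above describes.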
2. The GMP Scheduler
Three runtime objects:
| Letter | Stands for | What it is |
|---|---|---|
| G | Goroutine | A goroutine: stack + program counter + status |
| M | Machine | An OS thread |
| P | Processor | A logical scheduler context; holds a runnable G queue |
Number of P’s = GOMAXPROCS (default: number of CPUs). Each P has a local runnable queue. M’s bind to a P to execute G’s; an M without a P cannot run Go code.
P0 [G G G G ...] P1 [G G ...] P2 [G G G G G ...]
│ │ │
M0 M1 M2 (OS threads)
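The P count and the number of live goroutines can be inspected from user code. The sketch below (schedulerSnapshot is a hypothetical helper) parks n goroutines and reads both numbers; it shows only what the runtime package exposes, not the internal run queues.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// schedulerSnapshot parks n goroutines, then reports the number of Ps and
// the number of goroutines that exist at that moment.
func schedulerSnapshot(n int) (procs, goroutines int) {
	var wg sync.WaitGroup
	release := make(chan struct{})
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-release // parked until the channel is closed
		}()
	}
	procs = runtime.GOMAXPROCS(0) // 0 queries the value without changing it
	goroutines = runtime.NumGoroutine()
	close(release)
	wg.Wait()
	return
}

func main() {
	p, g := schedulerSnapshot(1000)
	fmt.Printf("Ps=%d CPUs=%d goroutines=%d\n", p, runtime.NumCPU(), g)
}
```

On a typical machine the goroutine count is far larger than the P count, which is the M:N point in one line of output.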
Steal work
When a P’s queue is empty, it steals half from a random other P’s queue. Keeps cores busy without a global lock.
What happens on a blocking syscall
The M making the syscall detaches from its P and blocks. The P picks up another M (creating one if needed) and keeps scheduling. When the syscall returns, the original M tries to reacquire a P; if none is free it parks the G on the global queue.
This is why read(fd, ...) on a regular file blocks an OS thread but does not block your other goroutines — they keep running on other M’s.
Network poller
Network I/O is epoll/kqueue/IOCP under the hood. A goroutine doing conn.Read parks itself, registers with the poller, and another goroutine runs. When the fd is readable, the poller wakes the parked G. No M is consumed while parked. This is why Go scales to 100K+ concurrent network connections trivially.
Preemption
Up to Go 1.13, goroutines yielded only at cooperative points such as function prologues and blocking operations, so a tight CPU loop without function calls could starve others. Since 1.14, asynchronous preemption uses signals to interrupt a goroutine at almost any instruction.
// Pre-1.14, this could starve everything else; today it's preempted.
go func() { for {} }()
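A sketch that makes the difference observable, assuming a platform where asynchronous preemption is available (for example linux/amd64 on Go 1.14+). sleepBesideSpinner is a hypothetical helper: it restricts the scheduler to one P, starts a spinning goroutine, and checks that the caller still wakes up.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// sleepBesideSpinner limits the program to a single P, starts a goroutine
// that spins without any function calls, then sleeps for d. It returns the
// time actually spent. Without asynchronous preemption this call could hang,
// because the spinner would never give up the only P.
func sleepBesideSpinner(d time.Duration) time.Duration {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	go func() {
		for { // no function calls, so no cooperative yield points
		}
	}()

	start := time.Now()
	time.Sleep(d)
	return time.Since(start)
}

func main() {
	elapsed := sleepBesideSpinner(20 * time.Millisecond)
	fmt.Println("caller resumed after", elapsed)
}
```

The spinner is never stopped, so this belongs in a throwaway program rather than a service.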
Interview framing
“What’s the difference between a goroutine and a thread?”
Goroutine: ~2KB initial stack that grows on demand, cooperative + signal-preempted, scheduled by the Go runtime onto a pool of OS threads. Thread: megabyte-scale stack (commonly 1–8 MB by default), OS-scheduled, costlier context switches. Goroutines are the unit you think about; M’s are an implementation detail.
3. Goroutines vs Threads — Practical Implications
// I/O fanout pattern
results := make(chan Result, len(urls))
for _, u := range urls {
u := u // pre-1.22: required to capture
go func() {
results <- fetch(u)
}()
}
for range urls {
r := <-results
process(r)
}
Costs (rough orders of magnitude; measure on your own hardware):
- Goroutine creation: ~1µs.
- Channel ops: ~50–100ns uncontended; uncontended mutexes are in the same range or cheaper.
- Context switch: ~200ns within Go runtime; blocking syscalls add OS thread cost.
Sharp edge: unlike OS threads, goroutines expose no ID to user code. This is by design, to discourage thread-local-state patterns, and it breaks naive ports of Java idioms such as ThreadLocal.
4. Channels — Buffered, Unbuffered, select
A channel is a typed bounded queue with built-in synchronization.
| Construct | Behavior |
|---|---|
| make(chan T) | Unbuffered: send and recv must rendezvous. Sender blocks until a receiver is ready. |
| make(chan T, n) | Buffered: sender blocks only when buffer is full. |
| close(ch) | Recvs drain remaining values, then receive zero values. Send to closed → panic. |
ch := make(chan int, 2)
ch <- 1
ch <- 2
close(ch)
for v := range ch { fmt.Println(v) } // 1, 2
select
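Multiplexes channels: picks any ready case (uniformly at random when several are ready) and blocks if none is, unless there is a default case. It is the building block for composing producers/consumers, timeouts, and cancellation.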
select {
case v := <-in:
use(v)
case out <- val:
// ...
case <-time.After(2 * time.Second):
return errors.New("timeout")
case <-ctx.Done():
return ctx.Err()
}
Nil-channel pattern
A nil channel blocks forever. Setting a case’s channel to nil disables it:
var done chan struct{} = nil
// case <-done: never fires
Useful when iterating over multiple channels and “turning off” one as it completes.
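A small sketch of the "turn off a case" idea, with hypothetical helpers feed and drainBoth. Each channel variable is set to nil once it reports closed, so the select stops considering it and the loop ends when both are nil.

```go
package main

import "fmt"

// feed returns a channel that yields vals and is then closed.
func feed(vals ...int) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for _, v := range vals {
			ch <- v
		}
	}()
	return ch
}

// drainBoth receives from a and b until both are closed and returns the sum
// of everything received.
func drainBoth(a, b <-chan int) int {
	sum := 0
	for a != nil || b != nil {
		select {
		case v, ok := <-a:
			if !ok {
				a = nil // closed: a nil channel never becomes ready
				continue
			}
			sum += v
		case v, ok := <-b:
			if !ok {
				b = nil
				continue
			}
			sum += v
		}
	}
	return sum
}

func main() {
	fmt.Println(drainBoth(feed(1, 2, 3), feed(10, 20))) // 36
}
```

Without the nil assignment, a closed channel would stay permanently ready and the loop would spin on zero values.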
Closing semantics
- Receivers detect close with v, ok := <-ch; ok is false on closed-and-drained.
- Only the sender should close. Closing on the receiver side requires extra coordination, because closing a channel that someone else may still send to leads to a panic.
- Don’t close a channel just to “free” it; let the GC handle that.
5. Sync Primitives
| Primitive | Use for |
|---|---|
| sync.Mutex | Mutual exclusion |
| sync.RWMutex | Many readers / few writers (do measure; RW often loses to plain Mutex) |
| sync.Once | Idempotent one-time init |
| sync.WaitGroup | Wait for N goroutines |
| sync.Cond | Condition variable; rarely needed (channels usually clearer) |
| sync/atomic | CAS, atomic add/load/store on int32/int64/pointer |
| sync.Map | Concurrent map, only when read-mostly or when goroutines work on disjoint key sets |
var mu sync.Mutex
mu.Lock()
defer mu.Unlock()
// critical section
sync.Map is not always faster
It’s optimized for two specific patterns:
- Stable disjoint key sets per goroutine.
- Mostly reads, rare writes.
For everything else, a regular map[K]V + sync.Mutex (or shards) is faster and clearer.
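For reference, a sketch of the plain map plus mutex alternative. The Counter type and its methods are made up for this example; sharding is left out to keep it short.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter is a hypothetical concurrency-safe counter built from a plain map
// guarded by a mutex.
type Counter struct {
	mu sync.Mutex
	m  map[string]int
}

func NewCounter() *Counter { return &Counter{m: make(map[string]int)} }

// Inc increments the count for key under the lock.
func (c *Counter) Inc(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key]++
}

// Get returns the current count for key.
func (c *Counter) Get(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.m[key]
}

// countConcurrently increments one key from n goroutines and returns the total.
func countConcurrently(n int) int {
	c := NewCounter()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc("hits")
		}()
	}
	wg.Wait()
	return c.Get("hits")
}

func main() {
	fmt.Println(countConcurrently(100)) // 100
}
```

Every goroutine writes the same key here, which is exactly the workload sync.Map is not tuned for.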
WaitGroup
var wg sync.WaitGroup
for _, x := range data {
wg.Add(1)
go func(x Item) {
defer wg.Done()
process(x)
}(x)
}
wg.Wait()
Trap: wg.Add must happen before the goroutine starts running, never inside it.
6. Memory Model
Go has a documented memory model (re-articulated in 2022 for clarity). Key rules:
- A read sees writes that happen before it.
- Within a goroutine: program order.
- Goroutine creation happens before its first instruction.
- A send on a channel happens before the corresponding receive completes.
- Close of a channel happens before a receive that returns the zero value due to close.
- m.Unlock happens before a subsequent m.Lock.
- sync/atomic: each atomic op is sequentially consistent; pairs ordered by HB.
// Without sync, this is racy.
var data []int
var ready bool
go func() {
data = makeData()
ready = true // RACE — no HB to the reader
}()
for !ready {} // may loop forever (compiler/CPU can hoist)
use(data)
Fix with channel, mutex, or atomic.
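One possible channel-based fix, as a sketch. makeData is a stand-in for whatever produces the data. Closing the channel establishes the happens-before edge that the bare bool lacked.

```go
package main

import "fmt"

// makeData is a hypothetical stand-in for the real producer.
func makeData() []int { return []int{1, 2, 3} }

// loadSafely publishes data from one goroutine to another without a race.
func loadSafely() []int {
	var data []int
	ready := make(chan struct{})
	go func() {
		data = makeData()
		close(ready) // the close happens before the receive below returns
	}()
	<-ready // after this, the write to data is guaranteed to be visible
	return data
}

func main() {
	fmt.Println(loadSafely()) // [1 2 3]
}
```

The same shape works with a sync.Mutex or an atomic.Bool; the channel version also removes the busy-wait loop.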
Race detector
Always run tests with -race:
go test -race ./...
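It instruments memory accesses and reports data races that actually occur during the run, so it has no false positives but only covers code paths your tests execute. Expect a noticeable slowdown and extra memory under -race; it is still cheap insurance and one of Go’s killer features.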
7. Garbage Collector — Concurrent Tri-color Mark-Sweep
Go’s GC is concurrent, non-moving, tri-color mark-sweep with write barriers.
- Tri-color: white = not yet visited, grey = visited but children not, black = done.
- Write barrier: intercepts pointer writes during mark to maintain invariants while the mutator runs.
- Non-moving: objects don’t relocate. Pointers stay stable. (Trade-off: no compaction, more fragmentation.)
Pause time
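Two short stop-the-world phases, typically sub-millisecond: one to enable the write barrier when marking starts, one at mark termination. Stack scanning and most marking happen concurrently with your program. There is no “young generation”: Go’s GC is non-generational.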
Pacing
GC triggers when the heap reaches roughly twice the live heap left by the previous collection (GOGC=100 default). Lower it for a smaller footprint at the cost of CPU; raise it for fewer collections at the cost of memory.
GOGC=200 ./app # GC less often
GOGC=off ./app # disable (for benchmarks)
Soft memory limit
runtime/debug.SetMemoryLimit(n) or the GOMEMLIMIT environment variable (Go 1.19+) sets a soft limit; the GC trades CPU for staying under it, but the limit is never enforced as a hard cap. Useful in containers: set it to 0.9 * cgroup_limit to avoid OOM-kills.
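A sketch of the 90% rule, assuming the container limit is already known to the program (the 512 MiB figure and the helper name applySoftLimit are made up; reading the real cgroup value is out of scope here).

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// applySoftLimit sets the GC's soft memory limit to 90% of containerLimit
// (in bytes) and returns the limit that is now in effect.
func applySoftLimit(containerLimit int64) int64 {
	limit := containerLimit / 10 * 9
	debug.SetMemoryLimit(limit)
	// A negative argument leaves the limit unchanged and just returns it.
	return debug.SetMemoryLimit(-1)
}

func main() {
	const cgroupLimit = 512 << 20 // hypothetical 512 MiB container limit
	fmt.Println("soft limit in bytes:", applySoftLimit(cgroupLimit))
}
```

Setting GOMEMLIMIT in the environment achieves the same thing without code changes, which is often easier to manage per deployment.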
Escape analysis
The compiler decides at compile time whether a value can stay on the stack. If a pointer “escapes” the function, the value is heap-allocated.
func f() *int {
x := 1
return &x // escapes — heap allocation
}
go run -gcflags='-m' main.go
# prints, among other diagnostics: moved to heap: x
Knowing what allocates lets you avoid GC pressure in hot paths. Stack allocation is essentially free; heap allocation costs ~30ns + future GC scan.
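Allocation counts can also be measured directly. The sketch below uses testing.AllocsPerRun from an ordinary program; the function names are illustrative, and inlining is disabled so the compiler cannot optimize the difference away.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable that keeps results reachable.
var sink *int

//go:noinline
func escapes() *int {
	x := 1
	return &x // x must outlive the call, so it lives on the heap
}

//go:noinline
func staysLocal() int {
	x := 1
	p := &x // the pointer never leaves the function
	return *p
}

// allocCounts returns the average heap allocations per call for each function.
func allocCounts() (heap, stack float64) {
	heap = testing.AllocsPerRun(1000, func() { sink = escapes() })
	stack = testing.AllocsPerRun(1000, func() { _ = staysLocal() })
	return
}

func main() {
	heap, stack := allocCounts()
	fmt.Printf("escapes: %.0f allocs/op, staysLocal: %.0f allocs/op\n", heap, stack)
}
```

In a real project the same numbers come from go test -bench with -benchmem; this form is just convenient for a quick experiment.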
8. Slice Internals
A slice is a 3-word struct: (ptr *T, len int, cap int). Slicing does not copy — it’s a view.
a := []int{1, 2, 3, 4, 5}
b := a[1:4] // [2 3 4], cap=4 (from index 1 to end of underlying array)
b[0] = 99
fmt.Println(a) // [1 99 3 4 5] — shared backing array!
append semantics
b = append(b, 10) // if len < cap: in place; else allocate new backing array
Growth: capacity roughly doubles for small slices (below 256 elements since Go 1.18, below 1024 before), then eases toward ~1.25×. After a reallocation, the new slice’s backing array is independent of any older slice that still points at the old one.
a := make([]int, 4, 4)
b := a[:2]
c := append(b, 99) // overwrites a[2]
fmt.Println(a, c) // [0 0 99 0] [0 0 99]
d := append(c, 1, 2, 3) // reallocates; d disjoint from a
This is the slice aliasing gotcha that loses interviews. The fix is to be explicit:
b := append([]int(nil), source...) // explicit copy
Three-index slice
a[lo:hi:max] caps the new slice’s cap at max - lo. Use it when handing out a slice you don’t want the receiver to extend into your data.
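A short sketch contrasting a two-index and a three-index slice of the same array (appendClobbers is a hypothetical helper). Only the capped slice forces append to reallocate instead of writing into the original data.

```go
package main

import "fmt"

// appendClobbers reports whether appending to a sub-slice overwrote base[3],
// first for a plain two-index slice and then for a capacity-limited one.
func appendClobbers() (plain, capped bool) {
	base := []int{1, 2, 3, 4, 5}
	p := base[1:3]    // len 2, cap 4: room left in base
	_ = append(p, 99) // fits in cap, so it writes base[3]
	plain = base[3] == 99

	base = []int{1, 2, 3, 4, 5}
	c := base[1:3:3]  // len 2, cap 2: no spare room
	_ = append(c, 99) // exceeds cap, so a new backing array is allocated
	capped = base[3] == 99
	return
}

func main() {
	fmt.Println(appendClobbers()) // true false
}
```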
9. Map Internals
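Through Go 1.23, map[K]V is a hash table with bucket chaining (each bucket holds 8 entries, then chains overflow buckets); Go 1.24 replaced this with a Swiss-table design. In both, the hash is seeded randomly per map (collision-attack resistance), and iteration order is randomized as well.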
m := make(map[string]int, 1000) // pre-size to avoid grows
m["a"] = 1
delete(m, "a")
v, ok := m["a"]
Iteration order is randomized
Each for k := range m may produce a different order, even within one run. Don’t depend on it.
for k, v := range m {
// unspecified order
}
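When a stable order is needed, the usual approach is to collect the keys and sort them. A minimal sketch (sortedKeys is an illustrative helper):

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of m in ascending order.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	m := map[string]int{"b": 2, "a": 1, "c": 3}
	for _, k := range sortedKeys(m) {
		fmt.Println(k, m[k]) // a 1, then b 2, then c 3
	}
}
```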
Concurrent access
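Plain map is not safe for concurrent use once any goroutine writes (concurrent reads alone are fine). The race detector reports it, and the runtime’s best-effort check aborts the program with a fatal error that recover cannot catch. Use sync.RWMutex or sync.Map (with caveats from §5).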
fatal error: concurrent map writes
nil map
A nil map can be read (returns zero) but not written. A common bug:
var m map[string]int
m["a"] = 1 // PANIC
Use m := map[string]int{} or make(map[string]int).
Complexity
| Op | Avg | Worst |
|---|---|---|
| m[k] | O(1) | O(N) under collisions |
| m[k] = v | O(1) amortized | O(N) on grow |
| delete(m, k) | O(1) | O(N) |
| range m | O(N) | O(N) |
Maps generally do not shrink: deleting most keys frees slots for reuse but does not return the table’s memory. Re-create the map if you care.
10. Strings — Bytes vs Runes
A string is an immutable sequence of bytes (a two-word header: pointer and length). There is no stored rune count, and indexing returns bytes.
s := "héllo"
len(s) // 6 — UTF-8 bytes (é is 2)
s[0] // 'h' (a byte)
s[1] // first byte of é, NOT é
Iterate with range to get runes (decoded code points):
for i, r := range s {
// i: byte index, r: rune (int32 code point)
}
To get rune count: utf8.RuneCountInString(s).
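The byte/rune distinction in one runnable sketch. inspect is a hypothetical helper and assumes its input has at least two runes; the é is written as an escape so the example does not depend on file encoding.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// inspect returns the byte length, the rune count, and the second rune of s.
// It assumes s contains at least two runes.
func inspect(s string) (bytes, runes int, second rune) {
	bytes = len(s)                    // counts UTF-8 bytes
	runes = utf8.RuneCountInString(s) // counts decoded code points
	second = []rune(s)[1]             // converting to []rune decodes the string
	return
}

func main() {
	b, r, second := inspect("h\u00e9llo")
	fmt.Println(b, r, string(second)) // 6 5 é
}
```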
[]byte ↔ string conversion
Both directions copy by default (so the immutability invariant holds).
b := []byte(s) // copy
s2 := string(b) // copy
Hot paths can use unsafe.String with unsafe.SliceData, or unsafe.Slice with unsafe.StringData (Go 1.20+), for zero-copy conversion, but it’s a footgun: use it only if you can prove the underlying bytes won’t be mutated.
String concat
s += p in a loop allocates a new string on every iteration → O(N²) bytes copied overall. (A single expression such as a + b + c is one allocation.) Use strings.Builder:
var sb strings.Builder
for _, p := range parts {
sb.WriteString(p)
}
return sb.String()
strings.Builder reuses its buffer and avoids the final copy via unsafe.
11. Interfaces — itab, the nil != nil Trap
An interface value is two words: (itab *itab, data unsafe.Pointer). The itab holds the dynamic type + method table; data is the concrete value (or pointer to it). The empty interface (any) stores a bare type pointer in place of the itab.
// In package io: type Reader interface { Read(p []byte) (int, error) }
var r io.Reader // itab = nil, data = nil → r == nil
r = (*os.File)(nil) // itab ≠ nil, data = nil → r != nil !!
This is the Go gotcha. Rule: an interface is nil only when both its halves are nil. A typed nil pointer assigned to an interface is not nil.
The footgun in real code:
func mightFail() error {
var e *MyError = nil
if condition() { e = &MyError{...} }
return e // returning a typed-nil pointer -> caller sees != nil
}
Fix:
func mightFail() error {
if condition() { return &MyError{...} }
return nil // explicit nil interface
}
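A self-contained version of the trap, with a made-up MyError type, for checking the behavior yourself:

```go
package main

import "fmt"

// MyError is a hypothetical error type with a pointer receiver.
type MyError struct{ msg string }

func (e *MyError) Error() string { return e.msg }

// typedNil returns a nil *MyError wrapped in the error interface.
func typedNil() error {
	var e *MyError // nil pointer
	return e       // interface now holds (type *MyError, value nil)
}

// plainNil returns a nil interface: no type, no value.
func plainNil() error {
	return nil
}

func main() {
	fmt.Println(typedNil() != nil) // true: the type half is set
	fmt.Println(plainNil() != nil) // false
}
```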
Type assertions and type switches
v, ok := x.(string) // safe assertion
switch v := x.(type) {
case int: use(v)
case string: use(v)
default: ...
}
Asserting to a concrete type is O(1): the runtime compares the interface’s dynamic type pointer with the target type. Asserting to another interface type needs an itab lookup; results are cached, so it is still fast but not free.
12. Error Handling
Errors are values. Everything else is style.
v, err := operation()
if err != nil {
return fmt.Errorf("operation failed: %w", err)
}
Wrapping
%w (Go 1.13+) wraps an error, building a chain.
errors.Is(err, io.EOF) // walks the chain
var pathErr *os.PathError
errors.As(err, &pathErr) // unwraps to a specific type
Sentinel errors
var ErrNotFound = errors.New("not found")
return ErrNotFound
Compare with errors.Is, not == — wrapping breaks ==.
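A sketch that puts wrapping, errors.Is, and errors.As together. QueryError, lookup, and classify are hypothetical names for this example.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrNotFound = errors.New("not found")

// QueryError is a hypothetical error type that wraps an underlying cause.
type QueryError struct {
	Query string
	Err   error
}

func (e *QueryError) Error() string { return e.Query + ": " + e.Err.Error() }
func (e *QueryError) Unwrap() error { return e.Err }

// lookup always fails, wrapping the sentinel twice.
func lookup(key string) error {
	err := &QueryError{Query: key, Err: ErrNotFound}
	return fmt.Errorf("lookup failed: %w", err)
}

// classify shows three ways of inspecting err.
func classify(err error) (equal, is bool, query string) {
	equal = err == ErrNotFound       // direct comparison ignores wrapping
	is = errors.Is(err, ErrNotFound) // walks the Unwrap chain
	var qe *QueryError
	if errors.As(err, &qe) { // finds the first *QueryError in the chain
		query = qe.Query
	}
	return
}

func main() {
	fmt.Println(classify(lookup("user:42"))) // false true user:42
}
```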
panic / recover
panic unwinds stack frames running deferred functions. recover (in a deferred func) catches it. Use only for truly unexpected conditions (programmer bugs, “should never happen”). Not for control flow.
defer func() {
if r := recover(); r != nil {
log.Printf("recovered: %v", r)
}
}()
13. defer
defer schedules a call to run when the surrounding function returns.
f, err := os.Open(path)
if err != nil { return err }
defer f.Close()
Cost and gotchas
- Pre-1.14, defer was ~50ns. Since 1.14, “open-coded defers” are inlined for many cases, essentially free.
- Args are evaluated at the defer call site, not at execution:
i := 1
defer fmt.Println(i) // prints 1
i = 2
- LIFO ordering: deferred calls run in reverse.
- defer in a loop accumulates. Don’t defer f.Close() inside for over thousands of files; close manually or wrap the body in a function.
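Argument evaluation and LIFO order can be seen together in a few lines. deferOrder is an illustrative helper that records what each deferred call observed.

```go
package main

import "fmt"

// deferOrder records, in execution order, what two deferred calls saw.
func deferOrder() []string {
	var events []string
	func() {
		i := 1
		// The argument is evaluated now, so v is fixed at 1.
		defer func(v int) { events = append(events, fmt.Sprintf("arg=%d", v)) }(i)
		// The closure reads i when it runs, after the assignment below.
		defer func() { events = append(events, fmt.Sprintf("closure=%d", i)) }()
		i = 2
	}()
	return events
}

func main() {
	fmt.Println(deferOrder()) // [closure=2 arg=1]
}
```

The closure was deferred last and therefore runs first, and it sees the updated i because it captured the variable rather than a copy.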
14. Context
context.Context propagates deadlines, cancellation, and request-scoped values across API boundaries.
ctx, cancel := context.WithTimeout(parent, 5*time.Second)
defer cancel()
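req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
if err != nil { return err }
resp, err := http.DefaultClient.Do(req) // aborted when ctx expires or is cancelled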
Rules
- Pass ctx as the first parameter, never store it in a struct field for long-lived state.
- Always call cancel, even on success, to release resources. defer cancel() is the pattern.
- Don’t pass nil ctx; use context.TODO() if you don’t have one yet.
- ctx.Value is for request-scoped data (auth principal, request ID), not for optional config.
- A child context is cancelled when its parent is cancelled.
Detecting cancellation
select {
case <-ctx.Done():
return ctx.Err()
case v := <-work:
return process(v)
}
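A runnable sketch of the same pattern with a timeout. slowWork simulates work with a timer, and timesOut is a hypothetical helper that reports whether the deadline won.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowWork pretends to work for d, but gives up as soon as ctx is done.
func slowWork(ctx context.Context, d time.Duration) error {
	timer := time.NewTimer(d)
	defer timer.Stop()
	select {
	case <-timer.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// timesOut runs slowWork under a timeout and reports whether it hit the deadline.
func timesOut(timeout, work time.Duration) bool {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel() // releases the context's timer even on success
	return errors.Is(slowWork(ctx, work), context.DeadlineExceeded)
}

func main() {
	fmt.Println(timesOut(20*time.Millisecond, time.Second))  // true
	fmt.Println(timesOut(time.Second, 5*time.Millisecond))   // false
}
```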
15. Goroutine Leaks
A goroutine leak happens when a goroutine blocks forever: on a channel operation that never completes, a mutex that is never released, etc. The runtime never reclaims a blocked goroutine or the memory it references. In long-running services, leaks compound.
Common shape
func bad() <-chan int {
out := make(chan int) // unbuffered
go func() {
out <- expensive() // blocks forever if caller drops the chan
}()
return out
}
Fixes:
- Buffer the channel for one value, so the send completes even if no receiver ever shows up; the value is then collected together with the channel.
- Use select with ctx.Done().
go func() {
select {
case out <- expensive():
case <-ctx.Done():
}
}()
Detecting leaks
- go test with goleak (Uber library) at the end of tests.
- runtime.NumGoroutine() in production: a steadily growing number is a leak.
- pprof goroutine profile: curl http://localhost:6060/debug/pprof/goroutine?debug=2.
16. Testing and Benchmarking
Tests
func TestAdd(t *testing.T) {
if got := Add(1, 2); got != 3 {
t.Errorf("Add(1,2) = %d, want 3", got)
}
}
Table-driven tests
Idiomatic Go — readable, easy to extend.
tests := []struct {
name string
in, want int
}{
{"zero", 0, 0},
{"pos", 1, 2},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := Double(tt.in); got != tt.want {
t.Errorf("got %d want %d", got, tt.want)
}
})
}
Benchmarks
func BenchmarkX(b *testing.B) {
for i := 0; i < b.N; i++ {
X()
}
}
go test -bench=. runs them. b.ReportAllocs() (or the -benchmem flag) adds allocation stats. Always look at allocs/op: Go has no JIT that could optimize allocations away at run time, so they directly drive GC pressure.
go test -bench=. -benchmem
Fuzzing (Go 1.18+)
func FuzzParse(f *testing.F) {
f.Add("hello")
f.Fuzz(func(t *testing.T, s string) {
Parse(s)
})
}
Use for parsers, decoders, anything taking adversarial input.
17. Common Interview Gotchas
Loop variable capture (pre-1.22)
for _, v := range items {
go func() { process(v) }() // pre-1.22: all goroutines see last v
}
Fix pre-1.22: shadow v := v inside the loop. Go 1.22 fixed this for modules whose go.mod declares go 1.22 or later: each iteration has its own copy. State which Go version you’re on.
Slice aliasing
See §8.
Map iteration order
Randomized. Don’t rely on it. Don’t rely on it. Don’t rely on it.
Nil interface vs typed-nil pointer
See §11.
== on slices / maps / functions
Compile error, except for comparison against nil. Slices/maps/funcs aren’t comparable. Use slices.Equal / maps.Equal (Go 1.21+), reflect.DeepEqual, or write per-field comparison.
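A sketch of the standard-library alternatives, assuming Go 1.21 or later for the slices and maps packages (sameContents is an illustrative helper):

```go
package main

import (
	"fmt"
	"maps"
	"slices"
)

// sameContents compares two slices and two maps element by element.
func sameContents() (slicesEqual, mapsEqual bool) {
	a := []int{1, 2, 3}
	b := []int{1, 2, 3}
	m1 := map[string]int{"x": 1}
	m2 := map[string]int{"x": 1}
	// a == b and m1 == m2 would not compile; only comparison to nil is allowed.
	return slices.Equal(a, b), maps.Equal(m1, m2)
}

func main() {
	fmt.Println(sameContents()) // true true
}
```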
defer in a loop
for _, p := range paths {
f, _ := os.Open(p)
defer f.Close() // Hundreds of open files — close at func return
}
Wrap the body in a function or close explicitly.
Range over a channel
for v := range ch continues until ch is closed. If it is never closed, the receiving goroutine blocks forever and leaks.
Goroutine started with shared mutable state
data := []int{1, 2, 3}
go modify(&data) // race unless guarded
Always guard with mutex or send via channel.
18. Performance Hot Tips
- Pre-size slices and maps: make([]T, 0, n), make(map[K]V, n). Avoid resize churn.
- Avoid heap allocations in hot loops. Use -gcflags='-m' to find escape culprits. Reuse buffers via sync.Pool.
var bufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}
buf := bufPool.Get().(*bytes.Buffer)
defer func() { buf.Reset(); bufPool.Put(buf) }()
- strings.Builder for concatenation, bytes.Buffer for byte building.
- Prefer fixed-size arrays / structs over slices in tight code when size is known.
- Goroutines aren’t free. Spawning one per CPU-microtask in a 1ns loop is slower than the loop. They shine for IO and large work units.
- Avoid interface{} in hot paths. Boxing primitives heap-allocates and adds an itab indirection per call.
- Profile. go test -cpuprofile, pprof, the runtime tracer (go tool trace).
go test -bench=. -cpuprofile=cpu.out
go tool pprof -http=:8080 cpu.out
- runtime.GC() and debug.SetGCPercent are levers, not solutions. Reduce allocation first.
- sync.Pool is not a general-purpose cache; the runtime may drop its contents at any GC cycle. Use it for short-lived reusable buffers.
What To Memorize Cold
- GMP scheduler. Goroutines (~2KB) ≠ OS threads. M:N. P count = GOMAXPROCS.
- Goroutine stacks grow by copy. Network I/O via runtime poller, no M consumed.
- Channels: unbuffered = rendezvous. Send to closed → panic. Nil chan blocks forever.
- Memory model: race detector with -race. Channel send happens-before recv. Mutex unlock HB next lock.
- GC: concurrent tri-color mark-sweep, non-moving, sub-ms pauses. GOGC and SetMemoryLimit.
- Slices = (ptr, len, cap). append may alias or reallocate. Aliasing bugs are common.
- Maps: randomized iteration, not concurrent-safe, not comparable, panic on nil-map write.
- Strings: immutable bytes. Range over string yields runes.
- Interface = (itab, data). Typed-nil pointer in interface ≠ nil interface.
- Loop var capture fixed in Go 1.22.
- defer cheap since 1.14, args eval at scheduling time.
- context first arg, always defer cancel().
- Goroutine leaks via blocked channels: select on ctx.Done().
- Pre-size slices/maps. sync.Pool for buffer reuse. pprof for everything else.
When any of those is hazy, write a 10-line program that tickles it. The race detector and -gcflags='-m' are unusually fast feedback loops compared to other languages.