Go Runtime Deep Dive

Target audience: candidates interviewing in Go for infrastructure, distributed systems, cloud-native (Kubernetes / Docker / etcd ecosystems), or any backend role where the interviewer asks “explain how goroutines actually work.”

Scope: gc (the standard Go compiler) on Go 1.22+. gccgo and TinyGo are mentioned only where they change interview-grade answers.

Go’s surface looks small. The runtime is not. The interview gap appears immediately when an interviewer asks “what’s the difference between a goroutine and an OS thread?”, “what does nil != nil even mean?”, or “why does this loop variable do that?”. This guide trains the answers.


1. Runtime Overview — M:N Scheduling

Go runs your code under a runtime linked into every binary. The runtime owns:

  • The goroutine scheduler (M:N — many goroutines onto few OS threads).
  • The garbage collector (concurrent tri-color mark-sweep).
  • The memory allocator (TCMalloc-derived, per-P caches).
  • Channel and sync primitives, network poller, timers, profilers.

A “goroutine” is not an OS thread. It’s a small task (~2 KB initial stack) that the runtime schedules in user space and multiplexes onto a pool of OS threads. The runtime can run thousands of goroutines on a handful of threads.

// A million goroutines is feasible, given a few GB of RAM for their stacks.
for i := 0; i < 1_000_000; i++ {
    go func() { /* ... */ }()
}

This is feasible because each goroutine starts with ~2 KB of stack (vs a default reservation of roughly 1–8 MB for an OS thread, depending on the platform) and the stack grows as needed.

Stack growth

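Goroutine stacks were segmented (“split”) in early releases and have been contiguous since Go 1.3: when a goroutine outgrows its stack, the runtime allocates a larger one, copies all frames across, and adjusts every pointer into the old stack. Escape analysis guarantees that pointers to stack variables live only on that same stack, so the runtime can find and fix all of them. This is why taking the address of a local variable is safe in Go: either the variable stays on the stack and its pointers are updated when the stack moves, or it escapes and is heap-allocated.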
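
A minimal sketch of the point above, assuming a 64-bit platform with the default maximum stack size. The helper names grow and SurvivesGrowth are made up for illustration; the recursion depth is arbitrary and only needs to be large enough to force several stack copies.

```go
package main

import "fmt"

// grow recurses n levels; the 256-byte local in every frame forces the
// runtime to grow (copy) the current goroutine's stack several times.
func grow(n int) int {
	var pad [256]byte
	pad[n%256] = 1
	if n == 0 {
		return int(pad[0])
	}
	return grow(n-1) + int(pad[n%256]) - 1
}

// SurvivesGrowth takes the address of a local in a fresh goroutine,
// triggers stack growth, and then reads the local back through the pointer.
func SurvivesGrowth() int {
	result := make(chan int)
	go func() {
		x := 42
		p := &x // p never leaves this function, so x can stay on the stack
		grow(50_000)
		result <- *p // still valid after the stack was copied
	}()
	return <-result
}

func main() {
	fmt.Println(SurvivesGrowth()) // 42
}
```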


2. The GMP Scheduler

Three runtime objects:

Symbol   Stands for   What it is
G        Goroutine    A goroutine: stack + program counter + status
M        Machine      An OS thread
P        Processor    A logical scheduler context; holds a runnable G queue

Number of P’s = GOMAXPROCS (default: number of CPUs). Each P has a local runnable queue. M’s bind to a P to execute G’s; an M without a P cannot run Go code.

    P0 [G G G G ...]    P1 [G G ...]   P2 [G G G G G ...]
     │                   │                │
     M0                  M1               M2     (OS threads)
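
To make the G/P numbers tangible, here is a small sketch using only the runtime package's query functions. ParkGoroutines is a hypothetical helper; the goroutine count it reports includes the main goroutine and any runtime-internal ones.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// ParkGoroutines starts n goroutines that block on a channel and reports
// the P count and how many goroutines exist while they are parked.
func ParkGoroutines(n int) (procs, goroutines int) {
	release := make(chan struct{})
	var started, done sync.WaitGroup
	for i := 0; i < n; i++ {
		started.Add(1)
		done.Add(1)
		go func() {
			defer done.Done()
			started.Done()
			<-release // parked: this G occupies no M and no P while waiting
		}()
	}
	started.Wait()
	procs = runtime.GOMAXPROCS(0) // 0 queries the P count without changing it
	goroutines = runtime.NumGoroutine()
	close(release)
	done.Wait()
	return procs, goroutines
}

func main() {
	p, g := ParkGoroutines(10_000)
	fmt.Printf("P's: %d, goroutines while parked: %d\n", p, g)
}
```

On a typical laptop this reports a single-digit or low double-digit P count next to more than ten thousand goroutines, which is the M:N ratio in practice.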

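Work stealing

When a P’s local queue is empty, it checks the global queue and the network poller, then steals half of the runnable G’s from a randomly chosen other P. This keeps cores busy without funnelling every scheduling decision through a global lock.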

What happens on a blocking syscall

The M making the syscall detaches from its P and blocks. The P picks up another M (creating one if needed) and keeps scheduling. When the syscall returns, the original M tries to reacquire a P; if none is free it parks the G on the global queue.

This is why read(fd, ...) on a regular file blocks an OS thread but does not block your other goroutines — they keep running on other M’s.

Network poller

Network I/O is epoll/kqueue/IOCP under the hood. A goroutine doing conn.Read parks itself, registers with the poller, and another goroutine runs. When the fd is readable, the poller wakes the parked G. No M is consumed while parked. This is why Go scales to 100K+ concurrent network connections trivially.

Preemption

Up to Go 1.13, goroutines could be preempted only at function prologues (the stack-growth check), so a tight CPU loop without function calls could starve others. Since 1.14, asynchronous preemption uses OS signals to interrupt a goroutine at almost any safe point, including inside such loops.

// Pre-1.14, this could starve everything else; today it's preempted.
go func() { for {} }()

Interview framing

“What’s the difference between a goroutine and a thread?”

Goroutine: ~2 KB initial stack that grows on demand, preempted cooperatively and (since 1.14) by signal, scheduled by the Go runtime onto a pool of OS threads. Thread: fixed MB-scale stack reservation, OS-scheduled, costlier context switches. Goroutines are the unit you think about; M’s are an implementation detail.


3. Goroutines vs Threads — Practical Implications

// I/O fanout pattern
results := make(chan Result, len(urls))
for _, u := range urls {
    u := u                   // pre-1.22: required to capture
    go func() {
        results <- fetch(u)
    }()
}
for range urls {
    r := <-results
    process(r)
}

Costs (rough orders of magnitude; measure on your own hardware):

  • Goroutine creation: ~1µs.
  • Channel ops: ~50–100ns uncontended; an uncontended mutex lock/unlock is usually cheaper.
  • Context switch: ~200ns within the Go runtime; blocking syscalls add OS thread cost.

Sharp edge: unlike OS threads, goroutines expose no ID to user code. This is by design: it discourages thread-local-state patterns, and it breaks naïve ports of Java idioms such as ThreadLocal.


4. Channels — Buffered, Unbuffered, select

A channel is a typed bounded queue with built-in synchronization.

Construct         Behavior
make(chan T)      Unbuffered: send and recv must rendezvous. Sender blocks until a receiver is ready.
make(chan T, n)   Buffered: sender blocks only when buffer is full.
close(ch)         Recvs drain remaining values, then receive zero values. Send to closed → panic.

ch := make(chan int, 2)
ch <- 1
ch <- 2
close(ch)
for v := range ch { fmt.Println(v) }   // 1, 2

select

Multiplexes channels: picks any ready case, choosing pseudo-randomly when several are ready. This makes it possible to compose producers and consumers with timeouts and cancellation.

select {
case v := <-in:
    use(v)
case out <- val:
    // ...
case <-time.After(2 * time.Second):
    return errors.New("timeout")
case <-ctx.Done():
    return ctx.Err()
}

Nil-channel pattern

A nil channel blocks forever. Setting a case’s channel to nil disables it:

var done chan struct{} = nil
// case <-done: never fires

Useful when iterating over multiple channels and “turning off” one as it completes.
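
A sketch of that “turning off” idea, assuming two input channels that are both eventually closed. Merge is a hypothetical helper, and the order of the merged values is not deterministic.

```go
package main

import "fmt"

// Merge collects values from a and b until both are closed. When one
// channel closes, its variable is set to nil so that its select case
// blocks forever and is effectively disabled.
func Merge(a, b <-chan int) []int {
	var out []int
	for a != nil || b != nil {
		select {
		case v, ok := <-a:
			if !ok {
				a = nil // disable this case
				continue
			}
			out = append(out, v)
		case v, ok := <-b:
			if !ok {
				b = nil // disable this case
				continue
			}
			out = append(out, v)
		}
	}
	return out
}

func main() {
	a, b := make(chan int, 2), make(chan int, 2)
	a <- 1
	a <- 2
	b <- 10
	b <- 20
	close(a)
	close(b)
	fmt.Println(len(Merge(a, b))) // 4
}
```

Without the nil assignment, a closed channel would be ready on every pass and the loop would spin on zero values.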

Closing semantics

  • Receivers detect close with v, ok := <-ch; ok is false on closed-and-drained.
  • Only the sender should close. Closing from the receiver side requires extra coordination, because a send on a closed channel panics (as does closing an already-closed channel).
  • Don’t close a channel just to “free” it; let the GC handle that.
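
The v, ok form from the list above can be exercised in a few lines. This is an illustrative sketch; Drain is a made-up name.

```go
package main

import "fmt"

// Drain receives from a closed, buffered channel until ok turns false,
// returning the buffered values plus the final (zero value, false) pair.
func Drain() (values []int, zero int, ok bool) {
	ch := make(chan int, 2)
	ch <- 1
	ch <- 2
	close(ch) // buffered values remain receivable after close

	for {
		v, open := <-ch
		if !open {
			return values, v, open
		}
		values = append(values, v)
	}
}

func main() {
	values, zero, ok := Drain()
	fmt.Println(values, zero, ok) // [1 2] 0 false
}
```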

5. Sync Primitives

Primitive        Use for
sync.Mutex       Mutual exclusion
sync.RWMutex     Many readers / few writers (do measure; RW often loses to plain Mutex)
sync.Once        Idempotent one-time init
sync.WaitGroup   Wait for N goroutines
sync.Cond        Condition variable; rarely needed (channels usually clearer)
sync/atomic      CAS, atomic add/load/store on int32/int64/pointer
sync.Map         Concurrent map, only when read-mostly or with disjoint key sets

var mu sync.Mutex
mu.Lock()
defer mu.Unlock()
// critical section

sync.Map is not always faster

It’s optimized for two specific patterns:

  1. Stable disjoint key sets per goroutine.
  2. Mostly reads, rare writes.

For everything else, a regular map[K]V + sync.Mutex (or shards) is faster and clearer.
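
As a baseline for that advice, a mutex-guarded map might look like the sketch below. Counter and its methods are hypothetical names, not a standard-library type.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter is a plain map guarded by a mutex.
type Counter struct {
	mu sync.Mutex
	m  map[string]int
}

func NewCounter() *Counter { return &Counter{m: make(map[string]int)} }

func (c *Counter) Inc(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key]++
}

func (c *Counter) Get(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.m[key]
}

// CountConcurrently has workers goroutines each increment one shared key
// perWorker times, then returns the final count.
func CountConcurrently(workers, perWorker int) int {
	c := NewCounter()
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				c.Inc("hits")
			}
		}()
	}
	wg.Wait()
	return c.Get("hits")
}

func main() {
	fmt.Println(CountConcurrently(8, 1000)) // 8000
}
```

If profiling shows the single lock is contended, splitting the map into several Counter shards keyed by hash is the usual next step.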

WaitGroup

var wg sync.WaitGroup
for _, x := range data {
    wg.Add(1)
    go func(x Item) {
        defer wg.Done()
        process(x)
    }(x)
}
wg.Wait()

Trap: call wg.Add before the go statement, never inside the new goroutine; otherwise wg.Wait can run before the counter is incremented and return early.


6. Memory Model

Go has a documented memory model (re-articulated in 2022 for clarity). Key rules:

  • A read sees writes that happen before it.
  • Within a goroutine: program order.
  • Goroutine creation happens before its first instruction.
  • A send on a channel happens before the corresponding receive completes.
  • Close of a channel happens before a receive that returns the zero value due to close.
  • m.Unlock happens before subsequent m.Lock.
  • sync/atomic: atomic operations behave as if executed in one sequentially consistent order; if an atomic read observes an atomic write, the write happens before the read.

// Without sync, this is racy.
var data []int
var ready bool

go func() {
    data = makeData()
    ready = true        // RACE — no HB to the reader
}()

for !ready {}           // may loop forever (compiler/CPU can hoist)
use(data)

Fix with channel, mutex, or atomic.
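
One possible channel-based fix, sketched with a hypothetical Publish helper. It relies on the rule listed earlier that closing a channel happens before a receive that observes the close.

```go
package main

import "fmt"

// Publish hands data from one goroutine to another. close(ready) happens
// before the receive that observes the close, so the write to data is
// guaranteed to be visible to the reader.
func Publish() []int {
	var data []int
	ready := make(chan struct{})
	go func() {
		data = []int{1, 2, 3}
		close(ready)
	}()
	<-ready // blocks instead of spinning; establishes the happens-before edge
	return data
}

func main() {
	fmt.Println(Publish()) // [1 2 3]
}
```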

Race detector

Always run tests with -race:

go test -race ./...

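It instruments memory accesses and reports data races that actually occur during the run, so it does not flag merely suspicious code, but it also only covers the paths your tests exercise. Expect roughly 5–10× memory and 2–20× CPU overhead; that is still cheap insurance, and one of Go’s killer features.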


7. Garbage Collector — Concurrent Tri-color Mark-Sweep

Go’s GC is concurrent, non-moving, tri-color mark-sweep with write barriers.

  • Tri-color: white = not yet visited, grey = visited but its pointers not yet scanned, black = visited and fully scanned. Objects still white when marking ends are garbage.
  • Write barrier: intercepts pointer writes during mark to maintain invariants while the mutator runs.
  • Non-moving: objects don’t relocate. Pointers stay stable. (Trade-off: no compaction, more fragmentation.)

Pause time

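Two short stop-the-world phases per cycle, typically well under a millisecond: one to enable the write barrier when marking starts, and one at mark termination. Goroutine stacks are scanned concurrently (each goroutine pauses only briefly while its own stack is scanned), and most marking runs alongside your program. There is no “young generation”: Go’s GC is non-generational.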

Pacing

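With the default GOGC=100, the GC aims to finish a cycle by the time the heap has grown about 100% over the live heap left by the previous collection (since Go 1.18 the target also counts GC roots such as stacks and globals). Lower GOGC for a smaller footprint at the cost of CPU; raise it for fewer collections at the cost of memory.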

GOGC=200 ./app   # GC less often
GOGC=off ./app   # disable (for benchmarks)

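Soft memory limit

debug.SetMemoryLimit(n) from runtime/debug, or the GOMEMLIMIT environment variable (both Go 1.19+), sets a soft limit on the runtime’s total memory; the GC works harder as usage approaches it. There is no hard limit: the runtime will exceed the value rather than fail an allocation. Useful in containers, where setting it to around 0.9 * cgroup_limit reduces the risk of OOM-kills.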
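
A minimal sketch of setting the limit programmatically. The 512 MiB container size is a made-up figure, and ApplyLimit / CurrentLimit are hypothetical wrappers; a real service would read the cgroup limit from its environment rather than hard-code it.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// ApplyLimit sets the soft memory limit to 90% of containerLimit bytes
// and returns the previously configured limit.
func ApplyLimit(containerLimit int64) (previous int64) {
	return debug.SetMemoryLimit(containerLimit / 10 * 9)
}

// CurrentLimit reads the limit without changing it: a negative input
// leaves the setting untouched and returns the current value.
func CurrentLimit() int64 { return debug.SetMemoryLimit(-1) }

func main() {
	const containerLimit = 512 << 20 // hypothetical 512 MiB cgroup limit
	prev := ApplyLimit(containerLimit)
	fmt.Println("previous limit:", prev)
	fmt.Println("current limit: ", CurrentLimit())
}
```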

Escape analysis

The compiler decides at compile time whether a value can stay on the stack. If a pointer “escapes” the function, the value is heap-allocated.

func f() *int {
    x := 1
    return &x      // escapes — heap allocation
}

go run -gcflags='-m' main.go
# prints, among other lines: moved to heap: x

Knowing what allocates lets you avoid GC pressure in hot paths. Stack allocation is essentially free; heap allocation costs ~30ns + future GC scan.
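
Besides reading the -m output, allocations can be counted directly. The sketch below uses testing.AllocsPerRun from an ordinary program; escapes, stays, and sink are illustrative names, and the global sink exists only to force an escape.

```go
package main

import (
	"fmt"
	"testing"
)

var sink *int // package-level pointer: storing into it forces an escape

// escapes publishes &x through a global, so x must live on the heap.
func escapes() {
	x := 1
	sink = &x
}

// stays uses &x only locally, so x can remain on the stack.
func stays() int {
	x := 1
	p := &x
	return *p + 1
}

// MeasureAllocs reports average heap allocations per call for both functions.
func MeasureAllocs() (escaping, local float64) {
	escaping = testing.AllocsPerRun(1000, escapes)
	local = testing.AllocsPerRun(1000, func() { _ = stays() })
	return escaping, local
}

func main() {
	e, l := MeasureAllocs()
	fmt.Printf("escapes: %.0f allocs/op, stays: %.0f allocs/op\n", e, l)
}
```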


8. Slice Internals

A slice is a 3-word struct: (ptr *T, len int, cap int). Slicing does not copy — it’s a view.

a := []int{1, 2, 3, 4, 5}
b := a[1:4]      // [2 3 4], cap=4 (from index 1 to end of underlying array)
b[0] = 99
fmt.Println(a)   // [1 99 3 4 5] — shared backing array!

append semantics

b = append(b, 10)   // if len < cap: in place; else allocate new backing array

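Growth: capacity doubles while the slice is small (under 256 elements since Go 1.18; the threshold was 1024 before), then the factor eases toward ~1.25×, with the result rounded up to an allocator size class. After a reallocation, the new slice’s backing array is independent of any older slice that still points at the old one.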

a := []int{1, 1, 1, 1}    // len 4, cap 4
b := a[:2]
c := append(b, 99)        // overwrites a[2]
fmt.Println(a, c)         // [1 1 99 1] [1 1 99]
d := append(c, 1, 2, 3)   // reallocates; d disjoint from a

This is the slice aliasing gotcha that loses interviews. The fix is to be explicit:

b := append([]int(nil), source...)   // explicit copy

Three-index slice

a[lo:hi:max] caps the new slice’s cap at max - lo. Use it when handing out a slice you don’t want the receiver to extend into your data.
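
A short sketch of that defensive use. Prefix is a hypothetical helper that hands out the first n elements with the capacity clamped.

```go
package main

import "fmt"

// Prefix returns the first n elements of s with capacity clamped to n,
// so an append by the caller must reallocate instead of writing into s.
func Prefix(s []int, n int) []int { return s[:n:n] }

func main() {
	a := []int{1, 2, 3, 4}
	b := Prefix(a, 2)
	c := append(b, 99) // cap(b) == 2, so append copies to a new array
	fmt.Println(a, c)  // [1 2 3 4] [1 2 99]
}
```

With a plain a[:2] the same append would have overwritten a[2].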


9. Map Internals

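Through Go 1.23, map[K]V is a hash table of buckets that hold 8 entries each and chain to overflow buckets; Go 1.24 replaced this with a Swiss Table design. In both, the hash seed is randomized per map (for collision-attack resistance), and iteration order is randomized as well.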

m := make(map[string]int, 1000)  // pre-size to avoid grows
m["a"] = 1
delete(m, "a")
v, ok := m["a"]

Iteration order is randomized

Each for k := range m may produce a different order, even within one run. Don’t depend on it.

for k, v := range m {
    // unspecified order
}
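
When a stable order is needed, the usual approach is to sort the keys first. A sketch with a hypothetical SortedKeys helper, written against the sort package so it does not depend on newer iterator helpers:

```go
package main

import (
	"fmt"
	"sort"
)

// SortedKeys returns the keys of m in ascending order.
func SortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	m := map[string]int{"b": 2, "c": 3, "a": 1}
	for _, k := range SortedKeys(m) {
		fmt.Println(k, m[k]) // a 1, then b 2, then c 3
	}
}
```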

Concurrent access

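A plain map is not safe for concurrent use when any goroutine writes. The race detector reports such accesses, and the runtime makes a best-effort check that aborts the process with an unrecoverable fatal error (not a panic you can recover from). Guard the map with sync.Mutex or sync.RWMutex, or use sync.Map (with the caveats from §5).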

fatal error: concurrent map writes

nil map

A nil map can be read (returns zero) but not written. A common bug:

var m map[string]int
m["a"] = 1   // panic: assignment to entry in nil map

Use m := map[string]int{} or make(map[string]int).

Complexity

Op             Avg              Worst
m[k]           O(1)             O(N) under collisions
m[k] = v       O(1) amortized   O(N) on grow
delete(m, k)   O(1)             O(N)
range m        O(N)             O(N)

Maps generally do not shrink: deleting most keys does not return the table’s memory to the allocator. Re-create the map if you care.


10. Strings — Bytes vs Runes

A string is an immutable sequence of bytes, represented as a two-word (pointer, length) header. It stores no rune count, and indexing returns bytes.

s := "héllo"
len(s)     // 6 — UTF-8 bytes (é is 2)
s[0]       // 'h' (a byte)
s[1]       // first byte of é, NOT é

Iterate with range to get runes (decoded code points):

for i, r := range s {
    // i: byte index, r: rune (int32 code point)
}

To get rune count: utf8.RuneCountInString(s).

[]byte ↔ string conversion

Both directions copy by default (so the immutability invariant holds).

b := []byte(s)      // copy
s2 := string(b)     // copy

Hot paths can use unsafe.String and unsafe.Slice with unsafe.StringData (the string helpers arrived in Go 1.20) for zero-copy conversion, but it’s a footgun: use it only if you can prove the underlying bytes won’t be mutated.

String concat

Repeated s += p in a loop allocates a new string each iteration → O(N²) bytes copied. (A single expression such as a + b + c is one allocation.) Use strings.Builder:

var sb strings.Builder
for _, p := range parts {
    sb.WriteString(p)
}
return sb.String()

strings.Builder reuses its buffer and avoids the final copy via unsafe.


11. Interfaces — itab, the nil != nil Trap

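An interface value is two words: (tab *itab, data unsafe.Pointer). The itab holds the dynamic type + method table; data points to the concrete value (pointer-shaped values are stored directly). The empty interface (any) stores a bare type pointer in place of the itab.

// In package io:
type Reader interface { Read(p []byte) (int, error) }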

var r io.Reader      // itab = nil, data = nil  → r == nil
r = (*os.File)(nil)  // itab ≠ nil, data = nil  → r != nil  !!

This is the Go gotcha. Rule: an interface is nil only when both its halves are nil. A typed nil pointer assigned to an interface is not nil.

The footgun in real code:

func mightFail() error {
    var e *MyError = nil
    if condition() { e = &MyError{...} }
    return e   // returning a typed-nil pointer -> caller sees != nil
}

Fix:

func mightFail() error {
    if condition() { return &MyError{...} }
    return nil   // explicit nil interface
}
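
The trap and its fix can be reproduced end to end. In this sketch MyError is given a minimal definition for illustration; Buggy and Fixed are hypothetical names.

```go
package main

import "fmt"

// MyError is a minimal error type for illustration.
type MyError struct{ Msg string }

func (e *MyError) Error() string { return e.Msg }

// Buggy returns a typed nil pointer wrapped in the error interface.
func Buggy() error {
	var e *MyError // nil pointer with a concrete type
	return e       // the interface now holds (*MyError, nil)
}

// Fixed returns an untyped nil, producing a truly nil interface.
func Fixed() error { return nil }

func main() {
	fmt.Println(Buggy() == nil) // false
	fmt.Println(Fixed() == nil) // true
}
```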

Type assertions and type switches

v, ok := x.(string)         // safe assertion
switch v := x.(type) {
case int:    use(v)
case string: use(v)
default:     ...
}

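Asserting to a concrete type (x.(string)) is O(1) for any interface: the runtime compares one type pointer. Asserting to another interface type (x.(io.Reader)) needs an itab for that (type, interface) pair; the runtime looks it up in a cache and, on a miss, builds it by matching method tables. Still fast, but not free.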


12. Error Handling

Errors are values. Everything else is style.

v, err := operation()
if err != nil {
    return fmt.Errorf("operation failed: %w", err)
}

Wrapping

%w (Go 1.13+) wraps an error, building a chain.

errors.Is(err, io.EOF)            // walks the chain
var pathErr *os.PathError
errors.As(err, &pathErr)          // unwraps to a specific type

Sentinel errors

var ErrNotFound = errors.New("not found")
return ErrNotFound

Compare with errors.Is, not == — wrapping breaks ==.
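
A runnable sketch of the difference. QueryError, Lookup, Query, IsNotFound, and FailedKey are hypothetical names introduced only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrNotFound = errors.New("not found")

// QueryError is a typed error carrying extra context.
type QueryError struct{ Key string }

func (e *QueryError) Error() string { return "query failed for " + e.Key }

// Lookup wraps the sentinel; Query wraps a typed error.
func Lookup(key string) error { return fmt.Errorf("lookup %q: %w", key, ErrNotFound) }
func Query(key string) error  { return fmt.Errorf("query: %w", &QueryError{Key: key}) }

// IsNotFound reports whether ErrNotFound appears anywhere in err's chain.
func IsNotFound(err error) bool { return errors.Is(err, ErrNotFound) }

// FailedKey extracts the key from a *QueryError anywhere in err's chain.
func FailedKey(err error) (string, bool) {
	var qe *QueryError
	if errors.As(err, &qe) {
		return qe.Key, true
	}
	return "", false
}

func main() {
	err := Lookup("a")
	fmt.Println(err == ErrNotFound) // false: err is the wrapper
	fmt.Println(IsNotFound(err))    // true: errors.Is walks the chain
	fmt.Println(FailedKey(Query("b"))) // b true
}
```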

panic / recover

panic unwinds the stack, running deferred functions as it goes. recover stops the unwinding, but only when called directly from a deferred function. Use panic only for truly unexpected conditions (programmer bugs, “should never happen”). Not for control flow.

defer func() {
    if r := recover(); r != nil {
        log.Printf("recovered: %v", r)
    }
}()

13. defer

defer schedules a call to run when the surrounding function returns.

f, err := os.Open(path)
if err != nil { return err }
defer f.Close()

Cost and gotchas

  • Pre-1.14, defer was ~50ns. Since 1.14, “open-coded defers” are inlined for many cases — essentially free.

  • Args are evaluated at the defer call site, not at execution:

    i := 1
    defer fmt.Println(i)   // prints 1
    i = 2
    
  • LIFO ordering — deferred calls run in reverse.

  • defer in a loop accumulates. Don’t defer f.Close() inside for over thousands of files; close manually or wrap the body in a function.
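
The “wrap the body in a function” option might look like this sketch. The resource type is a made-up stand-in for a file handle so the example runs without touching the filesystem.

```go
package main

import "fmt"

// resource stands in for a file handle and records open/close events.
type resource struct {
	name string
	log  *[]string
}

func open(name string, log *[]string) *resource {
	*log = append(*log, "open "+name)
	return &resource{name: name, log: log}
}

func (r *resource) Close() { *r.log = append(*r.log, "close "+r.name) }

// ProcessAll wraps the loop body in a function literal so each deferred
// Close runs at the end of its own iteration, not at function return.
func ProcessAll(names []string) []string {
	var log []string
	for _, n := range names {
		func() {
			r := open(n, &log)
			defer r.Close()
			// ... work with r ...
		}()
	}
	return log
}

func main() {
	fmt.Println(ProcessAll([]string{"a", "b"})) // [open a close a open b close b]
}
```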


14. Context

context.Context propagates deadlines, cancellation, and request-scoped values across API boundaries.

ctx, cancel := context.WithTimeout(parent, 5*time.Second)
defer cancel()

req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
if err != nil { return err }
resp, err := http.DefaultClient.Do(req)   // aborted when ctx expires

Rules

  1. Pass ctx as the first parameter; never store it in a struct field for long-lived state.
  2. Always call cancel — even on success — to release resources. defer cancel() is the pattern.
  3. Don’t pass nil ctx; use context.TODO() if you don’t have one yet.
  4. ctx.Value is for request-scoped data (auth principal, request ID), not for optional config.
  5. A child context is cancelled when its parent is cancelled.

Detecting cancellation

select {
case <-ctx.Done():
    return ctx.Err()
case v := <-work:
    return process(v)
}
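
The same select, made runnable under a deadline. Await and TimedOut are hypothetical helpers, and the 20 ms timeout is arbitrary.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Await returns the first value from work, or ctx.Err() if the context
// ends first.
func Await(ctx context.Context, work <-chan int) (int, error) {
	select {
	case <-ctx.Done():
		return 0, ctx.Err()
	case v := <-work:
		return v, nil
	}
}

// TimedOut waits on a channel that never delivers, under a short deadline.
func TimedOut() error {
	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel() // releases the timer even though the deadline fires first
	_, err := Await(ctx, make(chan int))
	return err
}

func main() {
	fmt.Println(TimedOut()) // context deadline exceeded
}
```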

15. Goroutine Leaks

A goroutine leak happens when a goroutine blocks forever: on a channel with no counterpart, on a mutex that is never released, and so on. The runtime cannot reclaim a blocked goroutine or anything its stack references. In long-running services, leaks compound.

Common shape

func bad() <-chan int {
    out := make(chan int)         // unbuffered
    go func() {
        out <- expensive()        // blocks forever if caller drops the chan
    }()
    return out
}

Fixes:

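  • Buffer the channel for one value, so the send always completes and the goroutine exits even if nobody receives.
  • Use select with ctx.Done():

go func() {
    select {
    case out <- expensive():   // expensive() runs first; select only guards the send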
    case <-ctx.Done():
    }
}()
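
The buffered-channel fix can be checked deterministically. In this sketch expensive is a trivial placeholder, and the extra exited channel exists only to make the goroutine’s exit observable; production code would not need it.

```go
package main

import "fmt"

func expensive() int { return 42 } // placeholder for real work

// Produce returns a one-slot buffered channel. The send always completes,
// so the goroutine exits even if the caller never receives.
func Produce() (out <-chan int, exited <-chan struct{}) {
	ch := make(chan int, 1)
	done := make(chan struct{})
	go func() {
		defer close(done)
		ch <- expensive() // never blocks: the buffer has room for one value
	}()
	return ch, done
}

func main() {
	_, exited := Produce() // caller drops the result channel
	<-exited               // the goroutine still finishes
	fmt.Println("producer exited without a receiver")
}
```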

Detecting leaks

  • go test with goleak (Uber library) at the end of tests.
  • runtime.NumGoroutine() in production — a steadily growing number is a leak.
  • pprof goroutine profile: curl http://localhost:6060/debug/pprof/goroutine?debug=2.

16. Testing and Benchmarking

Tests

func TestAdd(t *testing.T) {
    if got := Add(1, 2); got != 3 {
        t.Errorf("Add(1,2) = %d, want 3", got)
    }
}

Table-driven tests

Idiomatic Go — readable, easy to extend.

tests := []struct {
    name    string
    in, want int
}{
    {"zero", 0, 0},
    {"pos",  1, 2},
}
for _, tt := range tests {
    t.Run(tt.name, func(t *testing.T) {
        if got := Double(tt.in); got != tt.want {
            t.Errorf("got %d want %d", got, tt.want)
        }
    })
}

Benchmarks

func BenchmarkX(b *testing.B) {
    for i := 0; i < b.N; i++ {
        X()
    }
}

go test -bench=. runs them. b.ReportAllocs() (or the -benchmem flag) adds allocation counts. Always look at allocs/op: Go is compiled ahead of time, so there is no JIT warm-up to blur the numbers, and allocations directly drive GC pressure.

go test -bench=. -benchmem

Fuzzing (Go 1.18+)

func FuzzParse(f *testing.F) {
    f.Add("hello")
    f.Fuzz(func(t *testing.T, s string) {
        Parse(s)
    })
}

Use for parsers, decoders, anything taking adversarial input.


17. Common Interview Gotchas

Loop variable capture (pre-1.22)

for _, v := range items {
    go func() { process(v) }()    // pre-1.22: all goroutines see last v
}

Fix pre-1.22: shadow v := v inside the loop. Go 1.22 gives each iteration its own copy, for modules whose go.mod declares go 1.22 or later. State which Go version you’re on.

Slice aliasing

See §8.

Map iteration order

Randomized. Don’t rely on it. Don’t rely on it. Don’t rely on it.

Nil interface vs typed-nil pointer

See §11.

== on slices / maps / functions

Compile error. Slices, maps, and funcs aren’t comparable (except against nil). Use slices.Equal / maps.Equal (Go 1.21+), reflect.DeepEqual, or a hand-written comparison.
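
A brief sketch of the standard-library helpers, assuming Go 1.21 or later. SameInts and SameCounts are hypothetical wrapper names.

```go
package main

import (
	"fmt"
	"maps"
	"slices"
)

// SameInts compares two slices element by element; a == b would not compile.
func SameInts(a, b []int) bool { return slices.Equal(a, b) }

// SameCounts compares two maps key by key and value by value.
func SameCounts(a, b map[string]int) bool { return maps.Equal(a, b) }

func main() {
	fmt.Println(SameInts([]int{1, 2}, []int{1, 2}))                         // true
	fmt.Println(SameCounts(map[string]int{"a": 1}, map[string]int{"a": 2})) // false
}
```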

defer in a loop

for _, p := range paths {
    f, _ := os.Open(p)
    defer f.Close()             // Hundreds of open files — close at func return
}

Wrap the body in a function or close explicitly.

Range over a channel

for v := range ch continues until ch is closed. If it is never closed, the receiving goroutine blocks forever and leaks.

Goroutine started with shared mutable state

data := []int{1, 2, 3}
go modify(&data)              // race unless guarded

Always guard with mutex or send via channel.


18. Performance Hot Tips

  • Pre-size slices and maps: make([]T, 0, n), make(map[K]V, n). Avoid resize churn.

  • Avoid heap allocations in hot loops. Use -gcflags='-m' to find escape culprits. Reuse buffers via sync.Pool.

    var bufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}
    buf := bufPool.Get().(*bytes.Buffer)
    defer func() { buf.Reset(); bufPool.Put(buf) }()
    
  • strings.Builder for concatenation, bytes.Buffer for byte building.

  • Prefer fixed-size arrays / structs over slices in tight code when size is known.

  • Goroutines aren’t free. Spawning one per CPU-microtask in a 1ns loop is slower than the loop. They shine for IO and large work units.

  • Avoid interface{} in hot paths. Boxing a value can heap-allocate, and method calls through an interface are indirect and usually not inlined.

  • Profile. go test -cpuprofile, pprof, the runtime tracer (go tool trace).

go test -bench=. -cpuprofile=cpu.out
go tool pprof -http=:8080 cpu.out

  • runtime.GC() and debug.SetGCPercent are levers, not solutions. Reduce allocation first.
  • sync.Pool is not a general-purpose cache; the GC may drop its contents at any collection (since Go 1.13, idle items survive at most one extra cycle). Use it for short-lived reusable buffers.

What To Memorize Cold

  • GMP scheduler. Goroutines (~2KB) ≠ OS threads. M:N. P count = GOMAXPROCS.
  • Goroutine stacks grow by copy. Network I/O via runtime poller, no M consumed.
  • Channels: unbuffered = rendezvous. Send to closed → panic. Nil chan blocks forever.
  • Memory model: race detector with -race. Channel send happens-before recv. Mutex unlock HB next lock.
  • GC: concurrent tri-color mark-sweep, non-moving, sub-ms pauses. GOGC and SetMemoryLimit.
  • Slices = (ptr, len, cap). append may alias or reallocate. Aliasing bugs are common.
  • Maps: randomized iteration, not concurrent-safe, not comparable, panic on nil-map write.
  • Strings: immutable bytes. Range over string yields runes.
  • Interface = (itab, data). Typed-nil pointer in interface ≠ nil interface.
  • Loop var capture fixed in Go 1.22.
  • defer cheap since 1.14, args eval at scheduling time.
  • context first arg, always defer cancel().
  • Goroutine leaks via blocked channels — select on ctx.Done().
  • Pre-size slices/maps. sync.Pool for buffer reuse. pprof for everything else.

When any of those is hazy, write a 10-line program that tickles it. The race detector and -gcflags='-m' are unusually fast feedback loops compared to other languages.