Lab 17 — Metrics Collector (Counter / Gauge / Histogram)

Goal

Implement a thread-safe in-process metrics registry that supports the three canonical metric types — counter, gauge, histogram — with bounded memory, label support, and an export format suitable for Prometheus scraping. After this lab you should be able to write the registry from a blank screen in <25 minutes and articulate the difference between summary and histogram in <60 seconds.

Background Concepts

The four metric types in the Prometheus / OpenMetrics ecosystem are counter (monotonic non-decreasing total — requests, errors, bytes), gauge (a current value — queue depth, active connections, memory in use), histogram (a count of observations bucketed by upper bound, used to compute quantiles server-side), and summary (client-side quantiles, harder to aggregate). The first three cover ~95% of production needs. Counters answer “how many?”, gauges answer “how much right now?”, histograms answer “what’s the distribution and the p99?”.

A metric is identified by (name, label_set). The same name with different labels (e.g., http_requests_total{method="GET"} vs http_requests_total{method="POST"}) is a different time series. The number of label combinations is the metric’s cardinality. Unbounded cardinality (e.g., a label per user_id) is the most common production memory leak in metric systems — protect against it.

Histograms are tricky. Two-pass naive implementations (store all observations, sort on export) explode memory. The Prometheus model: pre-declare a fixed set of bucket upper bounds ([0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10] is the default) and increment a counter per bucket. Quantile estimation happens at the query layer using bucket interpolation. Memory is O(buckets) per series — cheap.

Interview Context

A staple practical-engineering question at Datadog, Grafana, Honeycomb, and any observability-aware team (which is most of them now). Also asked at Stripe and Uber as a “we want to see how you reason about cardinality” question. Candidates fail this round when they reach for “store every observation in a list” — that betrays a lack of production exposure.

Problem Statement

Implement MetricsRegistry with:

  • counter(name, labels=None).inc(by=1) — monotonically incrementing.
  • gauge(name, labels=None).set(value) / .inc(by=1) / .dec(by=1).
  • histogram(name, labels=None, buckets=...).observe(value).
  • registry.snapshot() returns a list of (name, labels, type, value) tuples or the OpenMetrics text format.
  • Thread-safe; bounded cardinality (configurable maximum number of label combinations per metric name).

Constraints

  • Must support concurrent observations from many threads.
  • Histogram bucket increments must be atomic (no torn reads of bucket_count and sum).
  • Cardinality cap: when a new label combination would exceed max_labels_per_metric, drop and emit an internal counter (metrics_drops_total).
  • Counter and gauge: O(1) per operation; histogram: O(log buckets) per observation (binary search the bucket).

Clarifying Questions

  1. Do we need timestamped exposition (Prometheus exposition format)? (Yes, but the timestamp can be implicit — Prometheus assigns the scrape timestamp.)
  2. Are histogram buckets shared across all label combinations or per combination? (Per combination — different label values may have different distributions.)
  3. Are we exposing percentiles client-side or letting the server compute them? (Server-side — the histogram type is exactly this.)
  4. Should counters reset on process restart? (Yes — Prometheus handles this with the rate() function and the reset detection in counter math.)
  5. What’s the maximum max_labels_per_metric we should default to? (1000 is generous; 100 is conservative. Make it configurable.)

Examples

reg = MetricsRegistry()
reg.counter("http_requests_total", {"method": "GET", "status": "200"}).inc()
reg.gauge("queue_depth", {"queue": "ingest"}).set(42)
reg.histogram("request_latency_s", {"endpoint": "/api"}, buckets=[0.01, 0.1, 1, 10]).observe(0.04)
print(reg.snapshot_openmetrics())
# # HELP http_requests_total
# # TYPE http_requests_total counter
# http_requests_total{method="GET",status="200"} 1
# # TYPE queue_depth gauge
# queue_depth{queue="ingest"} 42
# ...

Initial Brute Force

class NaiveMetrics:
    def __init__(self):
        self.metrics = {}
    def counter(self, name, labels=None):
        key = (name, frozenset((labels or {}).items()))
        self.metrics.setdefault(key, 0)
        self.metrics[key] += 1

This conflates increment and registration, has no thread safety, no histogram (impossible to compute p99 from a counter), no cardinality cap, and uses one global dict so every metric type collides on key shape.

Brute Force Complexity

O(1) per increment under no contention. With concurrent writers, races on dict.__setitem__ and += corrupt counts. Memory unbounded.

Optimization Path

Separate types into separate sub-registries (Counter, Gauge, Histogram) to avoid type-pun bugs. Add a per-metric Lock (or atomic primitive). Use bisect_left on a sorted bucket array to find the histogram bucket. Cap cardinality with a per-metric-name combination counter. Define an exposition format.

Final Expected Approach

A MetricsRegistry holds a dict name → MetricFamily. A MetricFamily stores the metric type, the bucket schedule (for histograms), and a dict labels_tuple → MetricInstance. Each MetricInstance is a small thread-safe object: Counter has an int and a Lock; Gauge has a float and a Lock; Histogram has a list[int] of bucket counts, a float sum, an int count, and a Lock. On increment, hash the label tuple, look up or create the instance (with cardinality check), acquire its Lock, mutate. Snapshot iterates families and instances under their locks and emits an exposition string.

Data Structures Used

  • dict[str, MetricFamily] for the registry.
  • dict[tuple[tuple[str, str], ...], MetricInstance] per family for label combinations.
  • list[float] (sorted) for histogram bucket boundaries.
  • list[int] for histogram bucket counts.
  • threading.Lock per instance.

Correctness Argument

Counter: monotonic by construction (only inc(by) with by ≥ 0 allowed). Gauge: set is the last writer’s value; inc/dec are atomic under the lock. Histogram: each observation falls into exactly one bucket (the smallest bucket whose upper bound is ≥ observation; the last bucket is +Inf). Sum and count are incremented under the same lock as the bucket count, so a snapshot sees consistent values.

Complexity

OpTime
counter.incO(1) lock-and-increment
gauge.set/inc/decO(1)
histogram.observeO(log B) for bucket lookup
snapshotO(N · B) where N is total instances

Space: O(name + label-cardinality · (1 for counter/gauge or B+2 for histogram)).

Implementation Requirements

import threading
from bisect import bisect_left
from typing import Optional

DEFAULT_BUCKETS = (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10)


def _label_key(labels: Optional[dict]) -> tuple:
    if not labels:
        return ()
    return tuple(sorted(labels.items()))


class Counter:
    __slots__ = ("_v", "_lock")
    def __init__(self):
        self._v = 0
        self._lock = threading.Lock()
    def inc(self, by: float = 1):
        if by < 0:
            raise ValueError("counter cannot decrease")
        with self._lock:
            self._v += by
    def value(self) -> float:
        with self._lock:
            return self._v


class Gauge:
    __slots__ = ("_v", "_lock")
    def __init__(self):
        self._v = 0.0
        self._lock = threading.Lock()
    def set(self, v: float):
        with self._lock:
            self._v = v
    def inc(self, by: float = 1):
        with self._lock:
            self._v += by
    def dec(self, by: float = 1):
        with self._lock:
            self._v -= by
    def value(self) -> float:
        with self._lock:
            return self._v


class Histogram:
    __slots__ = ("_buckets", "_counts", "_sum", "_count", "_lock")
    def __init__(self, buckets):
        self._buckets = tuple(sorted(buckets))
        self._counts = [0] * (len(self._buckets) + 1)  # +1 for +Inf
        self._sum = 0.0
        self._count = 0
        self._lock = threading.Lock()
    def observe(self, v: float):
        idx = bisect_left(self._buckets, v)
        if idx == len(self._buckets) and v > self._buckets[-1]:
            idx = len(self._buckets)  # +Inf bucket
        with self._lock:
            self._counts[idx] += 1
            self._sum += v
            self._count += 1
    def snapshot(self) -> dict:
        with self._lock:
            cumulative = []
            running = 0
            for i, b in enumerate(self._buckets):
                running += self._counts[i]
                cumulative.append((b, running))
            running += self._counts[-1]
            cumulative.append((float("inf"), running))
            return {"buckets": cumulative, "sum": self._sum, "count": self._count}


class MetricFamily:
    def __init__(self, name: str, kind: str, buckets=None, max_labels: int = 1000):
        self.name = name
        self.kind = kind  # "counter" | "gauge" | "histogram"
        self.buckets = buckets
        self.instances: dict = {}
        self.max_labels = max_labels
        self.dropped = 0
        self.lock = threading.Lock()
    def get(self, labels_key: tuple):
        with self.lock:
            inst = self.instances.get(labels_key)
            if inst is not None:
                return inst
            if len(self.instances) >= self.max_labels:
                self.dropped += 1
                return None
            if self.kind == "counter":
                inst = Counter()
            elif self.kind == "gauge":
                inst = Gauge()
            else:
                inst = Histogram(self.buckets or DEFAULT_BUCKETS)
            self.instances[labels_key] = inst
            return inst


class _NullCounter:
    def inc(self, by=1): pass


class MetricsRegistry:
    def __init__(self, max_labels_per_metric: int = 1000):
        self._families: dict[str, MetricFamily] = {}
        self._lock = threading.Lock()
        self._max_labels = max_labels_per_metric

    def _family(self, name: str, kind: str, buckets=None) -> MetricFamily:
        with self._lock:
            f = self._families.get(name)
            if f is None:
                f = MetricFamily(name, kind, buckets, self._max_labels)
                self._families[name] = f
            elif f.kind != kind:
                raise ValueError(f"metric {name} already registered as {f.kind}")
            return f

    def counter(self, name: str, labels: Optional[dict] = None) -> Counter:
        f = self._family(name, "counter")
        inst = f.get(_label_key(labels))
        return inst if inst is not None else _NullCounter()

    def gauge(self, name: str, labels: Optional[dict] = None) -> Gauge:
        f = self._family(name, "gauge")
        return f.get(_label_key(labels)) or Gauge()  # caller-detached fallback

    def histogram(self, name: str, labels: Optional[dict] = None, buckets=None) -> Histogram:
        f = self._family(name, "histogram", buckets)
        return f.get(_label_key(labels)) or Histogram(buckets or DEFAULT_BUCKETS)

    def snapshot_openmetrics(self) -> str:
        lines = []
        with self._lock:
            families = list(self._families.values())
        for f in families:
            lines.append(f"# TYPE {f.name} {f.kind}")
            with f.lock:
                instances = list(f.instances.items())
            for labels_key, inst in instances:
                lbl = "{" + ",".join(f'{k}="{v}"' for k, v in labels_key) + "}" if labels_key else ""
                if isinstance(inst, (Counter, Gauge)):
                    lines.append(f"{f.name}{lbl} {inst.value()}")
                elif isinstance(inst, Histogram):
                    snap = inst.snapshot()
                    for b, cum in snap["buckets"]:
                        b_str = "+Inf" if b == float("inf") else f"{b}"
                        bucket_lbl = labels_key + (("le", b_str),)
                        bl = "{" + ",".join(f'{k}="{v}"' for k, v in bucket_lbl) + "}"
                        lines.append(f"{f.name}_bucket{bl} {cum}")
                    lines.append(f"{f.name}_sum{lbl} {snap['sum']}")
                    lines.append(f"{f.name}_count{lbl} {snap['count']}")
        return "\n".join(lines)

Tests

def test_counter_increments():
    r = MetricsRegistry()
    c = r.counter("hits")
    c.inc(); c.inc(2)
    assert c.value() == 3

def test_counter_rejects_negative():
    r = MetricsRegistry()
    try: r.counter("x").inc(-1)
    except ValueError: pass
    else: assert False

def test_gauge_set_inc_dec():
    g = MetricsRegistry().gauge("depth")
    g.set(10); g.inc(5); g.dec(3)
    assert g.value() == 12

def test_histogram_buckets():
    h = MetricsRegistry().histogram("lat", buckets=[0.1, 1, 10])
    for v in [0.05, 0.5, 1.5, 100]: h.observe(v)
    snap = h.snapshot()
    assert snap["count"] == 4
    assert snap["sum"] == 102.05
    assert snap["buckets"][0] == (0.1, 1)   # ≤ 0.1
    assert snap["buckets"][1] == (1, 2)     # ≤ 1
    assert snap["buckets"][2] == (10, 3)    # ≤ 10
    assert snap["buckets"][3][1] == 4       # +Inf

def test_concurrent_counter():
    import threading
    r = MetricsRegistry()
    c = r.counter("racy")
    def inc():
        for _ in range(10_000): c.inc()
    threads = [threading.Thread(target=inc) for _ in range(8)]
    for t in threads: t.start()
    for t in threads: t.join()
    assert c.value() == 80_000

def test_cardinality_cap():
    r = MetricsRegistry(max_labels_per_metric=2)
    r.counter("uid", {"id": "a"}).inc()
    r.counter("uid", {"id": "b"}).inc()
    r.counter("uid", {"id": "c"}).inc()  # dropped
    assert len(r._families["uid"].instances) == 2
    assert r._families["uid"].dropped == 1

def test_type_conflict():
    r = MetricsRegistry()
    r.counter("x")
    try: r.gauge("x")
    except ValueError: pass
    else: assert False

Follow-up Questions

  1. How would you make it thread-safe? Per-instance locks (counters and gauges have one each; histograms have one per (name, labels) instance). The registry-level lock only guards family creation. Result: counter increments from different label tuples never block each other, which is the expected hot path.
  2. What metrics would you emit (about the metrics system itself)? metrics_dropped_total{name=...} (cardinality drops); metrics_active_series (gauge of total instances); metrics_scrape_duration_seconds (histogram of snapshot_openmetrics latency). Self-instrumentation is a sign of mature instrumentation.
  3. What is the eviction policy? None for active series — they live forever. For TTL’d metrics (rare), an external sweeper deletes instances unobserved for N minutes. Design so the sweeper is optional, not on the hot path.
  4. What configuration knobs would you expose? max_labels_per_metric (cardinality cap), default histogram buckets, the registry’s exposition format. Don’t expose the per-instance lock granularity — it’s an implementation detail.
  5. How would you handle backpressure? The hot path is lock-bounded; a write that contends waits microseconds. If a histogram’s lock becomes hot, switch to per-bucket atomics or sharded counters (8 shards keyed by thread_id % 8, summed at scrape).
  6. What’s the difference between summary and histogram? Summary computes quantiles client-side using a streaming algorithm (Greenwald-Khanna). Pros: exact-ish percentiles per series. Cons: cannot aggregate across series. Histogram pushes the work to the query layer; aggregation across series is just bucket-wise addition. Histograms are the right default in modern observability.

Product Extension

This is exactly the data model exposed by the Prometheus client libraries. Real implementations add: gauge track_inprogress (decorator/context manager that increments on enter, decrements on exit), summary type, exemplars (trace IDs attached to histogram observations), native histograms (sparse representation that auto-tunes bucket boundaries).

Language/Runtime Follow-ups

  • Python: as above. The prometheus_client library is the production reference. Be aware of GIL implications: counter increments aren’t atomic at the bytecode level for floats, but the explicit lock makes them so.
  • Java: use LongAdder for counters (avoids contention via per-thread cells); DoubleAdder for histogram sums. Micrometer is the production reference.
  • Go: counters and gauges as atomic.Int64/atomic.Float64. Histograms with sync.Mutex per metric. The prometheus/client_golang library does exactly this.
  • C++: std::atomic<uint64_t> for counter; std::mutex for histogram. Cardinality maps require careful design — tbb::concurrent_hash_map is one option.
  • JS/TS: single-threaded — no locks needed in Node. The prom-client package is the production reference.

Common Bugs

  1. Histogram bucket lookup off-by-one — bisect_left is correct only if buckets are pre-sorted.
  2. Sharing histogram buckets across label combinations — different distributions have different optimal buckets. Each instance gets its own bucket array.
  3. Forgetting the +Inf bucket — observations larger than the largest bucket are silently dropped.
  4. Using int for histogram sum — overflows for high-throughput histograms after a few hours. Use float.
  5. Type confusion — registering the same name as both counter and gauge corrupts exposition. The registry must reject this.
  6. Unbounded cardinality — a label per request_id creates a new series per request. The cardinality cap is the safety net.

Debugging Strategy

When totals look wrong: confirm the counter.inc(by) has by ≥ 0 and the operation is under the lock. When percentiles are off: dump the bucket cumulative counts; the math should be cumulative[i] = sum(counts[0..i]). When concurrency tests are flaky: add time.sleep(0) between increments to expose races; if the test passes, your locking is correct.

Mastery Criteria

  • Wrote the three metric types with correct semantics in <25 minutes.
  • Articulated histogram vs summary in <60 seconds.
  • Stated the cardinality risk without prompting and showed the cap in code.
  • Wrote a concurrent counter test that verifies no lost updates.
  • Produced a Prometheus-compatible exposition string.
  • Listed three metrics-about-metrics to emit (drops, active series, scrape duration).
  • Explained why per-instance locks scale better than a global registry lock.