
RFC-002: OpenTelemetry instrumentation for kdb and kdb-next

Field         | Value
------------- | -----
Status        | Draft
Author(s)     | Rodrigo (with Claude as scribe)
Date          | 2026-04-10
Target module | platform/kdb/ (Go) and platform/kdb/next/ (Rust)
Related       | RFC-001 §10 (cross-cutting observability), backlog #046
Supersedes    | Ad-hoc Prometheus-only metrics in the current Go kdb

1. Summary

This RFC defines how the Koder database ("kdb") exposes *traces, metrics and logs* to operators, following the *OpenTelemetry* specification (OTel) end-to-end. The goal is that an operator running a Koder platform on top of kdb can plug in any OTel-compatible collector or backend (Grafana Tempo/Mimir/Loki, Honeycomb, Jaeger, Datadog, New Relic, Chronosphere, …) and get:

  • request-level distributed traces across the full query path,
  • per-tenant RED (rate/errors/duration) metrics,
  • structured audit and error logs correlated by trace/span ids,

without touching code, only configuration.

Today the Go kdb exposes Prometheus-compatible metrics and very limited internal tracing. kdb-next (Rust) has no instrumentation yet. Neither emits OTLP. That blocks every downstream Koder product (id v2, flow, talk, bull, kortex, …) from using a unified observability backend — a requirement called out in RFC-001 §2.5 and promised in the product pitch.

2. Goals

  1. *Single spec, two implementations.* The Go kdb and Rust kdb-next both emit data that conforms to the OTel semantic conventions for databases. A dashboard written against one works against the other.

  2. *OTLP-first.* gRPC OTLP is the primary export protocol. HTTP/JSON OTLP is supported for constrained environments. Prometheus scrape is kept as a secondary compatibility surface; it is not removed.

  3. *Zero-config baseline.* Running kdb with default config, on a host where no collector is reachable, must still work: instrumentation is cheap enough to keep on by default, and exporters degrade to no-op when the collector is unreachable.

  4. *Tenancy is a label, not a free-for-all.* Every span and log record carries tenant_id as an attribute; metrics carry it only in bucketed form (tenant_bucket, §6.1), so cardinality stays within the budget defined in RFC-001 §6.8 (metrics cardinality budget). High-cardinality attributes (query text, row ids) are never used as metric labels.

  5. *Context propagation across the whole request.* A query that enters via gRPC must carry the W3C traceparent through: parser → planner → executor → storage → replication RPC → follower. Async work (background compaction, vacuum, WAL ship) gets its own root span tagged with the originating tenant when applicable.

  6. *Privacy by default.* Query text, parameter values, row contents and credential material are *never* attached to spans or logs by default. An explicit opt-in (otel.capture_query_text = true) is required, and turning it on emits a warning in the audit log.

3. Non-goals

  • *Bundling a collector.* kdb emits OTLP; it does not run otel-collector in-process. Operators are expected to run a collector (sidecar or daemonset) in their environment.

  • *Replacing the access log / audit log.* Structured logs continue to live in their current place; they are also emitted via OTel logs for correlation, but the file-based access log stays as the source of truth for compliance audits.

  • *SDK vendoring gymnastics.* We use the upstream OTel SDKs as-is (go.opentelemetry.io/otel for Go; the opentelemetry crate family for Rust). We do not fork them.

4. Semantic conventions

We follow the OTel *database semantic conventions* (v1.27+):

Attribute              | Meaning
---------------------- | -------
db.system              | "kdb" (new value, to be registered upstream; until then, custom)
db.name                | logical database name within the tenant
db.operation           | SELECT / INSERT / UPDATE / DELETE / DDL / KV_GET / …
db.statement           | the query text, *only if otel.capture_query_text = true*
db.kdb.tenant_id       | tenant (org) id; Koder-specific
db.kdb.shard_id        | shard serving the request
db.kdb.model           | relational / document / object / key-value / graph
db.kdb.isolation_level | read_committed / repeatable_read / serializable / snapshot
db.kdb.plan_hash       | hash of the chosen query plan (for plan stability analysis)
db.kdb.rows_examined   | rows read from storage
db.kdb.rows_returned   | rows returned to client
db.kdb.cache_hit       | boolean: plan cache or result cache hit

Attributes with the db.kdb. prefix are Koder-specific extensions; the rest are standard OTel. If/when OTel registers a kdb system value, we'll drop the custom prefix.
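The attribute table above can be sketched as a plain Go function that builds the attribute set for one kdb.query span. This is an illustration only: QueryInfo and QueryAttributes are hypothetical names, and a real implementation would convert the map into attribute.KeyValue values (attribute.String, attribute.Int64, attribute.Bool from go.opentelemetry.io/otel/attribute). What it shows is that db.statement is absent unless the capture_query_text opt-in is set.

```go
package main

// QueryInfo is a hypothetical per-request summary the executor could fill in.
type QueryInfo struct {
	Database       string
	Operation      string // SELECT, INSERT, KV_GET, ...
	Statement      string // raw query text: sensitive
	TenantID       string
	ShardID        string
	Model          string // see the db.kdb.model row in §4
	IsolationLevel string
	PlanHash       string
	RowsExamined   int64
	RowsReturned   int64
	CacheHit       bool
}

// QueryAttributes returns the §4 attributes for a kdb.query span.
// db.statement is attached only when captureQueryText (otel.capture_query_text) is true.
func QueryAttributes(q QueryInfo, captureQueryText bool) map[string]any {
	attrs := map[string]any{
		// Standard OTel database attributes.
		"db.system":    "kdb",
		"db.name":      q.Database,
		"db.operation": q.Operation,
		// Koder-specific extensions under the db.kdb. prefix.
		"db.kdb.tenant_id":       q.TenantID,
		"db.kdb.shard_id":        q.ShardID,
		"db.kdb.model":           q.Model,
		"db.kdb.isolation_level": q.IsolationLevel,
		"db.kdb.plan_hash":       q.PlanHash,
		"db.kdb.rows_examined":   q.RowsExamined,
		"db.kdb.rows_returned":   q.RowsReturned,
		"db.kdb.cache_hit":       q.CacheHit,
	}
	if captureQueryText && q.Statement != "" {
		attrs["db.statement"] = q.Statement
	}
	return attrs
}
```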

5. Spans

5.1 Required spans (every query)

kdb.query                        — root span for each incoming request
├── kdb.parse                    — tokenize + parse (KQL/SQL)
├── kdb.plan                     — query planner
├── kdb.execute                  — top-level executor
│   ├── kdb.storage.read         — per storage access
│   ├── kdb.storage.write        — per storage write
│   └── kdb.replication.append   — Raft log append (writes only)
└── kdb.serialize                — result marshaling
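The nesting above can be checked with a toy model. The sketch below is not the OTel API: span, recorder and runQuery are hypothetical stand-ins (a real implementation would call tracer.Start with the parent's context at each stage), but it encodes the same parent/child layout, including the rule that kdb.replication.append only appears on writes.

```go
package main

// span is a hypothetical, minimal stand-in for an OTel span: a name and a parent link.
type span struct {
	name   string
	parent *span
}

// recorder keeps every started span so the resulting tree can be inspected.
type recorder struct{ spans []*span }

func (r *recorder) start(name string, parent *span) *span {
	s := &span{name: name, parent: parent}
	r.spans = append(r.spans, s)
	return s
}

// runQuery mirrors the §5.1 layout: stage spans hang off the kdb.query root,
// storage and replication spans hang off kdb.execute.
func runQuery(r *recorder, isWrite bool) *span {
	root := r.start("kdb.query", nil)
	r.start("kdb.parse", root)
	r.start("kdb.plan", root)
	exec := r.start("kdb.execute", root)
	r.start("kdb.storage.read", exec)
	if isWrite {
		r.start("kdb.storage.write", exec)
		r.start("kdb.replication.append", exec) // Raft log append: writes only
	}
	r.start("kdb.serialize", root)
	return root
}

// parentOf returns the parent name of the first span called name,
// or "" if that span is a root or was never started.
func (r *recorder) parentOf(name string) string {
	for _, s := range r.spans {
		if s.name != name {
			continue
		}
		if s.parent == nil {
			return ""
		}
		return s.parent.name
	}
	return ""
}
```

A trace test in Phase 4 could make the same assertions against spans captured by a real collector.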

5.2 Required spans (background work)

kdb.bg.compaction                — storage compaction batch
kdb.bg.vacuum                    — per-tenant vacuum pass
kdb.bg.wal_ship                  — WAL streaming (see RFC-043)
kdb.bg.backup                    — backup job

Background spans are tagged with db.kdb.tenant_id when the work is scoped to a single tenant; otherwise they carry db.kdb.tenant_id = "<shared>".
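A minimal sketch of the tagging rule, assuming the background job knows which tenants its batch touched (BackgroundTenantTag is a hypothetical helper):

```go
package main

// sharedTenant tags background work that is not scoped to a single tenant (§5.2).
const sharedTenant = "<shared>"

// BackgroundTenantTag returns the db.kdb.tenant_id value for a background span,
// given the tenants whose data the batch touched.
func BackgroundTenantTag(tenants []string) string {
	if len(tenants) == 0 {
		return sharedTenant
	}
	for _, t := range tenants[1:] {
		if t != tenants[0] {
			return sharedTenant
		}
	}
	return tenants[0]
}
```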

5.3 Context propagation

  • *Inbound (client → kdb):* W3C traceparent / tracestate headers are read from gRPC metadata and HTTP headers.

  • *Intra-kdb (node → node):* propagated via gRPC metadata on every internal RPC (Raft, replication, Record API; see RFC-047).

  • *Async handoff:* when the executor enqueues async work, it stores the parent span context in the job descriptor; the worker starts a new span with that context as parent.

6. Metrics

All metrics are emitted via the OTel metrics SDK and, in parallel, exposed on the existing Prometheus scrape endpoint (same numbers, same labels, so existing dashboards keep working during the migration).

6.1 RED metrics (per operation)

Name                    | Type      | Unit | Labels
----------------------- | --------- | ---- | ------
kdb.query.count         | Counter   | 1    | operation, model, status, tenant_bucket
kdb.query.duration      | Histogram | s    | operation, model, tenant_bucket
kdb.query.rows_returned | Histogram | 1    | operation, model
kdb.query.errors        | Counter   | 1    | operation, error_kind

tenant_bucket is a bounded-cardinality label: tenants are hashed into N buckets (default N=64). This keeps a coarse, per-bucket tenant breakdown for p99/p999 analysis without blowing up label cardinality. Raw tenant_id is only carried on spans, where cardinality doesn't compound.

6.2 Resource metrics

Name                   | Type    | Unit
---------------------- | ------- | -----
kdb.storage.bytes_used | Gauge   | bytes
kdb.storage.rows       | Gauge   | 1
kdb.connections.active | Gauge   | 1
kdb.connections.idle   | Gauge   | 1
kdb.txn.active         | Gauge   | 1
kdb.txn.aborted        | Counter | 1
kdb.replication.lag    | Gauge   | s
kdb.cache.hit_ratio    | Gauge   | 1

6.3 Cardinality budget

The cardinality budget from RFC-001 §6.8 applies: the total number of unique (metric × label set) tuples per node must not exceed 500k. Adding a new label to an existing metric requires updating this budget in this RFC and benchmarking the worst case.
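The worst-case check is a product of label cardinalities. The sketch below assumes 8 operation values, 2 status values and 15 histogram boundaries (all hypothetical), plus the 5 models from §4 and the default 64 tenant buckets. If the budget is counted in Prometheus series rather than OTel data points, a histogram with B explicit boundaries costs B+3 series per label set (B+1 _bucket series including +Inf, plus _sum and _count).

```go
package main

// NodeBudget is the RFC-001 §6.8 limit on unique (metric × label set) tuples per node.
const NodeBudget = 500_000

// Series returns the worst-case series count for one metric: the product of its
// label cardinalities times the number of series each label set expands into.
func Series(labelCardinalities []int, seriesPerLabelSet int) int {
	n := seriesPerLabelSet
	for _, c := range labelCardinalities {
		n *= c
	}
	return n
}

// HistogramSeriesPerLabelSet assumes Prometheus-style exposition:
// B explicit boundaries → B+1 _bucket series, plus _sum and _count.
func HistogramSeriesPerLabelSet(boundaries int) int { return boundaries + 3 }
```

With these numbers kdb.query.count needs 8 × 5 × 2 × 64 = 5,120 series and kdb.query.duration needs 8 × 5 × 64 × 18 = 46,080, about a tenth of the budget together. Because tenants are hashed, the 1M-tenant test in Phase 4 should not move either number; raising N is what scales them, linearly.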

7. Logs

  • Every log record is emitted via the OTel logs SDK and includes the active trace_id and span_id, so operators can jump from a trace view to the matching logs.

  • Severity levels follow OTel (TRACE, DEBUG, INFO, WARN, ERROR, FATAL).

  • The existing file-based access log (src/log/access.kmd) remains the source of truth for audit; OTel logs are for operator observability, not compliance.

  • Sensitive fields (credentials, query parameter values, row data) are redacted at the SDK layer via an attribute processor, rather than by relying on callers to be careful.

8. Configuration

New section in kdb.toml:

[otel]
enabled         = true
service_name    = "kdb"
service_version = "0.9.0"       # filled by build
resource_attrs  = { deployment_environment = "prod" }

# Exporter: "otlp-grpc" (default) | "otlp-http" | "none"
exporter        = "otlp-grpc"
endpoint        = "http://otel-collector:4317"
insecure        = false
headers         = { }           # e.g. auth tokens

# Sampling
traces_sampler  = "parentbased_traceidratio"
traces_ratio    = 0.01          # 1% of root spans
metrics_interval = "10s"
logs_enabled    = true

# Privacy knobs — off by default
capture_query_text = false
capture_params     = false

Environment variables follow the standard OTel names (OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_SERVICE_NAME, …) and override the TOML.
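The override order can be sketched as a pure function. OtelConfig and Resolve are hypothetical names covering a subset of the [otel] table; getenv is injected (os.Getenv in production) so the precedence is testable. The no-op fallback for an empty endpoint follows §11.

```go
package main

// OtelConfig mirrors a subset of the [otel] table in kdb.toml.
type OtelConfig struct {
	Enabled     bool
	ServiceName string
	Exporter    string // "otlp-grpc" | "otlp-http" | "none"
	Endpoint    string
}

// Resolve layers the standard OTel environment variables over the TOML values,
// then decides whether exporting is possible at all.
func Resolve(cfg OtelConfig, getenv func(string) string) OtelConfig {
	if v := getenv("OTEL_SERVICE_NAME"); v != "" {
		cfg.ServiceName = v
	}
	if v := getenv("OTEL_EXPORTER_OTLP_ENDPOINT"); v != "" {
		cfg.Endpoint = v
	}
	// Disabled or no endpoint: degrade to a no-op exporter instead of failing startup.
	if !cfg.Enabled || cfg.Endpoint == "" {
		cfg.Exporter = "none"
	}
	return cfg
}
```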

9. Sampling strategy

  • *Default:* parent-based trace-ID ratio at 1%.
  • *Forced sampling:* any request whose traceparent was already sampled by an upstream caller is always sampled (we respect the parent).

  • *Error boost:* if a query returns an error, the span is upgraded to "sampled" regardless of the ratio (tail-based, applied at export time via a custom processor).

  • *Slow query boost:* same as error boost, but for queries exceeding the per-operation p99 target in RFC-001 §5.

10. Implementation plan

Phase 1 — Go kdb baseline (2 weeks)

  1. Add go.opentelemetry.io/otel and the OTLP gRPC exporter deps.
  2. Wire TracerProvider, MeterProvider and LoggerProvider from kdb.toml.
  3. Instrument the gRPC entry point to create the kdb.query root span and extract inbound context.
  4. Instrument parser, planner and executor with child spans.
  5. Port existing Prometheus metrics to OTel metrics and expose them through the OTel Prometheus exporter (go.opentelemetry.io/otel/exporters/prometheus), so the scrape endpoint keeps working while OTLP also ships.
  6. Add the RED metrics from §6.1 where missing.
  7. Redaction processor for sensitive attributes.

Phase 2 — kdb-next (Rust) parity (1 week, can run in parallel)

*Status on 2026-04-10:* crates/kdb-obs/ already has a working traces pipeline: tracing_setup.rs wires an OTLP gRPC exporter with parent-based 1% sampling and a 100 ms slow-request threshold, and bridges the existing tracing spans via tracing-opentelemetry. Metrics are still Prometheus-only (metrics.rs uses the prometheus crate directly, with tiered tenant bucketing: top-100 → `tier-a`, next-1000 → `tier-b`, rest → `other`). Logs are not wired through OTel yet.
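The current tiering in metrics.rs, rendered in Go for consistency with the other sketches (the real code is Rust). rank is assumed to be the tenant's 1-based position in a traffic ranking whose computation is not described here, and TenantTier is a hypothetical name. It produces three label values, while §6.1 specifies 64 hashed buckets; reconciling the two is part of item 2 below. Note also that, if the ranking is recomputed, a tenant's tier can change over time, whereas a hash bucket is fixed.

```go
package main

// TenantTier maps a traffic rank to the tier label described for metrics.rs:
// ranks 1-100 → tier-a, 101-1100 → tier-b, everything else → other.
func TenantTier(rank int) string {
	switch {
	case rank >= 1 && rank <= 100:
		return "tier-a"
	case rank > 100 && rank <= 1100:
		return "tier-b"
	default:
		return "other"
	}
}
```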

Remaining work:

  1. *Metrics via OTLP.* Add the opentelemetry-otlp metrics exporter and mirror the existing Prometheus metrics into it. Keep the Prometheus scrape endpoint on for compatibility.

  2. *Align metric names and label sets with §6.* Today the crate uses its own ad-hoc names like rpc_duration; they need to become kdb.query.duration etc. Keep the Prometheus names as a translation layer to avoid breaking current dashboards.

  3. *Logs via OTel.* Wire opentelemetry-appender-tracing (or equivalent) so log records carry trace_id / span_id and are exported via OTLP. Keep the file sink for the access log.

  4. *Propagate context beyond the gateway.* Today only the gateway RPC entry point creates spans. Extend instrumentation into kdb-record, kdb-kv-trait and replication RPCs so the full query tree from §5.1 shows up in a trace, not just the root rpc span.

  5. *Adopt the §4 semantic conventions* as attributes on existing spans (db.kdb.tenant_id, db.kdb.model, db.kdb.shard_id, …). The current crate only sets tenant.id and rpc.service / rpc.method.
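The translation layer from item 2 can be as small as a lookup table with a mechanical fallback. Only rpc_duration is named in the crate audit; any other legacy entries would be added as they are found. PrometheusName is a hypothetical helper, shown in Go like the other sketches. Be aware that the OTel Prometheus exporters, by default, also append unit suffixes and _total to counters, so preserving an exact legacy name may need an explicit override or exporter option.

```go
package main

import "strings"

// legacyNames maps new §6 names to the names existing Prometheus dashboards query.
var legacyNames = map[string]string{
	"kdb.query.duration": "rpc_duration",
}

// PrometheusName returns the scrape-endpoint name for an OTel metric: the legacy
// name if one exists, otherwise the OTel name with dots replaced by underscores.
func PrometheusName(otelName string) string {
	if legacy, ok := legacyNames[otelName]; ok {
		return legacy
	}
	return strings.ReplaceAll(otelName, ".", "_")
}
```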

Phase 3 — Replication and background (1 week)

  1. Propagate context through Raft RPCs.
  2. Background span roots for compaction, vacuum, WAL ship, backup.
  3. Add kdb.replication.lag gauge.

Phase 4 — Dashboards and smoke tests (1 week)

  1. Ship a reference Grafana dashboard JSON under observe/dashboards/kdb.json.

  2. Integration test: run kdb against an in-process OTLP collector, fire N queries, assert spans, metrics and logs are received with the expected attributes.

  3. Cardinality test: generate 1M distinct tenants and assert metric cardinality stays under budget.

11. Migration and compatibility

  • The existing Prometheus scrape endpoint stays on for at least two minor versions after Phase 1 ships. Only after downstream Koder products have moved their dashboards to OTLP-fed backends will we consider removing it, and that removal is a separate RFC.

  • The existing access log file format is untouched.
  • No breaking changes to kdb.toml: the new [otel] section is additive; defaults are backwards compatible (enabled = true, exporter = "otlp-grpc", but endpoint is empty by default and the exporter degrades to no-op when empty).

12. Open questions

  1. *Do we register db.system = "kdb" upstream with OTel?* Doing so makes us a recognized database in every OTel-aware APM tool. It's a ~3-month process. Proposed: yes, start in parallel with Phase 1.

  2. *How do we bound the tenant_bucket label at 100M tenants?* With 64 buckets, each bucket averages ~1.56M tenants (100M / 64), which is fine for p99 but not for "which tenant is slow". Proposed: spans carry raw tenant_id, metrics carry bucketed; operators doing single-tenant investigation use the trace view.

  3. *Sampling in control plane vs data plane.* Control plane calls (auth, schema lookups) are cheap but very frequent. Should they have a lower sample ratio than data plane queries? Proposed: yes, a separate traces_ratio_control_plane = 0.001 knob in Phase 3.

  4. *Log volume.* Emitting every log record via OTel could increase log volume tenfold for operators who don't want it. Proposed: logs_enabled = true by default but level-gated at INFO; debug logs are only emitted to the file sink.

13. References

  • OpenTelemetry Semantic Conventions for Database Client Calls: https://opentelemetry.io/docs/specs/semconv/database/

  • OpenTelemetry Protocol (OTLP): https://opentelemetry.io/docs/specs/otlp/

  • RFC-001 §2.5, §6.8, §10 (this repo)
  • backlog/pending/046-opentelemetry-instrumentation.kmd

14. Decision log

Date       | Decision  | Notes
---------- | --------- | -----
2026-04-10 | Drafted   | Claude scribe, awaiting Rodrigo review
2026-04-10 | Corrected | Phase 2 rewritten after auditing crates/kdb-obs/: traces pipeline already exists; remaining work is metrics+logs OTLP and spec alignment

Source: ../home/koder/dev/koder/meta/docs/stack/rfcs/kdb-RFC-002-opentelemetry-instrumentation.md