RFC-002: OpenTelemetry instrumentation for kdb and kdb-next
| Field | Value |
|---|---|
| Status | **Draft** |
| Author(s) | Rodrigo (with Claude as scribe) |
| Date | 2026 |
| Target module | platform/kdb/ (Go) and platform/kdb/next/ (Rust) |
| Related | RFC |
| Supersedes | Ad |
1. Summary
This RFC defines how the Koder database ("kdb") exposes **traces, metrics and logs** to operators, following the **OpenTelemetry** specification (OTel) end-to-end. The goal is that an operator running a Koder platform on top of kdb can plug in any OTel-compatible collector (Grafana Tempo/Mimir/Loki, Honeycomb, Jaeger, Datadog, New Relic, Chronosphere, …) and get:
- request-level distributed traces across the full query path,
- per-tenant RED (rate/errors/duration) metrics,
- structured audit and error logs correlated by trace/span ids,
without touching code, only configuration.
Today the Go kdb exposes Prometheus-compatible metrics and very limited internal tracing. kdb-next (Rust) has no instrumentation yet. Neither emits OTLP. That blocks every downstream Koder product (id v2, flow, talk, bull, kortex, …) from using a unified observability backend — a requirement called out in RFC-001 §2.5 and promised in the product pitch.
2. Goals
- **Single spec, two implementations.** The Go kdb and Rust kdb-next both emit data that conforms to the OTel semantic conventions for databases. A dashboard written against one works against the other.
- **OTLP-first.** gRPC OTLP is the primary export protocol. HTTP/JSON OTLP is supported for constrained environments. Prometheus scrape is kept as a secondary compatibility surface, not removed.
- **Zero-config baseline.** Running `kdb` with default config, on a host where no collector is reachable, must still work: instrumentation is cheap enough to keep on by default, and exporters degrade to no-op when the collector is unreachable.
- **Tenancy is a label, not a free-for-all.** Every metric, span and log record carries `tenant_id` as an attribute, but cardinality is bounded by the budget defined in RFC-001 §6.8 (metrics cardinality budget). High-cardinality attributes (query text, row ids) are never used as metric labels.
- **Context propagation across the whole request.** A query that enters via gRPC must carry the W3C `traceparent` through: parser → planner → executor → storage → replication RPC → follower. Async work (background compaction, vacuum, WAL ship) gets its own root span tagged with the originating tenant when applicable.
- **Privacy by default.** Query text, parameter values, row contents and credential material are **never** attached to spans or logs by default. An explicit opt-in (`otel.capture_query_text = true`) is required, and turning it on emits a warning in the audit log.
3. Non-goals
- **Bundling a collector.** kdb emits OTLP; it does not run `otel-collector` in-process. Operators are expected to run a collector (sidecar or daemonset) in their environment.
- **Replacing the access log / audit log.** Structured logs continue to live in their current place; they are also emitted via OTel logs for correlation, but the file-based access log stays as the source of truth for compliance audits.
- **SDK vendoring gymnastics.** We use the upstream OTel SDKs as-is (`go.opentelemetry.io/otel` for Go; `opentelemetry` crate family for Rust). We do not fork them.
4. Semantic conventions
We follow the OTel **database semantic conventions** (v1.27+):

| Attribute | Meaning |
|---|---|
| `db.system` | `"kdb"` (new value, to be registered upstream; until then, custom) |
| `db.name` | logical database name within the tenant |
| `db.operation` | `SELECT` / `INSERT` / `UPDATE` / `DELETE` / `DDL` / `KV_GET` / … |
| `db.statement` | the query text, **only if `otel.capture_query_text = true`** |
| `db.kdb.tenant_id` | tenant (org) id, Koder-specific |
| `db.kdb.shard_id` | shard serving the request |
| `db.kdb.model` | `relational` / `document` / `object` / `key-value` / `graph` |
| `db.kdb.isolation_level` | `read_committed` / `repeatable_read` / `serializable` / `snapshot` |
| `db.kdb.plan_hash` | hash of the chosen query plan (for plan stability analysis) |
| `db.kdb.rows_examined` | rows read from storage |
| `db.kdb.rows_returned` | rows returned to client |
| `db.kdb.cache_hit` | boolean: plan cache or result cache hit |

Attributes with the db.kdb. prefix are Koder-specific extensions; the rest are standard OTel. If/when OTel registers a kdb system value, we'll drop the custom prefix.
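
As an illustration of how the attribute table and the privacy rule interact, the following is a minimal sketch, not kdb code. The `QueryInfo` type and `SpanAttributes` function are hypothetical names invented for this example; only the attribute keys come from the table above.

```go
package main

import "fmt"

// QueryInfo is a hypothetical carrier for per-query facts; the field
// names are illustrative and not part of kdb.
type QueryInfo struct {
	Database  string
	Operation string
	Statement string
	TenantID  string
	ShardID   string
	Model     string
}

// SpanAttributes builds a subset of the §4 attribute set for one query.
// The query text is attached only when captureQueryText is true.
func SpanAttributes(q QueryInfo, captureQueryText bool) map[string]string {
	attrs := map[string]string{
		"db.system":        "kdb",
		"db.name":          q.Database,
		"db.operation":     q.Operation,
		"db.kdb.tenant_id": q.TenantID,
		"db.kdb.shard_id":  q.ShardID,
		"db.kdb.model":     q.Model,
	}
	if captureQueryText {
		attrs["db.statement"] = q.Statement
	}
	return attrs
}

func main() {
	q := QueryInfo{Database: "orders", Operation: "SELECT", Statement: "SELECT 1",
		TenantID: "t-42", ShardID: "s-3", Model: "relational"}
	_, attached := SpanAttributes(q, false)["db.statement"]
	fmt.Println("statement attached by default:", attached)
}
```

Keeping the opt-in check inside the one function that builds attributes means call sites cannot attach query text by accident.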
5. Spans
5.1 Required spans (every query)
```
kdb.query                         - root span for each incoming request
├── kdb.parse                     - tokenize + parse (KQL/SQL)
├── kdb.plan                      - query planner
├── kdb.execute                   - top-level executor
│   ├── kdb.storage.read          - per storage access
│   ├── kdb.storage.write         - per storage write
│   └── kdb.replication.append    - Raft log append (writes only)
└── kdb.serialize                 - result marshaling
```

5.2 Required spans (background work)

```
kdb.bg.compaction    - storage compaction batch
kdb.bg.vacuum        - per-tenant vacuum pass
kdb.bg.wal_ship      - WAL streaming (see RFC-043)
kdb.bg.backup        - backup job
```

Background spans are tagged with `db.kdb.tenant_id` when the work is scoped to a single tenant; otherwise they carry `db.kdb.tenant_id = "<shared>"`.
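
The tenant-tagging rule for background spans can be sketched as a small helper. This is an illustrative example only: `BackgroundTenantID` and its `tenants` input are hypothetical, and the sketch assumes a job knows up front which tenants it touches.

```go
package main

import "fmt"

// SharedTenant is the sentinel value §5.2 assigns to background work that
// is not scoped to a single tenant.
const SharedTenant = "<shared>"

// BackgroundTenantID returns the db.kdb.tenant_id value for a background
// span. The tenants slice (a hypothetical input) lists the tenants whose
// data the job touches.
func BackgroundTenantID(tenants []string) string {
	if len(tenants) == 1 {
		return tenants[0]
	}
	return SharedTenant
}

func main() {
	fmt.Println(BackgroundTenantID([]string{"t-42"}))       // single-tenant vacuum pass
	fmt.Println(BackgroundTenantID([]string{"t-1", "t-2"})) // compaction across tenants
	fmt.Println(BackgroundTenantID(nil))                    // no tenant scope at all
}
```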
5.3 Context propagation
- **Inbound (client → kdb):** W3C `traceparent` / `tracestate` headers are read from gRPC metadata and HTTP headers.
- **Intra-kdb (node → node):** propagated via gRPC metadata on every internal RPC (Raft, replication, Record API; see RFC-047).
- **Async handoff:** when the executor enqueues async work, it stores the parent span context in the job descriptor; the worker starts a new span with that context as parent.
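
To make the inbound and async-handoff cases concrete, here is a minimal standard-library sketch of parsing a version-00 W3C `traceparent` value and storing the result in a job descriptor. In practice the OTel SDK propagators would do the parsing; `SpanContext`, `Job` and `ParseTraceparent` are hypothetical names for this example, and the parser is slightly more lenient than the W3C spec (it accepts uppercase hex).

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// SpanContext holds the fields carried by a traceparent header.
type SpanContext struct {
	TraceID [16]byte
	SpanID  [8]byte
	Sampled bool
}

// Job is a hypothetical async job descriptor that carries the parent span
// context across the enqueue boundary.
type Job struct {
	Kind   string
	Parent SpanContext
}

// ParseTraceparent parses "00-<32 hex>-<16 hex>-<2 hex>". Other versions
// are rejected here for brevity.
func ParseTraceparent(h string) (SpanContext, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return SpanContext{}, errors.New("unsupported traceparent")
	}
	if len(parts[1]) != 32 || len(parts[2]) != 16 || len(parts[3]) != 2 {
		return SpanContext{}, errors.New("bad traceparent field length")
	}
	var sc SpanContext
	if _, err := hex.Decode(sc.TraceID[:], []byte(parts[1])); err != nil {
		return SpanContext{}, err
	}
	if _, err := hex.Decode(sc.SpanID[:], []byte(parts[2])); err != nil {
		return SpanContext{}, err
	}
	flags, err := hex.DecodeString(parts[3])
	if err != nil {
		return SpanContext{}, err
	}
	// All-zero ids are invalid per the W3C Trace Context spec.
	if sc.TraceID == ([16]byte{}) || sc.SpanID == ([8]byte{}) {
		return SpanContext{}, errors.New("all-zero trace or span id")
	}
	sc.Sampled = flags[0]&0x01 == 0x01
	return sc, nil
}

func main() {
	sc, err := ParseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		panic(err)
	}
	job := Job{Kind: "vacuum", Parent: sc}
	fmt.Printf("%s job inherits trace %x (sampled=%v)\n", job.Kind, job.Parent.TraceID[:], job.Parent.Sampled)
}
```

The worker that dequeues the job would start its span with `job.Parent` as the parent, which is what keeps the async work attached to the originating trace.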
6. Metrics
All metrics are emitted via OTel metrics SDK and, in parallel, exposed on the existing Prometheus scrape endpoint (same numbers, same labels, so existing dashboards keep working during the migration).
6.1 RED metrics (per operation)
| Name | Type | Unit | Labels |
|---|---|---|---|
| `kdb.query.count` | Counter | 1 | `operation`, `model`, `status`, `tenant_bucket` |
| `kdb.query.duration` | Histogram | s | `operation`, `model`, `tenant_bucket` |
| `kdb.query.rows_returned` | Histogram | 1 | `operation`, `model` |
| `kdb.query.errors` | Counter | 1 | `operation`, `error_kind` |

tenant_bucket is a bounded-cardinality label: tenants are hashed into N buckets (default N=64). This keeps per-tenant visibility for p99/p999 analysis without blowing up label cardinality. Raw tenant_id is only carried on spans, where cardinality doesn't compound.
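
A minimal sketch of the bucketing idea follows. The RFC does not name a hash function, so the use of FNV-1a here is an assumption made for the example, and `TenantBucket` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// TenantBucket maps a tenant id onto one of n buckets (n must be > 0).
// FNV-1a is an assumption of this sketch, not a choice made by the RFC.
func TenantBucket(tenantID string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(tenantID)) // Write on a hash.Hash never returns an error
	return h.Sum32() % n
}

func main() {
	seen := map[uint32]bool{}
	for i := 0; i < 100000; i++ {
		seen[TenantBucket(fmt.Sprintf("tenant-%d", i), 64)] = true
	}
	// However many tenants exist, the label takes at most 64 values.
	fmt.Println("distinct tenant_bucket values:", len(seen))
}
```

Whatever hash is chosen, it needs to be stable across nodes and releases; otherwise the same tenant would move between buckets and break time-series continuity.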
6.2 Resource metrics
| Name | Type | Unit |
|---|---|---|
| `kdb.storage.bytes_used` | Gauge | bytes |
| `kdb.storage.rows` | Gauge | 1 |
| `kdb.connections.active` | Gauge | 1 |
| `kdb.connections.idle` | Gauge | 1 |
| `kdb.txn.active` | Gauge | 1 |
| `kdb.txn.aborted` | Counter | 1 |
| `kdb.replication.lag` | Gauge | s |
| `kdb.cache.hit_ratio` | Gauge | 1 |

6.3 Cardinality budget
The cardinality budget from RFC-001 §6.8 applies: the total number of unique (metric × label set) tuples per node must not exceed 500k. Adding a new label to an existing metric requires updating this budget in this RFC and benchmarking the worst case.
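
The worst case for a single metric is the product of the number of distinct values each of its labels can take. The sketch below works that arithmetic for `kdb.query.count`; the per-label value counts (8 operations, 5 models, 3 statuses) are assumptions chosen for illustration, and only the 64 buckets and the 500k budget come from this RFC.

```go
package main

import "fmt"

// Budget is the per-node limit on unique (metric x label set) tuples from §6.3.
const Budget = 500_000

// WorstCase returns the maximum number of series one metric can produce:
// the product of the distinct-value counts of its labels.
func WorstCase(labelCardinalities ...int) int {
	total := 1
	for _, c := range labelCardinalities {
		total *= c
	}
	return total
}

func main() {
	// Assumed value counts: 8 operations, 5 models, 3 statuses, 64 tenant buckets.
	series := WorstCase(8, 5, 3, 64)
	fmt.Println(series, series <= Budget) // 7680 true
}
```

On the Prometheus side a histogram such as `kdb.query.duration` additionally fans out into one series per bucket boundary plus sum and count, so a benchmark of the worst case would need to multiply by that factor as well.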
7. Logs
- Every log record is emitted via the OTel logs SDK and includes the active `trace_id` and `span_id`, so logs can be jumped to from a trace view.
- Severity levels follow OTel (`TRACE`, `DEBUG`, `INFO`, `WARN`, `ERROR`, `FATAL`).
- The existing file-based access log (`src/log/access.kmd`) remains the source of truth for audit; OTel logs are for operator observability, not compliance.
- Sensitive fields (credentials, query parameter values, row data) are redacted at the SDK layer via an attribute processor, not by asking callers to be careful.
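
The redaction rule in the last bullet can be sketched as a pure function over an attribute map. This is not the OTel SDK processor interface; `Redact` is a hypothetical helper, and the keys `db.kdb.params` and `auth.credentials` are made-up names standing in for whatever the real sensitive attributes turn out to be.

```go
package main

import "fmt"

// sensitiveKeys is a hypothetical deny-list for this sketch; only
// db.statement is an attribute named in this RFC.
var sensitiveKeys = map[string]bool{
	"db.statement":     true,
	"db.kdb.params":    true,
	"auth.credentials": true,
}

const redacted = "[REDACTED]"

// Redact returns a copy of attrs with sensitive values replaced. When
// captureQueryText is true (the explicit opt-in), db.statement passes through.
func Redact(attrs map[string]string, captureQueryText bool) map[string]string {
	out := make(map[string]string, len(attrs))
	for k, v := range attrs {
		if sensitiveKeys[k] && !(k == "db.statement" && captureQueryText) {
			v = redacted
		}
		out[k] = v
	}
	return out
}

func main() {
	attrs := map[string]string{
		"db.operation":     "SELECT",
		"db.statement":     "SELECT * FROM users WHERE id = 7",
		"auth.credentials": "hunter2",
	}
	fmt.Println(Redact(attrs, false)["db.statement"])
	fmt.Println(Redact(attrs, true)["auth.credentials"])
}
```

Running redaction in one place at the export boundary is what lets instrumentation code attach attributes freely without each caller having to remember the privacy rules.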
8. Configuration
New section in `kdb.toml`:

```toml
[otel]
enabled = true
service_name = "kdb"
service_version = "0.9.0"   # filled by build
resource_attrs = { deployment_environment = "prod" }

# Exporter: "otlp-grpc" (default) | "otlp-http" | "none"
exporter = "otlp-grpc"
endpoint = "http://otel-collector:4317"
insecure = false
headers = { }               # e.g. auth tokens

# Sampling
traces_sampler = "parentbased_traceidratio"
traces_ratio = 0.01         # 1% of root spans
metrics_interval = "10s"
logs_enabled = true

# Privacy knobs, off by default
capture_query_text = false
capture_params = false
```

Environment variables follow the standard OTel names (`OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_SERVICE_NAME`, …) and override the TOML.
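
The precedence rule (environment over TOML) could look roughly like the sketch below. `OTelConfig` and `ApplyEnv` are hypothetical names, TOML loading is left out, and treating an empty environment value as "unset" is an assumption of this example.

```go
package main

import (
	"fmt"
	"os"
)

// OTelConfig holds the subset of the [otel] section used in this sketch.
type OTelConfig struct {
	ServiceName string
	Endpoint    string
}

// ApplyEnv overlays the standard OTel environment variables on values
// already loaded from kdb.toml. lookup has the signature of os.LookupEnv
// so that tests can inject a fake environment.
func ApplyEnv(cfg OTelConfig, lookup func(string) (string, bool)) OTelConfig {
	if v, ok := lookup("OTEL_SERVICE_NAME"); ok && v != "" {
		cfg.ServiceName = v
	}
	if v, ok := lookup("OTEL_EXPORTER_OTLP_ENDPOINT"); ok && v != "" {
		cfg.Endpoint = v
	}
	return cfg
}

func main() {
	fromTOML := OTelConfig{ServiceName: "kdb", Endpoint: "http://otel-collector:4317"}
	fmt.Printf("%+v\n", ApplyEnv(fromTOML, os.LookupEnv))
}
```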
9. Sampling strategy
- **Default:** parent-based trace ID ratio at 1%.
- **Forced sampling:** any request with `traceparent` already sampled by an upstream caller is always sampled (we respect the parent).
- **Error boost:** if a query returns an error, the span is upgraded to "sampled" regardless of the ratio (tail-based, applied at export time via a custom processor).
- **Slow query boost:** same as error boost, but for queries exceeding the per-operation p99 target in RFC-001 §5.
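
The four rules above combine into a single export-time decision, sketched below with the standard library only. This is a simplified model rather than the OTel SDK sampler: the `Span` type is hypothetical, the ratio comparison is this example's own simplification, and the sketch assumes spans are recorded locally even when the head decision is negative, since otherwise there would be nothing to upgrade at export time.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"time"
)

// Span is a hypothetical finished-span record holding only the fields the
// sampling decision needs.
type Span struct {
	TraceID       [16]byte
	HasParent     bool
	ParentSampled bool
	Failed        bool
	Duration      time.Duration
}

// headSampled applies the parent-based ratio rule: respect the parent if
// there is one, otherwise compare part of the trace id against the ratio.
func headSampled(s Span, ratio float64) bool {
	if s.HasParent {
		return s.ParentSampled
	}
	if ratio >= 1 {
		return true
	}
	if ratio <= 0 {
		return false
	}
	x := binary.BigEndian.Uint64(s.TraceID[8:]) >> 1 // 63 bits of the trace id
	return x < uint64(ratio*float64(uint64(1)<<63))
}

// ShouldExport adds the error and slow-query boosts on top of the head decision.
func ShouldExport(s Span, ratio float64, slowThreshold time.Duration) bool {
	return headSampled(s, ratio) || s.Failed || s.Duration > slowThreshold
}

func main() {
	slow := 100 * time.Millisecond // assumed threshold for the example
	fmt.Println(ShouldExport(Span{}, 0.01, slow))                                   // healthy root span with a low trace id
	fmt.Println(ShouldExport(Span{Failed: true}, 0, slow))                          // error boost
	fmt.Println(ShouldExport(Span{Duration: 250 * time.Millisecond}, 0, slow))      // slow query boost
	fmt.Println(ShouldExport(Span{HasParent: true, ParentSampled: true}, 0, slow)) // parent respected
}
```

Whether the boosts should also override an explicitly unsampled parent is a choice the text leaves open; this sketch lets them override it.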
10. Implementation plan
Phase 1 — Go kdb baseline (2 weeks)
- Add `go.opentelemetry.io/otel` + `otlpgrpc` exporter deps.
- Wire `TracerProvider`, `MeterProvider`, `LoggerProvider` from `kdb.toml`.
- Instrument the gRPC entry point to create the `kdb.query` root span and extract inbound context.
- Instrument parser, planner, executor with child spans.
- Port existing Prometheus metrics to OTel metrics via the Prometheus bridge (`go.opentelemetry.io/otel/exporters/prometheus`) so the scrape endpoint keeps working and OTLP also ships.
- Add the RED metrics from §6.1 where missing.
- Redaction processor for sensitive attributes.
Phase 2 — kdb-next (Rust) parity (1 week, can run in parallel)
**Status on 2026-04-10:** `crates/kdb-obs/` already has a working traces pipeline: `tracing_setup.rs` wires an OTLP gRPC exporter with parent-based 1% sampling and a 100 ms slow-request threshold, and bridges the existing `tracing` spans via `tracing-opentelemetry`. Metrics are still Prometheus-only (`metrics.rs` uses the `prometheus` crate directly, with tiered tenant bucketing: top-100 → `tier-a`, next-1000 → `tier-b`, rest → `other`). Logs are not wired through OTel yet.

Remaining work:
- **Metrics via OTLP.** Add the `opentelemetry-otlp` metrics exporter and mirror the existing Prometheus metrics into it. Keep the Prometheus scrape endpoint on as compatibility.
- **Align metric names and label sets with §6** (today the crate uses its own ad-hoc names like `rpc_duration`; they need to become `kdb.query.duration` etc.; keep the Prometheus names as a translation layer to avoid breaking current dashboards).
- **Logs via OTel.** Wire `opentelemetry-appender-tracing` (or equivalent) so log records carry `trace_id` / `span_id` and are exported via OTLP. Keep the file sink for the access log.
- **Propagate context beyond the gateway.** Today only the gateway RPC entry point creates spans. Extend instrumentation into `kdb-record`, `kdb-kv-trait` and replication RPCs so the full query tree from §5.1 shows up in a trace, not just the root `rpc` span.
- **Adopt the §4 semantic conventions** as attributes on existing spans (`db.kdb.tenant_id`, `db.kdb.model`, `db.kdb.shard_id`, …). The current crate only sets `tenant.id` and `rpc.service` / `rpc.method`.
Phase 3 — Replication and background (1 week)
- Propagate context through Raft RPCs.
- Background span roots for compaction, vacuum, WAL ship, backup.
- Add `kdb.replication.lag` gauge.
Phase 4 — Dashboards and smoke tests (1 week)
- Ship a reference Grafana dashboard JSON under `observe/dashboards/kdb.json`.
- Integration test: run kdb against an in-process OTLP collector, fire N queries, assert spans, metrics, logs are received with the expected attributes.
- Cardinality test: generate 1M distinct tenants and assert metric cardinality stays under budget.
11. Migration and compatibility
- The existing Prometheus scrape endpoint stays on for at least two
minor versions after Phase 1 ships. Only after downstream Koder products have moved their dashboards to OTLP will we consider removing it — and that removal is a separate RFC.
- The existing access log file format is untouched.
- No breaking changes to `kdb.toml`: the new `[otel]` section is additive; defaults are backwards compatible (`enabled = true`, `exporter = "otlp-grpc"`, but `endpoint` is empty by default and the exporter degrades to no-op when empty).
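
The "empty endpoint degrades to no-op" default can be sketched as a constructor that falls back to a do-nothing exporter. The `Exporter` interface and both implementations are hypothetical and much smaller than the real OTel exporter interfaces; the sketch covers only the configuration-time case, not a collector that becomes unreachable at runtime.

```go
package main

import "fmt"

// Exporter is a hypothetical minimal export interface for this sketch.
type Exporter interface {
	Export(batch []string) error
	Name() string
}

// noopExporter accepts and discards everything.
type noopExporter struct{}

func (noopExporter) Export([]string) error { return nil }
func (noopExporter) Name() string          { return "none" }

// otlpExporter stands in for a real OTLP client.
type otlpExporter struct{ kind, endpoint string }

func (e otlpExporter) Export(batch []string) error {
	// A real implementation would send the batch to e.endpoint here.
	return nil
}
func (e otlpExporter) Name() string { return e.kind }

// NewExporter picks the exporter from the [otel] settings. Disabled
// telemetry, exporter = "none" and an empty endpoint all yield the no-op.
func NewExporter(enabled bool, kind, endpoint string) Exporter {
	if !enabled || kind == "none" || endpoint == "" {
		return noopExporter{}
	}
	return otlpExporter{kind: kind, endpoint: endpoint}
}

func main() {
	fmt.Println(NewExporter(true, "otlp-grpc", "").Name())                           // default config
	fmt.Println(NewExporter(true, "otlp-grpc", "http://otel-collector:4317").Name()) // configured
}
```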
12. Open questions
- **Do we register `db.system = "kdb"` upstream with OTel?** Doing so makes us a recognized database in every OTel-aware APM tool. It's a ~3-month process. Proposed: yes, start in parallel with Phase 1.
- **How do we bound the `tenant_bucket` label at 100M tenants?** With 64 buckets, each bucket averages ~1.6M tenants (100M / 64), which is fine for p99 but not for "which tenant is slow". Proposed: spans carry raw `tenant_id`, metrics carry bucketed; operators doing single-tenant investigation use the trace view.
- **Sampling in control plane vs data plane.** Control plane calls (auth, schema lookups) are cheap but very frequent. Should they have a lower sample ratio than data plane queries? Proposed: yes, a separate `traces_ratio_control_plane = 0.001` knob in Phase 3.
- **Log volume.** Emitting every log record via OTel could 10x log volume on operators who don't want it. Proposed: `logs_enabled = true` by default but level-gated at `INFO`; debug logs are only emitted to the file sink.
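
The level gate proposed in the last question can be sketched as a routing function. `Severity` and `Sinks` are hypothetical names for this example; the ordering of levels follows the list in §7.

```go
package main

import "fmt"

// Severity mirrors the OTel severity names listed in §7, in ascending order.
type Severity int

const (
	Trace Severity = iota
	Debug
	Info
	Warn
	Error
	Fatal
)

// Sinks reports where a log record goes under the proposal: the file sink
// receives everything, OTel receives records at or above otelMin when
// logs_enabled is true.
func Sinks(level, otelMin Severity, logsEnabled bool) (file, otel bool) {
	return true, logsEnabled && level >= otelMin
}

func main() {
	for _, lvl := range []Severity{Debug, Info, Error} {
		file, otel := Sinks(lvl, Info, true)
		fmt.Printf("level=%d file=%v otel=%v\n", lvl, file, otel)
	}
}
```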
13. References
- OpenTelemetry Semantic Conventions for Database Client Calls: https://opentelemetry.io/docs/specs/semconv/database/
- OpenTelemetry Protocol (OTLP): https://opentelemetry.io/docs/specs/otlp/
- RFC-001 §2.5, §6.8, §10 (this repo)
- backlog/pending/046-opentelemetry-instrumentation.kmd
14. Decision log
| Date | Decision | Notes |
|---|---|---|
| 2026 | Drafted | Claude scribe, awaiting Rodrigo review |
| 2026 | Corrected | Phase 2 rewritten after auditing `crates/kdb-obs/`: traces pipeline already exists; remaining work is metrics+logs OTLP and spec alignment |