Kdb

KDB — Koder Database

  • *Area:* Data Platform
  • *Path:* infra/data/kdb (the kdb Rust component lives under infra/data/kdb/ during the transitional period; the path move is tracked in tickets #730a..#730f)
  • *Kind:* Two-in-one (transitional): the legacy observability DB (Go, moving to infra/observe/kdb-obs-legacy/ per #730e) plus the general-purpose relational engine (Rust, the canonical kdb)
  • *Status:* `fieldtest` (since 2026-05-22 per #730a). Promoted from `experimental` because the Koder Stack has no users yet, and the remaining gates (`stability`, `production_proven`) only close through adoption.

The kdb name (2026-05-22, ticket #730 reconciliation)

Per meta/docs/stack/registries/component-names.md, the canonical name of this component is *kdb* (display + bare + slug). The kdb-next label was the transitional name of the *Rust rewrite while it coexisted* with the Go kdb 1.x, and that coexistence is ending. References to "kdb-next" in older docs and commits are historical; new docs use "kdb".

Two KDBs, one name — transitional split

Legacy Go kdb (moving to infra/observe/kdb-obs-legacy/ per #730e)

The Go engine historically at infra/data/kdb (Go binaries koder-kdb + kdb-soak) is a specialized multi-model database for the *Koder Observability platform*. It ingests diagnostic data from `kodertraced` agents and serves queries to the `observe/observability` dashboard.

The Go binary's *SQL/KV path* never reached production (per the KMCP003.1 autonomous probe of s.khost1 on 2026-05-22) and will be archived. The Go binary's *observability subset* (Prometheus scraper, alerting, log aggregation, agent registry) moves to `infra/observe/kdb-obs-legacy/`. It is an observability service that historically lived next to kdb, not a property of kdb itself.

*Features (legacy Go obs subset):*

  • *Data Ingestion*: REST API for metrics, ping, HTTP, DNS, traceroute, port, MTR, browser timing, and system metrics
  • *Query Languages*: KQL (Koder Query Language), PromQL, GraphQL
  • *Alerting Engine*: rule-based, threshold + duration
  • *Health Scoring*: per-site score (0–100) from multiple dimensions
  • *Anomaly Detection*: Z-score based, with trend prediction
  • *Grafana Compatibility*: Prometheus remote write/read, `/metrics` exposition
  • *Retention Management*: automatic cleanup
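The anomaly detector is only described as Z-score based. The minimal sketch below shows that idea. It assumes a window of recent samples as the baseline and a caller-chosen threshold, and both are illustrative rather than the legacy engine's actual parameters. It is written in Rust to match the canonical engine, even though the legacy subset is Go.

```rust
/// z-score of `value` against `baseline` using the sample standard
/// deviation, or None when the baseline has fewer than 2 points or zero
/// spread (the score is undefined there).
pub fn z_score(baseline: &[f64], value: f64) -> Option<f64> {
    if baseline.len() < 2 {
        return None;
    }
    let n = baseline.len() as f64;
    let mean = baseline.iter().sum::<f64>() / n;
    // Sample variance: n - 1 in the denominator.
    let var = baseline.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    let sd = var.sqrt();
    if sd == 0.0 {
        return None;
    }
    Some((value - mean) / sd)
}

/// Flags `value` as anomalous when |z| exceeds `threshold` (e.g. 3.0).
/// The threshold is a hypothetical knob, not a documented kdb setting.
pub fn is_anomaly(baseline: &[f64], value: f64, threshold: f64) -> bool {
    z_score(baseline, value).map_or(false, |z| z.abs() > threshold)
}
```

A constant baseline yields no score rather than an infinite one. This keeps flat metrics from alerting on their first change, which is one reasonable choice among several.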

kdb (Rust, canonical) — General-purpose relational, hyperscale

The Rust kdb, defined in *RFC-001* and currently living under infra/data/kdb/ (path move tracked in #730a..#730f), is a from-scratch relational engine targeting 100M+ tenants on a TiKV substrate with Raft consensus. It is the substrate for every Koder SaaS that needs relational storage: Flow (Phase 7.1 done), Talk (Phase 7.2 in progress), Kortex (Phase 7.3 pending), and beyond.

*Current version:* v0.1.0, soft-promoted to `fieldtest` on 2026-05-22 (#730a, version bump from 0.0.14). It builds on the v0.0.13 performance wave (#676, #680, #683, #685, #686, #690, #699 + admin closures), the v0.0.12 RFC #548 god-file decomposition, and v0.0.11 (RFC-009 active-active observability + RFC-012 vector/graph substrates + multi-table 2PC fix). The Phase 7 migration of `koderflow`/`kodertalk`/`koderkortex` from Go kdb to kdb as primary is in progress. 7.1 (Flow flip, 2026-05-04) is done, and 7.2 (Talk flip) and 7.3 (Kortex activation) are tracked under tickets #547 and #545. Bench v0.0.13 dev smoke: large_scan 4.4× ops/s and p99 6.8× faster vs v0.0.12 (1.32× faster than Postgres, so the pg-faster gate is satisfied).

*Disk format break window:* until 2026-06-30, the on-disk format may change without a migration tool, since there is no user data to lose. After that date, format changes require a migration tool.

*Client SDKs:* all standard PostgreSQL drivers work natively. kdb-specific multi-tenant wrappers:

| Language | Package | Location |
|---|---|---|
| Go | `go.koder.dev/kdb` (pgx/v5 wrapper) | `infra/data/kdb/sdk/go-pgwire/` |
| Python | `kdb-next` (asyncpg + psycopg2) | `infra/data/kdb/sdk/python/` |
| JavaScript/TS | `@koder/kdb` (node-postgres wrapper) | `infra/data/kdb/sdk/js/` |
| Ruby/Rails | `kdb-next` gem (pg + ActiveRecord adapter) | `infra/data/kdb/sdk/ruby/` |

Recent changes

kdb backlog sweep: graph SQL functions, kdb-tx analysis, pg_dump split (2026-05-11, `/k-go kdb`)

Triage of the 3 pending backlog tickets (infra/data/kdb/backlog/pending/):

  • *#316 (kdb-tx extract)*: closed as *won't-do* after an audit. The ticket's premise was wrong: no tx/mvcc/lock/deadlock modules exist in kdb-record/src/ (only audit, branch, catalog, cow_kv, error, fingerprint_history, keys, merge, migrate, schema, tenant, tier, wire). The tx API already lives in kdb-kv-trait (KvTx trait + IsolationLevel). MVCC, locks and deadlock handling are delegated to the backends (sled locally, TiKV Percolator 2PC), and RecordTx is a thin wrapper over Box<dyn KvTx>. Tickets #163..#167 shipped in the kv-trait backends, not in kdb-record. RFC-001 §Roadmap does not list kdb-tx among the pending sub-RFCs. Creating an empty crate of re-exports would violate hyperscale-first with no benefit. Full analysis in `backlog/done/316-kdb-tx-extract-crate.kmd`.
  • *#326 (graph SQL functions)*: shipped. Four kdb-graph algorithms are now exposed as SQL table-valued functions: pagerank('graphs', 1), connected_components('graphs', 1), louvain('graphs', 1), shortest_path_weighted('graphs', 1, from, to). The architecture follows the Plan::VectorSearch pattern (#062):
    • a new plan node Plan::GraphCall { function, args, output_columns };
    • a new trait method RowSource::graph_function(name, args);
    • a builder build_graph_call in build.rs;
    • an impl RecordAdapter::graph_function in `kdb-adapter/src/lib.rs` that opens kdb_graph::Graph::new(kv, tenant, table_id) via block_in_place + rt.block_on.
  *Deliberate divergence from the proposed contract:* there is an extra table TEXT argument to the left of graph_id. kdb-graph and kdb-record share the (tenant, table_id) namespace, and there is no SQL-level registry of "which record table backs which graph". Once CREATE GRAPH ships (future), the overload without the table argument becomes obvious. The roundtrip test kdb-adapter/tests/integration.rs::test_graph_sql_functions_via_adapter validates column shape, cardinality and algorithm invariants: PageRank sums to 1.0, CC finds 2 components, shortest path resolves 1→2→3 on the cycle, and the disconnected case returns zero rows. Pg_catalog match sites updated: output_names, fold_plan, subst_outer_plan, bind_params_in_plan, plan_has_outer_refs, node_label/json_node_type/children (EXPLAIN), collect_seq_scan_tables/find_table_in_plan/find_lock_rows_clause (pgwire plan_walk). 17/17 adapter tests green, and a full `cargo build --workspace` is green. Follow-ups: a roundtrip via the psql socket (gateway integration tooling, future ticket) and an optimizer pass for GraphCall (perf gate; #327+).
  • *#333 (pg_dump v17 compat)*: closed as *split*. The scope was too large for a single low-priority PR, so it was broken into 4 independent sub-tickets in backlog/pending/:
    • *#339*: audit of the pg_dump v17 query stream against real Postgres (PG17 in Docker + tcpdump + diff report vs kdb-gateway v0.0.13). Normative spec in `meta/docs/stack/specs/postgres-compat/pg-dump-v17.kmd` (to be created). Blocks #341 and #342.
    • *#340*: wire-protocol COPY <table> TO STDOUT (CopyOutResponse + CopyData text + CopyDone). Today only COPY ... TO '<file>' exists. The comment at copy_file.rs:248 references pgwire_traits.rs, which does not exist (stale).
    • *#341*: pg_catalog coverage (pg_attribute v17 rows, pg_sequence, pg_get_serial_sequence, pg_get_function_arguments, pg_constraint, pg_type). Depends on #339.
    • *#342*: roundtrip test harness pg_dump | pg_restore against vanilla Postgres v17. Depends on #339 + #340 + #341. CI gate weekly + on demand.
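The #316 audit rests on a specific layering. Backends own MVCC, locking and deadlock handling behind a trait, and RecordTx only forwards to a `Box<dyn KvTx>`. The minimal sketch below shows that shape. The method set, the `IsolationLevel` variants and the in-memory `MemTx` backend are all hypothetical stand-ins, not the kdb-kv-trait API.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

/// Isolation levels a backend may offer (variant names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IsolationLevel {
    Snapshot,
    Serializable,
}

/// Hypothetical backend-neutral transaction trait. MVCC, locking and
/// deadlock handling stay behind this boundary, inside each backend.
pub trait KvTx {
    fn isolation(&self) -> IsolationLevel;
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn put(&mut self, key: &[u8], value: &[u8]);
    fn commit(self: Box<Self>) -> Result<(), String>;
}

pub type Store = Arc<Mutex<BTreeMap<Vec<u8>, Vec<u8>>>>;

/// Toy backend standing in for sled/TiKV: buffers writes and applies them
/// on commit. It offers read-your-writes only, no real isolation.
pub struct MemTx {
    store: Store,
    isolation: IsolationLevel,
    writes: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl MemTx {
    pub fn begin(store: Store, isolation: IsolationLevel) -> Self {
        MemTx { store, isolation, writes: BTreeMap::new() }
    }
}

impl KvTx for MemTx {
    fn isolation(&self) -> IsolationLevel {
        self.isolation
    }
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        // Buffered writes win over committed data.
        if let Some(v) = self.writes.get(key) {
            return Some(v.clone());
        }
        self.store.lock().unwrap().get(key).cloned()
    }
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.writes.insert(key.to_vec(), value.to_vec());
    }
    fn commit(self: Box<Self>) -> Result<(), String> {
        let MemTx { store, writes, .. } = *self;
        let mut guard = store.lock().map_err(|e| e.to_string())?;
        guard.extend(writes);
        Ok(())
    }
}

/// The record layer sees only the trait object, never a concrete backend.
pub struct RecordTx {
    inner: Box<dyn KvTx>,
}

impl RecordTx {
    pub fn new(inner: Box<dyn KvTx>) -> Self {
        RecordTx { inner }
    }
    pub fn isolation(&self) -> IsolationLevel {
        self.inner.isolation()
    }
    pub fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.inner.get(key)
    }
    pub fn put(&mut self, key: &[u8], value: &[u8]) {
        self.inner.put(key, value)
    }
    pub fn commit(self) -> Result<(), String> {
        self.inner.commit()
    }
}
```

With this split, a separate kdb-tx crate would only re-export the trait, which is the redundancy the audit points out.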

#260 (which prematurely marked pg_dump/restore compat as done) is effectively reopened by the split.

Current state of the kdb backlog (post-sweep): none of the 3 original tickets remain pending, and there are 4 new pending tickets (#339..#342) with explicitly declared dependencies.
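The framing that #340 needs is fixed by the PostgreSQL v3 frontend/backend protocol. The server sends CopyOutResponse ('H'), then CopyData ('d') messages, then CopyDone ('c'). The sketch below frames text-format output this way. `copy_out_text` and `push_msg` are hypothetical helpers, not kdb-pgwire code, and rows are assumed to be already rendered in COPY text format.

```rust
/// Appends one backend message: a 1-byte type tag, then an Int32 length that
/// counts itself plus the body (but not the tag), then the body.
fn push_msg(out: &mut Vec<u8>, tag: u8, body: &[u8]) {
    out.push(tag);
    out.extend_from_slice(&((body.len() + 4) as i32).to_be_bytes());
    out.extend_from_slice(body);
}

/// Frames `COPY ... TO STDOUT` for pre-rendered text rows (tab-separated,
/// newline-terminated, as COPY text format expects).
pub fn copy_out_text(num_columns: i16, rows: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    // CopyOutResponse: overall format (0 = text), column count, then one
    // Int16 format code per column (0 = text for every column).
    let mut body = vec![0u8];
    body.extend_from_slice(&num_columns.to_be_bytes());
    for _ in 0..num_columns {
        body.extend_from_slice(&0i16.to_be_bytes());
    }
    push_msg(&mut out, b'H', &body);
    // CopyData: one message per row here; the protocol allows any split.
    for row in rows {
        push_msg(&mut out, b'd', row.as_bytes());
    }
    // CopyDone: empty body.
    push_msg(&mut out, b'c', &[]);
    out
}
```

In a full exchange the server then sends CommandComplete (`COPY n`) and eventually ReadyForQuery. Both are omitted here.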

Graph perf runbook: infra/data/kdb/docs/runbooks/graph-pagerank-perf.kmd (2026-05-10, /k-go kdb #329)

Human operational runbook that ties together the #318 (algorithms) + #327 (bench) + #328 (CI gate) trio. It has 7 sections:
  • how to run the bench;
  • how to read the output;
  • how to interpret the baseline;
  • how to update the baseline;
  • how to diagnose a regression (4-step ladder: phase isolation, git bisect with a ready-made command, flamegraph, CI artifact diff);
  • when to trigger the full gate;
  • known follow-ups.
The canonical path follows policies/content-location.kmd.

Graph perf CI gate: .gitea/workflows/kdb-graph-perf-gate.yml (2026-05-10, /k-go kdb #328)

New workflow that gates PageRank performance regressions in CI. It has 2 jobs:

  • *quick*: 100k/1M, gate `snapshot+pr ≤ 5s` (baseline 1.380s). Runs on push/PR.
  • *full*: 1M/10M, gate `snapshot+pr ≤ 60s` (RFC-012 promise, baseline 25.062s). `workflow_dispatch` only.

The baseline lives in infra/data/kdb/crates/kdb-graph/perf/baseline.json (4 scales + thresholds + notes). A bash parser extracts the wall time from the bench's stderr, and awk does the comparison without depending on bc. Follow-up #329: human runbook in `docs/performance/graph-pagerank.kmd`.
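The gate logic itself is small. The Rust sketch below performs the same check as the bash/awk pair. It assumes a hypothetical stderr line of the form `snapshot+pr walltime: 1.380s`; the real bench output format is not shown on this page, and the function names are illustrative.

```rust
/// Extracts the snapshot+pr wall time in seconds from bench stderr.
/// Assumed (hypothetical) line format: `snapshot+pr walltime: 1.380s`.
pub fn parse_walltime_secs(stderr: &str) -> Option<f64> {
    stderr
        .lines()
        .find_map(|line| line.trim().strip_prefix("snapshot+pr walltime:"))
        .and_then(|rest| rest.trim().strip_suffix('s'))
        .and_then(|num| num.trim().parse::<f64>().ok())
}

/// Gate verdict: pass when the measured time is within the threshold
/// (5 s for the quick job, 60 s for the full job). A missing timing line is
/// an error rather than a silent pass.
pub fn gate_passes(stderr: &str, threshold_secs: f64) -> Result<bool, String> {
    parse_walltime_secs(stderr)
        .map(|t| t <= threshold_secs)
        .ok_or_else(|| "walltime line not found in bench stderr".to_string())
}
```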

Graph perf validation: RFC-012 promise empirically confirmed (2026-05-10, `/k-go kdb` #327)

kdb-graph/examples/pagerank_scale.rs shipped. `cargo run --release --example pagerank_scale -p kdb-graph` measures end-to-end. The RFC-012 promise (1M nodes / 10M edges in < 60s) is *validated at 25.062s* on the developer's laptop (local sled). Scaling is linear. PageRank accounts for 78.5% of wall time and the snapshot for 21.5%. Follow-up #328 plugs this into nightly CI.

Graph v2: PageRank + Louvain + connected components + Dijkstra (2026-05-10, /k-go kdb #318)

kdb-graph delivers on the README v0.8.0 §"Graph" promise (PageRank + community detection + shortest path). There are 4 new algorithms in algorithms.rs, plus a shared snapshot primitive:

  • *`Graph::pagerank`*: power iteration with dangling-node mass redistribution. L1 convergence, final renormalization.
  • *`Graph::connected_components`*: union-find over the undirected projection, with deterministic ordering.
  • *`Graph::louvain`*: greedy modularity (single-level), ties broken by community id.
  • *`Graph::shortest_path_weighted`*: Dijkstra with a min-heap; rejects negative weights.
  • *`Graph::snapshot`*: public primitive for reuse across algorithms on the same graph (1 substrate scan).
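The PageRank bullet combines three details: dangling-node mass redistribution, L1 convergence and a final renormalization. The textbook power-iteration sketch below shows all three over an in-memory adjacency list. The signature, damping factor and tolerance are assumptions, and the real implementation works from a `Graph::snapshot` instead.

```rust
/// PageRank by power iteration. `out[u]` lists the targets of node u.
/// Rank held by dangling nodes (no out-edges) is spread uniformly over all
/// nodes; iteration stops once the L1 change drops below `tol`; the result
/// is renormalized so the ranks sum to 1.0.
pub fn pagerank(out: &[Vec<usize>], damping: f64, tol: f64, max_iter: usize) -> Vec<f64> {
    let n = out.len();
    if n == 0 {
        return Vec::new();
    }
    let nf = n as f64;
    let mut rank = vec![1.0 / nf; n];
    for _ in 0..max_iter {
        // Mass currently sitting on dangling nodes.
        let dangling: f64 = (0..n).filter(|&u| out[u].is_empty()).map(|u| rank[u]).sum();
        // Teleport share plus the redistributed dangling mass.
        let base = (1.0 - damping) / nf + damping * dangling / nf;
        let mut next = vec![base; n];
        for (u, targets) in out.iter().enumerate() {
            if targets.is_empty() {
                continue;
            }
            let share = damping * rank[u] / targets.len() as f64;
            for &v in targets {
                next[v] += share;
            }
        }
        // L1 distance between successive iterates.
        let delta: f64 = rank.iter().zip(&next).map(|(a, b)| (a - b).abs()).sum();
        rank = next;
        if delta < tol {
            break;
        }
    }
    // Final renormalization absorbs floating-point drift.
    let total: f64 = rank.iter().sum();
    rank.iter_mut().for_each(|r| *r /= total);
    rank
}
```

Without the dangling redistribution, each iteration would leak the dangling nodes' rank, and the vector would stop summing to 1.0.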

37 tests green in kdb-graph (previously 23). Weights are read from the ("weight", "<float>") prop; when absent, the weight is 1.0 (unweighted). Follow-ups: #326 (SQL function exposure), #327 (bench at 1M/10M scale).
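The sketch below shows how the weight rule combines with Dijkstra and its negative-weight check. It assumes a hypothetical `Edge` type whose `weight` is the parsed "weight" property, with None when the property is absent. This is not the kdb-graph signature.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical edge: `weight` is the parsed "weight" prop, None if absent.
pub struct Edge {
    pub to: usize,
    pub weight: Option<f64>,
}

/// Dijkstra with a min-heap. Returns the node path from `from` to `to`,
/// Ok(None) when `to` is unreachable, or Err if any weight is negative.
pub fn shortest_path_weighted(
    adj: &[Vec<Edge>],
    from: usize,
    to: usize,
) -> Result<Option<Vec<usize>>, String> {
    // Missing weight means unweighted, i.e. 1.0.
    let w = |e: &Edge| e.weight.unwrap_or(1.0);
    if adj.iter().flatten().any(|e| w(e) < 0.0) {
        return Err("negative edge weight".into());
    }
    let n = adj.len();
    let mut dist = vec![f64::INFINITY; n];
    let mut prev = vec![usize::MAX; n];
    let mut heap = BinaryHeap::new();
    dist[from] = 0.0;
    // f64 is not Ord, so distances are pushed as bit patterns: for
    // non-negative floats the u64 order matches the numeric order.
    heap.push(Reverse((0.0f64.to_bits(), from)));
    while let Some(Reverse((bits, u))) = heap.pop() {
        let d = f64::from_bits(bits);
        if d > dist[u] {
            continue; // stale heap entry
        }
        if u == to {
            break;
        }
        for e in &adj[u] {
            let nd = d + w(e);
            if nd < dist[e.to] {
                dist[e.to] = nd;
                prev[e.to] = u;
                heap.push(Reverse((nd.to_bits(), e.to)));
            }
        }
    }
    if dist[to].is_infinite() {
        return Ok(None);
    }
    // Walk predecessors back from `to`, then reverse.
    let mut path = vec![to];
    while *path.last().unwrap() != from {
        path.push(prev[*path.last().unwrap()]);
    }
    path.reverse();
    Ok(Some(path))
}
```

Rejecting negative weights up front matters because Dijkstra's early exit and stale-entry skip are only correct when distances never decrease along a path.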

Observability: RFC-002 §6/§7/§4 closed (2026-05-10, `/k-go kdb`)

5 tickets closed in a single batch: *#321* (backlog status/location policy), *#312* (Rust OTLP metrics exporter), *#313* (Rust OTel logs wiring), *#315* (semantic conventions §4), *#046* (umbrella close; the Rust side is at 100%).

  • *`kdb_obs::metrics::OtelInstruments`* + *`GatewayMetrics::new_with_otel(meter_provider)`*: Prometheus → OTLP mirror. RFC-002 §6.1 names (kdb.query.duration, kdb.rpc.count, etc.) are emitted via OTel, and the Prometheus aliases (kdb_*_seconds) are preserved for dashboard compatibility.
  • *`kdb-gateway::main`* gains 6 `--otel-*` CLI flags that control the pipeline. When enabled, it calls init_tracing(OtelConfig) (already present in kdb-obs::tracing_setup) and passes meter_provider to GatewayMetrics. Init failure is non-fatal (it falls back to fmt-only).
  • *Semantic conventions §4* are wired into 8 spans of the SQL chain:
    • db.kdb.model = "relational" cross-stack;
    • db.kdb.tenant_id on the pgwire spans (do_query simple/extended);
    • placeholders for db.kdb.rows_returned, db.kdb.cache_hit and db.kdb.plan_hash (recordable by the call sites).
  • *`policies/backlog.kmd` § Status ↔ location consistency*: new rule + script `meta/context/scripts/audit-backlog-status-location.sh`. kdb cleanup: 119 mismatches → 0. Stack-wide drift (2637) is tracked in projects/koder-stack#125.
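The exact alias rule is not spelled out here. One plausible derivation follows the common OpenTelemetry-to-Prometheus naming convention: non-alphanumerics become `_`, a unit suffix is appended, and counters get `_total`. It is sketched below, but whether kdb's alias table matches it exactly is an assumption, and `prometheus_alias` is a hypothetical helper.

```rust
/// Metric kinds relevant to the naming rule (illustrative subset).
#[derive(Clone, Copy)]
pub enum Kind {
    Histogram,
    Counter,
}

/// Derives a Prometheus-style name from an OTel dotted name: characters
/// outside [A-Za-z0-9] become '_', a unit suffix is appended ("s" maps to
/// "seconds", "By" to "bytes"), and counters end in "_total".
pub fn prometheus_alias(otel_name: &str, unit: Option<&str>, kind: Kind) -> String {
    let mut name: String = otel_name
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() { c } else { '_' })
        .collect();
    if let Some(u) = unit {
        let suffix = match u {
            "s" => "seconds",
            "By" => "bytes",
            other => other,
        };
        if !name.ends_with(suffix) {
            name.push('_');
            name.push_str(suffix);
        }
    }
    if let Kind::Counter = kind {
        if !name.ends_with("_total") {
            name.push_str("_total");
        }
    }
    name
}
```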

The Go side (#314) remains a separate track after Phase 7.4. 10 tests in `kdb-obs` (previously 7) and 357 in `kdb-pgwire` (no regressions).

Multi-tenancy: #322 batch 2 done (2026-05-10, `/k-go kdb`)

Foundation/ID v2 bearer JWT reaches the pgwire surface, which closes ticket #322. Batch 1 had already added koder_user_id/workspace_id to AuthContext along with the Tenant::from_user_workspace helper. Batch 2 wired everything into the StartupMessage.

  • *`PgwireState.jwt_validator: Option<Arc<JwtValidator>>`*: new field (mirrors metrics from #310). There is a single canonical `Arc<JwtValidator>` per process, shared with the 6 gRPC interceptors from #320, so the JWKS cache is amortized across surfaces.
  • *`KdbScramStartupHandler` + `KdbStartupHandler`*: Foundation/ID v2 short-circuit before the SCRAM/cleartext challenge. The client passes `koder_id_token=<jwt>` in the StartupMessage, the validator checks it, and `finish_authentication` completes the handshake. Without a token, the legacy SCRAM path is 100% unchanged. An invalid token produces FATAL/SQLSTATE 28000 before the banner.
  • *Cross-tenant gate*: when state.tenant_id != 0 (the listener is bound to a specific kdb tenant), the canonical tenant derived from the token must match. Otherwise the connection is rejected with status=cross_tenant. This satisfies the "404 not 403" error model in `specs/multi-tenancy/contract.kmd`.
  • *`canonical_tenant_id(&AuthContext) -> u64`*: the adoption point for Tenant::from_user_workspace. It uses the helper when `koder_user_id` is present and falls back to the legacy v1 claims.tid otherwise.
  • *Dedicated audit log* (`tracing::info!(target: "kdb.audit.jwt", …)`): every JWT validation emits a structured line (timestamp, user_id, tenant_id, token_jti, status, source_ip), with 8 stable statuses for parsing.
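On the client side, the short-circuit only needs one extra StartupMessage parameter. The sketch below shows the PostgreSQL v3 StartupMessage framing with a token parameter. It assumes the parameter is spelled `koder_id_token`, and `startup_message` is a hypothetical helper, not part of any kdb SDK.

```rust
/// Protocol version 3.0 as sent in the StartupMessage (3 << 16).
const PROTOCOL_3_0: i32 = 196_608;

/// Builds a PostgreSQL v3 StartupMessage: Int32 total length (including the
/// length field itself), Int32 protocol version, NUL-terminated name/value
/// pairs, and a final NUL. Unlike most messages, it has no type byte.
pub fn startup_message(params: &[(&str, &str)]) -> Vec<u8> {
    let mut body = PROTOCOL_3_0.to_be_bytes().to_vec();
    for (name, value) in params {
        body.extend_from_slice(name.as_bytes());
        body.push(0);
        body.extend_from_slice(value.as_bytes());
        body.push(0);
    }
    body.push(0); // terminator after the last pair
    let mut msg = ((body.len() + 4) as i32).to_be_bytes().to_vec();
    msg.extend_from_slice(&body);
    msg
}
```

A client would send, for example, `startup_message(&[("user", "alice"), ("koder_id_token", jwt)])`. A server without the short-circuit would fall through to its normal authentication path.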

Tests: *30 green in auth_handler* (previously 14):
  • 8 roundtrip tests with a JWKS server stub (T1–T8 from `specs/identity/login-resolution-test-template.kmd` + cross-tenant T9 from the contract);
  • 8 unit tests for the helpers (canonical, AuthError mapping, validate edge cases).
26 tests in kdb-auth show no regressions.

Multi-tenancy: #322 batch 1 (2026-05-10, `/k-go kdb`)

Foundations of the canonical identity from policies/multi-tenant-by-default.kmd.

  • *`AuthContext`* gains `koder_user_id` + `workspace_id` (`Option<String>`).
  • *`JwtClaims`* deserializes the new fields with `#[serde(default)]` (compatible with Foundation/ID v1).
  • *`HmacValidator`* populates the new fields as `None`.
  • *`Tenant::from_user_workspace(user_id, workspace_id)`*: single canonical mapping (BLAKE3-truncated, domain-separation byte, catalog-namespaced, never collides with `TENANT_SYSTEM`).

26 kdb-auth tests + 6 new tenant tests green. Batch 2 closed in the same window (previous entry).
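The two #322 batches combine into one derivation path. v2 tokens map (user, workspace) to a tenant, v1 tokens keep `claims.tid`, and a bound listener rejects mismatches. The sketch below follows that path under loudly hypothetical choices:
  • FNV-1a stands in for truncated BLAKE3;
  • the domain byte, `TENANT_SYSTEM = 0` and the top-bit namespace are illustrative;
  • the struct fields are simplified.

```rust
/// Hypothetical values: the real constants and hash are not shown here.
pub const TENANT_SYSTEM: u64 = 0;
const DOMAIN_TENANT_V2: u8 = 0x02;

/// FNV-1a 64-bit: a tiny, stable stand-in for truncated BLAKE3.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Domain-separation byte + length-prefixed fields (so ("ab", "c") and
/// ("a", "bc") hash different inputs), then the top bit is forced on so the
/// result can never equal TENANT_SYSTEM or any low-range id.
pub fn tenant_from_user_workspace(user_id: &str, workspace_id: &str) -> u64 {
    let mut input = vec![DOMAIN_TENANT_V2];
    for field in [user_id, workspace_id] {
        input.extend_from_slice(&(field.len() as u64).to_be_bytes());
        input.extend_from_slice(field.as_bytes());
    }
    fnv1a64(&input) | (1u64 << 63)
}

pub struct AuthContext {
    pub koder_user_id: Option<String>,
    pub workspace_id: Option<String>,
    pub legacy_tid: u64, // claims.tid from a v1 token
}

/// v2 mapping when koder_user_id is present, else the legacy tid.
/// Treating a missing workspace as "" is a hypothetical choice.
pub fn canonical_tenant_id(ctx: &AuthContext) -> u64 {
    match &ctx.koder_user_id {
        Some(user) => tenant_from_user_workspace(user, ctx.workspace_id.as_deref().unwrap_or("")),
        None => ctx.legacy_tid,
    }
}

/// Cross-tenant gate: a listener bound to a non-zero tenant only accepts
/// tokens whose canonical tenant matches it.
pub fn check_tenant(bound: u64, ctx: &AuthContext) -> Result<u64, &'static str> {
    let tenant = canonical_tenant_id(ctx);
    if bound != 0 && tenant != bound {
        Err("cross_tenant")
    } else {
        Ok(tenant)
    }
}
```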

Security wave: #320 done (2026-05-10, /k-go kdb)

Foundation/ID JWKS integration in the gRPC gateway.

  • *`kdb-gateway::auth`* gains `with_jwt(secret, jwt)` and `jwt_only(jwt)` modes alongside the legacy `new(secret)`. Tokens are detected by shape (`looks_like_jwt`).

  • *`kdb-gateway::main`* gains the CLI flags `--jwt-jwks-url` / `--jwt-issuer` / `--jwt-refresh-secs`. A single `Arc<JwtValidator>` is reused by the 6 gRPC interceptors.

  • The kdb-auth crate (Phase 2.4) already had complete JWT/JWKS validation; the real gap was only the wiring. 24 tests passing.

  • *#322* opened (high priority, security, child of #320): pgwire bearer JWT in the StartupMessage plus koder_user_id/workspace_id claim mapping, to satisfy policies/multi-tenant-by-default.kmd and specs/multi-tenancy/contract.kmd.
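Shape-based detection lets a single listener accept both legacy shared secrets and JWTs. The sketch below shows such a heuristic, based on the JWS compact serialization: three base64url segments, with a JSON header whose encoding starts with `eyJ`. The real `looks_like_jwt` may use different rules, and `AuthPath`/`route` are hypothetical names.

```rust
/// Heuristic shape check: three '.'-separated base64url segments, a header
/// starting with "eyJ" (base64url of `{"`), and a non-empty payload.
pub fn looks_like_jwt(token: &str) -> bool {
    let parts: Vec<&str> = token.split('.').collect();
    // base64url alphabet only; JWS compact form omits '=' padding.
    let is_b64url = |s: &str| {
        s.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_')
    };
    parts.len() == 3
        && parts[0].starts_with("eyJ")
        && !parts[1].is_empty()
        && parts.iter().all(|&p| is_b64url(p))
}

/// Routing in the spirit of a `with_jwt`-style mode: JWT-shaped tokens go
/// to the JWKS validator, everything else to the shared-secret check.
#[derive(Debug, PartialEq)]
pub enum AuthPath {
    Jwt,
    SharedSecret,
}

pub fn route(token: &str) -> AuthPath {
    if looks_like_jwt(token) {
        AuthPath::Jwt
    } else {
        AuthPath::SharedSecret
    }
}
```

The shape check only picks a validator. A JWT-shaped string that fails signature validation must still be rejected rather than retried as a shared secret.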

Observability evolve wave: batch 2 (2026-05-09, /k-go kdb)

The same session extended #311 to cover the entire SQL chain:

  • *`kdb-sql::parse_sql`* gains span `kdb.parse` (RFC-002 §5.1).
  • *`kdb-planner::build` + `build_with_catalog_and_params`* gain span `kdb.plan`.
  • *`kdb-planner::exec::{execute, execute_with_params, execute_dml}`* gain span `kdb.execute`.
  • *`kdb-pgwire` Simple + Extended `do_query`* gain root span `kdb.query` with `surface = "pgwire.simple"|"pgwire.extended"`.

The RFC-002 §5.1 pipeline is now covered end-to-end: kdb.query → kdb.parse → kdb.plan → kdb.execute → kdb.storage.read|write. *#315* (adopting the §4 semantic conventions in the existing spans) is now *unblocked*.

Tests green: 90 kdb-sql + 340 kdb-pgwire + 653/655 kdb-planner (2 pre-existing failures, documented below under v0.0.13).

Observability evolve wave (2026-05-09, /k-evolve)

First batch addressing the *observability* gap raised by RFC-002 (OpenTelemetry):

  • *#310 done*: kdb-gateway now exposes /metrics on port 9301 when `--metrics-bind` (or KDB_METRICS_BIND) is set. GatewayMetrics is built as an Arc and shared with PgwireState, so pgwire query histograms now appear in the gateway's Prometheus scrape.

  • *#311 done (batch 1)*: #[tracing::instrument] is applied to Record::{put,get,delete,scan}_stamped, to Catalog::{ensure,lookup}_{tenant,table}, and to the gRPC handlers Record.{Put,Get,Delete,Scan}. Spans follow RFC-002 §5.1 (kdb.query → kdb.storage.read|write, kdb.catalog.*) and carry the §4 attributes (db.system, db.operation, db.kdb.tenant_id, table_id).

  • *#046 moved back to pending*

Source: ../home/koder/dev/koder/meta/docs/stack/modules/kdb.md