Kdb
KDB — Koder Database
- **Area:** Data Platform
- **Path:** `infra/data/kdb` (the kdb Rust component lives under `infra/data/kdb/` during the transitional period; path move tracked in tickets #730a'..#730f)
- **Kind:** Two-in-one (transitional): legacy observability DB (Go, moving to `infra/observe/kdb-obs-legacy/` per #730e) + general-purpose relational engine (Rust, the canonical kdb)
- **Status:** **field-test** (since 2026-05-22 per #730a), promoted from `experimental` because the Koder Stack has no users yet and the remaining gates (`stability`, `production_proven`) only close via adoption
The kdb name (2026-05-22, ticket #730 reconciliation)
Per meta/docs/stack/registries/component-names.md, the canonical name of this component is **kdb** (display + bare + slug). The kdb-next label was the transitional name of the **Rust rewrite while it coexisted** with the Go kdb 1.x; that coexistence is ending. References to "kdb-next" in older docs and commits are historical; new docs use "kdb".
Two KDBs, one name — transitional split
Legacy Go kdb (moving to infra/observe/kdb-obs-legacy/ per #730e)
The Go engine historically at infra/data/kdb (Go binary koder-kdb + kdb-soak) is a specialized multi-model database for the **Koder Observability platform**. It ingests diagnostic data from `kodertraced` agents and serves queries to the `observe/observability` dashboard.
The Go binary's **SQL/KV path** never reached production (per KMCP003.1 autonomous probe 2026-05-22 of s.khost1) and will be archived. The Go binary's **observability subset** (Prometheus scraper, alerting, log aggregation, agent registry) moves to `infra/observe/kdb-obs-legacy/`. It is a service of observability that historically lived next to kdb, not a property of kdb itself.
**Features (legacy Go obs subset):**
- **Data Ingestion**: REST API for metrics, ping, HTTP, DNS, traceroute, port, MTR, browser timing, and system metrics
- **Query Languages**: KQL (Koder Query Language), PromQL, GraphQL
- **Alerting Engine**: Rule-based, threshold + duration
- **Health Scoring**: Per-site score (0–100) from multiple dimensions
- **Anomaly Detection**: Z-score based, with trend prediction
- **Grafana Compatibility**: Prometheus remote write/read, `/metrics` exposition
- **Retention Management**: Automatic cleanup
kdb (Rust, canonical) — General-purpose relational, hyperscale
The Rust kdb, defined in **RFC-001** and currently living under infra/data/kdb/ (path move tracked in #730a'..#730f), is a from-scratch relational engine targeting 100M+ tenants on a TiKV substrate with Raft consensus. It is the substrate for every Koder SaaS that needs relational storage: Flow (Phase 7.1 done), Talk (Phase 7.2 in progress), Kortex (Phase 7.3 pending), and beyond.
**Current version:** v0.1.0, soft-promoted to `field-test` on 2026-05-22 (#730a, version bump from 0.0.14). Built on the v0.0.13 performance wave (#676, #680, #683, #685, #686, #690, #699 + admin closures), the v0.0.12 RFC #548 god-file decomposition, and v0.0.11 RFC-009 active-active observability + RFC-012 vector/graph substrates + multi-table 2PC fix. Phase 7 migration of `koder/flow`, `koder/talk`, and `koder/kortex` from Go kdb to kdb as primary is in progress; 7.1 (flow flip 2026-05-04) is done, 7.2 (talk flip) and 7.3 (kortex activation) are tracked under tickets #547 and #545. Bench v0.0.13 dev smoke: large_scan 4.4× ops/s, p99 6.8× faster vs v0.0.12 (1.32× pg-faster, gate satisfied).
**Disk format break window:** until 2026-06-30, the on-disk format may change without a migration tool, because no user data exists to lose. After that date, format changes require a migration tool.
**Client SDKs:** all standard PostgreSQL drivers work natively. kdb-specific multi-tenant wrappers:
| Language | Package | Location |
|---|---|---|
| Go | go.koder.dev/kdb (pgx/v5 wrapper) | infra/data/kdb/sdk/go-pgwire/ |
| Python | kdb-next (asyncpg + psycopg2) | infra/data/kdb/sdk/python/ |
| JavaScript/TS | @koder/kdb (node-postgres wrapper) | infra/data/kdb/sdk/js/ |
| Ruby/Rails | kdb-next gem (pg + ActiveRecord adapter) | infra/data/kdb/sdk/ruby/ |
Recent changes
kdb backlog sweep: graph SQL functions, kdb-tx analysis, pg_dump split (2026-05-11, `/k-go kdb`)
Triage of the 3 pending backlog tickets (infra/data/kdb/backlog/pending/):
- **#316 (kdb-tx extract)**: closed as **won't-do** after an audit. The ticket's premise was wrong: the modules `tx`/`mvcc`/`lock`/`deadlock` do not exist in `kdb-record/src/` (only `audit`, `branch`, `catalog`, `cow_kv`, `error`, `fingerprint_history`, `keys`, `merge`, `migrate`, `schema`, `tenant`, `tier`, `wire`). The tx API already lives in `kdb-kv-trait` (`KvTx` trait + `IsolationLevel`); MVCC, locks, and deadlock handling are delegated to the backends (sled local, TiKV Percolator 2PC); `RecordTx` is a thin wrapper over `Box<dyn KvTx>`. Tickets #163–#167 shipped in kv-trait/backends, not in kdb-record. RFC-001 §Roadmap does not list `kdb-txn` among the pending sub-RFCs. Creating an empty crate of re-exports would violate hyperscale-first with no benefit. Full analysis in `backlog/done/316-kdb-tx-extract-crate.kmd`.
- **#326 (graph SQL functions)**: shipped. 4 algorithms from `kdb-graph` are now exposed as table-valued SQL functions: `pagerank('graphs', 1)`, `connected_components('graphs', 1)`, `louvain('graphs', 1)`, `shortest_path_weighted('graphs', 1, from, to)`. The architecture follows the `Plan::VectorSearch` pattern (#062): a new plan node `Plan::GraphCall { function, args, output_columns }`, a new trait method `RowSource::graph_function(name, args)`, a builder `build_graph_call` in `build.rs`, and an impl `RecordAdapter::graph_function` in `kdb-adapter/src/lib.rs` that opens `kdb_graph::Graph::new(kv, tenant, table_id)` via `block_in_place + rt.block_on`. **Conscious divergence from the proposed contract**: an additional `table TEXT` arg sits to the left of `graph_id`, because kdb-graph and kdb-record share the `(tenant, table_id)` namespace with no SQL-level registry of "which record-table backs which graph". Once CREATE GRAPH ships (future), an overload without the table arg becomes the obvious next step. The roundtrip test `kdb-adapter/tests/integration.rs::test_graph_sql_functions_via_adapter` validates column shape, cardinality, and algorithm invariants (PageRank sum=1.0, CC=2 components, shortest-path resolves 1→2→3 on the cycle + zero rows in the disconnected case). Pg_catalog match sites updated: `output_names`, `fold_plan`, `subst_outer_plan`, `bind_params_in_plan`, `plan_has_outer_refs`, `node_label`/`json_node_type`/`children` (explain), `collect_seq_scan_tables`/`find_table_in_plan`/`find_lock_rows_clause` (pgwire plan_walk). 17/17 adapter tests green; the whole workspace builds green with `cargo build --workspace`. Follow-ups: roundtrip via psql socket (gateway integ tooling, future ticket) and an optimizer pass for GraphCall (perf gate; #327+).
- **#333 (pg_dump v17 compat)**: closed as **split**. The scope was too large for a low-priority ticket in a single PR, so it was broken into 4 independent sub-tickets in `backlog/pending/`:
  - **#339**: Audit of the pg_dump v17 query stream against real Postgres (PG17 docker + tcpdump + diff report vs kdb-gateway v0.0.13). Normative spec in `meta/docs/stack/specs/postgres-compat/pg-dump-v17.kmd` (to be created). Blocks #341 and #342.
  - **#340**: Wire-protocol `COPY <table> TO STDOUT` (CopyOutResponse + CopyData text + CopyDone). Today only `COPY ... TO '<file>'` exists; a comment at `copy_file.rs:248` references `pgwire_traits.rs`, which does not exist (stale).
  - **#341**: pg_catalog coverage (pg_attribute v17 rows, pg_sequence, pg_get_serial_sequence, pg_get_function_arguments, pg_constraint, pg_type); depends on #339.
  - **#342**: Round-trip test harness `pg_dump | pg_restore` against vanilla Postgres v17; depends on #339 + #340 + #341. CI gate weekly + on-demand.

#260 (which marked pg-dump-restore-compat done prematurely) is effectively reopened by the split.
Current state of the kdb backlog (post-sweep): 0 pending of the original 3; 4 new pending (#339–#342) with explicitly declared dependencies.
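
As an illustration of the #326 design in this entry, here is a minimal Rust sketch of how a `Plan::GraphCall` node and a `RowSource::graph_function` hook could fit together. Only the names `Plan::GraphCall { function, args, output_columns }`, `RowSource::graph_function(name, args)`, and `build_graph_call` come from the entry. The `Value` type, the output column names, the arity table, the `execute` helper, and the stub source are assumptions made for the example, not the kdb implementation.

```rust
/// Hypothetical SQL value type, just rich enough for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Text(String),
    Int(i64),
    Float(f64),
}

/// Plan node shaped like the one named in the entry (other variants omitted).
#[derive(Debug, PartialEq)]
enum Plan {
    GraphCall {
        function: String,
        args: Vec<Value>,
        output_columns: Vec<String>,
    },
}

/// Storage-side hook; a real trait would carry more methods.
trait RowSource {
    fn graph_function(&self, name: &str, args: &[Value]) -> Result<Vec<Vec<Value>>, String>;
}

/// Builds the plan node, checking the function name, the argument count, and
/// that the leading argument is the table name (the extra `table TEXT` arg).
/// The output column names are assumptions.
fn build_graph_call(function: &str, args: Vec<Value>) -> Result<Plan, String> {
    let (arity, columns) = match function {
        "pagerank" => (2, vec!["node_id", "score"]),
        "connected_components" => (2, vec!["node_id", "component"]),
        "louvain" => (2, vec!["node_id", "community"]),
        "shortest_path_weighted" => (4, vec!["step", "node_id"]),
        other => return Err(format!("unknown graph function: {other}")),
    };
    if args.len() != arity {
        return Err(format!("{function} expects {arity} arguments, got {}", args.len()));
    }
    if !matches!(args.first(), Some(Value::Text(_))) {
        return Err(format!("{function}: first argument must be the table name"));
    }
    Ok(Plan::GraphCall {
        function: function.to_string(),
        args,
        output_columns: columns.into_iter().map(String::from).collect(),
    })
}

/// Executes the node by delegating to the row source.
fn execute(plan: &Plan, source: &dyn RowSource) -> Result<Vec<Vec<Value>>, String> {
    match plan {
        Plan::GraphCall { function, args, .. } => source.graph_function(function, args),
    }
}

/// Stub source returning canned rows so the sketch runs without a database.
struct StubSource;

impl RowSource for StubSource {
    fn graph_function(&self, name: &str, _args: &[Value]) -> Result<Vec<Vec<Value>>, String> {
        match name {
            "pagerank" => Ok(vec![
                vec![Value::Int(1), Value::Float(0.5)],
                vec![Value::Int(2), Value::Float(0.5)],
            ]),
            _ => Ok(Vec::new()),
        }
    }
}

fn main() {
    // Equivalent in spirit to: SELECT * FROM pagerank('graphs', 1)
    let plan = build_graph_call("pagerank", vec![Value::Text("graphs".into()), Value::Int(1)])
        .expect("valid call");
    let rows = execute(&plan, &StubSource).expect("stub never fails");
    println!("{} rows: {:?}", rows.len(), rows);
}
```

Keeping the table name as the first argument mirrors the divergence noted in the entry; an overload without it would only need a second arity row once a graph registry exists.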
Graph perf runbook: infra/data/kdb/docs/runbooks/graph-pagerank-perf.kmd (2026-05-10, /k-go kdb #329)
Human-facing operational runbook that ties together the trio #318 (algorithms) + #327 (bench) + #328 (CI gate). 7 sections: how to run the bench, how to read the output, how to interpret the baseline, how to update the baseline, how to diagnose a regression (4-step ladder: phase isolation, git bisect with a ready-made command, flamegraph, CI artifact diff), when to trigger the full gate, and known follow-ups. Canonical path per policies/content-location.kmd.
Graph perf CI gate: .gitea/workflows/kdb-graph-perf-gate.yml (2026-05-10, /k-go kdb #328)
New workflow gating PageRank perf regressions in CI. 2 jobs:
- **`quick`**: 100k/1M, gate `snapshot+pr ≤ 5s` (baseline 1.380s). Push/PR.
- **`full`**: 1M/10M, gate `snapshot+pr ≤ 60s` (RFC-012 promise, baseline 25.062s). `workflow_dispatch` only.

Baseline in infra/data/kdb/crates/kdb-graph/perf/baseline.json (4 scales + thresholds + notes). A bash parser extracts the walltime from the bench's stderr; awk does the comparison, so there is no dependency on bc. Follow-up #329: human runbook in `docs/performance/graph-pagerank.kmd`.
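
The gate logic (extract a walltime, compare it against a threshold) is implemented in bash and awk in the workflow. The sketch below restates the same check in Rust purely for illustration. The stderr line format `snapshot+pr: 1.380s` and the function names are assumptions; only the thresholds (5s quick, 60s full) and the baselines come from the entry.

```rust
/// Finds a line such as "snapshot+pr: 1.380s" and returns the seconds.
/// The line format is an assumption; the real bench output may differ.
fn parse_walltime(stderr: &str, label: &str) -> Option<f64> {
    stderr.lines().find_map(|line| {
        let rest = line.trim().strip_prefix(label)?.trim_start().strip_prefix(':')?;
        rest.trim().strip_suffix('s')?.parse::<f64>().ok()
    })
}

/// Passes when the measured walltime is at or under the threshold.
fn check_gate(stderr: &str, label: &str, threshold_secs: f64) -> Result<f64, String> {
    let secs = parse_walltime(stderr, label)
        .ok_or_else(|| format!("no `{label}` timing found in bench output"))?;
    if secs <= threshold_secs {
        Ok(secs)
    } else {
        Err(format!("{label} took {secs:.3}s, over the {threshold_secs:.3}s gate"))
    }
}

fn main() {
    // Thresholds and baselines from the entry: quick <= 5s, full <= 60s.
    let quick = "building graph...\nsnapshot+pr: 1.380s\n";
    let full = "building graph...\nsnapshot+pr: 25.062s\n";
    println!("quick: {:?}", check_gate(quick, "snapshot+pr", 5.0));
    println!("full:  {:?}", check_gate(full, "snapshot+pr", 60.0));
}
```

Treating a missing timing line as a failure rather than a pass keeps a silently broken bench from slipping through the gate.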
Graph perf validation: RFC-012 promise empirically confirmed (2026-05-10, `/k-go kdb` #327)
`kdb-graph/examples/pagerank_scale.rs` shipped. `cargo run --release --example pagerank_scale -p kdb-graph` measures end-to-end. The RFC-012 promise (1M nodes / 10M edges < 60s) is **validated: 25.062s** on the dev's laptop (sled local). Scaling is linear; PageRank is 78.5% of the walltime, snapshot 21.5%. Follow-up #328 plugs it into CI nightly.
Graph v2: PageRank + Louvain + connected components + Dijkstra (2026-05-10, /k-go kdb #318)
kdb-graph delivers on the promise of README v0.8.0 §"Graph" (PageRank + community detection + shortest path). 4 new algorithms in `algorithms.rs`:
- **`Graph::pagerank`**: power iteration with dangling-node mass redistribution. L1 convergence, final renorm.
- **`Graph::connected_components`**: union-find on the undirected projection, deterministic ordering.
- **`Graph::louvain`**: greedy modularity (single-level), tiebreak by community id.
- **`Graph::shortest_path_weighted`**: Dijkstra min-heap, rejects negative weights.
- **`Graph::snapshot`**: public primitive for reuse across algorithms on the same graph (1 substrate scan).
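
To make the `Graph::pagerank` bullet concrete, here is a minimal, self-contained sketch of power iteration with dangling-node mass redistribution, an L1 convergence check, and a final renormalization. It works on a plain adjacency list; the function signature, parameter names, and defaults are assumptions for the example and not the `kdb-graph` API.

```rust
/// Power-iteration PageRank over an adjacency list.
/// `out_edges[u]` lists the targets of u's outgoing edges (all indices < n).
fn pagerank(out_edges: &[Vec<usize>], damping: f64, tol: f64, max_iter: usize) -> Vec<f64> {
    let n = out_edges.len();
    if n == 0 {
        return Vec::new();
    }
    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..max_iter {
        // Rank held by dangling nodes (no out-edges) is spread uniformly.
        let dangling: f64 = (0..n)
            .filter(|&u| out_edges[u].is_empty())
            .map(|u| rank[u])
            .sum();
        let base = (1.0 - damping) / n as f64 + damping * dangling / n as f64;
        let mut next = vec![base; n];
        for (u, targets) in out_edges.iter().enumerate() {
            if targets.is_empty() {
                continue;
            }
            let share = damping * rank[u] / targets.len() as f64;
            for &v in targets {
                next[v] += share;
            }
        }
        // L1 distance between successive iterates decides convergence.
        let delta: f64 = rank.iter().zip(&next).map(|(a, b)| (a - b).abs()).sum();
        rank = next;
        if delta < tol {
            break;
        }
    }
    // Final renormalization so the scores sum to 1 despite float drift.
    let total: f64 = rank.iter().sum();
    for r in rank.iter_mut() {
        *r /= total;
    }
    rank
}

fn main() {
    // 0 -> {1, 3}, 1 -> 2, 2 -> 0; node 3 is dangling.
    let graph = vec![vec![1, 3], vec![2], vec![0], vec![]];
    let ranks = pagerank(&graph, 0.85, 1e-10, 100);
    let sum: f64 = ranks.iter().sum();
    println!("ranks = {:?}, sum = {:.6}", ranks, sum);
}
```

Without the dangling-node term, rank would leak out of the system at every iteration and the sum-to-one invariant would only hold because of the final renormalization.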
37 tests green in kdb-graph (up from 23). Weights are read from the `("weight", "<float>")` prop; absent ⇒ 1.0 (unweighted). Follow-ups: #326 (SQL function exposure), #327 (bench at 1M/10M scale).
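
The weight convention above (a `"weight"` prop parsed as a float, defaulting to 1.0) combines naturally with a min-heap Dijkstra that rejects negative weights. The sketch below shows one way to do that; the `Edge` struct, the `edge` helper, and the error strings are hypothetical, and negative weights are rejected lazily when the edge is first visited, which may differ from how `kdb-graph` validates them.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical edge shape: target node plus string properties.
struct Edge {
    to: usize,
    props: Vec<(String, String)>,
}

/// Convenience constructor used by the example.
fn edge(to: usize, weight: Option<&str>) -> Edge {
    let props = weight.map(|w| ("weight".to_string(), w.to_string())).into_iter().collect();
    Edge { to, props }
}

/// Reads the "weight" prop; a missing prop means an unweighted edge (1.0).
fn edge_weight(edge: &Edge) -> Result<f64, String> {
    match edge.props.iter().find(|(key, _)| key.as_str() == "weight") {
        None => Ok(1.0),
        Some((_, raw)) => {
            let w: f64 = raw.parse().map_err(|_| format!("unparseable weight {raw:?}"))?;
            if w.is_nan() || w < 0.0 {
                Err(format!("negative or NaN weight {w}"))
            } else {
                Ok(w)
            }
        }
    }
}

/// Heap entry ordered so that the smallest cost pops first.
#[derive(PartialEq)]
struct State {
    cost: f64,
    node: usize,
}
impl Eq for State {}
impl Ord for State {
    fn cmp(&self, other: &Self) -> Ordering {
        // Reversed comparison turns std's max-heap into a min-heap.
        other.cost.total_cmp(&self.cost).then_with(|| other.node.cmp(&self.node))
    }
}
impl PartialOrd for State {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Returns the cheapest cost from `from` to `to`, or None when unreachable.
fn shortest_path_weighted(adj: &[Vec<Edge>], from: usize, to: usize) -> Result<Option<f64>, String> {
    let mut dist = vec![f64::INFINITY; adj.len()];
    let mut heap = BinaryHeap::new();
    dist[from] = 0.0;
    heap.push(State { cost: 0.0, node: from });
    while let Some(State { cost, node }) = heap.pop() {
        if node == to {
            return Ok(Some(cost));
        }
        if cost > dist[node] {
            continue; // stale heap entry
        }
        for e in &adj[node] {
            let next = cost + edge_weight(e)?;
            if next < dist[e.to] {
                dist[e.to] = next;
                heap.push(State { cost: next, node: e.to });
            }
        }
    }
    Ok(None)
}

fn main() {
    // 0 -> 1 (2.5), 1 -> 2 (unweighted), 0 -> 2 (10); node 3 is disconnected.
    let g = vec![
        vec![edge(1, Some("2.5")), edge(2, Some("10"))],
        vec![edge(2, None)],
        vec![],
        vec![],
    ];
    println!("0 -> 2: {:?}", shortest_path_weighted(&g, 0, 2));
    println!("0 -> 3: {:?}", shortest_path_weighted(&g, 0, 3));
}
```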
Observability: RFC-002 §6/§7/§4 closed (2026-05-10, `/k-go kdb`)
5 tickets closed in a single batch: **#321** (backlog status/location policy), **#312** (OTLP metrics exporter Rust), **#313** (OTel logs wiring Rust), **#315** (semantic conventions §4), **#046** (umbrella close, Rust side 100%).
- **`kdb-obs::metrics::OtelInstruments`** + **`GatewayMetrics::new_with_otel(meter_provider)`**: Prometheus → OTLP mirror. The RFC-002 §6.1 names (`kdb.query.duration`, `kdb.rpc.count`, etc.) are emitted via OTel; the Prometheus aliases (`kdb_*_seconds`) are preserved for dashboard compat.
- **`kdb-gateway::main`** gains 6 `--otel-*` CLI flags that control the pipeline. When enabled, it calls `init_tracing(OtelConfig)` (which already existed in `kdb-obs::tracing_setup`) and passes the `meter_provider` to `GatewayMetrics`. An init failure is non-fatal (fallback to `fmt`-only).
- **Semantic conventions §4** plugged into 8 spans of the SQL chain: `db.kdb.model = "relational"` cross-stack; `db.kdb.tenant_id` on the pgwire spans (`do_query` simple/extended); placeholders for `db.kdb.rows_returned`, `db.kdb.cache_hit`, `db.kdb.plan_hash` (recordable by the call sites).
- **`policies/backlog.kmd` § Status ↔ location consistency**: new rule + script `meta/context/scripts/audit-backlog-status-location.sh`. kdb cleanup: 119 mismatches → 0. Stack-wide drift (2637) is tracked in `projects/koder-stack#125`.

The Go side (#314) remains a separate track after Phase 7.4. 10 tests in `kdb-obs` (up from 7), 357 in `kdb-pgwire` (no regression).
Multitenancy: #322 batch 2 done (2026-05-10, `/k-go kdb`)
The Foundation/ID v2 bearer JWT now reaches the pgwire surface. This closes ticket #322: batch 1 had already established `koder_user_id`/`workspace_id` in the `AuthContext` and the `Tenant::from_user_workspace` helper; batch 2 wired everything into the `StartupMessage`.
- **`PgwireState.jwt_validator: Option<Arc<JwtValidator>>`**: new field (mirrors `metrics` from #310). A single canonical `Arc<JwtValidator>` per process, shared with the 6 gRPC interceptors from #320, so the JWKS cache is amortized cross-surface.
- **`KdbScramStartupHandler` + `KdbStartupHandler`**: short-circuit Foundation/ID v2 before the SCRAM/cleartext challenge: the client passes `koder_id_token=<jwt>` in the `StartupMessage`, the validator checks it, and `finish_authentication` closes the handshake. No token ⇒ the legacy SCRAM path is 100% unchanged. Invalid token ⇒ FATAL/SQLSTATE 28000 before the banner.
- **Cross-tenant gate**: when `state.tenant_id != 0` (listener bound to a specific kdb tenant), the canonical tenant derived from the token must match; otherwise the connection is rejected with `status=cross_tenant`. This complies with `specs/multi-tenancy/contract.kmd § error model "404 not 403"`.
- **`canonical_tenant_id(&AuthContext) -> u64`**: the adoption point of `Tenant::from_user_workspace`. It uses the helper when `koder_user_id` is present and falls back to the legacy v1 `claims.tid` otherwise.
- **Dedicated audit log** (`tracing::info!(target: "kdb.audit.jwt", …)`): every JWT validation emits a structured line `(timestamp, user_id, tenant_id, token_jti, status, source_ip)`. 8 stable statuses for parsing.

Tests: **30 green in auth_handler** (up from 14): 8 roundtrip tests with a JWKS server stub (T1–T8 from `specs/identity/login-resolution-test-template.kmd` + cross-tenant T9 from the contract) + 8 unit tests for the helpers (canonical, AuthError mapping, validate edge cases). 26 in kdb-auth with no regression.
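
The decision flow described in this entry (no token, invalid token, cross-tenant mismatch, success) can be sketched as a small pure function. This is a minimal illustration: the `StartupOutcome` enum, its variant names, `startup_decision`, and `stub_validate` are all hypothetical, and the validator is passed in as a closure that returns the canonical tenant id rather than modelling real JWT/JWKS validation.

```rust
/// Outcome of the pre-SCRAM short-circuit (variant names are hypothetical).
#[derive(Debug, PartialEq)]
enum StartupOutcome {
    /// No token supplied: continue with the unchanged legacy SCRAM path.
    FallBackToScram,
    /// Token accepted: the handshake can be finished for this tenant.
    Authenticated { tenant_id: u64 },
    /// Token failed validation: FATAL with SQLSTATE 28000.
    InvalidToken { sqlstate: &'static str },
    /// Token is valid but belongs to another tenant (status=cross_tenant).
    CrossTenant,
}

fn startup_decision(
    token: Option<&str>,
    listener_tenant_id: u64,
    validate: impl Fn(&str) -> Result<u64, String>,
) -> StartupOutcome {
    let Some(token) = token else {
        return StartupOutcome::FallBackToScram;
    };
    let tenant_id = match validate(token) {
        Ok(id) => id,
        Err(_) => return StartupOutcome::InvalidToken { sqlstate: "28000" },
    };
    // tenant_id 0 means the listener is not bound to a specific tenant;
    // a bound listener only accepts tokens for its own tenant.
    if listener_tenant_id != 0 && tenant_id != listener_tenant_id {
        return StartupOutcome::CrossTenant;
    }
    StartupOutcome::Authenticated { tenant_id }
}

/// Stand-in validator: accepts one fixed token and maps it to tenant 42.
fn stub_validate(token: &str) -> Result<u64, String> {
    if token == "good-token" {
        Ok(42)
    } else {
        Err("bad signature".to_string())
    }
}

fn main() {
    println!("{:?}", startup_decision(None, 7, stub_validate));
    println!("{:?}", startup_decision(Some("good-token"), 42, stub_validate));
    println!("{:?}", startup_decision(Some("good-token"), 7, stub_validate));
    println!("{:?}", startup_decision(Some("forged"), 0, stub_validate));
}
```

Checking for the token before anything else is what keeps the legacy SCRAM path untouched for clients that never send one.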
Multitenancy: #322 batch 1 (2026-05-10, `/k-go kdb`)
Foundations for the canonical identity from policies/multi-tenant-by-default.kmd.
- **`AuthContext`** gains `koder_user_id` + `workspace_id` (`Option<String>`).
- **`JwtClaims`** deserializes the new fields with `#[serde(default)]` (compat with Foundation/ID v1).
- **`HmacValidator`** populates the new fields as `None`.
- **`Tenant::from_user_workspace(user_id, workspace_id)`**: single canonical mapping (BLAKE3-truncated, domain-separation byte, catalog-namespaced, never collides with `TENANT_SYSTEM`).

26 kdb-auth + 6 new tenant tests green. Batch 2 was closed in the same window (previous entry).
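
The shape of a mapping like `Tenant::from_user_workspace` can be sketched as follows. This is a rough illustration only: it swaps BLAKE3 for a hand-written 64-bit FNV-1a so that it needs no external crate, it does not model the catalog namespacing, and the constants (`TENANT_SYSTEM = 0`, the domain byte `0x01`) and the remap-on-collision rule are assumptions rather than kdb's actual values.

```rust
/// Assumption: the reserved system tenant id is 0.
const TENANT_SYSTEM: u64 = 0;
/// Hypothetical domain-separation byte for the (user, workspace) mapping.
const DOMAIN_USER_WORKSPACE: u8 = 0x01;

/// FNV-1a, 64-bit. A stand-in for truncated BLAKE3 so the sketch needs no crates.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Deterministically maps (user_id, workspace_id) to a 64-bit tenant id.
fn tenant_from_user_workspace(user_id: &str, workspace_id: &str) -> u64 {
    // The leading byte separates this mapping from any other use of the hash.
    let mut buf = vec![DOMAIN_USER_WORKSPACE];
    // Length prefixes keep ("ab", "c") and ("a", "bc") from hashing the same bytes.
    for field in [user_id, workspace_id] {
        buf.extend_from_slice(&(field.len() as u64).to_le_bytes());
        buf.extend_from_slice(field.as_bytes());
    }
    let id = fnv1a64(&buf);
    // Never hand out the reserved id; remapping to 1 is an arbitrary choice here.
    if id == TENANT_SYSTEM { 1 } else { id }
}

fn main() {
    let a = tenant_from_user_workspace("user-1", "ws-1");
    let b = tenant_from_user_workspace("user-1", "ws-2");
    println!("tenant(user-1, ws-1) = {a:#018x}");
    println!("tenant(user-1, ws-2) = {b:#018x}");
}
```

A cryptographic hash matters in the real mapping because tenant ids must be hard to collide on purpose; FNV-1a gives no such guarantee and is used here only to keep the example dependency-free.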
Security wave: #320 done (2026-05-10, /k-go kdb)
Foundation/ID JWKS integration in the gRPC gateway.
- **`kdb-gateway::auth`** gains the modes `with_jwt(secret, jwt)` and `jwt_only(jwt)` in addition to the legacy `new(secret)`. Token detection is by shape (`looks_like_jwt`).
- **`kdb-gateway::main`** gains the CLI flags `--jwt-jwks-url` / `--jwt-issuer` / `--jwt-refresh-secs`. A single `Arc<JwtValidator>` is reused by the 6 gRPC interceptors.
- The `kdb-auth` crate (Phase 2.4) already had complete JWT/JWKS validation; the real gap was only the wiring. 24 tests passing.
- **#322** new (high priority · security, child of #320): pgwire bearer-JWT in the `StartupMessage` + claim mapping `koder_user_id`/`workspace_id` to satisfy `policies/multi-tenant-by-default.kmd` + `specs/multi-tenancy/contract.kmd`.
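
"Token detection by shape" is not spelled out in the entry. One plausible reading, sketched below, treats a credential as a JWT when it has the compact-serialization shape of three non-empty base64url segments separated by dots, and as a shared secret otherwise. Only the name `looks_like_jwt` comes from the entry; the exact rule, `CredentialKind`, and `classify` are assumptions.

```rust
/// Which validation path a presented credential should take (hypothetical).
#[derive(Debug, PartialEq)]
enum CredentialKind {
    SharedSecret,
    Jwt,
}

/// True when the segment is non-empty and uses only the base64url alphabet.
fn is_base64url_segment(segment: &str) -> bool {
    !segment.is_empty()
        && segment
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_')
}

/// Shape check only: three dot-separated base64url segments.
/// It does not decode or verify anything.
fn looks_like_jwt(token: &str) -> bool {
    let segments: Vec<&str> = token.split('.').collect();
    segments.len() == 3 && segments.iter().all(|s| is_base64url_segment(s))
}

/// Routes a credential to the JWT validator or the legacy shared-secret check.
fn classify(token: &str) -> CredentialKind {
    if looks_like_jwt(token) {
        CredentialKind::Jwt
    } else {
        CredentialKind::SharedSecret
    }
}

fn main() {
    for candidate in ["eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJ1MSJ9.c2ln", "s3cr3t", "a..c"] {
        println!("{candidate:?} -> {:?}", classify(candidate));
    }
}
```

A shape check of this kind only selects the validation path; a shared secret that happens to contain two dots would be sent to the JWT validator and rejected there, which is a trade-off to keep in mind.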
Observability evolve wave: batch 2 (2026-05-09, /k-go kdb)
The same session extended #311 to cover the entire SQL chain:
- **`kdb-sql::parse_sql`** gains the span `kdb.parse` (RFC-002 §5.1).
- **`kdb-planner::build` + `build_with_catalog_and_params`** gain the span `kdb.plan`.
- **`kdb-planner::exec::{execute, execute_with_params, execute_dml}`** gain the span `kdb.execute`.
- **`kdb-pgwire` Simple + Extended `do_query`** gain the root span `kdb.query` with `surface = "pgwire.simple"|"pgwire.extended"`.

The RFC-002 §5.1 pipeline is now covered end-to-end: kdb.query → kdb.parse → kdb.plan → kdb.execute → kdb.storage.read|write. **#315** (adoption of the §4 semantic conventions in the existing spans) is now **unblocked**.
Tests green: 90 kdb-sql + 340 kdb-pgwire + 653/655 kdb-planner (2 pre-existing failures documented below under v0.0.13).
Observability evolve wave (2026-05-09, /k-evolve)
First batch of the **observability** gap raised by RFC-002 (OpenTelemetry):
- **#310 done**: `kdb-gateway` now exposes `/metrics` on port 9301 when `--metrics-bind` (or `KDB_METRICS_BIND`) is set. `GatewayMetrics` is built as an `Arc` and shared with `PgwireState`, so pgwire query histograms now appear in the gateway's Prometheus scrape.
- **#311 done (batch 1)**: `#[tracing::instrument]` applied to `Record::{put,get,delete,scan}_stamped`, `Catalog::{ensure,lookup}_{tenant,table}`, and the gRPC handlers `Record.{Put,Get,Delete,Scan}`. Spans follow RFC-002 §5.1 (`kdb.query` → `kdb.storage.read|write`, `kdb.catalog.*`) and carry the §4 attributes (`db.system`, `db.operation`, `db.kdb.tenant_id`, `table_id`).
- **#046** went back to pending