RFC-001, kdb-next: hyperscale architecture for the Koder platform data layer
| Field | Value |
|---|---|
| Status | **Accepted** (2026) |
| Author(s) | Rodrigo (with Claude as scribe) |
| Date | 2026 |
| Accepted | 2026 |
| Supersedes | (the aspirational docs/technical/architecture.kmd) |
| Target module | platform/kdb/ |
| Related | RFC |
1. Summary
This RFC defines the target architecture for the Koder platform data layer (originally "kdb-next", canonical name **kdb** per `meta/docs/stack/registries/component-names.md` since 2026-05-22; see note below): a horizontally sharded, leader-based, multi-tenant record/relational database written in **Rust**, designed to support up to **00M+ tenants (organizations)** as the persistence substrate of every Koder platform product (koderid v2 first, then koderflow, koder-talk, koder-bull, kortex, kodersign, ticsign, …).
It is a **clean-slate design** that runs **alongside** the current Go koder-kdb (which keeps serving observability/metrics/log workloads) and gradually replaces its relational and KV responsibilities via the strangler-fig pattern. No cutover. No big-bang. Each phase delivers real value, has explicit acceptance criteria, and is gated by benchmark budgets.
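One way to read "gated by benchmark budgets" is as a pass/fail check on measured latency percentiles. The sketch below is illustrative only: the function names `p99_ms` and `within_budget` are hypothetical, the nearest-rank percentile method is an assumption (the RFC does not say how p99 is computed), and the budget value is supplied by the caller.

```rust
/// Nearest-rank p99 of latency samples, in milliseconds.
/// Returns `None` for an empty slice.
/// Hypothetical helper: the percentile method is an assumption.
pub fn p99_ms(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // 1-based nearest rank ceil(0.99 * n), computed with integers
    // so that floating-point rounding cannot shift the rank.
    let rank = (sorted.len() * 99 + 99) / 100;
    Some(sorted[rank - 1])
}

/// True when there is at least one sample and the measured p99
/// does not exceed `budget_ms`. An empty run never passes the gate.
pub fn within_budget(samples: &[f64], budget_ms: f64) -> bool {
    matches!(p99_ms(samples), Some(p) if p <= budget_ms)
}
```

A phase gate could then call `within_budget(&read_samples, 10.0)` and `within_budget(&write_samples, 30.0)`, matching the p99 read and write targets stated in the Goals section.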
**Note 2026-05-22: strangler-fig assumption superseded (ticket #730).** The "coexist for years if needed" framing of this section was written when the Koder Stack was assumed to have users whose data could not be migrated lightly. In 2026-05, the Stack reached its pre-homologation acceleration phase **without external users**. That removed the load-bearing reason for indefinite coexistence: no production data exists to protect, no SLA exists to break. The component flipped from `experimental` to `field-test` status (`koder.toml` `[self_hosted]`) on 2026-05-22; the `kdb-next` name was retired in favor of the canonical `kdb`. The Go kdb 1.x **observability subset** (Prometheus scraper, alerting, log aggregation, agent registry; historically the Go code under `infra/data/kdb/{cmd,internal}`) will move to `infra/observe/kdb-obs-legacy/` per ticket #730e; the **SQL/KV subset** of the Go binary is archived (per the KMCP-003.1 autonomous probe of 2026-05-22, the Go SQL/KV path was never deployed to production). The path migration `infra/data/kdb/next/` → `infra/data/kdb/` is in progress (tracked in #730a..#730f); during the transition, references to the old path continue to resolve and CI workflows remain unchanged until #730c migrates them. The strangler-fig pattern still describes the historical relationship between Go kdb 1.x and Rust kdb, but active coexistence as the default branding posture has ended.
2. Goals
- **Tenancy is a first-class primitive**, not a SQL rewrite trick. `tenant_id` is the high-order bit of every key; sharding, quotas, audit, rate limiting, backup and restore are all per-tenant.
- **Hyperscale:** linear horizontal scaling up to **00M+ orgs**, with p99 read latency ≤ 10 ms and p99 write latency ≤ 30 ms at steady-state load (numbers in §6).
- **Multi-region-ready, single-region-first:** leader-based replication per shard inside one region today, with the architecture allowing geo-replication to be added later without schema changes.
- **Strong consistency by default** for control-plane data (orgs, users, credentials, OAuth clients, sessions). Relaxed consistency is offered only as an explicit opt-in for read paths (replica reads, follower reads).
- **Operational sanity:** backups, point-in-time recovery, online schema migration, rolling upgrades, and observability (per-tenant metrics + traces + audit) are designed in from day 0, not bolted on.
- **Pluggable storage substrate:** the engine code never speaks to raw disk; it speaks to a `KvCluster` trait (consistent, ordered KV with transactions). Today that trait is implemented by TiKV; later it can be implemented by a custom RocksDB+openraft cluster, by FoundationDB, or by a sled-based single-node dev backend, without touching the SQL/record/tenancy layers.
- **Sustainable growth:** every new feature ships with a benchmark, an SLO impact note, and an upgrade plan. Tech debt is named when incurred and tracked in `docs/rfcs/`.
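The tenancy and pluggable-substrate goals above can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not the actual kdb interface: the RFC names a `KvCluster` trait but does not give its signature, so the methods below are hypothetical, transactions are omitted for brevity, `MemCluster` is a made-up in-memory stand-in for a real backend, and the 64-bit big-endian `tenant_id` prefix is an assumed encoding.

```rust
use std::collections::BTreeMap;

/// Hypothetical shape of a consistent, ordered KV backend.
/// Transactions are left out to keep the sketch short.
pub trait KvCluster {
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>);
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    /// All pairs whose key starts with `prefix`, in key order.
    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>;
}

/// Illustrative in-memory backend (an assumption, standing in for
/// TiKV or a single-node dev backend).
#[derive(Default)]
pub struct MemCluster {
    data: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KvCluster for MemCluster {
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>) {
        self.data.insert(key, value);
    }

    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.data.get(key).cloned()
    }

    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        // Keys sharing a prefix are contiguous in an ordered map, so
        // start at the prefix and stop at the first non-matching key.
        self.data
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
}

/// Builds a key whose high-order bytes are the tenant id.
/// Big-endian encoding keeps each tenant's keys in one contiguous range
/// and makes byte order agree with numeric tenant order.
pub fn tenant_key(tenant_id: u64, local_key: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(8 + local_key.len());
    key.extend_from_slice(&tenant_id.to_be_bytes());
    key.extend_from_slice(local_key);
    key
}

/// Lists one tenant's records through any backend implementing the trait.
pub fn scan_tenant<C: KvCluster>(cluster: &C, tenant_id: u64) -> Vec<(Vec<u8>, Vec<u8>)> {
    cluster.scan_prefix(&tenant_id.to_be_bytes())
}
```

Because `scan_tenant` depends only on the trait, swapping `MemCluster` for another backend would not change the calling code, which is the property the pluggable-substrate goal asks for. The fixed-width prefix also means a per-tenant backup or quota scan is a single range read.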
3. Non-goals
- **Multi-region active-active** in v1. Geo-replication is in scope for v2 (CRDT or quorum-based), but v1 ships single-region.
- **Replacing the observability subsystems** of the current Go `koder-kdb` (Prometheus scraper, alerting, log aggregation, monitoring agent registry). Those stay in Go for now and are re-evaluated separately.
- **Backwards compatibility with `kdb 1.0.x` HTTP endpoints** at the wire level. kdb-next ships its own gRPC + HTTP/JSON surface; the bridge between old and new lives in the strangler proxy during transition.
- **A SQL parser as feature-rich as Postgres.** We implement the subset the platform actually needs (CRUD + indexed scans + joins + simple CTEs + transactions + prepared statements), and grow it as products ask for it. We do **not** chase Postgres compatibility for its own sake.
- **Self-managed Kubernetes operator** in v1. Deployment is systemd units + a control script + tooling. An operator can come later.
- **General-purpose OLAP.** kdb-next is OLTP-first; analytical workloads either ride on followers or get exported to a separate warehouse. We do not chase HTAP.
4. Background: where kdb is today
To make sound decisions, the gap between docs/technical/architecture.kmd and the actual implementation must be acknowledged.
4.1 What architecture.kmd describes
A 10-layer "multi-model database engine" written in Koder Koda with 8 data models (Object, Relational, Document, Graph, TimeSeries, KeyValue, Vector, Log), MVCC with serializable isolation, leveled LSM-tree storage, 8 index types, RBAC + RLS, Raft consensus, consistent-hash sharding, geo-partitioning, KQL/LogQL/PromQL query languages, and GraphQL + WebSocket + REST surfaces.
4.2 What the Go binary actually does
| Feature | Status | Notes |
|---|---|---|
| HTTP server on 0.0.0.0:7900 | ✅ Real | internal/api/server.go, ~870 LOC |
| internal/storage/ (Go, ~1.7k LOC) | ✅ Real | Backed by modernc.org/sqlite (pure Go, single |
| Generic SQL endpoints (kdb 1.0.6) | ✅ Real | internal/api/sql.go, ticket 040; namespace prefix + opt |
| KV API (internal/api/kv.go) | ✅ Real | Single-tenant, shared SQLite |
| Prometheus scraper + metrics storage | ✅ Real | internal/storage/metrics.go, ~630 LOC |
| Alert engine | ✅ Real | internal/storage/alerts.go, ~330 LOC |
| Agent + site registries | ✅ Real | internal/storage/{agents,sites}.go |
| Static API key auth | ✅ Real | One global key in config.yaml |
| Migrations | ✅ Real | internal/storage/migrations.go |
| 8 data models | ❌ Not built | Aspirational in `src |