RFC-001 - kdb-next: hyperscale architecture for the Koder platform data layer

| Field | Value |
|---|---|
| Status | **Accepted** (2026-04-09) |
| Author(s) | Rodrigo (with Claude as scribe) |
| Date | 2026-04-09 |
| Accepted | 2026-04-09 by Rodrigo |
| Supersedes | (the aspirational docs/technical/architecture.kmd) |
| Target module | platform/kdb/ |
| Related | RFC-010 (koderid v2 cutover), to be revised |

1. Summary

This RFC defines the target architecture for the Koder platform data layer (originally "kdb-next", canonical name **kdb** per meta/docs/stack/registries/component-names.md since 2026-05-22; see note below): a horizontally sharded, leader-based, multi-tenant record/relational database written in **Rust**, designed to support up to **100M+ tenants (organizations)** as the persistence substrate of every Koder platform product (koderid v2 first, then koderflow, koder-talk, koder-bull, kortex, kodersign, ticsign, …).

It is a **clean-slate design** that runs **alongside** the current Go koder-kdb (which keeps serving observability/metrics/log workloads) and gradually replaces its relational and KV responsibilities via the strangler-fig pattern. No cutover. No big-bang. Each phase delivers real value, has explicit acceptance criteria, and is gated by benchmark budgets.
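To make "gated by benchmark budgets" concrete, here is a minimal sketch of what such a gate could look like, using the p99 targets stated in the Goals section (read ≤ 10 ms, write ≤ 30 ms). The names `p99`, `LatencyBudget`, and `gate` are hypothetical and not part of this RFC; the nearest-rank percentile is one reasonable choice among several, and a real gate would read samples from a benchmark harness.

```rust
use std::time::Duration;

/// p99 by the nearest-rank method: the value at 1-based rank ceil(0.99 * n)
/// in the sorted samples. Returns None when there are no samples.
pub fn p99(samples: &[Duration]) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // Integer form of ceil(n * 99 / 100).
    let rank = (sorted.len() * 99 + 99) / 100;
    Some(sorted[rank - 1])
}

/// Hypothetical budget type; the field names are illustrative only.
pub struct LatencyBudget {
    pub p99_read: Duration,
    pub p99_write: Duration,
}

impl LatencyBudget {
    /// The steady-state targets named in the Goals section.
    pub fn goals() -> Self {
        LatencyBudget {
            p99_read: Duration::from_millis(10),
            p99_write: Duration::from_millis(30),
        }
    }
}

/// Passes only if both observed p99 latencies are within budget.
pub fn gate(
    budget: &LatencyBudget,
    reads: &[Duration],
    writes: &[Duration],
) -> Result<(), String> {
    let read_p99 = p99(reads).ok_or("no read samples")?;
    let write_p99 = p99(writes).ok_or("no write samples")?;
    if read_p99 > budget.p99_read {
        return Err(format!(
            "read p99 {:?} exceeds budget {:?}",
            read_p99, budget.p99_read
        ));
    }
    if write_p99 > budget.p99_write {
        return Err(format!(
            "write p99 {:?} exceeds budget {:?}",
            write_p99, budget.p99_write
        ));
    }
    Ok(())
}
```

A phase's acceptance check could then call `gate(&LatencyBudget::goals(), &reads, &writes)` and fail the phase on `Err`.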

**Note 2026-05-22: strangler-fig assumption superseded (ticket #730).** The "coexist for years if needed" framing of this section was written when the Koder Stack was assumed to have users whose data could not be migrated lightly. In 2026-05, the Stack reached pre-homologation acceleration phase **without external users**. That removed the load-bearing reason for indefinite coexistence: no production data exists to protect, no SLA exists to break. The component flipped from experimental to field-test status (koder.toml [self_hosted]) on 2026-05-22; the kdb-next name was retired in favor of the canonical kdb. The Go kdb 1.x **observability subset** (Prometheus scraper, alerting, log aggregation, agent registry; historically under infra/data/kdb/{cmd,internal} Go code) will move to infra/observe/kdb-obs-legacy/ per ticket #730e; the **SQL/KV subset** of the Go binary is archived (per KMCP-003.1 autonomous probe 2026-05-22, the Go SQL/KV path was never deployed to production). The path migration infra/data/kdb/next/ to infra/data/kdb/ is in progress (tracked in #730a..#730f); during the transition, refs to the old path continue to resolve and CI workflows remain unchanged until #730c migrates them. The strangler-fig pattern still describes the historical relationship between Go kdb 1.x and Rust kdb, but active coexistence as the default branding posture has ended.

2. Goals

  1. **Tenancy is a first-class primitive**, not a SQL rewrite trick. tenant_id forms the high-order bits of every key; sharding, quotas, audit, rate limiting, backup and restore are all per-tenant.

  2. **Hyperscale**: linear horizontal scaling up to **100M+ orgs**, with p99 read latency ≤ 10 ms and p99 write latency ≤ 30 ms at steady-state load (numbers in §6).

  3. **Multi-region-ready, single-region-first**: leader-based replication per shard inside one region today, with the architecture allowing geo-replication to be added later without schema changes.

  4. **Strong consistency by default** for control-plane data (orgs, users, credentials, OAuth clients, sessions). Relaxed consistency is offered only as an explicit opt-in for read paths (replica reads, follower reads).

  5. **Operational sanity**: backups, point-in-time recovery, online schema migration, rolling upgrades, and observability (per-tenant metrics + traces + audit) are designed in from day 0, not bolted on.

  6. **Pluggable storage substrate**: the engine code never speaks to raw disk; it speaks to a KvCluster trait (consistent, ordered KV with transactions). Today that trait is implemented by TiKV; later it can be implemented by a custom RocksDB+openraft cluster, by FoundationDB, or by a sled-based single-node dev backend, without touching the SQL/record/tenancy layers.

  7. **Sustainable growth**: every new feature ships with a benchmark, an SLO impact note, and an upgrade plan. Tech debt is named when incurred and tracked in docs/rfcs/.
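Goals 1 and 6 can be illustrated together with a small sketch. It assumes a hypothetical, heavily simplified version of the KvCluster trait (ordered get/put/prefix scan, with transactions omitted) and a hypothetical key layout in which an 8-byte big-endian tenant_id leads every key. The method names, the `MemKv` in-memory backend, and the key encoding are all illustrative assumptions, not the interface defined by this RFC.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for the KvCluster trait:
/// an ordered key-value store. Transactions are omitted for brevity.
pub trait KvCluster {
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>);
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    /// Returns every entry whose key starts with `prefix`, in key order.
    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>;
}

/// In-memory single-node backend, playing the role of a dev backend.
#[derive(Default)]
pub struct MemKv {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KvCluster for MemKv {
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>) {
        self.map.insert(key, value);
    }

    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }

    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        // Start at the first key >= prefix and stop at the first key
        // that no longer carries the prefix.
        self.map
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
}

/// Assumed layout: the big-endian tenant_id is the leading bytes of the key,
/// so all keys of one tenant sort into one contiguous range.
pub fn tenant_prefix(tenant_id: u64) -> Vec<u8> {
    tenant_id.to_be_bytes().to_vec()
}

/// Assumed layout: tenant prefix, table name, a 0 separator, primary key.
pub fn record_key(tenant_id: u64, table: &str, pk: &[u8]) -> Vec<u8> {
    let mut key = tenant_prefix(tenant_id);
    key.extend_from_slice(table.as_bytes());
    key.push(0);
    key.extend_from_slice(pk);
    key
}

/// Engine-side helper written against the trait only, so it works
/// unchanged with any backend that implements KvCluster.
pub fn tenant_record_count<K: KvCluster>(kv: &K, tenant_id: u64) -> usize {
    kv.scan_prefix(&tenant_prefix(tenant_id)).len()
}

/// Usage sketch: two tenants share one store; each scan sees one tenant only.
pub fn example() -> (usize, usize) {
    let mut kv = MemKv::default();
    kv.put(record_key(1, "users", b"alice"), b"a".to_vec());
    kv.put(record_key(1, "orgs", b"acme"), b"c".to_vec());
    kv.put(record_key(2, "users", b"bob"), b"b".to_vec());
    (tenant_record_count(&kv, 1), tenant_record_count(&kv, 2))
}
```

Because the tenant prefix leads the key, per-tenant operations such as quotas, backup, or restore reduce to a range scan over one contiguous key range, and swapping `MemKv` for another backend does not touch `tenant_record_count`.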

3. Non-goals

  • **Multi-region active-active** in v1. Geo-replication is in scope for v2 (CRDT or quorum-based), but v1 ships single-region.

  • **Replacing the observability subsystems** of the current Go koder-kdb (Prometheus scraper, alerting, log aggregation, monitoring agent registry). Those stay in Go for now and are re-evaluated separately.

  • **Backwards compatibility with kdb 1.0.x HTTP endpoints** at the wire level. kdb-next ships its own gRPC + HTTP/JSON surface; the bridge between old and new lives in the strangler proxy during transition.

  • **A SQL parser as feature-rich as Postgres.** We implement the subset the platform actually needs (CRUD + indexed scans + joins + simple CTEs + transactions + prepared statements), and grow it as products ask for it. We do **not** chase Postgres compatibility for its own sake.

  • **Self-managed Kubernetes operator** in v1. Deployment is systemd units + a control script + tooling. An operator can come later.

  • **General-purpose OLAP.** kdb-next is OLTP-first; analytical workloads either ride on followers or get exported to a separate warehouse. We do not chase HTAP.

4. Background: where kdb is today

To make sound decisions, the gap between docs/technical/architecture.kmd and the actual implementation must be acknowledged.

4.1 What architecture.kmd describes

A 10-layer "multi-model database engine" written in Koder Koda with 8 data models (Object, Relational, Document, Graph, Time-Series, Key-Value, Vector, Log), MVCC with serializable isolation, leveled LSM-tree storage, 8 index types, RBAC + RLS, Raft consensus, consistent-hash sharding, geo-partitioning, KQL/LogQL/PromQL query languages, GraphQL + WebSocket + REST surfaces.

4.2 What the Go binary actually does

| Feature | Status | Notes |
|---|---|---|
| HTTP server on 0.0.0.0:7900 | ✅ Real | internal/api/server.go, ~870 LOC |
| internal/storage/ (Go, ~1.7k LOC) | ✅ Real | Backed by modernc.org/sqlite (pure Go, single-node, single-writer) |
| Generic SQL endpoints (kdb 1.0.6) | ✅ Real | internal/api/sql.go, ticket 040; namespace prefix + opt-in `X-Org-ID` rewriter |
| KV API (internal/api/kv.go) | ✅ Real | Single-tenant, shared SQLite |
| Prometheus scraper + metrics storage | ✅ Real | internal/storage/metrics.go, ~630 LOC |
| Alert engine | ✅ Real | internal/storage/alerts.go, ~330 LOC |
| Agent + site registries | ✅ Real | internal/storage/{agents,sites}.go |
| Static API key auth | ✅ Real | One global key in config.yaml |
| Migrations | ✅ Real | internal/storage/migrations.go |
| 8 data models | ❌ Not built | Aspirational in `src |

Source: ../home/koder/dev/koder/meta/docs/stack/rfcs/kdb-RFC-001-kdb-next-hyperscale-architecture.md