Kdb RFC 006 explicit er modeling

RFC006 — Explicit EntityRelationship Modeling in kdb-next

Field Value
Status *ccepted + Implemented*
Author(s) Rodrigo (with Claude as scribe)
Date 20260414
Completed 20260414
Target module platform/kdb/next/crates/kdb-er (new)
Related RFC001 §3 (multimodel target), RFC004 (SQL planner), RFC005 (catalog adapter), ticket #049

1. Summary

kdbnext is designed as a multimodel database (RFC-001 §3): relational, objectoriented, document, keyvalue, and entity-relationship. The first four are either implemented or have clear roadmaps. This RFC defines the *ntityrelationship (ER) model*as a firstclass construct in kdb-next.

An ER model consists of *ntities*(named objects with typed attributes), *elationships*(cardinality-annotated links between entities), and optionally *nheritance*(ISA hierarchies). Today these concepts are expressible in SQL but the mapping is implicit. This RFC introduces:

  1. An ER definition language (*.er.kmd) that can be stored alongside

    application code and tracked in version control.

  2. A kdb-er crate that parses .er.kmd files into an ErModel IR and

    generates the equivalent CREATE TABLE / ALTER TABLE … ADD CONSTRAINT SQL for execution by the existing kdb-planner.

  3. Integration with the migration system (future RFC) so that ER model

    diffs produce migration scripts automatically.

2. Motivation

2.1 The current situation

To model a "User has many Posts" relationship today a developer writes:

CREATE TABLE users (id UUID PRIMARY KEY, email TEXT NOT NULL UNIQUE);
CREATE TABLE posts (id UUID PRIMARY KEY, title TEXT NOT NULL,
                    author_id UUID NOT NULL REFERENCES users(id));

This is correct but loses semantic information: there is no way to know from the schema alone whether author_id is a 1:N ownership relationship, an optional attribution, or part of a composite M:N link without reading additional documentation.

2.2 What ER modeling adds

  1. *xplicit cardinality*— 1..1, 0..1, 1..*, 0..* captured in the

    model, not just enforced by a NOT NULL constraint.

  2. *utomatic junction tables*— M:N relationships generate the

    junction table and its FK columns deterministically.

  3. *emantic validation*— the ER layer can reject circular mandatory

    dependencies and other modeling errors before generating any SQL.

  4. *oundtrip fidelity*— an existing SQL schema can be reverseengineered

    into an ER model (future: kdb-er introspect).

3. ER Definition Language

Files use the extension .er.kmd and contain one or more entity, relationship, and enum blocks.

3.1 Entity

entity <Name> {
  <attr>: <type> [constraints...]
  ...
}

*ype aliases*(maps to SQL types):

ER type SQL type
id UUID DEFAULT gen_random_uuid()
text TEXT
text(n) VARCHAR(n)
int BIGINT
float DOUBLE PRECISION
bool BOOLEAN
date DATE
timestamp TIMESTAMPTZ
json JSONB
bytes BYTEA
<EnumName> <enum_name> (user-defined enum)

*ttribute constraints*(combinable):

Keyword Meaning
pk PRIMARY KEY
unique UNIQUE
not null NOT NULL
null nullable (default)
default <expr> DEFAULT <expr>
check (<expr>) CHECK (<expr>)
index Creates an index on this column

*xample:*

entity User {
  id:         id           pk
  email:      text         unique not null
  name:       text         not null
  role:       UserRole     not null  default 'member'
  created_at: timestamp    not null  default now()
  deleted_at: timestamp    null
}

Generated SQL:

CREATE TABLE users (
    id         UUID         PRIMARY KEY DEFAULT gen_random_uuid(),
    email      TEXT         NOT NULL UNIQUE,
    name       TEXT         NOT NULL,
    role       user_role    NOT NULL DEFAULT 'member',
    created_at TIMESTAMPTZ  NOT NULL DEFAULT now(),
    deleted_at TIMESTAMPTZ
);

*aming convention:*entity names are PascalCase; generated table names are their snake_case plural equivalent (Userusers, BlogPostblog_posts). The plural can be overridden with @table "posts" at the top of the entity block.

3.2 Relationship

relationship <Name> {
  <Entity1> <card1>..<card2> <Entity2>
  [through: <JunctionTable>(<col1>, <col2>)]
  [fk: <owning_entity>.<attr> -> <referenced_entity>.<attr>]
  [on_delete: cascade | restrict | set_null | set_default | no_action]
  [on_update: cascade | restrict | set_null | set_default | no_action]
}

*ardinality notation:*

Notation Meaning
1..1 exactly one
0..1 zero or one (optional)
1..* one or more
0..* zero or more

*ode-generation rules:*

Pattern Generated structure
A 1..* -- B FK column on ba; NOT NULL
A 0..* -- B FK column on ba; nullable
A 0..1 -- B FK on b with UNIQUE constraint
A 1..1 -- B FK on b with NOT NULL + UNIQUE
A *..* B Junction table (required through: clause)

*xample — 1:N:*

relationship AuthorsPost {
  User 1..* -- Post
  on_delete: cascade
}

Generated: adds author_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE to the posts table (FK column name: <snake_case entity name>_id).

*xample — M:N with explicit junction:*

relationship PostTags {
  Post *..*  Tag
  through: post_tags(post_id, tag_id)
}

Generated:

CREATE TABLE post_tags (
    post_id UUID NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
    tag_id  UUID NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
    PRIMARY KEY (post_id, tag_id)
);

3.3 Enum

enum <Name> {
  <Variant1>
  <Variant2>
  ...
}

Generated SQL: CREATE TYPE <name> AS ENUM ('<v1>', '<v2>', ...).

3.4 ISA / Inheritance

entity Admin extends User {
  department: text not null
}

Generation strategy (configurable per entity):

  • *ablepertype*(default): admins table with FK to users(id) + extra columns.
  • *ableperhierarchy* all columns in users + type discriminator column.
  • *nline* no separate table; extra columns merged into parent table.

3.5 Annotations

Annotations beginning with @ provide hints to the code generator:

@table "custom_table_name"   -- override generated table name
@schema "app"                -- emit inside a specific schema
@immutable                   -- no UPDATE allowed (enforced by trigger)
@audit                       -- add created_at / updated_at / created_by automatically

4. Architecture

4.1 kdb-er crate

kdb-er/
  src/
    lib.rs         -- public API: parse(), generate_sql(), diff()
    parser.rs      -- hand-written recursive-descent parser for .er.kmd
    ir.rs          -- ErModel, Entity, Attribute, Relationship, ErEnum
    gen.rs         -- ErModel → Vec<String> (CREATE TABLE SQL statements)
    diff.rs        -- (ErModel, ErModel) → Vec<AlterStatement>
    validate.rs    -- semantic checks (duplicate names, circular deps, etc.)
    naming.rs      -- PascalCase → snake_case, pluralisation table

kdb-er has no async, no runtime, no TiKV dependency — it is a pure transformation library: ER source text → SQL text. This keeps it fast and trivially testable.

4.2 Integration with kdb-planner

Generated SQL strings are passed through kdb-sql::parse() then kdb-planner::build_dml() and executed against a MutableRowSource, exactly as if a user had typed the DDL. No new execution paths needed.

4.3 Integration with the catalog

Each entity is registered in the catalog as a table through the normal Catalog::ensure_table path. The ER model is *ot*stored in the catalog in v0.1; roundtrip introspection (`kdber introspect`) is deferred to a future RFC.

4.4 CLI

# Validate an ER file
kdb-er validate schema.er.kmd

# Print generated SQL
kdb-er generate schema.er.kmd

# Apply to a running kdb instance
kdb-er apply --addr kdb-next-dev:18090 --tenant koder schema.er.kmd

# Diff two ER models (migration preview)
kdb-er diff old.er.kmd new.er.kmd

The kdb-er CLI binary lives in kdb-er/src/bin/kdb-er.rs.

5. Validation rules

The validator (validate.rs) enforces:

  1. Entity names are unique within a file.
  2. Relationship endpoints reference declared entities.
  3. M:N relationships must have an explicit through: clause.
  4. No entity may have two FK columns with the same name (ambiguous

    reverse relationships require explicit fk: overrides).

  5. Circular mandatory dependencies (A 1..1 → B, B 1..1 → A) are rejected

    with an error pointing to both entities.

  6. Attribute names are unique within an entity.
  7. extends hierarchies are acyclic.

6. Scope and phasing

Phase 1 — Core (this RFC, immediate)

  • kdb-er crate: parser, IR, SQL generator for entities + 1:N / M:N relationships
  • Entity → CREATE TABLE
  • Relationship → FK column or junction table
  • Enum → CREATE TYPE … AS ENUM
  • Validation rules 1–6
  • CLI: validate + generate
  • 30+ unit tests (parser round-trips, SQL generation, validation errors)

Phase 2 — Integration

  • kdb-er apply — live DDL execution against kdb gateway
  • @audit annotation (auto-adds timestamps)
  • CLI: apply + diff

Phase 3 — Advanced (future RFCs)

  • ISA / inheritance (extends)
  • kdb-er introspect — reverse-engineer SQL schema → ER model
  • Migration integration (ER diff → versioned migration scripts)
  • Visual editor (separate tool)

7. Alternatives considered

Option Verdict
PlantUML ER syntax Wellknown but nontrivial to extend; no native KMD integration
DBML Good tooling but external ecosystem; adds a dependency
mermaid ER Markdown-friendly but limited cardinality semantics
Prisma schema Strong prior art; too opinionated about ORM layer
.er.kmd (this RFC) Custom but minimal, fits KMD ecosystem, fully controlled

8. Open questions

  1. *luralisation* English pluralisation (Userusers) is

    straightforward; irregular forms (Personpeople) need a lookup table. Proposed: ship a small built-in table; override with @table.

  1. *K column naming* Default <ref_entity_snake>_id; override with

    explicit fk: clause. Should the default be configurable per-file?

  1. *omposite PKs* Not covered in this RFC. Entity PKs are assumed

    singlecolumn UUID. Multicolumn PKs to be addressed in Phase 2.

  1. *rigger-based immutability*for @immutable: requires the

    kdb-planner trigger support (#115) to be implemented first.

9. Acceptance criteria

  • kdb-er validate rejects all 7 validation errors with clear messages
  • kdb-er generate produces valid PostgreSQL DDL for a schema with

    5+ entities, 2+ 1:N relationships, and 1 M:N relationship

  • Generated DDL round-trips: parse → generate → parse → generate is

    idempotent

  • cargo test -p kdb-er passes with ≥ 30 tests, zero warnings
  • Ticket #049 closed

10. Implementation record (20260414)

All acceptance criteria met. Phase 1 and Phase 2 are complete.

Deliverables

Crate File Description
kdb-er src/parser.rs Handwritten recursivedescent parser for .er.kmd
kdb-er src/ir.rs ErModel, Entity, Attribute, Relationship, ErEnum IR types
kdb-er src/gen.rs ErModel → Vec<String> (CREATE TABLE / FK SQL generation)
kdb-er src/validate.rs Semantic validation — rules 1–7
kdb-er src/naming.rs PascalCase → snake_case, pluralisation with built-in lookup table
kdb-er src/diff.rs (ErModel, ErModel) → ModelDiff; additive migration_sql() + summary()
kdb-er src/lib.rs Public API: parse(), validate(), generate_sql(), diff() + re-exports
kdb-er src/bin/kdb-er.rs CLI: validate, generate, diff (with --sql / --semicolon), apply (HTTP POST to /v1/sql)
kdb-gateway src/sql_service.rs dev_schemas in SqlState — CREATE TABLE without real adapter registers schema; subsequent queries resolve column types from dev_schemas
kdb-cli src/lib.rs CatalogClientHandle::ensure_table() + er_entity_to_proto_schema() conversion
kdb-cli src/main.rs kdbctl er apply subcommand → gRPC Catalog.EnsureTable per entity
kdb-cli tests/er_apply.rs 4 integration tests: column mapping, register all entities, idempotency, tenant isolation

Test counts (final)

Crate Tests Notes
kdb-er 65+ parser, IR, SQL gen, validation, naming, diff (18 diff tests)
kdb-gateway 50 includes 2 new CREATE TABLE dev-mode tests
kdb-cli 8 4 catalogfingerprint + 4 erapply integration tests

Deferred to Phase 3

  • ISA / inheritance (extends) — ticket #115 (trigger support) is a prerequisite
  • kdb-er introspect — reverse-engineer SQL schema → ER model
  • Migration integration — ER diff → versioned migration scripts
  • Visual editor

Source: ../home/koder/dev/koder/meta/docs/stack/rfcs/kdb-RFC-006-explicit-er-modeling.md