Kdb RFC 003 fingerprint history

RFC-003 — Schema fingerprint history

Field	Value
Status	Accepted
Author(s)	Rodrigo (with Claude as scribe)
Date	2026-04-10
Target module	`platform/kdb/next/crates/kdb-record` + `kdb-gateway` + `kdb-cli`
Related	RFC-001 §6.6; `docs/technical/schema-fingerprints.md`; #045, #051

1. Summary

With backlog #045 done, the current schema fingerprint of any table is readable through kdbctl catalog fingerprint and the LookupTableResponse.fingerprint gRPC field. What the catalog does *not* record is *history*: the sequence of fingerprints a table has had across successive migrations. This RFC proposes an append-only per-table fingerprint log, a new streaming RPC Catalog.GetFingerprintHistory, and a new CLI subcommand kdbctl catalog history that together let an operator answer:

"When did the acme.users schema last change, and what fingerprint did it have before?"

without dropping to git blame on schema source files or scanning external migration logs.

2. Motivation

Three use cases, called out but deferred in #045:

*Forensics.* When a production query starts failing with a fingerprint mismatch, operators want to know when the schema changed and whether the change was intended. A timestamp + actor in the catalog answers that directly.
*Version pinning for pre-compiled clients.* A client binary that pins a fingerprint at build time needs to know whether the current fingerprint is compatible with the pinned one. Without history, it can only check equality. With history, the client can ask "has my pinned fingerprint ever been stamped on this table?" and refuse to talk to a table whose current fingerprint is unknown to it.
*Rollback decisions.* If a bad migration ships, operators can look at the previous entry in the history log, read the previous schema shape, and decide whether rolling back is safe.

These are not hypothetical: #045 §"Use cases" lists each of them and explicitly defers them to this RFC.

3. Non-goals

*Point-in-time restore.* The fingerprint log is metadata; it does not store row data. Rolling back data to a past fingerprint is a separate problem (PITR) and is not in scope.
*Automatic migration of stamped rows.* If a schema changes from fingerprint A to B, the rows stamped with A remain stamped with A; this RFC does not propose rewriting them. Compatibility is the record layer's job, not the log's.
*Content diffing.* The log stores schema snapshots, not pretty-printed diffs. A future follow-up may layer a diff visualizer on top, but the log itself is raw.
*Multi-region consistency of the log.* Single region today, per RFC-001 §3 non-goal list. When RFC-001 Phase 6 (multi-region) lands, the log inherits whatever model the catalog chooses for multi-region.

4. Data model

4.1 Log shape

One append-only log per (tenant_id, table_id). Each entry:

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct FingerprintHistoryEntry {
    /// Monotonically increasing generation number within this log.
    /// Starts at 1 on the first ensure_table; bumps on every
    /// schema change that produces a different fingerprint.
    pub generation: u64,
    /// The u64 fingerprint produced by Schema::fingerprint at the
    /// time this entry was written.
    pub fingerprint: u64,
    /// Unix epoch microseconds when this entry was committed.
    pub migrated_at: u64,
    /// Principal that committed the change. Pulled from the
    /// AuthContext scoped to the EnsureTable call. "<system>" if
    /// the change was made by an internal process with no token.
    pub migrated_by: String,
    /// Canonical bytes of the schema at this generation.
    /// Replaying `Schema::decode(&snapshot)` reproduces exactly
    /// the schema whose fingerprint is `self.fingerprint`.
    pub snapshot: Vec<u8>,
}

4.2 Keyspace layout

Using the existing kdb-record keyspace convention:

catalog:fingerprint_history:<tenant_id>:<table_id>:<generation>

tenant_id is a fixed-width big-endian u64; table_id is a fixed-width big-endian u32 (matching the existing proto types); generation is a fixed-width big-endian u64. The layout preserves per-log ascending order under the natural lex order of KvCluster, so a forward scan over catalog:fingerprint_history:<tenant>:<table>: is already a chronological iteration. No secondary index needed.
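The ordering claim can be illustrated with a minimal sketch. The `PREFIX` bytes and the `history_key` helper are hypothetical, and the separators between the binary fields are omitted for brevity; only the fixed-width big-endian encoding comes from the layout described above.

```rust
/// Hypothetical key prefix; the RFC names the logical layout, not the exact bytes.
const PREFIX: &[u8] = b"catalog:fingerprint_history:";

/// Encode a history key with fixed-width big-endian fields so that
/// byte-wise (lexicographic) order matches numeric order.
fn history_key(tenant_id: u64, table_id: u32, generation: u64) -> Vec<u8> {
    let mut key = Vec::with_capacity(PREFIX.len() + 8 + 4 + 8);
    key.extend_from_slice(PREFIX);
    key.extend_from_slice(&tenant_id.to_be_bytes());
    key.extend_from_slice(&table_id.to_be_bytes());
    key.extend_from_slice(&generation.to_be_bytes());
    key
}

fn main() {
    // Generation 256 must sort after generation 2; big-endian guarantees it.
    let mut keys = vec![
        history_key(7, 1, 256),
        history_key(7, 1, 2),
        history_key(7, 1, 1),
    ];
    keys.sort();
    assert_eq!(keys[0], history_key(7, 1, 1));
    assert_eq!(keys[2], history_key(7, 1, 256));
    // Little-endian would break this: 256 encodes as [0, 1, 0, ..],
    // which sorts before 2 encoded as [2, 0, 0, ..].
    assert!(256u64.to_le_bytes() < 2u64.to_le_bytes());
}
```

Because every field has a fixed width, a prefix scan over one (tenant, table) pair never interleaves with a neighbouring log.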

4.3 Retention

Default retention: *unlimited*. The expected volume is low (schema migrations are rare, tens per year per table in practice) and the value per entry is small (~1 KB including the snapshot). At 100M tenants × 10 tables × 10 entries per year, that is 10^10 entries per year, so the worst-case annual footprint is on the order of 10 TB cluster-wide, which is acceptable at that scale.

A per-tenant retention knob (max_history_entries) is a follow-up but *not shipped in v1*. If an operator hits a volume concern before the knob lands, manual pruning (kdbctl catalog history --prune-older-than <timestamp>) is the escape hatch; see §8.

5. Write path

The append happens *inside the same transaction* as the schema change in Catalog::ensure_table. Pseudocode:

pub async fn ensure_table(&self, tenant: Tenant, schema: Schema)
    -> RecordResult<(TableId, Schema, u64 )>
{
    let mut tx = self.kv.begin_tx().await?;
    let existing = self.lookup_table_tx(&mut tx, tenant, &schema.name).await?;

    let next_fp = schema.fingerprint();
    match existing {
        Some((tid, current, cur_fp)) if cur_fp == next_fp => {
            // Idempotent re-ensure, no history entry.
            tx.rollback().await?;
            Ok((tid, current, cur_fp))
        }
        Some((tid, _, _)) => {
            // Schema changed. Append a new history entry.
            let generation = self.next_generation(&mut tx, tenant, tid).await?;
            // Remaining fields (migrated_at, migrated_by, snapshot) elided.
            let entry = FingerprintHistoryEntry { generation, fingerprint: next_fp, .. };
            self.write_history_entry(&mut tx, tenant, tid, &entry).await?;
            self.update_table_schema(&mut tx, tenant, tid, &schema).await?;
            tx.commit().await?;
            Ok((tid, schema, next_fp))
        }
        None => {
            // First-time creation. Allocate table_id and write
            // generation=1 entry.
            let tid = self.allocate_table_id(&mut tx).await?;
            let entry = FingerprintHistoryEntry { generation: 1, fingerprint: next_fp, .. };
            self.write_history_entry(&mut tx, tenant, tid, &entry).await?;
            self.insert_table(&mut tx, tenant, tid, &schema).await?;
            tx.commit().await?;
            Ok((tid, schema, next_fp))
        }
    }
}

Invariants (enforced by the transaction's serializable isolation):

*Atomicity.* The history entry and the catalog row move together. There is no window where the catalog thinks the table is at fingerprint B but the history still ends at A.
*Monotonicity.* generation is strictly increasing per log. Two concurrent ensure_table calls with different schemas against the same table serialize on the catalog row, so exactly one wins and the other re-enters the match arm above.
*Idempotence.* A re-ensure_table with an unchanged schema rolls back; no history entry is written. Without this, a busy client redoing ensure_table on every boot would flood the log.
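The monotonicity and idempotence invariants can be modelled without a KV store. The sketch below is an in-memory stand-in for one (tenant_id, table_id) log; `HistoryLog` and `ensure` are hypothetical names, and transactional atomicity is outside what it shows.

```rust
/// In-memory stand-in for one (tenant_id, table_id) log. The real write
/// would happen inside the KV transaction described above.
#[derive(Default)]
struct HistoryLog {
    entries: Vec<(u64, u64)>, // (generation, fingerprint), oldest first
}

impl HistoryLog {
    /// Returns Some(new_generation) if an entry was appended, or None when
    /// the fingerprint matches the current one (idempotent re-ensure).
    fn ensure(&mut self, fingerprint: u64) -> Option<u64> {
        if let Some(&(_, current)) = self.entries.last() {
            if current == fingerprint {
                return None;
            }
        }
        // Generation 1 on first creation, otherwise previous + 1.
        let generation = self.entries.last().map_or(1, |&(g, _)| g + 1);
        self.entries.push((generation, fingerprint));
        Some(generation)
    }
}

fn main() {
    let mut log = HistoryLog::default();
    assert_eq!(log.ensure(0xA), Some(1)); // first-time creation
    assert_eq!(log.ensure(0xA), None);    // unchanged schema: nothing written
    assert_eq!(log.ensure(0xB), Some(2)); // schema change bumps the generation
    assert_eq!(log.ensure(0xA), Some(3)); // reverting appends, never rewrites
    assert_eq!(log.entries.len(), 3);
}
```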

6. Read path — gRPC

New RPC on kdb.v1.catalog.Catalog:

rpc GetFingerprintHistory(GetFingerprintHistoryRequest)
    returns (stream FingerprintHistoryEntryProto);

message GetFingerprintHistoryRequest {
  uint64 tenant_id    = 1;
  uint32 table_id     = 2;
  // Optional: minimum generation to return. 0 = from the start.
  uint64 since_generation = 3;
  // Optional: page size hint. 0 = server default (64).
  uint32 limit        = 4;
  // Optional: return in descending order (most recent first).
  // Default is ascending.
  bool   descending   = 5;
}

message FingerprintHistoryEntryProto {
  uint64 generation   = 1;
  uint64 fingerprint  = 2;
  uint64 migrated_at  = 3;  // unix microseconds
  string migrated_by  = 4;
  kdb.v1.schema.Schema snapshot = 5;
}

Streaming rather than unary because the log is unbounded in the long run. The server enforces a hard cap of 1024 entries per request regardless of limit; the client pages via since_generation.
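A minimal sketch of how the limit clamp and client-side paging might interact, using an in-memory list of generations in place of the RPC. `effective_limit`, `fetch_page` and `fetch_all` are hypothetical helpers; the defaults (64 and 1024) and the inclusive meaning of since_generation are taken from the request definition above.

```rust
const DEFAULT_LIMIT: u32 = 64; // server default when limit == 0
const HARD_CAP: u32 = 1024;    // hard cap regardless of the requested limit

/// Effective page size: 0 means "server default"; anything else is
/// clamped to the hard cap.
fn effective_limit(requested: u32) -> u32 {
    if requested == 0 { DEFAULT_LIMIT } else { requested.min(HARD_CAP) }
}

/// Stand-in for one ascending GetFingerprintHistory call over an
/// in-memory log of generations.
fn fetch_page(log: &[u64], since_generation: u64, limit: u32) -> Vec<u64> {
    log.iter()
        .copied()
        .filter(|&g| g >= since_generation)
        .take(effective_limit(limit) as usize)
        .collect()
}

/// Client-side paging: restart each request just past the last generation
/// seen, until a short page signals the end of the log.
fn fetch_all(log: &[u64], limit: u32) -> Vec<u64> {
    let mut all = Vec::new();
    let mut since = 0;
    loop {
        let page = fetch_page(log, since, limit);
        let done = page.len() < effective_limit(limit) as usize;
        if let Some(&last) = page.last() {
            since = last + 1;
        }
        all.extend(page);
        if done {
            return all;
        }
    }
}

fn main() {
    let log: Vec<u64> = (1..=3000).collect();
    // A request for 5000 entries is served in pages of at most 1024.
    assert_eq!(fetch_page(&log, 0, 5000).len(), 1024);
    assert_eq!(fetch_all(&log, 5000), log);
}
```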

Error mapping:

NOT_FOUND: tenant or table unknown.
PERMISSION_DENIED: the AuthContext does not carry a catalog scope for this tenant.
INVALID_ARGUMENT: since_generation larger than the current max generation by more than a sanity threshold (likely a bug).

7. Read path — CLI

New subcommand under kdbctl catalog:

kdbctl catalog history \
    --tenant acme \
    --table users \
    [--since <generation|duration>] \
    [--limit 64] \
    [--desc] \
    [--json]

Default output (human):

gen  fingerprint           migrated_at           migrated_by
---  -------------------   -------------------   -------------
1    0x3c7d91b6f2a8e0d4   2026-03-01 09:14:02   rodrigo@koder.dev
2    0x8a04f91e2b5c0d17   2026-03-18 15:42:11   cicd-bot
3    0xde1a44f0e77bb902   2026-04-09 11:07:55   rodrigo@koder.dev

With --json, emit one entry per line (ndjson) so the output can be piped into jq or an alerting pipeline. The snapshot field is base64-encoded in the json form.

Resolving --since as a duration (--since 7d) is a local convenience: the CLI translates it to a generation by first calling GetFingerprintHistory with since_generation = 0, limit = 1024, descending = true and finding the first entry older than now minus the duration. This keeps the server API agnostic to wall-clock time interpretation.
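The duration-to-generation translation could look roughly like the sketch below. `since_generation_for` is a hypothetical helper; returning the generation just after the first too-old entry is an assumption, since the text does not say whether the boundary entry itself is included.

```rust
/// Resolve a `--since <duration>` window to a since_generation value.
/// `entries_desc` holds (generation, migrated_at in unix microseconds),
/// most recent first, as a descending history call would return them.
fn since_generation_for(entries_desc: &[(u64, u64)], now_us: u64, window_us: u64) -> u64 {
    let cutoff = now_us.saturating_sub(window_us);
    // The first entry older than the cutoff marks the boundary; every
    // later generation falls inside the window.
    match entries_desc.iter().find(|&&(_, at)| at < cutoff) {
        Some(&(generation, _)) => generation + 1,
        None => 0, // nothing is older than the cutoff: start from the beginning
    }
}

fn main() {
    let entries = [(3, 900), (2, 500), (1, 100)];
    // Window of 200 us ending at t = 1000: only generation 3 qualifies.
    assert_eq!(since_generation_for(&entries, 1000, 200), 3);
    // A window longer than the whole history starts from generation 0.
    assert_eq!(since_generation_for(&entries, 1000, 2000), 0);
}
```

Note that a single 1024-entry page may not reach far enough back for a long window on a very busy table; the sketch does not handle that case.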

8. Pruning

Not v1. When the need lands, expose Catalog.PruneFingerprintHistory(tenant_id, table_id, keep_last N | before_generation G) and a mirror kdbctl catalog history --prune. The log stays append-only from the caller's perspective; pruning is an explicit administrative operation, never automatic.

Pruning never touches generation numbers: if you prune entries 1..=5, the next ensure_table still writes generation N+1, where N is the highest generation ever written, whether or not that entry has been pruned. This preserves the monotonicity invariant from §5.
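One way to keep numbering stable across pruning is to track the highest generation separately from the retained entries. This is an assumption made for illustration; the RFC does not specify how next_generation is stored, and `PrunableLog` is a hypothetical type.

```rust
/// Hypothetical log that stores the highest generation ever written
/// separately from the entries, so pruning cannot rewind it.
#[derive(Default)]
struct PrunableLog {
    high_water: u64,
    entries: Vec<u64>, // generations still retained, ascending
}

impl PrunableLog {
    /// Append the next generation and return its number.
    fn append(&mut self) -> u64 {
        self.high_water += 1;
        self.entries.push(self.high_water);
        self.high_water
    }

    /// Drop every entry with generation < before_generation.
    fn prune_before(&mut self, before_generation: u64) {
        self.entries.retain(|&g| g >= before_generation);
    }
}

fn main() {
    let mut log = PrunableLog::default();
    for _ in 0..5 {
        log.append();
    }
    log.prune_before(6); // prunes generations 1..=5, leaving the log empty
    assert!(log.entries.is_empty());
    assert_eq!(log.append(), 6); // numbering continues; it does not restart at 1
}
```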

9. Pre-existing tables

Tables that exist before this feature ships have *no history entries*. Two options for how to handle them on first read:

*Option A (lazy backfill).* On the first ensure_table, LookupTable, or GetFingerprintHistory that touches a legacy table, synthesize a generation-1 entry with migrated_at = 0, migrated_by = "<pre-history>" and write it in a background task. The entry reflects the current schema, not the schema at whatever time the table was actually created; we don't know that.

*Option B (explicit migration).* Ship a kdbctl catalog migrate-legacy-history admin command. Idempotent; does nothing if history entries already exist.

*Decision: Option A.* Lazy backfill minimizes operator work and the semantic loss ("pre-history" entry) is explicit. The background task is cheap (one KV write per legacy table) and runs with the <system> principal.
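A minimal sketch of the synthetic entry and an idempotent backfill check follows. The struct mirrors §4.1; `pre_history_entry` and `backfill_if_missing` are hypothetical helpers, and a `Vec` stands in for the KV range scan.

```rust
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct FingerprintHistoryEntry {
    pub generation: u64,
    pub fingerprint: u64,
    pub migrated_at: u64,
    pub migrated_by: String,
    pub snapshot: Vec<u8>,
}

/// Build the synthetic generation-1 entry for a legacy table from its
/// current fingerprint and canonical schema bytes.
fn pre_history_entry(fingerprint: u64, snapshot: Vec<u8>) -> FingerprintHistoryEntry {
    FingerprintHistoryEntry {
        generation: 1,
        fingerprint,
        migrated_at: 0, // the real creation time is unknown
        migrated_by: "<pre-history>".to_string(),
        snapshot,
    }
}

/// Idempotent backfill: writes only when the log is empty.
/// Returns whether an entry was written.
fn backfill_if_missing(
    log: &mut Vec<FingerprintHistoryEntry>,
    fingerprint: u64,
    snapshot: Vec<u8>,
) -> bool {
    if !log.is_empty() {
        return false;
    }
    log.push(pre_history_entry(fingerprint, snapshot));
    true
}

fn main() {
    let mut log = Vec::new();
    assert!(backfill_if_missing(&mut log, 0xA, vec![1, 2, 3]));
    // A second touch of the same legacy table is a no-op.
    assert!(!backfill_if_missing(&mut log, 0xA, vec![1, 2, 3]));
    assert_eq!(log[0].migrated_by, "<pre-history>");
}
```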

10. Rollback

If a migration is reverted (schema goes from B back to A), the history log is *not* rewritten. A new entry is appended with the same fingerprint as an earlier entry, meaning the log may contain [1:fp=A, 2:fp=B, 3:fp=A]. This is intentional:

The log is append-only; overwriting entries would violate §5.
Rollbacks are operationally meaningful and deserve to be visible in history.
Clients that care about "has this fingerprint ever been stamped?" can scan for equality against any historical entry, not just the latest.
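Both the "ever stamped" check and rollback detection reduce to scans over (generation, fingerprint) pairs. The helper names below are hypothetical, and the sketch operates on an in-memory slice rather than the streamed RPC response.

```rust
/// True if `pinned` appears anywhere in the log, not only at the tip.
/// `history` holds (generation, fingerprint) pairs, oldest first.
fn ever_stamped(history: &[(u64, u64)], pinned: u64) -> bool {
    history.iter().any(|&(_, fp)| fp == pinned)
}

/// Generations whose fingerprint repeats an earlier entry, each paired
/// with the earliest generation that carried the same fingerprint.
fn rollbacks(history: &[(u64, u64)]) -> Vec<(u64, u64)> {
    let mut out = Vec::new();
    for (i, &(generation, fp)) in history.iter().enumerate() {
        if let Some(&(first, _)) = history[..i].iter().find(|&&(_, f)| f == fp) {
            out.push((generation, first));
        }
    }
    out
}

fn main() {
    // The [1:fp=A, 2:fp=B, 3:fp=A] log from the text above.
    let history = [(1, 0xA), (2, 0xB), (3, 0xA)];
    assert!(ever_stamped(&history, 0xB)); // B is known even though A is current
    assert!(!ever_stamped(&history, 0xC));
    assert_eq!(rollbacks(&history), vec![(3, 1)]); // gen 3 repeats gen 1
}
```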

A future kdbctl catalog history may mark entries whose fingerprint duplicates an earlier one with a (rollback from gen N) annotation; that is cosmetic and can ship after v1.

11. Storage impact

Per-entry size estimate:

8 bytes generation + 8 bytes fingerprint + 8 bytes timestamp
~64 bytes principal string (typical)
~500 bytes canonical schema snapshot (typical; varies with column count)
Keyspace overhead (~40 bytes per KV pair in TiKV)

Total: *~650 bytes per entry* in the typical case.

At the RFC-001 §5 target scale of *100M tenants × 10 tables × 10 migrations/year*, the steady-state footprint is *~6.5 TB/year cluster-wide* (10^10 entries × 650 bytes). That is still small next to the row data footprint it shares a substrate with.
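The estimate can be re-derived with a short sketch. The inputs are the per-entry items and the scale figures listed above; the function names are hypothetical.

```rust
/// Typical per-entry size in bytes, summing the items listed above.
fn entry_bytes() -> u64 {
    let fixed = 8 + 8 + 8; // generation + fingerprint + timestamp
    let principal = 64;    // typical principal string
    let snapshot = 500;    // typical canonical schema snapshot
    let kv_overhead = 40;  // per-pair keyspace overhead
    fixed + principal + snapshot + kv_overhead
}

/// Bytes appended per year for a given fleet shape.
fn annual_bytes(tenants: u64, tables: u64, migrations_per_year: u64, per_entry: u64) -> u64 {
    tenants * tables * migrations_per_year * per_entry
}

fn main() {
    let per_entry = entry_bytes(); // 628, which the text rounds up to 650
    let total = annual_bytes(100_000_000, 10, 10, 650);
    println!("{per_entry} bytes/entry, {:.1} TB/year", total as f64 / 1e12);
}
```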

This sits comfortably inside the RFC-001 §6.8 cardinality budget because the log is not a metric — it consumes KV storage, not Prometheus cardinality.

12. Implementation plan

Phase 1 — record layer (1 week)

Add FingerprintHistoryEntry to kdb-record/src/schema.rs alongside Schema::fingerprint.
Extend Catalog::ensure_table per §5 to write a history entry inside the transaction.
Add Catalog::get_fingerprint_history(tenant, table, since, limit, desc) returning a Vec<FingerprintHistoryEntry>.
Unit tests in kdb-record covering: first-time creation writes generation 1, schema change writes generation 2, idempotent re-ensure writes nothing, rollback writes a duplicate-fingerprint entry.

Phase 2 — wire + gateway (3 days)

Add GetFingerprintHistory to kdb.v1.catalog.proto.
Implement the streaming handler in kdb-gateway/src/catalog_service.rs, wiring the 1024-entry hard cap, scope-based auth and since_generation paging.
Integration test in kdb-gateway/tests/ using the existing with_in_process_catalog helper pattern.

Phase 3 — kdb-cli (3 days)

Add CatalogClientHandle::get_fingerprint_history in kdb-cli/src/lib.rs returning Vec<FingerprintHistoryEntry>.
Add kdbctl catalog history subcommand with --since, --limit, --desc, --json, --tenant, --table.
Integration test in kdb-cli/tests/ that seeds multiple schema versions and asserts the history comes back in order.

Phase 4 — legacy backfill + docs (2 days)

Background task in the gateway that walks the tenant/table catalog and synthesizes <pre-history> entries for any table without history.
Extend docs/technical/schema-fingerprints.md with a "History" section cross-linking this RFC.
Update the FingerprintLookup struct in kdb-cli to carry an optional generation field so kdbctl catalog fingerprint --verbose can show the current generation too.

*Total:* ~2–3 weeks of focused work. Splittable across phases.

13. Open questions

*Should --since accept a fingerprint value* as an alternative to a generation or duration? It is convenient for "show me everything after the fingerprint my client had pinned". Proposed: yes, add as a third shape in the CLI only (server API stays numeric).
*Do we expose the snapshot bytes in the CLI by default?* They are large and hex-dumping them is useless. Proposed: suppress by default, expose behind --with-snapshot.
*Is the migrated_by field always derivable?* Calls that arrive via the legacy HMAC gateway auth (kdb-gateway/src/auth.rs) only carry tenant_id and role, not a principal identity. Proposed: record hmac:tenant=<id> as the principal for those calls and wait for Phase 2.2.d (see RFC-002 §5) to provide real identities via Koder ID JWT claims.

14. References

docs/rfcs/RFC-001-kdb-next-hyperscale-architecture.md: §6.6 auth + rate limiting; §6.8 cardinality budget.
docs/technical/schema-fingerprints.md: public surface of the current-fingerprint API (backlog #045).
docs/technical/auth.md: authentication layers (backlog #048).
backlog/pending/051-rfc-schema-fingerprint-history.kmd: the parent ticket for this RFC.
crates/kdb-record/src/schema.rs: where the FingerprintHistoryEntry type will live.
crates/kdb-cli/src/lib.rs: where the CLI-side client will live.

15. Decision log

Date	Decision	Notes
2026-04-10	Drafted	Claude scribe, awaiting Rodrigo review
2026-04-14	Accepted	No structural changes from Draft. Option A (lazy backfill) confirmed. Implementation sub-tickets opened: #124–#127.