
RFC-010 — Production Cutover Runbook (Zitadel → Koder ID v2)

  • **Status:** **ACTIVE** (unsuspended 2026-04-12; Phase 5 PASSED)
  • **Date:** 2026-04-09 (original), 2026-04-12 (unsuspended)
  • **Author:** Koder Team
  • **Depends on:** RFC-009 (migration strategy), RFC-001 Phase 5 (✅ DONE)
  • **Reverses:** No (additive; RFC-009 stays as the strategic doc)

**✅ UNSUSPENSION NOTICE (2026-04-12)**

RFC-001 Phase 5 hyperscale validation is **DONE**. All §5 v1.0 numerical targets were met (read p99 = 4.97 ms ≤ 10 ms, write p99 = 26.48 ms ≤ 30 ms, 24h soak clean exit). Ticket 038 (scan cross-tenant leak) is resolved. See platform/kdb/next/PHASE-5-RESULTS.md.

This runbook is now executable. Key changes from the original:

  • **Persistence:** koder-id v2 uses `kdb-next` (Rust, TiKV-backed) instead of koder-kdb 1.x (Go, SQLite). The kdb-next cluster runs on **LXC 200 on s.r1** (10.0.200.10:18090, TLS+auth). koder-id v2 on s.k.lin connects via gRPC over the network.
  • **No local KDB:** the KDB :7900 line in the architecture diagram is replaced by the kdb-next gateway :18090 on LXC 200.
  • **Catalog counter:** bumped to 1,000,001 to skip the Phase 2.5 preseed range (ticket 038). All new tenants allocate above 1M.
  • The 62 OIDC apps, 1 org, 1 human user, and 23 MB DB are all confirmed unchanged since the pre-flight inventory.
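As a rough illustration of the catalog-counter change, the check below rejects any tenant ID that falls inside the Phase 2.5 preseed range. The function name `tenant_id_is_safe` and the idea of validating IDs from a shell script are assumptions made for this sketch; only the 1,000,001 floor comes from this runbook.

```shell
# Hypothetical guard: first tenant ID outside the Phase 2.5 preseed range.
CATALOG_FLOOR=1000001

# Returns 0 when the argument is a plain decimal integer at or above the floor.
tenant_id_is_safe() {
  # Reject empty or non-numeric input before comparing.
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -ge "$CATALOG_FLOOR" ]
}

# Usage:
#   tenant_id_is_safe "$new_tenant_id" || echo "tenant ID collides with preseed range"
```

A check like this could run as a post-cutover smoke test against the first tenant created on v2.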

~~Original suspension notice removed.~~

Purpose

RFC-009 defined the **strategy** to migrate from Zitadel to Koder ID v2. This RFC turns that strategy into an **operational runbook** that can be executed step-by-step on the real s.k.lin server, with concrete commands, exact file paths, decision points, and a rollback plan.

This document **must be approved** before any production change is made. Read it end-to-end and challenge anything that looks risky.

Scope

In scope:

  • Deploying koder-id v2 binaries to s.k.lin
  • Migrating users / OAuth clients / applications from Zitadel
  • Cutover of id.koder.dev traffic from Zitadel to koder-id v2
  • Rollback if cutover fails
  • Validation of integrated products (Koder Flow at minimum)

Out of scope (separate tickets):

  • Migration of audit logs / event history (Zitadel uses event sourcing)
  • WebAuthn device migration (low usage today)
  • Custom branding / theming (admin SPA already exists)
  • Multi-region HA (single-server install on s.k.lin)

Pre-flight inventory results (executed 2026-04-09)

The read-only inventory was run on s.k.lin. Findings below; the commands that produced them are at the end of this section.

What's actually running

🚨 **Critical discovery:** the system currently serving id.koder.dev is **NOT** Zitadel under its own name; it is **Zitadel rebranded as "koder-id"**. The CLI literally announces itself as "The KoderID CLI", its subcommands match Zitadel exactly (init, setup, start, mirror, keys), and the YAML config schema is verbatim Zitadel (Database:, DefaultInstance:, LoginPolicy:, SystemAPIUsers:). Throughout this RFC we now refer to it as **"koder-id v1"** to distinguish it from the new from-scratch implementation we are deploying (**koder-id v2**).

Both v1 and v2 use the same artifact names. The cutover **must** rename v2 artifacts to avoid stomping v1 (see "Naming collision" below).
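The subcommand fingerprint described above can be scripted so the discovery is reproducible. This is a minimal sketch: `looks_like_zitadel` is a hypothetical helper, and it assumes the v1 binary prints its subcommands when invoked with `--help`.

```shell
# Hypothetical helper: reads CLI help text on stdin and succeeds only if every
# Zitadel subcommand named in this runbook appears as a whole word.
looks_like_zitadel() {
  help_text=$(cat)
  for sub in init setup start mirror keys; do
    printf '%s\n' "$help_text" | grep -qw -- "$sub" || return 1
  done
}

# Usage (assumes --help lists the subcommands):
#   /usr/local/bin/koder-id --help 2>&1 | looks_like_zitadel && echo "Zitadel-derived CLI"
```

Matching whole words keeps a subcommand such as `startup` from being mistaken for `start`.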

Real data inventory (from koderid PostgreSQL DB)

| Resource | Count | Notes |
| --- | --- | --- |
| Organizations | **1** | "Koder" only; no Crescer/Vivver in v1 |
| Human users | **1** | root (admin): root@koder-id.id.koder.dev |
| Machine users | **1** | platform-admin@koder-id.id.koder.dev (PAT, valid until 2027-12-31) |
| Projects | 5 | ZITADEL, Koder AI, Koder Cloud, Koder Platform, Koder Bull |
| **OIDC apps** | **62** | 🎯 the real migration scope |
| API apps | 3 | machine-to-machine |
| SAML apps | 0 | none configured |
| PATs | 1 | the platform-admin one above |
| **DB total size** | **23 MB** | trivial backup |

The 62 OIDC apps cover every Koder product (Koder Bull, AI Chat, AI Desktop, Browser Web, Keyboard, Kterm, KMD, Shell, Postal, KDB, Jet, MQ, DNS, Gate, CI, Box, Fleet, Mesh, Edge, Cron, ...). Each one has its own client_id stored as metadata in the integrated product, so **client_ids must be preserved** during the migration; rotating them would break every product.
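One way to verify that no client_id was lost is to compare a list exported from v1 with a list read back from v2. The sketch below assumes both lists already exist as text files with one client_id per line; how they are produced, and the name `missing_client_ids`, are assumptions for illustration.

```shell
# Hypothetical check: $1 = client_ids exported from v1, $2 = client_ids in v2.
# Prints IDs present in v1 but missing from v2; succeeds only if none are missing.
missing_client_ids() {
  v1_sorted=$(mktemp) && v2_sorted=$(mktemp) || return 2
  sort -u "$1" > "$v1_sorted"
  sort -u "$2" > "$v2_sorted"
  # comm -23 keeps only the lines unique to the first (v1) file.
  missing=$(comm -23 "$v1_sorted" "$v2_sorted")
  rm -f "$v1_sorted" "$v2_sorted"
  [ -z "$missing" ] && return 0
  printf '%s\n' "$missing"
  return 1
}

# Usage:
#   missing_client_ids v1-client-ids.txt v2-client-ids.txt || echo "migration incomplete"
```

Extra IDs that exist only in v2 are ignored on purpose, since only a missing v1 client_id breaks an integrated product.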

Database location

  • **PostgreSQL 18** running natively (NOT Docker), shared cluster postgresql@18-main, listening on port **5433** (NOT the default 5432, which is ccorp-postgres in Docker and unrelated)
  • The same cluster hosts other DBs: koderbull, kodersign, kodertalk, ticsign. **We must not stop the cluster**; we only back up and read from the koderid database

  • Credentials (from /etc/koder-id/config.yaml):
    • app: koderid:KoderId@2026@127.0.0.1:5433/koderid
    • admin: postgres:Postgres@2026
  • Zitadel schema version (relevant tables):
    • projections.users14, users14_humans, users14_machines
    • projections.apps7, apps7_oidc_configs, apps7_api_configs, apps7_saml_configs

    • projections.orgs1, projections.projects4
    • projections.personal_access_tokens3
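Because the cluster is shared and must keep running, the backup has to target the koderid database alone. The sketch below only prints a `pg_dump` command for review rather than executing it; the output directory and file naming are assumptions, while the host, port, user, and database name are the ones listed above.

```shell
# Sketch: build (but do not run) a pg_dump command for the koderid database only.
# $1 = output directory (hypothetical default shown).
koderid_dump_cmd() {
  out_dir="${1:-/var/backups/koder-id}"
  stamp=$(date +%Y%m%d-%H%M%S)
  # -Fc writes a compressed custom-format archive that pg_restore can read.
  printf 'pg_dump -h 127.0.0.1 -p 5433 -U koderid -d koderid -Fc -f %s/koderid-%s.dump\n' \
    "$out_dir" "$stamp"
}

# Usage: print the command, review it, then run it by hand.
#   koderid_dump_cmd /var/backups/koder-id
```

`pg_dump` takes a consistent snapshot of one database while the server stays online, so the other databases in the cluster are not affected. Supplying the password through `PGPASSWORD` or `~/.pgpass` keeps it out of the process list.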

koder-jet routing (current state)

[[sites]]
domains = ["id.koder.dev"]
root = "/var/www/id.koder.dev-site"
proxy = "http://localhost:8090"
gzip = true

localhost:8090 is the koder-id v1 binary. The static root directory holds a landing page (index.html, 21 KB) and the Zitadel LoginV2 SPA (app/index.html). Both can stay as-is: koder-jet serves static files first and falls through to the proxy for unmatched paths.
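The static-first, proxy-fallthrough order can be modeled in a few lines to predict which requests reach the IdP after cutover. This is a simplified model of the behavior described above, not koder-jet's implementation; in particular, mapping a trailing slash to index.html is an assumption.

```shell
# Simplified model: prints "static" when a regular file under the site root
# matches the request path, otherwise "proxy".
# $1 = site root directory, $2 = request path.
route_for() {
  root="$1"
  req_path="$2"
  # Refuse traversal segments so the lookup stays under the root.
  case "$req_path" in
    *..*) echo proxy; return 0 ;;
  esac
  # Assumption: a directory request is answered with its index.html.
  case "$req_path" in
    */) req_path="${req_path}index.html" ;;
  esac
  if [ -f "$root/${req_path#/}" ]; then
    echo static
  else
    echo proxy
  fi
}

# Usage:
#   route_for /var/www/id.koder.dev-site /oauth/v2/authorize
```

Under this model only paths without a matching file on disk are forwarded, so the landing page and the LoginV2 SPA keep being served from the static root regardless of which backend sits behind the proxy.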

Disk

  • / has 770 GB of 964 GB in use
  • /var/lib/postgresql = *04 GB*, but that is all the OTHER databases (koderbull etc.), not koderid
  • The koderid DB itself is **23 MB**
  • Backup destination: must NOT be local-only; push to Google Drive per chosen strategy (Q2 = b)

Naming collision (v1 vs v2)

Everything from v1 uses the bare koder-id namespace. Installing the v2 deb as-is would overwrite v1 immediately:

| v1 (current) | v2 (planned; must rename) |
| --- | --- |
| /usr/local/bin/koder-id | /usr/local/bin/koder-id-v2-* (one binary per service) |
| koder-id.service | koder-id-v2-gateway.service, etc. |
| /etc/koder-id/ | /etc/koder-id-v2/ |
| /var/lib/koder-id/ | /var/lib/koder-id-v2/ |
| Linux user koder-id | koder-id-v2 |

The Phase 1 deb build below is updated to use the -v2 suffix on **all** filesystem artifacts. After successful cutover (Phase 7 decommission) the v2 install can optionally be renamed back to bare koder-id in a follow-up ticket.
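Before installing the v2 package, its file list can be scanned for anything that still uses the bare koder-id name. The helper below is a hypothetical sketch: it reads one path per line on stdin, and the package filename in the usage comment is a placeholder.

```shell
# Hypothetical pre-install check: prints every path that uses the bare
# koder-id name (binary, directory, or unit file) and fails if any is found.
bare_koder_id_paths() {
  # Matches ".../koder-id", ".../koder-id/..." and "koder-id.service",
  # but not "koder-id-v2..." names.
  if grep -E '(^|/)koder-id(/|\.service$|$)'; then
    return 1
  fi
  return 0
}

# Usage (column 6 of `dpkg-deb -c` is the file path; filename is a placeholder):
#   dpkg-deb -c koder-id-v2.deb | awk '{print $6}' | bare_koder_id_paths \
#     || echo "package would overwrite v1 files"
```

Failing the build on any match is one way to keep a stray unsuffixed path from overwriting the running v1 install.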

Inventory commands (for reproducibility)

# Reach the server (per CLAUDE.md mnemonic s.k.lin)
ssh -p 220 rodrigo@10.66.66.1

# 1. Confirm Zitadel is the IdP serving id.koder.dev
sudo ss -tlnp | grep -E ':(8090|8091|8092)'
sudo systemctl status zitadel || sudo systemctl list-units --type=service | grep -i zitadel
ps -ef | grep -i zitadel | grep -v grep

# 2. Find the Zitadel binary and version
which zitadel 2>/dev/null
zitadel --version 2>/dev/null
sudo ls -lah /usr/local/bin/zitadel /opt/zitadel

Source: ../home/koder/dev/koder/meta/docs/stack/rfcs/id-RFC-010-cutover-runbook.md