Full Headless Testability — every Koder UI is automatable/observable/reproducible without a display

draft

stack-RFC-005 — Full Headless Testability

*Every Koder UI surface (Flutter app, GTK/Adwaita binary, Wayland client, koder-x compositor, TUI, web app, installer-gui, initramfs UI) ships with a complete headless execution path: programmatic input + deterministic observation + golden capture, running the SAME binary the user runs. "Works on my screen" stops being a valid promote-to-prod state.*

Abstract

The Koder Stack today has a mixed-state testability surface. Web properties have headless-self-check.kmd (Chrome+screenshot pattern, working). Flutter apps have flutter_test + golden files (mostly adopted, no Stack-wide enforcement). CLIs are inherently headless. *Wayland desktop components (koder-x compositor, kolide-shell panel/dock/launcher/overview, koder-linux-installer-gui Adwaita flow, koder-kterm) have no headless testability infrastructure at all.* The distro#034.2 VM smoke matrix attempted on 2026-05-23 surfaced this gap concretely: koder-x segfaults under WLR_RENDERER=pixman in QEMU (infra/linux/x #032), blocking ALL VM-based UI testing of the Koder Linux desktop stack.

This RFC ratifies a 3-layer SDK architecture, fidelity contract, and .kdt declarative dialog DSL so every Koder UI component can opt into headless testing through a shared substrate instead of each module reinventing it. The goal is the same observability / input contract across koder_kit (Flutter), koder_kit_gtk (GTK4), koder_kit_wayland (native wlroots clients), koder_kit_compositor (testing koder-x itself), koder_kit_tui (Bubble Tea), and koder_web_test (existing Chrome path, formalized).

Status

draft: drafted as a stub RFC during the 2026-05-23 session; awaiting owner ratification + pilot-component selection. *Implementation deferred to a dedicated session.* This RFC is the architectural contract; concrete code lives in follow-up tickets per §9 Sequencing.

policies/headless-first.kmd already enforces the R1-R9 behavioral rules. Until this RFC is ratified and the SDK primitives ship, the policy gate (R9 release block at <80% coverage) stays soft-warn during Stack acceleration. The hard-block transition is gated on RFC ratification + pilot success (§9.5).

Motivation — what's broken today

Surface	Headless status today	Gap
Web (landings, Hub)	✅ `headless-self-check.kmd` enforces Chrome+screenshot pattern	Coverage informal — no Stack-wide gate
Flutter (Kruze, Drive, Calc, Snap, Pixel, Grid, Kortex, Talk, Eye, Koru, Backup, …)	✅ `flutter_test` works headless; many goldens already shipped	No Stack-wide standardization on golden tolerance, FakeClock injection, KoderApp fixtures
CLIs (cobra Go binaries)	✅ inherently headless	Trivial
TUI (Bubble Tea)	⚠️ partial: PTY pattern works, no DSL or golden infra	Snapshot testing ad-hoc per-module
Native GTK/Adwaita (installer-gui, kolide-shell, koder-kterm)	❌ ZERO infrastructure	Requires Wayland or Xwayland; pixel capture is per-widget art
Native Wayland clients (kolide-shell layer-shell surfaces)	❌ ZERO	wlroots-client direct binding tests exist nowhere
koder-x compositor itself	❌ ZERO	Multi-output sim, layer-shell coverage, screencopy round-trip: all hand-rolled
Native Android (Koder Keyboard via InputMethodService)	❌ ZERO	Kotlin native paths NOT covered by Flutter SDK; AndroidX Test/Espresso wiring missing
TV (Samsung Tizen, LG WebOS)	❌ ZERO	JS/React apps run on TV emulators; no Stack-wide test runner
Initramfs / cryptroot / GRUB UI	❌ pre-OS, hand-driven QEMU only	Boot-phase visibility = whatever serial console shows
Koder Linux ISO end-to-end (boot → install → login)	❌ no automation harness; `distro#034.2` matrix relies on human-at-VNC	OpenQA-style infra missing

The first cliff (koder-x desktop stack) is what triggered this RFC. The cascade behind it: every future Koder Linux release will need full VM regression testing, and every cross-app interaction (window-list IPC, drag-drop, kolide-shell focus changes, dock indicators) needs reproducible coverage. Without a shared substrate, this becomes 30+ ad-hoc test harnesses, drifting independently.

Non-goals

*Replacing existing per-component test runners* (flutter test, pytest, go test, meson test). Those stay as the test invocation surface. This RFC adds the substrate the tests use for input/observation/golden, not a new runner.
*GPU acceleration.* Headless tests run on a software renderer (Skia software for Flutter, wlr-pixman for wlroots, OffscreenSurface for GTK). GPU paths are validated separately on bare metal.
*Performance benchmarks.* policies/perf-baseline.md covers that. Headless tests measure correctness + determinism, not speed.
*Replacing OpenQA / autopkgtest wholesale* for ISO testing. The koder-test-iso orchestrator in Layer 3 borrows OpenQA's VNC-script pattern but reuses the Layer 1/2 primitives so the same input/observation APIs work both inside-the-app and at-the-VM level.

Architecture

The substrate is *three layers*, top-down from generic to stack-specific:

Layer 1 — Primitives (in `engines/sdk/`)

Pure C / Go bindings, no GUI framework dependency. Each is consumed by Layer 2 bindings to compose stack-specific test helpers.

Package	Purpose	Language
`engines/sdk/koder_test_compositor`	Headless `koder-x` instance via `WLR_BACKENDS=headless` + offscreen wl_shm output. Exposes IPC for `capture_frame` / `inject_pointer` / `inject_key` / `add_output` / `simulate_hotplug`.	C
`engines/sdk/koder_test_screencap`	Single API `capture(target, format) -> image`. Implementations: Wayland `wlr_screencopy_v1` capture, Flutter `OffscreenSurface.toImage()`, GTK `GtkWidgetPaintable → GdkTexture → PNG`, Chrome CDP `Page.captureScreenshot`, QEMU `screendump` via monitor socket, serial-ANSI parser for TTY.	C + per-stack bindings
`engines/sdk/koder_test_input`	Single API `inject({pointer, key, touch, gesture, text}, target)`. Implementations: `wlr-virtual-pointer-v1` + `wlr-virtual-keyboard-v1` for Wayland, Flutter `WidgetTester` calls, GTK `gtk_test_widget_send_key/click`, Chrome CDP `Input.dispatchKeyEvent`, VNC RFB `PointerEvent/KeyEvent`.	C + per-stack bindings
`engines/sdk/koder_test_clock`	FakeClock — programmable time. `now()`, `advance(duration)`, `register_timer(fn, when)`. Replaces `clock_gettime(CLOCK_MONOTONIC)` in test mode. Wired into wlroots' animation tick, Flutter's `Ticker`, GTK's `g_timeout_add`.	C
`engines/sdk/koder_test_rng`	Seeded RNG. `seed(u64)`, `next()`. Replaces `getrandom` / `secrets.token_bytes` in test mode for deterministic BIP-39 generation, UUID minting, etc.	C
`engines/sdk/koder_test_state`	State-tree query API. `query(selector) -> json`. Implementations: Flutter widget tree introspection, GTK accessibility tree, Wayland wlr-scene-node walk, DOM via CDP.	per-stack
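
The `koder_test_clock` contract (`now()`, `advance(duration)`, `register_timer(fn, when)`) can be illustrated with a small model. This is a minimal sketch in Python rather than the C primitive itself; the nanosecond units, the heap-based timer queue, and the firing order are assumptions made for illustration, not something this RFC specifies.

```python
import heapq
import itertools


class FakeClock:
    """Programmable monotonic clock: time moves only when advance() is called."""

    def __init__(self, start_ns=0):
        self._now_ns = start_ns
        self._timers = []              # min-heap of (when_ns, seq, fn)
        self._seq = itertools.count()  # tie-breaker keeps equal deadlines FIFO

    def now(self):
        return self._now_ns

    def register_timer(self, fn, when_ns):
        heapq.heappush(self._timers, (when_ns, next(self._seq), fn))

    def advance(self, duration_ns):
        """Move time forward, firing every due timer in deadline order."""
        deadline = self._now_ns + duration_ns
        while self._timers and self._timers[0][0] <= deadline:
            when_ns, _, fn = heapq.heappop(self._timers)
            self._now_ns = max(self._now_ns, when_ns)  # timer sees its own deadline
            fn()
        self._now_ns = deadline


clock = FakeClock()
fired = []
clock.register_timer(lambda: fired.append(clock.now()), 50_000_000)
clock.register_timer(lambda: fired.append(clock.now()), 20_000_000)
clock.advance(30_000_000)  # only the 20 ms timer is due
clock.advance(30_000_000)  # now the 50 ms timer fires
```

Each callback observes the clock at its own deadline, so a test that advances in one large step sees the same timer ordering as one that advances in many small steps.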

Layer 2 — Per-stack bindings (in each existing SDK)

Idiomatic helpers in the target language, building on Layer 1 primitives. Each test in a component consumes its language's binding.

Binding	Built on	Language	Where it lives	OS coverage
`koder_kit/test` (Dart)	Flutter `flutter_test` + `integration_test` + L1 `koder_test_clock` + `koder_test_rng`	Dart	`engines/sdk/koder_kit/lib/test/`	Android, iOS, Linux, macOS, Windows, Flutter web: Flutter abstracts the platform; the same test runs everywhere with `--device <target>`
`koder_kit_android/test` (Kotlin)	AndroidX Test + Espresso + L1 primitives + Android InstrumentationRegistry	Kotlin	`engines/sdk/koder_kit_android/test/` (new)	Android native: for components NOT on Flutter (e.g. Koder Keyboard InputMethodService). Tests run via `gradle connectedAndroidTest` on emulator/device
`koder_kit_gtk/test` (C)	GTK4 widget testing + L1 `koder_test_screencap` (GTK impl) + `koder_test_input` (GTK impl)	C	`engines/sdk/koder_kit_gtk/test/` (new)	Linux (GTK/Adwaita native: installer-gui, kolide-shell, kterm)
`koder_kit_wayland/test` (C)	Native wlroots-client test helpers + L1 `koder_test_compositor` IPC	C	`engines/sdk/koder_kit_wayland/test/` (new)	Linux (native Wayland clients)
`koder_kit_compositor/test` (C)	For testing `koder-x` ITSELF: multi-output simulation, layer-shell coverage, screencopy round-trip	C	`engines/sdk/koder_kit_compositor/test/` (new)	Linux (koder-x compositor)
`koder_kit_tv/test` (JS)	Tizen Studio CLI + WebOS WTA + Playwright headless + L1 primitives via WebSocket bridge	JS/TypeScript	`engines/sdk/koder_kit_tv/test/` (new)	Samsung TVs (Tizen) + LG TVs (WebOS): tests run on simulator/emulator OR real TV via WiFi developer mode
`koder_kit_tui/test` (Go)	Bubble Tea + PTY snapshots + ANSI assertion DSL + L1 `koder_test_clock`	Go	`engines/sdk/koder_kit_tui/test/` (new)	Linux, macOS, Windows, BSD: any platform with PTY
`koder_web_test` (Go/JS)	Chrome headless + CDP via chromedp; formalizes existing `headless-self-check.kmd` pattern	Go	`engines/sdk/koder_web_test/` (new; extract from current ad-hoc scripts)	Any browser-capable (Linux/macOS/Windows/Android/iOS/Chrome OS)

Layer 3 — Orchestration (in `dev/koder-tools/`)

Stack-level test runners that compose Layer 1+2 primitives across component boundaries. These are the binaries CI / /k-test / /k-housekeep invoke.

Binary	Purpose
`koder-test-iso`	Boot a desktop/server ISO in QEMU headless + drive via VNC input injection (Layer 1 `koder_test_input` RFB impl) + capture frames via QEMU `screendump` (Layer 1 `koder_test_screencap` QEMU impl) + assert against golden corpus. Inspired by OpenQA.
`koder-test-stack`	Spin up `kolide-services` + `kolide-shell` + N apps in a single headless `koder_test_compositor` instance, exercise full-DE behaviors (window-list IPC, screenshot wrapper, dock indicators, hot-corner overview), assert via IPC observation + screencap.
`koder-test-dialog`	`.kdt` (Koder Dialog Test) DSL runner. Declarative format: input events + observation assertions. See `§5 .kdt DSL`.

/k-test is updated to discover headless tests via the tests/headless/ convention path (parallel to tests/regression/ from policies/regression-tests.kmd).
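
Discovery by convention path could look roughly like the sketch below. The function name and the choice to skip anything under a `golden/` directory are assumptions for illustration; the RFC only fixes the `tests/headless/` path itself.

```python
import tempfile
from pathlib import Path


def discover_headless_tests(component_root):
    """Return files under <component>/tests/headless/, sorted, skipping goldens."""
    base = Path(component_root) / "tests" / "headless"
    if not base.is_dir():
        return []
    return sorted(
        p for p in base.rglob("*")
        if p.is_file() and "golden" not in p.relative_to(base).parts
    )


def _demo():
    # Build a throwaway component tree and list what discovery finds in it.
    with tempfile.TemporaryDirectory() as root:
        headless = Path(root) / "tests" / "headless"
        (headless / "golden" / "vm-a").mkdir(parents=True)
        (headless / "vm-a-happy-path.kdt").write_text('test "demo" { }\n')
        (headless / "golden" / "vm-a" / "idle.png").write_bytes(b"")
        return [p.name for p in discover_headless_tests(root)]


found = _demo()
```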

Fidelity contract — formal definition

The defining property of headless-first testing: *A headless run produces the same user-observable behavior as a display-attached user run*, modulo deterministic substitutions at I/O edges.

Formally, given:

prog — the binary (Flutter app, GTK binary, koder-x, etc.)
inputs_user — user input sequence (clicks, keys, touches)
inputs_headless — programmatic injection through Layer 1 koder_test_input
outputs_user — pixel buffer + state observable by user
outputs_headless — pixel buffer + state captured via Layer 1 koder_test_screencap + koder_test_state

The fidelity contract guarantees:

if inputs_user ≡ inputs_headless (semantically equivalent event sequence)
   AND
   FakeClock.tick_schedule(headless) ≡ wall_clock_tick_observed(user)
   AND
   seeded_rng(headless) == observed_rng_outputs(user)
then
   outputs_user ≡ outputs_headless (pixel-equal modulo tolerance ≤ 0.001, state-tree-equal exact)

The substitutions allowed (FakeClock, seeded RNG, mocked Net/FS at boundaries) MUST NOT divert the code path. The same app.main() runs in both cases; only the libc-edge symbols differ. R1 of policies/headless-first.kmd is the rule; this section is its math.
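
The "same code path, different edges" rule can be sketched as plain dependency injection. In the Python model below, `app_main`, `user_edges`, and `headless_edges` are hypothetical names; the point is only that the entry point is identical in both runs and the clock and RNG are the sole things swapped.

```python
import random
import time


def app_main(now_ns, rand_u64):
    """Stand-in for the real entry point: one code path, edges passed in."""
    token = rand_u64() & 0xFFFF
    stamp = now_ns()
    return {"token": token, "stamp": stamp}


def user_edges():
    # Production edges: real monotonic clock and OS entropy.
    sys_rng = random.SystemRandom()
    return time.monotonic_ns, lambda: sys_rng.getrandbits(64)


def headless_edges(start_ns, seed):
    # Test edges: fixed time and a seeded generator; app_main is untouched.
    rng = random.Random(seed)
    return (lambda: start_ns), (lambda: rng.getrandbits(64))


run_a = app_main(*headless_edges(1_000, 0xDEADBEEF))
run_b = app_main(*headless_edges(1_000, 0xDEADBEEF))
live = app_main(*user_edges())
```

Two headless runs with the same seed and clock produce identical output, while the live run goes through the very same `app_main` body. A test-only branch inside `app_main` would be exactly the divergence the contract bans.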

*Anti-fidelity examples that this contract bans:*

❌ Test mode short-circuiting render_frame() with a mock that returns a synthetic buffer. → Code path diverges; test passes, prod breaks.
❌ Replacing Adw.SwitchRow with TestSwitchRow in test fixture. → Widget tree differs; layout regressions invisible.
❌ Pumping frames with runUntilIdle() that uses wall-clock timeouts. → Test flake under CI load; production race conditions hidden.

*Pro-fidelity patterns this contract endorses:*

✅ FakeClock.advance(180.ms) advances Ticker across exactly 10 vblank intervals at 60Hz (11 frames when the frame at t = 0 is counted): the same frame schedule the user observes.
✅ wlr_screencopy_v1_capture(output) returns the same wl_shm buffer wlroots would have flipped to the physical display.
✅ gtk_test_widget_click(btn) runs the same clicked signal emission path as a real Gdk.EventButton press.
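
The frame count in the 60Hz example can be checked with integer arithmetic. This is a small sketch assuming vblanks fall at whole multiples of 1/60 s starting from t = 0; the function name is hypothetical.

```python
def vblank_intervals(duration_us, refresh_hz):
    """Whole refresh intervals that fit in duration_us (integer math, no float drift)."""
    return duration_us * refresh_hz // 1_000_000


intervals = vblank_intervals(180_000, 60)  # 180 ms at 60 Hz
frames_including_t0 = intervals + 1        # also counts the frame presented at t = 0
```

180 ms spans 10.8 refresh periods, so 10 complete intervals elapse, and the count reaches 11 only when the frame presented at t = 0 is included. Whichever convention a binding adopts, it has to be the one the real Ticker uses.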

`.kdt` DSL — Koder Dialog Test

Declarative format for end-to-end UI flows. Replaces ad-hoc imperative scripts. Compiled by koder-test-dialog to Layer 1+2 calls.

Grammar (informal, sufficient for stub)

test "<name>" {
  setup {
    <directive>...
  }
  steps {
    <step>...
  }
  teardown {
    <directive>...
  }
}

directive:
  boot iso "<path>" with karg "<string>" on qemu vm "<disk.qcow2>"
  launch app <slug> with env { KEY=VAL, ... }
  prepare fs <path> { ... }
  set clock to "<ISO-8601>"
  seed rng with <u64>

step:
  click <selector>
  type "<text>" into <selector>
  press <key>
  scroll <selector> by <pixels>
  drag <selector> to <selector>
  wait <selector> visible | <selector> matches "<text>" | frame matches golden "<file>" [tolerance <0.001>]
  capture frame as "<name>"
  assert visible <selector>
  assert frame_matches_golden "<file>" tolerance <0.001>
  assert state <selector> "<json-path>" equals <value>
  poweroff vm | reboot vm | wait reboot ready

selector:
  widget_id "<id>"
  widget_type "<type>"  (e.g. Adw.SwitchRow, Gtk.Button, Flutter Widget runtimeType)
  text "<contains>"
  role "<a11y-role>"
  dock_icon "<slug>"
  stack_page "<name>"
  window_title "<title>"

Example — `distro#034.2` VM A happy path

test "VM A — LUKS install happy path" {
  setup {
    boot iso "/root/temp/034.2-vm-matrix/koder-linux-desktop-amd64.hybrid.iso"
         with karg "KODER_INSTALLER_LUKS_BETA=1"
         on qemu vm "/root/temp/034.2-vm-matrix/vm-A-12g.qcow2"
    set clock to "2026-05-23T22:00:00Z"
    seed rng with 0xDEADBEEF
  }
  steps {
    wait frame matches golden "kolide-desktop-idle.png" tolerance 0.005
    click dock_icon "koder-linux-installer-gui"
    wait window_title "Instalar Koder Linux"
    assert visible widget_type "Adw.SwitchRow" text "Encrypt this installation with LUKS2"
    click widget_type "Adw.SwitchRow" text "Encrypt this installation with LUKS2"
    click widget_type "Gtk.Button" text "Instalar"
    wait stack_page "passphrase"
    type "MyPassphrase123" into widget_id "luks_pass_entry"
    type "MyPassphrase123" into widget_id "luks_pass_confirm"
    click widget_type "Gtk.Button" text "Continue"
    wait stack_page "recovery_key"
    capture frame as "recovery-key-page"
    assert frame_matches_golden "recovery-key-page-bip39-seed-deadbeef.png" tolerance 0.001
    click widget_id "luks_recovery_check"
    click widget_type "Gtk.Button" text "Install"
    wait frame matches golden "install-done.png" timeout 600s
    click widget_id "luks_reminder_gate_check"
    click widget_type "Gtk.Button" text "Reiniciar"
    wait reboot ready timeout 60s
    # Cryptroot prompt phase — serial console
    wait serial_matches "Please unlock disk koder-root" timeout 30s
    serial type "MyPassphrase123"
    wait serial_matches "login:" timeout 30s
  }
}

koder-test-dialog vm-A-happy-path.kdt boots the VM, drives it, checks goldens, and reports PASS / FAIL with frame diffs for any mismatch. The same .kdt file is the spec, the test, and the human-readable runbook: the sole source of truth for "what VM A's happy path looks like."
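
How a runner might turn one step line into a dispatchable record can be sketched for the `click` form alone. This is an illustrative Python fragment, not the koder-test-dialog parser: `parse_click` is a hypothetical name, and treating several `kind "value"` pairs as one compound selector is an assumption read off the example above rather than something the informal grammar states.

```python
import shlex

SELECTOR_KINDS = {
    "widget_id", "widget_type", "text", "role",
    "dock_icon", "stack_page", "window_title",
}


def parse_click(line):
    """Parse `click <selector>` where the selector is one or more kind "value" pairs."""
    tokens = shlex.split(line, comments=True)  # also drops trailing `# ...` comments
    if not tokens or tokens[0] != "click":
        raise ValueError(f"not a click step: {line!r}")
    rest = tokens[1:]
    if not rest or len(rest) % 2 != 0:
        raise ValueError(f"selector needs kind/value pairs: {line!r}")
    selector = {}
    for kind, value in zip(rest[0::2], rest[1::2]):
        if kind not in SELECTOR_KINDS:
            raise ValueError(f"unknown selector kind: {kind}")
        selector[kind] = value
    return {"op": "click", "selector": selector}


step = parse_click('click widget_type "Gtk.Button" text "Instalar"')
```

Rejecting unknown selector kinds at parse time keeps typos in a .kdt file from surfacing as a timeout several minutes into a VM run.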

Golden file convention

Path: <component>/tests/headless/golden/<test-name>/<step-name>.png
Format: PNG (preferred for transparency + lossless) or WebP (smaller, OK for opaque)
Metadata stripped (no creation time, software tag, etc.) so the file is fully deterministic given the input
Pixel comparison: SSIM-based by default with tolerance ≤ 0.001 for layouts; exact (tolerance = 0.0) for font rendering (Inter + JetBrains Mono are self-hosted per specs/fonts/typography.kmd, must render bit-identical)
Update flag: --update-golden only via explicit release-engineering invocation; commit message MUST justify ("changed because spec X §Y ratified Z direction")
Diff visualizer: koder-golden-diff <expected> <actual> produces a side-by-side PNG with red-highlighted regions
Storage: tracked in git LFS for >50KB files (Flutter golden default; GTK widget captures usually fit in plain git)
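
One way to read "SSIM-based with tolerance ≤ 0.001" is sketched below. Two things are assumptions: tolerance is taken to mean the dissimilarity 1 − SSIM, and SSIM is computed over a single global window on grayscale values, which is a simplification of the usual sliding-window SSIM a real comparator would use.

```python
def global_ssim(a, b, max_value=255):
    """Single-window SSIM over two equal-length grayscale pixel sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    c1 = (0.01 * max_value) ** 2  # standard SSIM stabilizing constants
    c2 = (0.03 * max_value) ** 2
    return ((2 * mean_a * mean_b + c1) * (2 * cov + c2)) / (
        (mean_a ** 2 + mean_b ** 2 + c1) * (var_a + var_b + c2)
    )


def matches_golden(actual, golden, tolerance=0.001):
    """Pass when the dissimilarity 1 - SSIM is within tolerance."""
    return 1.0 - global_ssim(actual, golden) <= tolerance


golden = [10, 50, 90, 130, 170, 210, 250, 30]
same = matches_golden(list(golden), golden)
inverted = matches_golden([255 - p for p in golden], golden)
```

An identical capture passes and an inverted one fails. The exact-match rule for font rendering would bypass SSIM entirely and compare bytes.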

Determinism enforcement

The full corpus of non-determinism sources must be eliminated from test-mode code paths. Audit script `headless-first-audit.sh` (part of policy headless-first.kmd's audit block) statically detects:

Direct clock_gettime / gettimeofday / time(NULL) calls in paths consumed by Layer 2 bindings → suggest FakeClock binding
Direct getrandom / arc4random / Dart Random() without seed → suggest seeded RNG
Threading without explicit join/sync barriers before capture → suggest pump_frames(N) before observation
Animation that uses derived wall-clock rather than the Ticker abstraction → flag for review
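
The first two checks lend themselves to a line-oriented pattern scan. The sketch below is a Python stand-in for illustration, not the contents of the audit script; the regular expressions and the `audit_source` name are assumptions, and a real audit would also need to ignore comments and string literals.

```python
import re

# (pattern, suggestion) pairs; illustrative only, not the policy's actual list.
RULES = [
    (re.compile(r"\b(clock_gettime|gettimeofday)\s*\("), "use FakeClock binding"),
    (re.compile(r"\btime\s*\(\s*NULL\s*\)"), "use FakeClock binding"),
    (re.compile(r"\b(getrandom|arc4random)\s*\("), "use seeded RNG"),
    (re.compile(r"\bRandom\s*\(\s*\)"), "use seeded RNG"),  # Dart Random() with no seed
]


def audit_source(text):
    """Return (line_number, suggestion) for each flagged line of source text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, suggestion in RULES:
            if pattern.search(line):
                findings.append((lineno, suggestion))
    return findings


sample = """\
struct timespec ts;
clock_gettime(CLOCK_MONOTONIC, &ts);
final r = Random();
final seeded = Random(42);
"""
findings = audit_source(sample)
```

The seeded `Random(42)` call is not flagged, which matches the rule's "without seed" wording.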

The 100-iteration loop test (`koder-test-loop --count 100 test`) is the runtime canary: any non-determinism manifests as non-identical captures across iterations.
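
The canary reduces to "hash every capture and expect exactly one distinct digest". Below is a minimal Python sketch of that check, with hypothetical capture functions standing in for a real frame grab.

```python
import hashlib
import random


def run_loop(capture, count=100):
    """Run capture() count times; return the set of distinct SHA-256 digests."""
    return {hashlib.sha256(capture()).hexdigest() for _ in range(count)}


def deterministic_capture():
    rng = random.Random(0xDEADBEEF)  # seeded: same bytes on every run
    return bytes(rng.getrandbits(8) for _ in range(64))


_counter = iter(range(10**6))


def leaky_capture():
    # Simulated leak: state survives between runs, so the bytes differ each time.
    return next(_counter).to_bytes(4, "big")


stable = run_loop(deterministic_capture)
leaky = run_loop(leaky_capture)
```

A deterministic capture collapses to a single digest. Any state that leaks between iterations shows up as extra digests, and the first differing pair gives the diff visualizer something to work from.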

Pilot path + sequencing

Stack-wide adoption is *graph-shaped* (depends on Layer 1 / Layer 2 / per-component readiness). Pilots in priority order:

§9.1 Pilot 1 — `koder-x` headless backend + `koder_kit_wayland/test`

Unblocks distro#034.2. Concrete tickets:

infra/linux/x #032 (already open today): fix or sidestep the WLR_RENDERER=pixman crash. Either patch the wlroots-pixman init path OR make the kolide-session heuristic prefer DRM gles2 when /dev/dri/card0 exists (infra/linux/kolide #037, also open).
New ticket infra/linux/x #033: add a WLR_BACKENDS=headless build target to koder-x. Expose Layer 1 koder_test_compositor IPC primitives (capture / inject / output add).
New ticket engines/sdk/koder_kit_wayland #001: bootstrap the package, expose Layer 2 helpers (launch_test_compositor, attach_test_client, golden-capture wrappers).
*Acceptance:* rerun the distro#034.2 VM matrix successfully end-to-end (5 VMs, A-E, all pass) using .kdt scripts.

Effort: ~2-3 weeks.

§9.2 Pilot 2 — GTK/Adwaita headless binding (`koder_kit_gtk/test`)

Unblocks installer-gui, kolide-shell, koder-kterm tests beyond what Pilot 1 gives at the compositor level.

New ticket engines/sdk/koder_kit_gtk #001: bootstrap. Wrap gtk_test_widget_* + GtkWidgetPaintable capture into idiomatic C helpers. Pair with FakeClock binding to g_timeout_add.
Adoption: distro#048.2 (LUKS UI golden corpus) + kolide-shell panel/dock smoke tests.

Effort: ~2-3 weeks.

§9.3 Pilot 3 — `.kdt` DSL runner

New ticket dev/koder-tools #NNN: bootstrap koder-test-dialog. Parser + step dispatcher + golden compare + diff visualizer.
First consumer: distro#034.2 rewritten as 5 .kdt files.

Effort: ~3-4 weeks (DSL design is the longest pole).

§9.4 Pilot 4 — Flutter normalization (`koder_kit/test` extensions)

Most Flutter UIs already have widget tests + goldens. This pilot formalizes (cross-platform: Android/iOS/Linux/macOS/Windows/web):

Common KoderApp fixture: signed-in state, FakeClock, seeded identity.
Stack-wide golden tolerance (0.001 default).
koder_test_clock + koder_test_rng injection into Provider / Riverpod scope at test setup.
Per-target headless harness:
- *Android/iOS*: flutter test integration_test on emulator/simulator (no physical display; software renderer).
- *Linux/macOS/Windows*: flutter test headless via the Skia software path.
- *Web*: flutter test --platform chrome --headless (CDP).

Effort: ~1-2 weeks (incremental on existing infra).

§9.4b Pilot 4b — Native Android (`koder_kit_android/test`)

Native Android components that are NOT Flutter (Koder Keyboard via InputMethodService, future native integrations) need their own binding.

New ticket: bootstrap engines/sdk/koder_kit_android/test/ with AndroidX Test + Espresso + L1 primitives + InstrumentationRegistry.
First consumer: infra/linux/keyboard (Koder Keyboard, a scaffold-only module today), when its first keyboard ticket (#001 multilingual layouts) is implemented.

Effort: ~2 weeks (AndroidX Test infra is mature; the work is the wiring with Layer 1).

§9.4c Pilot 4c — TV (Tizen + WebOS) (`koder_kit_tv/test`)

TV applications (Tizen + WebOS) use JS/React. Existing tools (Tizen Studio CLI, WebOS Web Application Toolkit, Playwright headless) can be composed.

New ticket: bootstrap engines/sdk/koder_kit_tv/test/:
- Playwright headless drives the app running on the Tizen/WebOS simulator OR on a real TV via WiFi developer mode.
- L1 primitives exposed via a WebSocket bridge (TV side is JS, host is Go).
- Golden capture via Playwright screenshot.
- Input injection via Playwright + remote-control key codes (TV-specific: arrow keys, back, home, enter; no mouse/touch).
First consumer: TBD when a TV product materializes (the Koder Stack currently ships no TV-only product).

Effort: ~2 weeks (gated on owner approval; can be deferred until there is a real consumer).

§9.5 Pilot 5 — `koder-test-iso` orchestrator (full Koder Linux E2E)

The OpenQA-style binary. Boot ISO → drive via VNC → assert goldens. Consumer of Layer 1 (koder_test_screencap QEMU-screendump impl, koder_test_input VNC-RFB impl) + Layer 3 (.kdt runner).

New ticket infra/linux/distro #N: first ISO matrix written in .kdt. Replaces the human-driven #034.2 matrix.

Effort: ~2 weeks once Pilots 1+3 are in.

§9.6 Stack-wide rollout

After the 5 pilots are ratified, the policies/headless-first.kmd R9 release gate transitions from soft-warn (PR comment) to hard-block (release refuses). The registry registries/headless-coverage.md carries a per-component sentinel. Components below 80% coverage at that transition window get a release-block + PR-of-remediation tickets opened automatically by /k-housekeep.
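
The soft-warn to hard-block transition amounts to one flag on an otherwise unchanged threshold check. In the sketch below the function name, the registry shape, and the coverage numbers are all made up for illustration; only the 80% threshold and the two modes come from the text.

```python
def release_gate(coverage, hard_block, threshold=0.80):
    """Return 'pass', 'warn' or 'block' for one component's headless coverage."""
    if coverage >= threshold:
        return "pass"
    return "block" if hard_block else "warn"


registry = {"component-a": 0.92, "component-b": 0.61}  # made-up numbers

before = {name: release_gate(cov, hard_block=False) for name, cov in registry.items()}
after = {name: release_gate(cov, hard_block=True) for name, cov in registry.items()}
```

Components already above the threshold see no change at the transition. Only those below it move from a PR comment to a refused release.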

Effort: ~1-2 weeks of cleanup tooling + registry bootstrapping.

Cost summary

Sequential single-AI cadence: ~12-16 weeks total across §9.1-§9.6. With 2-3 AIs in parallel (Pilot 1 + Pilot 3 + Pilot 4 are independent), squeezable to ~6-8 weeks calendar. The biggest unknown is the .kdt DSL design: it could land in 2 weeks if minimal, 4 weeks if it grows assertion-DSL features.

Open questions

*Q1.* Should .kdt be its own grammar, or a YAML/TOML DSL? Pros of own grammar: human-readable, clean step semantics. Cons: parser cost. Recommendation: TOML-backed with .kdt extension signaling Koder-specific schema validation.
*Q2.* How do we handle GPU-dependent visual regressions? GL/Vulkan paths can't run on a software renderer with bit-exactness. Recommendation: soft-warn on those tests in headless mode; hard check on bare-metal CI (separate runner, scheduled nightly).
*Q3.* Multi-monitor / hotplug fuzz testing: in scope? Recommendation: yes, but Pilot 6+. Layer 1 koder_test_compositor.add_output() already supports it.
*Q4.* Accessibility tree coverage as part of golden corpus? Recommendation: yes, R3's state(selector) query API already returns a11y-derived state. Make a11y assertions first-class in .kdt DSL.
*Q5.* initramfs / cryptroot UI testing: does this RFC cover it, or punt to a separate RFC? Recommendation: covered, via koder-test-iso's serial-console + screendump primitives. The fidelity contract holds because the boot path runs the same initramfs the user gets.

Risks

*1.* Layer 1 primitive divergence between languages (C/Dart/Go) → contract drift. Mitigation: protobuf-defined IPC for cross-language calls; canonical test suite in each language that consumes the same fixture.
*2.* Golden file churn from minor rendering changes (font hinting, antialiasing). Mitigation: tolerance ≤ 0.001 SSIM is forgiving enough for most legitimate changes; exact-match only for self-hosted-font text rendering where bit-exact IS the spec.
*3.* Coverage gaming: components hit the 80% R9 threshold with shallow tests. Mitigation: /k-housekeep audit cross-references user-facing flows (from each component's spec triggers) against test coverage; tests with assert true or assert nothing get flagged.

Decision

*Pending owner ratification.* Recommended path: accept this RFC, open the 6 pilot tickets called out in §9, sequence Pilot 1 (unblocks distro#034.2) immediately, allow §9.6 to be the soft-to-hard gate transition trigger.

References

policies/headless-first.kmd: the policy this RFC justifies.
policies/test-host-isolation.kmd: where headless tests run (LXC/QEMU, never laptop).
policies/regression-tests.kmd: per-bug-fix regression test mandate; composes naturally (a headless regression test for a UI bug IS a .kdt script).
rfcs/stack-RFC-003-koder-icons-shared-ui-library.kmd: precedent for SDK-as-substrate.
OpenQA (https://open.qa): VNC-based ISO testing pattern that inspired Layer 3 koder-test-iso.
Flutter integration_test package: Layer 2 inspiration for koder_kit/test.
Sway test suite: wlroots compositor with headless backend, exemplar for koder-x Pilot 1.

Audit trail

2026-05-23: drafted as a stub during the same session that surfaced the distro#034.2 blocker (owner + Claude). Policy headless-first.kmd ratified in the same commit window. Pilot tickets to be opened by owner / next dedicated session.