Performance baseline registry
Anchor numbers for performance-sensitive components in the Koder Stack. One row per component bench run. Add a new entry whenever a bench moves materially or a new backend lands; do not let ad-hoc CI noise rewrite existing rows.
Each entry must reference the exact bench binary + flags so the result is reproducible. The hardware string identifies the laptop or runner the sample was taken on (the head of `uname -a` + CPU model + RAM is sufficient). Numbers from different hardware are not directly comparable; compare against an existing row only if you can re-run on the same machine.
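A minimal Go sketch of assembling that one-line tag on a Linux host. It assumes the standard `/proc/cpuinfo` and `/proc/meminfo` layouts and takes the first three fields of `uname -a` as its "head"; the `hardwareTag` helper and its `uname; CPU; RAM` output format are hypothetical, not an existing Koder tool.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// cpuModel returns the first "model name" value in /proc/cpuinfo-style text.
func cpuModel(cpuinfo string) string {
	sc := bufio.NewScanner(strings.NewReader(cpuinfo))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if ok && strings.TrimSpace(key) == "model name" {
			return strings.TrimSpace(val)
		}
	}
	return "unknown CPU"
}

// memTotalGiB reads the MemTotal line (reported in kB, i.e. KiB) of
// /proc/meminfo-style text and rounds it to whole GiB; 0 if it is missing.
func memTotalGiB(meminfo string) int {
	sc := bufio.NewScanner(strings.NewReader(meminfo))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) >= 2 && f[0] == "MemTotal:" {
			kib, err := strconv.ParseFloat(f[1], 64)
			if err != nil {
				return 0
			}
			return int(kib/(1024*1024) + 0.5)
		}
	}
	return 0
}

// hardwareTag joins uname head, CPU model and RAM into one line (hypothetical format).
func hardwareTag(unameHead, cpuinfo, meminfo string) string {
	return fmt.Sprintf("%s; %s; %d GiB RAM", unameHead, cpuModel(cpuinfo), memTotalGiB(meminfo))
}

func main() {
	// Errors are ignored for brevity; missing data degrades to placeholders.
	out, _ := exec.Command("uname", "-a").Output()
	head := strings.Fields(string(out))
	if len(head) > 3 {
		head = head[:3] // kernel name, hostname, kernel release
	}
	cpu, _ := os.ReadFile("/proc/cpuinfo")
	mem, _ := os.ReadFile("/proc/meminfo")
	fmt.Println(hardwareTag(strings.Join(head, " "), string(cpu), string(mem)))
}
```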
Schema
| Column | Meaning |
|---|---|
| date | YYYY-MM-DD |
| component | `<area>/<sector>` of the binary under test. |
| bench | Bench binary path + relevant flags (sufficient to reproduce). |
| dataset | Workload size (corpus + query count, or whatever applies). |
| metric | p50, p95, p99, throughput, etc. |
| value | Measured number with unit. |
| target | RFC-defined target (or — if untracked). |
| pass | ✓ if value meets target, ✗ if it regresses, — if untracked. |
| hardware | One-line hardware tag. |
| notes | Anything noteworthy (recent change, blockers, follow-ups). |
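For tooling that emits rows, the schema maps onto a flat struct. The sketch below is hypothetical (no such type is known to exist in the stack). `passMark` encodes the pass-column rule only for lower-is-better metrics such as latency or wall time and treats the target as an inclusive upper bound; throughput-style metrics would need the comparison flipped.

```go
package main

import (
	"fmt"
	"strings"
)

// untracked is the schema's marker (U+2014) for columns without a target.
const untracked = "\u2014"

// Entry mirrors the registry columns in schema order (hypothetical type).
type Entry struct {
	Date, Component, Bench, Dataset, Metric, Value, Target, Pass, Hardware, Notes string
}

// passMark applies the pass-column rule to a lower-is-better metric:
// a check mark when value meets target, a cross when it regresses,
// and the untracked marker when there is no target.
func passMark(value, target float64, tracked bool) string {
	if !tracked {
		return untracked
	}
	if value <= target {
		return "✓"
	}
	return "✗"
}

// Row renders the entry as one markdown table row.
func (e Entry) Row() string {
	cells := []string{e.Date, e.Component, e.Bench, e.Dataset, e.Metric,
		e.Value, e.Target, e.Pass, e.Hardware, e.Notes}
	return "| " + strings.Join(cells, " | ") + " |"
}

func main() {
	// Values copied from the services/ai/memory p50 row; the date is a placeholder.
	e := Entry{
		Date: "YYYY-MM-DD", Component: "services/ai/memory", Bench: "(same)", Dataset: "(same)",
		Metric: "p50", Value: "5.28 ms", Target: "< 50 ms",
		Pass: passMark(5.28, 50, true), Hardware: "koder-laptop", Notes: "",
	}
	fmt.Println(e.Row())
}
```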
Entries
services/ai/memory: inline embedder + in-process MemoryStore
| date | component | bench | dataset | metric | value | target | pass | hardware | notes |
|---|---|---|---|---|---|---|---|---|---|
| 2026 | services/ai/memory | `make bench-full` (`go run ./bench --episodes=10000 --queries=1000 --top-k=10 --threshold=0.5`) | 10 K episodes / 1 K queries | populate (ep/s) | 70 497 | — | — | koder-laptop (Linux 6.12) | First baseline; #010 partial. Bench reuses `recall.Service`, exercising the full Search + Put path. |
| 2026 | services/ai/memory | (same) | (same) | p50 | 5.28 ms | < 50 ms | ✓ | koder-laptop | ~10× headroom at 10 K. Brute-force cosine scales O(N) per query, so it won't hold at 100 K (hence #007). |
| 2026 | services/ai/memory | (same) | (same) | p95 | 7.87 ms | — | — | koder-laptop | |
| 2026 | services/ai/memory | (same) | (same) | p99 | 10.45 ms | < 200 ms | ✓ | koder-laptop | ~20× headroom at 10 K. |
| 2026 | services/ai/memory | (same) | (same) | throughput | 176.5 q/s | — | — | koder-laptop | Single-goroutine. Multiply by core count for the replica capacity ceiling. |
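The headroom figures in these notes are target ÷ value, and the capacity ceiling is single-goroutine throughput × core count. A small Go sketch of both, assuming perfectly linear scaling across cores, so the ceiling is an upper bound rather than a measurement:

```go
package main

import (
	"fmt"
	"runtime"
)

// headroom is how many times a lower-is-better measurement could grow
// before it breaches its target.
func headroom(value, target float64) float64 {
	return target / value
}

// capacityCeiling multiplies single-goroutine throughput by core count.
// It assumes perfectly linear scaling, so treat the result as an upper bound.
func capacityCeiling(singleQPS float64, cores int) float64 {
	return singleQPS * float64(cores)
}

func main() {
	fmt.Printf("p50 headroom: %.1fx\n", headroom(5.28, 50))   // 9.5x, noted as ~10×
	fmt.Printf("p99 headroom: %.1fx\n", headroom(10.45, 200)) // 19.1x, noted as ~20×
	cores := runtime.NumCPU()
	fmt.Printf("ceiling on %d cores: %.0f q/s\n", cores, capacityCeiling(176.5, cores))
}
```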
engines/lang/koda: kasm assembler (ET_REL mode)
| date | component | bench | dataset | metric | value | target | pass | hardware | notes |
|---|---|---|---|---|---|---|---|---|---|
| 2026 | engines/lang/koda | `./kasm -c /tmp/koder_stage1_src.asm /tmp/kasm_stage1.o` | koder stage1 bundle, 809 883 lines (~30 MB) | wall (line 1–170 K) | 5.8 s | ≤ 7× nasm -O0 | — | koder-laptop (Linux 6.12) | Pre-engines/lang/koda#764. |
| 2026 | engines/lang/koda | (same) | (same) | wall (full bundle) | DNF (≥1800 s) | ≤ 90 s | ✗ | koder-laptop | Pre-cascade baseline. Killed by SIGKILL after >100 s wall on the chunk past line 170 000. Was blocking #748 criterion 3. |
| 2026 | engines/lang/koda | `nasm -O0 -felf64 /tmp/koder_stage1_src.asm -o /tmp/nasm.o` (reference) | (same) | wall (full bundle) | ~13 s | — | — | koder-laptop | External reference assembler. kasm's target envelope (≤7× nasm) anchors to this number. |
| 2026 | engines/lang/koda | `./kasm -c /tmp/koder_stage1_src.asm /tmp/kasm_stage1.o` | koder stage1 bundle, 817 K lines (post-cascade build) | wall (full bundle) | **47 s** | ≤ 90 s | ✓ | s.khost1.dev | **#748 criterion 3 PASS.** Cliff eliminated; rerun via VM per policies/test… |
| 2026 | engines/lang/koda | `make build-safe` × 3 generations | full koder + smoke roundtrip | wall × 3 (gen-123) | 11m51s / 12m16s / 11m51s | bootstrap stable | ✓ | s.khost1.dev | **#772 validation gate PASS**: gen87d99225… byte… (#750/#770/#771/#772/#773/#774). |
| 2026 | engines/lang/koda | `make test` (full regression suite) | 1747-line full output | wall + PASS/FAIL count | 35m25s · 260 P / 50 F | suite green | ✗ (baseline) | s.khost1.dev | Pre-`apt install python3` + `chmod` fixes. Captured pre-#784. |
| 2026 | engines/lang/koda | `make test` (post Phase-1 env setup) | 1808-line full output | wall + PASS/FAIL count | 36m50s · 277 P / 33 F | suite green | ✗ (Phase 1 done) | s.khost1.dev | **#784 Phase 1 PASS**: 17 FAIL files cleared by python3 install + chmod. Remaining 33 FAILs scheduled for Phase 2 (Cat. B binaries) + Phase 3 (Cat. C substantive triage). |
| 2026 | engines/lang/koda | `xvfb-run -a make test` (post Wave 2a+2b) | 823-line full output | wall + PASS/FAIL count | ~58 min · 278 P / 33 F | suite green | ✗ (Phase 2 partial) | s.khost1.dev | **#784 Wave 2a+2b SHIPPED**: bundling shim in `scripts/test` (resolves `require` for native tests) + `koderuiserver` PATH symlink + `apt install xvfb` + libgtk; `test_ui_hello` now PASS. NUIX11 tests start under xvfb but don't emit PASS (interactive GUI by design; Cat. C redesign work). |
engines/compress/kzip: pluggable compressor backends at 1 GiB scale
Production-readiness milestone (v0.2.0, Step 4 of the roadmap). Validates whether the LZMA2 / Brotli backends scale predictably past the 100 MiB workloads captured at engines/compress/kzip/bench/baseline.md. Full per-tool numbers (incl. FOSS reference comparators at 100 MiB) live in the module's bench file; this section anchors only the kzip-specific 1 GiB rows.
| date | component | bench | dataset | metric | value | target | pass | hardware | notes |
|---|---|---|---|---|---|---|---|---|---|
| 2026 | engines/compress/kzip | `bash bench/bench.sh --workload random-1g --tools kzip-baseline,kzip-lzma2,kzip-brotli` | 1024 MiB /dev/urandom | compress (s, zstd:3) | 8.548 | — | — | koder-laptop (Linux 6.12) | kzip |
| 2026 | engines/compress/kzip | (same) | (same) | compress (s, lzma2:6) | 39.650 | — | — | koder-laptop | ~4.6× slower than zstd, RSS 2.2 GiB. Predictable LZMA cost; no ratio gain on /dev/urandom. |
| 2026 | engines/compress/kzip | (same) | (same) | compress (s, brotli:6) | 9.983 | — | — | koder-laptop | ~1.2× zstd time, RSS 733 MiB. No ratio gain on incompressible data; speed/RAM matches zstd. |
| 2026 | engines/compress/kzip | `bash bench/bench.sh --workload text-1g --tools kzip-baseline,kzip-lzma2,kzip-brotli` | 1024 MiB repeating text | output size (MiB) | 0.05 (zstd) | — | — | koder-laptop | kzip |
| 2026 | engines/compress/kzip | (same) | (same) | output size (MiB) | **0.07 (lzma2)** | — | ✗ | koder-laptop | Counter… |
| 2026 | engines/compress/kzip | (same) | (same) | output size (MiB) | **0.01 (brotli)** | — | — | koder-laptop | **Brotli wins at 1 GiB** on repeating text: 5× smaller than zstd, 7× smaller than lzma2. Tail-serve advantage compounds. |
**Open follow-up (kzip):** rerun on a real-world mixed-vocabulary corpus (wikitext, linux-source dump) before claiming an LZMA2 ratio in operator-facing docs. The synthetic `text-1g` workload is too uniform: every backend converges to >99.99% reduction, so the differences are noise-level (~0.06 MiB on a 1 GiB input).
**Resolved follow-up (2026-05-19):** bench rerun on the kzip source tree (28 MiB real-world mixed Go + docs + tests corpus). Result: lzma2:6 = 23.29 MiB out (**0.7% smaller** than zstd:3 = 23.46 MiB) at **2.4× compress time + 3.8× RSS**. Brotli:6 matches lzma2's ratio at 1.08× zstd's time. **RFC-001's "20–30% better" LZMA2 claim is falsified on real corpora at this scale.** Default zstd dominates the speed/ratio Pareto front for typical kzip workloads (<1 GiB mixed corpora). Operator guidance update: reserve `--compressor=lzma2` for multi-GB cold-archival workloads with workload-specific A/B test confirmation; docs/usage/compression.kmd to be updated accordingly (sub-follow-up).
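The relative-size figure is (candidate − baseline) ÷ baseline. A minimal Go sketch reproducing the −0.7% lzma2 delta and testing it against RFC-001, under the assumption that the claim counts as met for any delta of −20% or better; the function names are illustrative.

```go
package main

import "fmt"

// relDelta is the output-size change of a candidate vs a baseline, in percent;
// negative means the candidate's output is smaller.
func relDelta(candidateMiB, baselineMiB float64) float64 {
	return (candidateMiB - baselineMiB) / baselineMiB * 100
}

// meetsRFC001 assumes the claim holds for any delta of -20% or better.
func meetsRFC001(deltaPct float64) bool {
	return deltaPct <= -20
}

func main() {
	d := relDelta(23.29, 23.46) // lzma2:6 vs zstd:3 on the kzip source tree
	fmt.Printf("lzma2 vs zstd: %.1f%% (meets RFC-001: %v)\n", d, meetsRFC001(d))
	// prints "lzma2 vs zstd: -0.7% (meets RFC-001: false)"
}
```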
| date | component | bench | dataset | metric | value | target | pass | hardware | notes |
|---|---|---|---|---|---|---|---|---|---|
| 2026 | engines/compress/kzip | `bash bench/bench.sh --workload source --tools kzip-baseline,kzip-lzma2,kzip-brotli` | kzip source tree (28.08 MiB mixed) | ratio (lzma2 vs zstd) | **−0.7%** | −20 to −30% per RFC-001 | ✗ | koder-laptop (Linux 6.12) | LZMA2 falsified on real corpus. Output 23.29 vs 23.46 MiB. Compress time 2.4× zstd, RSS 3.8× zstd. |
| 2026 | engines/compress/kzip | (same) | (same) | ratio (brotli vs zstd) | **−0.9%** | — | — | koder-laptop | Brotli matches lzma2's ratio at 1.08× zstd's time. Better Pareto point than lzma2 on this corpus. |
Pending baselines
- services/ai/memory: BGE-multilingual + kdb-vector @ 100 K (RFC target p50 < 100 ms, p99 < 500 ms). Blocked on services/ai/memory#007 (kdb vector store swap) and #008 (BGE real embedder).
- services/ai/gateway: round-trip overhead for cached vs uncached proxy (RFC-001 follow-up; bench harness not yet written).
- services/ai/cache: exact-match lookup p99 vs raw upstream (blocked on #002 skeleton).
- engines/lang/koda: kasm residual mid-range timing variance (240K–280K range). After the full cascade closed (#764, #748 criterion 3 PASS at 47 s post-#765), P2 polish ticket #776 tracks the non-deterministic spike at 280K (81 s in one run vs 16.5 s at 300K in the same run; GC-pressure noise, not structural). The bench evidence table is in the ticket; rerun in a VM when full stage1 needs to drop below 30 s for CI/CD self-host loops.
infra/net/box: kbox container runtime cold-start
Tracked per meta/docs/stack/registries/self-hosted-pairs.md row 13: `performance:cold_start_benchmark` is the open gate before kbox flips from stable → official versus Docker/Podman. The bench below is a placeholder row; the actual run + numbers land alongside the cold-start benchmark harness (kbox `tests/perf/cold-start`), opened as a follow-up ticket when an isolated benchmark VM (s.khost1) is ready per `policies/testhostisolation.kmd`.
| date | component | bench | dataset | metric | value | target | pass | hardware | notes |
|---|---|---|---|---|---|---|---|---|---|
| 2026 | infra/net/box | `kbox run --rm alpine:3.18 /bin/true` × N (cold rootfs each) | alpine:3.18, 1 layer, ~3.4 MiB | p50 wall | — | ≤ podman run p50 × 1.1 | — | s.khost1.dev | Placeholder. Bench harness opening as follow-up. Cliff: until this row populates, the `cold_start_benchmark` gate of pair #13 stays open. |
| 2026 | infra/net/box | (same) | (same) | p99 wall | — | ≤ podman run p99 × 1.2 | — | (same) | Same run as above; populated together. |
products/dev/flow: public API latency (single-client home LAN baseline)
First Flow baseline. Captures TTFB + total-response timings against flow.koder.dev from the developer laptop. Numbers anchor public client-perceived latency for the Forgejo fork (HTTP handlers + PAT auth + repo index). Server CPU time is not isolated here; the sample includes ISP RTT + TLS + EVEO edge. Re-run via `products/dev/flow/engine/perf-baseline/bench.sh`; full snapshot at `products/dev/flow/engine/perf-baseline/2026-05-24.md`.
| date | component | bench | dataset | metric | value | target | pass | hardware | notes |
|---|---|---|---|---|---|---|---|---|---|
| 2026 | products/dev/flow | `engine/perf-baseline/bench.sh` (ITER=20, WARMUP=3) | 5 scenarios × 20 iter | home p50 TTFB | 361.856 ms | — | — | koder | FLOW-116 first baseline; LANDING_PAGE = login 303 redirect. Includes ISP RTT + TLS. |
| 2026 | products/dev/flow | (same) | (same) | user_login p50 TTFB | 382.839 ms | — | — | (same) | Custom signin.tmpl (RFC… |
| 2026 | products/dev/flow | (same) | (same) | api_version p50 TTFB | 1613.977 ms | — | — | (same) | Surprisingly slow for a fixed JSON literal. PAT validation + TLS dominate; warrants server… |
| 2026 | products/dev/flow | (same) | (same) | apirepossearch p50 total | 1743.073 ms | — | — | (same) | `/api/v1/repos/search?limit=5` against the current corpus. p95 → p99 widens (2755 ms p99): index cache misses + auth round-trip. |
| 2026 | products/dev/flow | (same) | (same) | apirepokoder p50 total | 248.528 ms | — | — | (same) | `/api/v1/repos/Koder/koder` hot-cache hit. p99 1011 ms reflects occasional misses. Fastest p50 in the suite. |
| 2026 | tools/design-gen | `og.BuildPerPage` (pre-#118: `exec.Command` per page) | 1755 pages × NumCPU=16 | per-page OG wall | ~55 s | — | — | s.khost1 fonts-smoke Incus runner | Cold-start dominated. Each page spawned a Chrome process (~500 ms init). Superseded by the chromedp rewrite below. |
| 2026 | tools/design-gen | `og.BuildPerPage` (post-#118: chromedp allocator + tabs) | 1755 pages × NumCPU=16 | per-page OG wall | ~6 s | < 15 s | — | s.khost1 fonts-smoke Incus runner | One long… (opt-in via `KDSOGPERF_TEST=1`). |
No regression targets are defined yet for the Flow rows; promote them to gates after the OTel pipeline (FLOW-103/106) can publish server-side timings split from transport, and kbox-equivalent staging exists for an apples-to-apples client.
How to add an entry
- Run the bench you intend to record. Capture the full command line.
- Append rows to the appropriate sub-section above. Keep one entry per metric so trends per metric are easy to grep.
- If the entry replaces an older row (same component + same bench + same hardware), leave the old row in place and add the new row beneath it so historical drift is visible. Optionally annotate the older row's `notes` column with a `→ superseded YYYY-MM-DD` pointer.
- Reference the entry from the bench's own README.md so future readers land here automatically.
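Because follow-on rows use "(same)" for repeated cells, grepping raw lines for one metric loses the bench and dataset context. A hypothetical Go helper (not an existing stack tool) that carries "(same)" cells forward from the previous row and then keeps one component + metric. It splits naively on `|`, so it assumes no cell contains a literal pipe.

```go
package main

import (
	"fmt"
	"strings"
)

// Column indexes in schema order.
const (
	colComponent = 1
	colBench     = 2
	colMetric    = 4
	colValue     = 5
)

// parseRows splits markdown table rows into cells, skips header and separator
// lines, and replaces every "(same)" cell with the previous row's value.
func parseRows(lines []string) [][]string {
	var rows [][]string
	var prev []string
	for _, ln := range lines {
		ln = strings.TrimSpace(ln)
		if !strings.HasPrefix(ln, "|") || strings.HasPrefix(ln, "|---") || strings.HasPrefix(ln, "| date ") {
			continue
		}
		parts := strings.Split(strings.Trim(ln, "|"), "|")
		cells := make([]string, len(parts))
		for i, p := range parts {
			cells[i] = strings.TrimSpace(p)
			if cells[i] == "(same)" && i < len(prev) {
				cells[i] = prev[i]
			}
		}
		rows = append(rows, cells)
		prev = cells
	}
	return rows
}

// byMetric keeps one component + metric so the trend reads top to bottom.
func byMetric(rows [][]string, component, metric string) [][]string {
	var out [][]string
	for _, r := range rows {
		if len(r) > colMetric && r[colComponent] == component && r[colMetric] == metric {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	lines := []string{
		"| date | component | bench | dataset | metric | value |",
		"|---|---|---|---|---|---|",
		"| 2026 | services/ai/memory | make bench-full | 10 K episodes | p50 | 5.28 ms |",
		"| 2026 | services/ai/memory | (same) | (same) | p99 | 10.45 ms |",
	}
	for _, r := range byMetric(parseRows(lines), "services/ai/memory", "p99") {
		fmt.Println(r[colBench], r[colValue]) // make bench-full 10.45 ms
	}
}
```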