Perf baseline

Performance baseline registry

Anchor numbers for performance-sensitive components in the Koder Stack. One row per component bench run. Add a new entry whenever a bench moves materially or a new backend lands; do not let ad-hoc CI noise rewrite existing rows.

Each entry must reference the exact bench binary and flags so the result is reproducible. The hardware string identifies the laptop or runner the sample was taken on (the head of uname -a plus CPU model and RAM is sufficient). Numbers from different hardware are not directly comparable; treat a new run as comparable to an existing row only if it was taken on the same machine.

Schema

| Column | Meaning |
|---|---|
| date | YYYYMMDD of the run (UTC). |
| component | <area>/<sector> of the binary under test. |
| bench | Bench binary path + relevant flags (sufficient to reproduce). |
| dataset | Workload size (corpus + query count, or whatever applies). |
| metric | p50, p95, p99, throughput, etc. |
| value | Measured number with unit. |
| target | RFC-defined target (or — if untracked). |
| pass | ✓ if value meets target, ✗ if it regresses, — if untracked. |
| hardware | One-line hardware tag. |
| notes | Anything noteworthy (recent change, blockers, follow-ups). |
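The pass rule in the schema can be sketched as a small check. This is a hypothetical helper, not part of any bench in the stack: it only understands numeric targets of the form seen in this registry (for example "< 50 ms" or "≤ 90 s"), treats anything else as untracked, and refuses to guess when a tracked target meets a non-numeric value such as "DNF".

```python
import re

# Comparators that appear in the target column of this registry.
_OPS = {
    "<": lambda v, t: v < t,
    "<=": lambda v, t: v <= t,
    "≤": lambda v, t: v <= t,
}
_TARGET = re.compile(r"^\s*(<=|<|≤)\s*([0-9.]+)\s*(\S+)\s*$")
_VALUE = re.compile(r"^\s*([0-9.]+)\s*(\S+)\s*$")


def pass_mark(value: str, target: str) -> str:
    """Return '✓' or '✗' for a numeric value/target pair, '—' if the target is untracked."""
    t = _TARGET.match(target)
    if not t:
        return "—"  # '—', 'suite green', relative targets, etc. need a human decision
    v = _VALUE.match(value)
    if not v:
        raise ValueError(f"value {value!r} is not '<number> <unit>'; mark the row by hand")
    op, t_num, t_unit = t.groups()
    v_num, v_unit = v.groups()
    if v_unit != t_unit:
        raise ValueError(f"unit mismatch: {v_unit!r} vs {t_unit!r}")
    return "✓" if _OPS[op](float(v_num), float(t_num)) else "✗"


if __name__ == "__main__":
    print(pass_mark("5.28 ms", "< 50 ms"))
    print(pass_mark("7.87 ms", "—"))
```

Requiring identical units keeps the check honest: a value in seconds is never silently compared against a target in milliseconds.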

Entries

services/ai/memory — inline embedder + in-process MemoryStore

| date | component | bench | dataset | metric | value | target | pass | hardware | notes |
|---|---|---|---|---|---|---|---|---|---|
| 20260510 | services/ai/memory | make bench-full (go run ./bench --episodes=10000 --queries=1000 --top-k=10 --threshold=0.5) | 10 K episodes / 1 K queries | populate (ep/s) | 70 497 | — | — | koder-laptop (Linux 6.12) | First baseline; #010 partial. Bench reuses recall.Service, exercising the full Search + Put path. |
| 20260510 | services/ai/memory | (same) | (same) | p50 | 5.28 ms | < 50 ms | ✓ | koder-laptop | ~10× headroom at 10 K. Brute-force cosine scales O(N) per query, so it won't hold at 100 K (hence #007). |
| 20260510 | services/ai/memory | (same) | (same) | p95 | 7.87 ms | — | — | koder-laptop | |
| 20260510 | services/ai/memory | (same) | (same) | p99 | 10.45 ms | < 200 ms | ✓ | koder-laptop | ~20× headroom at 10 K. |
| 20260510 | services/ai/memory | (same) | (same) | throughput | 176.5 q/s | — | — | koder-laptop | Single-goroutine. Multiply by core count for replica capacity ceiling. |
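The headroom and capacity notes in the rows above are simple ratios; the sketch below reproduces them. The linear projection to 100 K is a naive reading of the O(N) remark, and the 8-core replica is purely an assumed figure for illustration.

```python
def headroom(target: float, measured: float) -> float:
    """How many times the measured value fits inside the target (same unit)."""
    return target / measured


def linear_extrapolation(measured_ms: float, n_now: int, n_future: int) -> float:
    """Naive O(N) projection: latency grows in proportion to corpus size."""
    return measured_ms * n_future / n_now


def capacity_ceiling(single_worker_qps: float, cores: int) -> float:
    """Upper bound on replica throughput if every core sustains the single-worker rate."""
    return single_worker_qps * cores


if __name__ == "__main__":
    print(f"p50 headroom at 10 K: {headroom(50, 5.28):.1f}x")    # 9.5x
    print(f"p99 headroom at 10 K: {headroom(200, 10.45):.1f}x")  # 19.1x
    # Projected p50 at 100 K episodes under pure O(N) scaling.
    print(f"p50 at 100 K: {linear_extrapolation(5.28, 10_000, 100_000):.1f} ms")  # 52.8 ms
    # Assumed 8-core replica; the registry does not state a core count.
    print(f"ceiling: {capacity_ceiling(176.5, 8):.0f} q/s")      # 1412 q/s
```

Under this projection the 10 K p50 of 5.28 ms grows to roughly 52.8 ms at 100 K, which is why the brute-force path is not expected to keep its current margin at that scale.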

engines/lang/koda — kasm assembler (ET_REL mode)

| date | component | bench | dataset | metric | value | target | pass | hardware | notes |
|---|---|---|---|---|---|---|---|---|---|
| 20260516 | engines/lang/koda | ./kasm -c /tmp/koder_stage1_src.asm /tmp/kasm_stage1.o | koder stage1 bundle, 809 883 lines (~30 MB) | wall (line 1–170 K) | 5.8 s | ≤ 7× nasm -O0 | — | koder-laptop (Linux 6.12) | Pre-cascade baseline. Steady chunk_ms 85–137 ms up to line 140 K, then a superlinear cliff (243→960→3176→>100 000 ms). Tracked by engines/lang/koda#764. |
| 20260516 | engines/lang/koda | (same) | (same) | wall (full bundle) | DNF (≥1800 s) | ≤ 90 s | ✗ | koder-laptop | Pre-cascade baseline. Killed by SIGKILL after >100 s wall on the chunk past line 170 000. Was blocking #748 criterion 3. |
| 20260516 | engines/lang/koda | nasm -O0 -felf64 /tmp/koder_stage1_src.asm -o /tmp/nasm.o (reference) | (same) | wall (full bundle) | ~13 s | — | — | koder-laptop | External reference assembler. kasm's target envelope (≤7× nasm) anchors to this number. |
| 20260520 | engines/lang/koda | (same) | koder stage1 bundle, 817 K lines (post-cascade build) | wall (full bundle) | **47 s** | ≤ 90 s | ✓ | s.khost1.devlinuxkoda (Linux 6.12) | **Post-cascade PASS**: RFC-019 cascade (#770/#773/#774/#771) + label intern (#765). 90 s target met by 43 s margin. **#748 criterion 3 PASS**. Cliff eliminated; rerun via VM per `policies/testhostisolation.kmd`. |
| 20260524 | engines/lang/koda | make build-safe × 3 generations | full koder + smoke roundtrip | wall × 3 (gen-1/2/3) | 11m51s / 12m16s / 11m51s | bootstrap stable | ✓ | s.khost1.devlinuxkoda | **#772 validation gate PASS**: gen2 == gen3 md5 87d99225… byte-identical (fixed point) and matches the historical 20260520 phase4a canonical. RFC-019 cascade FULLY CLOSED (#750/#770/#771/#772/#773/#774). |
| 20260524 | engines/lang/koda | make test (full regression suite) | 1747-line full output | wall + PASS/FAIL count | 35m25s · 260 P / 50 F files (127 F lines) | suite green | ✗ (baseline) | s.khost1.devlinuxkoda | Pre-Phase-1 baseline: predates the apt install python3 + chmod fixes. Captured pre-#784. |
| 20260524 | engines/lang/koda | make test (post Phase-1 env setup) | 1808-line full output | wall + PASS/FAIL count | 36m50s · 277 P / 33 F files (108 F lines) | suite green | ✗ (Phase 1 done) | s.khost1.devlinuxkoda | **#784 Phase 1 PASS**: 17 FAIL files cleared by python3 install + chmod. Remaining 33 FAILs scheduled for Phase 2 (Cat. B binaries) + Phase 3 (Cat. C substantive triage). |
| 20260524 | engines/lang/koda | xvfb-run -a make test (post Wave 2a+2b) | 823-line full output | wall + PASS/FAIL count | ~58 min · 278 P / 33 F files (107 F lines) | suite green | ✗ (Phase 2 partial) | s.khost1.devlinuxkoda | **#784 Wave 2a+2b SHIPPED**: bundling shim in scriptstestnative.sh (resolves require for native tests) + `koderuiserver` PATH symlink + apt install xvfb + libgtk4-1. Native test_ui_hello now PASS. NUIX11 tests start under xvfb but don't emit PASS (interactive GUI by design; Cat. C redesign work). |

engines/compress/kzip — pluggable compressor backends at 1 GiB scale

Production-readiness milestone (v0.2.0, Passo 4 of the roadmap). Validates whether the LZMA2 / Brotli backends scale predictably past the 100 MiB workloads captured at engines/compress/kzip/bench/baseline.md. Full per-tool numbers (incl. FOSS reference comparators at 100 MiB) live in the module's bench file; this section anchors only the kzip-specific 1 GiB rows.

| date | component | bench | dataset | metric | value | target | pass | hardware | notes |
|---|---|---|---|---|---|---|---|---|---|
| 20260519 | engines/compress/kzip | bash bench/bench.sh --workload random-1g --tools kzip-baseline,kzip-lzma2,kzip-brotli | 1024 MiB /dev/urandom | compress (s, zstd:3) | 8.548 | — | — | koder-laptop (Linux 6.12) | kzip-baseline. Scale-up from 100 MiB: 1.88 s → 8.55 s (≈ 4.5× time for ~10× input). RSS 552 MiB. |
| 20260519 | engines/compress/kzip | (same) | (same) | compress (s, lzma2:6) | 39.650 | — | — | koder-laptop | ~4.6× slower than zstd, RSS 2.2 GiB. Predictable LZMA cost; no ratio gain on /dev/urandom. |
| 20260519 | engines/compress/kzip | (same) | (same) | compress (s, brotli:6) | 9.983 | — | — | koder-laptop | ~1.2× zstd time, RSS 733 MiB. No ratio gain on incompressible data; speed/RAM roughly matches zstd. |
| 20260519 | engines/compress/kzip | bash bench/bench.sh --workload text-1g --tools kzip-baseline,kzip-lzma2,kzip-brotli | 1024 MiB repeating text | output size (MiB) | 0.05 (zstd) | — | — | koder-laptop | kzip-baseline output 0.05 MiB on uniform repeating-seed text. zstd beats lzma2 at this scale on synthetic-uniform data. |
| 20260519 | engines/compress/kzip | (same) | (same) | output size (MiB) | **0.07 (lzma2)** | — | ✗ | koder-laptop | **Counter-RFC-001:** lzma2 produces MORE bytes than zstd on this corpus. RFC's "20-30% better than zstd" claim NOT validated. |
| 20260519 | engines/compress/kzip | (same) | (same) | output size (MiB) | **0.01 (brotli)** | — | — | koder-laptop | **Brotli wins at 1 GiB** on repeating text: 5× smaller than zstd, 7× smaller than lzma2. Tail-serve advantage compounds. |

**Open follow-up (kzip):** rerun on a real-world mixed-vocabulary corpus (wikitext, linux-source dump) before claiming LZMA2 ratio in operator-facing docs. The synthetic `text-1g` workload is too uniform: every backend converges to >99.99% reduction, and the differences are noise-level (~0.06 MiB on a 1 GiB input).

**Resolved follow-up (20260519):** bench rerun on the kzip source tree (28 MiB real-world mixed Go+docs+tests corpus). Result: lzma2:6 = 23.29 MiB out (**0.7% smaller** than zstd:3 = 23.46 MiB) at **2.4× compress time + 3.8× RSS**. Brotli:6 matches lzma2's ratio at 1.08× zstd's time. **RFC-001 "20-30% better" LZMA2 claim falsified on real corpora at this scale.** Default zstd dominates the speed/ratio Pareto for typical kzip workloads (<1 GiB mixed corpora). Operator guidance update: reserve --compressor=lzma2 for multi-GB cold-archival workloads with workload-specific A/B test confirmation; docs/usage/compression.kmd to be updated accordingly (sub-follow-up).

| date | component | bench | dataset | metric | value | target | pass | hardware | notes |
|---|---|---|---|---|---|---|---|---|---|
| 20260519 | engines/compress/kzip | bash bench/bench.sh --workload source --tools kzip-baseline,kzip-lzma2,kzip-brotli | kzip source tree (28.08 MiB mixed) | ratio (lzma2 vs zstd) | **−0.7%** | −20 to −30% per RFC-001 | ✗ | koder-laptop (Linux 6.12) | LZMA2 claim falsified on real corpus. Output 23.29 vs 23.46 MiB. Compress time 2.4× zstd, RSS 3.8× zstd. |
| 20260519 | engines/compress/kzip | (same) | (same) | ratio (brotli vs zstd) | **−0.9%** | — | — | koder-laptop | Brotli matches lzma2's ratio at 1.08× zstd's time. Better Pareto point than lzma2 on this corpus. |
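The lzma2-vs-zstd ratio figure follows directly from the two output sizes recorded for the source-tree run. A minimal sketch of that arithmetic, with hypothetical function names and the RFC-001 claim read as "at least 20% smaller":

```python
def size_delta_pct(candidate_mib: float, baseline_mib: float) -> float:
    """Signed size change of candidate vs baseline, in percent (negative = smaller)."""
    return (candidate_mib - baseline_mib) / baseline_mib * 100.0


def meets_ratio_claim(delta_pct: float, claimed_pct: float = -20.0) -> bool:
    """True if the candidate is at least as much smaller as the claim requires."""
    return delta_pct <= claimed_pct


if __name__ == "__main__":
    lzma2_vs_zstd = size_delta_pct(23.29, 23.46)  # output sizes from the source-tree run
    print(f"lzma2 vs zstd: {lzma2_vs_zstd:+.1f}%")             # -0.7%
    print("meets 20% claim:", meets_ratio_claim(lzma2_vs_zstd))  # False
```

The sign convention matches the ratio rows: a negative percentage means the candidate backend produced a smaller archive than zstd.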

Pending baselines

  • services/ai/memory — BGE-multilingual + kdb-vector @ 100 K (RFC target p50 < 100 ms, p99 < 500 ms). Blocked on services/ai/memory#007 (kdb vector store swap) and #008 (BGE real embedder).

  • services/ai/gateway — round-trip overhead for cached vs uncached proxy (RFC-001 follow-up; bench harness not yet written).

  • services/ai/cache — exact-match lookup p99 vs raw upstream (blocked on #002 skeleton).

  • engines/lang/koda — kasm residual mid-range timing variance (240K-280K range). After the full cascade closed (#764, #748 criterion 3 PASS at 47 s post-#765), a P2 polish ticket #776 tracks the non-deterministic spike at 280K (81 s in one run vs 16.5 s at 300K in the same run; GC-pressure noise, not structural). The bench evidence table is in the ticket; rerun in a VM when full stage1 needs to drop below 30 s for CI/CD self-host loops.

infranetbox — kbox container runtime cold-start

Tracked per meta/docs/stack/registries/self-hosted-pairs.md row 13: performance:cold_start_benchmark is the open gate before kbox flips from stable → official versus Docker/Podman. The bench below is a placeholder row; the actual run and numbers land alongside the cold-start benchmark harness (kbox `testsperfcold-start`), opened as a follow-up ticket when an isolated benchmark VM (s.khost1) is ready per `policies/testhostisolation.kmd`.

| date | component | bench | dataset | metric | value | target | pass | hardware | notes |
|---|---|---|---|---|---|---|---|---|---|
| 20260520 | infranetbox | kbox run --rm alpine:3.18 /bin/true × N (cold rootfs each) | alpine:3.18, 1 layer, ~3.4 MiB | p50 wall | — | ≤ podman run p50 × 1.1 | — | s.khost1.devlinuxkoda | Placeholder. Bench harness opening as follow-up. Cliff: until this row populates, cold_start_benchmark gate of pair #13 stays open. |
| 20260520 | infranetbox | (same) | (same) | p99 wall | — | ≤ podman run p99 × 1.2 | — | (same) | Same row as above; populated together. |

products/dev/flow — public API latency (single-client home LAN baseline)

First Flow baseline. Captures TTFB + total-response timings against flow.koder.dev from the developer laptop. Numbers anchor public client-perceived latency for the Forgejo fork (HTTP handlers + PAT auth + repo index). Server CPU time is not isolated here: the sample includes ISP RTT + TLS + EVEO edge. Re-run via products/dev/flow/engine/perf-baseline/bench.sh; full snapshot at products/dev/flow/engine/perf-baseline/2026-05-24.md.

| date | component | bench | dataset | metric | value | target | pass | hardware | notes |
|---|---|---|---|---|---|---|---|---|---|
| 20260524 | products/dev/flow | engine/perf-baseline/bench.sh (ITER=20, WARMUP=3) | 5 scenarios × 20 iter | home p50 TTFB | 361.856 ms | — | — | koder-laptop (Linux 6.12 + home WiFi + BR ISP → flow.koder.dev) | FLOW-116 first baseline; LANDING_PAGE = login 303 redirect. Includes ISP RTT + TLS. |
| 20260524 | products/dev/flow | (same) | (same) | user_login p50 TTFB | 382.839 ms | — | — | (same) | Custom signin.tmpl (RFC-006 §1; FLOW-115 locks). Auto-redirect-to-KoderID hint paints inside p50. |
| 20260524 | products/dev/flow | (same) | (same) | api_version p50 TTFB | 1613.977 ms | — | — | (same) | Surprisingly slow for a fixed JSON literal. PAT validation + TLS dominate; warrants server-side OTel breakdown (FLOW-103/106). |
| 20260524 | products/dev/flow | (same) | (same) | api_repos_search p50 total | 1743.073 ms | — | — | (same) | /api/v1/repos/search?limit=5 against the current corpus. p95 → p99 widens (2755 ms p99): index cache misses + auth round-trip. |
| 20260524 | products/dev/flow | (same) | (same) | api_repo_koder p50 total | 248.528 ms | — | — | (same) | /api/v1/repos/Koder/koder hot-cache hit. p99 1011 ms reflects occasional misses. Fastest p50 in the suite. |
| 20260524 | tools/design-gen | og.BuildPerPage (pre-#118: exec.Command per page) | 1755 pages × NumCPU=16 | per-page OG wall | ~55 s | — | — | s.khost1 fonts-smoke Incus runner | Cold-start dominated. Each page spawned a Chrome process (~500 ms init). Superseded by chromedp rewrite below. |
| 20260524 | tools/design-gen | og.BuildPerPage (post-#118: chromedp allocator + tabs) | 1755 pages × NumCPU=16 | per-page OG wall | ~6 s | < 15 s | — | s.khost1 fonts-smoke Incus runner | One long-lived Chrome; one tab per worker; warm-tab path ~50 ms/page. Gate enforced by `tests/regression/og_perf_test.go` (opt-in via `KDS_OG_PERF_TEST=1`). |

No regression targets defined yet. Promote to gates after the OTel pipeline (FLOW-103/106) can publish server-side timings split from transport, and kbox-equivalent staging exists for an apples-to-apples client.
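For readers comparing the p50/p95/p99 columns across sections, the sketch below shows one common way to derive them from raw samples, the nearest-rank method. It is an assumption for illustration only; the registry does not say which percentile definition bench.sh or the Go benches use, and the sample values here are synthetic.

```python
def nearest_rank(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be an integer in 1..100")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Synthetic timings in ms for a 20-iteration run (not measured data).
    timings = [float(ms) for ms in range(1, 21)]
    for pct in (50, 95, 99):
        print(f"p{pct}: {nearest_rank(timings, pct)} ms")
```

With only 20 iterations, nearest-rank p99 is simply the slowest sample, so a single outlier sets the p99 figure for that run.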

How to add an entry

  1. Run the bench you intend to record. Capture the full command line.
  2. Append rows to the appropriate sub-section above. Keep one entry per metric so trends per metric are easy to grep.
  3. If the entry replaces an older row (same component + same bench + same hardware), leave the old row in place and add the new row beneath it so historical drift is visible. Optionally annotate the older row's notes column with a → superseded YYYY-MM-DD pointer.
  4. Reference the entry from the bench's own README.md so future readers land here automatically.
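The steps above can be sketched as a small formatter that turns one measurement into one registry row. It is a hypothetical helper, not an existing tool in the stack: it assumes rows are pipe-delimited Markdown in the schema's column order, defaults the date to today's UTC date in the schema's YYYYMMDD form, and fills untracked fields with "—".

```python
from datetime import datetime, timezone

# Column order taken from the Schema section.
COLUMNS = ("date", "component", "bench", "dataset", "metric",
           "value", "target", "pass", "hardware", "notes")
REQUIRED = ("component", "bench", "metric", "value", "hardware")


def format_row(entry: dict) -> str:
    """Format one metric as one pipe-delimited row; raises if a required field is missing."""
    missing = [c for c in REQUIRED if not entry.get(c)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    row = {
        "date": datetime.now(timezone.utc).strftime("%Y%m%d"),
        "dataset": "—", "target": "—", "pass": "—", "notes": "",
    }
    row.update(entry)
    # Escape literal pipes so a flag or note cannot break the table layout.
    cells = [str(row[c]).replace("|", "\\|") for c in COLUMNS]
    return "| " + " | ".join(cells) + " |"


if __name__ == "__main__":
    print(format_row({
        "date": "20260510",
        "component": "services/ai/memory",
        "bench": "make bench-full",
        "metric": "p50",
        "value": "5.28 ms",
        "target": "< 50 ms",
        "pass": "✓",
        "hardware": "koder-laptop",
    }))
```

The helper only produces text to append beneath existing rows; it never edits or removes an older row, which keeps historical drift visible as step 3 requires.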

Source: ../home/koder/dev/koder/meta/docs/stack/registries/perf-baseline.md