Performance baseline registry

Anchor numbers for performance-sensitive components in the Koder Stack. One row per component bench run. Add a new entry whenever a bench moves materially or a new backend lands; do not let ad-hoc CI noise rewrite existing rows.

Each entry must reference the exact bench binary + flags so the result is reproducible. The hardware string identifies the laptop or runner the sample was taken on (`uname -a` head + CPU model + RAM is sufficient). Numbers from different hardware are not directly comparable; compare against an existing row only if you can re-run on the same machine.

Schema

Column	Meaning
`date`	YYYYMMDD of the run (UTC).
`component`	`<area>/<sector>` of the binary under test.
`bench`	Bench binary path + relevant flags (sufficient to reproduce).
`dataset`	Workload size (corpus + query count, or whatever applies).
`metric`	`p50`, `p95`, `p99`, `throughput`, etc.
`value`	Measured number with unit.
`target`	RFC-defined target (or `—` if untracked).
`pass`	`✓` if `value` meets `target`, `✗` if it regresses, `—` if untracked.
`hardware`	One-line hardware tag.
`notes`	Anything noteworthy (recent change, blockers, follow-ups).
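The `pass` rule above can be sketched in a few lines. This is a minimal illustration, not tooling that exists in the repository: the function name `evaluate_pass` is hypothetical, and it assumes `value` and `target` are already expressed in the same unit and that targets use only the `<` and `≤` forms seen in the tables below.

```python
import re

UNTRACKED = "—"  # the schema's marker for "no target defined"


def evaluate_pass(value: float, target: str) -> str:
    """Return the `pass` cell for a measured value and a target like '< 50' or '≤ 90'.

    Assumption: the caller has stripped units and both numbers share one unit.
    """
    target = target.strip()
    if target == UNTRACKED:
        return UNTRACKED
    match = re.fullmatch(r"(<|≤)\s*([0-9.]+)", target)
    if match is None:
        raise ValueError(f"unsupported target: {target!r}")
    op, limit = match.group(1), float(match.group(2))
    ok = value < limit if op == "<" else value <= limit
    return "✓" if ok else "✗"
```

Non-numeric targets such as `suite green` or `bootstrap stable` fall outside this sketch and would still need a human judgment.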

Entries

services/ai/memory: inline embedder + in-process MemoryStore

date	component	bench	dataset	metric	value	target	pass	hardware	notes
20260510	services/ai/memory	`make bench-full` (`go run ./bench --episodes=10000 --queries=1000 --top-k=10 --threshold=0.5`)	10 K episodes / 1 K queries	populate (ep/s)	70 497	—	—	koder-laptop (Linux 6.12)	First baseline; #010 partial. Bench reuses `recall.Service`, exercising the full Search + Put path.
20260510	services/ai/memory	(same)	(same)	p50	5.28 ms	< 50 ms	✓	koder-laptop	~10× headroom at 10 K. Brute-force cosine scales O(N) per query, so it won't hold at 100 K (hence #007).
20260510	services/ai/memory	(same)	(same)	p95	7.87 ms	—	—	koder-laptop
20260510	services/ai/memory	(same)	(same)	p99	10.45 ms	< 200 ms	✓	koder-laptop	~20× headroom at 10 K.
20260510	services/ai/memory	(same)	(same)	throughput	176.5 q/s	—	—	koder-laptop	Single-goroutine. Multiply by core count for replica capacity ceiling.
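The headroom and scaling notes in this table reduce to simple arithmetic, sketched below. The helper names are hypothetical, the projection assumes latency grows strictly linearly with corpus size (the O(N) brute-force case, ignoring any fixed embedder cost), and the core count of 8 is an illustrative assumption rather than a measured property of `koder-laptop`.

```python
def headroom(target_ms: float, measured_ms: float) -> float:
    """How many times the measured latency fits inside the target."""
    return target_ms / measured_ms


def project_linear(latency_ms: float, corpus_now: int, corpus_then: int) -> float:
    """Project latency to a larger corpus, assuming strictly O(N) cost per query."""
    return latency_ms * (corpus_then / corpus_now)


def replica_ceiling_qps(single_goroutine_qps: float, cores: int) -> float:
    """Upper bound on replica throughput if queries parallelize perfectly."""
    return single_goroutine_qps * cores


p50_headroom = headroom(50.0, 5.28)                  # roughly 9.5x
p99_headroom = headroom(200.0, 10.45)                # roughly 19x
p50_at_100k = project_linear(5.28, 10_000, 100_000)  # roughly 52.8 ms
ceiling = replica_ceiling_qps(176.5, cores=8)        # cores=8 is an assumption
```

Under that linear assumption the projected p50 at 100 K lands at about 52.8 ms, just past the 50 ms target used at 10 K, which is consistent with the note that brute-force cosine will not hold at that scale.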

engines/lang/koda: kasm assembler (ET_REL mode)

date	component	bench	dataset	metric	value	target	pass	hardware	notes
20260516	engines/lang/koda	`./kasm -c /tmp/koder_stage1_src.asm /tmp/kasm_stage1.o`	koder stage1 bundle, 809 883 lines (~30 MB)	wall (line 1–170 K)	5.8 s	≤ 7× nasm `-O0`	—	koder-laptop (Linux 6.12)	Pre-cascade baseline. Steady chunk_ms 85–137 ms up to line 140 K, then super-linear cliff (243→960→3176→>100 000 ms). Tracked by `engines/lang/koda#764`.
20260516	engines/lang/koda	(same)	(same)	wall (full bundle)	DNF (≥1800 s)	≤ 90 s	✗	koder-laptop	Pre-cascade baseline. Killed by SIGKILL after >100 s wall on the chunk past line 170 000. Was blocking `#748` criterion 3.
20260516	engines/lang/koda	`nasm -O0 -felf64 /tmp/koder_stage1_src.asm -o /tmp/nasm.o` (reference)	(same)	wall (full bundle)	~13 s	—	—	koder-laptop	External reference assembler. kasm's target envelope (≤7× nasm) anchors to this number.
20260520	engines/lang/koda	(same)	koder stage1 bundle, 817 K lines (post-cascade build)	wall (full bundle)	47 s	≤ 90 s	✓	s.khost1.dev-linux-koda (Linux 6.12)	Post-cascade PASS: RFC-019 cascade (#770/#773/#774/#771) + label intern (#765). 90 s target met by 43 s margin. `#748` criterion 3 PASS. Cliff eliminated; rerun via VM per `policies/test-host-isolation.kmd`.
20260524	engines/lang/koda	`make build-safe` × 3 generations	full koder + smoke roundtrip	wall × 3 (gen-1/2/3)	11m51s / 12m16s / 11m51s	bootstrap stable	✓	s.khost1.dev-linux-koda	#772 validation gate PASS: gen-2 = gen-3 md5 `87d99225…` byte-identical (fixed point) and matches historical 2026-05-20 phase-4a canonical. RFC-019 cascade FULLY CLOSED (`#750/#770/#771/#772/#773/#774`).
20260524	engines/lang/koda	`make test` (full regression suite)	1747-line full output	wall + PASS/FAIL count	35m25s · 260 P / 50 F-files (127 F-lines)	suite green	✗ (baseline)	s.khost1.dev-linux-koda	Pre-Phase-1 baseline; predates `apt install python3` + chmod fixes. Captured pre-`#784`.
20260524	engines/lang/koda	`make test` (post Phase-1 env setup)	1808-line full output	wall + PASS/FAIL count	36m50s · 277 P / 33 F-files (108 F-lines)	suite green	✗ (Phase 1 done)	s.khost1.dev-linux-koda	`#784` Phase 1 PASS: 17 FAIL files cleared by python3 install + chmod. Remaining 33 FAILs scheduled for Phase 2 (Cat. B binaries) + Phase 3 (Cat. C substantive triage).
20260524	engines/lang/koda	`xvfb-run -a make test` (post Wave 2a+2b)	823-line full output	wall + PASS/FAIL count	~58min · 278 P / 33 F-files (107 F-lines)	suite green	✗ (Phase 2 partial)	s.khost1.dev-linux-koda	`#784` Wave 2a+2b SHIPPED: bundling shim in `scripts/test-native.sh` (resolves `require` for native tests) + `koder_ui_server` PATH symlink + apt install xvfb + libgtk-4-1. Native `test_ui_hello` now PASS. NUIX11 tests start under xvfb but don't emit PASS (interactive GUI by design; Cat. C redesign work).
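The bootstrap "fixed point" check recorded above (gen-2 and gen-3 hashing to the same md5) can be reproduced with a short script. This is a sketch under assumptions: the helper names are hypothetical, and the caller supplies the paths to the generation binaries, since the registry does not record where `make build-safe` writes them.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """Stream a file through md5 so large binaries are not loaded at once."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_fixed_point(gen2: Path, gen3: Path) -> bool:
    """True when two consecutive compiler generations are byte-identical by md5."""
    return md5_of(gen2) == md5_of(gen3)
```

Comparing the resulting digest against a previously recorded canonical value is then a plain string comparison.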

engines/compress/kzip: pluggable compressor backends at 1 GiB scale

Production-readiness milestone (v0.2.0, Step 4 of the roadmap). Validates whether the LZMA2 / Brotli backends scale predictably past the 100 MiB workloads captured at engines/compress/kzip/bench/baseline.md. Full per-tool numbers (incl. FOSS reference comparators at 100 MiB) live in the module's bench file; this section anchors only the kzip-specific 1 GiB rows.

date	component	bench	dataset	metric	value	target	pass	hardware	notes
20260519	engines/compress/kzip	`bash bench/bench.sh --workload random-1g --tools kzip-baseline,kzip-lzma2,kzip-brotli`	1024 MiB `/dev/urandom`	compress (s, zstd:3)	8.548	—	—	koder-laptop (Linux 6.12)	kzip-baseline. Linear scale-up from 100 MiB (1.88s → 8.55s ≈ 4.5×). RSS 552 MiB.
20260519	engines/compress/kzip	(same)	(same)	compress (s, lzma2:6)	39.650	—	—	koder-laptop	~4.6× slower than zstd, RSS 2.2 GiB. Predictable LZMA cost; no ratio gain on `/dev/urandom`.
20260519	engines/compress/kzip	(same)	(same)	compress (s, brotli:6)	9.983	—	—	koder-laptop	~1.2× zstd time, RSS 733 MiB. No ratio gain on incompressible data; speed/RAM matches zstd.
20260519	engines/compress/kzip	`bash bench/bench.sh --workload text-1g --tools kzip-baseline,kzip-lzma2,kzip-brotli`	1024 MiB repeating text	output size (MiB)	0.05 (zstd)	—	—	koder-laptop	kzip-baseline output 0.05 MiB on uniform repeating-seed text. zstd wins at this scale on synthetic-uniform data.
20260519	engines/compress/kzip	(same)	(same)	output size (MiB)	0.07 (lzma2)	—	✗	koder-laptop	Counter-RFC-001: lzma2 produces MORE bytes than zstd on this corpus. RFC's "20-30% better than zstd" claim NOT validated.
20260519	engines/compress/kzip	(same)	(same)	output size (MiB)	0.01 (brotli)	—	—	koder-laptop	Brotli wins at 1 GiB on repeating text: 5× smaller than zstd, 7× smaller than lzma2. Tail-serve advantage compounds.

Open follow-up (kzip): rerun on a real-world mixed-vocabulary corpus (wikitext, linux-source dump) before claiming LZMA2 ratio in operator-facing docs. The synthetic `text-1g` workload is too uniform: every backend converges to >99.99% reduction, and the differences are noise-level (~0.06 MiB on a 1 GiB input).

Resolved follow-up (2026-05-19): bench rerun on the kzip source tree (28 MiB real-world mixed Go+docs+tests corpus). Result: lzma2:6 = 23.29 MiB out (0.7% smaller than zstd:3 = 23.46 MiB) at 2.4× compress time + 3.8× RSS. Brotli:6 matches lzma2's ratio at 1.08× zstd's time. RFC-001 "20-30% better" LZMA2 claim falsified on real corpora at this scale. Default zstd dominates the speed/ratio Pareto for typical kzip workloads (<1 GiB mixed corpora). Operator guidance update: reserve --compressor=lzma2 for multi-GB cold-archival workloads with workload-specific A/B test confirmation; docs/usage/compression.kmd to be updated accordingly (sub-follow-up).

date	component	bench	dataset	metric	value	target	pass	hardware	notes
20260519	engines/compress/kzip	`bash bench/bench.sh --workload source --tools kzip-baseline,kzip-lzma2,kzip-brotli`	kzip source tree (28.08 MiB mixed)	ratio (lzma2 vs zstd)	−0.7%	−20 to −30% per RFC-001	✗	koder-laptop (Linux 6.12)	LZMA2 falsified on real corpus. Output 23.29 vs 23.46 MiB. Compress time 2.4× zstd, RSS 3.8× zstd.
20260519	engines/compress/kzip	(same)	(same)	ratio (brotli vs zstd)	−0.9%	—	—	koder-laptop	Brotli matches lzma2's ratio at 1.08× zstd's time. Better Pareto point than lzma2 on this corpus.
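The `ratio (lzma2 vs zstd)` figure follows directly from the two output sizes quoted in the notes. The sketch below recomputes it and checks it against the RFC-001 band; the function names are hypothetical, and the band check treats "20-30% better" as "at least 20% smaller output".

```python
def size_delta_pct(candidate_mib: float, baseline_mib: float) -> float:
    """Relative output-size change versus the baseline; negative means smaller."""
    return (candidate_mib - baseline_mib) / baseline_mib * 100.0


def meets_rfc_claim(delta_pct: float, required_pct: float = -20.0) -> bool:
    """True when the candidate is at least `required_pct` smaller than baseline."""
    return delta_pct <= required_pct


lzma2_vs_zstd = size_delta_pct(23.29, 23.46)  # roughly -0.7 %
claim_holds = meets_rfc_claim(lzma2_vs_zstd)  # False on this corpus
```

A delta of about −0.7% is far from the −20% floor, which is why the row is marked `✗`.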

Pending baselines

services/ai/memory: BGE-multilingual + kdb-vector @ 100 K (RFC target p50 < 100 ms, p99 < 500 ms). Blocked on services/ai/memory#007 (kdb vector store swap) and #008 (BGE real embedder).
services/ai/gateway: round-trip overhead for cached vs uncached proxy (RFC-001 follow-up; bench harness not yet written).
services/ai/cache: exact-match lookup p99 vs raw upstream (blocked on #002 skeleton).
engines/lang/koda: kasm residual mid-range timing variance (240K-280K range). After the full cascade closed (#764, #748 criterion 3 PASS at 47s post-#765), a P2 polish ticket #776 tracks the non-deterministic spike at 280K (81 s in one run vs 16.5 s at 300K in the same run; GC-pressure noise, not structural). Bench evidence table in the ticket; rerun in VM when full stage1 needs to drop below 30 s for CI/CD self-host loops.

infra/net/box: kbox container runtime cold-start

Tracked per meta/docs/stack/registries/self-hosted-pairs.md row 13: performance:cold_start_benchmark is the open gate before kbox flips from stable → official versus Docker/Podman. The bench below is a placeholder row; the actual run + numbers land alongside the cold-start benchmark harness (kbox `tests/perf/cold-start`), opened as a follow-up ticket when an isolated benchmark VM (s.khost1) is ready per `policies/test-host-isolation.kmd`.

date	component	bench	dataset	metric	value	target	pass	hardware	notes
20260520	infra/net/box	`kbox run --rm alpine:3.18 /bin/true` × N (cold rootfs each)	alpine:3.18, 1 layer, ~3.4 MiB	p50 wall	—	≤ `podman run` p50 × 1.1	—	s.khost1.dev-linux-koda	Placeholder. Bench harness opening as follow-up. Cliff: until this row populates, `cold_start_benchmark` gate of pair #13 stays open.
20260520	infra/net/box	(same)	(same)	p99 wall	—	≤ `podman run` p99 × 1.2	—	(same)	Same row as above; populated together.
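Once samples exist, the two targets in this table (p50 within 1.1× and p99 within 1.2× of `podman run`) can be evaluated together. The sketch below is illustrative only: the harness does not exist yet, the function names are hypothetical, and the nearest-rank percentile is an assumed definition since the registry does not say which one the harness will use.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample covering pct% of the data."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def cold_start_gate(kbox_ms: list[float], podman_ms: list[float]) -> bool:
    """True when kbox meets both the p50 (x1.1) and p99 (x1.2) targets."""
    p50_ok = percentile(kbox_ms, 50) <= percentile(podman_ms, 50) * 1.1
    p99_ok = percentile(kbox_ms, 99) <= percentile(podman_ms, 99) * 1.2
    return p50_ok and p99_ok
```

Both lists should come from the same machine and the same cold-rootfs procedure, otherwise the comparison inherits the cross-hardware caveat stated at the top of this registry.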

products/dev/flow: public API latency (single-client home LAN baseline)

First Flow baseline. Captures TTFB + total-response timings against flow.koder.dev from the developer laptop. Numbers anchor public client-perceived latency for the Forgejo-fork (HTTP handlers + PAT auth + repo index). Server CPU time is not isolated here; the sample includes ISP RTT + TLS + EVEO edge. Re-run via products/dev/flow/engine/perf-baseline/bench.sh; full snapshot at products/dev/flow/engine/perf-baseline/2026-05-24.md.

date	component	bench	dataset	metric	value	target	pass	hardware	notes
20260524	products/dev/flow	`engine/perf-baseline/bench.sh` (ITER20, WARMUP3)	5 scenarios × 20 iter	home p50 TTFB	361.856 ms	—	—	koder-laptop (Linux 6.12 + home Wi-Fi + BR ISP → flow.koder.dev)	FLOW-116 first baseline; `LANDING_PAGE = login` 303 redirect. Includes ISP RTT + TLS.
20260524	products/dev/flow	(same)	(same)	user_login p50 TTFB	382.839 ms	—	—	(same)	Custom signin.tmpl (RFC-006 §1; FLOW-115 locks). Auto-redirect-to-Koder-ID hint paints inside p50.
20260524	products/dev/flow	(same)	(same)	api_version p50 TTFB	1613.977 ms	—	—	(same)	Surprisingly slow for a fixed JSON literal. PAT validation + TLS dominate; warrants server-side OTel breakdown (FLOW-103/106).
20260524	products/dev/flow	(same)	(same)	api_repos_search p50 total	1743.073 ms	—	—	(same)	`/api/v1/repos/search?limit=5` against the current corpus. p95 → p99 widens (2755 ms p99): index cache misses + auth round-trip.
20260524	products/dev/flow	(same)	(same)	api_repo_koder p50 total	248.528 ms	—	—	(same)	`/api/v1/repos/Koder/koder` hot-cache hit. p99 1011 ms reflects occasional misses. Fastest p50 in the suite.
20260524	tools/design-gen	`og.BuildPerPage` (pre-#118: exec.Command per page)	1755 pages × NumCPU=16	per-page OG wall	~55 s	—	—	s.khost1 fonts-smoke Incus runner	Cold-start dominated. Each page spawned a Chrome process (~500ms init). Superseded by chromedp rewrite below.
20260524	tools/design-gen	`og.BuildPerPage` (post-#118: chromedp allocator + tabs)	1755 pages × NumCPU=16	per-page OG wall	~6 s	< 15 s	—	s.khost1 fonts-smoke Incus runner	One long-lived Chrome; one tab per worker; warm-tab path ~50 ms/page. Gate enforced by `tests/regression/og_perf_test.go` (opt-in via `KDS_OG_PERF_TEST=1`).

No regression targets defined yet; promote to gates after the OTel pipeline (FLOW-103/106) can publish server-side timings split from transport, and kbox-equivalent staging exists for an apples-to-apples client.

How to add an entry

Run the bench you intend to record. Capture the full command line.
Append rows to the appropriate sub-section above. Keep one entry per metric so trends per metric are easy to grep.
If the entry replaces an older row (same component + same bench + same hardware), leave the old row in place and add the new row beneath it so historical drift is visible. Optionally annotate the older row's notes column with a → superseded YYYY-MM-DD pointer.
Reference the entry from the bench's own README.md so future readers land here automatically.
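The append and supersede steps above can be sketched as two small helpers. Everything here is an assumption for illustration: the function names are hypothetical, rows are assumed to be stored as one tab-separated line in schema column order, and the date `2026-06-01` in the usage lines is invented.

```python
COLUMNS = ("date", "component", "bench", "dataset", "metric",
           "value", "target", "pass", "hardware", "notes")


def format_row(entry: dict) -> str:
    """Serialize one entry in schema order, refusing entries with missing columns."""
    missing = [name for name in COLUMNS if name not in entry]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return "\t".join(str(entry[name]) for name in COLUMNS)


def mark_superseded(row: str, new_date: str) -> str:
    """Append a '→ superseded <date>' pointer to the notes cell of an older row."""
    cells = row.split("\t")
    cells += [""] * (len(COLUMNS) - len(cells))  # pad rows with an empty notes cell
    cells[-1] = f"{cells[-1]} → superseded {new_date}".strip()
    return "\t".join(cells)


old_row = format_row({
    "date": "20260510", "component": "services/ai/memory", "bench": "(same)",
    "dataset": "(same)", "metric": "p50", "value": "5.28 ms", "target": "< 50 ms",
    "pass": "✓", "hardware": "koder-laptop", "notes": "~10× headroom at 10 K.",
})
annotated = mark_superseded(old_row, "2026-06-01")  # hypothetical rerun date
```

The older row stays in place after annotation; the new measurement is written as a separate row beneath it.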