design-RFC-009 — Incremental build for design-gen
Today `make gen` in `tools/design-gen` always full-renders the entire KDS site — ~480 pages × ~3 locales + tokens + OG images + JSON APIs + sitemap + llms.txt. Observed wall-clock: ~10m27s for 226 specs (2026-05-24 timing). Dev iteration cost grows linearly with the corpus. This RFC ratifies the **content-hash + manifest** incremental-build approach (Option A) before implementation begins, with the file-watcher dev mode (Option B) layered on top, and the Bazel-style dependency graph (Option C) deferred until KDS exceeds 1000 pages.
**Status:** v0.1.0 **Draft**, 2026-05-24. Ratification expected once at least one option has a spike branch demonstrating the manifest + dirty-detection mechanics on the live spec corpus.
R1 — Problem
`tools/design-gen` is the static-site generator behind kds.koder.dev. Today's pipeline (`make gen` → `./design-gen`):
| Step | Cost (2026-05-24) |
|---|---|
| templ codegen | ~1s |
| Spec walk + parse | ~2s |
| Render dispatch (~480 pages × 3 locales) | ~30s |
| Search index | ~1s |
| apiv1/ payloads | ~3s |
| llms.txt + fingerprint + MCP descriptor | ~1s |
| Token exports (8 langs) | <1s |
| Site-wide OG image | ~2s |
| Per-page OG (chromedp, 1755 pages × NumCPU pool) | **~50s after #106** (was ~190s pre-pool) |
| Sitemap / robots / RSS / root index | ~1s |
| **Total** | **~10m27s** observed on the May 24 full run |
The wall-clock dominates the dev feedback loop. Editing one spec or one i18n string still triggers a full re-render of every locale × every page — none of the cost is content-sensitive today.
Observed pain points triggering this RFC:
- **2026-05-21 → 24 stale-deploy incident:** `670d6545c9` (theme-toggle duplicate removed) lived in master for 3 days without a deploy because the only path to publish was the full 10-minute build cycle, which nobody triggered between commits.
- **Webhook auto-rebuild (#116)** now triggers on every push to specs/policies/rfcs, so the runtime budget became a real constraint (10min × N commits/day = real CI minutes).
- **Build observability (#111)** shipped 2026-05-24. Its per-phase timings expose the leverage points but don't change the cost.
R2 — Goals
- Edits to **one spec** rebuild **only that spec's outputs** (page + per-locale variants + per-page OG + JSON payload + search index entry) + any cross-cuts that depend on it (sitemap, llms.txt, `apiv1/specs.json` catalog row).
- Edits to **one i18n string** rebuild **only pages referencing that key**.
- Edits to **a template** (`.templ`) rebuild **every page using that template**: coarse but predictable.
- **Zero-cost cache invalidation:** changes to the generator itself (Go source, templ codegen output, asset CSS/JS) force a full rebuild because the cache key includes the generator's own digest.
- **Bit-for-bit reproducibility:** `make gen-incremental` and `make gen` (full rebuild) MUST produce byte-identical `dist/` (modulo non-deterministic timestamps, which we normalize). A CI parity check enforces this.
- **No regression in deploy correctness:** the freshness fingerprint (#112) embeds the commit SHA, which propagates regardless of whether the page was rebuilt or cache-hit.
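
One way to obtain the generator's own digest for the cache key is to hash the running binary. The following is a minimal Go sketch under assumptions: the helper names `fileDigest` and `generatorDigest` are hypothetical, and hashing the executable found via `os.Executable` is only one possible source for the digest, not necessarily what design-gen will do.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// fileDigest returns the hex-encoded SHA-256 of the file at path.
func fileDigest(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// generatorDigest hashes the currently running binary (hypothetical helper).
// Any rebuild that changes the binary changes this digest, and therefore
// every cache key that includes it.
func generatorDigest() (string, error) {
	exe, err := os.Executable()
	if err != nil {
		return "", err
	}
	return fileDigest(exe)
}

func main() {
	d, err := generatorDigest()
	if err != nil {
		fmt.Fprintln(os.Stderr, "digest failed:", err)
		os.Exit(1)
	}
	fmt.Println("generator digest:", d)
}
```

Because the digest is an input to every key, no separate "invalidate all" step is needed: a changed binary simply turns every manifest lookup into a miss.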
R3 — Non-goals
- Distributed build cache (Bazel remote cache, CI sccache). Single-machine incremental is enough until KDS exceeds 1000 pages.
- Cross-build deduplication of OG image generation across versioned snapshots (`make version-gen`). Each version is a separate `dist/<version>/` tree.
- Hot module reload / live preview in browser without page reload. `make serve` already provides full-page refresh; a watcher mode is a Wave-B follow-up of this RFC (Option B below) and explicitly scoped to dev only.
R4 — Options considered
Option A — Content-hash manifest with per-spec dirty detection
**Mechanism.** Each renderable unit (page, JSON payload, OG image, token bundle entry) gets a deterministic cache key:

    key(spec, locale, kind) = sha256(
        spec.path                          // identity
      + spec.frontmatter_canonical_json    // metadata
      + spec.body_bytes                    // content
      + i18n.locale_bundle_hash            // strings
      + template.digest                    // kind-specific templ output
      + assets.css_digest                  // base.css hash
      + generator.go_digest                // tools/design-gen binary
      + render.version                     // bump on contract change
    )

The build emits `dist/.build-manifest.json` with `{output_path: {key, mtime}}`. On the next `make gen-incremental`:
- Walk the spec corpus
- For each (spec, locale, kind) tuple, compute the key
- Compare against the manifest:
  - **Hit** → assert the output file exists; skip rendering
  - **Miss** → render the unit; update manifest entry
- Run cross-cut steps (sitemap, llms.txt, `apiv1/specs.json`) only if ANY page changed (cheap aggregation: hash of all keys)
- Write the updated manifest atomically
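
The key computation and the hit/miss comparison above can be sketched in Go roughly as follows. This is an illustration under assumptions, not the planned implementation: the type and function names (`KeyInputs`, `Entry`, `computeKey`, `needsRender`) and the sample paths are hypothetical, the length prefix on each input is an added safeguard that the key formula does not specify, and a hit whose output file has vanished is treated here as a miss rather than as a failed assertion.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// KeyInputs mirrors the terms of the key formula (field names are illustrative).
type KeyInputs struct {
	SpecPath         string
	Frontmatter      string // canonical JSON
	Body             string
	LocaleBundleHash string
	TemplateDigest   string
	CSSDigest        string
	GeneratorDigest  string
	RenderVersion    string
}

// Entry is one manifest record, keyed by output path.
type Entry struct {
	Key   string `json:"key"`
	MTime int64  `json:"mtime"`
}

// computeKey hashes every input with a length prefix, so that the input
// pairs ("ab","c") and ("a","bc") cannot produce the same byte stream.
func computeKey(in KeyInputs) string {
	h := sha256.New()
	for _, part := range []string{
		in.SpecPath, in.Frontmatter, in.Body, in.LocaleBundleHash,
		in.TemplateDigest, in.CSSDigest, in.GeneratorDigest, in.RenderVersion,
	} {
		fmt.Fprintf(h, "%d:%s", len(part), part)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// needsRender reports whether a unit must be rendered: the manifest has no
// entry, the key changed, or the output file is gone. exists is injected so
// the check can be exercised without touching the filesystem.
func needsRender(manifest map[string]Entry, outputPath, key string, exists func(string) bool) bool {
	e, ok := manifest[outputPath]
	if !ok || e.Key != key {
		return true // miss
	}
	return !exists(outputPath) // hit only if the file is still on disk
}

func main() {
	in := KeyInputs{SpecPath: "specs/button.kmd", Body: "v1", RenderVersion: "1"}
	out := "dist/en/button/index.html"
	manifest := map[string]Entry{out: {Key: computeKey(in)}}
	onDisk := func(string) bool { return true }

	fmt.Println("unchanged spec dirty:", needsRender(manifest, out, computeKey(in), onDisk))
	in.Body = "v2"
	fmt.Println("edited spec dirty:   ", needsRender(manifest, out, computeKey(in), onDisk))
}
```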
**Properties.**
- Zero false positives (deterministic keys; same inputs → same key)
- Conservative false negatives (template digest change → all pages using that template re-render; acceptable)
- Bit-for-bit reproducibility easy to enforce (incremental = subset of full)
- Manifest is plain JSON, debuggable via `jq` + `git diff`
- Storage: ~2 KB per page × 1755 → ~3.5 MB. Trivial.
- Memory: O(N) page entries during build; same order as today.
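
A plain-JSON manifest that diffs cleanly and is written atomically might look like the Go sketch below. It assumes the `{key, mtime}` entry shape described above; the function names `encodeManifest` and `writeManifestAtomic` and the temp-file pattern are hypothetical. The write-then-rename approach relies on `os.Rename` replacing the target in one step, which generally holds when the temp file and the target sit on the same filesystem.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// Entry is one manifest record, keyed by output path.
type Entry struct {
	Key   string `json:"key"`
	MTime int64  `json:"mtime"`
}

// encodeManifest serializes the manifest. encoding/json emits map keys in
// sorted order, so equal manifests produce equal bytes and small git diffs.
func encodeManifest(m map[string]Entry) ([]byte, error) {
	return json.MarshalIndent(m, "", "  ")
}

// writeManifestAtomic writes to a temp file next to the target and renames
// it into place, so a crashed build never leaves a half-written manifest.
func writeManifestAtomic(path string, m map[string]Entry) error {
	data, err := encodeManifest(m)
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), ".build-manifest-*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // cleans up on failure; harmless after rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

func main() {
	dir, err := os.MkdirTemp("", "manifest-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, ".build-manifest.json")
	m := map[string]Entry{"en/button/index.html": {Key: "abc123", MTime: 1}}
	if err := writeManifestAtomic(path, m); err != nil {
		panic(err)
	}
	data, err := os.ReadFile(path)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```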
**Effort.** ~1-2 weeks impl + a week of tuning. Spike branch plausible in a single afternoon.
**Risks.**
- OG image cache already exists at `internal/og/cache.go` (#106 cache layer). Need to deduplicate concepts so we don't store the same digest in two places: this RFC's manifest **absorbs** the OG cache (single source of truth).
- chromedp screenshot output is pixel-stable only when the Chrome version is pinned. Including the Chrome version in the key handles this: bump Chrome → full OG re-render. Already handled implicitly by the OG cache's content-hash approach.
Option B — File-watcher dev mode
**Mechanism.** `make serve` already runs `./design-gen` once then serves `dist/`. Wrap the generator in a watch loop: subscribe to fsnotify events on `meta/docs/stack/specs/`, `policies/`, `rfcs/`, `tools/design-gen/i18n/`, `tools/design-gen/assets/`. On change:
- Reload the changed source
- Re-key affected units (subset of Option A's manifest)
- Re-render only those units
- Trigger browser refresh via the existing `make serve` JS hook
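
The "re-key affected units" step needs a mapping from a changed path to the amount of work it invalidates. The Go sketch below shows one possible shape for that mapping using only the standard library; the watcher itself is left out. Everything named here is hypothetical (`Scope`, `classifyChange`, `widest`, the sample file names), placing `policies/` and `rfcs/` under `meta/docs/stack/` is an assumption, and treating an i18n edit as a whole-locale invalidation is a coarse simplification.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Scope says how much must be re-keyed after a file change.
type Scope int

const (
	ScopeNone      Scope = iota // path is outside the watched roots
	ScopeOneSource              // one spec, policy, or RFC changed
	ScopeLocale                 // an i18n bundle changed
	ScopeAll                    // a shared asset changed: every page is affected
)

// sourceRoots lists per-document roots. Nesting policies/ and rfcs/ under
// meta/docs/stack/ is an assumption of this sketch.
var sourceRoots = []string{
	"meta/docs/stack/specs/",
	"meta/docs/stack/policies/",
	"meta/docs/stack/rfcs/",
}

// classifyChange maps a repo-relative path to the scope it invalidates.
func classifyChange(path string) Scope {
	p := filepath.ToSlash(filepath.Clean(path))
	for _, root := range sourceRoots {
		if strings.HasPrefix(p, root) {
			return ScopeOneSource
		}
	}
	switch {
	case strings.HasPrefix(p, "tools/design-gen/i18n/"):
		return ScopeLocale
	case strings.HasPrefix(p, "tools/design-gen/assets/"):
		return ScopeAll
	}
	return ScopeNone
}

// widest returns the broadest scope in a batch of changed paths, so a burst
// of save events can be handled in a single re-render pass.
func widest(paths []string) Scope {
	out := ScopeNone
	for _, p := range paths {
		if s := classifyChange(p); s > out {
			out = s
		}
	}
	return out
}

func main() {
	batch := []string{
		"meta/docs/stack/specs/button.kmd",
		"tools/design-gen/i18n/pt-BR.json",
	}
	fmt.Println("batch needs locale-wide re-key:", widest(batch) == ScopeLocale)
}
```

In a real watch loop the batch would come from debounced fsnotify events, and the scope would select which manifest keys to recompute.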
**Properties.**
- Dev-only: production builds use Option A's full one-shot path.
- Sub-second turnaround for typical edits (1 spec / 1 i18n key / 1 template).
- Layers on top of Option A; without A, every save would re-render everything.
**Effort.** ~3-5 days once Option A is in place.
Option C — Bazel-style dependency graph
**Mechanism.** Declare the generator as a build graph: each spec, each i18n entry, each template is a node; render units are leaves. A custom DAG walker (or actual Bazel) computes the minimal closure of dirty nodes.
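
To make "minimal closure of dirty nodes" concrete, here is a small Go sketch of a breadth-first walk over reverse dependency edges. It only illustrates the idea and is not a proposed design; the `Graph` type, the `dirtyClosure` function, and the node names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Graph maps a node to the nodes that depend on it (reverse edges).
type Graph map[string][]string

// dirtyClosure returns the changed nodes plus everything reachable from
// them through dependent edges, sorted for stable output.
func dirtyClosure(g Graph, changed []string) []string {
	seen := map[string]bool{}
	queue := append([]string(nil), changed...)
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		if seen[n] {
			continue
		}
		seen[n] = true
		queue = append(queue, g[n]...)
	}
	out := make([]string, 0, len(seen))
	for n := range seen {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

func main() {
	g := Graph{
		"spec:button":    {"page:en/button", "page:pt/button"},
		"i18n:pt":        {"page:pt/button", "page:pt/card"},
		"page:en/button": {"sitemap"},
		"page:pt/button": {"sitemap"},
		"page:pt/card":   {"sitemap"},
	}
	// Editing one spec dirties its pages in every locale plus the sitemap.
	for _, n := range dirtyClosure(g, []string{"spec:button"}) {
		fmt.Println(n)
	}
}
```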
**Properties.**
- Most rigorous — surgical invalidation across complex cross-cuts.
- Maps naturally to remote cache (distributed builds).
**Cost.**
- 2-4 weeks to design + impl.
- Bazel itself is heavyweight (Java runtime, sandbox overhead) and conflicts with the "go install one binary" simplicity that design-gen targets.
- A custom DAG walker is a substantial maintenance burden: every new spec kind needs a node-type registration.
**Verdict.** Overkill until KDS exceeds 1000 pages or the design team genuinely needs remote builds. Documented here so future maintainers know it was considered.
R5 — Decision
**Proposed:** **Option A** (content-hash manifest) ships first, behind a `make gen-incremental` target. `make gen` (full rebuild) remains as the safety net + the canonical reproducibility check. **Option B** (file-watcher) layers on top of A in a follow-up ticket. **Option C** is documented as deferred.
Rationale per `policies/hyperscale-first.kmd`:
- Same engineering effort as the simplest alternative ("just hash inputs"), with material wins as the corpus grows.
- Stays within `go install one binary` ergonomics.
- Reproducibility check rules out the entire class of "incremental built something wrong" failure modes.
R6 — Acceptance for ratification
Ratification requires:
- A spike branch implementing Option A's manifest writer + reader for one kind (suggest: `component`, already covered by #060/#061 so test fixtures exist).
- Benchmark on a representative change:
  - 1 spec edit (expected: <5s wall clock vs ~10min full)
  - 1 i18n key edit (expected: <30s, partial page set)
  - 1 template edit (expected: ~30-60s, full kind invalidation)
  - cold cache (expected: ≤ current full rebuild)
- CI parity check confirming incremental output matches full output bit-for-bit.
- Build-stats (#111) summary shows per-kind skip counts so operators can see the cache hit rate.
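
The CI parity check could be as simple as hashing both output trees and comparing them file by file. The Go sketch below assumes two already-built directories passed on the command line. The names `hashTree` and `diffTrees` are hypothetical, and the sketch does not normalize timestamps or exclude the manifest itself, both of which a real check would likely need.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"sort"
)

// hashTree maps each file's slash-separated relative path to its SHA-256.
func hashTree(root string) (map[string]string, error) {
	out := map[string]string{}
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err // propagate walk errors; skip directories (err is nil)
		}
		data, err := os.ReadFile(p)
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(root, p)
		if err != nil {
			return err
		}
		sum := sha256.Sum256(data)
		out[filepath.ToSlash(rel)] = hex.EncodeToString(sum[:])
		return nil
	})
	return out, err
}

// diffTrees lists paths that are missing on one side or differ in content.
func diffTrees(full, incr map[string]string) []string {
	var diffs []string
	for p, h := range full {
		if other, ok := incr[p]; !ok {
			diffs = append(diffs, "missing in incremental: "+p)
		} else if other != h {
			diffs = append(diffs, "content differs: "+p)
		}
	}
	for p := range incr {
		if _, ok := full[p]; !ok {
			diffs = append(diffs, "extra in incremental: "+p)
		}
	}
	sort.Strings(diffs)
	return diffs
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: parity <full-dist> <incremental-dist>")
		os.Exit(2)
	}
	full, err := hashTree(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	incr, err := hashTree(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	diffs := diffTrees(full, incr)
	for _, d := range diffs {
		fmt.Println(d)
	}
	if len(diffs) > 0 {
		os.Exit(1) // non-zero exit fails the CI job
	}
	fmt.Println("parity OK:", len(full), "files")
}
```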
R7 — Migration
- Phase 1 (this RFC): ratify approach.
- Phase 2: Option A impl (#105). Both `make gen` and `make gen-incremental` available. Default for `make gen` stays full.
- Phase 3: once parity check is reliable for 2 weeks, flip `make gen` default to incremental and rename the full-rebuild target to `make gen-full`.
- Phase 4: Option B file-watcher (new ticket, opens post-Phase 2).
- Phase 5: Option C re-evaluation (new RFC, opens only if KDS > 1000 pages OR remote build need emerges).
R8 — Not-now / re-open triggers
- **Re-open Option C** when:
  - KDS spec corpus exceeds 1000 pages (today ~226 specs × 3 locales = ~680 pages), OR
  - Multiple operators / CI runners need to share a build cache, OR
  - The manifest's RAM footprint becomes a constraint (unlikely; JSON parsing scales fine to 10⁴ entries).
- **Re-open Option B's scope** when:
  - The team consistently runs `make serve` for >30min sessions and the current full-rebuild-on-save UX becomes the bottleneck.
R9 — Cross-references
- `tools/design-gen/backlog/pending/105-incremental-build.kmd`: the implementation ticket this RFC ratifies.
- `tools/design-gen/backlog/done/111-build-observability.kmd`: the per-phase timings that motivated this RFC.
- `tools/design-gen/backlog/done/106-chromedp-worker-pool.kmd`: per-page OG pool; this RFC's manifest absorbs the OG cache.
- `tools/design-gen/backlog/pending/118-og-chromedp-rewrite.kmd`: chromedp full rewrite (orthogonal; runs inside the cache-miss path either way).
- `meta/docs/stack/policies/hyperscale-first.kmd`: primary policy motivating the choice.
- `meta/docs/stack/modules/design-gen.md`: module deep-dive that describes the pipeline this RFC accelerates.