design-RFC-009 — Incremental build for design-gen

draft

Today `make gen` in `tools/design-gen` always full-renders the entire KDS site — ~480 pages × ~3 locales + tokens + OG images + JSON APIs + sitemap + llms.txt. Observed wall-clock: ~10m27s for 226 specs (2026-05-24 timing). Dev iteration cost grows linearly with the corpus. This RFC ratifies the **content-hash + manifest** incremental-build approach (Option A) before implementation begins, with the file-watcher dev mode (Option B) layered on top, and the Bazel-style dependency graph (Option C) deferred until KDS exceeds 1000 pages.

**Status:** v0.1.0 **Draft** 2026-05-24. Ratification is expected once at least one option has a spike branch demonstrating the manifest + dirty-detection mechanics on the live spec corpus.

R1 — Problem

`tools/design-gen` is the static-site generator behind kds.koder.dev. Today's pipeline (`make gen` / `./design-gen`):

| Step | Cost (2026-05-24 build) |
|---|---|
| templ codegen | ~1s |
| Spec walk + parse | ~2s |
| Render dispatch (~480 pages × 3 locales) | ~30s |
| Search index | ~1s |
| apiv1/ payloads | ~3s |
| llms.txt + fingerprint + MCP descriptor | ~1s |
| Token exports (8 langs) | <1s |
| Site-wide OG image | ~2s |
| Per-page OG (chromedp, 1755 pages × NumCPU pool) | **~50s after #106** (was ~190s pre-pool) |
| Sitemap / robots / RSS / root index | ~1s |
| **Total** | **~10m27s** observed on the May 24 full run |

The wall-clock dominates the dev feedback loop. Editing one spec or one i18n string still triggers a full re-render of every locale × every page — none of the cost is content-sensitive today.

Observed pain points triggering this RFC:

  1. **2026-05-21 → 24 stale-deploy incident:** 670d6545c9 (theme toggle duplicate removed) lived in master for 3 days without a deploy because the only path to publish was the full 10-minute build cycle, which nobody triggered between commits.
  2. **Webhook auto-rebuild (#116)** now triggers on every push to specs/policies/rfcs, so the runtime budget became a real constraint (10 min × N commits/day = real CI minutes).
  3. **Build observability (#111)** shipped 2026-05-24: per-phase timings expose the leverage points but don't change the cost.

R2 — Goals

  • Edits to **one spec** rebuild **only that spec's outputs** (page + per-locale variants + per-page OG + JSON payload + search index entry) plus any cross-cuts that depend on it (sitemap, llms.txt, apiv1/specs.json catalog row).
  • Edits to **one i18n string** rebuild **only pages referencing that key**.
  • Edits to **a template** (.templ) rebuild **every page using that template**: coarse but predictable.
  • **Zero-cost cache invalidation:** changes to the generator itself (Go source, templ codegen output, asset CSS/JS) force a full rebuild because the cache key includes the generator's own digest.
  • **Bit-for-bit reproducibility:** `make gen-incremental` and `make gen` (full rebuild) MUST produce byte-identical `dist/` (modulo non-deterministic timestamps, which we normalize). A CI parity check enforces this.
  • **No regression in deploy correctness:** the freshness fingerprint (#112) embeds the commit SHA, which propagates regardless of whether the page was rebuilt or cache-hit.

R3 — Non-goals

  • Distributed build cache (Bazel remote cache, CI sccache). Single-machine incremental is enough until KDS exceeds 1000 pages.
  • Cross-build deduplication of OG image generation across versioned snapshots (`make version-gen`). Each version is a separate `dist/<version>/` tree.
  • Hot module reload / live preview in browser without page reload. `make serve` already provides full-page refresh; a watcher mode is a Wave-B follow-up of this RFC (Option B below) and explicitly scoped to dev only.

R4 — Options considered

Option A — Content-hash manifest with per-spec dirty detection

**Mechanism.** Each renderable unit (page, JSON payload, OG image, token bundle entry) gets a deterministic cache key:

key(spec, locale, kind) = sha256(
    spec.path                                   // identity
    + spec.frontmatter_canonical_json           // metadata
    + spec.body_bytes                           // content
    + i18n.locale_bundle_hash                   // strings
    + template.digest                           // kind-specific templ output
    + assets.css_digest                         // base.css hash
    + generator.go_digest                       // tools/design-gen binary
    + render.version                            // bump on contract change
)
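
As a minimal sketch of how such a key could be computed in Go: the `KeyInputs` struct and `CacheKey` function below are hypothetical names, not existing design-gen code. The sketch also assumes each field is length-prefixed before hashing, which the key definition above does not specify. Without a separator, plain concatenation would let `"ab" + "c"` collide with `"a" + "bc"`.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// KeyInputs mirrors the fields of the key definition above.
// All names are illustrative, not the design-gen API.
type KeyInputs struct {
	SpecPath        string // identity
	FrontmatterJSON []byte // canonicalized frontmatter
	Body            []byte // raw spec body
	LocaleBundle    string // digest of the locale's i18n bundle
	TemplateDigest  string // digest of the kind-specific templ output
	CSSDigest       string // digest of base.css
	GeneratorDigest string // digest of the design-gen binary
	RenderVersion   string // bumped on contract change
}

// CacheKey hashes every field with an 8-byte length prefix (an assumption
// of this sketch) so that adjacent fields cannot bleed into each other.
func CacheKey(in KeyInputs) string {
	h := sha256.New()
	write := func(b []byte) {
		var n [8]byte
		binary.BigEndian.PutUint64(n[:], uint64(len(b)))
		h.Write(n[:])
		h.Write(b)
	}
	write([]byte(in.SpecPath))
	write(in.FrontmatterJSON)
	write(in.Body)
	write([]byte(in.LocaleBundle))
	write([]byte(in.TemplateDigest))
	write([]byte(in.CSSDigest))
	write([]byte(in.GeneratorDigest))
	write([]byte(in.RenderVersion))
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	base := KeyInputs{
		SpecPath:      "specs/button.kmd",
		Body:          []byte("# Button\n"),
		RenderVersion: "1",
	}
	edited := base
	edited.Body = []byte("# Button v2\n")

	fmt.Println(CacheKey(base) == CacheKey(base))   // true: same inputs, same key
	fmt.Println(CacheKey(base) == CacheKey(edited)) // false: body edit changes the key
}
```

Because `GeneratorDigest` feeds every key, a change to the generator would invalidate all units at once, which is the full-rebuild behavior the goals ask for.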

The build emits `dist/.build-manifest.json` with `{output_path: {key, mtime}}`. On the next `make gen-incremental`:

  1. Walk the spec corpus.
  2. For each (spec, locale, kind) tuple, compute the key.
  3. Compare against the manifest:
    • **Hit** → assert the output file exists; skip rendering
    • **Miss** → render the unit; update manifest entry
  4. Run cross-cut steps (sitemap, llms.txt, apiv1/specs.json) only if ANY page changed (cheap aggregation: hash of all keys).
  5. Write the updated manifest atomically.
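
The comparison in steps 3 and 4 can be sketched roughly as follows. `Entry`, `Manifest`, `Plan` and `Aggregate` are hypothetical names. Treating a hit whose output file is missing as a miss is an assumption of this sketch, since the steps above only say to assert that the file exists.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Entry is one manifest record: {key, mtime} per output path.
type Entry struct {
	Key   string `json:"key"`
	MTime int64  `json:"mtime"`
}

// Manifest maps output path -> entry.
type Manifest map[string]Entry

// Plan splits freshly computed keys into cache hits (skip) and misses
// (render). exists reports whether the output file is still on disk; a
// matching key with a missing file is demoted to a miss (sketch assumption).
func Plan(old Manifest, fresh map[string]string, exists func(string) bool) (hits, misses []string) {
	for path, key := range fresh {
		if e, ok := old[path]; ok && e.Key == key && exists(path) {
			hits = append(hits, path)
		} else {
			misses = append(misses, path)
		}
	}
	sort.Strings(hits)
	sort.Strings(misses)
	return hits, misses
}

// Aggregate hashes all (path, key) pairs in sorted path order, giving one
// digest that changes whenever any unit's key changes.
func Aggregate(keys map[string]string) string {
	paths := make([]string, 0, len(keys))
	for p := range keys {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	h := sha256.New()
	for _, p := range paths {
		fmt.Fprintf(h, "%s\x00%s\n", p, keys[p])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	old := Manifest{
		"en/button.html": {Key: "k1"},
		"pt/button.html": {Key: "k2"},
	}
	fresh := map[string]string{
		"en/button.html": "k1",
		"pt/button.html": "k2-edited",
	}
	onDisk := func(string) bool { return true } // stand-in for an os.Stat check

	hits, misses := Plan(old, fresh, onDisk)
	fmt.Println("skip:", hits)     // skip: [en/button.html]
	fmt.Println("render:", misses) // render: [pt/button.html]

	oldKeys := make(map[string]string, len(old))
	for p, e := range old {
		oldKeys[p] = e.Key
	}
	fmt.Println("run cross-cuts:", Aggregate(oldKeys) != Aggregate(fresh)) // run cross-cuts: true
}
```

The lengths of `hits` and `misses` per kind would also be a natural source for the skip counts a build-stats summary could report.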

**Properties.**

  • Zero false positives (deterministic keys; same inputs → same key)
  • Conservative false negatives (template digest change → all pages using that template re-render; acceptable)
  • Bit-for-bit reproducibility easy to enforce (incremental = subset of full)
  • Manifest is plain JSON, debuggable via jq + git diff
  • Storage: ~2 KB per page × 1755 → ~3.5 MB. Trivial.
  • Memory: O(N) page entries during build; same order as today.

**Effort.** ~1-2 weeks impl + a week of tuning. Spike branch plausible in a single afternoon.

**Risks.**

  • OG image cache already exists at internal/og/cache.go (#106 cache layer). Need to deduplicate concepts so we don't store the same digest in two places: this RFC's manifest **absorbs** the OG cache (single source of truth).
  • chromedp screenshot output is pixel-stable only when Chrome version is pinned. Including Chrome version in the key handles this: bump Chrome → full OG re-render. Already handled implicitly by the OG cache's content-hash approach.

Option B — File-watcher dev mode

**Mechanism.** `make serve` already runs `./design-gen` once then serves `dist/`. Wrap the generator in a watch loop: subscribe to fsnotify events on meta/docs/stack/specs/, policies/, rfcs/, tools/design-gen/i18n/, tools/design-gen/assets/. On change:

  1. Reload the changed source
  2. Re-key affected units (subset of Option A's manifest)
  3. Re-render only those units
  4. Trigger browser refresh via the existing `make serve` JS hook
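
Step 2 (re-keying only the affected units) could look roughly like the sketch below. It deliberately leaves out the fsnotify subscription and only shows the path-to-units mapping. `Unit` and `Affected` are hypothetical names. The one-file-per-locale layout under `tools/design-gen/i18n/` (for example `en.json`) is an assumption, and the i18n rule is coarse: it marks the whole locale dirty rather than only pages referencing the edited key.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Unit identifies one renderable output: a (spec, locale, kind) tuple.
type Unit struct {
	SpecPath string
	Locale   string
	Kind     string
}

// Affected returns the units that need re-keying after the file at
// changed was saved. Paths use forward slashes relative to the repo root.
func Affected(changed string, units []Unit) []Unit {
	var out []Unit
	for _, u := range units {
		switch {
		case strings.HasPrefix(changed, "tools/design-gen/assets/"):
			// Asset digests feed every key, so everything is dirty.
			out = append(out, u)
		case strings.HasPrefix(changed, "tools/design-gen/i18n/"):
			// Assumed layout: i18n/<locale>.<ext>.
			locale := strings.TrimSuffix(path.Base(changed), path.Ext(changed))
			if u.Locale == locale {
				out = append(out, u)
			}
		case changed == u.SpecPath:
			out = append(out, u)
		}
	}
	return out
}

func main() {
	units := []Unit{
		{"meta/docs/stack/specs/button.kmd", "en", "component"},
		{"meta/docs/stack/specs/button.kmd", "pt", "component"},
		{"meta/docs/stack/specs/card.kmd", "en", "component"},
	}
	fmt.Println(len(Affected("meta/docs/stack/specs/button.kmd", units))) // 2
	fmt.Println(len(Affected("tools/design-gen/i18n/en.json", units)))    // 2
	fmt.Println(len(Affected("tools/design-gen/assets/base.css", units))) // 3
}
```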

**Properties.**

  • Dev-only: production builds use Option A's full one-shot path.
  • Sub-second turnaround for typical edits (1 spec / 1 i18n key / 1 template).
  • Layers on top of Option A: without A, every save would re-render everything.

**Effort.** ~3-5 days once Option A is in place.

Option C — Bazel-style dependency graph

**Mechanism.** Declare the generator as a build graph: each spec, each i18n entry, each template is a node; render units are leaves. A custom DAG walker (or actual Bazel) computes the minimal closure of dirty nodes.

**Properties.**
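
For reference, the "minimal closure of dirty nodes" amounts to a reachability walk over reverse dependency edges. The sketch below uses a breadth-first traversal. The `Graph` type, the node naming scheme and `DirtyClosure` are all hypothetical, and nothing here reflects how Bazel itself models targets.

```go
package main

import (
	"fmt"
	"sort"
)

// Graph maps a node to the nodes that depend on it (reverse edges).
type Graph map[string][]string

// DirtyClosure returns every node reachable from the changed nodes,
// including the changed nodes themselves, in sorted order.
func DirtyClosure(g Graph, changed ...string) []string {
	seen := map[string]bool{}
	queue := append([]string(nil), changed...)
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		if seen[n] {
			continue // already visited; also guards against cycles
		}
		seen[n] = true
		queue = append(queue, g[n]...)
	}
	out := make([]string, 0, len(seen))
	for n := range seen {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

func main() {
	g := Graph{
		"spec:button":       {"page:en/button", "page:pt/button"},
		"i18n:en/nav.title": {"page:en/button", "page:en/card"},
		"page:en/button":    {"sitemap"},
		"page:pt/button":    {"sitemap"},
		"page:en/card":      {"sitemap"},
	}
	fmt.Println(DirtyClosure(g, "spec:button"))
	// [page:en/button page:pt/button sitemap spec:button]
}
```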

  • Most rigorous — surgical invalidation across complex cross-cuts.
  • Maps naturally to remote cache (distributed builds).

**Cost.**

  • 2-4 weeks to design + impl.
  • Bazel itself is heavyweight (Java runtime, sandbox overhead) and conflicts with the "go install one binary" simplicity that design-gen targets.
  • A custom DAG walker is a substantial maintenance burden: every new spec kind needs a node-type registration.

**Verdict.** Overkill until KDS exceeds 1000 pages or the design team genuinely needs remote builds. Documented here so future maintainers know it was considered.

R5 — Decision

**Proposed:** **Option A** (content-hash manifest) ships first, behind a `make gen-incremental` target. `make gen` (full rebuild) remains as the safety net + the canonical reproducibility check. **Option B** (file-watcher) layers on top of A in a follow-up ticket. **Option C** is documented as deferred.

Rationale per policies/hyperscale-first.kmd:

  • Same engineering effort as the simplest alternative ("just hash inputs"), with material wins as the corpus grows.
  • Stays within "go install one binary" ergonomics.
  • Reproducibility check rules out the entire class of "incremental built something wrong" failure modes.

R6 — Acceptance for ratification

Ratification requires:

  1. A spike branch implementing Option A's manifest writer + reader for one kind (suggest: component, already covered by #060/#061, so test fixtures exist).
  2. Benchmark on a representative change:
    • 1 spec edit (expected: <5s wall clock vs ~10min full)
    • 1 i18n key edit (expected: <30s, partial page set)
    • 1 template edit (expected: ~30-60s, full kind invalidation)
    • cold cache (expected: ≤ current full rebuild)
  3. CI parity check confirming incremental output matches full output bit-for-bit.
  4. Build-stats (#111) summary shows per-kind skip counts so operators can see the cache hit rate.

R7 — Migration

  • Phase 1 (this RFC): ratify approach.
  • Phase 2: Option A impl (#105). Both `make gen` and `make gen-incremental` available. Default for `make gen` stays full.
  • Phase 3: once parity check is reliable for 2 weeks, flip `make gen` default to incremental and rename the full-rebuild target to `make gen-full`.
  • Phase 4: Option B file-watcher (new ticket, opens post-Phase 2).
  • Phase 5: Option C re-evaluation (new RFC, opens only if KDS > 1000 pages OR remote build need emerges).

R8 — Not-now / re-open triggers

  • **Re-open Option C** when:
    • KDS spec corpus exceeds 1000 pages (today ~226 specs × 3 locales = ~680 pages), OR
    • Multiple operators / CI runners need to share a build cache, OR
    • The manifest's RAM footprint becomes a constraint (unlikely; JSON parsing scales fine to 10⁴ entries).
  • **Re-open Option B's scope** when:
    • The team consistently runs `make serve` for >30min sessions and the current full-rebuild-on-save UX becomes the bottleneck.

R9 — Cross-references

  • tools/design-gen/backlog/pending/105-incremental-build.kmd: the implementation ticket this RFC ratifies.
  • tools/design-gen/backlog/done/111-build-observability.kmd: the per-phase timings that motivated this RFC.
  • tools/design-gen/backlog/done/106-chromedp-worker-pool.kmd: per-page OG pool; this RFC's manifest absorbs the OG cache.
  • tools/design-gen/backlog/pending/118-og-chromedp-rewrite.kmd: chromedp full rewrite (orthogonal; runs inside the cache-miss path either way).
  • meta/docs/stack/policies/hyperscale-first.kmd: primary policy motivating the choice.
  • meta/docs/stack/modules/design-gen.md: module deep-dive that describes the pipeline this RFC accelerates.

Source: ../home/koder/dev/koder/meta/docs/stack/rfcs/design-RFC-009-incremental-build.kmd