RFC-001 — Delta Delivery

*Author:* Koder Engineering *Date:* 2026-04-15 *Status:* Draft *Module:* products/dev/hub


Table of Contents

  1. Summary
  2. Problem Statement
  3. Goals
  4. Non-Goals
  5. Background
  6. Design
  7. Bandwidth Savings Estimate
  8. Security Considerations
  9. Failure Modes and Mitigations
  10. Observability
  11. Roadmap
  12. Open Questions
  13. Alternatives Considered
  14. References

1. Summary

Koder Hub currently redistributes full .kpkg packages on every version bump, even when the diff between two consecutive versions is minimal. For a typical desktop application of 30 MB, a 1-line bug fix produces a 30 MB download. This RFC proposes a binary delta delivery mechanism based on *bsdiff*, reducing incremental update bandwidth by an estimated 80–95%.


2. Problem Statement

2.1 Current Behavior

When a publisher pushes myapp@1.2.1 after myapp@1.2.0:

  1. The full .kpkg (20–50 MB) is uploaded to the object store.
  2. Every client polling for updates downloads the full package regardless of how small the actual change is.
  3. At 10,000 active installations updating once per week, a 30 MB package consumes *300 GB/week* of egress per app.

2.2 Root Cause

The Store treats packages as opaque blobs. There is no mechanism to compute, store, or serve the difference between two versions of the same package.

2.3 Impact

  • *Publisher cost:* egress and storage grow linearly with each release regardless of diff size.
  • *End-user experience:* slow updates on metered or low-bandwidth connections.
  • *Federation cost (RFC-002):* mirrors pulling full packages amplify the bandwidth problem across the network.

2.4 Data Point

Empirical analysis of five internal Koder applications shows that consecutive minor/patch releases share 90–97% of bytes at the binary level, primarily due to statically linked runtimes (Flutter, Go) and large static assets (icons, fonts, locale files) that do not change between releases.


3. Goals

  • G1: Reduce download size for minor/patch updates by 80–95%.
  • G2: Guarantee byte-perfect integrity — a corrupt or partial delta must never result in a silently broken installation.
  • G3: Zero regression for clients that do not support delta (opt-in via HTTP header).
  • G4: Publisher workflow unchanged — publishers push full packages; delta computation is automatic and server-side.
  • G5: Compatible with the federation protocol (RFC-002) — federated mirrors can serve deltas they have cached.

4. Non-Goals

  • Delta computation on the client side (Phase 2 prerequisite only).
  • Streaming / chunked patch application (out of scope for Phase 1).
  • Delta between non-consecutive versions beyond N=3 depth (diminishing returns).
  • Source-level diffing (this is binary delta only; source diffs are handled by the version control layer).

5. Background

5.1 bsdiff

bsdiff (Colin Percival, 2003) is a binary diff algorithm optimized for executable code. It exploits the structure of compiled binaries by using suffix arrays to find matching byte sequences across versions and encoding the difference as a sequence of control tuples (x, y, z) where:

  • x bytes are read from the old file and added byte-wise to x bytes from the "diff" bytestream.
  • y bytes are copied verbatim from the "extra" bytestream.
  • The read position in the old file then seeks forward z bytes.

The resulting patch is highly compressible and typically 5–20× smaller than the raw binary delta between two versions of the same application.
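The control-tuple semantics above can be illustrated with a short sketch. It is a simplification that assumes the control, diff, and extra streams have already been decompressed and parsed; `apply_controls` and its argument layout are illustrative names, not the reference bspatch API or its on-disk patch format.

```python
from typing import Iterable, Tuple


def apply_controls(old: bytes,
                   controls: Iterable[Tuple[int, int, int]],
                   diff: bytes,
                   extra: bytes) -> bytes:
    """Rebuild the new file from the old file and three parsed patch streams."""
    new = bytearray()
    old_pos = diff_pos = extra_pos = 0
    for x, y, z in controls:
        # Step 1: add x diff bytes to x old bytes, modulo 256.
        # Old positions outside the file are treated as zero bytes.
        for i in range(x):
            idx = old_pos + i
            old_byte = old[idx] if 0 <= idx < len(old) else 0
            new.append((old_byte + diff[diff_pos + i]) & 0xFF)
        old_pos += x
        diff_pos += x
        # Step 2: copy y bytes verbatim from the extra stream.
        new += extra[extra_pos:extra_pos + y]
        extra_pos += y
        # Step 3: move the read position in the old file by z bytes.
        old_pos += z
    return bytes(new)


# "hello world" -> "hello brave world": unchanged regions have an all-zero diff.
patched = apply_controls(b"hello world", [(6, 6, 0), (5, 0, 0)], bytes(11), b"brave ")
```

Because unchanged regions produce long runs of zero bytes in the diff stream, the patch compresses well, which is where most of the size reduction comes from.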

5.2 .kpkg Format

A .kpkg file (see specs/kpkg/format.kmd) is a ZIP-based container. bsdiff operates on the raw bytes of the full .kpkg file rather than individual ZIP members, which:

  • Avoids needing a ZIP-aware differ.
  • Works correctly even when the ZIP central directory moves due to file additions/removals.
  • Is simpler to implement and audit.

6. Design

6.1 Algorithm — bsdiff

Delta generation uses the reference bsdiff implementation compiled as a Go CGO wrapper (or a pure-Go reimplementation such as gabstep/binarydist).

patch_bytes = bsdiff(old_kpkg_bytes, new_kpkg_bytes)
patch_compressed = gzip(patch_bytes, level=9)
artifact_name = "<sha256_new>.<sha256_old>.bsdiff.gz"

The patch is stored base64url-encoded only if transmitted in a JSON envelope; when stored as a binary object in the object store, raw bytes are used.

*Notation:*

  • SHA256_NEW — lowercase hex SHA-256 of the new (target) .kpkg.
  • SHA256_OLD — lowercase hex SHA-256 of the base (source) .kpkg.

6.2 Delta Artifact Naming

<SHA256_NEW>.<SHA256_OLD>.bsdiff.gz

Example:

a3f9e1...c2b4.8d01fa...7e93.bsdiff.gz

This content-addressed naming scheme:

  • Allows caching at any layer (CDN, mirror, local disk) by identity.
  • Avoids namespace collisions between apps or versions.
  • Enables mirrors (RFC-002) to cache and serve deltas without knowing app metadata.
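A minimal sketch of the naming rule, assuming both packages are available as in-memory bytes. The function name `delta_artifact_name` is hypothetical; only the `<SHA256_NEW>.<SHA256_OLD>.bsdiff.gz` layout comes from this RFC.

```python
import hashlib


def delta_artifact_name(new_kpkg: bytes, old_kpkg: bytes) -> str:
    """Return the content-addressed name of the delta from old_kpkg to new_kpkg."""
    sha256_new = hashlib.sha256(new_kpkg).hexdigest()  # hexdigest() is lowercase hex
    sha256_old = hashlib.sha256(old_kpkg).hexdigest()
    return f"{sha256_new}.{sha256_old}.bsdiff.gz"
```

Since the name depends only on the two package digests, any cache or mirror can derive it without knowing the app slug or version string.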

6.3 Publish Flow

When a publisher pushes a new version V_new of an application:

Publisher
  │
  ▼
POST /api/v1/apps/{slug}/publish
  │  body: multipart/form-data { kpkg: <binary>, manifest: <json> }
  │
  ▼
Store Backend
  ├── Validate manifest signature
  ├── Compute SHA256(V_new) → sha256_new
  ├── Store full package: packages/<sha256_new>.kpkg
  ├── Retrieve version history: [V_n-1, V_n-2, V_n-3]  (N=3)
  │
  └── For each base_version in history:
        sha256_base = base_version.sha256
        if delta_exists(sha256_new, sha256_base): skip
        patch = bsdiff(read(sha256_base + ".kpkg"), read(sha256_new + ".kpkg"))
        compressed = gzip(patch)
        store("deltas/<sha256_new>.<sha256_base>.bsdiff.gz", compressed)
        record DeltaEntry { sha256_new, sha256_base, size_bytes, created_at }

*N=3 rationale:* Clients update within 3 release cycles in 99%+ of cases (internal telemetry). Storing deltas beyond 3 versions provides diminishing returns while increasing storage cost linearly.

*Async computation:* Delta generation is performed in a background worker after the publish API returns 202 Accepted. The package is immediately available for full download; delta availability follows within seconds to minutes depending on package size.
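The worker loop in the publish flow could look roughly like the sketch below. It makes several assumptions: the object store is modeled as a plain dict, `history` lists base digests newest first, keys follow the storage layout in section 6.8, and `diff_fn` is a hypothetical stand-in for whichever bsdiff binding is eventually chosen.

```python
import gzip
import hashlib
from typing import Callable, Dict, List


def generate_deltas(store: Dict[str, bytes],
                    history: List[str],
                    new_kpkg: bytes,
                    diff_fn: Callable[[bytes, bytes], bytes],
                    depth: int = 3) -> List[str]:
    """Store the full package, then create missing deltas for up to `depth` bases."""
    sha256_new = hashlib.sha256(new_kpkg).hexdigest()
    store[f"packages/{sha256_new}.kpkg"] = new_kpkg
    created = []
    for sha256_base in history[:depth]:  # history is assumed newest first
        key = f"deltas/{sha256_new}.{sha256_base}.bsdiff.gz"
        if key in store:
            continue  # delta already exists, so the step is idempotent
        old_kpkg = store[f"packages/{sha256_base}.kpkg"]
        patch = diff_fn(old_kpkg, new_kpkg)
        store[key] = gzip.compress(patch, compresslevel=9)
        created.append(key)
    return created
```

Skipping existing keys means a retried or duplicated job does no extra work, which matters when the worker runs asynchronously after the publish call has returned.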

6.4 Download API

Existing endpoint (unchanged for full download):

GET /api/v1/apps/{slug}/download?version=X

New: delta-aware endpoint

GET /api/v1/apps/{slug}/download?version=X&base=SHA256_BASE

*Headers sent by client:*

Accept: application/x-bsdiff, application/zip;q=0.9
X-Koder-Store-Version: 1.2.0

*Server decision logic:*

if request has base= param AND Accept includes application/x-bsdiff:
    sha256_new = manifest[version=X].sha256
    sha256_base = base (from query param)
    delta_path = "deltas/<sha256_new>.<sha256_base>.bsdiff.gz"
    if delta_path exists in object store:
        return 200 {
            Content-Type: application/x-bsdiff
            Content-Encoding: gzip
            X-Koder-Delta-Base: <sha256_base>
            X-Koder-Delta-Target: <sha256_new>
            X-Koder-Delta-Target-Size: <uncompressed_kpkg_size>
            body: <delta bytes>
        }
    else:
        return full package (delta not yet computed or base too old)
else:
    return full package

*Response codes:*

| Code | Meaning |
|------|---------|
| 200 | Delta or full package returned |
| 404 | App or version not found |
| 409 | Delta computation in progress; retry after X seconds (Retry-After header) |
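The server decision logic can be sketched as a pure function. This is an illustration under stated assumptions: the object store is a dict keyed as in section 6.8, the helper names `accepts_bsdiff` and `choose_download` are hypothetical, Accept parsing ignores q-values, and the `X-Koder-Delta-Target-Size` header is omitted for brevity.

```python
from typing import Dict, Optional, Tuple

BSDIFF_TYPE = "application/x-bsdiff"


def accepts_bsdiff(accept_header: str) -> bool:
    """True if the Accept header lists the bsdiff media type (q-values ignored)."""
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type == BSDIFF_TYPE:
            return True
    return False


def choose_download(store: Dict[str, bytes],
                    sha256_new: str,
                    base: Optional[str],
                    accept_header: str) -> Tuple[str, Dict[str, str]]:
    """Return (object key, response headers) for a download request."""
    if base and accepts_bsdiff(accept_header):
        delta_key = f"deltas/{sha256_new}.{base}.bsdiff.gz"
        if delta_key in store:
            return delta_key, {
                "Content-Type": BSDIFF_TYPE,
                "Content-Encoding": "gzip",
                "X-Koder-Delta-Base": base,
                "X-Koder-Delta-Target": sha256_new,
            }
    # No base, no opt-in, or no delta available: serve the full package.
    return f"packages/{sha256_new}.kpkg", {"Content-Type": "application/zip"}
```

Both conditions (the `base` parameter and the Accept media type) must hold before a delta is even looked up, so a client that sends neither always takes the full-package path.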

6.5 Client Apply Flow

Client (Flutter / CLI)
  │
  ├── 1. Read installed package SHA256 → sha256_base
  ├── 2. Check for update: GET /apps/{slug}/manifest?version=latest
  │       → new version available, sha256_new, size_full
  │
  ├── 3. Request delta:
  │       GET /apps/{slug}/download?version=X&base=<sha256_base>
  │       Accept: application/x-bsdiff, ...
  │
  ├── 4a. If response Content-Type == application/x-bsdiff:
  │       decompress gzip → raw_patch_bytes
  │       target_bytes = bspatch(current_kpkg_bytes, raw_patch_bytes)
  │       actual_sha256 = sha256(target_bytes)
  │       if actual_sha256 != sha256_new: goto 4b (fallback)
  │       write target_bytes → disk
  │       install new package
  │
  └── 4b. Fallback (no delta, mismatch, or any error):
          GET /apps/{slug}/download?version=X  (full package)
          verify sha256
          install
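Steps 4a and 4b can be sketched as a single function. The names are hypothetical: `patch_fn` stands in for the bspatch binding and `fetch_full` for the full-package HTTP request; neither is specified by this RFC.

```python
import gzip
import hashlib
from typing import Callable, Optional


def apply_update(current_kpkg: bytes,
                 gz_patch: Optional[bytes],
                 sha256_new: str,
                 patch_fn: Callable[[bytes, bytes], bytes],
                 fetch_full: Callable[[], bytes]) -> bytes:
    """Return verified bytes of the new package, preferring the delta path."""
    if gz_patch is not None:
        try:
            raw_patch = gzip.decompress(gz_patch)
            target = patch_fn(current_kpkg, raw_patch)
            if hashlib.sha256(target).hexdigest() == sha256_new:
                return target
        except Exception:
            pass  # any delta-path error falls through to the full download
    full = fetch_full()
    if hashlib.sha256(full).hexdigest() != sha256_new:
        raise ValueError("full package failed SHA256 verification")
    return full
```

The digest check is the only gate to success on either path, so a corrupt patch, a wrong base, or a patching bug can at worst cost one extra full download.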

*Atomic write:* The patched bytes are written to a temporary file in the same directory, then renamed atomically (os.Rename) to avoid partial writes.
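The same write-then-rename pattern, sketched in Python for illustration (the RFC's own reference is Go's os.Rename). The function name `atomic_write` and the temporary-file prefix are assumptions.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write data to path so readers see either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".kpkg-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic replacement of the destination
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Keeping the temporary file next to the destination is what makes the rename atomic; a rename across filesystems would degrade to a copy.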

6.6 Integrity Verification

Integrity is enforced at three levels:

  1. *Manifest SHA256:* The app manifest (signed by the publisher) contains sha256 of the expected final .kpkg. The client always verifies the applied result matches this value.
  2. *Delta artifact hash:* The server records the SHA256 of each .bsdiff.gz file at generation time. The download endpoint includes X-Koder-Delta-Artifact-SHA256 header for in-transit verification.
  3. *Full-package fallback:* Any verification failure (corrupt download, bspatch error, or SHA256 mismatch) triggers an automatic fallback to full package download. The fallback is logged and reported in telemetry.

Verification failure → log event {
  app: slug,
  version_new: X,
  base_sha256: ...,
  failure_reason: "sha256_mismatch" | "bspatch_error" | "download_error",
  fallback: true
}
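One way to build that event on the client, sketched with a JSON encoding. The field names come from the event shape above; the JSON serialization and the function name `fallback_event` are assumptions, since the RFC does not fix a wire format for telemetry.

```python
import json

FAILURE_REASONS = {"sha256_mismatch", "bspatch_error", "download_error"}


def fallback_event(app: str, version_new: str, base_sha256: str,
                   failure_reason: str) -> str:
    """Serialize a verification-failure event, rejecting unknown reasons."""
    if failure_reason not in FAILURE_REASONS:
        raise ValueError(f"unknown failure_reason: {failure_reason!r}")
    event = {
        "app": app,
        "version_new": version_new,
        "base_sha256": base_sha256,
        "failure_reason": failure_reason,
        "fallback": True,
    }
    return json.dumps(event, sort_keys=True)
```

Restricting `failure_reason` to a closed set keeps the telemetry aggregable by reason.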

6.7 Backwards Compatibility

Delta delivery is *strictly opt-in* by the client:

  • Clients that do not send Accept: application/x-bsdiff always receive full packages.
  • Clients that do not send ?base=SHA256 always receive full packages.
  • Existing CLI (koder-hub install) and legacy Flutter clients are unaffected.

The server never sends a delta unless the client explicitly requests one. This ensures zero behavior change for existing clients during rollout.

6.8 Storage Layout

object-store/
├── packages/
│   └── <sha256>.kpkg            # full packages (existing)
└── deltas/
    └── <sha256_new>.<sha256_old>.bsdiff.gz

*Retention policy:*

  • Full packages: retain indefinitely (all versions must remain downloadable).
  • Delta artifacts: retain for as long as both the base and target versions are published. When a version is yanked, all delta artifacts referencing it as base or target are garbage-collected.
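Because the artifact name encodes both digests, the garbage-collection candidates for a yanked version can be found from object keys alone. The sketch below assumes keys of the form `deltas/<sha256_new>.<sha256_old>.bsdiff.gz` and collects a delta when the yanked digest appears on either side, following the rule that both versions must remain published; `deltas_to_collect` is a hypothetical name.

```python
from typing import Iterable, List


def deltas_to_collect(delta_keys: Iterable[str], yanked_sha256: str) -> List[str]:
    """Return delta keys whose base or target digest equals the yanked digest."""
    doomed = []
    for key in delta_keys:
        name = key.rsplit("/", 1)[-1]
        parts = name.split(".")
        if len(parts) != 4 or parts[2:] != ["bsdiff", "gz"]:
            continue  # not a delta artifact name, leave it alone
        sha256_new, sha256_old = parts[0], parts[1]
        if yanked_sha256 in (sha256_new, sha256_old):
            doomed.append(key)
    return doomed
```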

7. Bandwidth Savings Estimate

Based on internal analysis of 5 Koder applications:

| Scenario | Full Package | Delta Size | Saving |
|----------|--------------|------------|--------|
| Patch release (bug fix) | 35 MB | 0.5–2 MB | 94–99% |
| Minor release (new feature) | 35 MB | 2–7 MB | 80–94% |
| Major release (new runtime) | 35 MB | 15–25 MB | 29–57% |

*Fleet projection* (10,000 active installs, one update per week, avg 35 MB package):

| Update type | Without delta | With delta (est.) |
|-------------|---------------|-------------------|
| Patch | 350 GB/week | 5–20 GB/week |
| Minor | 350 GB/week | 20–70 GB/week |
| Major | 350 GB/week | 150–250 GB/week |

Delta delivery yields much smaller savings for major releases, where large binary sections are replaced (e.g., a runtime version bump), but these events are infrequent.
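The arithmetic behind these estimates is simple enough to check directly. The sketch below multiplies the per-install download size by the fleet size, assuming decimal units (1 GB = 1000 MB) and one update per install per week; the function names are illustrative.

```python
def weekly_egress_gb(installs: int, download_mb: float) -> float:
    """Weekly egress in GB when every install downloads download_mb once."""
    return installs * download_mb / 1000


def saving_percent(full_mb: float, delta_mb: float) -> float:
    """Percentage of bytes saved by serving the delta instead of the full package."""
    return 100 * (1 - delta_mb / full_mb)


full = weekly_egress_gb(10_000, 35)        # 350.0 GB/week without delta
patch_low = weekly_egress_gb(10_000, 0.5)  # 5.0 GB/week at the small end of a patch release
patch_high = weekly_egress_gb(10_000, 2)   # 20.0 GB/week at the large end
major_saving = saving_percent(35, 25)      # about 28.6%, the worst case for a major release
```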


8. Security Considerations

8.1 Malicious Delta

A delta artifact produced from a tampered base can produce an arbitrary target file. Mitigations:

  • Delta generation runs only on the Store backend, never on user-supplied input.
  • The client always verifies the final SHA256 against the publisher-signed manifest. A valid delta applied to the wrong base will almost certainly produce a SHA256 mismatch and trigger fallback.

8.2 Delta Amplification

A malicious publisher could upload packages designed to produce pathologically large delta artifacts (patch bombs). Mitigation:

  • Maximum delta artifact size cap: min(full_package_size * 1.1, 100 MB). If bsdiff output exceeds this cap, the delta is discarded and only the full package is served.
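The cap can be expressed as a one-line check applied after compression. This sketch assumes sizes in bytes and a decimal interpretation of 100 MB; `delta_within_cap` is a hypothetical helper name.

```python
MAX_DELTA_BYTES = 100 * 1000 * 1000  # assumption: "100 MB" means decimal megabytes


def delta_within_cap(delta_size: int, full_package_size: int) -> bool:
    """True if the delta may be stored; False means discard it and serve full packages."""
    cap = min(full_package_size * 1.1, MAX_DELTA_BYTES)
    return delta_size <= cap
```

For a 35 MB package the effective cap is 38.5 MB, so the absolute 100 MB limit only comes into play for packages larger than roughly 91 MB.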

8.3 Storage Exhaustion

With N=3 deltas per publish and a busy publisher, delta storage can grow. Mitigation:

  • Delta storage is soft-limited per app (configurable, default 500 MB).
  • Delta GC runs on version yanks and retention policy expiry.

9. Failure Modes and Mitigations

| Failure | Client behavior | Server behavior |
|---------|-----------------|-----------------|
| Delta not yet computed | Fallback to full download | Return 200 full package |
| Delta generation fails | N/A | Log error; mark delta as unavailable; serve full package |
| Client bspatch error | Fallback to full download | N/A |
| SHA256 mismatch after patch | Fallback to full download | Log telemetry event |
| Object store unavailable | Retry with exponential backoff | 503 |
| Base version yanked mid-download | Download will fail; client retries without base param | Delta GC removes artifact |

10. Observability

*Metrics (Prometheus):*

store_delta_generation_duration_seconds{app, status}
store_delta_size_bytes{app, version_new, version_base}
store_delta_download_total{app, result}   # result: delta_hit | delta_miss | fallback
store_delta_bandwidth_saved_bytes_total{app}

*Log events:*

  • delta.generated — artifact created, includes sizes and generation time.
  • delta.generation_failed — error detail.
  • delta.served — client received delta.
  • delta.fallback — client requested delta but received full package (with reason).
  • delta.apply_failed — client-side apply failure (reported via telemetry endpoint).

11. Roadmap

Phase 1 — Server-Side Delta Generation + API (Weeks 1–4)

  • bsdiff Go wrapper integrated into Store backend.
  • Background worker for delta generation on publish.
  • Object store layout for delta artifacts.
  • Download API delta-aware response (Content-Type, headers).
  • Unit + integration tests for delta generation and serving.
  • Metrics and logging.

Phase 2 — Flutter Client Apply (Weeks 5–8)

  • bspatch integration in Flutter app (via FFI or Dart port).
  • Delta-aware download flow with SHA256 verification.
  • Fallback to full package on any failure.
  • UI: show "Updating (small patch)" vs "Updating (full download)".

Phase 3 — Edge Node Delta Serving (Weeks 9–16)

  • Federated mirrors (RFC-002) cache delta artifacts by content hash.
  • Edge nodes serve deltas without contacting Origin for known hashes.
  • Benchmark: time-to-update at high concurrency (10k simultaneous updates).

12. Open Questions

  1. *Pure Go vs CGO bsdiff:* Using CGO links the reference C implementation but complicates cross-compilation. A pure-Go reimplementation (e.g., gabstep/binarydist) avoids CGO but may be slower for large packages. Decision needed before Phase 1 implementation.
  2. *Delta depth N:* N=3 is based on internal telemetry. Should N be configurable per app? Large enterprise deployments may have slower update cycles requiring N=5 or N=7.
  3. *Parallel delta generation:* Should the worker generate all N deltas in parallel? Parallelism speeds up availability but increases CPU and memory peaks. Start sequential, profile, then parallelize if needed.
  4. *Delta for pre-release channels:* Should alpha and beta channels participate in delta delivery? Pre-release packages change more frequently and are installed by fewer users; the ROI may not justify the storage cost.

13. Alternatives Considered

13.1 xdelta3

xdelta3 is a widely used binary delta algorithm. Compared to bsdiff, it is faster at generation time but produces larger patches for compiled executables. Since our bottleneck is download bandwidth (not generation time), bsdiff is preferred.

13.2 rsync/librsync

rsync operates on rolling checksums and requires the client to have the old file and a rolling checksum manifest. This would require a round-trip to compute the checksum set before the server can construct the delta. bsdiff generates the full delta server-side from both versions, avoiding the round-trip.

13.3 ZIP member-level delta

Computing bsdiff on individual ZIP members (resources, assets, native libraries) could improve patch quality when only a few members change. The complexity of a ZIP-aware differ and the additional infrastructure required (member-level object store) do not justify the marginal improvement over whole-archive bsdiff for our package sizes.

13.4 Separate "update package" published by developers

Some stores require developers to manually publish a delta package. This shifts computation burden to the publisher and requires tooling on their side. Automatic server-side delta generation (this RFC) provides better developer ergonomics with no workflow change required.


14. References

  • C. Percival, "Naive differences of executable code," 2003. http://www.daemonology.net/bsdiff/
  • gabstep/binarydist: pure-Go bsdiff/bspatch implementation
  • specs/kpkg/format.kmd: Koder Package format specification
  • RFC-002: Federation Protocol (products/dev/hub/docs/rfcs/RFC-002-federation-protocol.md)
  • Ticket #013: Delta Delivery Server (products/dev/hub/backlog/pending/013-delta-delivery-server.md)
  • Ticket #014: Delta Delivery Client (products/dev/hub/backlog/pending/014-delta-delivery-client.md)
