Kzip

Koder Kzip — engines/compress/kzip

engines/compress/kzip/ · Charter stage · 2026-05-03

Purpose

General-purpose compactor for the Koder Stack. Covers single-file compression (gzip-style), multi-file archives with dedup, AEAD encryption, snapshots/versioning, FUSE mount, and cloud backends, all as a single CLI + library.

Status

  • *Charter accepted* (RFC-001).
  • *Bootstrapped* from restic v0.18.1: commit 6c182be6e, 2026-05-03.
  • *Format spec* at meta/docs/stack/specs/kzip/format.kmd; diverges from restic v0.18.x at §13.3-§13.6 + §14.5.
  • *Build:* cd engines/compress/kzip && make build → bin/kzip.
  • *Run:* bin/kzip version → kzip 0.2.0 (restic-fork base 0.18.1) compiled with go1.25.0 on linux/amd64.
  • *Landing:* https://kzip.koder.dev (live).
  • *CI:* .gitea/workflows/kzip-ci.yml (go test + e2e + cross-compile for 5 targets) since v0.2.0.
  • *Production-readiness:* 5 of 6 goal steps delivered (2026-05-19): v0.2.0 release prep, CI gate, 1 GiB bench, soak harness, *24h soak run started* (PID 3757624, log `homekodertempkzipsoak24h.log`, metrics in `homekodertempkzipsoakj4V1gVmetrics.csv`). Pending: Hub publication (🔴 BLOCKED on credential; see ticket #029) + dogfood `kbackup --engine=kzip` (observation window 2026-05-19 → 2026-06-19).

Why

  • Stack ships across mobile/desktop/server/embedded; needs one compactor everywhere.
  • policies/self-hosted-first.kmd mandates progressive replacement of FOSS deps with self-hosted Koder equivalents.
  • Today's patchwork: tar+gzip, tar+xz, ad-hoc restic/borg per host. Different UX, different operator knowledge, sync breakage.

Naming

kzip: 4 chars, K-prefix family, brand score ~88. Selected over kpak (collided with kpkg), kfold, kpress, karc, kram, knot. See RFC-001 §9.

Canonical extension: .kz for single-file, .kzip for multi-file archive.

Format

Spec at meta/docs/stack/specs/kzip/format.kmd, to be drafted post-bootstrap. Format-first: stable from v1; implementation evolves underneath.

Implementation language

Pending FOSS base selection. Likely Go (if restic) or Rust (if zstd-based ground-up).

Feature list

40 features curated from comparative analysis of zstd, xz, 7-Zip, zpaq, borg, restic, dwarfs, squashfs, lz4, brotli, lzip, tar. Full list in RFC-001 §3.

Seven dimensions:

  1. Algorithms / preprocessing (LZ77 family, BWT, dicts, BCJ, delta, filter chain)
  2. Container / archive (multi-file, central directory, streaming, solid, metadata, sparse, multi-volume, append, snapshots, dedup)
  3. Performance (MT, levels, memory, SIMD)
  4. Integrity / security (CRC/SHA/Blake3/xxhash, Reed-Solomon, AES-256 AEAD, X25519 PK, signatures, append-only)
  5. Usability (gzip drop-in, multi-language bindings, list/extract/verify/random-access/dry-run, FUSE, cloud+SSH backends, public format spec)

FOSS comparison (RFC-001 §5)

| Tool | Score | Lang | License | Notes |
|---|---|---|---|---|
| *borg* | 29/40 | Python+C | BSD | Most features overall |
| *restic* | 28/40 | Go | BSD | Cloud-native; *Stack-aligned (Go)*; recommended |
| *7-Zip* | 26/40 | C++ | LGPL | Best algorithms; no dedup |
| *zpaq* | 25/40 | C++ | public domain | Append-only versioned archive |
| *zstd* | 20/40 | C | BSD | Algorithm only; container would be ours |
| *xz* | 18/40 | C | public domain | Smaller codebase |

Layout (post-#008, canonical)

Top-level Go module with module path koder.dev/engines/compress/kzip. The bootstrap-era engine/<tool>_vendor/ subdirectory was retired in ticket #008 (2026-05-13); Koder additions now sit as first-class siblings of packages inherited from restic. Full historical layout in RFC-001 §7.1; canonical layout in §7.2.

engines/compress/kzip/
├── go.mod                    ← module koder.dev/engines/compress/kzip
├── go.sum
├── Makefile                  ← `make build` → bin/kzip (GOWORK=off)
├── koder.toml
├── README.md, CHANGELOG.md, LICENSE
├── docs/
│   ├── rfcs/RFC-001-charter.md
│   └── upstream/restic-{NOTICE,CHANGELOG,README,Makefile}
├── cmd/kzip/                 ← cobra-based CLI entry
├── internal/                 ← all packages
│   ├── archiver/  backend/  repository/  crypto/  fuse/  …
│   ├── pubkey/               ← Koder addition (#005)
│   ├── filters/              ← Koder addition (#003 BCJ + delta)
│   ├── krecovery/            ← Koder addition (#007 + #009)
│   └── kzdispatch/           ← Koder addition (#010 magic-byte)
├── tests/{e2e,regression}/
└── backlog/{pending,in-progress,done}/
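
The layout above describes kzdispatch only as magic-byte dispatch (#010). As a rough illustration of that general technique, and not of kzip's actual code, the Go sketch below identifies a stream by its leading bytes. The `Format` type, the `Sniff` function, and the signature table are hypothetical names introduced here; only the gzip, zstd, and xz magic numbers are standard values, and kzip's own magic (defined in format.kmd) is deliberately left out rather than guessed.

```go
package main

import (
	"bytes"
	"fmt"
)

// Format names a stream type recognised by its leading bytes.
// Hypothetical type for illustration; not taken from the kzip source.
type Format string

const (
	FormatGzip    Format = "gzip"
	FormatZstd    Format = "zstd"
	FormatXZ      Format = "xz"
	FormatUnknown Format = "unknown"
)

// signatures lists well-known magic numbers from the public format specs.
// kzip's own magic is not listed because this document does not state it.
var signatures = []struct {
	format Format
	magic  []byte
}{
	{FormatXZ, []byte{0xFD, '7', 'z', 'X', 'Z', 0x00}},
	{FormatZstd, []byte{0x28, 0xB5, 0x2F, 0xFD}},
	{FormatGzip, []byte{0x1F, 0x8B}},
}

// Sniff returns the first format whose magic number is a prefix of head,
// or FormatUnknown when nothing matches.
func Sniff(head []byte) Format {
	for _, s := range signatures {
		if bytes.HasPrefix(head, s.magic) {
			return s.format
		}
	}
	return FormatUnknown
}

func main() {
	fmt.Println(Sniff([]byte{0x1F, 0x8B, 0x08, 0x00})) // prints "gzip"
	fmt.Println(Sniff([]byte("plain text")))           // prints "unknown"
}
```

Dispatching on content rather than on the file extension means a renamed or extension-less file is still routed to the right decoder; the extension then serves only as a hint for output naming.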

Backlog

| Ticket | Status | Description |
|---|---|---|
| 001 | done | Bootstrap from restic v0.18.1 |
| 002 | *done* | Landing kzip.koder.dev; live at https://kzip.koder.dev |
| 003 | *done* | BCJ filter set (multi-arch executable preprocessing); all phases shipped; closed alongside #015 |
| 004 | done | FUSE mount Linux native: exposed + tested (E2E in tests/e2e/fuse_mount_test.sh) + documented (docs/usage/fuse-mount.md) |
| 005 | *done* | X25519 public-key crypto for repository keys: `kzip key genidentity` + `kzip init --recipient` + global `--identity`; E2E in `tests/e2e/backupwithpubkey_test.sh`; spec format.kmd §3.4.1; doc `docs/usage/publickeymode.md` |
| 006 | *done* | Windows snapshot mount via WinFsp: winfsp/go-winfsp pure-Go binding; cross-platform mount lands w/ no cgo on Windows (closed 2026-05-14) |
| 007 | *done* | Reed-Solomon recovery records: v1 sidecar mode; kzip recovery {encode,verify,repair}; E2E in tests/e2e/recovery_test.sh; doc `docs/usage/recoveryrecords.md` |
| 008 | *done* | Top-level go.mod migration: koder.dev/engines/compress/kzip; vendor subdir retired; 389 files / 1101 imports rewritten; cmd/restic → cmd/kzip; full test suite + 5 E2E + 4 regression scripts pass; regression guard `tests/regression/004toplevelmodule-path.test.sh` enforces no return to old import path |
| 009 | *done* | Reed-Solomon recovery records: pack-format-embedded variant (Type=5 per format.kmd §13.4); `kzip backup --recoveryrecords=N%` + `kzip recovery repairpack id`; per-shard SHA-256 for direct repair; E2E verified |
| 010 | done | Magic-byte dispatch + .kz extension unification |
| 011 | done | Vendor go.mod refresh + GNU-tar dispatch fix |
| 012 | done | BCJ x86 filter |
| 013 | done | BCJ ARM family filters (A32 / AArch64 / Thumb-2) |
| 014 | done | BCJ misc-arch filters (PPC BE/LE / SPARC / RISC-V / IA-64) |
| 015 | *done* | kzip backup --filter=<chain> wired into pack-writer + reader: E2E byte-identical for delta:1 and bcj-x86,delta:1; Type=4 pack header per spec §13.3 |
| 016 | done | BCJ filter chain on tar-of-binaries: content-sniff aware path |
| 017 | done | Per-chunk content sniffer + FlagFilterSkipped bit (spec §13.3.1 / §14.4) |
| 018 | *done* | Pluggable compressor backends umbrella: internal/compressor/ registry + KZC payload envelope (spec §13.5 / RFC-002); CLI `--compressor=name[:level]` + `kzip compressor list`; zstd kept bare for legacy bit-compat |
| 019 | *done* | LZMA2 backend (pure-Go via ulikunitz/xz); child of #018 |
| 020 | *done* | Brotli backend (pure-Go via andybalholm/brotli); child of #018 |
| 021 | *done* | Bench baseline refresh + kzip-lzma2 / kzip-brotli variants in bench/bench.sh; bench/baseline.md populated for random-100m + text-100m (2026-05-19) |
| 022 | *done* | kzip migrate recompress: in-place per-pack re-encode to a different compressor backend, reuses prune's Repack primitive; blob IDs preserved; kzip check green before and after (RFC-002 §6 follow-up) |
| 023 | *done* | Detached signatures (kzip sign / kzip verify): Ed25519, kzig-v1 sidecar (.kz.sig, 101 B) per format.kmd §13.6; separate signing-identity files (.kzkey/.kzpub); `pubkey` + `keyring` hardening; closes RFC-001 §3 feature 33 |
| 024 | *done* | gzip-compatible single-file CLI drop-in (kzip compress / kzip decompress / implicit-dispatch + gunzip symlink): RFC-001 §2 charter promise; internal/kzstream/, format spec §13.2 ratified-on-emission (2026-05-19) |
| 025 | *done* | Prebuilt zstd dictionary, consumer side: `--dict path` + `$KZIP_DICT` env var; `Codec.SetDict` + new `DictAware` interface; zstd backend implements per-call encoder/decoder with WithEncoderDict. v1 contract: file-based, same dict on read+write; in-repo persistence + trainer → #027 |
| 027 | *done* | Dict persistence in repo (restic.Config.Dict) + Go-native trainer (kzip dict train / dict inspect / dict set): wraps klauspost zstd.BuildDict; init --dict persists, auto-loads on subsequent opens; closes #025 follow-up |
| 026 | *done* | Framework for configurable integrity hash: internal/hashalg/ registry + SHA-256 + Blake3 (32-byte digests); Config.HashAlg persisted; kzip hash list + init --hash=<name> recorded. Full restic.Hash dispatch wiring deferred to #028 (evidence-gated) |
| 028 | *done* | Wired internal/hashalg into restic.Hash via Design A (package-level state). `kzip init --hash=blake3` now produces real Blake3 blob IDs end-to-end. Dual-repo guard via `ErrHashAlgConflict`. Spec divergence §14.5 |
| 029 | pending | Unblock Hub publication: regenerate meta/context/credentials/store-publish-key (BLOCKED on operator action: current file is binary garbage, .enc password missing) |
| 030 | *done (framework)* | Bench rigor framework: --runs, --warmup, --cpu, --max-load flags + helpers (wait_for_quiet_machine, maybe_taskset, median_of). Per-tool N-run loop integration is a sub-follow-up; today the operator gets the gates + flags + scaffold for tagged-release bench prep. |
| 031 | *done* | Fix TestStdinFromCommand* to resolve python via LookPath("python3"), then fall back to python, then skip cleanly. 4 tests now PASS on python3-only hosts (Debian 13, Fedora 41+, Arch). |
| 032 | *done* | make release target: cross-compile 5 targets + SHA256SUMS manifest in one shot. `bin/kziptarget[.exe]` + `bin/SHA256SUMS`. `make release-clean` companion. |
| 033 | *done* | Systemd-user unit + timer for the soak harness: `tests/soak/kzipsoak.service` (sandboxed, MemoryMax=4G) + `kzipsoak.timer` (weekly Sunday rollover). `systemd-analyze --user verify` green. Operator-overridable knobs via `Environment=`. Install runbook in `tests/soak/README.md`. |
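
Ticket #015 reports byte-identical round trips for the delta:1 filter. A minimal Go sketch of a byte-wise delta filter follows, assuming the numeric parameter is the distance in bytes (as in xz's delta filter). `DeltaEncode` and `DeltaDecode` are hypothetical names, and the real implementation under internal/filters/ may differ in interface and details.

```go
package main

import "fmt"

// DeltaEncode replaces each byte with its difference (mod 256) from the byte
// dist positions earlier. The first dist bytes are copied unchanged.
// Assumption: dist is the filter parameter, e.g. 1 for "delta:1".
func DeltaEncode(src []byte, dist int) []byte {
	if dist < 1 {
		panic("delta: dist must be >= 1")
	}
	out := make([]byte, len(src))
	for i, b := range src {
		if i < dist {
			out[i] = b
		} else {
			out[i] = b - src[i-dist] // uint8 arithmetic wraps around
		}
	}
	return out
}

// DeltaDecode inverts DeltaEncode by adding back the already-decoded byte
// dist positions earlier.
func DeltaDecode(enc []byte, dist int) []byte {
	if dist < 1 {
		panic("delta: dist must be >= 1")
	}
	out := make([]byte, len(enc))
	for i, b := range enc {
		if i < dist {
			out[i] = b
		} else {
			out[i] = b + out[i-dist]
		}
	}
	return out
}

func main() {
	src := []byte{10, 12, 14, 16, 18}
	enc := DeltaEncode(src, 1)
	fmt.Println(enc)                 // prints [10 2 2 2 2]
	fmt.Println(DeltaDecode(enc, 1)) // prints [10 12 14 16 18]
}
```

The filter does not shrink the data by itself; it turns slowly varying sequences into runs of small repeated values, which the compressor that follows in the chain can encode more compactly.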

Refs

  • Charter RFC: engines/compress/kzip/docs/rfcs/RFC-001-charter.md
  • Self-hosted-first policy: meta/docs/stack/policies/self-hosted-first.kmd
  • Brand-score spec: `meta/docs/stack/specs/naming/brandscore.kmd`
  • Module catalog entry: row 17c in meta/docs/stack/module-catalog.md
  • Related: meta/context/commands/k-backup.md (consumer of kzip once shipped)

Source: ../home/koder/dev/koder/meta/docs/stack/modules/kzip.md