Kzip — Format Spec (v1 — restic-compatible bootstrap)
Canonical format of the `.kz` files produced by `kzip`, the Koder Stack compressor. A single extension covers every mode (single-file, multi-file tar, directory); dispatch is by magic bytes, not by suffix. During the bootstrap (v1), the format is byte-for-byte compatible with the restic v0.18.x repository, which is the single source of truth. Future divergences require a major version bump, an explicit ticket, and an incompatibility note.
0. Stage and stability
This v1 of the format is **restic-compatible** during the bootstrap. Canonical reference documents:
- restic `design.rst` v0.18.1: repository behavior
- restic `references/design.rst` v0.18.1: wire-level format

The spec below summarizes the contract; in case of conflict, the restic documents above prevail (until v1 is ratified).
1. File extension
Everything kzip emits uses the **`.kz`** extension, regardless of mode. Dispatch between single-file, multi-file, and sidecar is done by **magic bytes** (see §11), not by suffix, in line with `.zst`/`.gz`/`.xz` (one extension per format, varied payload).
| Form | Mode | Notes |
|---|---|---|
| `<name>.kz` (file) | Single-file stream | Compressed stream frame analogous to `.zst`. **Reserved but not emitted by the v0.1 bootstrap** (to be defined in ticket #010). |
| `<name>.tar.kz` (file) | Packed multi-file | kzip repository bundled into a `.tar` (transport/distribution). The client unpacks it before operating. |
| `<name>.kz/` (directory) | Unpacked multi-file | kzip repository as a file tree (native layout from §2). The typical working mode. |
| `<name>.kz.rs` (sidecar) | Reed-Solomon parity sidecar | See §13.1. The `.rs` suffix is applied after the `.kz` of the covered file (`backup.kz` → `backup.kz.rs`). |
**History:** earlier versions of the spec used `.kzip` (multi-file) and `.kzrs` (sidecar). They were unified into `.kz`/`.kz.rs` on 2026-05-12, before any data was in production; no migration path is needed.

**Note:** restic does not use an extension for repositories. We adopt `.kz` (as a directory or tar suffix) to signal the format in the name.
2. Repository layout
A kzip repository is a directory with the following structure. All files below are **encrypted** with the repository key (except `config`, which has an unencrypted header for detection).

```
<repo-root>/
├── config                  ← repository config (JSON, encrypted body)
├── keys/<id>               ← derived keys + Argon2 KDF
├── data/<2-hex>/<pack-id>  ← packs (concatenated + compressed blobs)
├── index/<index-id>        ← index maps (blob-id → pack-id + offset + length)
├── snapshots/<snap-id>     ← snapshot metadata (timestamp, paths, tree-id)
├── locks/<lock-id>         ← exclusion locks (TTL ~30 min)
└── HEAD                    ← (optional) pointer to the most recent snapshot
```

All IDs are **SHA-256 hashes** in lowercase hex (64 chars).
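As a minimal sketch of how a stored file's name relates to its content, assuming the default SHA-256 algorithm and restic's rule that a file's storage ID is the SHA-256 of its stored bytes, the ID and the `data/<2-hex>/` pack path can be derived as below. The helper names `objectID` and `packPath` are hypothetical, not kzip API.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"path"
)

// objectID (hypothetical helper) returns the storage ID of a repository
// file: the SHA-256 of its stored bytes as 64 lowercase hex characters.
func objectID(stored []byte) string {
	sum := sha256.Sum256(stored)
	return hex.EncodeToString(sum[:])
}

// packPath (hypothetical helper) places a pack under
// data/<first two hex chars>/<pack-id>, fanning packs out over 256 dirs.
func packPath(packID string) string {
	return path.Join("data", packID[:2], packID)
}
```

The two-character fan-out keeps any single directory from holding every pack in a large repository.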
3. Crypto
3.1 Master key derivation
- KDF: **Argon2id** (default time=3, memory=64 MiB, parallelism=1).
- Salt: 64 bytes, random per key.
- Output: 64 bytes (32 encryption + 32 MAC).
3.2 Encryption
- Cipher: **AES-256-CTR**.
- IV: 16 bytes, random per blob.
- MAC: **Poly1305** (32 bytes) over (IV || ciphertext).
- AEAD construction: encrypt-then-MAC.
3.3 Wire format of each encrypted blob

```
+-----------------------+----------------+-------------------------+
| IV (16 bytes random)  | ciphertext (N) | Poly1305 tag (32 bytes) |
+-----------------------+----------------+-------------------------+
```

Total = IV(16) + N + tag(32). N may be 0 (an empty plaintext is valid).
3.4 Repository key
Each `keys/<id>` contains JSON encrypted with the master key, with:

```
{
  "created": "<RFC3339>",
  "username": "<string>",
  "hostname": "<string>",
  "kdf": "argon2id",
  "n": 524288, "r": 1, "p": 1,  // Argon2 params
  "salt": "<base64>",
  "data": "<base64-encrypted-master-key>"
}
```

The operator's password derives a key-encryption key (KEK) via Argon2id; this KEK decrypts `data` to obtain the repository's actual master key. Multiple `keys/` entries can coexist (multi-user / rotation).
3.4.1 Recipient mode: X25519 public-key wrap (ticket #005, ratified 2026-05-13)
For automation without an operator present (CI, cron, a runner without a password), a `keys/<id>` can be in **recipient mode**: the master key travels wrapped for one or more X25519 public keys instead of being derived from a password. The JSON omits `kdf`/`N`/`r`/`p`/`salt`/`data` and carries:
```
{
  "created": "<RFC3339>",
  "username": "<string>",
  "hostname": "<string>",
  "recipients": [
    {
      "type": "x25519-kzip-v1",
      "recipient_fingerprint": "<base64-32-bytes-X25519-pub>",
      "envelope": "<base64-of-envelope-bytes>"
    }
  ]
}
```

Each `envelope` is the output of `internal/pubkey.Recipient.Wrap`:
```
+---------------+--------+---------------------------+
| ephemeral_pub | nonce  | ciphertext ‖ Poly1305 tag |
| (32 bytes)    | (12 B) | (variable)                |
+---------------+--------+---------------------------+
```

Wrapping key:

```
shared = ECDH(ephemeral_priv, recipient_pub)
key    = HKDF-SHA256(shared, salt="", info="kzip-pubkey-v1")
ct/tag = ChaCha20-Poly1305.Seal(key, nonce, master_key_json, AD=ephemeral_pub)
```

Properties:
- **Payload determinism:** `master_key_json` is the JSON marshal of the `*crypto.Key` master, bit-identical to the `data` that password mode would produce. Reading via `OpenKeyWithIdentity` yields the same `crypto.Key` that `OpenKey` would.
- **Ephemeral key per Wrap:** each envelope carries a fresh `ephemeral_pub`. Two envelopes for the same recipient and the same master are distinct (anti-dedup).
- **AEAD covers the whole envelope:** `ephemeral_pub` is the associated data, so altering any byte (`ephemeral_pub`, `nonce`, `ciphertext`, `tag`) surfaces as `ErrUnwrap`.
- **Modes are mutually exclusive per file:** a `keys/<id>` in recipient mode has no `kdf`, and vice versa. `OpenKey` refuses recipient mode with `ErrRecipientMode`; `OpenKeyWithIdentity` refuses password mode. `SearchKey` and `SearchKeyWithIdentity` silently skip the wrong mode, so a repo can hold mixed keys.
- **Fingerprint routing:** `OpenKeyWithIdentity` only tries envelopes whose `recipient_fingerprint` matches `identity.Recipient().Bytes()`, which avoids a timing oracle over the other recipients in the same file.
File format on disk (identity/recipient files)
Operators exchange the **public** key (`<prefix>.pub`) and keep the **private** key (`<prefix>.key`, perm 0600). Text layout, one line:

```
KZIP-IDENTITY-V1-<base64-StdEncoding(priv 32B)>
KZIP-RECIPIENT-V1-<base64-StdEncoding(pub 32B)>
```

The distinctive prefix prevents a footgun (a private key in a public-key slot, or vice versa): `internal/pubkey.ReadIdentityFile` / `ReadRecipientFile` return `ErrBadFormat` before any decode.
CLI surface
| Command | Function |
|---|---|
| `kzip key gen-identity --output <prefix>` | Generates `<prefix>.key` (private, 0600) + `<prefix>.pub` (public, 0644) |
| `kzip init --recipient <pub>` (1+) | Bootstraps a recipient-only repo, with no password |
| `kzip key add-recipient --pubkey <pub>` | Adds a recipient to an existing repo (the master key comes from the current open) |
| `kzip --identity <key> <cmd…>` | Global flag: opens the repo via X25519 instead of a password |
| `kzip key list` | `Mode` column (password/recipient) + fingerprints |
Byte-compat divergence
Recipient-mode `keys/<id>` files are readable by restic v0.18.x as JSON, but the absence of `kdf`/`data` makes restic's open path fail immediately ("KDF is not scrypt"). This is a one-way **forward-compat** divergence: kzip reads what restic writes, but restic cannot open recipient-mode keys. Listed in §14.
3.4.2 Future: age compatibility (separate ticket)
An envelope with `type: "age-v1"` would be orthogonal: same file, different recipient. The current ticket (#005) does not implement age compat, to avoid dragging in age's header/armor format; the implementation stays minimal until there is concrete demand.
4. Pack files
Pack files aggregate many blobs into a single file to amortize I/O overhead and improve compression.
4.1 Layout

```
+--------+--------+-----+--------+--------------------+
| blob 1 | blob 2 | ... | blob N | header (encrypted) |
+--------+--------+-----+--------+--------------------+
                                                      ↑
                                              ends at EOF - 4
+----+
| H  |  header length (uint32 LE, last 4 bytes of file)
+----+
```

4.2 Pack header (after decrypt)
```go
type PackHeaderEntry struct {
	Type   byte   // 0=data, 1=tree, 2=data-compressed, 3=tree-compressed
	Length uint32 // length of ciphertext (= encrypted-and-possibly-compressed bytes)
	// If Type ∈ {2, 3} an additional uint32 LE follows here:
	// UncompressedLength uint32: length of plaintext before compression
	ID [32]byte // SHA-256 of plaintext
}
```

The full header is a sequence of `PackHeaderEntry` values, encrypted with AEAD.
**Type byte legend (revised 2026-05-12 to match restic v0.18.x semantics):** values 0–3 are exhausted by the (DataBlob | TreeBlob) × (compressed | uncompressed) cross-product. The "padding (legacy)" reading in earlier drafts of this spec was incorrect; byte 2 is **`DataBlob` compressed** in restic v0.18.x and is reserved with that meaning here. Values 4–255 are available for kzip extensions (see §13).
4.3 Blob types
| Type | Contents | Compressed | Produced |
|---|---|---|---|
| `data` (0) | File chunk (raw file bytes after chunking) | no | By backup, before encryption |
| `tree` (1) | Serialized JSON of a directory tree | no | By backup, when uploading each dir |
| `data-c` (2) | File chunk, compressed with zstd | yes | By backup, before encryption |
| `tree-c` (3) | Directory-tree JSON, compressed with zstd | yes | By backup, when uploading each dir |

Compressed variants carry an extra 4-byte LE `UncompressedLength` field in the header entry (between `Length` and `ID`) so the reader can pre-allocate the decode buffer. The default compressor is zstd level 3 (see §10).
5. Content-defined chunking (CDC)
- Algorithm: **Rabin fingerprint** over a rolling window.
- Polynomial: random per repo (generated at `init`, saved in `config`).
- Sizes: min 512 KiB, max 8 MiB, target 1 MiB (restic defaults; may become tunable in a future RFC).
- Boundary: hash mod 2²⁰ = 0 (adjustable to hit the target).

Each chunk becomes a `data` blob (after dedup by hash).
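A toy version of the boundary rule, for illustration only. It swaps the per-repo GF(2) Rabin fingerprint for a simpler Rabin-Karp polynomial rolling hash over a 64-byte window (an assumption, not restic's chunker), but keeps the same shape: cut when the low bits of the hash are zero, never before `Min`, always at `Max`. With the §5 defaults, `Mask` would be `1<<20 - 1`, `Min` 512 KiB and `Max` 8 MiB; the test scales them down.

```go
package main

// window is the number of trailing bytes the rolling hash covers.
const window = 64

// chunkerParams mirrors the §5 knobs (hypothetical type).
type chunkerParams struct {
	Min, Max int
	Mask     uint64 // cut where hash&Mask == 0, i.e. hash mod (Mask+1) == 0
}

// chunkBoundaries returns the end offset of every chunk in data.
// Assumption: a Rabin-Karp polynomial hash mod 2^64 stands in for the
// GF(2) Rabin fingerprint with a random per-repo polynomial.
func chunkBoundaries(data []byte, p chunkerParams) []int {
	const base = 1099511628211 // arbitrary odd multiplier (hypothetical)
	pow := uint64(1)           // base^window: weight of the byte leaving the window
	for i := 0; i < window; i++ {
		pow *= base
	}
	var cuts []int
	var h uint64
	start := 0
	for i := 0; i < len(data); i++ {
		h = h*base + uint64(data[i])
		if i-start >= window {
			h -= pow * uint64(data[i-window]) // drop the byte that slid out
		}
		size := i - start + 1
		if (size >= p.Min && h&p.Mask == 0) || size >= p.Max {
			cuts = append(cuts, i+1)
			start, h = i+1, 0 // restart the hash at each boundary
		}
	}
	if start < len(data) {
		cuts = append(cuts, len(data)) // trailing partial chunk
	}
	return cuts
}
```

Because the cut decision depends only on the bytes inside the window, an insertion near the start of a file typically shifts only the first few boundaries; later boundaries realign, which is what lets unchanged regions dedup across backups.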
6. Trees
A tree blob is serialized JSON:

```
{
  "nodes": [
    {
      "name": "filename",
      "type": "file" | "dir" | "symlink" | "fifo" | "socket" | "blockdev" | "chardev",
      "mode": "0644",
      "mtime": "<RFC3339>",
      "atime": "<RFC3339>",
      "ctime": "<RFC3339>",
      "uid": 1000, "gid": 1000,
      "user": "koder", "group": "koder",
      "size": 12345,
      "content": ["<blob-id>", "<blob-id>"],  // for files
      "subtree": "<tree-id>",                 // for dirs
      "linktarget": "<path>",                 // for symlinks
      "extended_attributes": [{"name": "...", "value": "<base64>"}]
    }
  ]
}
```

xattrs and ACLs are preserved via `extended_attributes`. Hard links are not explicitly deduplicated: the same `content` array implies the same content, but inode identity is not preserved.
7. Snapshots
Each `snapshots/<id>` contains encrypted JSON:

```
{
  "time": "<RFC3339>",
  "tree": "<root-tree-id>",
  "paths": ["/home/user/docs"],
  "hostname": "host",
  "username": "user",
  "uid": 1000, "gid": 1000,
  "tags": ["weekly", "automated"],
  "parent": "<previous-snapshot-id>",  // optional
  "program_version": "kzip 0.1.0-bootstrap (restic-fork)"
}
```

8. Indices
`index/<id>` is encrypted JSON mapping each blob ID to its location:

```
{
  "supersedes": ["<old-index-id>"],
  "packs": [
    {
      "id": "<pack-id>",
      "blobs": [
        {
          "id": "<blob-id>",
          "type": "data" | "tree",
          "offset": 0,
          "length": 4194304,
          "uncompressed_length": 5242880  // optional, for compressed blobs
        }
      ]
    }
  ]
}
```

`prune` consolidates multiple indexes into one (replacing them via `supersedes`).
9. Locks
Exclusion locks in `locks/<id>`:

```
{
  "time": "<RFC3339>",
  "exclusive": true | false,
  "hostname": "host",
  "username": "user",
  "pid": 12345
}
```

TTL ~30 min; abandoned locks expire. Stale locks are detected via PID liveness.
10. Compression
Blobs (data + tree) are compressed before encryption. v1 supports:
| Algorithm | Default | Notes |
|---|---|---|
| zstd level 3 | yes | default perf/ratio balance |
| zstd level 1 | opt-in | maximum speed |
| zstd level 11 | opt-in | highest supported compression |
| none | opt-in | skips compression when the data is already compressed |

LZMA, BWT, and BCJ filters are not supported in v1 (planned in ticket #003).
11. Magic numbers / detection
Since the `.kz` extension is unique (§1), dispatch is by content. Given an arbitrary `<name>.kz`, the CLI decides the mode in this order:
- **Directory check:** if it is a directory, it is an unpacked multi-file repository; open `<name>.kz/config`.
- **Tar header:** if bytes 257..264 contain the magic `ustar\0` (POSIX tar), it is a packed repository; extract to `<name>.kz/` (or stream-process the tar) before operating.
- **Sidecar magic:** first 4 bytes = `KZRS` → Reed-Solomon sidecar (see §13.1). Canonical file suffix: `.kz.rs`.
- **Stream frame magic:** first 4 bytes = `KZ\x01\x00` (single-file stream; see §13.2 for the full header layout). Decode via the stream decompressor.
- **Otherwise:** error `KZIP-FORMAT-001` ("unrecognized kzip payload").
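The detection order above translates into a small dispatch function. This is a sketch: `detect` and the `kind*` constants are hypothetical names, and it matches the POSIX `ustar\x00` magic at offset 257 exactly, so a GNU-format tar (which writes `ustar  \x00` across bytes 257..264) would need a looser check.

```go
package main

import (
	"bytes"
	"errors"
)

type kzKind int

const (
	kindRepoDir   kzKind = iota // unpacked multi-file repository
	kindRepoTar                 // packed repository inside a tar
	kindSidecarRS               // Reed-Solomon parity sidecar
	kindStream                  // single-file stream frame
)

var errUnrecognized = errors.New("KZIP-FORMAT-001: unrecognized kzip payload")

// detect applies the checks in order. head should hold the first bytes of
// the file (at least 263 when available, since the tar magic sits at 257).
func detect(isDir bool, head []byte) (kzKind, error) {
	switch {
	case isDir:
		return kindRepoDir, nil
	case len(head) >= 263 && bytes.Equal(head[257:263], []byte("ustar\x00")):
		return kindRepoTar, nil
	case bytes.HasPrefix(head, []byte("KZRS")):
		return kindSidecarRS, nil
	case bytes.HasPrefix(head, []byte("KZ\x01\x00")):
		return kindStream, nil
	}
	return 0, errUnrecognized
}
```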
Other detection points:
- Pack files inside `<repo>/data/<2-hex>/`: **no explicit magic number**; detection is by attempting to decrypt the header located via the last 4 bytes (length).
- Config: **encrypted JSON with a `"version": 2` header** (after decrypt).
- Current repository version: **2** (same as restic v0.18.x).
12. Endianness
All binary numeric fields are **little-endian** (the one exception is the `.kz.rs` sidecar, see §13.1).
13. Future extension hooks
The v1 format reserves the following fields for future use without breaking compat:
- `PackHeaderEntry.Type` values 0–3 are taken by restic blob semantics (see §4.2/§4.3). Value 4 = **filter-chain descriptor** (see §13.3, ratified 2026-05-12). Value 5 = pack-embedded recovery record (see §13.4, ratified 2026-05-13). Values 6–255 remain reserved for future extensions (signature blob, etc.).
- `Snapshot.tags` accepts arbitrary strings for Koder-specific metadata (`koder:repo=hub`, `koder:role=daily-backup`).
- Config JSON accepts unrecognized fields without error (forward-compat); a future kzip may add `signing_key_id`, `recovery_records_enabled`, `bcj_filter_chain`, etc.
13.1 Sidecar artifacts (out-of-band, not part of the repo format)
Some kzip features write **sidecar files** alongside repo artifacts without modifying the repo format. Sidecars are additive: a v1 reader (or restic) that doesn't recognize the sidecar simply ignores it.

**`.kz.rs`: Reed-Solomon parity sidecar (kzip ticket #007, v1 sidecar mode):**

Filename convention: applied as a `.rs` suffix on top of the covered file's `.kz` suffix (`backup.kz` → `backup.kz.rs`). The internal magic stays `KZRS` regardless of filename. (Pre-2026-05-12 builds wrote `.kzrs`; readers SHOULD accept both during the deprecation window per ticket #010.)
Layout (big-endian where applicable):

```
+------+--------+--------+--------+----------+--------+--------+
| 4B   | 1B     | 1B     | 1B     | 4B BE    | 32B    | N×B    |
| KZRS | ver=01 | dShard | pShard | dataSize | sha256 | parity |
+------+--------+--------+--------+----------+--------+--------+
```

- Magic `KZRS` (`0x4B 0x5A 0x52 0x53`); version `0x01`.
- `dShard + pShard ≤ 256` (klauspost/reedsolomon constraint).
- `parity` = `pShard` shards of `ceil(dataSize / dShard)` bytes.
- Generated by `kzip recovery encode <file>`; consumed by `verify`/`repair`.
- Out-of-band: removing all `.kz.rs` files leaves the repo intact and readable by stock restic.
The pack-format-embedded variant landed via kzip ticket #009 (ratified 2026-05-13). It carries parity inside the pack header as a `PackHeaderEntry.Type=5` blob with per-shard SHA-256 checksums; see §13.4 for the wire layout and §14.3 for the divergence note. The sidecar form continues to be supported in parallel for files outside the repo (e.g. raw deploy artifacts). (Pre-2026-05-12 drafts of this spec proposed `Type=4` for the RS pack-embedded entry; `Type=4` was reallocated to the filter-chain descriptor in §13.3 once the actual restic type space was audited.)
13.2 Single-file stream header (`.kz`, ratified 2026-05-12 via ticket #010; emitter shipped 2026-05-19 via ticket #024)
A single-file `.kz` artifact has a 48-byte fixed header, an opaque compressed payload, and a 4-byte trailer. The header is **not** encrypted (the payload may be); §11 detection only requires the first 4 bytes.
```
+------+--------+------+-------+-----------+---------+----------+-------+
| 4B   | 1B     | 1B   | 2B LE | 8B LE     | 32B     | N bytes  | 4B LE |
| KZ   | ver=01 | comp | flags | uncomp_sz | sha256  | payload  | tlen  |
| \x01 |        |      |       |           | (plain) | (compr.) |       |
| \x00 |        |      |       |           |         |          |       |
+------+--------+------+-------+-----------+---------+----------+-------+
```

| Field | Width | Encoding | Meaning |
|---|---|---|---|
| `magic` | 4B | bytes | Exactly `0x4B 0x5A 0x01 0x00` (`KZ`, version prefix, NUL). Distinguishes from `KZRS` (sidecar) and `ustar\0` (tar repo). |
| `ver` | 1B | u8 | Header version. `0x01` is the only value defined; readers MUST reject other values. |
| `comp` | 1B | u8 | Compressor: `0` = uncompressed; otherwise the RFC-002 CompressorID (`internal/compressor/`): `1` = zstd, `2` = lzma2, `3` = brotli, `4`–`127` reserved first-party, `128`–`255` reserved downstream. Level is **not** persisted (codecs auto-detect). |
| `flags` | 2B | u16 LE | Bit flags. Bit 0 = `encrypted_payload` (payload is AEAD-wrapped, see §3.3). |
| `uncomp_sz` | 8B | u64 LE | Plaintext length. `0` is valid (empty file). Cap = 2⁶³−1 (signed-int interop). |
| `sha256` | 32B | bytes | SHA-256 of the plaintext. |
| `payload` | N bytes | opaque | Compressed payload. If `comp == 0`, the payload is the raw plaintext (or AEAD-wrapped raw plaintext when `flags.encrypted_payload` is set). If `comp > 0`, the payload is the native bytes of the named codec (zstd frame, `.xz` stream, brotli stream). |
| `tlen` | 4B | u32 LE | Trailer: byte length of `payload` (sanity check against truncation; equals filesize − 52). |
Endianness rule of thumb: `uncomp_sz`, `flags`, and `tlen` are little-endian (matching §12 and the rest of the kzip pack-file format). The sidecar `.kz.rs` blob is the **only** big-endian field in the spec, preserved unchanged for restic-format alignment.
Constraints:
- `len(file) ≥ 52` (header 48 + trailer 4 = 52, with an empty payload allowed). The reader rejects shorter files with `KZIP-FORMAT-002`.
- `tlen + 52 == len(file)` MUST hold; otherwise `KZIP-FORMAT-003` ("truncated kz stream").
- If `comp == 0`, `len(payload) == uncomp_sz` (or, with encryption, `len(payload) == uncomp_sz + 48` per §3.3).
- `sha256` is computed over the **plaintext**, not the ciphertext or the compressed bytes. The reader verifies it after decompress+decrypt.
Pre-2026-05-19 drafts of this section assigned `comp` to a zstd-only enum (`1` = zstd-1, `2` = zstd-3, `3` = zstd-11, `16`–`31` reserved for non-zstd). That assignment was never emitted (the section was "reserved but not emitted" until #024) and is **superseded** by the RFC-002 CompressorID alignment above. The new assignment is the canonical one going forward; readers MUST follow this table.
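A reader for the layout and constraints above can be sketched with the standard library. It handles only the framing and the unencrypted `comp == 0` size check; decompression, decryption, and the other `comp` values are left to the codec layer. `parseStream`, `buildStored`, `verifyPlain`, and the error variables are hypothetical names that reuse the spec's error codes.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"errors"
)

const (
	streamHeaderLen      = 48
	streamTrailerLen     = 4
	flagEncryptedPayload = 1 << 0
)

var (
	errTooShort  = errors.New("KZIP-FORMAT-002: file shorter than 52 bytes")
	errTruncated = errors.New("KZIP-FORMAT-003: truncated kz stream")
	errBadHeader = errors.New("bad kz stream header")
)

type streamHeader struct {
	Comp       byte
	Flags      uint16
	UncompSize uint64
	SHA256     [32]byte
}

// parseStream validates magic, version and trailer, and returns the payload.
func parseStream(file []byte) (streamHeader, []byte, error) {
	var h streamHeader
	if len(file) < streamHeaderLen+streamTrailerLen {
		return h, nil, errTooShort
	}
	if !bytes.Equal(file[:4], []byte("KZ\x01\x00")) || file[4] != 0x01 {
		return h, nil, errBadHeader
	}
	h.Comp = file[5]
	h.Flags = binary.LittleEndian.Uint16(file[6:8])
	h.UncompSize = binary.LittleEndian.Uint64(file[8:16])
	copy(h.SHA256[:], file[16:48])
	tlen := binary.LittleEndian.Uint32(file[len(file)-4:])
	if uint64(tlen)+streamHeaderLen+streamTrailerLen != uint64(len(file)) {
		return h, nil, errTruncated
	}
	payload := file[streamHeaderLen : len(file)-streamTrailerLen]
	if h.Comp == 0 && h.Flags&flagEncryptedPayload == 0 && uint64(len(payload)) != h.UncompSize {
		return h, nil, errBadHeader
	}
	return h, payload, nil
}

// verifyPlain checks the plaintext digest after decompress+decrypt.
func verifyPlain(h streamHeader, plain []byte) bool {
	return sha256.Sum256(plain) == h.SHA256
}

// buildStored writes an uncompressed (comp = 0), unencrypted stream.
func buildStored(plain []byte) []byte {
	out := []byte("KZ\x01\x00")
	out = append(out, 0x01, 0x00)                  // ver, comp
	out = binary.LittleEndian.AppendUint16(out, 0) // flags
	out = binary.LittleEndian.AppendUint64(out, uint64(len(plain)))
	sum := sha256.Sum256(plain)
	out = append(out, sum[:]...)
	out = append(out, plain...)
	return binary.LittleEndian.AppendUint32(out, uint32(len(plain)))
}
```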
CLI surface (kzip #024):
- `kzip compress <file> [--compressor=<name>[:level]]` → writes `<file>.kz` per the layout above.
- `kzip decompress <file>.kz` → infers the compressor from `comp`, validates `sha256`, writes `<file>`.
- `kzip compress -c <file>` / `kzip decompress -c <file>.kz` → stdout, matching gzip semantics.
- Bare `kzip <file>` (no subcommand) routes by extension: a `.kz` ending → decompress; otherwise → compress.
- Symlinks named `gunzip`, `kunzip`, `kzd` force decompression regardless of extension.
13.3 Filter-chain descriptor (`PackHeaderEntry.Type = 4`, ratified 2026-05-12 via ticket #003)
A pack file may include a **filter-chain descriptor** to document the pre-compression byte transforms (e.g. BCJ, delta) that were applied to its data/tree blobs before compression+encryption. The descriptor is one extra entry in the pack header, distinguishable from the blob entries (`Type` 0–3, see §4.2) by its Type byte.
Wire format (after decrypt, little-endian):

```
+--------+----------+----------+-----------+-----+-----------+-------------+
| 0x04   | nfilters | flen[0]  | filter[0] | ... | flen[n-1] | filter[n-1] |
| (Type) | (u16 LE) | (u16 LE) | (flen[0]  |     | (u16 LE)  | (flen[n-1]  |
|        |          |          |  bytes)   |     |           |  bytes)     |
+--------+----------+----------+-----------+-----+-----------+-------------+
```

Each `filter[i]` is a UTF-8 string of the form `"<name>"` or `"<name>:<param>"`, exactly matching the chain syntax produced by `Chain.String()` in `engines/compress/kzip/internal/filters/` (post-#008 layout; was `engine/restic_vendor/internal/filters/` during bootstrap). Names are stable identifiers from the filter registry:
| Name | Status | Origin |
|---|---|---|
| `delta` | shipped (ticket #003) | universal delta encoding (`out[i] = in[i] − in[i−d]`) |
| `bcj-x86` | stub (ticket #012) | xz/7-zip x86 branch/call/jump |
| `bcj-arm`, `bcj-thumb`, `bcj-arm64` | stub (ticket #013) | xz ARM variants |
| `bcj-ppc`, `bcj-ppc-le`, `bcj-ia64`, `bcj-sparc`, `bcj-riscv`, `bcj-riscv32` | stub (ticket #014) | xz misc-arch variants |
Constraints:
- `nfilters ≤ 256`: a generous cap; no realistic chain exceeds 4.
- `flen[i] ≤ 64`: keeps each entry inside one cache line and sanity-bounds the descriptor.
- The total descriptor is bounded above by `4 + 256 × (2 + 64) = 16900` bytes.
- Filter names not present in the reader's registry MUST fail loudly (`KZIP-FORMAT-004`, "unknown filter '<name>' in pack header"). Treating unknown filters as no-ops would silently produce garbage on decode, which is never acceptable.
Backward-compat for readers without filter support:
- A v1 reader (restic v0.18.x or pre-#003 kzip) that encounters `Type=4` in the pack header MUST skip the entry; `Length` tells it how many bytes to advance. Skipping is safe iff the pack file's `data`/`tree` blobs were **not** transformed by any filter; producers that emit `Type=4` with non-empty chains MUST guarantee any non-trivial filter was actually applied (the chain is the contract).
- The reverse case (a kzip-#003 reader on a v1 pack without `Type=4`) is the common case: the empty chain is implicit, no transform was applied, and blobs decompress directly.
The encoding is intentionally text rather than bytecode, to keep the chain debuggable via `xxd` on the decrypted header and to align with the CLI's `--filter=` chain syntax.
13.3.1 Per-blob filter-skip flag (kzip ticket #017, 2026-05-15)
The pack-header entry Type byte of a data/tree blob carries an **opt-in high-nibble flag field** in addition to the low-nibble type discriminator. The dispatch nibble (bits 0–3) stays compatible with the values defined in §4.3 (`0` = DataBlob, `1` = TreeBlob, `2` = DataBlob compressed, `3` = TreeBlob compressed). Bits 4–7 are reserved for kzip extensions; the first defined flag is:
| Bit | Mask | Name | Semantics |
|---|---|---|---|
| 7 | `0x80` | `FlagFilterSkipped` | The chain declared in this pack's `Type=4` descriptor was **withheld** for this blob. The reader MUST decompress as usual but MUST NOT call `chain.Reverse`; the post-decompress buffer IS the original plaintext. |
| 4–6 | `0x70` | reserved | Future kzip flags. Readers MUST mask them out (`Type & 0xF0` carries the full flag field today; older readers that lacked this field encoded only bit 7). |
Why this exists: content-aware filters like `bcj-x86` only pay off when the whole chunk is the right kind of data. Content-defined chunking can produce chunks that span tar headers, ELF `.text`, debug info, and unrelated files (see ticket #016: a tar of `/usr/bin` showed a +17.6% regression with `bcj-x86,delta:1`). When the per-chunk sniffer (`internal/sniff/`) classifies a chunk as not matching the declared chain, the writer (`saveAndEncrypt`) skips `chain.Apply` and sets bit 7 on the Type byte so the reader knows to skip `chain.Reverse`.
Backward-compat:
- Bit 7 is **opt-in via writer choice**: a kzip backup without a content-aware filter chain never sets it. Repos written by kzip 0.1.0 (pre-#017) are byte-identical to repos written by kzip with #017 on empty/non-BCJ chains.
- A pre-#017 kzip reader sees `Type=0x82` (compressed data + flag) and falls into the `default → error` branch of the entry-type switch, refusing to decode the pack. This is **forward-incompatible for opt-in repos only**: users who never invoke BCJ chains never hit it. If strict forward-compat is needed, gate emission of bit 7 behind a future `repo.Version` bump (path C of the #016 §13.3 design).
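The Type-byte packing from the flag table is plain bit arithmetic. Here is a self-contained sketch with hypothetical names (`encodeType`, `decodeType`, `shouldReverse`) rather than the `restic.BlobFlags` API used in the fragments that follow:

```go
package main

const (
	typeMask          = 0x0F // dispatch nibble: 0..3 per §4.3
	flagMask          = 0xF0 // kzip flag nibble (bits 4-7)
	flagFilterSkipped = 0x80 // bit 7: chain withheld for this blob
)

// encodeType combines the dispatch type with the flag nibble.
func encodeType(dispatch, flags byte) byte {
	return (dispatch & typeMask) | (flags & flagMask)
}

// decodeType splits a Type byte; entry parsing must switch on tpe only.
func decodeType(b byte) (tpe, flags byte) {
	return b & typeMask, b & flagMask
}

// shouldReverse reports whether the reader must run chain.Reverse after
// decompression, i.e. whether the writer actually applied the chain.
func shouldReverse(flags byte) bool {
	return flags&flagFilterSkipped == 0
}
```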
Encode side (writer, kzip ≥ 0.2.0):

```go
chain := r.filterChain
flags := restic.BlobFlags(0)
if chain.NeedsSniffing() && t == DataBlob {
	if !chain.Matches(sniff.Sniff(data)) {
		flags |= restic.FlagFilterSkipped // skip chain.Apply for this blob
	}
}
// pack header entry Type byte = (dispatch_type & 0x0F) | (flags & 0xF0)
```

Decode side (reader, kzip ≥ 0.2.0):

```go
encodedType := p[0]
b.Flags = restic.BlobFlags(encodedType & 0xF0)
tpe := encodedType & 0x0F // 0..3, same as pre-#017
// ... decompress ...
if !b.Flags.Has(restic.FlagFilterSkipped) {
	plaintext = chain.Reverse(plaintext)
}
```

13.4 Pack-embedded recovery record (`PackHeaderEntry.Type = 5`, ratified 2026-05-13 via ticket #009)
A pack file may include a **Reed-Solomon recovery record** that lets the data section be reconstructed in place after partial corruption. The record lives inside the encrypted pack header as a single `Type=5` entry, distinct from the sidecar mode in §13.1, which keeps parity in a separate `.kz.rs` file.
Pack-header wire format (after decrypt, little-endian):

```
+--------+---------------+----------------------------+
| 0x05   | u32 LE length | kzre-v1 payload (length B) |
+--------+---------------+----------------------------+
```

The payload itself is the kzre-v1 blob produced by `internal/krecovery.EncodeEmbedded`:
```
+------+-----+--------+--------+----------+----------+--------+--------------+---------------+
| KZRE | ver | dShard | pShard | reserved | dataSize | sha256 | shard hashes | parity shards |
| (4B) | (1) | (1)    | (1)    | (1)      | (4 LE)   | (32B)  |              |               |
+------+-----+--------+--------+----------+----------+--------+--------------+---------------+
```

- The `KZRE` magic distinguishes it from the sidecar's `KZRS`. `IsEmbedded(blob)` is the cheap 4-byte peek.
- Per-shard SHA-256 covers every shard (data shards, then parity shards). Repair hashes each shard, marks failures as nil, and reconstructs only when `corruptCount ≤ ParityShards`; there is no brute-force suspect loop.
- The little-endian length (`dataSize`) matches §13.2 / §13.3. Sidecar mode stays big-endian for restic-format alignment.
- `pickShards(ratio)` mirrors the sidecar logic: aim for ~256 total shards, parity = `clamp(ratio × 256, 1, 128)`, data = `256 - parity`.
Constraints:
- The `Type=5` entry payload is capped at `MaxRecoveryEntrySize = 8 MiB`. A 16 MiB pack at 50% ratio stays well under this; the cap is a sanity gate against bogus length fields.
- Multiple `Type=5` entries in the same header are rejected (one record per pack).
- Per-shard hashes are checked before any Reed-Solomon math; corrupt shards are identified in O(N) rather than tried in turn.
Backward-compat for readers without recovery support:
- A v1 reader (restic v0.18.x or pre-#009 kzip) that encounters `Type=5` MUST skip the entry; the u32 length tells it how many bytes to advance. Skipping is safe: the data section is byte-identical with or without the recovery record (the parity blob never modifies blob bytes).
- A kzip-#009 reader on a v1 pack without `Type=5` is the common case: no record means no repair is available, and the pack reader returns whatever the underlying check produces.
CLI surface:
| Command | Function |
|---|---|
| `kzip backup --recovery-records=N%` | Emit `Type=5` in every written pack (#009 R3) |
| `kzip recovery repair-pack <pack-id>` | Read `Type=5`, reconstruct the corrupted data section, atomically replace the pack on the backend (#009 R4) |

`kzip recovery {encode,verify,repair}` continue to operate on sidecar `.kz.rs` files, a distinct flow for files outside a repo.
13.5 KZC compressor envelope (payload prefix, ratified 2026-05-19 via ticket #018 / RFC-002)
Per-blob compressor descriptor for pluggable compressor backends (zstd / lzma2 / brotli, with reserved IDs for future first-party backends). The descriptor lives **inside the encrypted payload**, in front of the compressed bytes, not in the pack-header type space: strictly additive, with no `PackHeaderEntry.Type` reallocated.
Post-decryption layout:

```
+------+------+------+------+-----+----------+---------+
| 'K'  | 'Z'  | 'C'  | ver  | CID | reserved | payload |
| 0x4B | 0x5A | 0x43 | 0x01 | N   | 0x00     | ...     |
+------+------+------+------+-----+----------+---------+
```

Reader-side dispatch (post-decryption, pre-decompression):
- Byte 0 = `0x28` → **bare zstd frame** (legacy path, every pre-#018 blob). Decode with the zstd backend.
- Bytes 0..3 = `0x4B 0x5A 0x43 0x01` → **KZC envelope**. Parse byte 4 as the CompressorID and dispatch to the registered backend; bytes 6+ are the compressor-native payload.
- Anything else → corruption or an unknown future format; surface as a typed error pointing the operator at `kzip version` and `kzip migrate`.
CompressorID registry (RFC-002 §3.3):
| ID | Name |
|---|---|
| 0 | reserved (do not emit) |
| 1 | zstd (never emitted in canonical kzip — zstd blobs are bare) |
| 2 | lzma2 |
| 3 | brotli |
| 4..127 | reserved (first-party backends) |
| 128..255 | reserved (downstream forks) |
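The reader-side dispatch reduces to a few byte comparisons. A sketch with hypothetical names; it reports the bare-zstd path as CompressorID 1 for uniformity, even though ID 1 is never written into an envelope, and it leaves registry lookup of the returned ID to the caller.

```go
package main

import (
	"bytes"
	"errors"
)

// kzcMagic is "KZC" followed by envelope version 1.
var kzcMagic = []byte{0x4B, 0x5A, 0x43, 0x01}

var errUnknownPayload = errors.New("unknown payload format; see `kzip version` and `kzip migrate`")

// classify inspects a decrypted blob and returns the compressor ID plus the
// compressor-native bytes. A bare zstd frame starts with 0x28 (the first byte
// of the little-endian zstd magic 0xFD2FB528).
func classify(plain []byte) (id byte, native []byte, err error) {
	switch {
	case len(plain) > 0 && plain[0] == 0x28:
		return 1, plain, nil // legacy bare zstd: the whole buffer is the frame
	case len(plain) >= 6 && bytes.Equal(plain[:4], kzcMagic):
		return plain[4], plain[6:], nil // byte 5 is the reserved 0x00
	}
	return 0, nil, errUnknownPayload
}
```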
Backward-compat:
- Zstd blobs (the default) stay bare. Pre-#018 readers see byte-identical bytes; the format is fully backward-compatible for the common case.
- A pre-#018 reader encountering a KZC-enveloped blob fails at the zstd frame check (`zstd: invalid input`). The AEAD over the ciphertext catches any tampering before decompression, so silent mis-decode is impossible.
- A post-#018 reader on a pre-#018 pack behaves identically to the bare-zstd legacy path. T3 in RFC-002 §8 asserts byte-identical restore.
Mixed-backend repositories are valid; selection is per `SaveBlob`, threaded via `Repository.Options.Compressor`. `kzip check` decodes every blob through the right backend automatically, so operators do not need to track which packs are which.
CLI surface:
| Command | Function |
|---|---|
| `kzip backup --compressor=<name>[:level]` | Pick the backend for the new blobs in this backup |
| `kzip compressor list` | List registered backends with their IDs + level ranges |
| `$KZIP_COMPRESSOR` | Env |
Posture vs §14 divergence policy: this is a **non-breaking** addition; the encrypted-payload envelope is invisible to pack-header parsers. It does not constitute a §14 divergence, and there is no `repository.version` bump.
13.6 Detached-signature sidecar `kzig-v1` (`.kz.sig`, ratified 2026-05-19 via ticket #023)
Out-of-band Ed25519 signature for any file (compressed or not). The sidecar lives next to the covered file with the suffix `.kz.sig`, analogous to the `.kz.rs` parity sidecar from §13.1.
Wire format (fixed 101 bytes, little-endian where applicable):

```
+------+------+--------+-----------+
| 4B   | 1B   | 32B    | 64B       |
| KZIG | algo | pubkey | signature |
+------+------+--------+-----------+
```

| Field | Width | Encoding | Meaning |
|---|---|---|---|
| `magic` | 4B | bytes | Exactly `0x4B 0x5A 0x49 0x47` (`KZIG`). Distinguishes from `KZRS` (RS parity sidecar) and `KZ\x01\x00` (single-file stream). |
| `algo` | 1B | u8 | Signature algorithm. `0x01` = Ed25519 is the only value defined; readers MUST reject other values. |
| `pubkey` | 32B | bytes | Ed25519 public key of the signer. Embedded so verification works without an out-of-band key; use `--pubkey` to pin a known signer. |
| `signature` | 64B | bytes | Ed25519 signature over `SHA-256(covered-file)`. |
Constraints:
- `len(file) == 101`. The reader rejects shorter or longer files with `kzsig: wrong sidecar size`.
- `algo` MUST be `0x01` (Ed25519). Future algorithms add new values; writers MUST NOT emit a value the spec hasn't ratified.
- The signed message is `SHA-256(covered-file)`, not the file itself; this keeps the signing primitive's input at 32 bytes regardless of covered-file size.
Identity file formats (companion to the §3.4.x recipient files):

```
Signing identity  (private, 0600): KZIP-SIGNING-IDENTITY-V1-<base64-32B-seed>
Signing recipient (public, 0644):  KZIP-SIGNING-RECIPIENT-V1-<base64-32B-pub>
```

Suffixes `.kzkey` (private) and `.kzpub` (public) are the canonical extensions.
CLI surface (kzip #023):
| Command | Function |
|---|---|
| `kzip key gen-signing-identity --output <prefix>` | Mint a fresh Ed25519 keypair; writes `<prefix>.kzkey` + `<prefix>.kzpub` |
| `kzip sign --signing-identity <file> <path>...` | Produce `<path>.kz.sig` for each file |
| `kzip verify <path>...` | Validate `<path>.kz.sig` against the file; exit 0 only on full success |
| `kzip verify --pubkey <hex\|file>` | Hardening: require the sidecar's embedded pubkey to match the pinned one |
| `kzip verify --keyring <dir>` | Hardening: require the sidecar's pubkey to be one of the `.kzpub` files in the directory |
Posture vs §14 divergence policy: the sidecar is out-of-band; it does NOT touch the pack-format wire layout and is invisible to restic v0.18.x. This is **non-breaking**.
Why a separate file from .kz.rs? Both sidecars sit beside the covered file but answer different questions. .kz.rs answers "can I reconstruct corrupted bytes?"; .kz.sig answers "who created these bytes?". Keeping them in distinct files lets operators ship parity without exposing the signing key, or sign without paying the parity overhead.
14. Divergence policy
Mudanças que *uebram*a compat byteabyte com restic v0.18.x:
- Require a new RFC (e.g. `kzip-RFC-002-format-divergence.md`) covering:
  - Justification (a feature that is impossible with the current format)
  - Migration path (forward-compatible where possible)
  - A bump of `repository.version` (3 or higher)
  - Lifecycle:
    - A `--migrate` command to convert v2 repos → vN
    - Read support for v2 kept for ≥1 year after the bump
  - An explicit snapshot note: `kzip 0.X.0 introduced repo v3, see CHANGELOG`
Changes that *preserve* byte compatibility (non-breaking):
- Adding new JSON fields (forward-compatible by convention)
- Adding Koder-specific tags
- Using compression settings restic already understands (e.g. higher zstd levels)
These do not require an RFC; an update to this spec plus a CHANGELOG entry is enough.
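"Forward-compatible by convention" relies on readers ignoring unknown JSON keys. Assuming readers decode with Go's `encoding/json` defaults (restic is written in Go), that holds unless a decoder opts into `DisallowUnknownFields`. The sketch shows both behaviors; the trimmed snapshot struct and the `koder_tier` field are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// legacySnapshot is an older reader's view that knows only some fields
// (names loosely follow restic's snapshot JSON; illustrative only).
type legacySnapshot struct {
	Hostname string   `json:"hostname"`
	Paths    []string `json:"paths"`
}

// decodeLenient uses Go's default behavior: unknown keys are ignored.
func decodeLenient(raw []byte) (legacySnapshot, error) {
	var s legacySnapshot
	err := json.Unmarshal(raw, &s)
	return s, err
}

// decodeStrict opts into rejecting unknown keys, which would turn an
// additive field into a breaking change.
func decodeStrict(raw []byte) (legacySnapshot, error) {
	var s legacySnapshot
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	err := dec.Decode(&s)
	return s, err
}

func main() {
	raw := []byte(`{"hostname":"h","paths":["/srv"],"koder_tier":"gold"}`)
	s, err := decodeLenient(raw)
	fmt.Println(s.Hostname, err) // h <nil>
	_, err = decodeStrict(raw)
	fmt.Println(err != nil) // true
}
```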
14.5 Fifth active divergence: Configurable blob-ID hash algorithm (kzip tickets #026/#028, 2026-05-19)
A repository's blob-ID hash algorithm is selectable at init time via `kzip init --hash=<name>`. The choice is persisted in `Config.HashAlg`; subsequent operations dispatch through that algorithm transparently. Registered algorithms (kzip #026):
| Name | Digest size | Notes |
|---|---|---|
| `sha256` | 32 bytes | Default. restic-compatible (the only value upstream restic recognizes). |
| `blake3` | 32 bytes | ~2–3× faster than SHA-256 on modern x86. |
All registered algorithms produce 32-byte digests, so the `[32]byte` on-disk ID layout is unchanged. Variable-length digests (e.g. 8-byte xxhash) would require a separate divergence and are deferred.
`Config.HashAlg = ""` or `"sha256"` is byte-compatible with restic v0.18.x (the legacy default). `Config.HashAlg = "blake3"` is the breakage point: restic will fall through to SHA-256 on read and find no matching blob IDs, failing cleanly without silent corruption.
Dispatch contract:
- Process-wide: `restic.SetHashAlg(name, fn)` sets a package-level dispatch target consumed by `restic.Hash(data)`. The first repo opened in a process locks the algorithm; subsequent opens with a different algorithm return `ErrHashAlgConflict`.
- Per-blob: not supported. The whole repo uses one algorithm. Cross-algorithm mid-life migration is not supported in v1; the supported workflow is restore-into-staging + re-init-with-new-hash + re-backup.
14.4 Fourth active divergence: Per-blob filter-skip flag (kzip ticket #017, 2026-05-15)
The high nibble (`0xF0`) of the pack-header entry `Type` byte (§13.3.1) encodes opt-in `BlobFlags`. The first defined flag is `FlagFilterSkipped = 0x80`. A pre-#017 kzip reader or restic v0.18.x sees `Type=0x80..0xFF`, falls into the `default` branch and reports `invalid type`: the same one-directional break as §14.1/§14.3. Opt-in is via writer behavior: kzip without a content-aware filter chain (no `bcj*` in `--filter`) never sets the bit and emits byte-identical pre-#017 packs.
Skip-by-length contract: the high nibble does not change entry size; a generic skip-unknown-types reader (which neither restic nor kzip implements) could mask `Type & 0x0F` to recover the dispatch nibble and proceed. Current readers do not do that; they reject the byte outright on the strict-fail-is-better-than-silent-corruption principle (§14.3 rationale).
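Splitting the `Type` byte takes two masks. The Go sketch contrasts a legacy reader, which dispatches on the raw byte, with a flag-aware one that masks `Type & 0x0F`. The type set 0 to 3 (data, tree, compressed data, compressed tree) follows restic's design document; having the flag-aware reader reject unratified flag bits is an assumption chosen to match the strict-fail posture, and the function names are hypothetical.

```go
package main

import "fmt"

const (
	typeMask          byte = 0x0F // dispatch nibble
	flagMask          byte = 0xF0 // BlobFlags nibble
	FlagFilterSkipped byte = 0x80 // first defined flag (#017)
)

// resticTypes: data, tree, compressed data, compressed tree.
var resticTypes = map[byte]bool{0: true, 1: true, 2: true, 3: true}

// splitType separates the dispatch nibble from the flag nibble.
func splitType(b byte) (typ, flags byte) {
	return b & typeMask, b & flagMask
}

// legacyAccepts models restic v0.18.x or pre-#017 kzip: the raw byte is
// dispatched as-is, so any set flag bit lands in the default branch.
func legacyAccepts(b byte) bool {
	return resticTypes[b]
}

// flagAwareAccepts models a post-#017 reader for blob entries.
func flagAwareAccepts(b byte) bool {
	typ, flags := splitType(b)
	if flags&^FlagFilterSkipped != 0 {
		return false // unratified flag bit: fail rather than guess
	}
	return resticTypes[typ]
}

func main() {
	b := 0x02 | FlagFilterSkipped // compressed data blob, filter skipped
	typ, flags := splitType(b)
	fmt.Printf("type=%d flags=%#x legacy=%v aware=%v\n",
		typ, flags, legacyAccepts(b), flagAwareAccepts(b))
}
```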
14.3 Third active divergence: Type=5 recovery records (kzip ticket #009, 2026-05-13)
Pack files written with `kzip backup --recovery-records=N%` carry a Type=5 entry per §13.4. Per §13 (header type-byte legend), values 5-255 are reserved for kzip extensions; restic v0.18.x does not recognize this type byte. The current restic vendor parser returns `invalid type 5` and aborts, the same posture as Type=4 (filter chain).
Skip-by-length is the contract for both: a v1 reader that knows to advance by the u32-encoded length field (Type=4 and Type=5 share this layout) can still read the regular blob entries. Stock restic does not do that; it returns an error on the first unknown type. The breakage is one-directional: kzip can read packs that restic wrote (never any Type=5); restic can NOT read kzip packs written with `--recovery-records` set.
Hard-fail is strictly safer than silent skip here: silent skip would mean a restic operator sees the pack as parseable but loses all repair capacity without knowing.
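The difference between stock restic and a skip-by-length reader is easiest to see on a toy layout: 1-byte type, little-endian `u32` length, then that many payload bytes. This layout is hypothetical and simpler than the real §13.3.1/§13.4 entries. The sketch also shows the hazard described above: the lenient reader succeeds while silently dropping the Type=5 entry.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// entry is one toy header entry (hypothetical layout).
type entry struct {
	Type    byte
	Payload []byte
}

var errTruncated = errors.New("truncated entry")

// appendEntry serializes one toy entry.
func appendEntry(buf []byte, typ byte, payload []byte) []byte {
	var n [4]byte
	binary.LittleEndian.PutUint32(n[:], uint32(len(payload)))
	buf = append(buf, typ)
	buf = append(buf, n[:]...)
	return append(buf, payload...)
}

// parse walks the entries. strict=true models stock restic (error on the
// first unknown type); strict=false models a skip-by-length reader.
func parse(buf []byte, known map[byte]bool, strict bool) ([]entry, error) {
	var out []entry
	for len(buf) > 0 {
		if len(buf) < 5 {
			return nil, errTruncated
		}
		typ := buf[0]
		n := binary.LittleEndian.Uint32(buf[1:5])
		if uint64(len(buf)-5) < uint64(n) {
			return nil, errTruncated
		}
		payload := buf[5 : 5+n]
		buf = buf[5+n:]
		switch {
		case known[typ]:
			out = append(out, entry{typ, payload})
		case strict:
			return nil, fmt.Errorf("invalid type %d", typ)
		default:
			// Skip-by-length: consumed but not interpreted.
		}
	}
	return out, nil
}

func main() {
	known := map[byte]bool{0: true, 1: true}
	var pack []byte
	pack = appendEntry(pack, 0, []byte("data"))
	pack = appendEntry(pack, 5, []byte("parity"))
	pack = appendEntry(pack, 1, []byte("tree"))

	_, err := parse(pack, known, true)
	fmt.Println(err) // invalid type 5
	got, err := parse(pack, known, false)
	fmt.Println(len(got), err) // 2 <nil>: parity silently dropped
}
```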
14.2 Second active divergence: recipient-mode `keys/<id>` (kzip ticket #005, 2026-05-13)
Recipient-mode key files (§3.4.1) carry `recipients` instead of `kdf`/`data`. A restic v0.18.x reader treats the file as valid JSON but fails to open it ("KDF is not scrypt"); it does NOT silently corrupt anything, because the failed open is read-only and side-effect-free. kzip provides `OpenKeyWithIdentity` to consume these files; mixed-mode repos (some password keys, some recipient keys) work transparently in both directions via the routing-error sentinels.
This divergence is opt-in: users who don't pass `--recipient` to `kzip init` get a fully restic-compatible repo. Once a recipient-mode key file lands, the operator commits to kzip-aware tooling for opening, but the pack files and snapshots remain byte-compatible as long as no `--filter=` is used (cf. §14.1).
14.1 First active divergence: Type=4 (kzip ticket #015, 2026-05-13)
The filter-chain descriptor (§13.3) became the *first byte-compat-breaking divergence shipped to users*. A restic v0.18.x reader hitting a pack with a Type=4 entry returns `invalid type 4` and aborts. This is acceptable because:
- The break only manifests on packs *actually written with `--filter=…`*. No-filter packs (the byte-compat default) still parse cleanly under stock restic.
- The reverse-pipeline contract (decrypt → decompress → `chain.Reverse`) is the only way to honour the data; a non-filter-aware reader cannot apply the last step, so its plaintext would silently be the filtered (incorrect) form. Hard failure is strictly safer than silent corruption.
- The `repository.version` field is unchanged (still 2). Migration is one-way: kzip-written filtered repos cannot be downgraded to restic without re-archiving from source. Plain (no `--filter=`) kzip repos remain interchangeable with restic v0.18.x.
Future byte-compat-breaking extensions (e.g. pack-embedded Type=5 RS, #009) follow the same pattern: gated by an opt-in CLI surface, hard-fail on legacy readers, no silent-misread escape hatch.
- Regression tests in `engines/compress/kzip/tests/regression/` must include golden-hash comparisons against restic v0.18.x binaries to guarantee interop.
- Tests in `engines/compress/kzip/{cmd,internal}*_test.go` (the test suite inherited from restic v0.18.1, top-level since #008) are preserved as-is.
Annex A: kzip ↔ restic mapping
During the bootstrap, every restic term is equivalent to its kzip counterpart. Mapping:
| Restic | Kzip | Notes |
|---|---|---|
| `restic init` | `kzip init` | same behavior |
| repository | repository (`.kz/` or `.tar.kz`) | single `.kz` extension, dispatch by magic bytes (§11) |
| pack file | pack file | identical layout |
| blob | blob | identical |
| snapshot | snapshot | identical |
| scrypt KDF | scrypt KDF | identical |
| AES | AES | identical |
| Rabin chunker | Rabin chunker | identical |
Once kzip starts to diverge (ticket #003 BCJ filters, etc.), entries will be added to this annex along with the divergence date.