Kzip — Format Spec (v1 — restic-compatible bootstrap)

mandatory

Canonical format of the .kz files produced by kzip, the Koder Stack compressor. A single extension covers every mode (single-file, multi-file tar, directory); dispatch is by magic bytes, not by suffix. During the bootstrap (v1), the format is byte-for-byte compatible with the restic v0.18.x repository format, which is the single source of truth. Future divergences require a major version bump, an explicit ticket, and an incompatibility note.

Kzip Format Specification — v1 (restic-compatible)

0. Stage and stability

This v1 of the format is restic-compatible during the bootstrap. Canonical reference documents:

The spec below summarizes the contract; in case of conflict, the restic documents above prevail (until v1 is ratified).

1. File extension

Everything kzip emits uses the .kz extension, regardless of mode. Dispatch between single-file, multi-file, and sidecar is done by magic bytes (see §11), not by suffix, in line with .zst/.gz/.xz (one extension per format, varied payloads).

Form | Mode | Notes
<name>.kz (file) | Single-file stream | Compressed stream frame analogous to .zst. Reserved but not emitted by the v0.1 bootstrap (to be defined in ticket #010; see §13.2).
<name>.tar.kz (file) | Packed multi-file | kzip repository bundled into a .tar (transport/distribution). The client unpacks it before operating.
<name>.kz/ (directory) | Unpacked multi-file | kzip repository as a file tree (native layout from §2). The typical working mode.
<name>.kz.rs (sidecar) | Reed-Solomon parity sidecar | See §13.1. The .rs suffix is appended after the covered file's .kz suffix (backup.kz → backup.kz.rs).

History: earlier versions of the spec used .kzip (multi-file) and .kzrs (sidecar). They were unified into .kz / .kz.rs on 2026-05-12, before any data existed in production; no migration path is needed.

Note: restic uses no extension for repositories. kzip adopts .kz (as a directory or tar suffix) to signal the format in the name.

2. Repository layout

A kzip repository is a directory with the following structure. All files below are encrypted with the repository key (except config, which has an unencrypted header for detection).

<repo-root>/
├── config                    ← repository config (JSON, encrypted body)
├── keys/<id>                 ← derived keys + Argon2 KDF parameters
├── data/<2-hex>/<pack-id>    ← packs (concatenated + compressed blobs)
├── index/<index-id>          ← index maps (blob-id → pack-id+offset+length)
├── snapshots/<snap-id>       ← snapshot metadata (timestamp, paths, tree-id)
├── locks/<lock-id>           ← exclusion locks (TTL ~30 min)
└── HEAD                      ← (optional) pointer to the most recent snapshot

All IDs are SHA-256 hashes in lowercase hex (64 chars).
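A minimal Go sketch of the ID encoding and of the data/<2-hex>/ fan-out from the layout above. It assumes only what this section states (lowercase-hex SHA-256); which bytes are hashed for each object type is defined in the relevant section (blobs hash their plaintext, §4.2), and the helper names ID and packPath are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path"
)

// ID returns the lowercase-hex SHA-256 digest of b (always 64 characters).
func ID(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]) // hex.EncodeToString emits lowercase
}

// packPath fans packs out into data/<first two hex chars>/<pack-id>.
// It assumes packID is a well-formed 64-char ID (no validation here).
func packPath(packID string) string {
	return path.Join("data", packID[:2], packID)
}

func main() {
	id := ID([]byte("abc"))
	fmt.Println(id)
	fmt.Println(packPath(id))
}
```

The two-character prefix caps the number of entries per directory at roughly 1/256 of the pack count, which keeps directory listings manageable on large repositories.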

3. Crypto

3.1 Master key derivation

  • KDF: Argon2id (default time=3, memory=64 MiB, parallelism=1).
  • Salt: 64 bytes random per-key.
  • Output: 64 bytes (32 encryption + 32 MAC).

3.2 Encryption

  • Cipher: AES-256-CTR.
  • IV: 16 bytes random per blob.
  • MAC: Poly1305 (16-byte tag, 32-byte MAC key) over (IV || ciphertext).
  • AEAD construction: encrypt-then-MAC.

3.3 Wire format of each encrypted blob

+----------------------+----------------+-------------------------+
| IV (16 bytes random) | ciphertext (N) | Poly1305 tag (16 bytes) |
+----------------------+----------------+-------------------------+

Total = IV(16) + N + tag(16). N may be 0 (an empty plaintext is valid).

3.4 Repository key

Each keys/<id> contains JSON whose data field holds the encrypted master key:

{
  "created": "<RFC3339>",
  "username": "<string>",
  "hostname": "<string>",
  "kdf": "argon2id",
  "n": 524288, "r": 1, "p": 1,    // Argon2 params
  "salt": "<base64>",
  "data": "<base64-encrypted-master-key>"
}

The operator's password derives a key-encryption key (KEK) via Argon2id; the KEK decrypts data to obtain the repository's actual master key. Multiple keys/ entries can coexist (multi-user / rotation).

3.4.1 Recipient mode: X25519 public-key wrap (ticket #005, ratified 2026-05-13)

For unattended automation (CI, cron, a runner without a password), a keys/<id> can be in recipient mode: the master key travels wrapped for one or more X25519 public keys instead of being unlocked with a password-derived key. The JSON omits kdf/n/r/p/salt/data and carries:

{
  "created": "<RFC3339>",
  "username": "<string>",
  "hostname": "<string>",
  "recipients": [
    {
      "type": "x25519-kzip-v1",
      "recipient_fingerprint": "<base64-32-bytes-X25519-pub>",
      "envelope": "<base64-of-envelope-bytes>"
    }
  ]
}

Each envelope is the output of internal/pubkey.Recipient.Wrap:

+----------------+--------+---------------------------+
| ephemeral_pub  | nonce  | ciphertext ‖ Poly1305 tag |
|   (32 bytes)   |(12 B)  |        (variable)         |
+----------------+--------+---------------------------+

Wrapping key:

shared = ECDH(ephemeral_priv, recipient_pub)
key    = HKDF-SHA256(shared, salt="", info="kzip-pubkey-v1")
ct/tag = ChaCha20-Poly1305.Seal(key, nonce, master_key_json, AD=ephemeral_pub)

Properties:

  1. Payload determinism: master_key_json is the JSON marshaling of the master *crypto.Key, bit-identical to the plaintext that password mode stores in data. Reading via OpenKeyWithIdentity yields the same crypto.Key that OpenKey would.
  2. Ephemeral key per Wrap: each envelope carries a fresh ephemeral_pub. Two envelopes for the same recipient and the same master key are distinct (anti-dedup).
  3. AEAD covers the whole envelope: ephemeral_pub is the associated data, so altering any byte (ephemeral_pub, nonce, ciphertext, tag) surfaces as ErrUnwrap.
  4. Modes are mutually exclusive per file: a keys/<id> in recipient mode has no kdf, and vice versa. OpenKey rejects recipient mode with ErrRecipientMode; OpenKeyWithIdentity rejects password mode. SearchKey and SearchKeyWithIdentity silently skip the wrong mode, so a repo can hold mixed keys.
  5. Fingerprint routing: OpenKeyWithIdentity only tries envelopes whose recipient_fingerprint matches identity.Recipient().Bytes(), which avoids a timing oracle over the other recipients in the same file.

File format on disk (identity/recipient files)

Operators exchange the public file (<prefix>.pub) and keep the private file (<prefix>.key, perm 0600). Text layout, one line:

KZIP-IDENTITY-V1-<base64-StdEncoding(priv 32B)>
KZIP-RECIPIENT-V1-<base64-StdEncoding(pub 32B)>

The distinctive prefix prevents a footgun (a private key in a public-key slot, or vice versa): internal/pubkey.ReadIdentityFile / ReadRecipientFile return ErrBadFormat before any decoding.

CLI surface

Command | Function
kzip key gen-identity --output <prefix> | Generates <prefix>.key (private, 0600) + <prefix>.pub (public, 0644)
kzip init --recipient <pub> (1+) | Bootstraps a recipient-only repo, with no password
kzip key add-recipient --pubkey <pub> | Adds a recipient to an existing repo (the master key comes from the current open)
kzip --identity <key> <cmd…> | Global flag: opens the repo via X25519 instead of a password
kzip key list | Mode column (password/recipient) + Fingerprints

Byte-compat divergence

Recipient-mode keys/<id> files are readable by restic v0.18.x as JSON, but the missing kdf/data fields make restic's open path fail immediately ("KDF is not scrypt"). This is a one-way forward-compat divergence: kzip reads what restic writes, but restic cannot open recipient-mode keys. Listed in §14.

3.4.2 Future — age compatibility (separate ticket)

An envelope with type: "age-v1" would be orthogonal: same file, different recipient. The current ticket (#005) does not implement age compat, to avoid dragging in age's header/armor format; the implementation stays minimal until there is concrete demand.

4. Pack files

Pack files aggregate many blobs into a single file to amortize I/O overhead and improve compression.

4.1 Layout

+---------------+---------------+-----+---------------+--------------------+---------------+
| blob 1        | blob 2        | ... | blob N        | header (encrypted) | header length |
|               |               |     |               |                    | (uint32 LE)   |
+---------------+---------------+-----+---------------+--------------------+---------------+

The encrypted header ends at EOF - 4; the last 4 bytes of the file hold its length (uint32 LE).
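Locating the encrypted header from the trailer can be sketched as below (Go, standard library). It only slices bytes; decrypting the region per §3.3 is out of scope, and encryptedHeader is a hypothetical helper name.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// encryptedHeader returns the still-encrypted header bytes of a pack file.
// The last 4 bytes hold the header length (uint32 LE); the header ends right before them.
func encryptedHeader(pack []byte) ([]byte, error) {
	if len(pack) < 4 {
		return nil, errors.New("pack too short for length trailer")
	}
	end := len(pack) - 4
	hdrLen := binary.LittleEndian.Uint32(pack[end:])
	if uint64(hdrLen) > uint64(end) {
		return nil, errors.New("header length exceeds pack size")
	}
	return pack[end-int(hdrLen) : end], nil
}

func main() {
	// Toy pack: 8 bytes of "blobs", a 3-byte "header", then the length trailer.
	pack := []byte("blobdataHDR")
	var trailer [4]byte
	binary.LittleEndian.PutUint32(trailer[:], 3)
	pack = append(pack, trailer[:]...)

	hdr, err := encryptedHeader(pack)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", hdr) // HDR
}
```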

4.2 Pack header (after decryption)

type PackHeaderEntry struct {
    Type   byte    // 0=data, 1=tree, 2=data-compressed, 3=tree-compressed
    Length uint32  // length of ciphertext (= encrypted-and-possibly-compressed bytes)
    // If Type ∈ {2, 3} an additional uint32 LE follows here:
    //   UncompressedLength uint32 — length of plaintext before compression
    ID     [32]byte // SHA-256 of plaintext
}

The whole header is a sequence of PackHeaderEntry records, encrypted as one AEAD blob (§3.3).

Type byte legend (revised 2026-05-12 to match restic v0.18.x semantics): values 0–3 are exhausted by the (DataBlob, TreeBlob) × (compressed, uncompressed) cross-product. The "padding (legacy)" reading in earlier drafts of this spec was incorrect; byte 2 is DataBlob compressed in restic v0.18.x and is reserved with that meaning here. Values 4–255 are available for kzip extensions (see §13).
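A sketch of walking the decrypted header entries, assuming the §4.2 layout exactly: 37 bytes for types 0/1 and 41 bytes for types 2/3. It deliberately rejects anything else, so the kzip extension entries (§13) and the high-nibble flags of §13.3.1 are not handled here; Entry, parseEntries, and le32 are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errTruncated = errors.New("truncated pack header entry")

// Entry mirrors PackHeaderEntry from §4.2.
type Entry struct {
	Type               byte
	Length             uint32
	UncompressedLength uint32 // set only for Type 2 and 3
	ID                 [32]byte
}

// parseEntries decodes a decrypted pack header into its entries.
func parseEntries(hdr []byte) ([]Entry, error) {
	var out []Entry
	for len(hdr) > 0 {
		if len(hdr) < 5 {
			return nil, errTruncated
		}
		e := Entry{Type: hdr[0], Length: binary.LittleEndian.Uint32(hdr[1:5])}
		hdr = hdr[5:]
		switch e.Type {
		case 0, 1: // uncompressed data / tree: no extra field
		case 2, 3: // compressed data / tree carry UncompressedLength
			if len(hdr) < 4 {
				return nil, errTruncated
			}
			e.UncompressedLength = binary.LittleEndian.Uint32(hdr[:4])
			hdr = hdr[4:]
		default:
			return nil, fmt.Errorf("unsupported entry type %d", e.Type)
		}
		if len(hdr) < 32 {
			return nil, errTruncated
		}
		copy(e.ID[:], hdr[:32])
		hdr = hdr[32:]
		out = append(out, e)
	}
	return out, nil
}

// le32 encodes v as 4 little-endian bytes (helper for building demo headers).
func le32(v uint32) []byte {
	var b [4]byte
	binary.LittleEndian.PutUint32(b[:], v)
	return b[:]
}

func main() {
	id := make([]byte, 32)
	hdr := append([]byte{0}, le32(100)...) // data blob, 100 bytes
	hdr = append(hdr, id...)
	hdr = append(hdr, 3) // compressed tree blob
	hdr = append(hdr, le32(40)...)
	hdr = append(hdr, le32(90)...)
	hdr = append(hdr, id...)
	entries, err := parseEntries(hdr)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(entries), entries[1].UncompressedLength) // 2 90
}
```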

4.3 Blob types

Type | Content | Compressed | Produced by
data (0) | File chunk (raw file bytes after chunking) | no | backup, before encryption
tree (1) | Serialized JSON of a directory tree | no | backup, as each directory is uploaded
data-c (2) | File chunk, compressed with zstd | yes | backup, before encryption
tree-c (3) | Directory-tree JSON, compressed with zstd | yes | backup, as each directory is uploaded

Compressed variants carry an extra 4-byte LE UncompressedLength field in the header entry (between Length and ID) so the reader can preallocate the decode buffer. The default compressor is zstd level 3 (see §10).

5. Content-defined chunking (CDC)

  • Algorithm: Rabin fingerprint over a rolling window.
  • Polynomial: random per repo (generated at init, stored in config).
  • Sizes: min = 512 KiB, max = 8 MiB, target = 1 MiB (restic defaults; may become tunable in a future RFC).
  • Boundary: hash mod 2²⁰ == 0 (adjustable to hit the target).

Each chunk becomes a data blob (after deduplication by hash).

6. Trees

A tree blob is serialized JSON:

{
  "nodes": [
    {
      "name": "filename",
      "type": "file" | "dir" | "symlink" | "fifo" | "socket" | "blockdev" | "chardev",
      "mode": "0644",
      "mtime": "<RFC3339>",
      "atime": "<RFC3339>",
      "ctime": "<RFC3339>",
      "uid": 1000, "gid": 1000,
      "user": "koder", "group": "koder",
      "size": 12345,
      "content": ["<blob-id>", "<blob-id>"],         // for files
      "subtree": "<tree-id>",                         // for dirs
      "linktarget": "<path>",                         // for symlinks
      "extended_attributes": [{"name":"...","value":"<base64>"}]
    }
  ]
}

xattrs and ACLs are preserved via extended_attributes. Hard links are not deduplicated explicitly: identical content arrays imply identical content, but inode identity is not preserved.
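A Go sketch of decoding a tree blob into typed nodes and collecting the content blob IDs its file nodes reference (the kind of walk that check or prune performs). The struct covers only a subset of the fields shown, keeps timestamps and mode as strings as in the JSON above, and the names Tree, Node, and contentBlobs are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Node is a subset of the tree-node fields shown above.
type Node struct {
	Name       string   `json:"name"`
	Type       string   `json:"type"`
	Mode       string   `json:"mode,omitempty"`
	Mtime      string   `json:"mtime,omitempty"` // RFC3339, kept as a string
	UID        uint32   `json:"uid"`
	GID        uint32   `json:"gid"`
	Size       uint64   `json:"size,omitempty"`
	Content    []string `json:"content,omitempty"`    // files only
	Subtree    string   `json:"subtree,omitempty"`    // dirs only
	LinkTarget string   `json:"linktarget,omitempty"` // symlinks only
}

type Tree struct {
	Nodes []Node `json:"nodes"`
}

// contentBlobs returns the data-blob IDs referenced by the file nodes of one tree.
// Subtrees are not followed; a full walk would load each Subtree ID recursively.
func contentBlobs(raw []byte) ([]string, error) {
	var t Tree
	if err := json.Unmarshal(raw, &t); err != nil {
		return nil, err
	}
	var ids []string
	for _, n := range t.Nodes {
		if n.Type == "file" {
			ids = append(ids, n.Content...)
		}
	}
	return ids, nil
}

func main() {
	raw := []byte(`{"nodes":[
	  {"name":"a.txt","type":"file","mode":"0644","uid":1000,"gid":1000,"size":5,"content":["b1","b2"]},
	  {"name":"sub","type":"dir","subtree":"t2"}]}`)
	ids, err := contentBlobs(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(ids) // [b1 b2]
}
```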

7. Snapshots

Each snapshots/<id> contains encrypted JSON:

{
  "time": "<RFC3339>",
  "tree": "<root-tree-id>",
  "paths": ["/home/user/docs"],
  "hostname": "host",
  "username": "user",
  "uid": 1000, "gid": 1000,
  "tags": ["weekly", "automated"],
  "parent": "<previous-snapshot-id>",   // optional
  "program_version": "kzip 0.1.0-bootstrap (restic-fork)"
}

8. Indices

index/<id> is encrypted JSON that maps each blob ID to its location:

{
  "supersedes": ["<old-index-id>"],
  "packs": [
    {
      "id": "<pack-id>",
      "blobs": [
        {
          "id": "<blob-id>",
          "type": "data" | "tree",
          "offset": 0,
          "length": 4194304,
          "uncompressed_length": 5242880   // optional, for compressed blobs
        }
      ]
    }
  ]
}

prune consolidates multiple indexes into one (replacing them via supersedes).
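Resolving a blob ID to its pack location from one or more decrypted index files can be sketched as follows. It assumes superseded index files have already been dropped (supersedes is parsed but not acted on) and that the first location found for a duplicated blob is good enough; indexFile, location, and buildLookup are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type indexBlob struct {
	ID                 string `json:"id"`
	Type               string `json:"type"`
	Offset             uint32 `json:"offset"`
	Length             uint32 `json:"length"`
	UncompressedLength uint32 `json:"uncompressed_length,omitempty"`
}

type indexPack struct {
	ID    string      `json:"id"`
	Blobs []indexBlob `json:"blobs"`
}

type indexFile struct {
	Supersedes []string    `json:"supersedes,omitempty"`
	Packs      []indexPack `json:"packs"`
}

// location is where a blob's encrypted bytes live inside a pack.
type location struct {
	Pack           string
	Offset, Length uint32
}

// buildLookup merges index files into a blob-id → location map (first hit wins).
func buildLookup(files ...indexFile) map[string]location {
	m := make(map[string]location)
	for _, f := range files {
		for _, p := range f.Packs {
			for _, b := range p.Blobs {
				if _, seen := m[b.ID]; !seen {
					m[b.ID] = location{Pack: p.ID, Offset: b.Offset, Length: b.Length}
				}
			}
		}
	}
	return m
}

func main() {
	var idx indexFile
	raw := []byte(`{"packs":[{"id":"p1","blobs":[{"id":"b1","type":"data","offset":0,"length":4194304}]}]}`)
	if err := json.Unmarshal(raw, &idx); err != nil {
		panic(err)
	}
	loc := buildLookup(idx)["b1"]
	fmt.Println(loc.Pack, loc.Offset, loc.Length) // p1 0 4194304
}
```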

9. Locks

Exclusion locks live in locks/<id>:

{
  "time": "<RFC3339>",
  "exclusive": true | false,
  "hostname": "host",
  "username": "user",
  "pid": 12345
}

TTL ~30 min; abandoned locks expire. Stale locks are detected via PID liveness.

10. Compression

Blobs (data + tree) are compressed before encryption. v1 supports:

Algorithm | Default | Notes
zstd level 3 | yes | balanced perf/ratio; the default
zstd level 1 | opt-in | maximum speed
zstd level 11 | opt-in (--compression max) | maximum compression
none | opt-in (--compression off) | escape hatch when the data is already compressed

LZMA, BWT, and BCJ filters are not supported in v1 (planned in ticket #003).

11. Magic numbers / detection

Because the .kz extension is shared by every mode (§1), dispatch is by content. Given an arbitrary <name>.kz, the CLI determines the mode in this order:

  1. Directory check: if it is a directory, it is an unpacked multi-file repository; open <name>.kz/config.
  2. Tar header: if bytes 257..264 hold the POSIX tar magic ustar\0 (followed by the version field), it is a packed repository; extract it to <name>.kz/ (or stream-process the tar) before operating.
  3. Sidecar magic: first 4 bytes = KZRS → Reed-Solomon sidecar (see §13.1). Canonical file suffix: .kz.rs.
  4. Stream frame magic: first 4 bytes = KZ\x01\x00 (single-file stream; see §13.2 for the full header layout). Decode via the stream decompressor.
  5. Otherwise: error KZIP-FORMAT-001 ("unrecognized kzip payload").

Other detection points:

  • Pack files inside <repo>/data/<2-hex>/: no explicit magic number; detection is by attempting to decrypt the header located via the length in the last 4 bytes.
  • Config: encrypted JSON with a "version": 2 field (after decryption).
  • Current repository version: 2 (same as restic v0.18.x).

12. Endianness

All binary numeric fields are little-endian. The one exception is the .kz.rs sidecar header (§13.1), which is big-endian.

13. Future extension hooks

v1 reserves the following fields for future use without breaking compatibility:

  • PackHeaderEntry.Type values 0–3 are taken by restic blob semantics (see §4.2/§4.3). Value 4 = filter-chain descriptor (see §13.3, ratified 2026-05-12). Value 5 = pack-embedded recovery record (see §13.4, ratified 2026-05-13). Values 6–255 remain reserved for future extensions (signature blob, etc.).
  • Snapshot.tags accepts arbitrary strings for Koder-specific metadata (koder:repo=hub, koder:role=daily-backup).
  • Config JSON accepts unrecognized fields without error (forward-compat); a future kzip can add signing_key_id, recovery_records_enabled, bcj_filter_chain, etc.

13.1 Sidecar artifacts (out-of-band, not part of the repo format)

Some kzip features write sidecar files alongside repo artifacts without modifying the repo format. Sidecars are additive: a v1 reader or restic that doesn't recognize a sidecar simply ignores it.

.kz.rs: Reed-Solomon parity sidecar (kzip ticket #007, v1 sidecar mode):

Filename convention: a .rs suffix applied on top of the covered file's .kz suffix (backup.kz → backup.kz.rs). The internal magic stays KZRS regardless of filename. (Builds before 2026-05-12 wrote .kzrs; readers SHOULD accept both during the deprecation window per ticket #010.)

Layout (big-endian where applicable):

+-----+--------+--------+--------+----------+----------+----------+
| 4B  |  1B    |  1B    |  1B    |   4B BE  |   32B    |  N×B     |
| KZRS| ver=01 | dShard | pShard | dataSize | sha256   |  parity  |
+-----+--------+--------+--------+----------+----------+----------+
  • Magic KZRS (0x4B 0x5A 0x52 0x53); version 0x01.
  • dShard + pShard ≤ 256 (klauspost/reedsolomon constraint).
  • parity = pShard shards of ceil(dataSize / dShard) bytes.
  • Generated by kzip recovery encode <file>; consumed by verify/repair.
  • Out-of-band: removing all .kz.rs files leaves the repo intact and readable by stock restic.

The pack-format-embedded variant landed via kzip ticket #009 (ratified 2026-05-13). It carries parity inside the pack header as a PackHeaderEntry.Type=5 blob with per-shard SHA-256 checksums; see §13.4 for the wire layout and §14.3 for the divergence note. The sidecar form remains supported in parallel for files outside the repo (e.g. raw deploy artifacts). (Drafts of this spec before 2026-05-12 proposed Type=4 for the RS pack-embedded entry; Type=4 was reallocated to the filter-chain descriptor in §13.3 once the actual restic type space was audited.)

13.2 Single-file stream header (.kz, ratified 2026-05-12 via ticket #010; emitter shipped 2026-05-19 via ticket #024)

A single-file .kz artifact has a 48-byte fixed header, an opaque compressed payload, and a 4-byte trailer. The header is not encrypted (the payload may be); §11 detection only requires the first 4 bytes.

+-----+--------+--------+------+-----------+----------+---------+------+
| 4B  |  1B    |  1B    |  2B  |   8B LE   |   32B    | N bytes |  4B  |
| KZ  | ver=01 | comp   | flags| uncomp_sz | sha256   | payload | tlen |
| \01 |        |        | LE   |           | (plain)  | (compr.)| (LE) |
| \00 |        |        |      |           |          |         |      |
+-----+--------+--------+------+-----------+----------+---------+------+
Field | Width | Encoding | Meaning
magic | 4B | bytes | Exactly 0x4B 0x5A 0x01 0x00 (KZ, version prefix, NUL). Distinguishes from KZRS (sidecar) and ustar\0 (tar repo).
ver | 1B | u8 | Header version. 0x01 is the only value defined; readers MUST reject other values.
comp | 1B | u8 | Compressor: 0 = uncompressed; otherwise the RFC-002 CompressorID (§13.5 / internal/compressor/): 1 = zstd, 2 = lzma2, 3 = brotli, 4–127 reserved first-party, 128–255 reserved downstream. The level is not persisted (codecs auto-detect).
flags | 2B | u16 LE | Bit flags. Bit 0 = encrypted_payload (payload is AEAD-wrapped per §3.3); bits 1–15 reserved (MUST be zero on write; readers MUST accept any value to allow forward-compat additions).
uncomp_sz | 8B | u64 LE | Plaintext length. 0 is valid (empty file). Cap = 2⁶³−1 (signed-int interop).
sha256 | 32B | bytes | SHA-256 of the plaintext (pre-compression, pre-encryption).
payload | N bytes | opaque | Compressed payload. If comp == 0, the payload is the raw plaintext (or the AEAD-wrapped raw plaintext when flags.encrypted_payload is set). If comp > 0, the payload is the native byte stream of the named codec (zstd frame, .xz stream, brotli stream).
tlen | 4B | u32 LE | Trailer: byte length of the payload (sanity check against truncation; equals file size − 52).

Endianness rule of thumb: uncomp_sz, flags, and tlen are little-endian (matching §12 and the rest of the kzip pack-file format). The .kz.rs sidecar header (§13.1) is the only big-endian structure in the spec, preserved unchanged for restic-format alignment.

Constraints:

  • len(file) ≥ 52 (48-byte header + 4-byte trailer; an empty payload is allowed). Readers reject shorter files with KZIP-FORMAT-002.
  • tlen + 52 == len(file) MUST hold; otherwise KZIP-FORMAT-003 ("truncated kz stream").
  • If comp == 0, len(payload) == uncomp_sz (or, with encryption, uncomp_sz plus the §3.3 IV and tag overhead).
  • sha256 is computed over the plaintext, not over the ciphertext or the compressed bytes. The reader verifies it after decryption and decompression.

Drafts of this section before 2026-05-19 assigned comp to a zstd-only enum (1 = zstd-1, 2 = zstd-3, 3 = zstd-11, 16–31 reserved for non-zstd). That assignment was never emitted (the section was "reserved but not emitted" until #024) and is superseded by the RFC-002 CompressorID alignment above. The new assignment is canonical going forward; readers MUST follow this table.
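A sketch of writing and reading the simplest case, comp == 0 with flags == 0 (no compressor, no encryption), following the field offsets in the table: magic 0–3, ver 4, comp 5, flags 6–7, uncomp_sz 8–15, sha256 16–47. Compressed and encrypted payloads are rejected rather than handled; encodeRaw and decodeRaw are hypothetical names, and the error strings reuse the codes above.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	headerLen  = 48
	trailerLen = 4
)

var magic = []byte{'K', 'Z', 0x01, 0x00}

// encodeRaw wraps plain in a .kz stream with comp = 0 and no flags set.
func encodeRaw(plain []byte) []byte {
	out := make([]byte, headerLen, headerLen+len(plain)+trailerLen)
	copy(out[0:4], magic)
	out[4] = 0x01                                                // ver
	out[5] = 0x00                                                // comp: uncompressed
	binary.LittleEndian.PutUint16(out[6:8], 0)                   // flags: not encrypted
	binary.LittleEndian.PutUint64(out[8:16], uint64(len(plain))) // uncomp_sz
	sum := sha256.Sum256(plain)
	copy(out[16:48], sum[:]) // sha256 of the plaintext
	out = append(out, plain...)
	var tlen [4]byte
	binary.LittleEndian.PutUint32(tlen[:], uint32(len(plain)))
	return append(out, tlen[:]...) // trailer
}

// decodeRaw validates the header and trailer, then returns the verified plaintext.
func decodeRaw(file []byte) ([]byte, error) {
	if len(file) < headerLen+trailerLen {
		return nil, errors.New("KZIP-FORMAT-002: file shorter than 52 bytes")
	}
	if !bytes.Equal(file[0:4], magic) {
		return nil, errors.New("not a kz stream")
	}
	if file[4] != 0x01 {
		return nil, errors.New("unsupported header version")
	}
	tlen := binary.LittleEndian.Uint32(file[len(file)-trailerLen:])
	if uint64(tlen)+headerLen+trailerLen != uint64(len(file)) {
		return nil, errors.New("KZIP-FORMAT-003: truncated kz stream")
	}
	flags := binary.LittleEndian.Uint16(file[6:8]) // bits 1-15 are ignored, per the spec
	if file[5] != 0 || flags&0x0001 != 0 {
		return nil, errors.New("sketch handles only comp == 0 without encryption")
	}
	payload := file[headerLen : len(file)-trailerLen]
	if binary.LittleEndian.Uint64(file[8:16]) != uint64(len(payload)) {
		return nil, errors.New("uncomp_sz does not match payload length")
	}
	sum := sha256.Sum256(payload)
	if !bytes.Equal(sum[:], file[16:48]) {
		return nil, errors.New("sha256 mismatch")
	}
	return payload, nil
}

func main() {
	kz := encodeRaw([]byte("hello"))
	fmt.Println(len(kz)) // 57
	plain, err := decodeRaw(kz)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", plain) // hello
}
```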

CLI surface (kzip #024):

  • kzip compress <file> [--compressor=<name>[:level]] → writes <file>.kz per the layout above.
  • kzip decompress <file>.kz → infers compressor from comp, validates sha256, writes <file>.
  • kzip compress -c <file> / kzip decompress -c <file>.kz → stdout, matching gzip semantics.
  • Bare kzip <file> (no subcommand) routes by extension: .kz ending → decompress; else → compress.
  • Symlinks named gunzip, kunzip, kzd force decompress regardless of extension.

13.3 Filter-chain descriptor (PackHeaderEntry.Type = 4, ratified 2026-05-12 via ticket #003)

A pack file may include a filter-chain descriptor to document the pre-compression byte transforms (e.g. BCJ, delta) that were applied to its data/tree blobs before compression and encryption. The descriptor is one extra entry in the pack header, distinguishable from the data/tree entries (Type=0–3, see §4.2) by its Type byte.

Wire format (after decrypt, little-endian):

+--------+------------+------------+---------------+-----+-----------+------------------+
| 0x04   | nfilters   | flen[0]    | filter[0]     | ... | flen[n-1] | filter[n-1]      |
| (Type) | (u16 LE)   | (u16 LE)   | (flen[0] B)   |     | (u16 LE)  | (flen[n-1] B)    |
+--------+------------+------------+---------------+-----+-----------+------------------+

Each filter[i] is a UTF-8 string of the form "<name>" or "<name>:<param>", exactly matching the chain syntax produced by Chain.String() in engines/compress/kzip/internal/filters/ (post-#008 layout; engine/restic_vendor/internal/filters/ during the bootstrap). Names are stable identifiers from the filter registry:

Name | Status | Origin
delta | shipped (ticket #003) | universal delta encoding (out[i] = in[i] − in[i−d])
bcj-x86 | stub (ticket #012) | xz/7-Zip x86 branch/call/jump
bcj-arm, bcj-thumb, bcj-arm64 | stub (ticket #013) | xz ARM variants
bcj-ppc, bcj-ppc-le, bcj-ia64, bcj-sparc, bcj-riscv, bcj-riscv32 | stub (ticket #014) | xz misc-arch variants
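The delta filter's formula can be sketched in Go with byte arithmetic wrapping mod 256. The first d bytes pass through unchanged (equivalent to an all-zero history); that initial-history choice and the function names are assumptions for illustration, not taken from internal/filters/.

```go
package main

import "fmt"

// deltaEncode computes out[i] = in[i] - in[i-d] (mod 256); the first d bytes are copied.
func deltaEncode(in []byte, d int) []byte {
	out := make([]byte, len(in))
	for i := range in {
		if i < d {
			out[i] = in[i]
		} else {
			out[i] = in[i] - in[i-d]
		}
	}
	return out
}

// deltaDecode inverts deltaEncode: in[i] = enc[i] + in[i-d], using already-decoded bytes.
func deltaDecode(enc []byte, d int) []byte {
	out := make([]byte, len(enc))
	for i := range enc {
		if i < d {
			out[i] = enc[i]
		} else {
			out[i] = enc[i] + out[i-d]
		}
	}
	return out
}

func main() {
	samples := []byte{10, 11, 12, 13, 14}
	enc := deltaEncode(samples, 1)
	fmt.Println(enc)                 // [10 1 1 1 1]
	fmt.Println(deltaDecode(enc, 1)) // [10 11 12 13 14]
}
```

Slowly varying data turns into runs of small, repetitive values that the following compressor shrinks well; d > 1 targets interleaved fixed-width records.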

Constraints:

  • nfilters ≤ 256: a generous cap; no realistic chain exceeds 4.
  • flen[i] ≤ 64: keeps each entry inside one cache line and sanity-bounds the descriptor.
  • The total descriptor is bounded above by 4 + 256 × (2 + 64) = 16900 bytes.
  • Filter names not present in the reader's registry MUST fail loudly with KZIP-FORMAT-004 ("unknown filter '<name>' in pack header"). Treating unknown filters as no-ops would silently produce garbage on decode, which is never acceptable.

Backward-compat for readers without filter support:

  • A v1 reader (restic v0.18.x or pre-#003 kzip) that encounters Type=4 in the pack header MUST skip the entry; Length tells it how many bytes to advance. Skipping is safe iff the pack file's data/tree blobs were not transformed by any filter; producers that emit Type=4 with non-empty chains MUST guarantee that any non-trivial filter was actually applied (the chain is the contract).
  • The reverse case (a kzip-#003 reader on a v1 pack without Type=4) is the common case: empty chain implicit, no transform was applied, blobs decompress directly.

The encoding is intentionally text rather than bytecode, to keep the chain debuggable via xxd on the decrypted header and to align with the CLI's --filter= chain syntax.

13.3.1 Per-blob filter-skip flag (kzip ticket #017, 2026-05-15)

The pack-header entry Type byte of a data/tree blob carries an opt-in high-nibble flag field in addition to the low-nibble type discriminator. The dispatch nibble (bits 0–3) stays compatible with the values defined in §4.3 (0 = DataBlob, 1 = TreeBlob, 2 = DataBlob compressed, 3 = TreeBlob compressed). Bits 4–7 are reserved for kzip extensions; the first defined flag is:

Bit | Mask | Name | Semantics
7 | 0x80 | FlagFilterSkipped | The chain declared in this pack's Type=4 descriptor was withheld for this blob. The reader MUST decompress as usual but MUST NOT call chain.Reverse: the post-decompress buffer IS the original plaintext.
4–6 | 0x70 | reserved | Future kzip flags. Readers MUST mask them out (Type & 0xF0 carries the full flag field today; older readers that lacked this field encoded only bit 7).

Why this exists: content-aware filters like bcj-x86 only pay off when the whole chunk is the right kind of data. Content-defined chunking can produce chunks that span tar headers, ELF .text, debug info, and unrelated files (see ticket #016: a tar of /usr/bin showed a +17.6% regression with bcj-x86,delta:1). When the per-chunk sniffer (internal/sniff/) classifies a chunk as not matching the declared chain, the writer (saveAndEncrypt) skips chain.Apply and sets bit 7 on the Type byte so the reader knows to skip chain.Reverse.

Backward-compat:

  • Bit 7 is opt-in via writer choice: a kzip backup without a content-aware filter chain never sets it. Repos written by kzip 0.1.0 (pre-#017) are byte-identical to repos written by kzip with #017 on empty/non-BCJ chains.

  • A pre-#017 kzip reader sees Type=0x82 (compressed data + flag) and falls into the default → error branch of the entry-type switch, refusing to decode the pack. This is forward-incompatible for opt-in repos only: users who never invoke BCJ chains never hit it. If strict forward-compat is needed, gate emission of bit 7 behind a future repo.Version bump (path C of the #016 §13.3 design).

Encode side (writer, kzip ≥ 0.2.0):

chain := r.filterChain
flags := restic.BlobFlags(0)
if chain.NeedsSniffing() && t == DataBlob {
    if !chain.Matches(sniff.Sniff(data)) {
        flags |= restic.FlagFilterSkipped  // skip chain.Apply
    }
}
// pack header entry Type byte = (dispatch_type & 0x0F) | (flags & 0xF0)

Decode side (reader, kzip ≥ 0.2.0):

encodedType := p[0]
b.Flags = restic.BlobFlags(encodedType & 0xF0)
tpe := encodedType & 0x0F  // 0..3 — same as pre-#017
// ... decompress ...
if !b.Flags.Has(restic.FlagFilterSkipped) {
    plaintext = chain.Reverse(plaintext)
}

13.4 Pack-embedded recovery record (PackHeaderEntry.Type = 5, ratified 2026-05-13 via ticket #009)

A pack file may include a Reed-Solomon recovery record that lets the data section be reconstructed in place after partial corruption. The record lives inside the encrypted pack header as a single Type=5 entry, unlike the sidecar mode in §13.1, which keeps parity in a separate .kz.rs file.

Pack-header wire format (after decryption, little-endian):

+--------+----------------+------------------------------+
| 0x05   | u32 LE length  | kzre-v1 payload (length B)   |
+--------+----------------+------------------------------+

The payload itself is the kzre-v1 blob produced by internal/krecovery.EncodeEmbedded:

+-------+-----+--------+--------+----------+----------+--------+--------------+---------------+
| KZRE  | ver | dShard | pShard | reserved | dataSize | sha256 | shard hashes | parity shards |
| (4B)  | (1) |  (1)   |  (1)   |   (1)    | (4 LE)   | (32B)  | (32B each)   |               |
+-------+-----+--------+--------+----------+----------+--------+--------------+---------------+
  • KZRE magic distinguishes from the sidecar's KZRS. IsEmbedded(blob) is the cheap 4-byte peek.
  • Per-shard SHA-256 covers every shard (data shards, then parity shards). Repair hashes each shard, marks failures as nil, and reconstructs only when corruptCount ≤ ParityShards; there is no brute-force suspect loop.
  • The little-endian length (dataSize) matches §13.2 / §13.3. Sidecar mode stays big-endian for restic-format alignment.
  • pickShards(ratio) mirrors the sidecar logic: aim for ~256 total shards, parity = clamp(ratio × 256, 1, 128), data = 256 - parity.

Constraints:

  • The Type=5 entry payload is capped at MaxRecoveryEntrySize = 8 MiB. A 16 MiB pack at 50% ratio stays well under this; the cap is a sanity gate against bogus length fields.
  • Multiple Type=5 entries in the same header are rejected (one record per pack).
  • Per-shard hashes are checked before any Reed-Solomon math, so corrupt shards are identified in O(N) rather than tried in turn.

Backward-compat for readers without recovery support:

  • A v1 reader (restic v0.18.x or pre-#009 kzip) that encounters Type=5 MUST skip the entry; the u32 length tells it how many bytes to advance. Skipping is safe: the data section is byte-identical with or without the recovery record (the parity blob never modifies blob bytes).
  • A kzip-#009 reader on a v1 pack without Type=5 is the common case: no record means no repair available, the pack reader returns whatever the underlying check produces.

CLI surface:

Command | Function
kzip backup --recovery-records=N% | Emit Type=5 in every written pack (#009 R3)
kzip recovery repair-pack <pack-id> | Read Type=5, reconstruct the corrupted data section, and atomically replace the pack on the backend (#009 R4)

kzip recovery {encode,verify,repair} continue to operate on sidecar .kz.rs files — distinct flow for files outside a repo.

13.5 KZC compressor envelope (payload prefix, ratified 2026-05-19 via ticket #018 / RFC-002)

Per-blob compressor descriptor for pluggable compressor backends (zstd / lzma2 / brotli, with reserved IDs for future first-party backends). The descriptor lives inside the encrypted payload (it is read after decryption and before decompression), not in the pack-header type space: strictly additive, with no PackHeaderEntry.Type reallocated.

Post-decryption layout:

+-----+-----+-----+-----+-----+----------+----------+
| 'K' | 'Z' | 'C' | ver | CID | reserved | payload  |
| 4B  | 5A  | 43  | 01  |  N  |    00    |  ...     |
+-----+-----+-----+-----+-----+----------+----------+

Reader-side dispatch (post-decryption, pre-decompression):

  • Byte 0 == 0x28 → bare zstd frame (legacy path, every pre-#018 blob). Decode with the zstd backend.
  • Bytes 0..3 == 0x4B 0x5A 0x43 0x01 → KZC envelope. Parse byte 4 as the CompressorID and dispatch to the registered backend; bytes 6+ are the compressor-native payload.
  • Anything else → corruption or an unknown future format; surface it as a typed error pointing the operator at kzip version and kzip migrate.
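The three-way dispatch can be sketched as a classifier over the decrypted bytes. It only identifies the backend and strips the 6-byte envelope; decompression is left to the registered backends. Reporting bare zstd as CompressorID 1 and refusing ID 0 are assumptions drawn from the registry that follows; classify is a hypothetical name.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

var kzcMagic = []byte{'K', 'Z', 'C', 0x01} // "KZC" + envelope version 1

// classify returns the CompressorID for a decrypted blob and its compressor-native payload.
func classify(plain []byte) (id byte, payload []byte, err error) {
	switch {
	case len(plain) > 0 && plain[0] == 0x28:
		// 0x28 is the first byte of the zstd frame magic 0xFD2FB528 stored little-endian.
		return 1, plain, nil
	case len(plain) >= 6 && bytes.Equal(plain[:4], kzcMagic):
		if plain[4] == 0 {
			return 0, nil, errors.New("KZC envelope with reserved CompressorID 0")
		}
		return plain[4], plain[6:], nil // byte 5 is reserved
	default:
		return 0, nil, errors.New("unknown blob encoding: check kzip version / kzip migrate")
	}
}

func main() {
	zstdBlob := []byte{0x28, 0xB5, 0x2F, 0xFD}
	brotliBlob := append([]byte{'K', 'Z', 'C', 0x01, 3, 0x00}, "payload"...)
	for _, b := range [][]byte{zstdBlob, brotliBlob} {
		id, payload, err := classify(b)
		fmt.Println(id, len(payload), err)
	}
}
```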

CompressorID registry (RFC-002 §3.3):

ID | Name
0 | reserved (do not emit)
1 | zstd (never emitted in canonical kzip; zstd blobs are bare)
2 | lzma2
3 | brotli
4..127 | reserved (first-party backends)
128..255 | reserved (downstream forks)

Backward-compat:

  • Zstd blobs (the default) stay bare. Pre-#018 readers see byte-identical bytes, so the format is fully backward-compatible for the common case.
  • A pre-#018 reader encountering a KZC-enveloped blob fails at the zstd frame check (zstd: invalid input). The AEAD over the ciphertext catches any tampering before decompression, so silent mis-decode is impossible.
  • A post-#018 reader on a pre-#018 pack follows the bare-zstd legacy path unchanged. T3 in RFC-002 §8 asserts a byte-identical restore.

Mixed-backend repositories are valid; the backend is selected per SaveBlob call, threaded via Repository.Options.Compressor. kzip check decodes every blob through the right backend automatically, so operators do not need to track which packs are which.

CLI surface:

Command | Function
kzip backup --compressor=<name>[:level] | Pick the backend for the new blobs in this backup
kzip compressor list | List registered backends with their IDs + level ranges
$KZIP_COMPRESSOR | Environment-variable equivalent of --compressor

Posture vs §14 divergence policy: this is a non-breaking addition; the encrypted-payload envelope is invisible to pack-header parsers. It does not constitute a §14 divergence and needs no repository.version bump.

13.6 Detached-signature sidecar kzig-v1 (.kz.sig, ratified 2026-05-19 via ticket #023)

Out-of-band Ed25519 signature for any file (compressed or not). The sidecar lives next to the covered file with suffix .kz.sig, analogous to the .kz.rs parity sidecar from §13.1.

Wire format (fixed 101 bytes, little-endian where applicable):

+-----+--------+----------+--------------+
| 4B  |  1B    |   32B    |     64B      |
|KZIG | algo   | pubkey   | signature    |
+-----+--------+----------+--------------+
Field | Width | Encoding | Meaning
magic | 4B | bytes | Exactly 0x4B 0x5A 0x49 0x47 (KZIG). Distinguishes from KZRS (RS parity sidecar) and KZ\x01\x00 (single-file stream).
algo | 1B | u8 | Signature algorithm. 0x01 = Ed25519 is the only value defined; readers MUST reject other values.
pubkey | 32B | bytes | Ed25519 public key of the signer. Embedded so verification works without an out-of-band recipient file; the hardening posture is to pass --pubkey to pin a known signer.
signature | 64B | bytes | Ed25519 signature over SHA-256(covered-file bytes). Streaming-safe: the hash is computed incrementally, so the covered file may be multi-gigabyte without buffering.

Constraints:

  • len(file) == 101. Readers reject shorter or longer files with kzsig: wrong sidecar size.
  • algo MUST be 0x01 (Ed25519). Future algorithms add new values; writers MUST NOT emit a value the spec hasn't ratified.
  • The signed message is SHA-256(covered-file), not the file directly, so the input to the signing primitive stays 32 bytes regardless of covered-file size.

Identity file formats (companion to §3.4.x recipient files):

Signing identity (private, 0600):  KZIP-SIGNING-IDENTITY-V1-<base64-32B-seed>
Signing recipient (public, 0644):  KZIP-SIGNING-RECIPIENT-V1-<base64-32B-pub>

Suffixes .kzkey (private) and .kzpub (public) are the canonical extensions.

CLI surface (kzip #023):

Command Function
kzip key gen-signing-identity --output <prefix> Mint a fresh Ed25519 keypair; writes <prefix>.kzkey + <prefix>.kzpub
kzip sign --signing-identity <file> <path>... Produce <path>.kz.sig for each file
kzip verify <path>... Validate <path>.kz.sig against the file; exit 0 only on full success
kzip verify --pubkey <hex|file> Hardening: require the sidecar's embedded pubkey to match the pinned one
kzip verify --keyring <dir> Hardening: require the sidecar's pubkey to be one of the .kzpub files in the directory

Posture vs. §14 divergence policy: the sidecar is out-of-band. It does NOT touch the pack-format wire layout and is invisible to restic v0.18.x, so it is *non-breaking* and needs no repository.version bump.

Why a separate file from .kz.rs? Both sidecars sit beside the covered file but answer different questions. .kz.rs answers "can I reconstruct corrupted bytes?"; .kz.sig answers "who created these bytes?". Keeping them in distinct files lets operators ship parity without exposing the signing key, or sign without paying the parity overhead.

14. Divergence policy

Changes that *break* byte-for-byte compatibility with restic v0.18.x:

  1. Require a new RFC (e.g. kzip-RFC-002-format-divergence.md) containing:
    • Justification (a feature that is impossible with the current format)
    • Migration path (forward-compatible if possible)
    • A repository.version bump (3 or higher)
  2. Lifecycle:
    • A --migrate command to convert v2 repos → vN
    • Read support for v2 maintained for ≥1 year after the bump
    • An explicit snapshot note: kzip 0.X.0 introduced repo v3, see CHANGELOG

Changes that *preserve* byte compatibility (non-breaking):

  • Adding new JSON fields (forward-compatible by convention)
  • Adding Koder-specific tags
  • Adding algorithm settings that restic already understands (higher zstd levels, etc.)

These do not require an RFC, only an update to this spec plus a CHANGELOG entry.

14.5 Fifth active divergence: Configurable blob-ID hash algorithm (kzip ticket #026/#028, 2026-05-19)

A repository's blob-ID hash algorithm is selectable at init time via `kzip init --hash=<name>`. The choice is persisted in `Config.HashAlg`; subsequent operations dispatch through that algorithm transparently. Registered algorithms (kzip #026):

Name Digest size Notes
sha256 32 bytes Default. restic-compatible (the only value restic upstream recognizes).
blake3 32 bytes ~2–3× faster than SHA-256 on modern x86.

All registered algorithms produce 32-byte digests, so the [32]byte on-disk ID layout is unchanged. Variable-length digests (e.g. 8-byte xxhash) would require a separate divergence and are deferred.

Config.HashAlg = "" or "sha256" is byte-compatible with restic v0.18.x (the legacy default). Config.HashAlg = "blake3" is the breakage point: restic falls through to SHA-256 on read, finds no matching blob IDs, and fails cleanly without silent corruption.

Dispatch contract:

  • Process-wide: restic.SetHashAlg(name, fn) sets a package-level dispatch target consumed by restic.Hash(data). The first repo opened in a process locks the algorithm; subsequent opens with a different algorithm return ErrHashAlgConflict.
  • Per-blob: not supported; the whole repo uses one algorithm. Cross-algorithm migration of an existing repository is not supported in v1. The supported workflow is to restore into staging, re-init with the new hash, and back up again.

14.4 Fourth active divergence: Per-blob filter-skip flag (kzip ticket #017, 2026-05-15)

The high nibble (0xF0) of the pack-header entry Type byte (§13.3.1) encodes opt-in BlobFlags. The first defined flag is FlagFilterSkipped = 0x80. A pre-#017 kzip reader or restic v0.18.x sees Type=0x80..0xFF, falls into the default branch of its type switch, and reports invalid type: the same one-directional break as §14.1/§14.3. The flag is opt-in via writer behavior: kzip without a content-aware filter chain (no `bcj*` entry in `--filter`) never sets the bit and emits byte-identical pre-#017 packs.

Skip-by-length contract: the high nibble doesn't change the entry size, so a generic skip-unknown-types reader (which neither restic nor kzip implements) could mask Type & 0x0F to recover the dispatch nibble and proceed. Current readers don't do that: they reject the byte outright, on the principle that a strict failure is better than silent corruption (§14.3 rationale).
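Splitting the Type byte is two masks. The Go sketch contrasts a legacy reader (any high-nibble bit is an error, as in restic v0.18.x and pre-#017 kzip) with a #017-aware reader that accepts only the flags it knows. `decodeType` is a hypothetical name and the `invalid type %d` message is illustrative.

```go
package main

import "fmt"

const (
	typeMask          = 0x0F // low nibble: dispatch type
	flagMask          = 0xF0 // high nibble: opt-in BlobFlags
	FlagFilterSkipped = 0x80 // first defined flag (#017)
)

// decodeType splits a pack-header Type byte. A legacy reader (flagAware=false)
// rejects any high-nibble bit; a #017-aware reader accepts only known flags.
func decodeType(b byte, flagAware bool) (typ byte, filterSkipped bool, err error) {
	flags := b & flagMask
	var known byte
	if flagAware {
		known = FlagFilterSkipped
	}
	if flags&^known != 0 {
		return 0, false, fmt.Errorf("invalid type %d", b)
	}
	return b & typeMask, flags&FlagFilterSkipped != 0, nil
}
```

Rejecting unknown flag bits even in the aware reader keeps the strict-fail posture for any future flag it was not built to handle.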

14.3 Third active divergence: Type=5 recovery records (kzip ticket #009, 2026-05-13)

Pack files written with kzip backup --recovery-records=N% carry a Type=5 entry per §13.4. Per §13 (header type byte legend), values 5-255 are reserved for kzip extensions, and restic v0.18.x doesn't recognise this type byte. The current restic vendor parser returns invalid type 5 and aborts, the same posture as for Type=4 (filter chain).

Skip-by-length is the contract for both: a v1 reader that knows to advance by the u32-encoded length field (Type=4 and Type=5 share this layout) can still read the regular blob entries. Stock restic doesn't do that; it returns an error on the first unknown type. The breakage is one-directional: kzip can read packs that restic wrote (they never contain Type=5), but restic can NOT read kzip packs written with --recovery-records set.
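The skip-by-length contract can be sketched as a header walk that steps over extension entries instead of aborting. The fixed sizes for Types 0 to 3 follow restic's documented pack-header layout (37 bytes for uncompressed entries, 41 for compressed ones). The Type=4/5 layout used here, a type byte followed by a little-endian u32 body length and the body, is an ASSUMPTION for illustration; §13.3 and §13.4 define the real layout. `walkHeader` is hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Restic pack-header entry sizes: type(1) + length(4) + ID(32), plus an
// extra uncompressed-length u32 for the compressed types 2 and 3.
var fixedEntrySize = map[byte]int{0: 37, 1: 37, 2: 41, 3: 41}

// walkHeader returns the types of regular blob entries. With skip=true it
// steps over Type=4/5 entries by their u32 length; with skip=false it fails
// like stock restic. ASSUMED extension layout: type(1) + u32 LE length + body.
func walkHeader(hdr []byte, skip bool) ([]byte, error) {
	var blobs []byte
	for off := 0; off < len(hdr); {
		t := hdr[off]
		if n, ok := fixedEntrySize[t]; ok {
			if off+n > len(hdr) {
				return nil, fmt.Errorf("truncated entry at offset %d", off)
			}
			blobs = append(blobs, t)
			off += n
			continue
		}
		if !skip || (t != 4 && t != 5) {
			return nil, fmt.Errorf("invalid type %d", t)
		}
		if off+5 > len(hdr) {
			return nil, fmt.Errorf("truncated length at offset %d", off)
		}
		n := int(binary.LittleEndian.Uint32(hdr[off+1 : off+5]))
		if off+5+n > len(hdr) {
			return nil, fmt.Errorf("truncated body at offset %d", off)
		}
		off += 5 + n
	}
	return blobs, nil
}
```

Only Types 4 and 5 are skipped; any other unknown byte still fails, so the skip path never turns into a silent misread of arbitrary data.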

Hard failure is strictly safer than a silent skip here: a silent skip would let a restic operator treat the pack as parseable while unknowingly losing all repair capacity.

14.2 Second active divergence: recipient-mode keys/<id> (kzip ticket #005, 2026-05-13)

Recipient-mode key files (§3.4.1) carry recipients instead of kdf/data. A restic v0.18.x reader treats the file as valid JSON but fails to open it ("KDF is not scrypt"); it does NOT silently corrupt anything, since the failure is read-only and side-effect-free. kzip provides OpenKeyWithIdentity to consume these files; mixed-mode repos (some password keys, some recipient keys) work transparently in both directions via the routing-error sentinels.

This divergence is opt-in: users who don't pass `--recipient` to `kzip init` get a fully restic-byte-compatible repo. Once a recipient-mode key file lands, the operator commits to kzip-aware tooling for opening, but the pack files and snapshots remain byte-compatible as long as no --filter= is used (cf. §14.1).

14.1 First active divergence: Type=4 (kzip ticket #015, 2026-05-13)

The filter-chain descriptor (§13.3) became the *first byte-compat-breaking divergence shipped to users*. A restic v0.18.x reader hitting a pack with a Type=4 entry returns invalid type 4 and aborts. This is acceptable because:

  1. The break only manifests on packs *actually written with --filter=…*. No-filter packs (the byte-compat default) still parse cleanly under stock restic.
  2. The reverse-pipeline contract (decrypt → decompress → chain.Reverse) is the only way to recover the original plaintext; a reader unaware of filters would silently return the filtered (incorrect) form. Hard failure is strictly safer than silent corruption.
  3. The repository.version field is unchanged (still 2). Migration is one-way: kzip-written filtered repos cannot be downgraded to restic without re-archiving from source. Plain (no --filter=) kzip repos remain interchangeable with restic v0.18.x.
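The reverse-pipeline rule hinges on undoing filters in the opposite order from the one in which they were applied. The Go sketch below omits compression and encryption and uses two toy filters; `Filter`, `Chain`, `delta`, and `xorKey` are hypothetical and are not kzip's BCJ implementations.

```go
package main

// Filter is a hypothetical reversible byte transform (e.g. a BCJ or delta filter).
type Filter interface {
	Forward([]byte) []byte
	Reverse([]byte) []byte
}

// Chain applies filters in order on write and undoes them in reverse on read.
type Chain []Filter

func (c Chain) Forward(b []byte) []byte {
	for _, f := range c {
		b = f.Forward(b)
	}
	return b
}

func (c Chain) Reverse(b []byte) []byte {
	for i := len(c) - 1; i >= 0; i-- {
		b = c[i].Reverse(b)
	}
	return b
}

// delta is a toy filter: each byte becomes its difference from the previous one.
type delta struct{}

func (delta) Forward(in []byte) []byte {
	out := make([]byte, len(in))
	var prev byte
	for i, v := range in {
		out[i] = v - prev
		prev = v
	}
	return out
}

func (delta) Reverse(in []byte) []byte {
	out := make([]byte, len(in))
	var prev byte
	for i, v := range in {
		prev += v
		out[i] = prev
	}
	return out
}

// xorKey is a second toy filter; XOR with a constant is its own inverse.
type xorKey byte

func (k xorKey) Forward(in []byte) []byte {
	out := make([]byte, len(in))
	for i, v := range in {
		out[i] = v ^ byte(k)
	}
	return out
}

func (k xorKey) Reverse(in []byte) []byte { return k.Forward(in) }
```

A reader that stops after decompression returns `c.Forward(plain)`, which is exactly the "silently filtered (incorrect) form" the hard failure guards against.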

Future byte-compat-breaking extensions (e.g. Type=5 RS pack-embedded, #009) follow the same pattern: gated by an opt-in CLI surface, hard-fail on legacy readers, no silent-misread escape hatch.

  • Regression tests in engines/compress/kzip/tests/regression/ must include golden-hash comparisons against restic v0.18.x binaries to guarantee interop.
  • Tests in engines/compress/kzip/{cmd,internal}*_test.go (the test suite inherited from restic v0.18.1, top-level since #008) are preserved as-is.

Annex A: kzip ↔ restic mapping

During the bootstrap, all restic terms are equivalent to kzip terms. Mapping:

Restic Kzip Notes
restic init kzip init same behavior
repository repository (.kz/ or .tar.kz) single .kz extension, dispatch by magic bytes (§11)
pack file pack file identical layout
blob blob identical
snapshot snapshot identical
scrypt KDF scrypt KDF identical
AES-256-CTR + Poly1305 AES-256-CTR + Poly1305 identical
Rabin chunker Rabin chunker identical

As kzip starts to diverge (ticket #003 BCJ filters, etc.), entries will be added to this annex with their divergence dates.

Source: ../home/koder/dev/koder/meta/docs/stack/specs/kzip/format.kmd