Kodec RFC 002 phase3 decoders

RFC-002 - Phase 3: First real decoders (Opus, AAC-LC, H.264)

  • *Status:* Draft (2026-04-12)
  • *Author:* Claude Opus 4.6 + Rodrigo
  • *Target release:* koder-cine v0.2.0 → v0.3.0
  • *Supersedes:* RFC-001 §10 Phase 3 (expands it with concrete decisions)
  • *Depends on:* RFC-001 (workspace architecture, error model, testing strategy)

1. Summary

Phase 1 and Phase 2 delivered *container parsing*: koder-cine can open an MP4 or WebM, walk its atoms, enumerate tracks, emit keyframe and audio-frame byte ranges, and remux Opus tracks into Ogg. That was enough to land two real wins in koder-talk (ticket 002 probe backend, ticket 009 Opus audio fast-path).

*Phase 3 crosses the decoder threshold.* This RFC scopes what that means concretely: which codecs, in which order, written from scratch vs wrapped from existing Rust crates, the API they share, the conformance bar, and what the end of Phase 3 does not deliver (spoiler: still no pixels reach a JPEG; that is Phase 4).

Phase 3 is the single largest commitment in the koder-cine roadmap. The RFC-001 estimate was "~12–18 months". This RFC reaffirms that and spends a section defending it.

2. Why decoders, why now

Two forces pull decoders onto the critical path:

  1. *Audio demux is half-done.* Ticket 009 covered Opus-in, but every MP4 from iOS and every legacy recording is AAC-LC, which still shells out to ffmpeg for decode + re-encode to Opus. A native AAC decoder lets koder-cine produce Opus output for any input via an internal AAC→PCM→Opus-encode path, closing the audio half of the Phase 2 exit criterion.

  2. *Keyframes are byte ranges, not images.* koder-cine already knows where the I-frames live; Phase 4 turns them into JPEGs. That requires a video decoder (H.264/H.265) to produce YUV frames, a scaler (Phase 4), and a JPEG encoder (Phase 4). The video decoder is the Phase 3 prerequisite.

Both gate the same RFC-001 Phase 2 exit criterion we only partially met. Phase 3 is the only way to raise that flag fully.

3. Scope

In scope

  • *`media-codec-opus`*: Opus audio decoder
  • *`media-codec-aac`*: AAC-LC decoder (LC profile only; HE-AAC v1/v2 out of scope)
  • *`media-codec-h264`*: H.264 decoder (Baseline + Main + High profiles, 8-bit 4:2:0 only)
  • *Decoder API in media-core*: shared trait covering all three, packet-in / frame-out
  • *Conformance harness*: checksum-based frame-exact comparison against ffmpeg reference outputs
  • *Fuzz targets* for each decoder crate
  • *CLI additions*: koder-cine decode <file> subcommand emitting raw YUV or PCM to stdout, for debugging and conformance

Explicitly out of scope (moves to later phases)

  • H.265/HEVC decoder: *Phase 5* (higher complexity, less traffic)
  • VP9 / AV1: *Phase 5* (wrap dav1d-rs initially; write from scratch never)
  • Any encoder: *Phase 5* (Opus encoder included; we keep using opus-extract for remux and ffmpeg for re-encode until then)
  • Hardware acceleration (VA-API / NVENC / VideoToolbox): *Phase 5*
  • JPEG encoding of decoded frames: *Phase 4*
  • Scaling / colorspace conversion: *Phase 4*
  • 10-bit or 4:2:2/4:4:4 H.264: out of scope permanently unless Koder products demand it
  • AAC-HE (SBR/PS): out of scope permanently unless a real consumer needs it

4. Ordering and why

*Updated 2026-04-12 after symphonia audit (§14).* The audit flipped the original plan. The new ordering is *AAC → H.264 → Opus*, for these reasons:

  1. *AAC-LC (≈1–2 weeks, pending MPL-2.0 license approval).* Was expected to take 2 months. symphonia-codec-aac is a production crate (v0.5.5, 3.3M downloads, pure safe Rust), and the wiring to our Decoder trait is days, not months. *Gated on §13 license approval for MPL-2.0 dependencies* (see §9, §14). If approved, AAC becomes the fastest real win of Phase 3 and the first crate to land, which makes it the natural calibration point for the Decoder trait, conformance harness and fuzz infra.

  2. *H.264 (≈9–12 months).* Unchanged. Still the big one, still from scratch, still the gate for Phase 4. See §5.2. Milestones in §7.

  3. *Opus (≈2–4 months, reassessed upward).* Was expected to take 1 month via symphonia-codec-opus re-export. *The audit (§14) revealed that crate does not exist*: it is a planned entry, never shipped. The realistic pure-Rust Opus paths are: port libopus (BSD, ~30k LOC of C), or adopt a hobby crate (opus-decoder, mousiki) after serious audit. Neither is a 1-month job. Opus also delivers *zero new talk value* on its own because ticket 009 already serves Opus via remux. Moving it last means it can ship after H.264 with reduced urgency, and if Phase 5 priorities shift (hardware accel, encoders) it can be deferred without blocking anything.

Total: ~11–14 months if work runs back-to-back single-threaded. The headline change is that the *first real integration win arrives in weeks, not a year*, as long as MPL-2.0 gets approved.

5. Per-crate decisions

5.1 media-codec-aac (FIRST to land — was 5.2)

*Approach:* re-export `symphonia-codec-aac` behind our Decoder trait.

The audit in §14 confirmed symphonia-codec-aac is production-grade: v0.5.5, 3.3M downloads, actively maintained, pure safe Rust, no C deps, AAC-LC complete. HE-AAC is unimplemented, which matches our scope exactly.

*Estimated effort:* 1–2 weeks. The work is:

  • Add symphonia-codec-aac as a dep in crates/media-codec-aac/
  • Thin adapter from symphonia's Decoder trait to our media_core::Decoder
  • Wire into koder-cine decode --audio path
  • Conformance harness: 3-5 MP4 fixtures with AAC-LC tracks, sha256 vectors committed
  • Fuzz target wrapping the adapter
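
One piece of the thin adapter is likely a sample-layout conversion, since AudioFrame (§6) carries interleaved samples. A minimal sketch, assuming the upstream decoder hands back one f32 buffer per channel (planar layout); interleave_planar is a hypothetical helper name, and AudioFrame is restated here only to keep the block self-contained:

```rust
use std::convert::TryFrom;

/// Owned audio frame, mirroring the AudioFrame shape proposed in §6.
#[derive(Debug, Clone, PartialEq)]
pub struct AudioFrame {
    pub pts: i64,
    pub sample_rate: u32,
    pub channels: u8,
    pub samples: Vec<f32>, // interleaved: L R L R ...
}

/// Hypothetical adapter helper: turns planar channel buffers into one
/// interleaved frame. Returns None if there are no channels, more than
/// 255 channels, or the channel buffers differ in length.
pub fn interleave_planar(planes: &[Vec<f32>], sample_rate: u32, pts: i64) -> Option<AudioFrame> {
    let channels = u8::try_from(planes.len()).ok()?;
    let frames = planes.first()?.len();
    if planes.iter().any(|p| p.len() != frames) {
        return None;
    }

    let mut samples = Vec::with_capacity(frames * planes.len());
    for i in 0..frames {
        for plane in planes {
            samples.push(plane[i]);
        }
    }

    Some(AudioFrame { pts, sample_rate, channels, samples })
}
```

The fuzz target and the conformance tests would then exercise the adapter end to end rather than this helper in isolation.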

*Blocker:* symphonia is licensed *MPL-2.0*, which is not on the RFC-001 §11 allow list. This requires either:

  • (a) An explicit amendment to RFC-001 §11 adding MPL-2.0 for dependencies only (not for code Koder writes), with the reasoning that MPL-2.0 is file-level copyleft and does not contaminate consumers, or
  • (b) Rejection of MPL-2.0, in which case we fall back to "write AAC-LC from scratch" and the estimate goes back up to ~2 months.

See §13 and §14 for the open decision.

5.2 media-codec-h264 (SECOND to land — unchanged from 5.3)

*Approach:* this is the hard one and symphonia does *not* cover video codecs. The candidates in pure Rust:

  • *`openh264-sys2`*: bindings to Cisco's openh264 (BSD), which is C. Rejected for the same FFI reason as the Opus wrapper.
  • *`h264-reader`*: pure Rust, but it is a parser, not a decoder. Useful for NAL / SPS / PPS parsing (we already have NAL parsers from tickets 021/024), not for pixel output.
  • *`rustav/h264` (avcodec)*: abandoned since 2019, incomplete.
  • *Write from scratch.* Last resort; 9–12 months.

*Decision:* write from scratch, targeting the Baseline + Main + High-profile 8-bit 4:2:0 subset. Reuse media-bitstream for exp-Golomb + CABAC bit I/O. Ship incremental milestones (§7) so review happens every few weeks, not at the end. Leverage our own NAL parsers from media-container-mp4 and media-bitstream rather than duplicating.
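
For reference, the unsigned and signed Exp-Golomb codes (ue(v) and se(v)) that SPS/PPS and slice-header parsing depend on can be sketched as follows. This is an illustrative stand-alone reader, not the media-bitstream API; BitReader and its methods are hypothetical names, and the input is assumed to be an RBSP with emulation-prevention bytes already removed.

```rust
/// Minimal MSB-first bit reader over a byte slice (hypothetical type).
pub struct BitReader<'a> {
    data: &'a [u8],
    pos: usize, // bit offset from the start of `data`
}

impl<'a> BitReader<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Self { data, pos: 0 }
    }

    /// Reads one bit, or None at end of input.
    pub fn read_bit(&mut self) -> Option<u32> {
        let byte = *self.data.get(self.pos / 8)?;
        let bit = (byte >> (7 - (self.pos % 8))) & 1;
        self.pos += 1;
        Some(u32::from(bit))
    }

    /// Unsigned Exp-Golomb, ue(v): n leading zeros, a 1 bit, then n suffix
    /// bits. The value is 2^n - 1 + suffix.
    pub fn read_ue(&mut self) -> Option<u32> {
        let mut leading_zeros = 0u32;
        while self.read_bit()? == 0 {
            leading_zeros += 1;
            if leading_zeros > 31 {
                return None; // would not fit in a u32
            }
        }
        let mut suffix = 0u32;
        for _ in 0..leading_zeros {
            suffix = (suffix << 1) | self.read_bit()?;
        }
        Some((1u32 << leading_zeros) - 1 + suffix)
    }

    /// Signed Exp-Golomb, se(v): code numbers 0, 1, 2, 3, 4 ... map to
    /// 0, 1, -1, 2, -2 ...
    pub fn read_se(&mut self) -> Option<i32> {
        let k = self.read_ue()?;
        let magnitude = ((k + 1) / 2) as i32;
        Some(if k % 2 == 1 { magnitude } else { -magnitude })
    }
}
```

Returning Option keeps truncated input a recoverable condition, which matters for fuzz targets that feed arbitrary bytes.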

*Escape hatch:* if by month 3 of H.264 work the conformance harness shows we are less than 40% through the feature matrix, we pause H.264 and wrap openh264 via subprocess, not via FFI, so the Rust core stays clean. The subprocess is explicit debt, not a long-term home. This escape hatch exists so Phase 4 and Phase 5 are not held hostage to H.264 slip.
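
The subprocess shape of the escape hatch could look roughly like this. It is a sketch under assumptions: the program name and arguments are placeholders supplied by the caller, and it assumes the external decoder reads the bitstream on stdin and writes raw frames to stdout, which is not confirmed for any openh264 tool.

```rust
use std::io::{self, Read, Write};
use std::process::{Command, Stdio};
use std::thread;

/// Hypothetical helper: pipes `bitstream` through an external decoder
/// process and returns whatever it writes to stdout.
pub fn decode_via_subprocess(
    program: &str,
    args: &[&str],
    bitstream: &[u8],
) -> io::Result<Vec<u8>> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    let mut stdin = child.stdin.take().expect("stdin was requested as piped");
    let mut stdout = child.stdout.take().expect("stdout was requested as piped");

    // Write on a separate thread so a full stdout pipe cannot deadlock us.
    // The child's stdin is closed when the closure finishes and drops it.
    let input = bitstream.to_vec();
    let writer = thread::spawn(move || stdin.write_all(&input));

    let mut frames = Vec::new();
    stdout.read_to_end(&mut frames)?;
    writer.join().expect("stdin writer thread panicked")?;

    let status = child.wait()?;
    if !status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{} exited with {}", program, status),
        ));
    }
    Ok(frames)
}
```

Keeping the process boundary behind a single function like this makes the debt easy to find and delete once the native decoder catches up.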

5.3 media-codec-opus (THIRD to land — reassessed)

*Approach:* this was supposed to be a cheap symphonia re-export. The audit (§14) killed that path: symphonia-codec-opus *does not exist on crates.io*. It is a planned entry in the Symphonia README, nothing shipped. Real options:

  • *`opus-decoder`* (769 downloads, v0.1.1, "RFC 8251 conformant, no unsafe, no FFI"): promising description, essentially no production adoption. Requires a conformance audit before trusting.
  • *`mousiki`* (466 downloads, v0.2.1, "Pure Rust Opus codec"): similar situation. Hobbyist-grade.
  • *Port libopus* (BSD, ~30k LOC of C): 2–3 months by a focused author. Canonical source of truth.
  • *Wait for Symphonia's planned implementation*: no timeline, unwise to block on.

*Decision:* defer the decision to when Opus's turn actually comes. At that point, audit opus-decoder and mousiki against our conformance fixtures. If either passes, re-export with a thin adapter. If both fail, port libopus.

*Why third, not first:* ticket 009 already serves the Opus case without decoding (remux-only), so Opus delivers no new talk value. It can ship after H.264 without blocking anything downstream.

5.4 The meta-rule that changed

RFC-001 §2 said "write our own". This RFC's original §5 softened that to "symphonia first, fork if needed". The audit forces a further revision:

*Do not reinvent codecs that already exist in safe pure Rust with production adoption. Do not depend on codecs that exist only in hobby crates or planned-but-unshipped crates. When neither applies, port from the C reference; do not wrap it.*

The practical consequence is that Phase 3 fragmentation is higher than expected: AAC is a re-export, H.264 is write-from-scratch, Opus is TBD. That's fine: the shared Decoder trait (§6) keeps the consumer API uniform regardless of how each crate is implemented internally.

6. Decoder API (media-core)

All three decoders implement the same trait in media-core:

pub trait Decoder {
    type Frame;

    /// Configure from codec-specific metadata (e.g. avcC for H.264,
    /// AudioSpecificConfig for AAC, OpusHead for Opus). Returns an
    /// error if the configuration is unsupported (e.g. 10-bit H.264).
    fn configure(&mut self, config: &[u8]) -> Result<(), Error>;

    /// Feed one packet (one NAL access unit, one AAC raw block, one
    /// Opus packet). Returns zero or more decoded frames. Decoders
    /// may buffer — a packet in does not imply a frame out.
    fn decode(&mut self, packet: &[u8], pts: i64) -> Result<Vec<Self::Frame>, Error>;

    /// Flush any buffered frames. Called at end-of-stream.
    fn flush(&mut self) -> Result<Vec<Self::Frame>, Error>;
}

pub struct VideoFrame {
    pub pts: i64,
    pub width: u32,
    pub height: u32,
    pub format: PixelFormat, // I420 only in Phase 3
    pub planes: [Plane; 3],  // Y, U, V
}

pub struct AudioFrame {
    pub pts: i64,
    pub sample_rate: u32,
    pub channels: u8,
    pub samples: Vec<f32>,   // interleaved
}

Key choices:

  • *Zero async.* Consistent with RFC-001 §7.
  • *Owned frame data, not borrowed.* Decoders own internal reference buffers; callers get owned copies. This sacrifices some memory performance for API simplicity; we can revisit with a decode_into(&mut Frame) variant if profiling demands it.
  • *No trait object by default.* Each decoder crate exports a concrete type; dyn Decoder is available but not the primary path. This keeps inlining aggressive in hot loops.
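
To show how the three calls fit together, here is a small sketch with a toy decoder. ToyPcmDecoder, decode_all, the Error variants, the one-byte config, and the fixed 48 kHz rate are all made up for illustration; the trait is restated from above so the block compiles on its own.

```rust
#[derive(Debug, PartialEq)]
pub enum Error {
    NotConfigured,
    UnsupportedConfig,
    InvalidPacket,
}

pub trait Decoder {
    type Frame;
    fn configure(&mut self, config: &[u8]) -> Result<(), Error>;
    fn decode(&mut self, packet: &[u8], pts: i64) -> Result<Vec<Self::Frame>, Error>;
    fn flush(&mut self) -> Result<Vec<Self::Frame>, Error>;
}

#[derive(Debug, Clone, PartialEq)]
pub struct AudioFrame {
    pub pts: i64,
    pub sample_rate: u32,
    pub channels: u8,
    pub samples: Vec<f32>, // interleaved
}

/// Toy decoder: config is one byte (channel count, 1 or 2), packets are raw
/// little-endian f32 samples. It holds one frame back to mimic buffering.
#[derive(Default)]
pub struct ToyPcmDecoder {
    channels: Option<u8>,
    pending: Option<AudioFrame>,
}

impl Decoder for ToyPcmDecoder {
    type Frame = AudioFrame;

    fn configure(&mut self, config: &[u8]) -> Result<(), Error> {
        match config {
            [ch] if (1..=2).contains(ch) => {
                self.channels = Some(*ch);
                Ok(())
            }
            _ => Err(Error::UnsupportedConfig),
        }
    }

    fn decode(&mut self, packet: &[u8], pts: i64) -> Result<Vec<AudioFrame>, Error> {
        let channels = self.channels.ok_or(Error::NotConfigured)?;
        if packet.is_empty() || packet.len() % (4 * channels as usize) != 0 {
            return Err(Error::InvalidPacket);
        }
        let samples = packet
            .chunks_exact(4)
            .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
            .collect();
        let frame = AudioFrame { pts, sample_rate: 48_000, channels, samples };
        // Emit the previously buffered frame, if any, and keep the new one.
        Ok(self.pending.replace(frame).into_iter().collect())
    }

    fn flush(&mut self) -> Result<Vec<AudioFrame>, Error> {
        Ok(self.pending.take().into_iter().collect())
    }
}

/// Generic driver: configure once, feed every packet, then flush.
pub fn decode_all<D: Decoder>(
    dec: &mut D,
    config: &[u8],
    packets: &[(&[u8], i64)],
) -> Result<Vec<D::Frame>, Error> {
    dec.configure(config)?;
    let mut frames = Vec::new();
    for (packet, pts) in packets {
        frames.extend(dec.decode(packet, *pts)?);
    }
    frames.extend(dec.flush()?);
    Ok(frames)
}
```

Because decode_all is generic over D, it is monomorphized per concrete decoder, which is the static-dispatch path the third bullet prefers over dyn Decoder.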

7. Milestones and review cadence

H.264 is the only crate large enough to warrant internal milestones. Proposed breakdown:

| M  | Deliverable | Verification |
|----|-------------|--------------|
| M1 | SPS/PPS parse, NAL extraction wired to Decoder trait | Unit tests match h264-reader output |
| M2 | Intra-only frame decode (I-slice only, no CABAC) | Decode a 1-frame MP4 to YUV, checksum vs ffmpeg |
| M3 | CAVLC inter-frame decode (P-slice), no deblocking | Decode a 30-frame MP4, checksum vs ffmpeg |
| M4 | CABAC | Same test with CABAC-encoded fixture |
| M5 | Deblocking filter | Same checksums now byte-exact with ffmpeg |
| M6 | B-slice + reference frame management | 10-second fixture end-to-end |
| M7 | Conformance suite (JM reference vectors) passing | > 90% of applicable vectors green |

Each milestone has its own backlog ticket and its own review. We do not wait 12 months for a big-bang merge.

8. Conformance strategy

The question left open by RFC-001 §13.1 is: how do we test decoder output against ffmpeg reference without taking a runtime dep on ffmpeg?

*Decision:* checked-in reference fixtures, regenerated rarely. For each input fixture in tests/fixtures/, a maintainer with ffmpeg installed runs a documented command like:

ffmpeg -i input.mp4 -f rawvideo -pix_fmt yuv420p reference.yuv
sha256sum reference.yuv > reference.yuv.sha256

The .sha256 file is committed; the .yuv is not (too large, easy to regenerate). Tests decode via koder-cine, compute the sha256 of the output, and compare against the committed reference. When ffmpeg changes output (rare in mature decoders), a maintainer regenerates.
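
The comparison step can be sketched as below. It parses one line of sha256sum output and compares it against a digest computed elsewhere; parse_reference, matches_reference, and this Reference struct are hypothetical names, and hashing itself is left out because it needs a crate outside std.

```rust
/// One parsed line of `sha256sum` output (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Reference {
    pub digest: String, // 64 lowercase hex characters
    pub file: String,
}

/// Parses "<64 hex chars>  <file>" (text mode) or "<64 hex chars> *<file>"
/// (binary mode). Returns None for anything else.
pub fn parse_reference(line: &str) -> Option<Reference> {
    let line = line.trim_end();
    let (digest, rest) = line.split_once(' ')?;
    if digest.len() != 64 || !digest.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // After the first space comes a mode marker: ' ' or '*'.
    let file = rest
        .strip_prefix(' ')
        .or_else(|| rest.strip_prefix('*'))?;
    if file.is_empty() {
        return None;
    }
    Some(Reference {
        digest: digest.to_ascii_lowercase(),
        file: file.to_string(),
    })
}

/// Compares a committed reference against a hex digest of decoder output.
pub fn matches_reference(reference: &Reference, actual_hex: &str) -> bool {
    reference.digest.eq_ignore_ascii_case(actual_hex)
}
```

A test would read the committed .sha256 file, hash the decoder output, and fail with both digests in the message when matches_reference returns false.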

For *bit-exactness disputes*, use the JM reference decoder conformance vectors from the H.264 / H.265 JVT. Those are the ground truth for decoder correctness; ffmpeg is just convenient.

Fuzz targets (cargo-fuzz) run continuously in CI as they do today for the parsers (see tickets 004, 012, 022, 023). Every decoder crate ships with at least one fuzz target at its first milestone.

9. License constraints recap

From RFC-001 §11: MIT / Apache-2.0 / BSD / ISC only. The Phase 3 audit (§14) surfaced *one real license issue*: Symphonia is *MPL-2.0*, not MIT as I mistakenly assumed in the RFC's first draft.

9.1 MPL-2.0 analysis

  • *MPL-2.0 is file-level copyleft.* Modifications to MPL-licensed source files must be released under MPL. Consumers of MPL code do not need to release their own code under MPL; the copyleft stops at the file boundary.
  • *Practical consequence for koder-cine:* taking `symphonia-codec-aac` as a Cargo dependency is fine. koder-cine stays MIT. Koder products linking koder-cine stay unaffected. We cannot copy symphonia's source into our tree and re-license it, and we cannot modify it in-place without sharing back, but neither is needed for a dep.
  • *Static linking (Rust/Cargo default) is explicitly allowed* by MPL-2.0 §3.2 and §3.3. This is precisely the use case MPL-2.0 was designed around (Mozilla's relationship with proprietary Firefox plugins).

*Recommendation:* amend RFC-001 §11 to allow MPL-2.0 *for dependencies only*, keeping the Koder-authored code under MIT. This is a safe, narrow amendment that unlocks symphonia without broader policy drift.

*Risk if rejected:* AAC-LC falls back to "write from scratch" (~2 months). Phase 3 timeline grows, but nothing else breaks.

9.2 Other Phase 3 license confirmations

  • *libopus* (BSD): porting C→Rust is fine
  • *opus-decoder, mousiki* (need audit; probably MIT/Apache): fine if they are, reject if they aren't
  • *JM H.264 reference decoder*: BSD, spec freely published
  • *Forbidden, unchanged*: x264 (GPL), x265 (GPL). Both are encoders, not in Phase 3 scope anyway

10. Risks and escape hatches

| Risk | Likelihood | Impact | Escape |
|------|------------|--------|--------|
| H.264 decoder slips past 12 months | *High* | Blocks Phase 4 indefinitely | §5.2 subprocess escape: wrap openh264 via std::process::Command, document as debt |
| symphonia codecs fail conformance on real fixtures | Medium | +2 months to write custom decoders | §5.1 fallback to scratch |
| CABAC implementation is subtly wrong and only shows up on specific streams | Medium | Silent corruption in production | JM reference vectors catch most; fuzz + real-world corpus catches the rest |
| Phase 3 effort starves Phase 4 / other Koder work | Medium | 12+ months of cine-only focus | Only land Phase 3 if there is genuine capacity; otherwise keep ffmpeg |
| 10-bit H.264 content appears in real talk traffic | Low | Decoder rejects, falls back to ffmpeg | Acceptable: ffmpeg stays on PATH as fallback through Phase 5 |

The last row matters: *ffmpeg is not removed from the VideoAnalyzer config until Phase 5*. Phase 3 narrows its usage; it does not eliminate it.

11. Open questions

  1. *Single crate for all three codecs or one per crate?* RFC-001 §4 says one per crate. Confirming: yes, media-codec-opus, media-codec-aac, media-codec-h264 as independent crates, each behind its own Cargo feature. A consumer that only needs Opus (like talk's audio path) pays only Opus weight.

  2. *PixelFormat variants for Phase 3:* I420 only (YUV 4:2:0, 8-bit). Anything else returns UnsupportedPixelFormat. Revisit in Phase 5 when HW-decoded frames bring NV12 into the picture.

  3. *Frame allocator:* should frames use a pool (bumpalo, typed-arena) to avoid malloc on every frame? Lean: no in Phase 3, measure first. Pool adds complexity that only pays off if profiling shows allocation as a bottleneck.

  4. *Conformance harness location:* new crates/media-conformance crate or per-codec tests/conformance/ directories? Lean: per-codec, so the harness lives next to the code it validates. A top-level crate is overkill for three codecs.

  5. *Whether to adopt symphonia's Decoder trait directly* instead of writing our own in media-core. Lean: write our own. symphonia's trait has assumptions (e.g., around its packet type) that do not match our NAL-oriented container world cleanly. A thin adapter at the symphonia boundary is cheap.

  6. *Benchmark baseline:* when do we start tracking decode throughput (FPS / Mbps) vs ffmpeg? Proposal: after AAC-LC lands, before H.264 starts. Gives us a calibration point without blocking early work.

12. What this RFC does not decide

  • *No cdylib surface.* The Go side of the integration (how libkoder_cine.so exposes decoders to talkd without exec) is Phase 5. Phase 3 shipping means Rust crates + CLI only; Go keeps calling koder-cine as a subprocess.
  • *Encoder strategy.* Opus encoder, AAC encoder, H.264 encoder all deferred to Phase 5. Implications for talk ticket 009: AAC-in inputs still need an encoder to produce Opus output. Until Phase 5, that encode stays in ffmpeg, and koder-cine's win is only the decode side of the pipeline.
  • *Koder Koda port timing.* RFC-001 §9 still governs. Phase 3 adds three large crates to the "port later" pile; nothing changes about the ordering or the gating capabilities in lang/520-548.

13. Approval checklist

Before this RFC moves from Draft → Accepted:

  • [x] ~Symphonia audit: read symphonia-codec-opus and symphonia-codec-aac source; confirm re-export path is viable~ *Done 2026-04-12; results in §14. Flipped the ordering (AAC first, Opus last) and surfaced the MPL-2.0 license question.*
  • [x] ~*MPL-2.0 dependency policy decision.* Rodrigo to decide whether to amend RFC-001 §11 to allow MPL-2.0 for deps only (§9.1). Gating AAC re-export vs from-scratch implementation.~ *Approved 2026-04-12. RFC-001 §11 amended with narrow MPL-2.0 deps-only carve-out (see RFC-001 §11.1). AAC re-export path unblocked.*
  • [ ] Rodrigo signs off on the 11–14 month estimate and accepts the opportunity cost
  • [ ] H.264 feasibility spike: 1 week of exploratory work on M1 (SPS/PPS parse + Decoder trait wiring) to calibrate the month-by-month estimate before committing
  • [x] ~Conformance harness POC: one fixture, one sha256, one test, wired in CI; proves the approach before building more decoders on it~ *Done 2026-04-12, upgraded from POC to full sha256 in ticket 103.* New crate crates/media-conformance provides PcmHasher, sha256_pcm_f32_le, and a Reference parser. Fixtures: tests/fixtures/sample-aac-lc-3s.aac.sha256 and sample-640x480-3s-avc-aac.mp4.sha256 (both baa63d0bc0ee637fcebd1ff03ddc36bf3a252fbe50f97e8c2d3aa4ff47eb6c8f). Tests: crates/media-codec-aac/tests/conformance_sha256.rs enforces MP4 ref match, ADTS ref match, and MP4↔ADTS bit-exact convergence in every cargo test. CLI anchor: koder-cine decode-audio <file> emits sha256_pcm_f32_le. Cross-host determinism validated between laptop note and s.k.lin in cine-v0.1.2.
  • [ ] opus-decoder and mousiki audit: only needed when Opus's turn comes (after H.264). Not on the Phase 3 kickoff path.
  • [x] ~Ticket breakdown in platform/cine/backlog/pending/ with a stable numbering scheme that keeps H.264 milestones distinct (e.g., 100-aac, 200-h264-m1 … 206-h264-m7, 300-opus)~ *Done 2026-04-12.* Phase 3 AAC tickets allocated and shipped as 100-101-102-103-104. H.264 reserved as the 200-block (200-h264-m1 spike pending). Opus reserved as 300-block.

14. Audit results (2026-04-12)

Concrete findings from the symphonia audit that reshaped §4, §5, §9:

symphonia-codec-aac: production-ready

  • *Exists on crates.io*: v0.5.5, last updated 2026-03-24
  • *Downloads*: 3,355,159 (serious production adoption)
  • *License*: MPL-2.0
  • *Implementation*: pure safe Rust, no C/C++ deps
  • *Coverage*: AAC-LC "Great" (per upstream README); HE-AAC v1/v2 not implemented
  • *Verdict*: *viable as re-export*, gated on MPL-2.0 policy approval. Saves ~2 months of Phase 3.

symphonia-codec-opus: does not exist

  • *NOT published on crates.io.* curl https://crates.io/api/v1/crates/symphonia-codec-opus returns "crate symphonia-codec-opus does not exist".
  • It is a planned entry in the Symphonia README codec table, marked "—" (not started). There is no shipped code.
  • *Verdict*: dead path. The RFC's original plan to "re-export symphonia-codec-opus in week 1" is impossible.

Pure-Rust Opus alternatives on crates.io

Searched crates.io for "opus decoder", found:

| Crate | Version | Downloads | Verdict |
|-------|---------|-----------|---------|
| opus-decoder | 0.1.1 | 769 | Hobbyist; needs serious audit before adoption |
| mousiki | 0.2.1 | 466 | Hobbyist; same caveat |
| opus-oxide | 0.0.1 | | Pre-alpha; skip |
| opus / audiopus | various | | FFI wrappers around libopus (C); rejected per RFC-001 §2 |

None has production adoption. When Opus's turn comes, the choice is between a deep audit of these (risky) or porting libopus (2–3 months). Deferred per §5.3.

License landscape

  • *Symphonia is MPL-2.0*, not MIT as the RFC's first draft incorrectly assumed. This is the critical finding; see §9.1 for the impact analysis and recommended amendment to RFC-001 §11.

What the audit did NOT change

  • *H.264 plan unchanged.* No pure-Rust video codec exists. Write from scratch remains the plan, with the subprocess escape hatch.
  • *Decoder trait API unchanged.* §6 design holds regardless of which crates implement it.
  • *Conformance strategy unchanged.* §8 (checked-in sha256 fixtures + JM reference vectors) still applies.

Audit budget

Spent: ~30 minutes (crates.io API queries, Symphonia README reading, license text verification). *Proved the value of doing audit before writing code*: the Opus-first plan would have cost a week of wiring against a non-existent crate.
