AI Video — Video Understanding + Generation Foundation

  • *Area:* Intelligence
  • *Path:* services/ai/video
  • *Kind:* Video understanding (transcript + scenes + keyframes + summary) and generation (proxy)
  • *Status:* v0.0.1, sector bootstrapping (2026-05-09)

Role in the stack

video is the single integration point for both video understanding and generation. Understanding orchestrates voice (audio track) + imaging (frame analysis) + scene detection: it is pure composition, with no new model code. Generation is proxy-only in v1 (Sora, Veo, Runway, Pika via services/ai/gateway); self-hosted text-to-video stays out of scope until open models reach G4 capability.
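The "pure composition" idea for understanding can be illustrated with a minimal Python sketch. Every name here (`analyze`, `Scene`, `VideoAnalysis`, and the three injected callables) is hypothetical and not taken from the Koder codebase; the callables merely stand in for services/ai/voice, scene detection, and services/ai/imaging, and the summary step is a naive placeholder.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Scene:
    start_s: float
    end_s: float
    keyframe_caption: str


@dataclass
class VideoAnalysis:
    transcript: str
    scenes: List[Scene]
    summary: str


def analyze(
    video_path: str,
    transcribe: Callable[[str], str],  # hypothetical stand-in for services/ai/voice
    detect_scenes: Callable[[str], List[Tuple[float, float]]],  # hypothetical scene detector
    caption_frame: Callable[[str, float], str],  # hypothetical stand-in for services/ai/imaging
) -> VideoAnalysis:
    """Compose existing capabilities; no model code lives in this function."""
    transcript = transcribe(video_path)

    scenes: List[Scene] = []
    for start_s, end_s in detect_scenes(video_path):
        # Assumption: the keyframe is taken at the midpoint of each scene.
        keyframe_ts = (start_s + end_s) / 2
        scenes.append(Scene(start_s, end_s, caption_frame(video_path, keyframe_ts)))

    # Placeholder summary: joins the keyframe captions. A real summary step
    # would presumably use the transcript as well.
    summary = " | ".join(scene.keyframe_caption for scene in scenes)
    return VideoAnalysis(transcript, scenes, summary)
```

Passing the dependencies in as callables keeps the orchestration testable with stubs, which fits a layer that owns no models of its own.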

It is the Koder analog of a video-stack assemblage that today requires pasting together OpenAI Sora + Whisper + GPT-4V + custom scene-detect; the foundation collapses that into a single normalized API with one auth boundary, one quota system, and one billing event stream.
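As a rough illustration of what "one quota system, one billing event stream" could mean for proxied generation, here is a small Python sketch. The class and field names (`VideoProxy`, `GenerateRequest`, `gateway_call`, `quota_s`) are assumptions made for this example, the quota unit (seconds of video) is invented, and the payload shape is not the real services/ai/gateway contract.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Provider names come from the text above; the identifiers are assumptions.
PROVIDERS = ("sora", "veo", "runway", "pika")


@dataclass(frozen=True)
class GenerateRequest:
    provider: str
    prompt: str
    duration_s: int


@dataclass
class VideoProxy:
    # Hypothetical stand-in for a call into services/ai/gateway.
    gateway_call: Callable[[str, Dict], Dict]
    # Assumed quota unit: remaining seconds of generated video.
    quota_s: int
    billing_events: List[Dict] = field(default_factory=list)

    def generate(self, req: GenerateRequest) -> Dict:
        if req.provider not in PROVIDERS:
            raise ValueError(f"unsupported provider: {req.provider}")
        if req.duration_s > self.quota_s:
            raise PermissionError("quota exceeded")

        # Same normalized payload regardless of the upstream provider.
        response = self.gateway_call(
            req.provider, {"prompt": req.prompt, "duration_s": req.duration_s}
        )

        # Quota and billing are recorded in one place for every provider.
        self.quota_s -= req.duration_s
        self.billing_events.append(
            {"provider": req.provider, "seconds": req.duration_s}
        )
        return response
```

The point of the sketch is only that validation, quota accounting, and billing sit in front of the gateway once, rather than being repeated per provider.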

Boundary vs neighbors

  • services/ai/voice is the audio sibling and a hard dependency for analyze.
  • services/ai/imaging is the frame sibling for caption/keyframe.
  • services/ai/vision is the single-image counterpart.
  • `products/media

Source: ../home/koder/dev/koder/meta/docs/stack/modules/ai-video.md