AI Voice — Speech Intelligence
- **Area:** Intelligence
- **Path:** ai/voice
- **Kind:** Self-hosted transcription + diarization + summarization
- **Status:** v0.6.2 in production at voice.koder.dev
Role in the stack
AI Voice is the self-hosted speech intelligence layer of Koder. It performs audio transcription, speaker diarization (who said what, and when), and summarization on audio inputs. Transcription is powered by whisper.cpp, and asynchronous processing runs through a job queue backed by SQLite in WAL mode. Summarization is delegated to ai/gateway.
AI Voice is deployed as LXC 126 on s.r1 and serves as the backend for Koder Dek and any other product that consumes speech.
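As a rough illustration of the SQLite WAL job queue mentioned above, the following is a minimal sketch and not the service's actual schema or code. The table name `jobs`, its columns, the status values, and the helper names `open_queue`, `enqueue`, `claim_next`, and `complete` are all assumptions made for this example.

```python
import sqlite3


def open_queue(path):
    """Open (or create) a queue database with write-ahead logging enabled."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN/COMMIT statements below are the only transactions.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    # Hypothetical schema: one row per submitted audio job.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               audio_path TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'queued',
               result TEXT
           )"""
    )
    return conn


def enqueue(conn, audio_path):
    """Add a job and return its id."""
    cur = conn.execute("INSERT INTO jobs (audio_path) VALUES (?)", (audio_path,))
    return cur.lastrowid


def claim_next(conn):
    """Mark the oldest queued job as running; return (id, audio_path) or None."""
    # BEGIN IMMEDIATE takes the write lock up front, so two workers cannot
    # select and claim the same row.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, audio_path FROM jobs "
            "WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row


def complete(conn, job_id, result):
    """Store the result text and mark the job as done."""
    conn.execute(
        "UPDATE jobs SET status = 'done', result = ? WHERE id = ?",
        (result, job_id),
    )
```

WAL mode lets readers (for example, a status endpoint polling a job) proceed while a worker is writing, which is one plausible reason to pair it with an async queue.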
Features
- Transcription (whisper.cpp, sherpa-onnx)
- Speaker diarization
- Summarization + topic tags (via ai/gateway)
- Async job queue with SQLite WAL
- Text-only LLM completion (POST /v1/llm/complete): prompt + source_text → completion, no audio required
- OpenAI-compat endpoints: /v1/audio/transcriptions, /v1/voice/analyze, /v1/jobs, /v1/llm/complete
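To make the text-only completion endpoint more concrete, here is a hedged sketch of how a client might build a request for POST /v1/llm/complete. The JSON field names `prompt` and `source_text` follow the feature description above. The JSON content type, the `https://` scheme, the absence of authentication, and the helper name `build_complete_request` are assumptions. The response format is not documented here, so the sketch stops before sending.

```python
import json
import urllib.request


def build_complete_request(base_url, prompt, source_text):
    """Build (but do not send) a POST request for /v1/llm/complete."""
    # Assumed body shape: a JSON object with the two fields named in the
    # feature list. The real service may require more, such as an auth header.
    body = json.dumps({"prompt": prompt, "source_text": source_text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/llm/complete",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_complete_request(
    "https://voice.koder.dev",  # host from the status line; scheme is assumed
    prompt="Summarize in one sentence.",
    source_text="Meeting notes go here.",
)
```

Passing `req` to `urllib.request.urlopen` would perform the call. That step is left out because the response schema is not specified in this page.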
Primary couplings
| Consumer | Relationship |
|---|---|
| ai/gateway | Summarization + tag extraction |
| apps/dek | First-party consumer: audio notes product |
| Any product with audio | Transcription + diarization backend |
Phase 2 (in progress, 2026-04-30)
RFC001 (Wakeword + Talk Mode) adds a local-first hot-word detector and a continuous Talk Mode pipeline on top of the existing Whisper backend. The detector runs on-device through engines/sdk/koder_kit.KodeVoice; only post-wake audio reaches services/ai/voice, via the new POST /v1/voice/stream and POST /v1/voice/synth endpoints.
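The "only post-wake audio reaches the service" rule can be pictured as a gate in front of the upload path. The sketch below is illustrative only. The function name `post_wake_frames`, the `is_wake` callback, and the choice not to forward the triggering frame are assumptions, and none of it reflects the actual KodeVoice or internal/wake.Detector API.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[int]  # one chunk of PCM samples; the representation is assumed


def post_wake_frames(
    frames: Iterable[Frame], is_wake: Callable[[Frame], bool]
) -> Iterator[Frame]:
    """Yield only the frames that arrive after the detector has fired."""
    awake = False
    for frame in frames:
        if awake:
            # After the wake event, every frame is eligible for upload.
            yield frame
        elif is_wake(frame):
            # The frame that triggered detection is kept on the device
            # (an assumption of this sketch); later frames are forwarded.
            awake = True
```

Because the gate is a generator, nothing is buffered for upload before the wake event. Frames that arrive earlier are simply dropped from the outgoing stream.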
Latency budget: ≤ 800 ms p50 from hot-word detection to first audible TTS frame. Privacy-first: voice features ship OFF by default; the 2-second pre-wake ring buffer never leaves the device.
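A pre-wake ring buffer of this kind is typically a fixed-capacity queue that overwrites its oldest samples. The following is a minimal sketch of that idea, not the koder_kit implementation. The class name `PreWakeRingBuffer`, its methods, and the 16 kHz default sample rate are assumptions.

```python
from collections import deque
from typing import Iterable, List


class PreWakeRingBuffer:
    """Keep only the most recent `seconds` of audio samples in memory."""

    def __init__(self, seconds: float = 2.0, sample_rate: int = 16000):
        # deque(maxlen=...) silently discards the oldest items once full,
        # which gives ring-buffer behavior without manual index arithmetic.
        self._samples = deque(maxlen=int(seconds * sample_rate))

    def push(self, frame: Iterable[int]) -> None:
        """Append a chunk of samples, evicting the oldest ones if needed."""
        self._samples.extend(frame)

    def snapshot(self) -> List[int]:
        """Return the buffered samples, oldest first, for on-device use."""
        return list(self._samples)
```

With the defaults the buffer holds at most 32,000 samples (2 s × 16,000 samples/s), so memory use stays bounded however long the microphone has been open.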
Scaffolds are in place (Go interface internal/wake.Detector, Dart KodeVoice in koder_kit v0.8.0+); the algorithm and plugins are tracked in backlog tickets VOICE045–VOICE051 (projects/koder-stack/backlog/pending).