AI Voice — Speech Intelligence

*Area:* Intelligence
*Path:* ai/voice
*Kind:* Self-hosted transcription + diarization + summarization
*Status:* v0.6.2 in production at voice.koder.dev

Role in the stack

AI Voice is the self-hosted speech intelligence layer of Koder. It performs audio transcription, speaker diarization (who said what, and when), and summarization on audio inputs. Transcription runs on whisper.cpp, and asynchronous processing uses a job queue backed by SQLite in WAL mode. Summarization is delegated to ai/gateway.
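A minimal sketch of how a SQLite-backed job queue in WAL mode could work, using only Python's standard sqlite3 module. The table name `jobs`, its columns, the status values, and the helper names are hypothetical; the actual AI Voice schema is not described here.

```python
import sqlite3


def open_queue(path):
    """Open (or create) a queue database with write-ahead logging enabled."""
    # isolation_level=None puts the connection in autocommit mode so that
    # transactions can be controlled explicitly with BEGIN/COMMIT.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    # Hypothetical schema: one row per audio job.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               audio_path TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'queued'
           )"""
    )
    return conn


def enqueue(conn, audio_path):
    """Add a job and return its id."""
    cur = conn.execute("INSERT INTO jobs (audio_path) VALUES (?)", (audio_path,))
    return cur.lastrowid


def claim_next(conn):
    """Atomically mark the oldest queued job as running and return it, or None."""
    # BEGIN IMMEDIATE takes the write lock up front, so two workers
    # cannot claim the same row.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, audio_path FROM jobs "
            "WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

WAL mode lets readers (for example, status polling) proceed while a worker writes, which is one reason it suits a single-node async queue. WAL needs a file-backed database; an in-memory database does not switch to it.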

AI Voice is deployed as LXC 126 on s.r1. It is the backend for Koder Dek and any other product that consumes speech.

Features

Transcription (whisper.cpp, sherpa-onnx)
Speaker diarization
Summarization + topic tags (via ai/gateway)
Async job queue with SQLite WAL
Text-only LLM completion (POST /v1/llm/complete) — prompt + source_text → completion, no audio required
OpenAI-compatible endpoints: /v1/audio/transcriptions, /v1/voice/analyze, /v1/jobs, /v1/llm/complete
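As an illustration of the text-only completion endpoint listed above, the following sketch builds (but does not send) a request to POST /v1/llm/complete. The field names `prompt` and `source_text` come from the feature list; the JSON encoding, the `https` scheme, the absence of authentication, and the helper name `build_complete_request` are assumptions, and the response shape is not shown because it is not documented here.

```python
import json
import urllib.request

# Host taken from the status line; the https scheme is an assumption.
BASE_URL = "https://voice.koder.dev"


def build_complete_request(prompt, source_text, base_url=BASE_URL):
    """Build a POST request for the text-only completion endpoint.

    Assumes a JSON body with `prompt` and `source_text` keys.
    """
    body = json.dumps({"prompt": prompt, "source_text": source_text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/v1/llm/complete",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; any authentication headers the deployment requires would need to be added first.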

Primary couplings

Component	Relationship
`ai/gateway`	Summarization + tag extraction
`apps/dek`	First-party consumer — audio notes product
Any product with audio	Transcription + diarization backend

Phase 2 (in progress, 2026-04-30)

RFC-001 (Wake-word + Talk Mode) adds a local-first hot-word detector and a continuous Talk Mode pipeline on top of the existing Whisper backend. The detector runs on-device through engines/sdk/koder_kit.KodeVoice; only post-wake audio reaches services/ai/voice, via the new POST /v1/voice/stream and POST /v1/voice/synth endpoints.
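The "only post-wake audio leaves the device" rule can be sketched as a simple gate in front of the uploader. This is an illustration only: the function name `post_wake_frames` and the `is_wake` callback are hypothetical stand-ins, not the KodeVoice or internal/wake.Detector API, and treating the triggering frame itself as pre-wake is an assumption.

```python
from typing import Callable, Iterable, Iterator


def post_wake_frames(
    frames: Iterable[bytes],
    is_wake: Callable[[bytes], bool],
) -> Iterator[bytes]:
    """Yield only the audio frames that arrive after the detector fires.

    `is_wake` stands in for an on-device hot-word detector (hypothetical).
    """
    awake = False
    for frame in frames:
        if awake:
            # Post-wake audio: eligible to be sent to the service.
            yield frame
        elif is_wake(frame):
            # Assumption: the triggering frame counts as pre-wake and stays local.
            awake = True
```

For example, with frames `[b"a", b"b", b"WAKE", b"c"]` and a detector that fires on `b"WAKE"`, only `b"c"` is yielded.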

Latency budget: ≤ 800 ms p50 from hot-word detection to first audible TTS frame. Privacy-first: voice features ship OFF by default; the 2-second pre-wake ring buffer never leaves the device.
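A fixed-duration pre-wake ring buffer can be sketched with a bounded deque. The 2-second window comes from the text above; the 20 ms frame size and the class name `PreWakeRingBuffer` are assumptions made for illustration.

```python
from collections import deque


class PreWakeRingBuffer:
    """Keep only the most recent `seconds` of audio frames, in memory, on-device."""

    def __init__(self, seconds=2.0, frame_ms=20):
        # Assumed framing: fixed-size frames of `frame_ms` milliseconds each.
        self._frames = deque(maxlen=int(seconds * 1000 / frame_ms))

    def push(self, frame):
        # When the deque is full, appending silently drops the oldest frame.
        self._frames.append(frame)

    def snapshot(self):
        """Return the buffered frames, oldest first."""
        return list(self._frames)

    def __len__(self):
        return len(self._frames)
```

With these defaults the buffer holds 100 frames, so memory use stays constant no matter how long the device listens.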

Scaffolds are in place (Go interface internal/wake.Detector, Dart KodeVoice in koder_kit v0.8.0+); the algorithm and plugins will land in backlog tickets VOICE-045 to VOICE-051 (projects/koder-stack/backlog/pending).