AI Voice — Speech Intelligence

  • Area: Intelligence
  • Path: ai/voice
  • Kind: Self-hosted transcription + diarization + summarization
  • Status: v0.6.2 in production at voice.koder.dev

Role in the stack

AI Voice is the self-hosted speech intelligence layer of Koder. It performs audio transcription, speaker diarization (who said what, and when), and summarization on audio inputs. Transcription is powered by whisper.cpp, and asynchronous processing runs through a job queue backed by SQLite in WAL mode. Summarization is delegated to ai/gateway.

AI Voice is deployed as LXC 126 on s.r1. It is the backend for Koder Dek and any other product that consumes speech.
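To make the SQLite-WAL-backed job queue mentioned above concrete, here is a minimal sketch of how such a queue could work. It is written in Python purely for illustration; the table name, columns, status values, and function names (open_queue, enqueue, claim_next, finish) are all assumptions and are not taken from the AI Voice codebase.

```python
import os
import sqlite3
import tempfile


def open_queue(path):
    """Open (or create) the queue database and switch it to WAL journaling."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN/COMMIT statements below control every transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL").fetchone()
    # Hypothetical schema: one row per audio job.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " audio_path TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'queued',"
        " result TEXT)"
    )
    return conn


def enqueue(conn, audio_path):
    """Add a job in the 'queued' state and return its id."""
    cur = conn.execute("INSERT INTO jobs (audio_path) VALUES (?)", (audio_path,))
    return cur.lastrowid


def claim_next(conn):
    """Atomically mark the oldest queued job as running; None if the queue is empty."""
    # BEGIN IMMEDIATE takes the write lock up front, so two workers
    # cannot claim the same row.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, audio_path FROM jobs"
            " WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise


def finish(conn, job_id, result):
    """Store the result and mark the job as done."""
    conn.execute(
        "UPDATE jobs SET status = 'done', result = ? WHERE id = ?",
        (result, job_id),
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_queue(os.path.join(tmp, "jobs.db"))
        enqueue(conn, "meeting.wav")
        claimed = claim_next(conn)
        finish(conn, claimed[0], "transcript placeholder")
        print(conn.execute("SELECT id, status FROM jobs").fetchall())
        conn.close()
```

WAL mode lets readers (for example, a status endpoint polling a job) proceed while a worker is writing, which is the usual reason to pair SQLite with an async queue of this kind.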

Features

  • Transcription (whisper.cpp, sherpa-onnx)
  • Speaker diarization
  • Summarization + topic tags (via ai/gateway)
  • Async job queue with SQLite WAL
  • Text-only LLM completion (POST /v1/llm/complete) — prompt + source_text → completion, no audio required
  • OpenAI-compatible endpoints: /v1/audio/transcriptions, /v1/voice/analyze, /v1/jobs, /v1/llm/complete
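
As an illustration of the text-only completion endpoint listed above, the following sketch builds (but does not send) a request to POST /v1/llm/complete. The field names prompt and source_text come from the feature list; the JSON body encoding, the https scheme, the absence of authentication, and the helper name build_llm_complete_request are assumptions.

```python
import json
import urllib.request

# Host taken from the status line above; the https scheme is an assumption.
BASE_URL = "https://voice.koder.dev"


def build_llm_complete_request(prompt, source_text, base_url=BASE_URL):
    """Build a POST request for /v1/llm/complete without sending it."""
    # Assumption: the endpoint accepts a JSON object with these two fields.
    body = json.dumps({"prompt": prompt, "source_text": source_text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/llm/complete",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_llm_complete_request(
        prompt="Summarize in one sentence.",
        source_text="Transcript text goes here.",
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req), plus whatever
    # authentication the deployment requires (not covered here).
```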

Primary couplings

Consumer                  Relationship
ai/gateway                Summarization + tag extraction
apps/dek                  First-party consumer: audio notes product
Any product with audio    Transcription + diarization backend

Phase 2 (in progress, 2026-04-30)

RFC001 (Wakeword + Talk Mode) adds a local-first hotword detector and a continuous Talk Mode pipeline on top of the existing Whisper backend. The detector runs on-device through engines/sdk/koder_kit.KodeVoice; only post-wake audio reaches services/ai/voice, via the new POST /v1/voice/stream and POST /v1/voice/synth endpoints.

Latency budget: ≤ 800 ms p50 from hotword detection to first audible TTS frame. Privacy-first: voice features ship OFF by default, and the 2-second pre-wake ring buffer never leaves the device.
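
The privacy rules above (off by default, a bounded pre-wake buffer that stays local, only post-wake audio forwarded) can be sketched as a small gate in front of the uploader. This is an illustration in Python, not the KodeVoice or internal/wake.Detector API: the class name WakeGate, the detect and send callbacks, and the 20 ms frame size are all hypothetical. Only the 2-second buffer length and the off-by-default behavior come from the text.

```python
from collections import deque

PREWAKE_SECONDS = 2  # from the text: 2-second pre-wake ring buffer
FRAME_MS = 20        # assumption: audio arrives in 20 ms frames


class WakeGate:
    """Keeps pre-wake audio on the device and forwards only post-wake frames."""

    def __init__(self, detect, send, enabled=False):
        # detect(ring) -> bool: hypothetical hotword check over buffered frames.
        # send(frame): hypothetical uploader, e.g. towards POST /v1/voice/stream.
        self._detect = detect
        self._send = send
        self.enabled = enabled  # voice features are OFF unless switched on
        self.awake = False
        # A deque with maxlen drops the oldest frame automatically,
        # which is exactly the ring-buffer behavior needed here.
        self._ring = deque(maxlen=PREWAKE_SECONDS * 1000 // FRAME_MS)

    @property
    def buffered(self):
        """Number of pre-wake frames currently held on the device."""
        return len(self._ring)

    def push(self, frame):
        if not self.enabled:
            return  # feature disabled: audio is neither buffered nor sent
        if self.awake:
            self._send(frame)  # post-wake audio is forwarded
            return
        self._ring.append(frame)
        if self._detect(self._ring):
            self.awake = True
            self._ring.clear()  # pre-wake audio is discarded, never uploaded


if __name__ == "__main__":
    sent = []
    gate = WakeGate(detect=lambda ring: ring[-1] == "wake", send=sent.append, enabled=True)
    for frame in ["noise", "noise", "wake", "hello", "world"]:
        gate.push(frame)
    print(sent)
```

In this sketch the frame that triggers the detector is treated as pre-wake and is not forwarded; whether the real pipeline includes the hotword itself in the uploaded stream is not stated in the source.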

Scaffolds are in place (the Go interface internal/wake.Detector and the Dart KodeVoice in koder_kit v0.8.0+); the algorithm and plugins are tracked in backlog tickets VOICE045–VOICE051 (projects/koder-stack/backlog/pending).

Source: ../home/koder/dev/koder/meta/docs/stack/modules/ai-voice.md