AI voice mode UI
Fullscreen voice conversation mode UI: waveform (input/output color-coded), push-to-talk + always-on toggle, barge-in visual, mute, end session. Extends voice/wake-word.kmd with conversational mode semantics. Required for Talk product + future Kortex voice mode.
Spec — AI voice mode UI
Base spec: voice/wake-word.kmd covers toggles + backend. This spec covers the **UX of the live conversational mode**. Trigger: mic button in multimodal-input.kmd (#116). Impl ticket: services/ai/ai#115.
Principles
- **Fullscreen focus**: voice mode is a dedicated surface, not inline.
- **Visual feedback**: input and output waveforms are color-coded.
- **Barge-in supported**: the user may speak while the AI is speaking; the UX must signal the transition.
- **One-tap escape**: mute and end are always reachable.
- **Optional transcription overlay**: a toggle makes live STT visible.
R1 — Anatomy
┌─────────────────────────────────────────────┐
│ [✕] │ ← top: end session
│ │
│ [Assistant Icon] │
│ Generated by AI — verify │
│ │
│ ╭─╮╭─╮╭─╮╭─╮╭─╮╭─╮╭─╮╭─╮╭─╮ │
│ ─╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰─ │ ← waveform (output = assistant talking)
│ │
│ "Sure, here's what I think..." │ ← optional transcript overlay
│ │
│ │
│ │
│ ┌────────────────────────┐ │
│ │ Push to talk [PTT] │ │ ← mode toggle
│ │ Always on [AON] │ │
│ └────────────────────────┘ │
│ │
│ [🎙 mute] │
└─────────────────────────────────────────────┘
Slots:
| Slot | Function |
|---|---|
| End button (✕) | Close voice mode; return to composer or chat surface |
| Avatar | Assistant icon (configurable) |
| AI disclaimer | Subtle label per ai-disclaimer.kmd R1 tier 1 |
| Waveform | Animated; color shifts per state (R2) |
| Transcript overlay | Optional toggle; shows STT live (input) + TTS source (output) |
| Mode toggle | Push-to-talk (PTT) or always-on (AON); see R3 |
| Mute button | 1-tap mute mic; visual indicator state |
R2 — Waveform states
Color coding per themes/color-roles.kmd:
| State | Waveform color | Behavior |
|---|---|---|
| idle | text-muted flat line | no input/output |
| listening (input) | accent animated bars | input audio captured |
| processing | text-muted pulsing dot | waiting for response |
| speaking (output) | success animated bars | output audio playing |
| barge-in transition | Crossfade accent ↔ success | brief overlap when user speaks during output |
Visuals MUST honor reduced-motion: replace the animation with static "Listening..." / "Speaking..." labels.
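The state table above, together with the reduced-motion rule, can be sketched as a small lookup. This is a minimal illustration, not the shipped implementation: the names `VoiceState`, `WaveformStyle`, and `waveformStyle` are hypothetical, and giving `processing` a static label under reduced motion is an assumption (the spec only names the listening and speaking labels).

```typescript
// Hypothetical names throughout; color roles follow themes/color-roles.kmd.
type VoiceState = "idle" | "listening" | "processing" | "speaking";

interface WaveformStyle {
  colorRole: "text-muted" | "accent" | "success";
  shape: "flat-line" | "animated-bars" | "pulsing-dot" | "static-label";
  label?: string; // set only when animation is replaced by text
}

const STYLES: Record<VoiceState, WaveformStyle> = {
  idle: { colorRole: "text-muted", shape: "flat-line" },
  listening: { colorRole: "accent", shape: "animated-bars" },
  processing: { colorRole: "text-muted", shape: "pulsing-dot" },
  speaking: { colorRole: "success", shape: "animated-bars" },
};

// Assumption: "processing" also gets a label; idle is already static.
const REDUCED_MOTION_LABELS: Partial<Record<VoiceState, string>> = {
  listening: "Listening...",
  processing: "Processing...",
  speaking: "Speaking...",
};

function waveformStyle(state: VoiceState, reducedMotion: boolean): WaveformStyle {
  const base = STYLES[state];
  const label = REDUCED_MOTION_LABELS[state];
  if (reducedMotion && label !== undefined) {
    // Keep the color role, swap the animation for a static text label.
    return { colorRole: base.colorRole, shape: "static-label", label };
  }
  return base;
}
```

On the web surface, the `reducedMotion` flag would typically come from `window.matchMedia("(prefers-reduced-motion: reduce)").matches`.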
R3 — Push-to-talk vs Always-on
| Mode | Behavior |
|---|---|
| **Push-to-talk (PTT)** | Hold button (or spacebar) to record; release to send. Default for noisy environments. |
| **Always-on (AON)** | Continuous capture with VAD (voice activity detection). Default for hands-free. |
The toggle persists per user. Cross-link: voice/wake-word.kmd R1 toggles (voice.enabled, talkMode).
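The difference between the two modes comes down to what opens the capture window: a held control in PTT, the VAD signal in AON. A minimal sketch under that reading follows; `TalkMode`, `shouldCapture`, and `PushToTalk` are hypothetical names, and the VAD result is assumed to arrive as a plain boolean.

```typescript
// Hypothetical names; the VAD is assumed to expose a simple boolean.
type TalkMode = "ptt" | "aon";

// Decide whether mic audio should be captured right now.
function shouldCapture(mode: TalkMode, pttHeld: boolean, vadSpeech: boolean): boolean {
  return mode === "ptt" ? pttHeld : vadSpeech;
}

// Tracks the hold/release cycle for PTT: release sends what was recorded.
class PushToTalk {
  private held = false;
  private readonly onSend: () => void;

  constructor(onSend: () => void) {
    this.onSend = onSend;
  }

  press(): void {
    this.held = true;
  }

  release(): void {
    // Ignore a stray release that was never preceded by a press.
    if (!this.held) return;
    this.held = false;
    this.onSend();
  }

  get recording(): boolean {
    return this.held;
  }
}
```

Wiring `press` and `release` to both the on-screen button and the spacebar keeps the two PTT inputs on one code path.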
R4 — Barge-in
When user speaks during assistant output (barge-in):
- Per voice/wake-word.kmd R5: bargeIn: true → output audio fades out + input fades in.
- Visual: waveform color crossfades accent ↔ success over ~200ms.
- Output TTS interrupted; new input streamed to backend.
- Audit: barge-in event logged.
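The barge-in rules above can be read as one state transition plus a list of side effects. The sketch below is illustrative only: `onUserSpeech`, `Effect`, and `crossfadeMix` are hypothetical names, and the behavior when `bargeIn` is false (keep playing, ignore the speech) is an assumption, since this spec does not define it.

```typescript
// Hypothetical names; effect strings stand in for real audio/network calls.
type VoiceState = "idle" | "listening" | "processing" | "speaking";
type Effect = "crossfade" | "interrupt-tts" | "stream-input" | "log-barge-in";

interface Step {
  state: VoiceState;
  effects: Effect[];
}

const CROSSFADE_MS = 200; // approximate duration given in R4

function onUserSpeech(state: VoiceState, bargeIn: boolean): Step {
  if (state === "speaking") {
    // Assumption: with barge-in disabled, output keeps playing untouched.
    if (!bargeIn) return { state, effects: [] };
    return {
      state: "listening",
      effects: ["crossfade", "interrupt-tts", "stream-input", "log-barge-in"],
    };
  }
  if (state === "idle") return { state: "listening", effects: ["stream-input"] };
  return { state, effects: [] };
}

// Blend factor for the waveform color: 0 = success (output), 1 = accent (input).
function crossfadeMix(elapsedMs: number): number {
  return Math.min(1, Math.max(0, elapsedMs / CROSSFADE_MS));
}
```

Returning effects as data rather than performing them keeps the transition easy to unit-test (see T6) and leaves the audit log call in one place.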
R5 — Transcript overlay
Toggle button (default OFF). When ON:
- Live STT text appears as an overlay below the waveform (input state).
- Live TTS source text appears (output state).
- Auto-scroll; max 3 lines visible.
- After mode end: transcript persisted to conversation history per conversation-history.kmd (#115).
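The "auto-scroll; max 3 lines visible" rule amounts to wrapping the running transcript and keeping only the tail. A minimal sketch, assuming a fixed character width per line (a real overlay would measure rendered text); `wrapWords` and `visibleLines` are hypothetical helper names.

```typescript
const MAX_VISIBLE_LINES = 3; // from R5

// Greedy word wrap at a fixed character width (assumption: monospace-like layout).
function wrapWords(text: string, width: number): string[] {
  const lines: string[] = [];
  let current = "";
  for (const word of text.split(/\s+/).filter((w) => w.length > 0)) {
    const candidate = current === "" ? word : current + " " + word;
    if (candidate.length > width && current !== "") {
      lines.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current !== "") lines.push(current);
  return lines;
}

// Auto-scroll: always show the most recent lines, never more than the maximum.
function visibleLines(transcript: string, width: number): string[] {
  return wrapWords(transcript, width).slice(-MAX_VISIBLE_LINES);
}
```

Calling `visibleLines` on every STT or TTS text update gives the scrolling effect without tracking a scroll position.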
R6 — End session
Top-right ✕ button OR swipe-down gesture (mobile):
- End TTS playback gracefully (fade out 200ms).
- Disconnect WebSocket from services/ai/voice.
- Return user to composer OR chat history (configurable per product).
- Final transcript saved to conversation history.
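The teardown steps above can be sketched as one async routine with its dependencies injected. Everything here is hypothetical (`SessionDeps`, `endSession`, and the four callbacks); the step order simply follows the list as written, which the spec does not state to be mandatory.

```typescript
// Hypothetical dependency interface; real code would wrap the audio player,
// the WebSocket to services/ai/voice, the history store, and the router.
type ReturnTarget = "composer" | "chat-history";

interface SessionDeps {
  fadeOutTts(ms: number): Promise<void>;
  closeSocket(): void;
  navigateBack(target: ReturnTarget): void;
  saveTranscript(): Promise<void>;
}

const FADE_OUT_MS = 200; // from R6

async function endSession(deps: SessionDeps, target: ReturnTarget): Promise<void> {
  await deps.fadeOutTts(FADE_OUT_MS); // let playback end gracefully
  deps.closeSocket();                 // stop streaming to the backend
  deps.navigateBack(target);          // configurable per product
  await deps.saveTranscript();        // persist the final transcript
}
```

Both the ✕ button and the swipe-down gesture would call the same `endSession`, so the two exits cannot drift apart.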
R7 — Mute
1-tap toggle on the bottom-center mic button:
- Mute: mic input gated locally; backend still connected.
- Unmute: input resumes.
- Mute state announced via aria-live.
- Visual: mic icon strikethrough.
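"Gated locally; backend still connected" means frames are dropped on the client while the connection stays open. A minimal sketch of such a gate follows; `MicGate` is a hypothetical name, and the "Unmuted" announcement text is an assumption (T7 only specifies "Muted").

```typescript
// Hypothetical local gate: drops frames while muted, never touches the socket.
class MicGate {
  private muted = false;

  // Flip the mute state and return the new value.
  toggle(): boolean {
    this.muted = !this.muted;
    return this.muted;
  }

  // Returns the frame to forward, or null when it must be dropped.
  pass(frame: Float32Array): Float32Array | null {
    return this.muted ? null : frame;
  }

  // Text for the aria-live region. Assumption: "Unmuted" mirrors "Muted".
  announcement(): string {
    return this.muted ? "Muted" : "Unmuted";
  }
}
```

Because the gate sits before the network layer, unmuting resumes input immediately with no reconnect.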
R8 — Surface bindings
| Surface | API |
|---|---|
| Flutter | KoderVoiceModeSheet({onEnd, onMute, onBargeIn, transcriptToggle}) in koder_kit/lib/src/ai/voice_mode_sheet.dart |
| Web | <koder-voice-mode-sheet> |
| Compose Android | KoderVoiceModeSheet in koder-design-compose (future) |
| SwiftUI iOS | Same, in koder-design-swift (future) |
| CLI / TUI | Plain prompt |
R9 — Accessibility
- Sheet: role="dialog" aria-modal="true" aria-label="Voice conversation".
- Waveform: aria-hidden="true" (visual only); state announced via aria-live ("Listening", "Speaking").
- Buttons: keyboard accessible; spacebar for PTT.
- Reduced-motion: waveform replaced by labels.
- Screen reader: announces transition states.
R10 — i18n
| Key | en-US | pt-BR |
|---|---|---|
| ai.voice.mode.title | "Voice mode" | "Modo de voz" |
| ai.voice.mode.ptt | "Push to talk" | "Pressionar para falar" |
| ai.voice.mode.aon | "Always on" | "Sempre ativo" |
| ai.voice.mode.mute | "Mute" | "Silenciar" |
| ai.voice.mode.unmute | "Unmute" | "Reativar" |
| ai.voice.mode.end | "End conversation" | "Encerrar conversa" |
| ai.voice.mode.transcript_toggle | "Show transcript" | "Mostrar transcrição" |
| ai.voice.state.listening | "Listening..." | "Ouvindo..." |
| ai.voice.state.processing | "Processing..." | "Processando..." |
| ai.voice.state.speaking | "Speaking..." | "Falando..." |
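A lookup over a few of the keys above might look like the sketch below. The catalog shape, the `t` helper, and the fallback chain (requested locale, then en-US, then the raw key) are all assumptions for illustration; the spec does not define how lookup or fallback works.

```typescript
type Locale = "en-US" | "pt-BR";
type Catalog = { [key: string]: string | undefined };

// Subset of the R10 table, copied verbatim.
const MESSAGES: Record<Locale, Catalog> = {
  "en-US": {
    "ai.voice.mode.mute": "Mute",
    "ai.voice.mode.end": "End conversation",
    "ai.voice.state.listening": "Listening...",
    "ai.voice.state.speaking": "Speaking...",
  },
  "pt-BR": {
    "ai.voice.mode.mute": "Silenciar",
    "ai.voice.mode.end": "Encerrar conversa",
    "ai.voice.state.listening": "Ouvindo...",
    "ai.voice.state.speaking": "Falando...",
  },
};

// Assumed fallback chain: requested locale, then en-US, then the key itself.
function t(key: string, locale: Locale): string {
  return MESSAGES[locale][key] ?? MESSAGES["en-US"][key] ?? key;
}
```

Returning the key as the last resort makes a missing translation visible in the UI instead of rendering an empty label.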
R11 — Per-preset
Cosmetic only.
T-suite
- **T1** Mount: voice mode sheet renders; default mode PTT or AON per user pref.
- **T2** State transitions: idle → listening → processing → speaking → idle.
- **T3** Waveform colors: each state shows the correct color.
- **T4** PTT mode: hold spacebar → recording; release → send.
- **T5** AON mode: VAD-triggered start/stop.
- **T6** Barge-in: speak during output → crossfade accent ↔ success; output TTS interrupted.
- **T7** Mute: 1-tap → mic gated; aria announces "Muted".
- **T8** End: tap ✕ → fade out + cleanup + return to composer; transcript saved.
- **T9** Transcript toggle: enable → overlay visible during conversation.
- **T10** Reduced-motion: waveform replaced by text labels.
- **T11** A11y: aria-live announces state transitions.
- *1* Mic permission revoked mid-session: graceful end + prompt to regrant.
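The state-transition case in this suite can be checked against an explicit transition table. The sketch below is one possible encoding: `NEXT` and `isValidPath` are hypothetical names, and every edge other than the happy path and the barge-in edge from R4 (for example, listening back to idle) is an assumption.

```typescript
type VoiceState = "idle" | "listening" | "processing" | "speaking";

// Allowed next states. Only the happy path and speaking → listening (barge-in)
// come from the spec; the extra "back to idle" edges are assumptions.
const NEXT: Record<VoiceState, VoiceState[]> = {
  idle: ["listening"],
  listening: ["processing", "idle"],
  processing: ["speaking", "idle"],
  speaking: ["idle", "listening"],
};

// True when every consecutive pair in the path is an allowed transition.
function isValidPath(path: VoiceState[]): boolean {
  let prev: VoiceState | null = null;
  for (const state of path) {
    if (prev !== null && NEXT[prev].indexOf(state) === -1) return false;
    prev = state;
  }
  return true;
}
```

A test can record the states the sheet actually passes through and assert `isValidPath` on the recording, rather than hard-coding a single expected sequence.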
Cross-link
- Base spec: voice/wake-word.kmd (toggles + backend)
- Companion: multimodal-input.kmd (#116, composer mic button entry point), conversation-history.kmd (transcript persist), ai-disclaimer.kmd (disclaimer label R1)
- Backend: services/ai/voice/, services/ai/ai/backlog/pending/115-cli-desktop-voice-input.md