AI voice mode UI

mandatory

Fullscreen voice conversation mode UI: waveform (input/output color-coded), push-to-talk + always-on toggle, barge-in visual, mute, end session. Extends voice/wake-word.kmd with conversational mode semantics. Required for Talk product + future Kortex voice mode.

Spec — AI voice mode UI

Base spec: voice/wake-word.kmd covers the toggles + backend. This spec covers the *UX of the live conversational mode*. Trigger: mic button in multimodal-input.kmd (#116). Impl ticket: services/ai/ai#115.

Principles

  1. *Fullscreen focus* — voice mode = dedicated surface, not inline.
  2. *Visual feedback* — waveform input vs output color-coded.
  3. *Barge-in supported* — the user speaks while the AI is speaking; the UX must signal the transition.
  4. *One-tap escape* — mute / end always reachable.
  5. *Optional transcription overlay* — toggle live STT visible.

R1 — Anatomy

┌─────────────────────────────────────────────┐
│  [✕]                                        │  ← top: end session
│                                             │
│            [Assistant Icon]                 │
│         Generated by AI — verify            │
│                                             │
│   ╭─╮╭─╮╭─╮╭─╮╭─╮╭─╮╭─╮╭─╮╭─╮              │
│  ─╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰─             │  ← waveform (output = assistant talking)
│                                             │
│   "Sure, here's what I think..."            │  ← optional transcript overlay
│                                             │
│                                             │
│                                             │
│       ┌────────────────────────┐            │
│       │  Push to talk    [PTT] │            │  ← mode toggle
│       │  Always on       [AON] │            │
│       └────────────────────────┘            │
│                                             │
│                  [🎙 mute]                  │
└─────────────────────────────────────────────┘

Slots:

| Slot | Function |
|---|---|
| End button (✕) | Close voice mode; return to composer or chat surface |
| Avatar | Assistant icon (configurable) |
| AI disclaimer | Subtle label per ai-disclaimer.kmd R1 tier 1 |
| Waveform | Animated; color shifts per state (R2) |
| Transcript overlay | Optional toggle; shows STT live (input) + TTS source (output) |
| Mode toggle | Push-to-talk vs Always-on |
| Mute button | 1-tap mute mic; visual indicator state |

R2 — Waveform states

Color coding per themes/color-roles.kmd:

| State | Waveform color | Behavior |
|---|---|---|
| idle | text-muted flat line | no input/output |
| listening (input) | accent animated bars | input audio captured |
| processing | text-muted pulsing dot | waiting for response |
| speaking (output) | success animated bars | output audio playing |
| barge-in transition | Crossfade accent ↔ success | brief overlap when user speaks during output |

Visuals MUST honor reduced-motion: replace the animation with static "Listening..." / "Speaking..." labels.
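The state table and the reduced-motion rule can be read as a single lookup. The sketch below is one possible shape, not the shipped API: `VoiceState`, `WaveformPresentation`, and `presentWaveform` are hypothetical names, and reusing the "Processing..." string from R10 as a static label for the processing state is an assumption (R2 only names the listening and speaking labels).

```typescript
// Hypothetical names; only the states and color roles come from the R2 table.
type VoiceState = "idle" | "listening" | "processing" | "speaking";

interface WaveformPresentation {
  colorRole: "text-muted" | "accent" | "success";
  shape: "flat-line" | "animated-bars" | "pulsing-dot" | "static-label";
  label: string | null; // set only when animation is replaced by text
}

const ANIMATED: Record<VoiceState, WaveformPresentation> = {
  idle: { colorRole: "text-muted", shape: "flat-line", label: null },
  listening: { colorRole: "accent", shape: "animated-bars", label: null },
  processing: { colorRole: "text-muted", shape: "pulsing-dot", label: null },
  speaking: { colorRole: "success", shape: "animated-bars", label: null },
};

// Assumption: labels reuse the en-US strings from R10.
const STATIC_LABELS: Partial<Record<VoiceState, string>> = {
  listening: "Listening...",
  processing: "Processing...",
  speaking: "Speaking...",
};

function presentWaveform(state: VoiceState, reducedMotion: boolean): WaveformPresentation {
  const base = ANIMATED[state];
  // The idle flat line is already static, so it needs no replacement.
  if (!reducedMotion || base.shape === "flat-line") return base;
  return {
    colorRole: base.colorRole,
    shape: "static-label",
    label: STATIC_LABELS[state] ?? null,
  };
}
```

A renderer would call `presentWaveform(state, prefersReducedMotion)` on every state change and draw whatever shape comes back. The barge-in crossfade is left out here because it is a transition between two rows rather than a state of its own.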

R3 — Push-to-talk vs Always-on

| Mode | Behavior |
|---|---|
| *Push-to-talk (PTT)* | Hold button (or spacebar) to record; release to send. Default for noisy environments. |
| *Always-on (AON)* | Continuous capture with VAD (voice activity detection). Default for hands-free. |

Toggle persists per user. Cross-link: voice/wake-word.kmd R1 toggles (voice.enabled, talkMode).
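The two modes differ only in what opens the capture window: a held button in PTT, a VAD signal in AON. A minimal sketch of that decision follows. `TalkMode`, `CaptureInput`, and `CaptureController` are hypothetical names, and the VAD is reduced to a boolean supplied by the caller.

```typescript
// Hypothetical names; the VAD itself is out of scope and modeled as a boolean.
type TalkMode = "ptt" | "aon";

interface CaptureInput {
  mode: TalkMode;
  pttHeld: boolean;   // button or spacebar currently held (PTT)
  vadSpeech: boolean; // voice activity detector reports speech (AON)
}

function shouldCapture(input: CaptureInput): boolean {
  return input.mode === "ptt" ? input.pttHeld : input.vadSpeech;
}

// Tracks capture edges so the caller knows when to start recording
// and when to send the recorded utterance.
class CaptureController {
  private capturing = false;

  update(input: CaptureInput): "start" | "send" | "none" {
    const next = shouldCapture(input);
    if (next === this.capturing) return "none";
    this.capturing = next;
    return next ? "start" : "send";
  }
}
```

Feeding the controller on every key event (PTT) or VAD tick (AON) yields `"start"` on press or speech onset and `"send"` on release or speech end, which matches the hold-to-record, release-to-send behavior in the table.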

R4 — Barge-in

When user speaks during assistant output (barge-in):

  • Per voice/wake-word.kmd R5: bargeIn: true → output audio fades out + input fades in.
  • Visual: waveform color crossfades accent ↔ success over ~200ms.
  • Output TTS interrupted; new input streamed to backend.
  • Audit: barge-in event logged.
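The barge-in steps above can be sketched as a trigger check plus a time-based crossfade. Everything named here (`crossfadeAt`, `detectBargeIn`, `AuditEvent`) is hypothetical, the linear ramp is an assumption (the spec only gives the ~200ms duration), and the audit log is modeled as a plain array.

```typescript
type VoiceState = "idle" | "listening" | "processing" | "speaking";

const CROSSFADE_MS = 200; // "~200ms" in R4

interface CrossfadeFrame {
  outputGain: number; // assistant audio, ramps 1 -> 0
  inputGain: number;  // user audio, ramps 0 -> 1
  colorMix: number;   // 0 = success (speaking), 1 = accent (listening)
  done: boolean;
}

// Assumption: a linear ramp; an easing curve would work equally well.
function crossfadeAt(elapsedMs: number, durationMs: number = CROSSFADE_MS): CrossfadeFrame {
  const raw = durationMs > 0 ? elapsedMs / durationMs : 1;
  const t = Math.min(1, Math.max(0, raw));
  return { outputGain: 1 - t, inputGain: t, colorMix: t, done: t >= 1 };
}

interface AuditEvent {
  type: "barge-in";
  atMs: number;
}

// Barge-in only applies while the assistant is speaking and the
// bargeIn flag from voice/wake-word.kmd R5 is true.
function detectBargeIn(
  state: VoiceState,
  bargeInEnabled: boolean,
  nowMs: number,
  audit: AuditEvent[],
): boolean {
  if (state !== "speaking" || !bargeInEnabled) return false;
  audit.push({ type: "barge-in", atMs: nowMs });
  return true;
}
```

When `detectBargeIn` returns true, the caller would interrupt TTS, start streaming input, and drive both the audio gains and the waveform color from `crossfadeAt` until `done` is set.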

R5 — Transcript overlay

Toggle button (default OFF). When ON:

  • Live STT text appears in an overlay below the waveform (input state).
  • Live TTS source text appears (output state).
  • Auto-scroll; max 3 lines visible.
  • After mode end: transcript persisted to conversation history per conversation-history.kmd (#115).
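A small buffer is enough to model the overlay rules: default OFF, auto-scroll to the newest text, at most 3 visible lines. `TranscriptOverlay` below is a hypothetical class. It assumes lines keep accumulating while the overlay is hidden, so the full transcript is still available to persist at the end of the session.

```typescript
const MAX_VISIBLE_LINES = 3; // "max 3 lines visible" in R5

class TranscriptOverlay {
  private lines: string[] = [];
  enabled = false; // default OFF per R5

  // Assumption: STT and TTS-source lines are appended even while hidden.
  append(line: string): void {
    this.lines.push(line);
  }

  // Auto-scroll is modeled as always showing the newest lines.
  visible(): string[] {
    if (!this.enabled) return [];
    return this.lines.slice(-MAX_VISIBLE_LINES);
  }

  // Copy of the whole transcript, e.g. for saving to conversation history.
  full(): string[] {
    return this.lines.slice();
  }
}
```

The overlay view renders `visible()` on each update, while the end-of-session path reads `full()`.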

R6 — End session

Top-right ✕ button OR swipe-down gesture (mobile):

  • End TTS playback gracefully (fade out 200ms).
  • Disconnect WebSocket from services/ai/voice.
  • Return user to composer OR chat history (configurable per product).
  • Final transcript saved to conversation history.
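One way to keep the teardown testable is to inject each side effect. The sketch below uses a hypothetical `SessionDeps` interface. Saving the transcript before navigating is an assumption, made so the history is current when the user lands on it; the list above does not fix an order for those two steps.

```typescript
type ReturnTarget = "composer" | "chat-history";

// Hypothetical dependency surface; real implementations would wrap the
// audio player, the WebSocket to services/ai/voice, storage, and routing.
interface SessionDeps {
  fadeOutTts(durationMs: number, onDone: () => void): void;
  disconnectSocket(): void;
  saveTranscript(lines: readonly string[]): void;
  navigateTo(target: ReturnTarget): void;
}

function endSession(
  deps: SessionDeps,
  transcript: readonly string[],
  returnTo: ReturnTarget,
): void {
  // Let playback fade (200ms per R6) before tearing anything down.
  deps.fadeOutTts(200, () => {
    deps.disconnectSocket();
    deps.saveTranscript(transcript);
    deps.navigateTo(returnTo);
  });
}
```

Both the ✕ button and the swipe-down gesture would call the same `endSession`, with `returnTo` read from the per-product configuration.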

R7 — Mute

1-tap toggle on the bottom-center mic button:

  • Mute: mic input gated locally; backend still connected.
  • Unmute: input resumes.
  • Mute state announced via aria-live.
  • Visual: mic icon strikethrough.
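Local gating means muted frames are dropped on the client while the connection stays open. The `MicGate` class below is a hypothetical sketch of that idea. The `announce` callback stands in for an aria-live region, "Muted" is the announcement named in the T-suite, and "Unmuted" is an assumed counterpart.

```typescript
class MicGate {
  private muted = false;
  private readonly announce: (message: string) => void;

  constructor(announce: (message: string) => void) {
    this.announce = announce;
  }

  // Flips the mute state and announces it; returns the new state.
  toggle(): boolean {
    this.muted = !this.muted;
    this.announce(this.muted ? "Muted" : "Unmuted");
    return this.muted;
  }

  // Returns the frame to forward, or null while muted.
  // The backend connection is not touched either way.
  filter(frame: Float32Array): Float32Array | null {
    return this.muted ? null : frame;
  }
}
```

The capture loop passes every audio frame through `filter` and only sends non-null results, so unmuting resumes input without a reconnect.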

R8 — Surface bindings

| Surface | API |
|---|---|
| Flutter | KoderVoiceModeSheet({onEnd, onMute, onBargeIn, transcriptToggle}) in koder_kit/lib/src/ai/voice_mode_sheet.dart |
| Web | <koder-voice-mode-sheet> |
| Compose Android | KoderVoiceModeSheet in koder-design-compose (future) |
| SwiftUI iOS | same, in koder-design-swift (future) |
| CLI / TUI | Plain prompt-and-response; mic via system; no waveform |

R9 — Accessibility

  • Sheet: role="dialog" aria-modal="true" aria-label="Voice conversation".
  • Waveform: aria-hidden="true" (visual only); state announced via aria-live ("Listening", "Speaking").
  • Buttons: keyboard accessible; spacebar for PTT.
  • Reduced-motion: waveform replaced by labels.
  • Screen reader: announces transition states.

R10 — i18n

| Key | en-US | pt-BR |
|---|---|---|
| ai.voice.mode.title | "Voice mode" | "Modo de voz" |
| ai.voice.mode.ptt | "Push to talk" | "Pressionar para falar" |
| ai.voice.mode.aon | "Always on" | "Sempre ativo" |
| ai.voice.mode.mute | "Mute" | "Silenciar" |
| ai.voice.mode.unmute | "Unmute" | "Reativar" |
| ai.voice.mode.end | "End conversation" | "Encerrar conversa" |
| ai.voice.mode.transcript_toggle | "Show transcript" | "Mostrar transcrição" |
| ai.voice.state.listening | "Listening..." | "Ouvindo..." |
| ai.voice.state.processing | "Processing..." | "Processando..." |
| ai.voice.state.speaking | "Speaking..." | "Falando..." |
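As an illustration of how these keys might be resolved at runtime, the sketch below holds a subset of the table in a plain map. `voiceString` is a hypothetical helper, and falling back to en-US for other locales (and to the key itself for unknown keys) is an assumption, not something this spec defines.

```typescript
type Locale = "en-US" | "pt-BR";

// Subset of the R10 table, for illustration only.
const VOICE_STRINGS: Record<string, Record<Locale, string>> = {
  "ai.voice.mode.mute": { "en-US": "Mute", "pt-BR": "Silenciar" },
  "ai.voice.mode.unmute": { "en-US": "Unmute", "pt-BR": "Reativar" },
  "ai.voice.state.listening": { "en-US": "Listening...", "pt-BR": "Ouvindo..." },
  "ai.voice.state.speaking": { "en-US": "Speaking...", "pt-BR": "Falando..." },
};

function voiceString(key: string, locale: string): string {
  const entry = VOICE_STRINGS[key];
  if (entry === undefined) return key; // unknown key: surface it rather than fail
  // Assumption: any locale other than pt-BR falls back to en-US.
  return locale === "pt-BR" ? entry["pt-BR"] : entry["en-US"];
}
```

The same lookup can feed both the visible labels and the aria-live announcements, so the two never drift apart.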

R11 — Per-preset

Cosmetic only.

T-suite

  • *T1* Mount: voice mode sheet renders; default mode PTT or AON per user pref.
  • *T2* State transitions: idle → listening → processing → speaking → idle.
  • *T3* Waveform colors: each state correct color.
  • *T4* PTT mode: hold spacebar → recording; release → send.
  • *T5* AON mode: VAD-triggered start/stop.
  • *T6* Barge-in: speak during output → crossfade accent ↔ success; output TTS interrupted.
  • *T7* Mute: 1-tap → mic gated; aria announces "Muted".
  • *T8* End: tap ✕ → fade out + cleanup + return to composer; transcript saved.
  • *T9* Transcript toggle: enable → overlay visible during conversation.
  • *T10* Reduced-motion: waveform replaced by text labels.
  • *T11* A11y: aria-live announces state transitions.
  • *1* Mic permission revoked mid-session: graceful end + prompt to re-grant.
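The state-transition and barge-in cases in this suite imply a small transition table that a test helper can check recorded state sequences against. The sketch below is one hypothetical encoding: it allows only the linear cycle plus the barge-in edge from speaking back to listening. Any other edge, such as a direct return to idle when a session ends early, would need to be added if the implementation permits it.

```typescript
type VoiceState = "idle" | "listening" | "processing" | "speaking";

// Assumption: only the edges named in the suite are legal.
const NEXT: Record<VoiceState, readonly VoiceState[]> = {
  idle: ["listening"],
  listening: ["processing"],
  processing: ["speaking"],
  speaking: ["idle", "listening"], // "listening" is the barge-in edge
};

function canTransition(from: VoiceState, to: VoiceState): boolean {
  return NEXT[from].indexOf(to) !== -1;
}

// Returns the index of the first illegal step in a recorded path, or -1.
function firstInvalidStep(path: readonly VoiceState[]): number {
  for (let i = 1; i < path.length; i++) {
    const from = path[i - 1];
    const to = path[i];
    if (from === undefined || to === undefined || !canTransition(from, to)) return i;
  }
  return -1;
}
```

A test can record every state the sheet announces during a scripted conversation and assert that `firstInvalidStep` returns -1.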

Source: ../home/koder/dev/koder/meta/docs/stack/specs/ai-ui/voice-mode.kmd