AI Playground — Prompt / Agent Workbench

*Area:* Intelligence
*Path:* services/ai/playground
*Kind:* Web workbench for prompt and agent prototyping (multi-model, side-by-side, share, export)
*Status:* v0.0.1, sector bootstrapping (2026-05-09)

Role in the stack

playground flattens the prompt-iteration loop. Today prompt prototyping happens in scripts, ngrok-exposed FastAPI demos, or, worst case, directly in production. Each developer has their own setup; comparing two prompts on the same input requires running them serially and squinting at logs. Trying a new model means editing config and redeploying. None of this scales beyond one or two people.

This sector is the workbench: open the page → type a prompt → pick two models side by side → tweak temperature → see streaming results → save the winning version to prompt/ → export a ticket. Iteration time drops from minutes to seconds. It is also a pre-product surface for designers, PMs, and customer-success folks who shouldn't need dev help to experiment.
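
The side-by-side step can be pictured as sending one prompt to several panes concurrently instead of serially. The sketch below is only an illustration under stated assumptions: `fake_complete`, `PaneResult`, and the model names are hypothetical stand-ins, and a real pane would call the gateway rather than a local stub.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class PaneResult:
    model: str
    text: str


async def fake_complete(model: str, prompt: str, temperature: float) -> str:
    """Hypothetical stand-in for a model call (not the gateway API)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"[{model} @ T={temperature}] {prompt.upper()}"


async def run_side_by_side(prompt, panes):
    """Send the same prompt to every pane concurrently; keep pane order."""
    tasks = [fake_complete(model, prompt, temp) for model, temp in panes]
    texts = await asyncio.gather(*tasks)  # results come back in input order
    return [PaneResult(model, text) for (model, _), text in zip(panes, texts)]


results = asyncio.run(
    run_side_by_side("hello", [("model-a", 0.2), ("model-b", 0.8)])
)
for r in results:
    print(r.model, "->", r.text)
```

Because `asyncio.gather` preserves input order, each result can be mapped back to its pane without extra bookkeeping.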

It is the Koder analog of OpenAI Playground, Anthropic Workbench, and HuggingFace Spaces. *Distinct from kode:* kode is the IDE-class product (full project context, file editing, multi-turn agent loops); playground is the no-state workbench for "try this and tell me how it does."

Boundary vs neighbors

services/ai/kode is IDE-class — different surface, different audience.
services/ai/gateway, services/ai/runtime, services/ai/modelreg, services/ai/prompt are primary producers (model exec, available models, curation, prompt registry).
services/ai/tools provides the tool palette for agent-style runs.
services/ai/trace integrates for debug-from-trace-viewer.
services/ai/eval receives promoted runs as eval cases.

Features (v1 target)

Flutter Web app at playground.koder.dev (reuses koder_kit auth/theme/i18n/safe-area)
Multi-model side-by-side chat (1-pane or 2-pane)
Streaming responses token-by-token; abort mid-stream
Parameter knobs per pane: temp, top-p, top-k, max_tokens, stop, seed
System prompt editor + tool palette (toggle per run)
Model picker driven by modelreg curated tags
Prompt registry integration: load, edit (with diff), save as new version, set active
Variable inputs auto-rendered from {{var}} template placeholders
Saved sessions library + 60s auto-snapshot
Signed share links (read-only, scoped, revocable, 30d default expiry)
One-click export-as-ticket to any accessible backlog
Promote-to-eval action (curator role) → ships run as eval case
Per-pane timing badges (ttft, total) + token + cost estimate
Trace_id link for every run
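
The "variable inputs auto-rendered from {{var}}" feature implies two steps: discover the placeholder names in a template, then substitute user-supplied values. A minimal sketch follows, assuming placeholder names are plain identifiers and that inner whitespace is tolerated; the function names `extract_variables` and `render` are hypothetical and not taken from the prompt registry.

```python
import re

# Assumed syntax: {{name}} with optional inner whitespace; names are identifiers.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def extract_variables(template: str) -> list[str]:
    """Return placeholder names in first-appearance order, without duplicates."""
    seen: dict[str, None] = {}
    for match in PLACEHOLDER.finditer(template):
        seen.setdefault(match.group(1), None)
    return list(seen)


def render(template: str, values: dict[str, str]) -> str:
    """Substitute every placeholder; fail loudly if a value is missing."""
    missing = [name for name in extract_variables(template) if name not in values]
    if missing:
        raise KeyError(f"missing values for: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


template = "Translate {{text}} to {{ lang }}. Keep {{text}} short."
print(extract_variables(template))
print(render(template, {"text": "hi", "lang": "French"}))
```

The list returned by `extract_variables` is what a UI could use to decide which input fields to show; raising on missing values avoids sending a half-rendered prompt to a model.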

Primary couplings

Producer	Relationship
`services/ai/gateway`	LLM execution (streaming SSE)
`services/ai/runtime`	Available models
`services/ai/modelreg`	Curated picker source
`services/ai/prompt`	Registry — load, save, version
`services/ai/tools`	Tool palette
`services/ai/trace`	Per-run trace_id
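
Since the gateway coupling is described as streaming SSE, a client needs to split the stream into events before it can show tokens. The sketch below parses `data:` fields following the general server-sent events line format. The payload shape (a JSON object with a `token` key and a `[DONE]` sentinel) is an assumption for illustration, not the gateway's documented wire format.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buf: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line dispatches the buffered event
            if buf:
                yield "\n".join(buf)
                buf = []
            continue
        if line.startswith(":"):  # comment line, often used as keep-alive
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):  # a single leading space is stripped
            value = value[1:]
        if field == "data":
            buf.append(value)
    # An unterminated trailing event (no blank line) is dropped.


def collect_text(lines: Iterable[str]) -> str:
    """Join tokens until the assumed [DONE] sentinel (hypothetical payload shape)."""
    parts = []
    for payload in iter_sse_data(lines):
        if payload == "[DONE]":
            break
        parts.append(json.loads(payload)["token"])
    return "".join(parts)


sample = [
    ": keep-alive\n",
    'data: {"token": "Hel"}\n', "\n",
    'data: {"token": "lo"}\n', "\n",
    "data: [DONE]\n", "\n",
]
print(collect_text(sample))  # Hello
```

Other SSE fields (`event`, `id`, `retry`) are ignored here to keep the sketch short; a fuller client would handle them.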

Consumer	Relationship
Internal team (devs, PMs, designers, CS)	Daily prototyping surface
`services/ai/eval`	Receives promoted runs
External (post-MVP)	Public access via API key

RFC and bootstrap

RFC: playground-RFC-001-foundations.kmd, *Accepted* 2026-05-09
Bootstrap ticket: services/ai/backlog/done/137-playground-bootstrap.md
Implementation tickets: services/ai/playground/backlog/pending/{001..005}

Self-hosted-first analysis (5 gates)

Gate	Status	Notes
G1 Feature parity	pending	Multi-model + params + tools covers OpenAI Playground baseline
G2 Performance	pending	Streaming dominated by gateway; UI must stay 60fps during stream
G3 Stability	pending	Pre-MVP
G4 Capability	pending	Full IDE deferred to kode; mobile responsive but not primary
G5 Critical-path readiness	pending	Internal productivity unblock day 1