AI Playground — Prompt / Agent Workbench
- **Area:** Intelligence
- **Path:** `services/ai/playground`
- **Kind:** Web workbench for prompt and agent prototyping (multi-model, side-by-side, share, export)
- **Status:** v0.0.1, sector bootstrapping (2026-05-09)
Role in the stack
`playground` shortens the prompt-iteration loop. Today prompt prototyping happens in scripts, in ngrok-exposed FastAPI demos, or, worst case, directly in production. Each developer has their own setup; comparing two prompts on the same input requires running them serially and squinting at logs. Trying a new model means editing config and redeploying. None of this scales beyond one or two people.
This sector is the workbench: open the page → type a prompt → pick two models side-by-side → tweak temperature → see streaming results → save the winning version to the `prompt` registry → export a ticket. Iteration time drops from minutes to seconds. It is also a pre-product surface for designers, PMs, and customer-success folks who shouldn't need dev help to experiment.
It is the Koder analog of OpenAI Playground, Anthropic Workbench, and Hugging Face Spaces. **Distinct from `kode`:** `kode` is the IDE-class product (full project context, file editing, multi-turn agent loops); `playground` is the no-state workbench for "try this and tell me how it does."
Boundary vs neighbors
- `services/ai/kode` is IDE-class: different surface, different audience.
- `services/ai/gateway`, `services/ai/runtime`, `services/ai/modelreg`, and `services/ai/prompt` are the primary producers (model execution, available models, curation, prompt registry).
- `services/ai/tools` provides the tool palette for agent-style runs.
- `services/ai/trace` integrates for debug-from-trace viewing.
- `services/ai/eval` receives promoted runs as eval cases.
Features (v1 target)
- Flutter Web app at `playground.koder.dev` (reuses `koder_kit` auth/theme/i18n/safe-area)
- Multi-model side-by-side chat (1-pane or 2-pane)
- Streaming responses token-by-token; abort mid-stream
- Parameter knobs per pane: temp, top_p, top_k, max_tokens, stop, seed
- System prompt editor + tool palette (toggle per run)
- Model picker driven by modelreg curated tags
- Prompt registry integration: load, edit (with diff), save as new version, set active
- Variable inputs auto-rendered from `{{var}}` template placeholders
- Saved sessions library + 60s auto-snapshot
- Signed share links (read-only, scoped, revocable, 30d default expiry)
- One-click export-as-ticket to any accessible backlog
- Promote-to-eval action (curator role) → ships run as eval case
- Per-pane timing badges (TTFT, total) + token count + cost estimate
- Trace_id link for every run
Primary couplings
| Producer | Relationship |
|---|---|
| `services/ai/gateway` | LLM execution (streaming SSE) |
| `services/ai/runtime` | Available models |
| `services/ai/modelreg` | Curated picker source |
| `services/ai/prompt` | Registry: load, save, version |
| `services/ai/tools` | Tool palette |
| `services/ai/trace` | Per-run trace_id |
| Consumer | Relationship |
|---|---|
| Internal team (devs, PMs, designers, CS) | Daily prototyping surface |
| `services/ai/eval` | Receives promoted runs |
| External (post-MVP) | Public access via API key |
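The gateway row (streaming SSE) and the per-pane TTFT/total badges fit together: the client times from request start to the first dispatched event, and to the end of the stream. A minimal sketch, assuming the gateway sends one token per `data:` event and ends with a `data: [DONE]` sentinel (an OpenAI-style convention assumed here, not confirmed by the RFC); `consume_sse` and `RunStats` are hypothetical names:

```python
from __future__ import annotations

import threading
import time
from collections.abc import Callable, Iterable
from dataclasses import dataclass, field


@dataclass
class RunStats:
    ttft_s: float | None = None  # time to first token; None if no token arrived
    total_s: float = 0.0         # request start to end of stream (or abort)
    tokens: list[str] = field(default_factory=list)
    aborted: bool = False


def consume_sse(lines: Iterable[str], abort: threading.Event,
                clock: Callable[[], float] = time.monotonic) -> RunStats:
    """Collect tokens from SSE lines until a [DONE] event, end of input, or abort."""
    stats = RunStats()
    start = clock()
    data: list[str] = []  # `data:` lines of the event being assembled
    for raw in lines:
        if abort.is_set():  # user pressed "stop" in the pane
            stats.aborted = True
            break
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips one leading space from a field value
            data.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data:  # a blank line dispatches the event
            payload = "\n".join(data)
            data = []
            if payload == "[DONE]":  # assumed end-of-stream sentinel
                break
            if stats.ttft_s is None:
                stats.ttft_s = clock() - start
            stats.tokens.append(payload)
        # comment lines (":...") and other fields (event:, id:, retry:) are ignored
    stats.total_s = clock() - start
    return stats
```

Here the abort flag is only checked as each line arrives; a real client would also cancel the underlying HTTP request so the gateway stops generating (and billing) tokens.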
RFC and bootstrap
- RFC: `playground-RFC-001-foundations.kmd` (**Accepted** 2026-05-09)
- Bootstrap ticket: `services/ai/backlog/done/137-playground-bootstrap.md`
- Implementation tickets: `services/ai/playground/backlog/pending/{001..005}`
Self-hosted-first analysis (5 gates)
| Gate | Status | Notes |
|---|---|---|
| G1 Feature parity | pending | Multi-model + params + tools covers OpenAI Playground baseline |
| G2 Performance | pending | Streaming dominated by gateway; UI must stay 60fps during stream |
| G3 Stability | pending | Pre-MVP |
| G4 Capability | pending | Full IDE deferred to kode; mobile responsive but not primary |
| G5 Critical-path readiness | pending | Unblocks internal productivity from day 1 |
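G2 requires the UI to hold 60 fps while tokens arrive, which leaves a budget of 1000 / 60 ≈ 16.7 ms per frame. One common technique (not something the RFC specifies) is to coalesce incoming tokens and repaint at most once per frame interval instead of once per token. A minimal sketch; `TokenCoalescer` and its `flush` callback are hypothetical, and the Flutter client would drive this from its own frame scheduling rather than from explicit timestamps:

```python
from __future__ import annotations

from collections.abc import Callable

FRAME_S = 1 / 60  # ~16.7 ms per frame at 60 fps


class TokenCoalescer:
    """Buffer streamed tokens; push() flushes at most once per frame interval."""

    def __init__(self, flush: Callable[[str], None], frame_s: float = FRAME_S):
        self._flush = flush        # e.g. appends text to the pane and repaints
        self._frame_s = frame_s
        self._buf: list[str] = []
        self._last_flush: float | None = None

    def push(self, token: str, now: float) -> None:
        self._buf.append(token)
        # Flush only if a full frame interval has passed since the last flush.
        if self._last_flush is None or now - self._last_flush >= self._frame_s:
            self.drain(now)

    def drain(self, now: float) -> None:
        """Emit whatever is buffered; call this at end of stream or on abort."""
        if self._buf:
            self._flush("".join(self._buf))
            self._buf.clear()
        self._last_flush = now
```

A burst of tokens inside one frame becomes a single repaint, so render cost scales with frame rate rather than with the gateway's token rate.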