AI Playground — Prompt / Agent Workbench

  • *Area:* Intelligence
  • *Path:* services/ai/playground
  • *Kind:* Web workbench for prompt and agent prototyping (multi-model, side-by-side, share, export)
  • *Status:* v0.0.1, sector bootstrapping (2026-05-09)

Role in the stack

playground shortens the prompt-iteration loop. Today, prompt prototyping happens in scripts, ngrok-exposed FastAPI demos, or, in the worst case, directly in production. Each developer has their own setup; comparing two prompts on the same input means running them serially and squinting at logs. Trying a new model means editing config and redeploying. None of this scales beyond one or two people.

This sector is the workbench: open the page → type a prompt → pick two models side-by-side → tweak temperature → see streaming results → save the winning version to prompt/ → export a ticket. The goal is to cut iteration time from minutes to seconds. It is also a pre-product surface for designers, PMs, and customer-success folks who shouldn't need dev help to experiment.
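
The side-by-side step in the flow above can be sketched as two concurrent streams with per-pane timing. This is a minimal illustration, not the playground's implementation: `fake_stream`, `run_pane`, `compare`, `PaneResult`, and the model names are hypothetical, and the stub stands in for a real gateway call.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import AsyncIterator, List, Optional


@dataclass
class PaneResult:
    model: str
    tokens: List[str] = field(default_factory=list)
    ttft_s: Optional[float] = None   # time to first token
    total_s: Optional[float] = None  # time until the stream ended


async def fake_stream(model: str, prompt: str, delay_s: float) -> AsyncIterator[str]:
    """Hypothetical stand-in for a gateway call: yields one token per word."""
    for word in f"{model} says: {prompt}".split():
        await asyncio.sleep(delay_s)
        yield word


async def run_pane(model: str, prompt: str, delay_s: float) -> PaneResult:
    """Consume one pane's stream and record its timing."""
    result = PaneResult(model=model)
    start = time.monotonic()
    async for token in fake_stream(model, prompt, delay_s):
        if result.ttft_s is None:
            result.ttft_s = time.monotonic() - start
        result.tokens.append(token)
    result.total_s = time.monotonic() - start
    return result


async def compare(prompt: str) -> List[PaneResult]:
    # Both panes run concurrently, so wall time tracks the slower pane
    # rather than the sum of the two.
    return await asyncio.gather(
        run_pane("model-a", prompt, 0.01),
        run_pane("model-b", prompt, 0.02),
    )


if __name__ == "__main__":
    for pane in asyncio.run(compare("hello world")):
        print(pane.model, " ".join(pane.tokens),
              f"ttft={pane.ttft_s:.3f}s total={pane.total_s:.3f}s")
```

Running the panes with `asyncio.gather` is what removes the serial comparison described earlier; the same two timestamps would feed the per-pane timing badges.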

It is the Koder analog of OpenAI Playground, Anthropic Workbench, and Hugging Face Spaces. *Distinct from kode:* kode is the IDE-class product (full project context, file editing, multi-turn agent loops); playground is the no-state workbench for "try this and tell me how it does."

Boundary vs neighbors

  • services/ai/kode is IDE-class — different surface, different audience.
  • services/ai/gateway, services/ai/runtime, services/ai/modelreg, services/ai/prompt are primary producers (model exec, available models, curation, prompt registry).
  • services/ai/tools provides the tool palette for agent-style runs.
  • services/ai/trace integrates for debug-from-trace-viewer.
  • services/ai/eval receives promoted runs as eval cases.

Features (v1 target)

  • Flutter Web app at playground.koder.dev (reuses koder_kit auth/theme/i18n/safe-area)
  • Multi-model side-by-side chat (1-pane or 2-pane)
  • Streaming responses token-by-token; abort mid-stream
  • Parameter knobs per pane: temp, top-p, top-k, max_tokens, stop, seed
  • System prompt editor + tool palette (toggle per run)
  • Model picker driven by modelreg curated tags
  • Prompt registry integration: load, edit (with diff), save as new version, set active
  • Variable inputs auto-rendered from {{var}} template placeholders
  • Saved sessions library + 60s auto-snapshot
  • Signed share links (read-only, scoped, revocable, 30d default expiry)
  • One-click export-as-ticket to any accessible backlog
  • Promote-to-eval action (curator role) → ships run as eval case
  • Per-pane timing badges (TTFT, total) plus token count and cost estimate
  • trace_id link for every run
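
To illustrate how variable inputs could be derived from {{var}} placeholders, here is a minimal sketch. The function names and the identifier-style rule for variable names are assumptions for illustration; the actual template grammar used by the prompt registry is not specified here.

```python
import re
from typing import Dict, List

# Assumed grammar: {{name}} with optional inner whitespace,
# where name looks like an identifier.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def find_variables(template: str) -> List[str]:
    """Return placeholder names in first-appearance order, without duplicates."""
    seen: Dict[str, None] = {}
    for match in PLACEHOLDER.finditer(template):
        seen.setdefault(match.group(1), None)
    return list(seen)


def render(template: str, values: Dict[str, str]) -> str:
    """Substitute every placeholder; fail loudly if a value is missing."""
    missing = [name for name in find_variables(template) if name not in values]
    if missing:
        raise KeyError(f"missing values for: {', '.join(missing)}")
    # A function replacement avoids backslash-escape surprises in the values.
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


if __name__ == "__main__":
    template = "Summarize {{doc}} for {{ audience }}; keep {{doc}} short."
    print(find_variables(template))
    print(render(template, {"doc": "the RFC", "audience": "PMs"}))
```

In a UI, the list returned by `find_variables` would map to one input field per name, and `render` would run just before the prompt is sent.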

Primary couplings

Producer               Relationship
services/ai/gateway    LLM execution (streaming SSE)
services/ai/runtime    Available models
services/ai/modelreg   Curated picker source
services/ai/prompt     Registry: load, save, version
services/ai/tools      Tool palette
services/ai/trace      Per-run trace_id

Consumer                                   Relationship
Internal team (devs, PMs, designers, CS)   Daily prototyping surface
services/ai/eval                           Receives promoted runs
External (post-MVP)                        Public access via API key
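
Since the gateway coupling is streaming SSE, a client needs to split the response into events. The sketch below follows the generic server-sent events line format (blank line ends an event, `data:` lines accumulate, lines starting with `:` are comments). The function name is hypothetical, and the gateway's actual payload shape is not assumed: payloads are returned as raw strings.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event from text lines."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format strips a single leading space
        if field == "data":
            buffer.append(value)
        # Other fields (event, id, retry) are ignored in this sketch.
    # An event without a terminating blank line is incomplete and is dropped.


if __name__ == "__main__":
    stream = [": keep-alive\n", "data: Hel\n", "\n",
              "data: lo\n", "data: world\n", "\n"]
    for payload in iter_sse_data(stream):
        print(repr(payload))
```

Aborting mid-stream then amounts to stopping iteration and closing the underlying connection; how the gateway signals end-of-stream is left open here.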

RFC and bootstrap

  • RFC: playground-RFC-001-foundations.kmd (*Accepted* 2026-05-09)
  • Bootstrap ticket: services/ai/backlog/done/137-playground-bootstrap.md
  • Implementation tickets: services/ai/playground/backlog/pending/{001..005}

Self-hosted-first analysis (5 gates)

Gate                         Status    Notes
G1 Feature parity            pending   Multi-model + params + tools covers OpenAI Playground baseline
G2 Performance               pending   Streaming dominated by gateway; UI must stay 60fps during stream
G3 Stability                 pending   Pre-MVP
G4 Capability                pending   Full IDE deferred to kode; mobile responsive but not primary
G5 Critical-path readiness   pending   Internal productivity unblock day 1

Source: ../home/koder/dev/koder/meta/docs/stack/modules/ai-playground.md