Ai gateway
AI Gateway
- *rea:*Intelligence
- *ath:*
ai/gateway - *ind:*Unified LLM proxy / router (OpenAI-compatible API)
Role in the stack
AI Gateway is the single entry point for any Koder product that needs to call an LLM. It exposes an OpenAI-compatible API and routes requests to the right backend — local LLMs (via ai/runtime + Ollama), or proprietary providers (OpenAI, Anthropic, Google Gemini, HuggingFace, vLLM, Stability, Deepgram) when enabled in config.
Running in production on s.r1 as LXC 128, internal endpoint http://10.0.1.128:7800/v1/. Consumers today include ai/voice (for transcription summarization); new products plug in by pointing their SDK at the gateway without hardcoding provider keys.
Primary couplings
| Consumer | Relationship |
|---|---|
ai/voice |
Summarizes transcriptions via gateway routing |
| Future products | Any Koder product that needs LLM inference |
Providers (configurable)
- *nabled by default:*local Koder AI Runtime (Ollama) on LXC 127
- *vailable when enabled:*OpenAI, Anthropic, Google Gemini, HuggingFace, vLLM, Stability, Deepgram
Interfaces
- OpenAI-compat
/v1/API (chat, completions, embeddings) - YAML configuration at
/etc/koder-ai-gateway/config.yaml - systemd service
koder-ai-gateway
Storage
| Aspect | Current (2026 |
Target (#048) |
|---|---|---|
| Driver | SQLite local (modernc.org/sqlite) |
Postgres via pgx/v5 against kdb LXC 10.0.1.20 |
| Tables | usage_events, usage_daily, usage_monthly (string-based tenant cols) |
Same DDL translated to Postgres dialect (migrations/0001_initial.up.sql) + later llm_token_usage w/ koder_user_id BIGINT + RLS (#043) |
| Multi-tenancy | App |
RLS policies per policies/multi-tenant-by-default.kmd §3 |
| Time-series | Manual hourly aggregation | Timescale hypertable on usage_events (Phase 2 of #048) |
Caminho ratificado em backlog/done/047-tut-discovery-sqlite-vs-postgres.md (caminho A — migrar pra Postgres antes do tracker centralizado RFC001). Migration plan em [`backlogpending048...](../../../../services/ai/gateway/backlog/pending/048-tut-gateway-sqlite-to-postgres-migration.md). Detalhe de current state em [docstechnicalstorage.kmd`](........servicesaigatewaydocstechnical/storage.kmd).
Agent stream protocol (20260516)
The unified Koder AI product (services/ai/ai/) ships its own gateway that proxies the OpenAIcompatible API AND exposes the [Koder Agent Stream Protocol](..specsai/agentstream.kmd) for live agent run subscription.
- *oute:*
GET /v1/agent/runs/{id}/stream— auto-detects SSE vs WebSocket upgrade - *ire:*7 event types (
reasoning.delta,text.delta,tool.start,tool.end,step.boundary,child.spawn,run.lifecycle) - *hroughput shaping (R1-R4):*100ms aggregation by default;
?detail=fullopts out - *econnect:*
Last-Event-ID(SSE) /{op:"resume",seq:N}(WS) replay from persistent store - *uth + tenancy:*JWT required (API
key auth refused for streams); crosstenant → 404 perspecs/multi-tenancy/contract.kmd - *ersistence:*
RunEventStoreinterface with two backends —InMemoryRunEventStore(default, bootsafe, ephemeral) orin viaKDBRunEventStore(optRUN_EVENT_STORE=kdb+KDB_BASE_URLKDB_API_KEYKDB_NAMESPACE). KDB backend runs idempotentCREATE TABLE/INDEX IF NOT EXISTSfrommigrations/001_agent_run_events.sqlon boot. Cross-tenant reads return 0 rows → handler surfaces 404 viaCountForTenant. Retention: 90d done / 30d aborted, pruned hourly. - *eference impls:*
- Go:
services/ai/ai/gateway/internal/stream/{events,hub,sse,ws,aggregator,store,batchwriter}.go - JS:
engines/sdk/js/agent-stream/index.js(exported as@koder/sdk/agent-stream)
- Go:
- *ests:*29 Go sub
tests + 7 JS conformance tests, all green under `race` - *ickets:*AICORE
120 (parent) → AICORE127, 128, 129, 130, 131, 132 (all closed 20260516)
Run control plane (20260516)
Companion to the agent stream protocol: REST surface for the outofband controls Manus / Devin 2.0 / Cline expose (pause, redirect, rewind, planning gate, follow-up suggestions).
| Method | Path | Purpose |
|---|---|---|
POST |
/v1/agent/runs/{id}/pause |
Suspend run (Running → Paused) |
POST |
/v1/agent/runs/{id}/resume |
Resume paused run |
POST |
/v1/agent/runs/{id}/redirect |
Inject {message, rewind_to?}; emit run.lifecycle=redirecting |
POST |
/v1/agent/runs/{id}/rewind |
Roll back to a checkpoint by id or step-index |
GET POST |
/v1/agent/runs/{id}/checkpoints |
List / create checkpoints (runtime persists diff_ref to blob) |
POST GET |
/v1/agent/runs/{id}/plan |
Save / read the proposed plan |
POST |
/v1/agent/runs/{id}/approve_plan |
Approve with optional edits (smuggled step IDs rejected) |
POST |
/v1/agent/runs/{id}/reject_plan |
Reject → state Aborted |
POST GET |
/v1/agent/runs/{id}/follow_ups |
Save / read 3–5 follow-up suggestions |
GET |
/v1/agent/runs/{id}/events?from_seq=N&format=jsonl |
Export run events (AICORE-117 backend) |
- *tate machine:*see
services/ai/ai/gateway/internal/runctl/types.go— 8 states + transition table; terminal states reject all transitions. - *lanning gate modes:*
off(ClaudeCodeclassic),preview(default; agent emits plan and starts running),required(blocks on approval). The runtime decides which based on workspace settings + request. - *ulti
tenancy:*every endpoint enforces JWTbound tenant viaauth.ClaimsFromContext; crosstenant resolution returns 404 per [`specs/multitenancycontract.kmd`](..specsmulti-tenancycontract.kmd). - *rror codes:*
AICORE-CTL-0001..0008andAICORE-CTL-0099perspecs/errors/user-facing-messages.kmd. - *ersistence:*
runctl.Storeinterface (InMemory ships today); kdbbacked impl tracked alongside AICORE133. - *pec extension:*
agent-stream.kmdv0.2.0 adds theplan.proposalevent type. - *ickets:*AICORE
117, 121, 122, 123, 126 (gateway slice closed 20260516; runtime consumer slice AICORE134134.1134.3 closed 20260518; checkpoint+followup concrete impls 134.2134.4 + KITWEBKIT widgets 134.5/134.6 still pending). - *edirect intent persistence (AICORE-134.1):*
/redirectand/rewindroutes now persist arunctl.RedirectIntent(message + resolved rewind target) before transitioning state. The runtime fetches+clears the intent inwaitForRunnableand jumps the Execute loop index when a checkpoint resolved.
Agent runtime (20260516, AICORE-137 first slice)
The muscle that consumes everything shipped above. internal/agent/ owns a Runtime that drives a Plan of (tool, input) steps endtoend and surfaces the protocol events.
- *Tool
interface:**Name() / Description() / Call(ctx, input) (output, error). Implementations MUST be concurrent-safe. Built-inEchoTool` for smoke tests. - *ToolRegistry
:** concurrent-safe map;Register / Get / Names / Len`. Overwriting is allowed so tests can swap impls. - *Runtime.Execute(ctx, Plan)`:*
- State pre-check: rejects unless run is in
Running/Paused/Redirecting. - For each step:
waitForRunnable(pollsrunctl.Storeevery 250ms while Paused) → emitstep.boundary→ emittool.start→Tool.Call(ctx, input)(60s timeout) → emittool.end(ok + duration_ms, or error withKAI-TOOL-CALL-0001). Redirectingconsumes the persistedrunctl.RedirectIntent(set by/redirector/rewind), clears it, transitions back toRunning, and when the intent carries a resolvedRewindToCheckpointIDjumps the loop index to the matching step (AICORE-134.1).- Aborted during pause returns
ErrAborted; tool errors transition the run toStateError; unknown tools emittool.endwithKAI-TOOL-LOOKUP-0001and returnErrToolNotFound. - Clean completion transitions to
StateDoneand emits finalrun.lifecycle done. - *heckpoints (AICORE
134.2):*only.Runtime.Checkpointeris a pluggableCheckpointCaptureinvoked between successful steps; defaultnil(no checkpoints),NopCheckpointCapture{}enables timeline - *ollow
ups (AICORE134.4):*Runtime.FollowUpsis a pluggableFollowUpGeneratorinvoked afterDoneto populate/follow_ups; defaultnil(no suggestions). Validates viarunctl.ValidateFollowUps(3–5 items). - *lanAndExecute (AICORE
134.3):*alternate entry that wiresclassifies steps before SavePlan so UIs render the right icon.Planner→runctl.Plansave →PlanningGateflow →Execute.IntentFromToolkeyword
- State pre-check: rejects unless run is in
- *Sink
interface** mirrors*stream.PersistentSink.Publishso the runtime is decoupled from the Hub + persistence types. Production binds the gateway singleton; tests injectcaptureSink`. - *ests:*10 sub
tests verde under `race` (ToolRegistry, EchoTool, Execute happy path / lookup error / pausedthenresumed / aborted / redirect / bad initial state / missing run_id / tool error).
The runtime deliberately leaves the LLMdriven planner to *ICORE137b*— this slice ships the executable primitive; planning ships on top of it.
Default tool registry (AICORE-137d)
NewServer builds the agent runtime with agent.DefaultRegistry(fetcher, pipeline), exposing three tools out of the box:
| Tool | Wraps | Purpose |
|---|---|---|
echo |
n/a | Canonical reference / smoke check |
echo_stream |
n/a | StreamingTool reference (AICORE-137c) — emits text.delta per word |
web_fetch_rich |
web.FetchRich |
Static HTTP fetch + boilerplate-stripped markdown |
web_extract |
web.Pipeline.Run |
URL + schema → JSON via LLMExtractor |
Tool inputs mirror the REST body shapes of /v1/tools/web_fetch_rich and /v1/tools/web_extract so Plans can reuse payload templates verbatim. Server.AgentRuntime() and Server.AgentTools() are public accessors for the admin handlers below.
Agent admin endpoints (AICORE-137e)
| Method | Path | Purpose |
|---|---|---|
POST |
/v1/agent/runs |
Create + start a run from a Plan; returns 202 with {run_id, state, stream_url, step_count} |
GET |
/v1/agent/tools |
List the registered tools (name, description) for planner consumption |
- *ody:*
{steps: [{tool, input?, id?}], prompt?, planner_options?, planning_gate?: off|preview|required, workspace_id?}. Steps OR Prompt must be set (Steps wins when both — debug path). Missing tool names or unknown tools return 400 withKAI-AGENT-PROTO-0003/0004before any run is created. - *sync semantics:*202 returns immediately; the runtime runs on a goroutine with a 1h ceiling. Clients subscribe to
stream_urlto observe progress. - *lanning gate:*
offpreviewstart the runtime immediately;requiredparks the run inawaiting_approval, auto-saves a passthroughrunctl.Plan, and a background watcher polls (10min max) for the user's `POST ...approve_planbefore firingRuntime.Execute`. - *enancy:*every run is registered with
stream.RunRegistryfor the authenticatedkoder_user_idso the live stream route enforces cross-tenant → 404.
This is the endtoend user surface for the agent — the cycle from "I want X done" → planning → execution → live event stream → completion now closes inside the gateway.
LLM planner (AICORE-137b)
When POST /v1/agent/runs body carries prompt instead of steps, the gateway asks the LLM to turn the prompt into an executable Plan. The planner reads the same tool catalog /v1/agent/tools lists; every step is validated against the registry before any run is created (defense against tool hallucination + prompt-injection of unplanned capabilities).
- *efault model:*
claude-haiku-4-5-20251001(cheap + JSONtight). Override percall viaplanner_options.modelor globally viaEXTRACTOR_MODEL. - *tep cap:*
planner_options.max_steps(default 8). LLM replies are truncated. - *SON tolerance:*the parser strips
json fences and accepts both bare arrays[...]and wrapper objects{steps: [...]} / {plan: [...]} / {actions: [...]}. Accepts bothtoolandnamefield aliases. Step IDs auto-assigned (s1,s2`, ...) when LLM omits. - *rror codes:*
KAI-AGENT-PLAN-0001(planner not configured) /KAI-AGENT-PLAN-0002(planner returned error: empty prompt, unknown tool, malformed reply, chat upstream failure). - *ests:*15 sub
tests inlevel tests covering promptinternal/agent/planner_test.go+ 3 routedriven happy path, nopromptnosteps 400, planner-error 502.
Production binds LLMPlanner to the same LoopbackChatCompleter that backs the LLMExtractor (AICORE-135) — single egress path, single retry policy, same observable surface.
Selfrepair loop (AICORE116)
When a tool call fails AND the configured Matcher(toolName, err) returns true, the runtime reprompts the planner with the failure context and runs the replacement step. Defaulton for test-runner tools (pytestjestplaywrightgotestflutter_test/etc.) with MaxIterations=3.
- *onfig struct:*
agent.RepairConfig{Planner, Matcher, MaxIterations}. Zerovalue disables repair (backcompat with #137). - *efault:*
agent.TestRunnerMatcher()— matches any tool name containing one of 11 test-runner tokens. Extend viaTestRunnerMatcher("custom_runner"). - *eplacement step IDs:*
<original>_repair_<N>so emittedstep.boundary/tool.start/tool.endevents don't collide on the wire. - *elemetry:*
RepairTelemetry.Snapshot()returns(attempts, repaired, failed)atomic counters. Prometheus exposed viakoder_ai_test_repair_attempts_total{outcome}(AICORE116c) +116e).koder_ai_test_repair_iterations{outcome}histogram (AICORE - *udit log (AICORE
116d, 116f):*every attempt + final outcome emits onemonthagent.repairslog Info entry with{run_id, koder_user_id, tool, step_id, attempt, outcome, original_error, iterations, ts}.REPAIR_AUDIT_BACKEND=kdbopts intoagent.KDBAuditSink(persistentagent_repair_attemptstable + 12PruneOlderThanretention). Misconfig falls back to slog with WARN. Write path is fireandforget so audit infra outages don't disturb the runtime. - *verride:*
Server.AgentRuntime().Repair = ...before serving traffic to tighten/widen the matcher or disable repair entirely.
Streaming tool output (AICORE-137c)
Tools that produce textshaped output (markdown, generated content, LLMextracted streams) can opt into incremental delivery by implementing the StreamingTool interface:
type StreamingTool interface {
Tool
CallStream(ctx, input, chunks chan<- string) (json.RawMessage, error)
}When the runtime encounters a StreamingTool, every value sent on chunks becomes a text.delta envelope between tool.start and tool.end. The final JSON return value is still surfaced via tool.end.output, so replay + persistence remain canonical.
- *ackward compatible:*non-streaming tools (only
Tool) run unchanged — notext.deltaevents emitted. - *eference impl:*
StreamingEchoTool(registered asecho_stream) splits input on whitespace, emits one chunk per word; useful for UI smoke-tests of streaming consumers. - *ancellation:*
ctx.Done()aborts streaming cleanly mid-flight.
Provider plane (20260516, AICORE-118)
Provider interface gained two optional extensions and the registry grew a factory layer:
- *CatalogProvider
** —Models(ctx) ([]ModelInfo, error). Implemented by all 6 providers (OpenAI, Anthropic, Gemini, Moonshot, Manus, OpenRouter) via static catalogs inproviders/catalog.go`. - *HealthChecker
** —HealthCheck(ctx) error.HEAD` against each provider's models endpoint; auth-gated 401/403 counts as healthy. - *actory registry*—
Registry.RegisterFactory(name, func(apiKey) Provider)+BuildFromKeys(map[name]apiKey)replaces the longif cfg.X != "" { Register(NewX(cfg.X)) }chain inmain.go. - *Registry.AggregateModels(ctx)
** — fans out to every CatalogProvider; 1h cache; sorted by (OwnedBy, ID). BacksGET v1models` (previously a hardcoded list). - *Registry.Health(ctx)
** — fans out to every HealthChecker; non-implementers reported as OK=false +error"not implemented"(vserror<probe failure>`). - *ew route:*
GET /v1/providers/health— JSON aggregate.
Web tools (20260516, AICORE113, AICORE114)
Two LLMtool endpoints on the gateway, stdlib only (no chromedp / no readability dep yet).
| Method | Path | Purpose |
|---|---|---|
POST |
/v1/tools/web_fetch_rich |
Static HTTP fetch + boilerplate-stripped markdowntexthtml; metadata + links extraction |
POST |
/v1/tools/web_extract |
URL + schema (flat map of field→type) → JSON; behind Extractor interface (nullExtractor placeholder ships today) |
- *ody cap:*5MB on
web_fetch_rich(MaxBodyBytes). - *S render:*
opts.js_render=truereturns 501KAI-WEB-FETCH-JS-0001until chromedp engine ships (AICORE-135). - *chema dialect (MVP):*
{name: "string", price: "number", in_stock: "boolean"}— full JSON Schema in followup. Unsupported types return 400 `KAIWEBEXTRACTSCHEMA-0001`. - *LM step (AICORE
135):*haikuLLMExtractorwires to aChatCompleterinterface; production bindsLoopbackChatCompleter(POSTs to the gateway's own/v1/chat/completions— reuses provider routing, retries, auth). Default model `claude4520251001process URL + any registeredoverridable viaEXTRACTORMODEL. Loopback URL/key viaEXTRACTORBASEURL/EXTRACTORAPI_KEY` (defaults: inkgw_*key). - *aching:*
web_extractkeys onsha256(url + sorted schema), default TTL 1h (opts.cache_ttl_secs < 0disables,0= default). - *uth:*JWT
only (API keys forbidden — outbound HTTP is peruserratelimited). All routes useauth.ClaimsFromContext. - *rror codes:*
KAI-WEB-AUTH-0001,KAI-WEB-FETCH-0001/0002/0099,KAI-WEB-FETCH-JS-0001,KAI-WEB-EXTRACT-0001/0002/0099,KAI-WEB-EXTRACT-SCHEMA-0001. - *ickets:*AICORE
113 (rich fetch), AICORE114 (extract), AICORE118 (provider refactor backing both), AICORE135 (LLMExtractor wireup) — all closed 20260516. Followup AICORE136 (chromedp JS render).
Checkpoint retention + erasure cascade (20260523, AICORE134.2c + AICORE138)
The retention surface that backs the agent runtime's CheckpointCapture pipeline. Policy lives in meta/docs/stack/policies/checkpoint-retention.kmd; this section is the operator's view.
- *er
run cap (R1):*1` disables (debug only).agent.EnforcePerRunCapruns inline inmaybeCheckpointafter every successful capture. Default 10 envelopes per run; `RETENTIONPERRUN_CAP= - *er
tenant cap (R2):*across-runs until the tenant's totalagent.EnforcePerTenantCapsweeps oldestSizeBytes≤ cap. Default 500 MiB (RETENTION_PER_TENANT_BYTES). - *one-run grace (R3):*
agent.EnforceGraceExpirydrops every checkpoint belonging to terminal runs older than the configured window. Default 7 days, clamps at 90 (RETENTION_DONE_RUN_GRACE_SECS). - *eriodic worker:*
Server.StartRetentionTickerspawns the crosstenant ticker that runs R2 + R3 everybacked registries will plug in via the sameRETENTION_INTERVAL(default 15 min). TenantSource derives fromInMemoryRunRegistry.Tenants()— kdbtenantListerinterface. - *ulti
tenant isolation (R4):*everytenant deletes mask to silent nil perRetentionStorecall passeskoder_user_id; crossmulti-tenant-by-default.kmd§ 5. - *rasure cascade (R5, AICORE
138):*servicegateway/internal/erasure/Subscriberconsumesidentity.user.erasedenvelopes from thekoder:events:idRedis Stream (crossevents.kmd canonical type + stream key). On every erasure event,delivery on an emptied tenant is a noop. Wired inagent.EnforceErasurewalks every checkpoint owned by the subjectkoder_user_idand drivesdeleteOnewithReasonErasure. Idempotent — remain.gowhenREDIS_URLis set; emptyREDIS_URLkeeps R1R2R3 local enforcement without R5. Consumer group:koder-ai-gateway-erasure(pattern<service>-erasure-cascadeper crossserviceevents.kmd § R3). - *udit log (R6):*every deletion emits a structured slog Info entry
event=checkpoint.deleted run_id=… checkpoint_id=… koder_user_id=… size_bytes=… reason=…wherereason ∈ {per_run_cap, per_tenant_cap, grace_expired, erasure}. ShareddeleteOnehelper guarantees the same shape regardless of which enforcer fired. - *lob store:*
agent.FsBlobStorerooted atCHECKPOINT_DIR(default/var/lib/koder-ai-gateway/checkpoints); contentaddressed (tenant<tenant>/<scheme>/<sha256>); crossGet/Deletemask toErrBlobNotFound. - *estore path:*
Server.SetRewindRestorersplugs aRestorerRouter; default boot wiresNopRewindRestorer+TarZstdRewindRestorer.OverlayfsRewindRestorer(AICORE134.2b) blocked on AICORE106 sandbox-model RFC. - *ickets:*AICORE
134.2c (BlobStore + RewindRestorer + R1R2R3R4R6 + periodic worker + main.go wiring) closed 20260519; AICORE138 (R5 erasure cascade subscriber) closed 20260523. AICORE-134.2b (overlayfs producer + restorer) still pending.