Ai gateway

AI Gateway

  • *rea:*Intelligence
  • *ath:*ai/gateway
  • *ind:*Unified LLM proxy / router (OpenAI-compatible API)

Role in the stack

AI Gateway is the single entry point for any Koder product that needs to call an LLM. It exposes an OpenAI-compatible API and routes requests to the right backend — local LLMs (via ai/runtime + Ollama), or proprietary providers (OpenAI, Anthropic, Google Gemini, HuggingFace, vLLM, Stability, Deepgram) when enabled in config.

Running in production on s.r1 as LXC 128, internal endpoint http://10.0.1.128:7800/v1/. Consumers today include ai/voice (for transcription summarization); new products plug in by pointing their SDK at the gateway without hardcoding provider keys.

Primary couplings

Consumer Relationship
ai/voice Summarizes transcriptions via gateway routing
Future products Any Koder product that needs LLM inference

Providers (configurable)

  • *nabled by default:*local Koder AI Runtime (Ollama) on LXC 127
  • *vailable when enabled:*OpenAI, Anthropic, Google Gemini, HuggingFace, vLLM, Stability, Deepgram

Interfaces

  • OpenAI-compat /v1/ API (chat, completions, embeddings)
  • YAML configuration at /etc/koder-ai-gateway/config.yaml
  • systemd service koder-ai-gateway

Storage

Aspect Current (20260509) Target (#048)
Driver SQLite local (modernc.org/sqlite) Postgres via pgx/v5 against kdb LXC 10.0.1.20
Tables usage_events, usage_daily, usage_monthly (string-based tenant cols) Same DDL translated to Postgres dialect (migrations/0001_initial.up.sql) + later llm_token_usage w/ koder_user_id BIGINT + RLS (#043)
Multi-tenancy Applayeronly (no RLS) RLS policies per policies/multi-tenant-by-default.kmd §3
Time-series Manual hourly aggregation Timescale hypertable on usage_events (Phase 2 of #048)

Caminho ratificado em backlog/done/047-tut-discovery-sqlite-vs-postgres.md (caminho A — migrar pra Postgres antes do tracker centralizado RFC001). Migration plan em [`backlogpending048...](../../../../services/ai/gateway/backlog/pending/048-tut-gateway-sqlite-to-postgres-migration.md). Detalhe de current state em [docstechnicalstorage.kmd`](........servicesaigatewaydocstechnical/storage.kmd).

Agent stream protocol (20260516)

The unified Koder AI product (services/ai/ai/) ships its own gateway that proxies the OpenAIcompatible API AND exposes the [Koder Agent Stream Protocol](..specsai/agentstream.kmd) for live agent run subscription.

  • *oute:*GET /v1/agent/runs/{id}/stream — auto-detects SSE vs WebSocket upgrade
  • *ire:*7 event types (reasoning.delta, text.delta, tool.start, tool.end, step.boundary, child.spawn, run.lifecycle)
  • *hroughput shaping (R1-R4):*100ms aggregation by default; ?detail=full opts out
  • *econnect:*Last-Event-ID (SSE) / {op:"resume",seq:N} (WS) replay from persistent store
  • *uth + tenancy:*JWT required (APIkey auth refused for streams); crosstenant → 404 per specs/multi-tenancy/contract.kmd
  • *ersistence:*RunEventStore interface with two backends — InMemoryRunEventStore (default, bootsafe, ephemeral) or KDBRunEventStore (optin via RUN_EVENT_STORE=kdb + KDB_BASE_URLKDB_API_KEYKDB_NAMESPACE). KDB backend runs idempotent CREATE TABLE/INDEX IF NOT EXISTS from migrations/001_agent_run_events.sql on boot. Cross-tenant reads return 0 rows → handler surfaces 404 via CountForTenant. Retention: 90d done / 30d aborted, pruned hourly.
  • *eference impls:*
    • Go: services/ai/ai/gateway/internal/stream/{events,hub,sse,ws,aggregator,store,batchwriter}.go
    • JS: engines/sdk/js/agent-stream/index.js (exported as @koder/sdk/agent-stream)
  • *ests:*29 Go subtests + 7 JS conformance tests, all green under `race`
  • *ickets:*AICORE120 (parent) → AICORE127, 128, 129, 130, 131, 132 (all closed 20260516)

Run control plane (20260516)

Companion to the agent stream protocol: REST surface for the outofband controls Manus / Devin 2.0 / Cline expose (pause, redirect, rewind, planning gate, follow-up suggestions).

Method Path Purpose
POST /v1/agent/runs/{id}/pause Suspend run (Running → Paused)
POST /v1/agent/runs/{id}/resume Resume paused run
POST /v1/agent/runs/{id}/redirect Inject {message, rewind_to?}; emit run.lifecycle=redirecting
POST /v1/agent/runs/{id}/rewind Roll back to a checkpoint by id or step-index
GET POST /v1/agent/runs/{id}/checkpoints List / create checkpoints (runtime persists diff_ref to blob)
POST GET /v1/agent/runs/{id}/plan Save / read the proposed plan
POST /v1/agent/runs/{id}/approve_plan Approve with optional edits (smuggled step IDs rejected)
POST /v1/agent/runs/{id}/reject_plan Reject → state Aborted
POST GET /v1/agent/runs/{id}/follow_ups Save / read 3–5 follow-up suggestions
GET /v1/agent/runs/{id}/events?from_seq=N&format=jsonl Export run events (AICORE-117 backend)
  • *tate machine:*see services/ai/ai/gateway/internal/runctl/types.go — 8 states + transition table; terminal states reject all transitions.
  • *lanning gate modes:*off (ClaudeCodeclassic), preview (default; agent emits plan and starts running), required (blocks on approval). The runtime decides which based on workspace settings + request.
  • *ultitenancy:*every endpoint enforces JWTbound tenant via auth.ClaimsFromContext; crosstenant resolution returns 404 per [`specs/multitenancycontract.kmd`](..specsmulti-tenancycontract.kmd).
  • *rror codes:*AICORE-CTL-0001..0008 and AICORE-CTL-0099 per specs/errors/user-facing-messages.kmd.
  • *ersistence:*runctl.Store interface (InMemory ships today); kdbbacked impl tracked alongside AICORE133.
  • *pec extension:*agent-stream.kmd v0.2.0 adds the plan.proposal event type.
  • *ickets:*AICORE117, 121, 122, 123, 126 (gateway slice closed 20260516; runtime consumer slice AICORE134134.1134.3 closed 20260518; checkpoint+followup concrete impls 134.2134.4 + KITWEBKIT widgets 134.5/134.6 still pending).
  • *edirect intent persistence (AICORE-134.1):*/redirect and /rewind routes now persist a runctl.RedirectIntent (message + resolved rewind target) before transitioning state. The runtime fetches+clears the intent in waitForRunnable and jumps the Execute loop index when a checkpoint resolved.

Agent runtime (20260516, AICORE-137 first slice)

The muscle that consumes everything shipped above. internal/agent/ owns a Runtime that drives a Plan of (tool, input) steps endtoend and surfaces the protocol events.

  • *Tool interface:** Name() / Description() / Call(ctx, input) (output, error). Implementations MUST be concurrent-safe. Built-in EchoTool` for smoke tests.
  • *ToolRegistry:** concurrent-safe map; Register / Get / Names / Len`. Overwriting is allowed so tests can swap impls.
  • *Runtime.Execute(ctx, Plan)`:*
    • State pre-check: rejects unless run is in Running / Paused / Redirecting.
    • For each step: waitForRunnable (polls runctl.Store every 250ms while Paused) → emit step.boundary → emit tool.startTool.Call(ctx, input) (60s timeout) → emit tool.end (ok + duration_ms, or error with KAI-TOOL-CALL-0001).
    • Redirecting consumes the persisted runctl.RedirectIntent (set by /redirect or /rewind), clears it, transitions back to Running, and when the intent carries a resolved RewindToCheckpointID jumps the loop index to the matching step (AICORE-134.1).
    • Aborted during pause returns ErrAborted; tool errors transition the run to StateError; unknown tools emit tool.end with KAI-TOOL-LOOKUP-0001 and return ErrToolNotFound.
    • Clean completion transitions to StateDone and emits final run.lifecycle done.
    • *heckpoints (AICORE134.2):*Runtime.Checkpointer is a pluggable CheckpointCapture invoked between successful steps; default nil (no checkpoints), NopCheckpointCapture{} enables timelineonly.
    • *ollowups (AICORE134.4):*Runtime.FollowUps is a pluggable FollowUpGenerator invoked after Done to populate /follow_ups; default nil (no suggestions). Validates via runctl.ValidateFollowUps (3–5 items).
    • *lanAndExecute (AICORE134.3):*alternate entry that wires Plannerrunctl.Plan save → PlanningGate flow → Execute. IntentFromTool keywordclassifies steps before SavePlan so UIs render the right icon.
  • *Sink interface** mirrors *stream.PersistentSink.Publish so the runtime is decoupled from the Hub + persistence types. Production binds the gateway singleton; tests inject captureSink`.
  • *ests:*10 subtests verde under `race` (ToolRegistry, EchoTool, Execute happy path / lookup error / pausedthenresumed / aborted / redirect / bad initial state / missing run_id / tool error).

The runtime deliberately leaves the LLMdriven planner to *ICORE137b*— this slice ships the executable primitive; planning ships on top of it.

Default tool registry (AICORE-137d)

NewServer builds the agent runtime with agent.DefaultRegistry(fetcher, pipeline), exposing three tools out of the box:

Tool Wraps Purpose
echo n/a Canonical reference / smoke check
echo_stream n/a StreamingTool reference (AICORE-137c) — emits text.delta per word
web_fetch_rich web.FetchRich Static HTTP fetch + boilerplate-stripped markdown
web_extract web.Pipeline.Run URL + schema → JSON via LLMExtractor

Tool inputs mirror the REST body shapes of /v1/tools/web_fetch_rich and /v1/tools/web_extract so Plans can reuse payload templates verbatim. Server.AgentRuntime() and Server.AgentTools() are public accessors for the admin handlers below.

Agent admin endpoints (AICORE-137e)

Method Path Purpose
POST /v1/agent/runs Create + start a run from a Plan; returns 202 with {run_id, state, stream_url, step_count}
GET /v1/agent/tools List the registered tools (name, description) for planner consumption
  • *ody:*{steps: [{tool, input?, id?}], prompt?, planner_options?, planning_gate?: off|preview|required, workspace_id?}. Steps OR Prompt must be set (Steps wins when both — debug path). Missing tool names or unknown tools return 400 with KAI-AGENT-PROTO-0003/0004 before any run is created.
  • *sync semantics:*202 returns immediately; the runtime runs on a goroutine with a 1h ceiling. Clients subscribe to stream_url to observe progress.
  • *lanning gate:*offpreview start the runtime immediately; required parks the run in awaiting_approval, auto-saves a passthrough runctl.Plan, and a background watcher polls (10min max) for the user's `POST ...approve_plan before firing Runtime.Execute`.
  • *enancy:*every run is registered with stream.RunRegistry for the authenticated koder_user_id so the live stream route enforces cross-tenant → 404.

This is the endtoend user surface for the agent — the cycle from "I want X done" → planning → execution → live event stream → completion now closes inside the gateway.

LLM planner (AICORE-137b)

When POST /v1/agent/runs body carries prompt instead of steps, the gateway asks the LLM to turn the prompt into an executable Plan. The planner reads the same tool catalog /v1/agent/tools lists; every step is validated against the registry before any run is created (defense against tool hallucination + prompt-injection of unplanned capabilities).

  • *efault model:*claude-haiku-4-5-20251001 (cheap + JSONtight). Override percall via planner_options.model or globally via EXTRACTOR_MODEL.
  • *tep cap:*planner_options.max_steps (default 8). LLM replies are truncated.
  • *SON tolerance:*the parser strips json fences and accepts both bare arrays [...] and wrapper objects {steps: [...]} / {plan: [...]} / {actions: [...]}. Accepts both tool and name field aliases. Step IDs auto-assigned (s1, s2`, ...) when LLM omits.
  • *rror codes:*KAI-AGENT-PLAN-0001 (planner not configured) / KAI-AGENT-PLAN-0002 (planner returned error: empty prompt, unknown tool, malformed reply, chat upstream failure).
  • *ests:*15 subtests in internal/agent/planner_test.go + 3 routelevel tests covering promptdriven happy path, nopromptnosteps 400, planner-error 502.

Production binds LLMPlanner to the same LoopbackChatCompleter that backs the LLMExtractor (AICORE-135) — single egress path, single retry policy, same observable surface.

Selfrepair loop (AICORE116)

When a tool call fails AND the configured Matcher(toolName, err) returns true, the runtime reprompts the planner with the failure context and runs the replacement step. Defaulton for test-runner tools (pytestjestplaywrightgotestflutter_test/etc.) with MaxIterations=3.

  • *onfig struct:*agent.RepairConfig{Planner, Matcher, MaxIterations}. Zerovalue disables repair (backcompat with #137).
  • *efault:*agent.TestRunnerMatcher() — matches any tool name containing one of 11 test-runner tokens. Extend via TestRunnerMatcher("custom_runner").
  • *eplacement step IDs:*<original>_repair_<N> so emitted step.boundary / tool.start / tool.end events don't collide on the wire.
  • *elemetry:*RepairTelemetry.Snapshot() returns (attempts, repaired, failed) atomic counters. Prometheus exposed via koder_ai_test_repair_attempts_total{outcome} (AICORE116c) + koder_ai_test_repair_iterations{outcome} histogram (AICORE116e).
  • *udit log (AICORE116d, 116f):*every attempt + final outcome emits one agent.repair slog Info entry with {run_id, koder_user_id, tool, step_id, attempt, outcome, original_error, iterations, ts}. REPAIR_AUDIT_BACKEND=kdb opts into agent.KDBAuditSink (persistent agent_repair_attempts table + 12month PruneOlderThan retention). Misconfig falls back to slog with WARN. Write path is fireandforget so audit infra outages don't disturb the runtime.
  • *verride:*Server.AgentRuntime().Repair = ... before serving traffic to tighten/widen the matcher or disable repair entirely.

Streaming tool output (AICORE-137c)

Tools that produce textshaped output (markdown, generated content, LLMextracted streams) can opt into incremental delivery by implementing the StreamingTool interface:

type StreamingTool interface {
    Tool
    CallStream(ctx, input, chunks chan<- string) (json.RawMessage, error)
}

When the runtime encounters a StreamingTool, every value sent on chunks becomes a text.delta envelope between tool.start and tool.end. The final JSON return value is still surfaced via tool.end.output, so replay + persistence remain canonical.

  • *ackward compatible:*non-streaming tools (only Tool) run unchanged — no text.delta events emitted.
  • *eference impl:*StreamingEchoTool (registered as echo_stream) splits input on whitespace, emits one chunk per word; useful for UI smoke-tests of streaming consumers.
  • *ancellation:*ctx.Done() aborts streaming cleanly mid-flight.

Provider plane (20260516, AICORE-118)

Provider interface gained two optional extensions and the registry grew a factory layer:

  • *CatalogProvider** — Models(ctx) ([]ModelInfo, error). Implemented by all 6 providers (OpenAI, Anthropic, Gemini, Moonshot, Manus, OpenRouter) via static catalogs in providers/catalog.go`.
  • *HealthChecker** — HealthCheck(ctx) error. HEAD` against each provider's models endpoint; auth-gated 401/403 counts as healthy.
  • *actory registry*— Registry.RegisterFactory(name, func(apiKey) Provider) + BuildFromKeys(map[name]apiKey) replaces the long if cfg.X != "" { Register(NewX(cfg.X)) } chain in main.go.
  • *Registry.AggregateModels(ctx)** — fans out to every CatalogProvider; 1h cache; sorted by (OwnedBy, ID). Backs GET v1models` (previously a hardcoded list).
  • *Registry.Health(ctx)** — fans out to every HealthChecker; non-implementers reported as OK=false + error"not implemented" (vs error<probe failure>`).
  • *ew route:*GET /v1/providers/health — JSON aggregate.

Web tools (20260516, AICORE113, AICORE114)

Two LLMtool endpoints on the gateway, stdlib only (no chromedp / no readability dep yet).

Method Path Purpose
POST /v1/tools/web_fetch_rich Static HTTP fetch + boilerplate-stripped markdowntexthtml; metadata + links extraction
POST /v1/tools/web_extract URL + schema (flat map of field→type) → JSON; behind Extractor interface (nullExtractor placeholder ships today)
  • *ody cap:*5MB on web_fetch_rich (MaxBodyBytes).
  • *S render:*opts.js_render=true returns 501 KAI-WEB-FETCH-JS-0001 until chromedp engine ships (AICORE-135).
  • *chema dialect (MVP):*{name: "string", price: "number", in_stock: "boolean"} — full JSON Schema in followup. Unsupported types return 400 `KAIWEBEXTRACTSCHEMA-0001`.
  • *LM step (AICORE135):*LLMExtractor wires to a ChatCompleter interface; production binds LoopbackChatCompleter (POSTs to the gateway's own /v1/chat/completions — reuses provider routing, retries, auth). Default model `claudehaiku4520251001 overridable via EXTRACTORMODEL. Loopback URL/key via EXTRACTORBASEURL / EXTRACTORAPI_KEY` (defaults: inprocess URL + any registered kgw_* key).
  • *aching:*web_extract keys on sha256(url + sorted schema), default TTL 1h (opts.cache_ttl_secs < 0 disables, 0 = default).
  • *uth:*JWTonly (API keys forbidden — outbound HTTP is peruserratelimited). All routes use auth.ClaimsFromContext.
  • *rror codes:*KAI-WEB-AUTH-0001, KAI-WEB-FETCH-0001/0002/0099, KAI-WEB-FETCH-JS-0001, KAI-WEB-EXTRACT-0001/0002/0099, KAI-WEB-EXTRACT-SCHEMA-0001.
  • *ickets:*AICORE113 (rich fetch), AICORE114 (extract), AICORE118 (provider refactor backing both), AICORE135 (LLMExtractor wireup) — all closed 20260516. Followup AICORE136 (chromedp JS render).

Checkpoint retention + erasure cascade (20260523, AICORE134.2c + AICORE138)

The retention surface that backs the agent runtime's CheckpointCapture pipeline. Policy lives in meta/docs/stack/policies/checkpoint-retention.kmd; this section is the operator's view.

  • *errun cap (R1):*agent.EnforcePerRunCap runs inline in maybeCheckpoint after every successful capture. Default 10 envelopes per run; `RETENTIONPERRUN_CAP=1` disables (debug only).
  • *ertenant cap (R2):*agent.EnforcePerTenantCap sweeps oldestacross-runs until the tenant's total SizeBytes ≤ cap. Default 500 MiB (RETENTION_PER_TENANT_BYTES).
  • *one-run grace (R3):*agent.EnforceGraceExpiry drops every checkpoint belonging to terminal runs older than the configured window. Default 7 days, clamps at 90 (RETENTION_DONE_RUN_GRACE_SECS).
  • *eriodic worker:*Server.StartRetentionTicker spawns the crosstenant ticker that runs R2 + R3 every RETENTION_INTERVAL (default 15 min). TenantSource derives from InMemoryRunRegistry.Tenants() — kdbbacked registries will plug in via the same tenantLister interface.
  • *ultitenant isolation (R4):*every RetentionStore call passes koder_user_id; crosstenant deletes mask to silent nil per multi-tenant-by-default.kmd § 5.
  • *rasure cascade (R5, AICORE138):*gateway/internal/erasure/Subscriber consumes identity.user.erased envelopes from the koder:events:id Redis Stream (crossserviceevents.kmd canonical type + stream key). On every erasure event, agent.EnforceErasure walks every checkpoint owned by the subject koder_user_id and drives deleteOne with ReasonErasure. Idempotent — redelivery on an emptied tenant is a noop. Wired in main.go when REDIS_URL is set; empty REDIS_URL keeps R1R2R3 local enforcement without R5. Consumer group: koder-ai-gateway-erasure (pattern <service>-erasure-cascade per crossserviceevents.kmd § R3).
  • *udit log (R6):*every deletion emits a structured slog Info entry event=checkpoint.deleted run_id=… checkpoint_id=… koder_user_id=… size_bytes=… reason=… where reason ∈ {per_run_cap, per_tenant_cap, grace_expired, erasure}. Shared deleteOne helper guarantees the same shape regardless of which enforcer fired.
  • *lob store:*agent.FsBlobStore rooted at CHECKPOINT_DIR (default /var/lib/koder-ai-gateway/checkpoints); contentaddressed (<tenant>/<scheme>/<sha256>); crosstenant Get/Delete mask to ErrBlobNotFound.
  • *estore path:*Server.SetRewindRestorers plugs a RestorerRouter; default boot wires NopRewindRestorer + TarZstdRewindRestorer. OverlayfsRewindRestorer (AICORE134.2b) blocked on AICORE106 sandbox-model RFC.
  • *ickets:*AICORE134.2c (BlobStore + RewindRestorer + R1R2R3R4R6 + periodic worker + main.go wiring) closed 20260519; AICORE138 (R5 erasure cascade subscriber) closed 20260523. AICORE-134.2b (overlayfs producer + restorer) still pending.

Source: ../home/koder/dev/koder/meta/docs/stack/modules/ai-gateway.md