RFC 005 — LLM Provider Abstraction + Chat Tool Use

Tracking ticket: backlogdone005
Depends on: RFC 001 — Ecosystem Map (Approved), RFC 002 — Architecture (Approved)
Companion of: RFC 003 (Common Event Schema, Approved), RFC 004 (Common Control Plane, Approved), RFC 006 (Auto-remediation Rules Engine, Approved)
Status: Approved (2026-04-08)

1. Summary

This RFC defines how Koder Kortex talks to Large Language Models and how it exposes a conversational interface in which the model can investigate problems by calling read-only tools against the rest of the platform before answering.

There are two intertwined concerns and this RFC answers both:

Provider abstraction: Kortex must work against multiple LLM backends without committing to any single vendor. The supported providers in Phase 1 are Anthropic Claude (default, Opus 4.6 with 1M context), OpenAI (GPT-4 / GPT-5), Ollama (local models), and koder-ai (the future internal Koder model, when it ships). Adding a new provider is one Go file plus one import line.
Chat tool use: when an operator asks Kortex a question, the LLM does not answer from its training data. It calls a small set of *read-only tools* against observe/log, observe/mon, observe/apm, observe/wire, the Kortex Memory, the Kortex Brain analyzers, and the upstream products' /admin/info endpoints. It collects evidence, then writes the answer grounded in that evidence. Destructive actions are never taken by the LLM autonomously; the only "action" tool is propose_action, which posts a proposal into the chat with a Confirm button that only an operator can press.

The hard separation between reflex automation and LLM-driven automation is the central safety property of Kortex and must not be compromised:

Hard rule — LLM never auto-acts

The LLM provider abstraction is advisory only. It can read everything, propose anything, but it never executes a destructive action on its own. All autonomous remediation flows through the declarative rules engine defined in RFC 006, where every action is gated by explicit, auditable guards (rate_limit, blast_radius, dry_run, require_approval). Conflating "the LLM said do X" with "Kortex did X" is the failure mode this RFC exists to prevent.

The result is a small, swappable, observable LLM layer that lives in internal/llm/ of the Kortex repository, plugs into Brain (RFC 002 §3.2.4) as one analyzer among many, and powers the /chat/v1/... API surface (RFC 002 §8.4) with real-time SSE streaming to the embedded Flutter Web dashboard.

2. Context and motivation

2.1 Why Kortex needs LLMs at all

RFC 002 §3.2 organizes Brain as a collection of analyzers. The deterministic analyzers (Z-score anomaly detection, stack-trace fingerprinting, temporal correlation) are cheap, predictable, cacheable, and unit-testable. They cover the bulk of detection work.

What they cannot do is explanation and judgment. A Z-score analyzer can tell an operator that p99 latency on saude-publica jumped 4.3 standard deviations above baseline at 14:32. It cannot say:

/"This looks like the same database connection-pool exhaustion we saw on March 18; check if pgbouncer is restarting again."/
/"The latency spike is correlated with the deploy that landed at 14:31. The new commit changed the index strategy on appointments. Roll back the deploy or run REINDEX."/
/"Stack trace 7af3e1c has been showing up since 14:30, after the koder-jet upgrade. The fix from incident #214 was to bump the open file descriptor limit on the worker."/

These are natural-language judgments grounded in evidence. They are exactly what LLMs are good at, provided the evidence is given to them fresh, structured, and untruncated. Hence: tool use.

2.2 Why a provider abstraction at all

Three reasons:

Vendor risk. The LLM market is moving fast. Today Anthropic Opus 4.6 has the best tool-use behavior and the longest usable context window (1M tokens). In six months that will probably still be true, but the leader could just as well be GPT-5 or a fine-tuned koder-ai model. Kortex must not be architecturally coupled to any single API surface.
Cost and privacy. Some operators will run Kortex against systems whose logs cannot leave the operator's network: finance, health, defense, regulated SaaS. For those operators the only acceptable provider is a local one (Ollama). Kortex must support that without turning into a different product.
Heterogeneous workloads. Triage questions ("did the deploy break it?") and deep analysis questions ("write a post-mortem from these 240 events") have very different cost/latency/quality trade-offs. A fast, cheap model is right for the first; a slow, expensive one is right for the second. The abstraction lets Brain pick the right model per call.

2.3 Why tool use, not retrieval-augmented generation (RAG)

RAG (embed everything, retrieve top-K chunks, inject into prompt) is the standard pattern for static knowledge bases. It is the wrong pattern for Kortex, because:

Logs and metrics are high-volume, time-windowed, and structured. Embedding them is wasteful and lossy.
The right answer to "what is happening right now?" requires running a query against the live data, not retrieving stale embeddings.
The Brain analyzers already produce structured findings. Wrapping those findings as tools is more honest than serializing them into a text blob.

Tool use is the architecturally correct pattern: the LLM is a /reasoning client/ over a typed API surface that Kortex already has.

RAG is not banned forever

A future RFC may add an embedding-based memory of past incidents (a koder-kdb index over incidents.title + decisions.payload) for semantic similarity search. That would be one more tool — find_similar_incidents — not a replacement for the live tools defined here.

2.4 The five binding constraints from prior RFCs

This RFC is not free to redesign the world. It must respect:

Constraint	Source	Implication for this RFC
LLM is one analyzer among many	RFC 002 §3.2.4	The provider lives in `internal/llm/`, called from `internal/brain/`. It is not a top-level subsystem.
Chat session state is in Postgres	RFC 002 §7.2, §12.2	Sessions and messages persisted in `chat_sessions` and `chat_messages`. No new database.
Kortex never auto-acts via LLM	RFC 002 §3.3, §13.x	Tools are read-only by default. `propose_action` is the only "write" and it does not write — it posts to chat.
Audit trail covers everything	RFC 002 §12.5	Every LLM call, every tool call, every proposed action is recorded as a `decisions` row and emitted as a `koder.admin.action` event.
One Go binary, no extra services	RFC 002 §1, §6	The provider clients are Go packages compiled into the Kortex binary. No sidecar.

3. Provider interface

The LLM surface is defined by a single Go interface in internal/llm/provider.go. Every provider implementation satisfies it. Everything in internal/brain/ and internal/chat/ talks only to this interface — never to a vendor SDK directly.

3.1 The `LLMProvider` interface

// Package llm defines the provider abstraction for Kortex's LLM surface.
// Every concrete backend (anthropic, openai, ollama, koder-ai) implements
// LLMProvider and registers itself via Register().
package llm

import (
    "context"
    "encoding/json"
    "time"
)

// LLMProvider is the contract every backend implements.
//
// Implementations must be safe for concurrent use by multiple goroutines.
// They must respect ctx cancellation aggressively (network calls cancel
// within 100ms of ctx.Done()).
type LLMProvider interface {
    // Name returns the stable identifier used in config and metrics.
    // Example values: "anthropic", "openai", "ollama", "koder-ai".
    Name() string

    // Models returns the list of models the operator may select for this
    // provider. The order is significant: Models()[0] is the default.
    Models() []Model

    // Complete issues a single non-streaming completion. It blocks until
    // the model finishes or ctx is cancelled.
    Complete(ctx context.Context, req CompleteRequest) (CompleteResponse, error)

    // Stream issues a streaming completion. The returned channel emits
    // StreamChunk values until the model finishes (final chunk has
    // Type == ChunkDone) or ctx is cancelled (channel closed without
    // emitting Done). The implementation owns closing the channel.
    Stream(ctx context.Context, req CompleteRequest) (<-chan StreamChunk, error)

    // SupportsTools reports whether the provider can handle tool calls
    // natively. Providers that return false will receive only the
    // Messages field of CompleteRequest; Tools will be silently dropped.
    SupportsTools() bool

    // SupportsStreaming reports whether Stream() is functional. Providers
    // that return false fall back to a synthetic stream that emits one
    // ChunkText with the entire body and one ChunkDone.
    SupportsStreaming() bool

    // MaxContextTokens returns the largest context window the default
    // model accepts. Used by the chat session token-budget manager (§8.4).
    MaxContextTokens() int

    // CostPerToken returns the cost in USD per million tokens for input and
    // output (see Cost). Used for budget tracking (§10). Local providers
    // (Ollama) return zero.
    CostPerToken() Cost
}

// Model describes a single model offered by a provider.
type Model struct {
    ID            string  // vendor model ID, e.g. "claude-opus-4-6", "gpt-5"
    DisplayName   string  // human-readable, e.g. "Claude Opus 4.6 (1M context)"
    ContextTokens int     // max input tokens
    Tier          string  // "fast", "balanced", "deep" — used by frugal mode
}

// Cost is the per-million-token cost for input and output, in USD.
type Cost struct {
    InputUSDPerMTok  float64
    OutputUSDPerMTok float64
}

3.2 Request and response shapes

// CompleteRequest is the input to Complete or Stream. It is provider-agnostic;
// each provider translates it into its own wire format.
type CompleteRequest struct {
    Model       string    // optional override; if empty, provider uses Models()[0]
    System      string    // system prompt
    Messages    []Message // conversation history, oldest first
    Tools       []Tool    // tools the model may call (ignored if SupportsTools()==false)
    MaxTokens   int       // upper bound on output tokens
    Temperature float64   // 0.0..1.0
    Stop        []string  // optional stop sequences
    Metadata    Meta      // observability tags (session_id, user_id, purpose)
}

// Message is a single turn in the conversation.
type Message struct {
    Role       Role        // user | assistant | tool
    Content    string      // free text (may be empty if ToolCalls/ToolResult is set)
    ToolCalls  []ToolCall  // populated when an assistant turn invoked tools
    ToolResult *ToolResult // populated for tool-role messages
}

// Role is "user", "assistant", or "tool".
type Role string

const (
    RoleUser      Role = "user"
    RoleAssistant Role = "assistant"
    RoleTool      Role = "tool"
)

// Tool is a callable the model is allowed to invoke.
type Tool struct {
    Name        string          // unique within a request, e.g. "get_logs"
    Description string          // shown to the model
    InputSchema json.RawMessage // JSON Schema describing the parameters
}

// ToolCall is the model asking Kortex to run a tool.
type ToolCall struct {
    ID    string          // provider-assigned correlation ID
    Name  string          // tool name (must match a Tool from the request)
    Input json.RawMessage // arguments, validated against InputSchema
}

// ToolResult is the answer Kortex gives the model after running a tool.
type ToolResult struct {
    CallID  string          // matches ToolCall.ID
    Content json.RawMessage // structured result (model sees it as JSON)
    IsError bool            // signals an error to the model so it can recover
}

// CompleteResponse is the synchronous result of Complete.
type CompleteResponse struct {
    Content string // assistant text
    // ToolCalls lists the tools the model wants to call next. If non-empty,
    // the caller must run them and call Complete again.
    ToolCalls    []ToolCall
    StopReason   StopReason // "end_turn", "tool_use", "max_tokens", "stop_sequence", "error"
    InputTokens  int        // billed input tokens
    OutputTokens int        // billed output tokens
    Model        string     // which model actually answered
    Latency      time.Duration
}

// StopReason describes why the model stopped generating.
type StopReason string

const (
    StopEndTurn      StopReason = "end_turn"
    StopToolUse      StopReason = "tool_use"
    StopMaxTokens    StopReason = "max_tokens"
    StopStopSequence StopReason = "stop_sequence"
    StopError        StopReason = "error"
)

// StreamChunk is one element of a streaming response.
type StreamChunk struct {
    Type     ChunkType
    Text     string     // populated when Type == ChunkText
    ToolCall *ToolCall  // populated when Type == ChunkToolCall
    Usage    *Usage     // populated when Type == ChunkDone
    Err      error      // populated when Type == ChunkError
}

// ChunkType discriminates StreamChunk variants.
type ChunkType string

const (
    ChunkText     ChunkType = "text"
    ChunkToolCall ChunkType = "tool_call"
    ChunkDone     ChunkType = "done"
    ChunkError    ChunkType = "error"
)

// Usage reports billed tokens for one completion.
type Usage struct {
    InputTokens  int
    OutputTokens int
    Model        string
}

// Meta carries observability tags forwarded to logs and metrics.
type Meta struct {
    SessionID string // chat_sessions.id, if any
    UserID    string // operator user ID
    Purpose   string // "chat", "triage", "summarize", "explain"
}

3.3 The tool-call loop

A single user message may produce several round-trips between Kortex and the provider. The pattern is identical for all providers and runs in the chat controller (internal/chat/controller.go):

Operator -> ChatAPI: POST /chat/v1/sessions/{id}/messages
ChatAPI -> ChatController: AppendUserMessage
ChatController -> Provider: Stream(messages, tools)
Provider -> ChatController: chunk(text)...
Provider -> ChatController: chunk(tool_call: get_logs)
ChatController -> ToolRouter: Run(get_logs, args)
ToolRouter -> ChatController: result(json)
ChatController -> Provider: Stream(messages + tool_result, tools)
Provider -> ChatController: chunk(text)...
Provider -> ChatController: chunk(done)
ChatController -> ChatAPI: SSE done
ChatAPI -> Operator: stream closed

The loop terminates when the provider returns StopEndTurn or when the loop reaches a configurable maximum tool-call depth (default 8). Exceeding the depth emits an LLMToolDepthExceeded finding and posts a graceful failure message into the chat; Kortex never silently truncates a model conclusion.
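
The loop and its depth limit can be sketched without any network code. The following is an illustration under stated assumptions, not the controller itself: the types are simplified, the provider and tool router are plain functions, ErrToolDepthExceeded is a hypothetical sentinel (the RFC names only the finding), and any stop reason other than tool_use is treated as terminal.

```go
package main

import (
    "errors"
    "fmt"
)

type StopReason string

const (
    StopEndTurn StopReason = "end_turn"
    StopToolUse StopReason = "tool_use"
)

type ToolCall struct{ ID, Name string }

type Message struct{ Role, Content string }

type Response struct {
    Content    string
    ToolCalls  []ToolCall
    StopReason StopReason
}

// ErrToolDepthExceeded is a hypothetical sentinel for this sketch.
var ErrToolDepthExceeded = errors.New("llm tool depth exceeded")

// runLoop alternates between the provider and the tool router until the
// provider stops asking for tools or maxDepth tool rounds have been used.
func runLoop(
    msgs []Message,
    complete func([]Message) Response,
    runTool func(ToolCall) string,
    maxDepth int,
) (string, error) {
    for depth := 0; ; depth++ {
        resp := complete(msgs)
        if resp.StopReason != StopToolUse {
            return resp.Content, nil
        }
        if depth >= maxDepth {
            return "", ErrToolDepthExceeded
        }
        // Record the assistant turn, then one tool-role message per call.
        msgs = append(msgs, Message{Role: "assistant", Content: resp.Content})
        for _, tc := range resp.ToolCalls {
            msgs = append(msgs, Message{Role: "tool", Content: runTool(tc)})
        }
    }
}

func main() {
    calls := 0
    provider := func(msgs []Message) Response {
        calls++
        if calls == 1 {
            return Response{StopReason: StopToolUse, ToolCalls: []ToolCall{{ID: "t1", Name: "get_logs"}}}
        }
        return Response{StopReason: StopEndTurn, Content: fmt.Sprintf("answer after %d messages", len(msgs))}
    }
    tool := func(ToolCall) string { return `{"lines":[]}` }
    out, err := runLoop([]Message{{Role: "user", Content: "why is p99 up?"}}, provider, tool, 8)
    fmt.Println(out, err)
}
```

Returning an error rather than a partial answer keeps the caller responsible for emitting the finding and the failure message.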

3.4 Registration and discovery

Each provider package registers itself via an init() function:

// internal/llm/anthropic/anthropic.go
package anthropic

import "kortex/internal/llm"

func init() {
    llm.Register("anthropic", func(cfg llm.ProviderConfig) (llm.LLMProvider, error) {
        return New(cfg)
    })
}

internal/llm/registry.go exposes:

func Register(name string, factory ProviderFactory)
func Get(name string) (LLMProvider, bool)
func List() []string

The Kortex bootstrapper (cmd/kortex/main.go) imports the provider packages the operator wants:

import (
    _ "kortex/internal/llm/anthropic"
    _ "kortex/internal/llm/openai"
    _ "kortex/internal/llm/ollama"
    _ "kortex/internal/llm/koderai" // optional
)

Disabling a provider is removing one import line. Adding a new one is one Go file plus one import. This matches the plugin model defined in RFC 002 §9.3.
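
A registry of this shape is small. The sketch below is one plausible way to back Register and List, with stand-in types. Two things are assumptions rather than RFC text: panicking on a duplicate name (borrowed from how database/sql.Register behaves) and the Build helper, a hypothetical name for the step that turns a factory plus config into a provider instance.

```go
package main

import (
    "fmt"
    "sort"
    "sync"
)

// Minimal stand-ins for the real types in internal/llm.
type ProviderConfig struct{ Model string }

type LLMProvider interface{ Name() string }

type ProviderFactory func(cfg ProviderConfig) (LLMProvider, error)

var (
    mu        sync.RWMutex
    factories = map[string]ProviderFactory{}
)

// Register records a factory under name. Panicking on duplicates is an
// assumption of this sketch; the RFC does not specify the behavior.
func Register(name string, factory ProviderFactory) {
    mu.Lock()
    defer mu.Unlock()
    if _, dup := factories[name]; dup {
        panic("llm: Register called twice for provider " + name)
    }
    factories[name] = factory
}

// List returns the registered provider names in sorted order.
func List() []string {
    mu.RLock()
    defer mu.RUnlock()
    names := make([]string, 0, len(factories))
    for n := range factories {
        names = append(names, n)
    }
    sort.Strings(names)
    return names
}

// Build instantiates a registered provider (hypothetical helper).
func Build(name string, cfg ProviderConfig) (LLMProvider, error) {
    mu.RLock()
    f, ok := factories[name]
    mu.RUnlock()
    if !ok {
        return nil, fmt.Errorf("llm: unknown provider %q", name)
    }
    return f(cfg)
}

type fake struct{ name string }

func (f fake) Name() string { return f.name }

func main() {
    Register("ollama", func(ProviderConfig) (LLMProvider, error) { return fake{name: "ollama"}, nil })
    p, err := Build("ollama", ProviderConfig{Model: "llama3:70b"})
    fmt.Println(List(), p.Name(), err)
}
```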

4. Provider implementations

4.1 Anthropic — the default

Why default: Anthropic Claude has the most mature native tool-use protocol of any current vendor (typed tool_use and tool_result content blocks), the longest usable context window (1M tokens on Opus 4.6), and the Koder toolchain already includes the claude-api skill. It is also the model the Koder team uses for everything else, which means existing operator habits transfer.

Models exposed:

Model ID	Display name	Context	Tier
`claude-opus-4-6`	Claude Opus 4.6 (1M context)	1,000,000	`deep`
`claude-sonnet-4-6`	Claude Sonnet 4.6	200,000	`balanced`
`claude-haiku-4-6`	Claude Haiku 4.6	200,000	`fast`

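Wire format: HTTPS to https://api.anthropic.com/v1/messages. Authentication uses the x-api-key header (an API key, not a Bearer token), and each request also carries the required anthropic-version header. Streaming via Server-Sent Events on the same endpoint when stream: true is set in the request body.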

Implementation choice: direct HTTP (no third-party SDK). The Anthropic wire format is small enough that vendoring anthropic-sdk-go would pull in a large set of indirect dependencies to save a few hundred lines of marshalling. The claude-api skill in the operator environment is an aid for /humans writing prompts/, not a runtime dependency.

Tool-use translation: Kortex Tool → Anthropic tools[].input_schema (JSON Schema), one-to-one. Kortex ToolCall → Anthropic tool_use content block. Kortex ToolResult → Anthropic tool_result content block keyed by tool_use_id. The mapping is straightforward.

Streaming chunk mapping:

Anthropic event	Kortex chunk
`content_block_delta` (text)	`ChunkText`
`content_block_start` (tool_use)	`ChunkToolCall`
`content_block_delta` (tool_use input json)	accumulated into the open `ChunkToolCall`
`message_delta` with `usage`	folded into the final `ChunkDone`
`message_stop`	`ChunkDone`
`error` event	`ChunkError`

4.2 OpenAI

Why supported: OpenAI has the second-largest installed base of LLM users, GPT-5 is competitive on reasoning, and several Koder customers already have OpenAI contracts. The wire format is well-known and the tool-use schema (the "function calling" / "tools" API) maps cleanly onto Kortex's Tool type.

Models exposed:

Model ID	Display name	Context	Tier
`gpt-5`	GPT-5	400,000	`deep`
`gpt-4o`	GPT-4o	128,000	`balanced`
`gpt-4o-mini`	GPT-4o mini	128,000	`fast`

Wire format: HTTPS to https://api.openai.com/v1/chat/completions. Bearer auth via Authorization: Bearer ${OPENAI_API_KEY}. Streaming via SSE on the same endpoint when stream: true.

Tool-use translation: Kortex Tool → OpenAI tools[].function. The input schema is the same JSON Schema dialect Anthropic uses, so no transformation is needed beyond wrapping. Kortex ToolCall → OpenAI tool_calls[]. Kortex ToolResult → OpenAI message with role tool and tool_call_id.

4.3 Ollama — the local-only path

Why supported: operators in regulated environments need a /provider that never sends data outside the local network/. Ollama runs models on the operator's own host (or a VM the operator controls), exposes an HTTP API that mostly mirrors the OpenAI shape, and supports the modern open models (Llama 3 70B, Mistral Large, Qwen 2.5, etc.).

Models exposed: whatever the operator has pulled. Kortex queries GET /api/tags when the Ollama provider starts and populates Models() from the response. The operator picks one in kortex.toml.

Wire format: HTTPS or HTTP (operator's choice; HTTPS strongly recommended even on localhost) to http(s)://${OLLAMA_HOST}/api/chat. Bearer auth is optional (Ollama supports it via reverse proxy).

Tool use: Ollama added native tool support in mid-2024. For models that do not implement it (smaller community models), Kortex falls back to prompt-injected pseudo-tools: a system prompt instructs the model to emit <<TOOL name="..." args='{...}'>> markers, which Kortex parses and runs. This fallback is best-effort and surfaces a warning in the chat.
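
Parsing the pseudo-tool markers could look like the sketch below. It assumes the marker grammar is exactly the one shown above and that args never contains a single quote; the pseudo-N call IDs and the parsePseudoTools name are invented for the example. Markers whose args are not valid JSON are skipped.

```go
package main

import (
    "encoding/json"
    "fmt"
    "regexp"
)

type ToolCall struct {
    ID    string
    Name  string
    Input json.RawMessage
}

// markerRE matches <<TOOL name="..." args='...'>>. It assumes the args
// payload contains no single quote, which a real parser would need to handle.
var markerRE = regexp.MustCompile(`<<TOOL name="([^"]+)" args='([^']*)'>>`)

// parsePseudoTools extracts tool calls from free-form model text.
func parsePseudoTools(text string) []ToolCall {
    var calls []ToolCall
    for i, m := range markerRE.FindAllStringSubmatch(text, -1) {
        if !json.Valid([]byte(m[2])) {
            continue // best-effort: drop markers with malformed arguments
        }
        calls = append(calls, ToolCall{
            ID:    fmt.Sprintf("pseudo-%d", i),
            Name:  m[1],
            Input: json.RawMessage(m[2]),
        })
    }
    return calls
}

func main() {
    text := `Let me check. <<TOOL name="get_logs" args='{"product":"saude-publica","time_range":"last 30m"}'>>`
    for _, c := range parsePseudoTools(text) {
        fmt.Println(c.ID, c.Name, string(c.Input))
    }
}
```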

Cost: zero (Cost{0, 0}). The local provider is free at the marginal token level — the operator pays for hardware, not API calls.

4.4 koder-ai — the future internal provider

Status: placeholder. There is no koder-ai/ directory in the monorepo today (RFC 001 audit confirmed). When the Koder team ships an internal model — fine-tuned on Koder logs, trained on Koder runbooks, hosted on the Koder fleet — it will register itself as koder-ai and Kortex will treat it like any other provider.

Expected differences vs. third-party providers:

Lower latency (same datacenter as Kortex deployments).
No external billing.
Tool-use schema designed for the Kortex tools, not retrofitted.
Privacy: data never leaves the Koder network.

This RFC reserves the name and the import slot. Implementation lives in a follow-up RFC when koder-ai is real.

4.5 Comparison table

Provider	Default model	Max ctx	Native tools	Native streaming	Latency	Cost (USD/Mtok in/out)	Deployment
`anthropic`	`claude-opus-4-6`	1,000,000	yes	yes	medium	15.00 / 75.00	cloud
`openai`	`gpt-5`	400,000	yes	yes	medium	12.00 / 60.00	cloud
`ollama`	(operator pick)	model-dependent	yes (3.x+)	yes	high	0 / 0	local
`koder-ai`	TBD	TBD	TBD	TBD	low	0 / 0	internal

The numbers in the cost column are 2026-Q1 list prices and exist for the budget calculator (§10), not as guarantees. Real billing is read from the provider's response headers when available.
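
For the budget calculator, a per-call estimate from the Cost struct is a single expression. The helper name EstimateUSD is an assumption; the prices in the usage example are the anthropic row of the table above.

```go
package main

import "fmt"

// Cost is the per-million-token price from §3.1.
type Cost struct {
    InputUSDPerMTok  float64
    OutputUSDPerMTok float64
}

// EstimateUSD converts billed token counts into an estimated dollar amount.
func EstimateUSD(c Cost, inputTokens, outputTokens int) float64 {
    return float64(inputTokens)/1e6*c.InputUSDPerMTok +
        float64(outputTokens)/1e6*c.OutputUSDPerMTok
}

func main() {
    anthropic := Cost{InputUSDPerMTok: 15.00, OutputUSDPerMTok: 75.00}
    // 20,000 input tokens and 1,000 output tokens:
    // 0.02 * 15.00 + 0.001 * 75.00 = 0.30 + 0.075
    fmt.Printf("%.4f\n", EstimateUSD(anthropic, 20000, 1000)) // prints 0.3750
}
```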

5. Configuration via `kortex.toml`

The LLM surface is configured under the top-level [llm] table. The operator does not edit code to switch providers — they edit one line.

5.1 Full example

# kortex.toml — LLM section

[llm]
default_provider  = "anthropic"
fallback_providers = ["openai", "ollama"]
mode              = "normal"            # "normal" | "frugal" | "local-only"
max_tool_depth    = 8
session_idle_ttl  = "168h"              # 7 days

[llm.providers.anthropic]
api_key     = "${ANTHROPIC_API_KEY}"
model       = "claude-opus-4-6"
fast_model  = "claude-haiku-4-6"        # used in frugal mode
max_tokens  = 4096
timeout     = "60s"
base_url    = "https://api.anthropic.com"

[llm.providers.openai]
api_key     = "${OPENAI_API_KEY}"
model       = "gpt-5"
fast_model  = "gpt-4o-mini"
max_tokens  = 4096
timeout     = "60s"
base_url    = "https://api.openai.com"

[llm.providers.ollama]
endpoint    = "https://ollama.internal.example:11434"
model       = "llama3:70b"
fast_model  = "llama3:8b"
timeout     = "120s"

[llm.budget]
daily_usd_limit       = 50.00
per_user_hourly_limit = 30
per_session_token_cap = 200000
alert_at_pct          = 80              # email admin at 80% of daily limit

[llm.privacy]
redact_pii          = true              # apply PII redactors before send
redact_secrets      = true              # apply secret-pattern redactors
local_only          = false             # if true, force ollama regardless
audit_every_call    = true
do_not_retain_default = false

5.2 Mode semantics

Mode	Behavior
`normal`	Use `default_provider` with the configured `model`. Failover to `fallback_providers` on hard failure.
`frugal`	First call uses each provider's `fast_model`. If the answer hits `StopMaxTokens` or the model emits a structured "need more context" marker, retry with the full `model`.
`local-only`	Only `ollama` is consulted. All other providers are unreachable from the chat controller. Used by regulated operators. Setting `[llm.privacy].local_only = true` is equivalent.
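
The mode table reduces to two decisions: which providers may be consulted, and which model a given call uses. The sketch below captures that under simplifying assumptions: ProviderCfg, candidates, and modelFor are hypothetical names, and the frugal retry is represented by an escalated flag that the caller sets after a StopMaxTokens or "need more context" result.

```go
package main

import "fmt"

// ProviderCfg is a trimmed view of one [llm.providers.*] table.
type ProviderCfg struct {
    Name      string
    Model     string
    FastModel string
}

// candidates returns the providers to consult, in order, for a mode.
// In local-only mode everything except ollama is filtered out.
func candidates(mode string, def ProviderCfg, fallbacks []ProviderCfg) []ProviderCfg {
    all := append([]ProviderCfg{def}, fallbacks...)
    if mode != "local-only" {
        return all
    }
    var local []ProviderCfg
    for _, p := range all {
        if p.Name == "ollama" {
            local = append(local, p)
        }
    }
    return local
}

// modelFor picks the model for one call. escalated is true on the frugal
// retry, when the fast model's answer was not good enough.
func modelFor(mode string, p ProviderCfg, escalated bool) string {
    if mode == "frugal" && !escalated && p.FastModel != "" {
        return p.FastModel
    }
    return p.Model
}

func main() {
    anthropic := ProviderCfg{Name: "anthropic", Model: "claude-opus-4-6", FastModel: "claude-haiku-4-6"}
    ollama := ProviderCfg{Name: "ollama", Model: "llama3:70b", FastModel: "llama3:8b"}
    for _, mode := range []string{"normal", "frugal", "local-only"} {
        for _, p := range candidates(mode, anthropic, []ProviderCfg{ollama}) {
            fmt.Println(mode, p.Name, modelFor(mode, p, false))
        }
    }
}
```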

5.3 Secret resolution

Values like ${ANTHROPIC_API_KEY} are resolved at startup from the environment, then from the secrets backend (RFC 002 §12.4). A missing secret marks the provider as unavailable, not broken: Kortex starts normally, the provider is skipped in failover, and the admin API surfaces the failure under GET /api/v1/llm/providers.
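
The two-stage lookup (environment first, secrets backend second) can be sketched as a chain of lookup functions. Everything here is illustrative: resolveSecret is a hypothetical helper, the placeholder grammar is assumed to be a whole-value ${NAME}, and the secrets backend is represented by a stub function.

```go
package main

import (
    "fmt"
    "os"
    "regexp"
)

// placeholderRE matches a value that is exactly one ${NAME} placeholder.
var placeholderRE = regexp.MustCompile(`^\$\{([A-Z0-9_]+)\}$`)

// resolveSecret resolves a ${NAME} placeholder by trying each lookup in
// order. Literal values pass through unchanged. The second return value is
// false when no lookup produced a non-empty value, which the caller would
// treat as "provider unavailable" rather than as a fatal error.
func resolveSecret(value string, lookups ...func(string) (string, bool)) (string, bool) {
    m := placeholderRE.FindStringSubmatch(value)
    if m == nil {
        return value, true // literal value, nothing to resolve
    }
    for _, lookup := range lookups {
        if v, ok := lookup(m[1]); ok && v != "" {
            return v, true
        }
    }
    return "", false
}

func main() {
    // Stand-in for the secrets backend of RFC 002 §12.4.
    backend := func(name string) (string, bool) { return "", false }

    key, ok := resolveSecret("${ANTHROPIC_API_KEY}", os.LookupEnv, backend)
    fmt.Println("anthropic available:", ok, "key length:", len(key))
}
```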

6. Tool use schema

Tools are the heart of this RFC. Without them the LLM is just an expensive autocomplete; with them it is an investigator.

6.1 Design rules

Read-only by default. Eight tools ship in Phase 2; /seven of them are pure reads/. The eighth, propose_action, writes only into the chat itself, never into a managed product.
Typed inputs. Every tool has a JSON Schema for its arguments. Inputs are validated before the tool runs, and a validation failure is returned to the model as a ToolResult{IsError: true} so it can correct itself.
Bounded outputs. Every tool caps its result size (default 64 KiB of serialized JSON). Truncation is signaled to the model in a structured field, never silently.
Bounded latency. Every tool has a configurable timeout. As resolved in §15.3, each tool category has its own default timeout, with a hard ceiling above which the call is cancelled via context:

Tool category	Default timeout	Max timeout (hard)
`get_logs`, `get_metrics`, `get_traces`	30s	5min
`compare_to_baseline`, `get_recent_deploys`	10s	60s
`get_incident_history`, `get_admin_info`	5s	30s
`propose_action`	5s	30s

Tool calls exceeding 30s of wall time cause the chat controller to emit an interim message to the user (/"this query is taking longer than usual..."/) so the operator knows the model is still working. Hitting the max returns a structured error to the model:

{ "error": "tool_timeout", "elapsed": "5m1s", "tool": "get_logs" }

Defaults are overridable per-deployment via [llm.tools.<tool_name>] timeout = "..." in kortex.toml.

Pagination is mandatory for large queries. As resolved in §15.3, get_logs and get_traces cap each call at 100 items (down from the 200 mentioned in early drafts). When the underlying query has more matches, the response includes a next_cursor field; the model must request the next page explicitly. This prevents the model from accidentally pulling 24 hours of logs into a single tool result and blowing the context window.
Audited. Every tool call writes one row to decisions and emits one koder.admin.action event (RFC 003).

6.2 The eight Phase-2 tools

6.2.1 `get_logs`

Purpose: fetch log lines from observe/log in a time window.

{
  "name": "get_logs",
  "description": "Query observe/log for log entries from a product within a time range. Returns up to 100 lines, newest first. Use `cursor` to fetch the next page when more matches exist.",
  "input_schema": {
    "type": "object",
    "properties": {
      "product":    { "type": "string", "description": "Product name as registered in observe/log, e.g. 'saude-publica'" },
      "time_range": { "type": "string", "description": "ISO 8601 interval, e.g. '2026-04-07T14:00:00Z/PT30M' or relative 'last 30m'" },
      "query":      { "type": "string", "description": "LogQL-compatible filter, optional" },
      "level":      { "type": "string", "enum": ["debug","info","warn","error","fatal"], "description": "Optional minimum severity" },
      "limit":      { "type": "integer", "minimum": 1, "maximum": 100, "default": 100 },
      "cursor":     { "type": "string", "description": "Opaque cursor returned by a prior call. Omit on the first page." }
    },
    "required": ["product", "time_range"]
  }
}

Result schema:

{
  "lines": [
    { "ts": "2026-04-07T14:31:02.341Z", "level": "error", "host": "h1", "message": "..." }
  ],
  "truncated":     false,
  "total_matches": 142,
  "next_cursor":   "eyJvZmZzZXQiOjEwMH0",
  "query_latency_ms": 87
}

next_cursor is present when more matches exist beyond the page returned. The model passes it back in the next call's cursor argument to fetch the following 100 lines. The cursor is opaque to the model — the chat controller validates and decodes it.

Implementation: internal/llm/tools/get_logs.go → GET /api/v1/query on observe/log via the Senses pull adapter (RFC 002 §3.1.2). Read-only. The 100-line cap is enforced at the adapter, not only by the input schema, so the model cannot bypass it by lying about limit.
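
One simple way to build such a cursor is unpadded base64url over a small JSON struct. The sample next_cursor in the result schema above happens to decode that way, but the real encoding is not specified here, so treat the sketch below as an assumption. A production cursor would likely also be signed or bound to the query so the model cannot forge offsets.

```go
package main

import (
    "encoding/base64"
    "encoding/json"
    "fmt"
)

// cursor is the hypothetical payload carried inside the opaque string.
type cursor struct {
    Offset int `json:"offset"`
}

// encodeCursor serializes an offset into an opaque page token.
func encodeCursor(offset int) string {
    b, _ := json.Marshal(cursor{Offset: offset}) // cannot fail for this struct
    return base64.RawURLEncoding.EncodeToString(b)
}

// decodeCursor validates a token and returns the offset it carries.
// An empty token means "first page".
func decodeCursor(s string) (int, error) {
    if s == "" {
        return 0, nil
    }
    b, err := base64.RawURLEncoding.DecodeString(s)
    if err != nil {
        return 0, fmt.Errorf("invalid cursor: %w", err)
    }
    var c cursor
    if err := json.Unmarshal(b, &c); err != nil {
        return 0, fmt.Errorf("invalid cursor: %w", err)
    }
    if c.Offset < 0 {
        return 0, fmt.Errorf("invalid cursor: negative offset")
    }
    return c.Offset, nil
}

func main() {
    tok := encodeCursor(100)
    off, err := decodeCursor(tok)
    fmt.Println(tok, off, err)
}
```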

6.2.2 `get_metrics`

Purpose: fetch metric series from observe/mon.

{
  "name": "get_metrics",
  "description": "Query observe/mon for a named metric within a time range. Returns the time series and basic statistics.",
  "input_schema": {
    "type": "object",
    "properties": {
      "product":     { "type": "string" },
      "metric_name": { "type": "string", "description": "e.g. 'http_request_duration_seconds_p99'" },
      "time_range":  { "type": "string" },
      "labels":      { "type": "object", "description": "label filters" },
      "step":        { "type": "string", "description": "Resolution, e.g. '1m', '5m', '1h'" }
    },
    "required": ["product", "metric_name", "time_range"]
  }
}

Result schema:

{
  "samples":   [ { "ts": "...", "value": 0.412 } ],
  "stats":     { "min": 0.10, "max": 0.85, "mean": 0.34, "p50": 0.31, "p95": 0.66, "p99": 0.79 },
  "unit":      "seconds",
  "truncated": false
}

Implementation: read-only PromQL query against observe/mon.

6.2.3 `get_traces`

Purpose: fetch a distributed trace from observe/apm (formerly observe/trace-apm) by request ID.

{
  "name": "get_traces",
  "description": "Look up a distributed trace by request_id from observe/apm. Returns the span tree with timings and tags.",
  "input_schema": {
    "type": "object",
    "properties": {
      "request_id":   { "type": "string", "description": "Trace ID or correlation ID" },
      "include_logs": { "type": "boolean", "default": false, "description": "Include log lines correlated to spans" }
    },
    "required": ["request_id"]
  }
}

Result schema:

{
  "trace_id": "...",
  "root":     { "service": "...", "operation": "...", "start_us": 0, "duration_us": 412000, "status": "ok",
                "children": [ { "service": "...", "duration_us": 88000, "tags": { ... }, "children": [] } ] },
  "errors":   [ { "span_id": "...", "message": "..." } ],
  "logs":     [ ]
}

Implementation: GET /api/v1/traces/{trace_id} on observe/apm.

6.2.4 `compare_to_baseline`

Purpose: statistical comparison against the rolling baseline maintained by Brain (RFC 002 §3.2.1).

{
  "name": "compare_to_baseline",
  "description": "Compare a metric over a time range to the 24-hour rolling baseline maintained by Kortex Brain. Returns Z-scores and anomaly flags.",
  "input_schema": {
    "type": "object",
    "properties": {
      "product":     { "type": "string" },
      "metric_name": { "type": "string" },
      "time_range":  { "type": "string" }
    },
    "required": ["product", "metric_name", "time_range"]
  }
}

Result schema:

{
  "baseline_window": "PT24H",
  "baseline_mean":   0.32,
  "baseline_stddev": 0.04,
  "samples": [
    { "ts": "...", "value": 0.81, "z_score": 12.25, "anomalous": true }
  ],
  "anomaly_count": 7,
  "max_z_score":   12.25
}

Implementation: delegates to the Brain analyzer; no upstream call.

6.2.5 `get_recent_deploys`

Purpose: list recent deployment events for a product (RFC 003 koder.deploy.* events).

{
  "name": "get_recent_deploys",
  "description": "List deploy events for a product within a time range. Useful for change-correlation analysis.",
  "input_schema": {
    "type": "object",
    "properties": {
      "product":    { "type": "string" },
      "time_range": { "type": "string" }
    },
    "required": ["product", "time_range"]
  }
}

Result schema:

{
  "deploys": [
    { "ts": "2026-04-07T14:31:00Z", "version": "v1.4.2", "actor": "ci",
      "host": "h1", "duration_s": 12, "status": "ok",
      "commit": "abc123", "rolled_back": false }
  ]
}

Implementation: query the decisions and event tables in Memory.

6.2.6 `get_incident_history`

Purpose: search past incidents in Kortex Memory for similar problems.

{
  "name": "get_incident_history",
  "description": "Search past incidents stored in Kortex Memory by free-text query, optionally filtered by product or time range.",
  "input_schema": {
    "type": "object",
    "properties": {
      "query":       { "type": "string", "description": "Free-text search across title, findings, and decisions" },
      "product":     { "type": "string", "description": "Optional product filter" },
      "time_range":  { "type": "string", "description": "Optional time window" },
      "limit":       { "type": "integer", "minimum": 1, "maximum": 25, "default": 10 }
    },
    "required": ["query"]
  }
}

Result schema:

{
  "incidents": [
    { "id": "...", "title": "saude-publica latency spike",
      "opened_at": "2026-03-18T...", "closed_at": "2026-03-18T...",
      "severity": "warn", "resolution_summary": "..." }
  ]
}

Implementation: SQL search against the incidents and findings tables in Postgres. Phase 5 may upgrade this to a koder-kdb-backed embedding search.

6.2.7 `get_admin_info`

Purpose: fetch the standardized /admin/info envelope from a managed product (defined by RFC 004).

{
  "name": "get_admin_info",
  "description": "Fetch the /admin/info envelope from a managed product via the Kortex control plane. Returns version, uptime, health, configuration summary.",
  "input_schema": {
    "type": "object",
    "properties": {
      "product": { "type": "string", "description": "Product name as registered in the control plane" }
    },
    "required": ["product"]
  }
}

Result schema: whatever shape RFC 004 standardizes for the info envelope; typically:

{
  "product":     "saude-publica",
  "version":     "v1.4.2",
  "uptime_s":    8741,
  "health":      "ok",
  "build":       { "commit": "abc123", "built_at": "..." },
  "config_hash": "...",
  "feature_flags": { ... }
}

Implementation: HTTP GET through the control plane (RFC 004) with the operator's audit identity propagated via headers.

6.2.8 `propose_action`

Purpose: the model proposes a remediation. Nothing executes.

{
  "name": "propose_action",
  "description": "Propose a remediation action for the operator to review. The operator must click Confirm in the chat UI before anything is executed. This tool NEVER takes action by itself.",
  "input_schema": {
    "type": "object",
    "properties": {
      "action":      { "type": "string", "description": "Short verb form, e.g. 'restart', 'reload-tls', 'rollback-deploy'" },
      "product":     { "type": "string" },
      "params":      { "type": "object", "description": "Action-specific parameters" },
      "rationale":   { "type": "string", "description": "Why the model believes this action is appropriate" },
      "blast_radius":{ "type": "string", "description": "Estimated scope, e.g. 'one host', 'one service', 'whole cluster'" },
      "reversible":  { "type": "boolean", "description": "Can the action be undone?" }
    },
    "required": ["action", "product", "rationale"]
  }
}

Result schema:

{
  "proposal_id":  "...",
  "posted_to_chat": true,
  "awaits_confirmation": true
}

Implementation: writes a chat_messages row with role='assistant' and a structured tool_calls payload that the Flutter Web dashboard renders as a card with two buttons: Confirm and Reject.

If the operator clicks Confirm, the chat controller routes the proposal through the Reflexes dispatch queue (RFC 002 §3.3) as if it had been a one-shot manual rule firing, with require_approval: true already satisfied. All Reflex guards still apply.
If the operator clicks Reject, the proposal is annotated with the rejection and closed.
If the operator does nothing for 10 minutes, the proposal expires.

propose_action is the only write tool

There is no restart_service, no roll_back_deploy, no reload_config, no delete_anything. If a future RFC adds another write-side tool, it must justify why propose_action is insufficient and explain how the new tool preserves the human-in-the-loop property.

6.3 Tool router

internal/llm/tools/router.go exposes:

type ToolRouter struct {
    tools map[string]Tool
}

func (r *ToolRouter) Run(ctx context.Context, call llm.ToolCall, meta llm.Meta) llm.ToolResult

The router validates the input against the JSON Schema, enforces the per-tool timeout, runs the tool, caps the output size, records a decisions row, emits an audit event, and returns a ToolResult. Tool errors become ToolResult{IsError: true} so the model can recover or apologize, never a Go panic that aborts the chat.

6.4 Per-tool error semantics

Failure	Returned to model as	Operator-visible?
Schema validation failed	`ToolResult{IsError: true, Content: "validation: ..."}`	dashboard shows a yellow chip on the tool call
Upstream timeout (interim, < max)	`ToolResult{IsError: false, Content: "..."}` after the wall-time interim message	yellow chip
Upstream timeout (max exceeded)	`ToolResult{IsError: true, Content: "tool_timeout: 5m1s"}`	yellow chip
Upstream returned 5xx	`ToolResult{IsError: true, Content: "upstream: 503"}`	red chip
Upstream returned 4xx	`ToolResult{IsError: true, Content: "upstream: 401 unauthorized"}`	red chip
Result truncated (cap reached)	`ToolResult{IsError: false, Content: {... "truncated": true, "next_cursor": "..."}}`	gray chip
Tool panicked (bug)	`ToolResult{IsError: true, Content: "internal error"}`	red chip + paged
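
One way to keep the dashboard and the router in agreement is to encode the table as data. The sketch below does that; the Failure enum, the Presentation struct, and classifyStatus are hypothetical names, and only the values are taken from the table above.

```go
package main

import "fmt"

// Failure enumerates the rows of the table above.
type Failure int

const (
	SchemaInvalid Failure = iota
	TimeoutInterim
	TimeoutMax
	Upstream5xx
	Upstream4xx
	Truncated
	Panicked
)

// Presentation describes how one failure is reported.
type Presentation struct {
	IsError bool   // value of ToolResult.IsError returned to the model
	Chip    string // chip colour shown on the tool call
	Page    bool   // whether someone is paged
}

// presentations is a direct transcription of the table.
var presentations = map[Failure]Presentation{
	SchemaInvalid:  {IsError: true, Chip: "yellow"},
	TimeoutInterim: {IsError: false, Chip: "yellow"},
	TimeoutMax:     {IsError: true, Chip: "yellow"},
	Upstream5xx:    {IsError: true, Chip: "red"},
	Upstream4xx:    {IsError: true, Chip: "red"},
	Truncated:      {IsError: false, Chip: "gray"},
	Panicked:       {IsError: true, Chip: "red", Page: true},
}

// classifyStatus maps an upstream HTTP status to a failure row.
// The second return value is false for statuses the table does not cover.
func classifyStatus(status int) (Failure, bool) {
	switch {
	case status >= 500 && status <= 599:
		return Upstream5xx, true
	case status >= 400 && status <= 499:
		return Upstream4xx, true
	}
	return 0, false
}

func main() {
	if f, ok := classifyStatus(503); ok {
		fmt.Printf("%+v\n", presentations[f])
	}
}
```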

6.5 Operator-supplied plugin tools

As resolved in §15.4, operators can extend the tool catalog with plugin tools that bridge Kortex to internal systems (Jira, ServiceNow, internal runbooks, custom dashboards) without modifying Kortex source. Plugins are declared as YAML manifests under /etc/kortex/plugins/:

# /etc/kortex/plugins/jira-search.yaml
name: jira_search
description: |
  Search Jira tickets matching a JQL query. Returns up to 20 hits with
  key, summary, status, and assignee.
parameters:
  query:
    type: string
    description: "JQL expression, e.g. 'project = OPS AND status = Open'"
  limit:
    type: integer
    minimum: 1
    maximum: 20
    default: 10
endpoint:
  method: POST
  url: http://internal-bridge.koder.local:8080/jira/search
auth:
  type: bearer
  token: "${JIRA_BRIDGE_TOKEN}"
allowed_sessions:
  - "production"
  - "incident-*"
timeout: 10s

The chat controller loads plugin manifests only at startup or on an operator-triggered SIGHUP reload (the same path as RFC 004 §3.5). There is no runtime API to add a plugin, which prevents prompt-injection attacks in which the model "asks" to install a new tool. To add or change a plugin, an operator edits the file and then restarts or reloads the controller.

Trust model:

Plugins run in-process as HTTP clients with Kortex's network identity. They should be pointed at internal-only bridges, never at third-party APIs directly.
The allowed_sessions field is a list of glob patterns over chat session names. A plugin is only invocable from a session whose name matches one of the patterns. This lets operators restrict sensitive plugins (e.g. "call our internal payments admin API") to a specific named session.
Every plugin invocation is audited like any other tool call (one decisions row, one koder.admin.action event).
Plugins inherit the same per-tool timeout system from §6.1: the manifest's timeout field sets the default; the hard ceiling is 5 minutes regardless.
Plugins are advisory only: they cannot fire propose_action-style state changes. The §3 hard separation still holds: only the rules engine touches the control plane.

A plugin manifest that fails validation at startup (bad YAML, missing required fields, unreachable endpoint, conflicting name with a built-in tool) is logged at ERROR and the plugin is not loaded. The chat controller starts normally without the plugin, and operators see the failure in the dashboard. Plugins never silently fail to load.
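
The load-or-report behavior can be sketched as below. YAML decoding and the endpoint reachability probe are omitted, and the set of required fields is an assumption; Manifest, validateManifest, and loadPlugins are hypothetical names. The built-in names are the eight tools listed in §6.2.

```go
package main

import (
	"fmt"
	"net/url"
)

// Manifest holds only the fields this sketch checks.
type Manifest struct {
	Name        string
	Description string
	URL         string // endpoint.url in the YAML manifest
}

// builtinTools lists the eight built-in tool names a plugin may not reuse.
var builtinTools = map[string]bool{
	"get_logs": true, "get_metrics": true, "get_traces": true,
	"compare_to_baseline": true, "get_recent_deploys": true,
	"get_incident_history": true, "get_admin_info": true, "propose_action": true,
}

// validateManifest returns the first problem found, or nil.
func validateManifest(m Manifest) error {
	switch {
	case m.Name == "":
		return fmt.Errorf("missing required field: name")
	case builtinTools[m.Name]:
		return fmt.Errorf("name %q conflicts with a built-in tool", m.Name)
	case m.Description == "":
		return fmt.Errorf("missing required field: description")
	case m.URL == "":
		return fmt.Errorf("missing required field: endpoint.url")
	}
	u, err := url.Parse(m.URL)
	if err != nil || (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return fmt.Errorf("endpoint.url %q is not an absolute http(s) URL", m.URL)
	}
	return nil
}

// loadPlugins keeps valid manifests and reports the rest; it never aborts startup.
func loadPlugins(ms []Manifest) (loaded []Manifest, failures []error) {
	for _, m := range ms {
		if err := validateManifest(m); err != nil {
			failures = append(failures, fmt.Errorf("plugin %q not loaded: %w", m.Name, err))
			continue
		}
		loaded = append(loaded, m)
	}
	return loaded, failures
}

func main() {
	loaded, failures := loadPlugins([]Manifest{
		{Name: "jira_search", Description: "Search Jira", URL: "http://internal-bridge.koder.local:8080/jira/search"},
		{Name: "get_logs", Description: "shadow", URL: "http://example.invalid/x"},
	})
	fmt.Println(len(loaded), "loaded")
	for _, f := range failures {
		fmt.Println("ERROR", f) // would also surface in the dashboard
	}
}
```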

7. Prompt engineering by category

The prompts that frame each LLM call are code, not content. They live in `internalllmprompts
