RFC 005 — LLM Provider Abstraction + Chat Tool Use
- Tracking ticket: backlogdone005
- Depends on: RFC 001 — Ecosystem Map (Approved), RFC 002 — Architecture (Approved)
- Companion of: RFC 003 (Common Event Schema, Approved), RFC 004 (Common Control Plane, Approved), RFC 006 (Auto-remediation Rules Engine, Approved)
- Status: Approved (2026-04-08)
1. Summary
This RFC defines how Koder Kortex talks to Large Language Models and how it exposes a conversational interface in which the model can investigate problems by calling read-only tools against the rest of the platform before answering.
There are two intertwined concerns and this RFC answers both:
- Provider abstraction: Kortex must work against multiple LLM backends without committing to any single vendor. The supported providers in Phase 1 are Anthropic Claude (default, Opus 4.6 with 1M context), OpenAI (GPT-4 / GPT-5), Ollama (local models), and koder-ai (the future internal Koder model, when it ships). Adding a new provider is one Go file.
- Chat tool use: when an operator asks Kortex a question, the LLM does not answer from its training data. It calls a small set of *read-only tools* against observe/log, observe/mon, observe/apm, observe/wire, the Kortex Memory, the Kortex Brain analyzers, and the upstream products' /admin/info endpoints. It collects evidence, then writes the answer grounded in that evidence. Destructive actions are never taken by the LLM autonomously: the only "action" tool is propose_action, which posts a proposal into the chat with a Confirm button that only an operator can press.
The hard separation between reflex automation and LLM-driven automation is the central safety property of Kortex and must not be compromised:
The LLM provider abstraction is advisory only. It can read everything, propose anything, but it never executes a destructive action on its own. All autonomous remediation flows through the declarative rules engine defined in RFC 006, where every action is gated by explicit, auditable guards (rate_limit, blast_radius, dry_run, require_approval). Conflating "the LLM said do X" with "Kortex did X" is the failure mode this RFC exists to prevent.
The result is a small, swappable, observable LLM layer that lives in internal/llm/ of the Kortex repository, plugs into Brain (RFC 002 §3.2.4) as one analyzer among many, and powers the /chat/v1/... API surface (RFC 002 §8.4) with real-time SSE streaming to the embedded Flutter Web dashboard.
2. Context and motivation
2.1 Why Kortex needs LLMs at all
RFC 002 §3.2 organizes Brain as a collection of analyzers. The deterministic analyzers (Z-score anomaly detection, stacktrace fingerprinting, temporal correlation) are cheap, predictable, cacheable, and unit-testable. They cover the bulk of detection work.
What they cannot do is explanation and judgment. A Z-score analyzer can tell an operator that p99 latency on saude-publica jumped 4.3 standard deviations above baseline at 14:32. It cannot say:
- /"This looks like the same database connection-pool exhaustion we saw on March 18. Check if pgbouncer is restarting again."/
- /"The latency spike is correlated with the deploy that landed at 14:31. The new commit changed the index strategy on appointments. Roll back the deploy or run REINDEX."/
- /"Stack trace 7af3e1c has been showing up since 14:30, after the koder-jet upgrade. The fix from incident #214 was to bump the open file descriptor limit on the worker."/
These are natural-language judgments grounded in evidence. They are exactly what LLMs are good at, provided the evidence is given to them fresh, structured, and untruncated. Hence: tool use.
2.2 Why a provider abstraction at all
Three reasons:
- Vendor risk. The LLM market is moving fast. Today Anthropic Opus 4.6 has the best tool-use behavior and the longest usable context window (1M tokens). In six months that will probably still be true, but it might be GPT-5 or a fine-tuned koder-ai model. Kortex must not be architecturally coupled to any single API surface.
- Cost and privacy. Some operators will run Kortex against systems whose logs cannot leave the operator's network: finance, health, defense, regulated SaaS. For those operators the only acceptable provider is a local one (Ollama). Kortex must support that without turning into a different product.
- Heterogeneous workloads. Triage questions ("did the deploy break it?") and deep analysis questions ("write a post-mortem from these 240 events") have very different cost/latency/quality trade-offs. A fast cheap model is right for the first, a slow expensive one for the second. The abstraction lets Brain pick the right model per call.
2.3 Why tool use, not retrieval-augmented generation (RAG)
RAG (embed everything, retrieve top-K chunks, inject into prompt) is the standard pattern for static knowledge bases. It is the wrong pattern for Kortex, because:
- Logs and metrics are high-volume, time-windowed, and structured. Embedding them is wasteful and lossy.
- The right answer to "what is happening right now?" requires running a
query against the live data, not retrieving stale embeddings.
- The Brain analyzers already produce structured findings. Wrapping those
findings as tools is more honest than serializing them into a text blob.
Tool use is the architecturally correct pattern: the LLM is a /reasoning client/ over a typed API surface that Kortex already has.
A future RFC may add an embedding-based memory of past incidents (a koder-kdb index over incidents.title + decisions.payload) for semantic similarity search. That would be one more tool — find_similar_incidents — not a replacement for the live tools defined here.
2.4 The five binding constraints from prior RFCs
This RFC is not free to redesign the world. It must respect:
| Constraint | Source | Implication for this RFC |
|---|---|---|
| LLM is one analyzer among many | RFC 002 §3.2.4 | The provider lives in internal/llm/, called from internal/brain/. It is not a top-level subsystem. |
| Chat session state is in Postgres | RFC 002 §7.2, §12.2 | Sessions and messages persisted in chat_sessions and chat_messages. No new database. |
| Kortex never auto-acts via LLM | RFC 002 §3.3, §13.x | Tools are read-only by default. propose_action is the only "write" and it does not write — it posts to chat. |
| Audit trail covers everything | RFC 002 §12.5 | Every LLM call, every tool call, every proposed action is recorded as a decisions row and emitted as a koder.admin.action event. |
| One Go binary, no extra services | RFC 002 §1, §6 | The provider clients are Go packages compiled into the Kortex binary. No sidecar. |
3. Provider interface
The LLM surface is defined by a single Go interface in internal/llm/provider.go. Every provider implementation satisfies it. Everything in internal/brain/ and internal/chat/ talks only to this interface — never to a vendor SDK directly.
3.1 The LLMProvider interface
// Package llm defines the provider abstraction for Kortex's LLM surface.
// Every concrete backend (anthropic, openai, ollama, koder-ai) implements
// LLMProvider and registers itself via Register().
package llm
import (
"context"
"encoding/json" // json.RawMessage is used by Tool, ToolCall and ToolResult
"time"
)
// LLMProvider is the contract every backend implements.
//
// Implementations must be safe for concurrent use by multiple goroutines.
// They must respect ctx cancellation aggressively (network calls cancel
// within 100ms of ctx.Done()).
type LLMProvider interface {
// Name returns the stable identifier used in config and metrics.
// Example values: "anthropic", "openai", "ollama", "koder-ai".
Name() string
// Models returns the list of models the operator may select for this
// provider. The order is significant: Models()[0] is the default.
Models() []Model
// Complete issues a single non-streaming completion. It blocks until
// the model finishes or ctx is cancelled.
Complete(ctx context.Context, req CompleteRequest) (CompleteResponse, error)
// Stream issues a streaming completion. The returned channel emits
// StreamChunk values until the model finishes (final chunk has
// Type == ChunkDone) or ctx is cancelled (channel closed without
// emitting Done). The implementation owns closing the channel.
Stream(ctx context.Context, req CompleteRequest) (<-chan StreamChunk, error)
// SupportsTools reports whether the provider can handle tool calls
// natively. Providers that return false will receive only the
// Messages field of CompleteRequest; Tools will be silently dropped.
SupportsTools() bool
// SupportsStreaming reports whether Stream() is functional. Providers
// that return false fall back to a synthetic stream that emits one
// ChunkText with the entire body and one ChunkDone.
SupportsStreaming() bool
// MaxContextTokens returns the largest context window the default
// model accepts. Used by the chat session token-budget manager (§8.4).
MaxContextTokens() int
// CostPerToken returns the cost in USD per million tokens (see Cost) for
// input and output. Used for budget tracking (§10). Local providers
// (Ollama) return zero.
CostPerToken() Cost
}
// Model describes a single model offered by a provider.
type Model struct {
ID string // vendor model ID, e.g. "claude-opus-4-6", "gpt-5"
DisplayName string // human-readable, e.g. "Claude Opus 4.6 (1M context)"
ContextTokens int // max input tokens
Tier string // "fast", "balanced", "deep" — used by frugal mode
}
// Cost is the per-million-token cost for input and output, in USD.
type Cost struct {
InputUSDPerMTok float64
OutputUSDPerMTok float64
}
3.2 Request and response shapes
// CompleteRequest is the input to Complete or Stream. It is provider-agnostic;
// each provider translates it into its own wire format.
type CompleteRequest struct {
Model string // optional override; if empty, provider uses Models()[0]
System string // system prompt
Messages []Message // conversation history, oldest first
Tools []Tool // tools the model may call (ignored if SupportsTools()==false)
MaxTokens int // upper bound on output tokens
Temperature float64 // 0.0..1.0
Stop []string // optional stop sequences
Metadata Meta // observability tags (session_id, user_id, purpose)
}
// Message is a single turn in the conversation.
type Message struct {
Role Role // user | assistant | tool
Content string // free text (may be empty if ToolCalls/ToolResult is set)
ToolCalls []ToolCall // populated when an assistant turn invoked tools
ToolResult *ToolResult // populated for tool-role messages
}
// Role is "user", "assistant", or "tool".
type Role string
const (
RoleUser Role = "user"
RoleAssistant Role = "assistant"
RoleTool Role = "tool"
)
// Tool is a callable the model is allowed to invoke.
type Tool struct {
Name string // unique within a request, e.g. "get_logs"
Description string // shown to the model
InputSchema json.RawMessage // JSON Schema describing the parameters
}
// ToolCall is the model asking Kortex to run a tool.
type ToolCall struct {
ID string // provider-assigned correlation ID
Name string // tool name (must match a Tool from the request)
Input json.RawMessage // arguments, validated against InputSchema
}
// ToolResult is the answer Kortex gives the model after running a tool.
type ToolResult struct {
CallID string // matches ToolCall.ID
Content json.RawMessage // structured result (model sees it as JSON)
IsError bool // signals an error to the model so it can recover
}
// CompleteResponse is the synchronous result of Complete.
type CompleteResponse struct {
Content string // assistant text
ToolCalls []ToolCall // model wants to call these next; if non-empty, the caller must run them and call Complete again
StopReason StopReason // "end_turn", "tool_use", "max_tokens", "stop_sequence", "error"
InputTokens int // billed input tokens
OutputTokens int // billed output tokens
Model string // which model actually answered
Latency time.Duration
}
// StopReason describes why the model stopped generating.
type StopReason string
const (
StopEndTurn StopReason = "end_turn"
StopToolUse StopReason = "tool_use"
StopMaxTokens StopReason = "max_tokens"
StopStopSequence StopReason = "stop_sequence"
StopError StopReason = "error"
)
// StreamChunk is one element of a streaming response.
type StreamChunk struct {
Type ChunkType
Text string // populated when Type == ChunkText
ToolCall *ToolCall // populated when Type == ChunkToolCall
Usage *Usage // populated when Type == ChunkDone
Err error // populated when Type == ChunkError
}
// ChunkType discriminates StreamChunk variants.
type ChunkType string
const (
ChunkText ChunkType = "text"
ChunkToolCall ChunkType = "tool_call"
ChunkDone ChunkType = "done"
ChunkError ChunkType = "error"
)
// Usage reports billed tokens for one completion.
type Usage struct {
InputTokens int
OutputTokens int
Model string
}
// Meta carries observability tags forwarded to logs and metrics.
type Meta struct {
SessionID string // chat_sessions.id, if any
UserID string // operator user ID
Purpose string // "chat", "triage", "summarize", "explain"
}
3.3 The tool-call loop
A single user message may produce several round-trips between Kortex and the provider. The pattern is identical for all providers and runs in the chat controller (internal/chat/controller.go):
Operator -> ChatAPI: POST /chat/v1/sessions/{id}/messages
ChatAPI -> ChatController: AppendUserMessage
ChatController -> Provider: Stream(messages, tools)
Provider -> ChatController: chunk(text)...
Provider -> ChatController: chunk(tool_call: get_logs)
ChatController -> ToolRouter: Run(get_logs, args)
ToolRouter -> ChatController: result(json)
ChatController -> Provider: Stream(messages + tool_result, tools)
Provider -> ChatController: chunk(text)...
Provider -> ChatController: chunk(done)
ChatController -> ChatAPI: SSE done
ChatAPI -> Operator: stream closed
The loop terminates when the provider returns StopEndTurn or hits a configurable maximum tool-call depth (default 8). Exceeding the depth emits an LLMToolDepthExceeded finding and posts a graceful failure message into the chat; Kortex never silently truncates a model conclusion.
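The loop above can be sketched in a few lines of Go. This is a minimal, non-streaming illustration rather than the chat controller itself: the trimmed types, the `complete` and `runTool` hooks, and `ErrToolDepthExceeded` are hypothetical stand-ins for the provider call, `ToolRouter.Run`, and the LLMToolDepthExceeded finding.

```go
package main

import (
	"errors"
	"fmt"
)

// Trimmed stand-ins for the llm types defined in §3.2.
type ToolCall struct{ ID, Name string }

type ToolResult struct {
	CallID  string
	Content string
	IsError bool
}

type Message struct {
	Role       string
	Content    string
	ToolCalls  []ToolCall
	ToolResult *ToolResult
}

type Response struct {
	Content    string
	ToolCalls  []ToolCall
	StopReason string
}

// ErrToolDepthExceeded stands in for the LLMToolDepthExceeded finding.
var ErrToolDepthExceeded = errors.New("llm: tool-call depth exceeded")

// runLoop alternates model turns and tool runs until the model ends its
// turn or maxDepth tool rounds have been used. complete and runTool are
// hypothetical hooks for the provider and the tool router.
func runLoop(
	history []Message,
	complete func([]Message) (Response, error),
	runTool func(ToolCall) ToolResult,
	maxDepth int,
) (string, []Message, error) {
	for depth := 0; ; depth++ {
		resp, err := complete(history)
		if err != nil {
			return "", history, err
		}
		history = append(history, Message{Role: "assistant", Content: resp.Content, ToolCalls: resp.ToolCalls})
		if resp.StopReason != "tool_use" || len(resp.ToolCalls) == 0 {
			return resp.Content, history, nil
		}
		if depth >= maxDepth {
			return "", history, ErrToolDepthExceeded
		}
		for _, call := range resp.ToolCalls {
			res := runTool(call)
			history = append(history, Message{Role: "tool", ToolResult: &res})
		}
	}
}

func main() {
	turn := 0
	// Scripted provider: one tool call, then a final answer.
	complete := func([]Message) (Response, error) {
		turn++
		if turn == 1 {
			return Response{StopReason: "tool_use", ToolCalls: []ToolCall{{ID: "c1", Name: "get_logs"}}}, nil
		}
		return Response{Content: "connection pool exhausted", StopReason: "end_turn"}, nil
	}
	runTool := func(c ToolCall) ToolResult { return ToolResult{CallID: c.ID, Content: `{"lines":[]}`} }

	answer, history, err := runLoop(nil, complete, runTool, 8)
	fmt.Println(answer, len(history), err)
}
```

Returning an error at the depth limit, instead of the last partial text, keeps the caller responsible for posting the failure message, which matches the rule that a conclusion is never silently truncated.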
3.4 Registration and discovery
Each provider package registers itself via an init() function:
// internal/llm/anthropic/anthropic.go
package anthropic
import "kortex/internal/llm"
func init() {
llm.Register("anthropic", func(cfg llm.ProviderConfig) (llm.LLMProvider, error) {
return New(cfg)
})
}
internal/llm/registry.go exposes:
func Register(name string, factory ProviderFactory)
func Get(name string) (LLMProvider, bool)
func List() []string
The Kortex bootstrapper (cmd/kortex/main.go) imports the provider packages the operator wants:
import (
_ "kortex/internal/llm/anthropic"
_ "kortex/internal/llm/openai"
_ "kortex/internal/llm/ollama"
_ "kortex/internal/llm/koderai" // optional
)
Disabling a provider is removing one import line. Adding a new one is one Go file plus one import. This matches the plugin model defined in RFC 002 §9.3.
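A registry of this shape is usually a mutex-guarded map of factories. The sketch below shows one plausible way to back Register and List; the `build` helper, the `stub` provider, and the one-field ProviderConfig are assumptions made for illustration, since this RFC does not specify how configuration reaches Get.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Trimmed stand-ins for the types in internal/llm.
type LLMProvider interface{ Name() string }
type ProviderConfig struct{ APIKey string }
type ProviderFactory func(cfg ProviderConfig) (LLMProvider, error)

var (
	mu        sync.RWMutex
	factories = make(map[string]ProviderFactory)
)

// Register is called from each provider package's init().
func Register(name string, factory ProviderFactory) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := factories[name]; dup {
		panic("llm: Register called twice for provider " + name)
	}
	factories[name] = factory
}

// List returns the registered provider names in sorted order.
func List() []string {
	mu.RLock()
	defer mu.RUnlock()
	names := make([]string, 0, len(factories))
	for n := range factories {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// build is a hypothetical helper that instantiates a provider by name.
func build(name string, cfg ProviderConfig) (LLMProvider, error) {
	mu.RLock()
	f, ok := factories[name]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("llm: unknown provider %q", name)
	}
	return f(cfg)
}

// stub is a placeholder provider used only by this example.
type stub struct{ name string }

func (s stub) Name() string { return s.name }

func main() {
	Register("ollama", func(ProviderConfig) (LLMProvider, error) { return stub{"ollama"}, nil })
	Register("anthropic", func(ProviderConfig) (LLMProvider, error) { return stub{"anthropic"}, nil })
	fmt.Println(List())
}
```

Panicking on a duplicate name is a design choice borrowed from database/sql driver registration: a double registration is a build-time mistake, so failing at startup is preferable to silently replacing a provider.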
4. Provider implementations
4.1 Anthropic — the default
Why default: Anthropic Claude has the most mature native tool-use protocol of any current vendor (typed tool_use and tool_result content blocks), the longest usable context window (1M tokens on Opus 4.6), and the Koder toolchain already includes the claude-api skill. It is also the model the Koder team uses for everything else, which means existing operator habits transfer.
Models exposed:
| Model ID | Display name | Context | Tier |
|---|---|---|---|
| claude-opus-4-6 | Claude Opus 4.6 (1M context) | 1,000,000 | deep |
| claude-sonnet-4-6 | Claude Sonnet 4.6 | 200,000 | balanced |
| claude-haiku-4-6 | Claude Haiku 4.6 | 200,000 | fast |
Wire format: HTTPS to https://api.anthropic.com/v1/messages. API-key auth via the x-api-key header (not an Authorization: Bearer token). Streaming via Server-Sent Events on the same endpoint when stream: true is set in the request body.
Implementation choice: HTTP direct (no third-party SDK). The Anthropic wire format is small enough that vendoring anthropic-sdk-go would add a hundred indirect dependencies for a few hundred lines of marshalling. The claude-api skill in the operator environment is a help for /humans writing prompts/, not a runtime dependency.
Tool-use translation: Kortex Tool → Anthropic tools[].input_schema (JSON Schema), one-to-one. Kortex ToolCall → Anthropic tool_use content block. Kortex ToolResult → Anthropic tool_result content block keyed by tool_use_id. The mapping is straightforward.
Streaming chunk mapping:
| Anthropic event | Kortex chunk |
|---|---|
| content_block_delta (text) | ChunkText |
| content_block_start (tool_use) | ChunkToolCall |
| content_block_delta (tool_use input json) | accumulated into the open ChunkToolCall |
| message_delta with usage | folded into the final ChunkDone |
| message_stop | ChunkDone |
| error event | ChunkError |
4.2 OpenAI
Why supported: OpenAI has the second-largest installed base of LLM users, GPT-5 is competitive on reasoning, and several Koder customers already have OpenAI contracts. The wire format is well-known and the tool-use schema (the "function calling" / "tools" API) maps cleanly onto Kortex's Tool type.
Models exposed:
| Model ID | Display name | Context | Tier |
|---|---|---|---|
| gpt-5 | GPT-5 | 400,000 | deep |
| gpt-4o | GPT-4o | 128,000 | balanced |
| gpt-4o-mini | GPT-4o mini | 128,000 | fast |
Wire format: HTTPS to https://api.openai.com/v1/chat/completions. Bearer auth via Authorization: Bearer ${OPENAI_API_KEY}. Streaming via SSE on the same endpoint when stream: true.
Tool-use translation: Kortex Tool → OpenAI tools[].function. The input schema is the same JSON Schema dialect Anthropic uses, so no transformation is needed beyond wrapping. Kortex ToolCall → OpenAI tool_calls[]. Kortex ToolResult → OpenAI message with role tool and tool_call_id.
4.3 Ollama — the local-only path
Why supported: operators in regulated environments need a /provider that never sends data outside the local network/. Ollama runs models on the operator's own host (or a VM the operator controls), exposes an HTTP API that mostly mirrors the OpenAI shape, and supports the modern open models (Llama 3 70B, Mistral Large, Qwen 2.5, etc.).
Models exposed: whatever the operator has pulled. Kortex queries GET /api/tags on Ollama startup and populates Models() from the response. The operator picks one in kortex.toml.
Wire format: HTTPS or HTTP (operator's choice; HTTPS strongly recommended even on localhost) to http(s)://${OLLAMA_HOST}/api/chat. Bearer auth is optional (Ollama supports it via reverse proxy).
Tool use: Ollama added native tool support in late 2024. For models that do not implement it (smaller community models), Kortex falls back to prompt-injected pseudo-tools: a system prompt instructs the model to emit <<TOOL name="..." args='{...}'>> markers, which Kortex parses and runs. This fallback is best-effort and surfaces a warning in the chat.
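As an illustration of the pseudo-tool fallback, the marker can be extracted with a regular expression and checked for valid JSON before it is turned into a ToolCall. The function name `parsePseudoTools`, the `pseudo-N` call IDs, and the restriction that arguments contain no single quote are assumptions of this sketch, not part of the RFC.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// ToolCall mirrors the llm.ToolCall type from §3.2.
type ToolCall struct {
	ID    string
	Name  string
	Input json.RawMessage
}

// markerRE matches <<TOOL name="..." args='{...}'>>. It assumes the JSON
// arguments contain no single quote; the system prompt would have to
// require that of the model.
var markerRE = regexp.MustCompile(`<<TOOL name="([^"]+)" args='([^']*)'>>`)

// parsePseudoTools extracts tool calls from model text. Markers whose
// arguments are not valid JSON are skipped and reported as warnings, in
// keeping with the best-effort nature of the fallback.
func parsePseudoTools(text string) (calls []ToolCall, warnings []string) {
	for i, m := range markerRE.FindAllStringSubmatch(text, -1) {
		if !json.Valid([]byte(m[2])) {
			warnings = append(warnings, fmt.Sprintf("tool %q: arguments are not valid JSON", m[1]))
			continue
		}
		calls = append(calls, ToolCall{
			ID:    fmt.Sprintf("pseudo-%d", i), // hypothetical ID scheme
			Name:  m[1],
			Input: json.RawMessage(m[2]),
		})
	}
	return calls, warnings
}

func main() {
	text := `Let me check. <<TOOL name="get_logs" args='{"product":"saude-publica","time_range":"last 30m"}'>>`
	calls, warnings := parsePseudoTools(text)
	for _, c := range calls {
		fmt.Println(c.ID, c.Name, string(c.Input))
	}
	fmt.Println(len(warnings), "warnings")
}
```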
Cost: zero (Cost{0, 0}). The local provider is free at the marginal token level — the operator pays for hardware, not API calls.
4.4 koder-ai — the future internal provider
Status: placeholder. There is no koder-ai/ directory in the monorepo today (RFC 001 audit confirmed). When the Koder team ships an internal model — fine-tuned on Koder logs, trained on Koder runbooks, hosted on the Koder fleet — it will register itself as koder-ai and Kortex will treat it like any other provider.
Expected differences vs. third-party providers:
- Lower latency (same datacenter as Kortex deployments).
- No external billing.
- Tool-use schema designed for the Kortex tools, not retrofitted.
- Privacy: data never leaves the Koder network.
This RFC reserves the name and the import slot. Implementation lives in a follow-up RFC when koder-ai is real.
4.5 Comparison table
| Provider | Default model | Max ctx | Native tools | Native streaming | Latency | Cost (USD/MTok, in / out) | Deployment |
|---|---|---|---|---|---|---|---|
| anthropic | claude-opus-4-6 | 1,000,000 | yes | yes | medium | 15.00 / 75.00 | cloud |
| openai | gpt-5 | 400,000 | yes | yes | medium | 12.00 / 60.00 | cloud |
| ollama | (operator pick) | model-dependent | yes (3.x+) | yes | high | 0 / 0 | local |
| koder-ai | TBD | TBD | TBD | TBD | low | 0 / 0 | internal |
The numbers in the cost column are 2026-Q1 list prices and exist for the budget calculator (§10), not as guarantees. Real billing is read from the provider's response headers when available.
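To make the cost column concrete, the following sketch computes the estimated cost of one call from billed tokens and the per-million-token prices, and checks it against a daily limit. The Cost type comes from §3.1; `callCostUSD` and `shouldAlert` are hypothetical helper names, and the real budget calculator is specified in §10.

```go
package main

import "fmt"

// Cost is the per-million-token price in USD, as defined in §3.1.
type Cost struct {
	InputUSDPerMTok  float64
	OutputUSDPerMTok float64
}

// callCostUSD estimates the cost of one completion from its billed tokens.
func callCostUSD(c Cost, inputTokens, outputTokens int) float64 {
	return float64(inputTokens)/1e6*c.InputUSDPerMTok +
		float64(outputTokens)/1e6*c.OutputUSDPerMTok
}

// shouldAlert reports whether spend has reached alertPct percent of the limit.
func shouldAlert(spentUSD, dailyLimitUSD float64, alertPct int) bool {
	return spentUSD >= dailyLimitUSD*float64(alertPct)/100
}

func main() {
	opus := Cost{InputUSDPerMTok: 15.00, OutputUSDPerMTok: 75.00}
	// 12,000 input tokens cost 0.18 USD and 800 output tokens cost 0.06 USD.
	fmt.Printf("%.2f\n", callCostUSD(opus, 12000, 800)) // prints 0.24
	fmt.Println(shouldAlert(41.0, 50.0, 80))            // prints true (threshold is 40.00)
}
```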
5. Configuration via kortex.toml
The LLM surface is configured under the top-level [llm] table. The operator does not edit code to switch providers — they edit one line.
5.1 Full example
# kortex.toml — LLM section
[llm]
default_provider = "anthropic"
fallback_providers = ["openai", "ollama"]
mode = "normal" # "normal" | "frugal" | "local-only"
max_tool_depth = 8
session_idle_ttl = "168h" # 7 days
[llm.providers.anthropic]
api_key = "${ANTHROPIC_API_KEY}"
model = "claude-opus-4-6"
fast_model = "claude-haiku-4-6" # used in frugal mode
max_tokens = 4096
timeout = "60s"
base_url = "https://api.anthropic.com"
[llm.providers.openai]
api_key = "${OPENAI_API_KEY}"
model = "gpt-5"
fast_model = "gpt-4o-mini"
max_tokens = 4096
timeout = "60s"
base_url = "https://api.openai.com"
[llm.providers.ollama]
endpoint = "https://ollama.internal.example:11434"
model = "llama3:70b"
fast_model = "llama3:8b"
timeout = "120s"
[llm.budget]
daily_usd_limit = 50.00
per_user_hourly_limit = 30
per_session_token_cap = 200000
alert_at_pct = 80 # email admin at 80% of daily limit
[llm.privacy]
redact_pii = true # apply PII redactors before send
redact_secrets = true # apply secret-pattern redactors
local_only = false # if true, force ollama regardless
audit_every_call = true
do_not_retain_default = false
5.2 Mode semantics
| Mode | Behavior |
|---|---|
| normal | Use default_provider with the configured model. Failover to fallback_providers on hard failure. |
| frugal | First call uses each provider's fast_model. If the answer hits StopMaxTokens or the model emits a structured "need more context" marker, retry with the full model. |
| local-only | Only ollama is consulted. All other providers are unreachable from the chat controller. Used by regulated operators. Setting [llm.privacy].local_only = true is equivalent. |
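The mode table can be read as two small decisions: which providers may be tried, and which model is used on a given attempt. The sketch below encodes that reading; `providerOrder` and `pickModel` are hypothetical names, and the failure detection that triggers a retry is left out.

```go
package main

import "fmt"

// providerOrder returns the providers the chat controller may try, in order.
// In local-only mode (or with [llm.privacy].local_only = true) only ollama
// is eligible, regardless of default_provider and fallback_providers.
func providerOrder(mode string, localOnly bool, defaultProvider string, fallbacks []string) []string {
	if mode == "local-only" || localOnly {
		return []string{"ollama"}
	}
	order := []string{defaultProvider}
	for _, f := range fallbacks {
		if f != defaultProvider {
			order = append(order, f)
		}
	}
	return order
}

// pickModel returns fast_model for the first attempt in frugal mode and
// the full model otherwise, including the frugal retry.
func pickModel(mode, model, fastModel string, isRetry bool) string {
	if mode == "frugal" && !isRetry && fastModel != "" {
		return fastModel
	}
	return model
}

func main() {
	fmt.Println(providerOrder("normal", false, "anthropic", []string{"openai", "ollama"}))
	fmt.Println(providerOrder("normal", true, "anthropic", []string{"openai", "ollama"}))
	fmt.Println(pickModel("frugal", "claude-opus-4-6", "claude-haiku-4-6", false))
	fmt.Println(pickModel("frugal", "claude-opus-4-6", "claude-haiku-4-6", true))
}
```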
5.3 Secret resolution
Values like ${ANTHROPIC_API_KEY} are resolved at startup from the environment, then from the secrets backend (RFC 002 §12.4). A missing secret marks the provider as unavailable, not broken: Kortex starts normally, the provider is skipped in failover, and the admin API surfaces the failure under GET /api/v1/llm/providers.
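A minimal sketch of this resolution order follows. The `lookupFn` hook stands in for the secrets backend of RFC 002 §12.4, whose real interface is not defined here, and `resolveSecret` is a hypothetical name; a false return corresponds to marking the provider unavailable rather than failing startup.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// lookupFn is a hypothetical hook for the secrets backend.
type lookupFn func(name string) (string, bool)

// resolveSecret expands a "${NAME}" reference, trying the environment
// first and the secrets backend second. Values without the ${...} form
// are returned unchanged. ok == false means the secret is missing.
func resolveSecret(raw string, backend lookupFn) (value string, ok bool) {
	if !strings.HasPrefix(raw, "${") || !strings.HasSuffix(raw, "}") {
		return raw, true // literal value, nothing to resolve
	}
	name := raw[2 : len(raw)-1]
	if v, found := os.LookupEnv(name); found && v != "" {
		return v, true
	}
	if backend != nil {
		if v, found := backend(name); found {
			return v, true
		}
	}
	return "", false
}

func main() {
	os.Setenv("KORTEX_DEMO_KEY", "sk-demo")
	key, ok := resolveSecret("${KORTEX_DEMO_KEY}", nil)
	fmt.Println(key, ok)

	// A missing secret leaves the provider unavailable; startup continues.
	if _, ok := resolveSecret("${KORTEX_DEMO_MISSING}", nil); !ok {
		fmt.Println("provider unavailable: secret not found")
	}
}
```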
6. Tool use schema
Tools are the heart of this RFC. Without them the LLM is just an expensive autocomplete; with them it is an investigator.
6.1 Design rules
- Read-only by default. Eight tools ship in Phase 2, /seven of them are pure reads/. The eighth, propose_action, writes only into the chat itself, never into a managed product.
- Typed inputs. Every tool has a JSON Schema for its arguments. Inputs are validated before the tool runs and a validation failure is returned to the model as a ToolResult{IsError: true} so it can correct itself.
- Bounded outputs. Every tool caps its result size (default 64 KiB serialized JSON). Truncation is signaled to the model in a structured field, never silently.
- Bounded latency — per-tool timeout, configurable. Per §15.3 resolved,
each tool category has its own default timeout, with a hard ceiling above which the call is cancelled via context:
| Tool category | Default timeout | Max timeout (hard) |
|---|---|---|
| get_logs, get_metrics, get_traces | 30s | 5min |
| compare_to_baseline, get_recent_deploys | 10s | 60s |
| get_incident_history, get_admin_info | 5s | 30s |
| propose_action | 5s | 30s |
Tool calls exceeding 30s of wall time cause the chat controller to emit an interim message to the user (/"this query is taking longer than usual..."/) so the operator knows the model is still working. Hitting the max returns a structured error to the model:
```json
{ "error": "tool_timeout", "elapsed": "5m1s", "tool": "get_logs" }
```
Defaults are overridable per-deployment via [llm.tools.<tool_name>] timeout = "..." in kortex.toml.
- Pagination is mandatory for large queries. Per §15.3 resolved, get_logs and get_traces cap each call at 100 items (down from the 200 mentioned in early drafts). When the underlying query has more matches, the response includes a next_cursor field; the model must request the next page explicitly. This prevents the model from accidentally pulling 24 hours of logs into a single tool result and blowing the context window.
- Audited. Every tool call writes one row to decisions and emits one koder.admin.action event (RFC 003).
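The bounded-latency rule above (category default, per-deployment override, hard ceiling enforced through context) can be sketched as follows. The table values are the ones listed above; `effectiveTimeout`, `runWithTimeout`, and the choice to clamp an override to the hard maximum are assumptions of this sketch.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// toolTimeout holds the default and the hard ceiling for one tool.
type toolTimeout struct{ Default, Max time.Duration }

// timeouts mirrors the per-category table in §6.1.
var timeouts = map[string]toolTimeout{
	"get_logs":             {Default: 30 * time.Second, Max: 5 * time.Minute},
	"get_metrics":          {Default: 30 * time.Second, Max: 5 * time.Minute},
	"get_traces":           {Default: 30 * time.Second, Max: 5 * time.Minute},
	"compare_to_baseline":  {Default: 10 * time.Second, Max: 60 * time.Second},
	"get_recent_deploys":   {Default: 10 * time.Second, Max: 60 * time.Second},
	"get_incident_history": {Default: 5 * time.Second, Max: 30 * time.Second},
	"get_admin_info":       {Default: 5 * time.Second, Max: 30 * time.Second},
	"propose_action":       {Default: 5 * time.Second, Max: 30 * time.Second},
}

// effectiveTimeout applies an optional kortex.toml override (zero means
// "not set") and clamps the result to the hard ceiling. Unknown tools get
// the most conservative row; that fallback is an assumption.
func effectiveTimeout(tool string, override time.Duration) time.Duration {
	t, ok := timeouts[tool]
	if !ok {
		t = toolTimeout{Default: 5 * time.Second, Max: 30 * time.Second}
	}
	d := t.Default
	if override > 0 {
		d = override
	}
	if d > t.Max {
		d = t.Max
	}
	return d
}

// runWithTimeout runs fn under the tool's deadline; fn must honor ctx.
func runWithTimeout(ctx context.Context, tool string, override time.Duration, fn func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(ctx, effectiveTimeout(tool, override))
	defer cancel()
	return fn(ctx)
}

func main() {
	err := runWithTimeout(context.Background(), "get_logs", 10*time.Millisecond,
		func(ctx context.Context) error {
			<-ctx.Done() // simulate a query that never returns
			return ctx.Err()
		})
	fmt.Println(err) // prints "context deadline exceeded"
}
```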
6.2 The eight Phase-2 tools
6.2.1 get_logs
Purpose: fetch log lines from observe/log in a time window.
{
"name": "get_logs",
"description": "Query observe/log for log entries from a product within a time range. Returns up to 100 lines, newest first. Use `cursor` to fetch the next page when more matches exist.",
"input_schema": {
"type": "object",
"properties": {
"product": { "type": "string", "description": "Product name as registered in observe/log, e.g. 'saude-publica'" },
"time_range": { "type": "string", "description": "ISO 8601 interval, e.g. '2026-04-07T14:00:00Z/PT30M' or relative 'last 30m'" },
"query": { "type": "string", "description": "LogQL-compatible filter, optional" },
"level": { "type": "string", "enum": ["debug","info","warn","error","fatal"], "description": "Optional minimum severity" },
"limit": { "type": "integer", "minimum": 1, "maximum": 100, "default": 100 },
"cursor": { "type": "string", "description": "Opaque cursor returned by a prior call. Omit on the first page." }
},
"required": ["product", "time_range"]
}
}
Result schema:
{
"lines": [
{ "ts": "2026-04-07T14:31:02.341Z", "level": "error", "host": "h1", "message": "..." }
],
"truncated": false,
"total_matches": 142,
"next_cursor": "eyJvZmZzZXQiOjEwMH0",
"query_latency_ms": 87
}
next_cursor is present when more matches exist beyond the page returned. The model passes it back in the next call's cursor argument to fetch the following 100 lines. The cursor is opaque to the model; the chat controller validates and decodes it.
Implementation: internal/llm/tools/get_logs.go → GET /api/v1/query on observe/log via the Senses pull adapter (RFC 002 §3.1.2). Read-only. The 100-line cap is enforced at the adapter, not at the input schema, so the model cannot bypass it by lying about limit.
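The sample cursor in the result above, eyJvZmZzZXQiOjEwMH0, happens to be unpadded base64 of {"offset":100}. Taking that as a hint, and only as an assumption about the encoding, the sketch below shows how a controller might decode the cursor and enforce the 100-item cap server-side. `decodeCursor`, `encodeCursor`, `clampLimit`, and `page` are hypothetical names.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

const maxPage = 100 // hard cap per call, enforced regardless of the requested limit

// cursor is an assumed encoding: base64url (no padding) over a JSON offset.
type cursor struct {
	Offset int `json:"offset"`
}

func decodeCursor(s string) (cursor, error) {
	var c cursor
	if s == "" {
		return c, nil // first page
	}
	raw, err := base64.RawURLEncoding.DecodeString(s)
	if err != nil {
		return c, fmt.Errorf("cursor: %w", err)
	}
	if err := json.Unmarshal(raw, &c); err != nil {
		return c, fmt.Errorf("cursor: %w", err)
	}
	if c.Offset < 0 {
		return c, fmt.Errorf("cursor: negative offset")
	}
	return c, nil
}

func encodeCursor(c cursor) string {
	raw, _ := json.Marshal(c) // cannot fail for this struct
	return base64.RawURLEncoding.EncodeToString(raw)
}

// clampLimit ignores out-of-range limits instead of trusting the model.
func clampLimit(requested int) int {
	if requested < 1 || requested > maxPage {
		return maxPage
	}
	return requested
}

// page computes the slice bounds for one call and the cursor for the next.
func page(total int, cur string, limit int) (start, end int, next string, err error) {
	c, err := decodeCursor(cur)
	if err != nil {
		return 0, 0, "", err
	}
	start = c.Offset
	if start > total {
		start = total
	}
	end = start + clampLimit(limit)
	if end > total {
		end = total
	}
	if end < total {
		next = encodeCursor(cursor{Offset: end})
	}
	return start, end, next, nil
}

func main() {
	start, end, next, _ := page(142, "", 500) // the model asked for 500 lines
	fmt.Println(start, end, next)
	start, end, next, _ = page(142, next, 100)
	fmt.Println(start, end, next == "")
}
```

With 142 matches, the first call returns items 0 to 99 plus a cursor, and the second returns items 100 to 141 with no cursor, which is the behavior the result schema describes.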
6.2.2 get_metrics
Purpose: fetch metric series from observe/mon.
{
"name": "get_metrics",
"description": "Query observe/mon for a named metric within a time range. Returns the time series and basic statistics.",
"input_schema": {
"type": "object",
"properties": {
"product": { "type": "string" },
"metric_name": { "type": "string", "description": "e.g. 'http_request_duration_seconds_p99'" },
"time_range": { "type": "string" },
"labels": { "type": "object", "description": "label filters" },
"step": { "type": "string", "description": "Resolution, e.g. '1m', '5m', '1h'" }
},
"required": ["product", "metric_name", "time_range"]
}
}
Result schema:
{
"samples": [ { "ts": "...", "value": 0.412 } ],
"stats": { "min": 0.10, "max": 0.85, "mean": 0.34, "p50": 0.31, "p95": 0.66, "p99": 0.79 },
"unit": "seconds",
"truncated": false
}
Implementation: read-only PromQL query against observe/mon.
6.2.3 get_traces
Purpose: fetch a distributed trace from observe/apm (formerly observe/trace-apm) by request ID.
{
"name": "get_traces",
"description": "Look up a distributed trace by request_id from observe/apm. Returns the span tree with timings and tags.",
"input_schema": {
"type": "object",
"properties": {
"request_id": { "type": "string", "description": "Trace ID or correlation ID" },
"include_logs": { "type": "boolean", "default": false, "description": "Include log lines correlated to spans" }
},
"required": ["request_id"]
}
}
Result schema:
{
"trace_id": "...",
"root": { "service": "...", "operation": "...", "start_us": 0, "duration_us": 412000, "status": "ok",
"children": [ { "service": "...", "duration_us": 88000, "tags": { ... }, "children": [] } ] },
"errors": [ { "span_id": "...", "message": "..." } ],
"logs": [ ]
}
Implementation: GET /api/v1/traces/{trace_id} on observe/apm.
6.2.4 compare_to_baseline
Purpose: statistical comparison against the rolling baseline maintained by Brain (RFC 002 §3.2.1).
{
"name": "compare_to_baseline",
"description": "Compare a metric over a time range to the 24-hour rolling baseline maintained by Kortex Brain. Returns Z-scores and anomaly flags.",
"input_schema": {
"type": "object",
"properties": {
"product": { "type": "string" },
"metric_name": { "type": "string" },
"time_range": { "type": "string" }
},
"required": ["product", "metric_name", "time_range"]
}
}
Result schema:
{
"baseline_window": "PT24H",
"baseline_mean": 0.32,
"baseline_stddev": 0.04,
"samples": [
{ "ts": "...", "value": 0.81, "z_score": 12.25, "anomalous": true }
],
"anomaly_count": 7,
"max_z_score": 12.25
}
Implementation: delegates to the Brain analyzer; no upstream call.
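The numbers in the sample result are consistent with the usual Z-score definition, z = (value - mean) / stddev: (0.81 - 0.32) / 0.04 = 12.25. The sketch below reproduces that arithmetic; the anomaly threshold of 3.0 and the `compare` helper are assumptions, since the actual Brain analyzer is specified in RFC 002 §3.2.1.

```go
package main

import (
	"fmt"
	"math"
)

// sample mirrors one entry of the "samples" array in the result schema.
type sample struct {
	Value     float64
	ZScore    float64
	Anomalous bool
}

// zScore returns how many standard deviations v lies from the baseline
// mean. A zero stddev yields 0 to avoid dividing by zero.
func zScore(v, mean, stddev float64) float64 {
	if stddev == 0 {
		return 0
	}
	return (v - mean) / stddev
}

// compare scores each value against the baseline and flags values whose
// absolute Z-score reaches the threshold (3.0 is an assumed default).
func compare(values []float64, mean, stddev, threshold float64) (out []sample, anomalies int, maxZ float64) {
	for _, v := range values {
		z := zScore(v, mean, stddev)
		anomalous := math.Abs(z) >= threshold
		if anomalous {
			anomalies++
		}
		if math.Abs(z) > math.Abs(maxZ) {
			maxZ = z
		}
		out = append(out, sample{Value: v, ZScore: z, Anomalous: anomalous})
	}
	return out, anomalies, maxZ
}

func main() {
	_, anomalies, maxZ := compare([]float64{0.30, 0.33, 0.81}, 0.32, 0.04, 3.0)
	fmt.Printf("anomaly_count=%d max_z_score=%.2f\n", anomalies, maxZ)
}
```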
6.2.5 get_recent_deploys
Purpose: list recent deployment events for a product (RFC 003 koder.deploy.* events).
{
"name": "get_recent_deploys",
"description": "List deploy events for a product within a time range. Useful for change-correlation analysis.",
"input_schema": {
"type": "object",
"properties": {
"product": { "type": "string" },
"time_range": { "type": "string" }
},
"required": ["product", "time_range"]
}
}
Result schema:
{
"deploys": [
{ "ts": "2026-04-07T14:31:00Z", "version": "v1.4.2", "actor": "ci",
"host": "h1", "duration_s": 12, "status": "ok",
"commit": "abc123", "rolled_back": false }
]
}
Implementation: query the decisions and event tables in Memory.
6.2.6 get_incident_history
Purpose: search past incidents in Kortex Memory for similar problems.
{
"name": "get_incident_history",
"description": "Search past incidents stored in Kortex Memory by free-text query, optionally filtered by product or time range.",
"input_schema": {
"type": "object",
"properties": {
"query": { "type": "string", "description": "Free-text search across title, findings, and decisions" },
"product": { "type": "string", "description": "Optional product filter" },
"time_range": { "type": "string", "description": "Optional time window" },
"limit": { "type": "integer", "minimum": 1, "maximum": 25, "default": 10 }
},
"required": ["query"]
}
}
Result schema:
{
"incidents": [
{ "id": "...", "title": "saude-publica latency spike",
"opened_at": "2026-03-18T...", "closed_at": "2026-03-18T...",
"severity": "warn", "resolution_summary": "..." }
]
}
Implementation: SQL search against the incidents and findings tables in Postgres. Phase 5 may upgrade this to a koder-kdb-backed embedding search.
6.2.7 get_admin_info
Purpose: fetch the standardized /admin/info envelope from a managed product (defined by RFC 004).
{
"name": "get_admin_info",
"description": "Fetch the /admin/info envelope from a managed product via the Kortex control plane. Returns version, uptime, health, configuration summary.",
"input_schema": {
"type": "object",
"properties": {
"product": { "type": "string", "description": "Product name as registered in the control plane" }
},
"required": ["product"]
}
}
Result schema: whatever shape RFC 004 standardizes for the info envelope; typically:
{
"product": "saude-publica",
"version": "v1.4.2",
"uptime_s": 8741,
"health": "ok",
"build": { "commit": "abc123", "built_at": "..." },
"config_hash": "...",
"feature_flags": { ... }
}
Implementation: HTTP GET through the control plane (RFC 004) with the operator's audit identity propagated via headers.
6.2.8 propose_action
Purpose: the model proposes a remediation. Nothing executes.
{
"name": "propose_action",
"description": "Propose a remediation action for the operator to review. The operator must click Confirm in the chat UI before anything is executed. This tool NEVER takes action by itself.",
"input_schema": {
"type": "object",
"properties": {
"action": { "type": "string", "description": "Short verb form, e.g. 'restart', 'reload-tls', 'rollback-deploy'" },
"product": { "type": "string" },
"params": { "type": "object", "description": "Action-specific parameters" },
"rationale": { "type": "string", "description": "Why the model believes this action is appropriate" },
"blast_radius":{ "type": "string", "description": "Estimated scope, e.g. 'one host', 'one service', 'whole cluster'" },
"reversible": { "type": "boolean", "description": "Can the action be undone?" }
},
"required": ["action", "product", "rationale"]
}
}
Result schema:
{
"proposal_id": "...",
"posted_to_chat": true,
"awaits_confirmation": true
}
Implementation: writes a chat_messages row with role='assistant' and a structured tool_calls payload that the Flutter Web dashboard renders as a card with two buttons: Confirm and Reject.
- If the operator clicks Confirm, the chat controller routes the proposal through the Reflexes dispatch queue (RFC 002 §3.3) as if it had been a one-shot manual rule firing, with require_approval: true already satisfied. All Reflex guards still apply.
- If the operator clicks Reject, the proposal is annotated with the rejection and closed.
- If the operator does nothing for 10 minutes, the proposal expires.
There is no restart_service, no roll_back_deploy, no reload_config, no delete_anything. If a future RFC adds another write-side tool, it must justify why propose_action is insufficient and explain how the new tool preserves the human-in-the-loop property.
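The Confirm / Reject / expire lifecycle of a proposal can be pictured as a small state machine. The sketch below is illustrative only: the type names, status strings, and Resolve method are assumptions, and handing a confirmed proposal to the Reflexes dispatch queue is left to the caller.

```go
package main

import (
	"errors"
	"time"
)

type ProposalStatus string

const (
	StatusPending   ProposalStatus = "pending"
	StatusConfirmed ProposalStatus = "confirmed"
	StatusRejected  ProposalStatus = "rejected"
	StatusExpired   ProposalStatus = "expired"
)

// proposalTTL is the 10-minute window after which an untouched proposal expires.
const proposalTTL = 10 * time.Minute

// Proposal is a hypothetical in-memory view of a posted propose_action card.
type Proposal struct {
	ID       string
	PostedAt time.Time
	Status   ProposalStatus
}

// Resolve applies an operator click (confirm or reject) observed at time now.
// A proposal can be resolved once, and only inside the TTL window.
func (p *Proposal) Resolve(confirm bool, now time.Time) error {
	if p.Status != StatusPending {
		return errors.New("proposal already " + string(p.Status))
	}
	if now.Sub(p.PostedAt) >= proposalTTL {
		p.Status = StatusExpired
		return errors.New("proposal expired")
	}
	if confirm {
		// The caller would now enqueue the action on the Reflexes dispatch
		// queue, where all Reflex guards still apply.
		p.Status = StatusConfirmed
	} else {
		p.Status = StatusRejected
	}
	return nil
}
```

Passing the clock in as now keeps the expiry rule easy to test and avoids hidden calls to time.Now.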
6.3 Tool router
internal/llm/tools/router.go exposes:
type ToolRouter struct {
tools map[string]Tool
}
func (r *ToolRouter) Run(ctx context.Context, call llm.ToolCall, meta llm.Meta) llm.ToolResult
The router validates the input against the JSON Schema, enforces the per-tool timeout, runs the tool, caps the output size, records a decisions row, emits an audit event, and returns a ToolResult. Tool errors become ToolResult{IsError: true} so the model can recover or apologize, never a Go panic that aborts the chat.
6.4 Per-tool error semantics
| Failure | Returned to model as | Operator-visible? |
|---|---|---|
| Schema validation failed | ToolResult{IsError: true, Content: "validation: ..."} | dashboard shows a yellow chip on the tool call |
| Upstream timeout (interim, < max) | ToolResult{IsError: false, Content: "..."} after the wall-time interim message | yellow chip |
| Upstream timeout (max exceeded) | ToolResult{IsError: true, Content: "tool_timeout: 5m1s"} | yellow chip |
| Upstream returned 5xx | ToolResult{IsError: true, Content: "upstream: 503"} | red chip |
| Upstream returned 4xx | ToolResult{IsError: true, Content: "upstream: 401 unauthorized"} | red chip |
| Result truncated (cap reached) | ToolResult{IsError: false, Content: {... "truncated": true, "next_cursor": "..."}} | gray chip |
| Tool panicked (bug) | ToolResult{IsError: true, Content: "internal error"} | red chip + paged |
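The table above can be encoded as a lookup so the router and the dashboard agree on one source of truth. This is only a sketch: the Failure and Outcome types and the rule that unknown failures fall back to the "tool panicked" row are assumptions, not part of the RFC.

```go
package main

// Failure enumerates the rows of the error-semantics table.
type Failure int

const (
	SchemaInvalid Failure = iota
	TimeoutInterim
	TimeoutMax
	Upstream5xx
	Upstream4xx
	Truncated
	Panicked
)

// Outcome describes how one failure class is surfaced.
type Outcome struct {
	IsError bool   // value of ToolResult.IsError returned to the model
	Chip    string // chip colour shown on the tool call in the dashboard
	Page    bool   // whether the failure also pages someone
}

var outcomes = map[Failure]Outcome{
	SchemaInvalid:  {IsError: true, Chip: "yellow"},
	TimeoutInterim: {IsError: false, Chip: "yellow"},
	TimeoutMax:     {IsError: true, Chip: "yellow"},
	Upstream5xx:    {IsError: true, Chip: "red"},
	Upstream4xx:    {IsError: true, Chip: "red"},
	Truncated:      {IsError: false, Chip: "gray"},
	Panicked:       {IsError: true, Chip: "red", Page: true},
}

// outcomeFor returns the table row for f. Unknown values are treated like a
// tool bug, which is an assumption of this sketch.
func outcomeFor(f Failure) Outcome {
	if o, ok := outcomes[f]; ok {
		return o
	}
	return outcomes[Panicked]
}
```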
6.5 Operator-supplied plugin tools
Per §15.4 (resolved), operators can extend the tool catalog with *plugin tools* that bridge Kortex to internal systems (Jira, ServiceNow, internal runbooks, custom dashboards) without modifying Kortex source. Plugins are declared as YAML manifests under /etc/kortex/plugins/:
# /etc/kortex/plugins/jira-search.yaml
name: jira_search
description: |
Search Jira tickets matching a JQL query. Returns up to 20 hits with
key, summary, status, and assignee.
parameters:
query:
type: string
description: "JQL expression, e.g. 'project = OPS AND status = Open'"
limit:
type: integer
minimum: 1
maximum: 20
default: 10
endpoint:
method: POST
url: http://internal-bridge.koder.local:8080/jira/search
auth:
type: bearer
token: "${JIRA_BRIDGE_TOKEN}"
allowed_sessions:
- "production"
- "incident-*"
timeout: 10s
The chat controller loads plugin manifests only at startup. There is no runtime API to add a plugin, which prevents prompt-injection attacks where the model "asks" to install a new tool. To add or change a plugin, an operator edits the file and restarts (or reloads via SIGHUP, the same path as RFC 004 §3.5).
Trust model:
- Plugins run in-process as HTTP clients with Kortex's network identity. They should be pointed at internal-only bridges, never at third-party APIs directly.
- The allowed_sessions field is a list of glob patterns over chat session names. A plugin is only invocable from a session whose name matches one of the patterns. This lets operators restrict sensitive plugins (e.g. "call our internal payments admin API") to a specific named session.
- Every plugin invocation is audited like any other tool call (one decisions row, one koder.admin.action event).
- Plugins inherit the same per-tool timeout system from §6.1: the manifest's timeout field sets the default; the hard ceiling is 5 minutes regardless.
- Plugins are advisory only; they cannot fire propose_action-style state changes. The §3 hard separation still holds: only the rules engine touches the control plane.
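The allowed_sessions check can be sketched with the standard library's path.Match. Using path.Match is an assumption about glob semantics (the RFC says only "glob patterns"), and the function name sessionAllowed is hypothetical.

```go
package main

import "path"

// sessionAllowed reports whether the chat session name matches at least one
// allowed_sessions pattern. An empty pattern list allows nothing.
func sessionAllowed(patterns []string, session string) bool {
	for _, p := range patterns {
		ok, err := path.Match(p, session)
		if err != nil {
			continue // malformed pattern: fail closed, it never matches
		}
		if ok {
			return true
		}
	}
	return false
}
```

With the manifest above, sessionAllowed([]string{"production", "incident-*"}, "incident-42") is true, while a session named "staging" is refused. Under path.Match the * wildcard does not cross a "/" character, so "incident-*" would not match "incident-a/b".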
A plugin manifest that fails validation at startup (bad YAML, missing required fields, unreachable endpoint, conflicting name with a built-in tool) is logged at ERROR and the plugin is not loaded. The chat controller starts normally without the plugin, and operators see the failure in the dashboard. Plugins never silently fail to load.
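A minimal sketch of the startup checks, assuming the manifest has already been decoded from YAML into a struct. The struct, the choice of required fields, and the function name are hypothetical. The endpoint reachability probe is omitted, and a timeout above the 5-minute ceiling is clamped to it.

```go
package main

import (
	"errors"
	"time"
)

// PluginManifest holds a subset of the manifest fields; YAML decoding is omitted.
type PluginManifest struct {
	Name        string
	Description string
	EndpointURL string
	Timeout     time.Duration
}

// maxToolTimeout is the hard ceiling that applies regardless of the manifest.
const maxToolTimeout = 5 * time.Minute

// validateManifest rejects manifests that must not be loaded and returns a
// copy with the timeout clamped to the ceiling. builtin lists the names of
// built-in tools, which a plugin may not shadow.
func validateManifest(m PluginManifest, builtin map[string]bool) (PluginManifest, error) {
	switch {
	case m.Name == "":
		return m, errors.New("missing required field: name")
	case m.EndpointURL == "":
		return m, errors.New("missing required field: endpoint.url")
	case builtin[m.Name]:
		return m, errors.New("name conflicts with built-in tool: " + m.Name)
	}
	if m.Timeout > maxToolTimeout {
		m.Timeout = maxToolTimeout
	}
	return m, nil
}
```

On error the caller would log at ERROR, skip the plugin, and continue starting the chat controller.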
7. Prompt engineering by category
The prompts that frame each LLM call are code, not content. They live in `internalllmprompts