Ai runtime

AI Runtime — Local LLM Serving

*rea:*Intelligence
*ath:*ai/runtime
*ind:*LLM runtime (download, serve, quantize, fine-tune)

Role in the stack

AI Runtime is the local LLM serving layer of Koder. It downloads models from ai/zoo (or upstream providers), serves them via an OpenAI~~compatible API, quantizes them for the target hardware, and optionally fine~~tunes them on tenant data. Its consumers are ai/gateway (which routes traffic to Runtime as one of its backends) and any service that needs local inference without leaving the Koder perimeter.

Deployed in production as LXC 127 on s.r1 using Ollama as the inference backend.

Features

Download + serve models from ai/zoo or upstream registries
Quantization for GPU / CPU targets
Fine-tuning pipeline
OpenAI-compat /v1/ API
Scales from laptop to cluster

Primary couplings

Consumer	Relationship
`ai/gateway`	Routes local-model requests to Runtime
`ai/zoo`	Model registry source
`ai/voice`	Calls Runtime via Gateway for transcription summarization