Ai runtime
AI Runtime — Local LLM Serving
- *rea:*Intelligence
- *ath:*
ai/runtime - *ind:*LLM runtime (download, serve, quantize, fine-tune)
Role in the stack
AI Runtime is the local LLM serving layer of Koder. It downloads models from ai/zoo (or upstream providers), serves them via an OpenAIcompatible API, quantizes them for the target hardware, and optionally finetunes them on tenant data. Its consumers are ai/gateway (which routes traffic to Runtime as one of its backends) and any service that needs local inference without leaving the Koder perimeter.
Deployed in production as LXC 127 on s.r1 using Ollama as the inference backend.
Features
- Download + serve models from
ai/zooor upstream registries - Quantization for GPU / CPU targets
- Fine-tuning pipeline
- OpenAI-compat
/v1/API - Scales from laptop to cluster
Primary couplings
| Consumer | Relationship |
|---|---|
ai/gateway |
Routes local-model requests to Runtime |
ai/zoo |
Model registry source |
ai/voice |
Calls Runtime via Gateway for transcription summarization |