AI model recommendations by use case
Single source of truth for **which model to use for which job**, consumed by:
- **Koder AI Gateway** via the `Aliases` map (`services/ai/gateway/internal/config/config.go`): clients pass an alias like `model: "reasoning"` and the gateway resolves it to the current pick. Updating the pick is a one-line edit; consumers stay untouched.
- **Kode** (the agentic Koder AI) and other AI agents: read this file when deciding which backend to invoke for a given task.
- **Humans**: review on every flagship release; promote/demote alternatives based on `k-bench` results.
**How to read:** "primary" is what the gateway alias resolves to today. "Alternatives" are equally valid choices with different cost/quality/latency trade-offs. "Why not the others" notes why something is not primary, so future revisions are informed rather than impulsive.
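The alias lookup described above can be sketched in a few lines of Go. This is a minimal illustration, not the gateway's actual code: the `aliases` variable, the `resolveModel` function, and the pass-through behaviour for names that are not aliases are all assumptions made for the example. Only the alias names and model identifiers come from this file.

```go
package main

import "fmt"

// aliases is an illustrative stand-in for the gateway's Aliases map.
// The real map lives in config.go; its exact type is not shown in this file.
var aliases = map[string]string{
	"reasoning": "anthropic/claude-opus-4-7",
	"code-arch": "anthropic/claude-opus-4-7",
	"code-fast": "xai/grok-code-fast-1",
	"general":   "openai/gpt-5",
	"cheap":     "deepseek/deepseek-chat",
}

// resolveModel returns the concrete model for an alias.
// Assumption: names that are not aliases pass through unchanged,
// so a caller can still pin an explicit provider/model string.
func resolveModel(name string) string {
	if model, ok := aliases[name]; ok {
		return model
	}
	return name
}

func main() {
	fmt.Println(resolveModel("reasoning")) // alias, resolved to the current primary
	fmt.Println(resolveModel("openai/o3")) // explicit model, passed through
}
```

Under this sketch, changing the primary for a use case is a single edit to one map entry, which is the property the registry relies on.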
1. Heavy reasoning · alias reasoning
| | |
|---|---|
| **Primary** | anthropic/claude-opus-4-7 |
| **Alternatives** | openai/o3, xai/grok-4.20-0309-reasoning, deepseek/deepseek-reasoner, kimi/kimi-k2.6 |
| **Why primary** | Best raw reasoning quality for high-stakes architectural decisions, RFC analysis, ambiguous bugs. Worth the cost. |
| **Why not OpenAI o3** | Comparable quality; pick Opus when context > 100K or when you need strong code understanding. o3 wins on math-heavy reasoning. |
| **Why not DeepSeek-R1** | ~95% as good in many cases at 1/20 the cost; promote to primary if the `k-bench` gap closes further. |
2. Code (architecture/refactor) · alias code-arch
| | |
|---|---|
| **Primary** | anthropic/claude-opus-4-7 |
| **Alternatives** | openai/o3, openai/gpt-5, xai/grok-4.3 |
| **Why primary** | Best at multi-file refactors, understanding cross-cutting concerns, suggesting policies. Same model as reasoning, same justification. |
3. Code (boilerplate / fast iteration) · alias code-fast
| | |
|---|---|
| **Primary** | xai/grok-code-fast-1 |
| **Alternatives** | mistral/codestral-latest, deepseek/deepseek-chat, openai/gpt-5-mini |
| **Why primary** | Specifically tuned for code; sub-second latency; cheap. Pareto-optimal for "scaffold this CRUD endpoint" or "write 50 unit tests". |
| **When to escalate** | Logic gets non-trivial → switch to `code-arch`. |
4. General-purpose conversational · alias general
| | |
|---|---|
| **Primary** | openai/gpt-5 |
| **Alternatives** | anthropic/claude-sonnet-4-6, xai/grok-4.3, kimi/kimi-k2.6 |
| **Why primary** | Strong all-rounder; natively multimodal (images/audio); best ecosystem of tools/function-calling. Defaults that "just work". |
5. Cheap commodity · alias cheap
| | |
|---|---|
| **Primary** | deepseek/deepseek-chat |
| **Alternatives** | google/gemini-2.0-flash, openai/gpt-4o-mini, mistral/mistral-small-latest |
| **Why primary** | ~$0.14/M input with cache hit; 10× cheaper than gpt-4o-mini, with comparable quality on common tasks (classification, summarization, simple Q&A). |
| **When to switch to gemini-2.0-flash** | Need free tier (1500 req/day, no billing) OR Google ecosystem features (grounding, native search). |
6. Vision / multimodal · alias vision
| | |
|---|---|
| **Primary** | openai/gpt-4o |
| **Alternatives** | anthropic/claude-sonnet-4-6, mistral/pixtral-large-latest, kimi/moonshot-v1-128k-vision-preview |
| **Why primary** | Best chart/diagram OCR, screenshot understanding, frame-by-frame video analysis (relevant for Koder Eye, Koru, dek workflows). |
7. Audio / transcription · alias audio
| | |
|---|---|
| **Primary** | openai/whisper-1 |
| **Alternatives** | koder-ai-voice (local, post-VOICE-52 with CUDA backend) |
| **Why primary** | Mature, multilingual, accurate for Brazilian Portuguese. |
| **Future** | When VOICE-52 ships, default to local for cost savings (T4 GPU = $0/hour); keep whisper-1 as fallback for high-quality batch jobs. |
8. Embeddings · alias embed
| | |
|---|---|
| **Primary** | openai/text-embedding-3-large |
| **Alternatives** | mistral/mistral-embed, google/embedding-001 |
| **Why primary** | 3072-dim, well-supported by every vector DB. Mature SDKs. |
9. Image generation · alias image
| | |
|---|---|
| **Primary** | openai/dall-e-3 |
| **Alternatives** | xai/grok-imagine-image, xai/grok-imagine-image-pro, stability/... (when key added) |
| **Why primary** | Best instruction-following on text-in-image, branded layouts. |
10. Search-augmented · alias search
| | |
|---|---|
| **Primary** | xai/grok-3 |
| **Alternatives** | perplexity/sonar-pro (when key added) |
| **Why primary** | Built-in real-time search, no extra setup. Already paid (xAI auto-reload active). |
| **Why not Perplexity yet** | Card declined 2026-05-06; payment fix pending. Once paid, sonar-pro likely takes primary because it is purpose-built for citations. |
11. Long context (>200K tokens) · alias long-context
| | |
|---|---|
| **Primary** | google/gemini-2.5-pro |
| **Alternatives** | anthropic/claude-sonnet-4-6 (~1M ctx), kimi/moonshot-v1-128k |
| **Why primary** | ~2M token context window; cheapest per token at that scale; still on free tier today. |
12. Local zero-cost · alias local
| | |
|---|---|
| **Primary** | ollama/qwen2.5:3b |
| **Alternatives** | future ollama models pulled on demand |
| **Why primary** | Already running on a Tesla T4 GPU at ~172 tok/s. Free per call. Use for: low-stakes classification, embeddings of bulk data, dev/staging defaults. |
Tier shortcuts (legacy — back-compat)
These predate the use-case aliases. Kept so existing code doesn't break.
| Alias | Resolves to | Notes |
|---|---|---|
| fast | anthropic/claude-haiku-4-5-20251001 | cheap, fast Anthropic |
| smart | anthropic/claude-sonnet-4-6 | balanced Anthropic |
| best | anthropic/claude-opus-4-7 | flagship Anthropic |
| claude | anthropic/claude-sonnet-4-6 | "I want Claude" |
Update protocol
When updating this registry, **also update the `Aliases` map in `services/ai/gateway/internal/config/config.go`** in the same commit. The gateway binary needs a rebuild and redeploy for the alias change to take effect (the model name is still hard-coded there).
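Because the registry and the `Aliases` map are edited by hand in the same commit, a small drift check can catch a forgotten update. The sketch below is hypothetical: `diffAliases` and both input maps are invented for illustration, and it assumes the primaries from this file have already been extracted into a `map[string]string` by some other means.

```go
package main

import (
	"fmt"
	"sort"
)

// diffAliases reports every alias whose registry primary is missing from,
// or different in, the gateway config. Both maps are alias -> model.
// Hypothetical helper; nothing like it is confirmed to exist in the gateway.
func diffAliases(registry, config map[string]string) []string {
	var problems []string
	for alias, want := range registry {
		got, ok := config[alias]
		switch {
		case !ok:
			problems = append(problems, fmt.Sprintf("%s: missing from config", alias))
		case got != want:
			problems = append(problems, fmt.Sprintf("%s: config has %s, registry says %s", alias, got, want))
		}
	}
	sort.Strings(problems) // map iteration order is random; sort for stable output
	return problems
}

func main() {
	registry := map[string]string{
		"cheap":  "deepseek/deepseek-chat",
		"vision": "openai/gpt-4o",
	}
	config := map[string]string{
		"cheap": "openai/gpt-4o-mini", // stale entry left over from an earlier pick
	}
	for _, p := range diffAliases(registry, config) {
		fmt.Println(p)
	}
}
```

A check like this could run in CI so that a registry edit without the matching config edit fails before the rebuild and redeploy step.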
Trigger conditions for revision:
- New flagship model from any provider → reassess primary in relevant categories
- /k-bench results show an alternative beats the primary by ≥10% on quality OR is ≥2× cheaper at the same quality
- A provider deprecates the current primary model
- A new use case emerges that doesn't fit any existing alias
Quarterly review minimum even if no triggers fire.
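The benchmark trigger above can be written as a small predicate. This is a sketch under stated assumptions: the `benchResult` type, its fields, and `shouldReassess` are hypothetical, the quality score is assumed to be "higher is better", and "same quality" is interpreted here as within 2% of the primary, a tolerance this file does not define.

```go
package main

import "fmt"

// benchResult is a hypothetical summary of one model's k-bench run.
type benchResult struct {
	Quality     float64 // assumed: higher is better
	CostPerMTok float64 // assumed: USD per million tokens
}

const (
	qualityGain          = 0.10 // alternative beats primary by >= 10% on quality
	costFactor           = 2.0  // alternative is >= 2x cheaper
	sameQualityTolerance = 0.02 // assumption: "same quality" means within 2%
)

// shouldReassess reports whether the alternative meets either benchmark trigger.
func shouldReassess(primary, alt benchResult) bool {
	if primary.Quality <= 0 || alt.CostPerMTok <= 0 {
		return false // cannot compare without valid numbers
	}
	ratio := alt.Quality / primary.Quality
	betterQuality := ratio >= 1+qualityGain
	sameQuality := ratio >= 1-sameQualityTolerance
	muchCheaper := primary.CostPerMTok/alt.CostPerMTok >= costFactor
	return betterQuality || (sameQuality && muchCheaper)
}

func main() {
	primary := benchResult{Quality: 80, CostPerMTok: 15}
	cheaper := benchResult{Quality: 79, CostPerMTok: 0.55} // illustrative numbers only
	fmt.Println(shouldReassess(primary, cheaper))
}
```

The numbers in `main` are made up; real inputs would come from `k-bench` output, whose format is not described here.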
Future: auto-tuned routing
AIGW-31 (smart-router auto-select) plans to populate this dynamically from nightly /k-bench runs against a task_categories SQLite table (currently missing; see AIGW-38 sub-issue 3). When that lands, this file becomes the **fallback / human-readable explanation**, with the SQLite table being the runtime source of truth.
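One way to picture that split between a runtime table and a static fallback is a lookup that consults the tuned store first and falls back to the static aliases. Everything named here is hypothetical: the `categoryStore` interface, `memoryStore` (an in-memory stand-in for the not-yet-existing task_categories table), and `pickModel` are illustrations, not the AIGW-31 design.

```go
package main

import "fmt"

// categoryStore abstracts the runtime source of truth.
// Hypothetical interface; the real table schema is not defined in this file.
type categoryStore interface {
	Primary(category string) (model string, ok bool)
}

// memoryStore is an in-memory stand-in for the SQLite-backed store.
type memoryStore map[string]string

func (m memoryStore) Primary(category string) (string, bool) {
	model, ok := m[category]
	return model, ok
}

// pickModel prefers the tuned store and falls back to the static aliases
// when the store is absent or has no entry for the category.
func pickModel(store categoryStore, static map[string]string, alias string) (string, bool) {
	if store != nil {
		if model, ok := store.Primary(alias); ok {
			return model, true
		}
	}
	model, ok := static[alias]
	return model, ok
}

func main() {
	static := map[string]string{
		"cheap": "deepseek/deepseek-chat",
		"embed": "openai/text-embedding-3-large",
	}
	// Illustrative override only; not a recommendation from this registry.
	tuned := memoryStore{"cheap": "google/gemini-2.0-flash"}

	fmt.Println(pickModel(tuned, static, "cheap")) // tuned entry wins
	fmt.Println(pickModel(tuned, static, "embed")) // falls back to static
}
```

Keeping the static map as the fallback means the gateway can still answer if the table is missing or empty, which matches the role this file is expected to keep.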