AI model recommendations by use case

living

Single source of truth for *which model to use for which job*, consumed by:

  • *Koder AI Gateway*, via the `Aliases` map (`services/ai/gateway/internal/config/config.go`): clients pass an alias like `model: "reasoning"` and the gateway resolves it to the current pick. Updating the pick is a one-line edit; consumers stay untouched.
  • *Kode* (the agentic Koder AI) and other AI agents: read this file when deciding which backend to invoke for a given task.
  • *Humans*: review on every flagship release; promote/demote alternatives based on `k-bench` results.
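
Here is a minimal Go sketch of the alias lookup, assuming the gateway keeps a flat `map[string]string` from alias to `provider/model`. The real shape of `Aliases` in `config.go` may differ. `resolveModel` is a hypothetical helper name, and the map values are copied from the primaries below.

```go
package main

import "fmt"

// aliases is a hypothetical stand-in for the gateway's Aliases map: each
// use-case alias points at the current primary pick in this registry.
var aliases = map[string]string{
	"reasoning": "anthropic/claude-opus-4-7",
	"code-arch": "anthropic/claude-opus-4-7",
	"code-fast": "xai/grok-code-fast-1",
	"general":   "openai/gpt-5",
	"cheap":     "deepseek/deepseek-chat",
}

// resolveModel (hypothetical helper) returns the concrete provider/model for
// an alias; any name that is not a known alias is passed through unchanged.
func resolveModel(requested string) string {
	if target, ok := aliases[requested]; ok {
		return target
	}
	return requested
}

func main() {
	fmt.Println(resolveModel("reasoning")) // anthropic/claude-opus-4-7
	fmt.Println(resolveModel("openai/o3")) // openai/o3 (not an alias, passed through)
}
```

Names that are not aliases pass through unchanged. A client can still pin an exact model while everyone else follows the registry.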

*How to read:* "Primary" is what the gateway alias resolves to today. "Alternatives" are equally valid choices with different cost/quality/latency trade-offs. "Why not the others" notes why something is not primary, so future revisions are informed rather than impulsive.


1. Heavy reasoning · alias reasoning

*Primary:* anthropic/claude-opus-4-7
*Alternatives:* openai/o3, xai/grok-4.20-0309-reasoning, deepseek/deepseek-reasoner, kimi/kimi-k2.6
*Why primary:* Best raw reasoning quality for high-stakes architectural decisions, RFC analysis, and ambiguous bugs. Worth the cost.
*Why not OpenAI o3:* Comparable quality. Pick Opus when context exceeds 100K tokens or when you need strong code understanding; o3 wins on math-heavy reasoning.
*Why not DeepSeek-R1:* ~95% as good in many cases at 1/20 the cost. Promote it to primary if the `k-bench` gap closes further.

2. Code (architecture/refactor) · alias code-arch

*Primary:* anthropic/claude-opus-4-7
*Alternatives:* openai/o3, openai/gpt-5, xai/grok-4.3
*Why primary:* Best at multi-file refactors, understanding cross-cutting concerns, and suggesting policies. Same model as `reasoning`, same justification.

3. Code (boilerplate / fast iteration) · alias code-fast

*Primary:* xai/grok-code-fast-1
*Alternatives:* mistral/codestral-latest, deepseek/deepseek-chat, openai/gpt-5-mini
*Why primary:* Specifically tuned for code; sub-second latency; cheap. Pareto-optimal for "scaffold this CRUD endpoint" or "write 50 unit tests".
*When to escalate:* Once the logic gets non-trivial, switch to `code-arch`.

4. General-purpose conversational · alias general

*Primary:* openai/gpt-5
*Alternatives:* anthropic/claude-sonnet-4-6, xai/grok-4.3, kimi/kimi-k2.6
*Why primary:* Strong all-rounder; natively multimodal (images/audio); best tools/function-calling ecosystem. Defaults that "just work".

5. Cheap commodity · alias cheap

*Primary:* deepseek/deepseek-chat
*Alternatives:* google/gemini-2.0-flash, openai/gpt-4o-mini, mistral/mistral-small-latest
*Why primary:* ~$0.14/M input tokens with cache hit. That is 10× cheaper than gpt-4o-mini, with comparable quality on common tasks (classification, summarization, simple Q&A).
*When to switch to gemini-2.0-flash:* You need the free tier (1,500 req/day, no billing) or Google ecosystem features (grounding, native search).

6. Vision / multimodal · alias vision

*Primary:* openai/gpt-4o
*Alternatives:* anthropic/claude-sonnet-4-6, mistral/pixtral-large-latest, kimi/moonshot-v1-128k-vision-preview
*Why primary:* Best chart/diagram OCR, screenshot understanding, and frame-by-frame video analysis (relevant for Koder Eye, Koru, and dek workflows).

7. Audio / transcription · alias audio

*Primary:* openai/whisper-1
*Alternatives:* koder-ai-voice (local, post-VOICE-52 with CUDA backend)
*Why primary:* Mature, multilingual, and accurate for Brazilian Portuguese.
*Future:* When VOICE-52 ships, default to local for cost savings (T4 GPU = $0/hour). Keep whisper-1 as the fallback for high-quality batch jobs.

8. Embeddings · alias embed

*Primary:* openai/text-embedding-3-large
*Alternatives:* mistral/mistral-embed, google/embedding-001
*Why primary:* 3072 dimensions, widely supported by vector DBs. Mature SDKs.

9. Image generation · alias image

*Primary:* openai/dall-e-3
*Alternatives:* xai/grok-imagine-image, xai/grok-imagine-image-pro, stability/... (when key added)
*Why primary:* Best instruction-following for text-in-image and branded layouts.

10. Real-time search

*Primary:* xai/grok-3
*Alternatives:* perplexity/sonar-pro (when key added)
*Why primary:* Built-in real-time search, no extra setup. Already paid for (xAI auto-reload active).
*Why not Perplexity yet:* The card was declined on 2026-05-06, and the payment fix is pending. Once it is paid, sonar-pro will likely take primary because it is purpose-built for citations.

11. Long context (>200K tokens) · alias long-context

*Primary:* google/gemini-2.5-pro
*Alternatives:* anthropic/claude-sonnet-4-6 (~1M ctx), kimi/moonshot-v1-128k
*Why primary:* ~2M-token context window; cheapest per token at that scale; still on the free tier today.
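
Clients that do not know their prompt size in advance could select this alias automatically. This is a hypothetical client-side sketch that assumes a crude bytes-per-token estimate. The threshold comes from the section heading; `estimateTokens` and `routeAlias` are illustrative names, not gateway APIs.

```go
package main

import "fmt"

// longContextThreshold mirrors the ">200K tokens" boundary of the
// long-context category.
const longContextThreshold = 200_000

// estimateTokens is a rough, hypothetical heuristic (about 4 bytes of text per
// token); a real router would use the target provider's tokenizer.
func estimateTokens(prompt string) int {
	return (len(prompt) + 3) / 4
}

// routeAlias keeps the caller's alias unless the prompt is estimated to exceed
// the threshold, in which case it switches to the long-context alias.
func routeAlias(requested, prompt string) string {
	if estimateTokens(prompt) > longContextThreshold {
		return "long-context"
	}
	return requested
}

func main() {
	fmt.Println(routeAlias("general", "short question")) // general
}
```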

12. Local zero-cost · alias local

*Primary:* ollama/qwen2.5:3b
*Alternatives:* future Ollama models, pulled on demand
*Why primary:* Already running on the Tesla T4 GPU at ~172 tok/s. Free per call. Use it for low-stakes classification, bulk-data embeddings, and dev/staging defaults.

Tier shortcuts (legacy — back-compat)

These predate the use-case aliases. Kept so existing code doesn't break.

| Alias | Resolves to | Notes |
|---|---|---|
| fast | anthropic/claude-haiku-4-5-20251001 | Cheap, fast Anthropic |
| smart | anthropic/claude-sonnet-4-6 | Balanced Anthropic |
| best | anthropic/claude-opus-4-7 | Flagship Anthropic |
| claude | anthropic/claude-sonnet-4-6 | "I want Claude" |
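
The legacy names share one namespace with the use-case aliases, so a merge step that refuses silent shadowing is a cheap guard. This sketch assumes both sets are combined into a single lookup table. `mergeAliases` is hypothetical, and the real `config.go` layout may differ.

```go
package main

import "fmt"

// legacyAliases reproduces the tier shortcuts table above.
var legacyAliases = map[string]string{
	"fast":   "anthropic/claude-haiku-4-5-20251001",
	"smart":  "anthropic/claude-sonnet-4-6",
	"best":   "anthropic/claude-opus-4-7",
	"claude": "anthropic/claude-sonnet-4-6",
}

// mergeAliases (hypothetical) combines use-case and legacy aliases into one
// lookup table and returns an error instead of silently overwriting a name
// that is already mapped to a different model.
func mergeAliases(useCase, legacy map[string]string) (map[string]string, error) {
	merged := make(map[string]string, len(useCase)+len(legacy))
	for k, v := range useCase {
		merged[k] = v
	}
	for k, v := range legacy {
		if existing, ok := merged[k]; ok && existing != v {
			return nil, fmt.Errorf("alias %q defined twice: %s vs %s", k, existing, v)
		}
		merged[k] = v
	}
	return merged, nil
}

func main() {
	useCase := map[string]string{"reasoning": "anthropic/claude-opus-4-7"}
	all, err := mergeAliases(useCase, legacyAliases)
	fmt.Println(len(all), err) // 5 <nil>
}
```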

Update protocol

When updating this registry, *also update the Aliases map in services/ai/gateway/internal/config/config.go* in the same commit. The gateway binary needs a rebuild and redeploy for the alias change to take effect (the model names are still hard-coded there).

Trigger conditions for revision:

  • New flagship model from any provider → reassess primary in relevant categories
  • /k-bench results show an alternative beating the primary by ≥10% on quality, OR being ≥2× cheaper at the same quality
  • A provider deprecates the current primary model
  • A new use case emerges that doesn't fit any existing alias
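
The two `k-bench` triggers can be written as a predicate. This is a sketch under stated assumptions: `benchResult` is a hypothetical summary type, "by ≥10%" is read as relative to the primary's score, and "same quality" is read as "not lower".

```go
package main

import "fmt"

// benchResult is a hypothetical summary of one model's k-bench run: a quality
// score (higher is better) and a cost per million tokens.
type benchResult struct {
	Model       string
	Quality     float64
	CostPerMTok float64
}

// shouldReassess encodes the two k-bench triggers:
//  1. the alternative's quality is at least 10% above the primary's, or
//  2. it costs at most half as much without scoring lower (an assumed
//     reading of "at same quality").
//
// Comparing alt*10 >= primary*11 sidesteps the fact that 1.10 is not exactly
// representable in binary floating point.
func shouldReassess(primary, alt benchResult) bool {
	qualityWin := alt.Quality*10 >= primary.Quality*11
	costWin := alt.Quality >= primary.Quality && alt.CostPerMTok*2 <= primary.CostPerMTok
	return qualityWin || costWin
}

func main() {
	// Illustrative numbers only, not real k-bench results.
	primary := benchResult{"primary", 80, 15}
	alt := benchResult{"alternative", 78, 0.75}
	fmt.Println(shouldReassess(primary, alt)) // false: much cheaper, but scores lower
}
```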

Review at least quarterly, even if no triggers fire.

Future: auto-tuned routing

AIGW-31 (smart-router auto-select) plans to populate this dynamically from nightly /k-bench runs against a task_categories SQLite table (currently missing; see AIGW-38 sub-issue 3). When that lands, this file becomes the *fallback / human-readable explanation*, and the SQLite table becomes the runtime source of truth.
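
The planned hand-off can be sketched as a lookup that prefers the runtime table and falls back to the curated aliases. `modelSource`, `mapSource`, and `pickModel` are hypothetical names. The in-memory `mapSource` stands in for the `task_categories` SQLite table, which does not exist yet.

```go
package main

import "fmt"

// modelSource is a hypothetical abstraction over the planned task_categories
// table: it returns the tuned model for a category, if one exists.
type modelSource interface {
	Lookup(category string) (model string, ok bool, err error)
}

// mapSource is an in-memory stand-in; a real implementation would query SQLite.
type mapSource map[string]string

func (m mapSource) Lookup(category string) (string, bool, error) {
	model, ok := m[category]
	return model, ok, nil
}

// pickModel prefers the tuned runtime table and falls back to the static,
// human-curated aliases when the table has no row or cannot be read.
func pickModel(category string, tuned modelSource, static map[string]string) (string, error) {
	if tuned != nil {
		if model, ok, err := tuned.Lookup(category); err == nil && ok {
			return model, nil
		}
	}
	if model, ok := static[category]; ok {
		return model, nil
	}
	return "", fmt.Errorf("no model for category %q", category)
}

func main() {
	static := map[string]string{"cheap": "deepseek/deepseek-chat"}
	tuned := mapSource{} // empty: nightly runs have not populated anything yet
	fmt.Println(pickModel("cheap", tuned, static)) // deepseek/deepseek-chat <nil>
}
```

Falling back on a missing row or a read error keeps routing working even if the nightly job has not run.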

Source: ../home/koder/dev/koder/meta/docs/stack/registries/ai-model-recommendations.md