AI model recommendations by use case

living

Single source of truth for *which model to use for which job*, consumed by:

*Koder AI Gateway* via the Aliases map (services/ai/gateway/internal/config/config.go): clients pass an alias like model: "reasoning" and the gateway resolves it to the current pick. Updating the pick is a one-line edit; consumers stay untouched.
*Kode* (the agentic Koder AI) and other AI agents: read this file when deciding which backend to invoke for a given task.
*Humans*: review on every flagship release; promote/demote alternatives based on `k-bench` results.

*How to read:* "primary" is what the gateway alias resolves to today. "Alternatives" are equally valid choices with different cost/quality/latency trade-offs. "Why not the others" notes why something is not primary, so future revisions are informed rather than impulsive.
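The alias lookup described above can be sketched in Go as follows. This is a minimal illustration, not the gateway's actual code: the map name `aliases`, the function `resolveModel`, and the pass-through behavior for names that are not aliases are all assumptions. The values are the primary picks listed in this document.

```go
package main

import "fmt"

// aliases mirrors the primary picks recorded in this document.
// The real map lives in services/ai/gateway/internal/config/config.go;
// its exact name and shape are assumptions here.
var aliases = map[string]string{
	"reasoning":    "anthropic/claude-opus-4-7",
	"code-arch":    "anthropic/claude-opus-4-7",
	"code-fast":    "xai/grok-code-fast-1",
	"general":      "openai/gpt-5",
	"cheap":        "deepseek/deepseek-chat",
	"vision":       "openai/gpt-4o",
	"audio":        "openai/whisper-1",
	"embed":        "openai/text-embedding-3-large",
	"image":        "openai/dall-e-3",
	"search":       "xai/grok-3",
	"long-context": "google/gemini-2.5-pro",
	"local":        "ollama/qwen2.5:3b",
}

// resolveModel returns the concrete model for an alias.
// A name that is not a known alias is returned unchanged (assumed
// behavior), so a caller can still pin a specific provider/model.
func resolveModel(name string) string {
	if model, ok := aliases[name]; ok {
		return model
	}
	return name
}

func main() {
	fmt.Println(resolveModel("reasoning")) // alias: resolved to the current pick
	fmt.Println(resolveModel("openai/o3")) // not an alias: passed through
}
```

Because clients only ever send the alias, changing a pick means editing one map entry; no caller needs to change.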

1. Heavy reasoning · alias `reasoning`

Primary	`anthropic/claude-opus-4-7`
Alternatives	`openai/o3`, `xai/grok-4.20-0309-reasoning`, `deepseek/deepseek-reasoner`, `kimi/kimi-k2.6`
Why primary	Best raw reasoning quality for high-stakes architectural decisions, RFC analysis, ambiguous bugs. Worth the cost.
Why not OpenAI o3	Comparable quality; pick Opus when context > 100K or when you need strong code understanding. o3 wins on math-heavy reasoning.
Why not DeepSeek-R1	~95% as good in many cases at 1/20 the cost; promote to primary if the `k-bench` gap closes further.

2. Code (architecture/refactor) · alias `code-arch`

Primary	`anthropic/claude-opus-4-7`
Alternatives	`openai/o3`, `openai/gpt-5`, `xai/grok-4.3`
Why primary	Best at multi-file refactors, understanding cross-cutting concerns, suggesting policies. Same model as `reasoning`, with the same justification.

3. Code (boilerplate / fast iteration) · alias `code-fast`

Primary	`xai/grok-code-fast-1`
Alternatives	`mistral/codestral-latest`, `deepseek/deepseek-chat`, `openai/gpt-5-mini`
Why primary	Specifically tuned for code; sub-second latency; cheap. Pareto-optimal for "scaffold this CRUD endpoint" or "write 50 unit tests".
When to escalate	Logic gets non-trivial → switch to `code-arch`.

4. General-purpose conversational · alias `general`

Primary	`openai/gpt-5`
Alternatives	`anthropic/claude-sonnet-4-6`, `xai/grok-4.3`, `kimi/kimi-k2.6`
Why primary	Strong all-rounder; natively multimodal (images/audio); best ecosystem of tools/function-calling. Defaults that "just work".

5. Cheap commodity · alias `cheap`

Primary	`deepseek/deepseek-chat`
Alternatives	`google/gemini-2.0-flash`, `openai/gpt-4o-mini`, `mistral/mistral-small-latest`
Why primary	~$0.14/M input with cache hit, 10× cheaper than `gpt-4o-mini`, with comparable quality on common tasks (classification, summarization, simple Q&A).
When to switch to `gemini-2.0-flash`	Need free tier (1500 req/day, no billing) OR Google ecosystem features (grounding, native search).

6. Vision / multimodal · alias `vision`

Primary	`openai/gpt-4o`
Alternatives	`anthropic/claude-sonnet-4-6`, `mistral/pixtral-large-latest`, `kimi/moonshot-v1-128k-vision-preview`
Why primary	Best chart/diagram OCR, screenshot understanding, frame-by-frame video analysis (relevant for Koder Eye, Koru, dek workflows).

7. Audio / transcription · alias `audio`

Primary	`openai/whisper-1`
Alternatives	`koder-ai-voice` (local, post-VOICE-52 with CUDA backend)
Why primary	Mature, multilingual, accurate for Brazilian Portuguese.
Future	When VOICE-52 ships, default to local for cost savings (T4 GPU = $0/hour); keep `whisper-1` as fallback for high-quality batch jobs.
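One way the planned post-VOICE-52 selection could look, sketched in Go. The function `pickAudioModel` and both of its flags are hypothetical; in particular, how the gateway would learn that the local backend is ready, or that a job counts as a high-quality batch, is not specified in this document.

```go
package main

import "fmt"

const (
	hostedAudioModel = "openai/whisper-1"
	localAudioModel  = "koder-ai-voice"
)

// pickAudioModel chooses a transcription backend.
//
// localReady is a hypothetical flag: true once VOICE-52 has shipped and
// the local CUDA backend is usable. highQualityBatch is a hypothetical
// flag marking jobs that should stay on the hosted model.
func pickAudioModel(localReady, highQualityBatch bool) string {
	if localReady && !highQualityBatch {
		return localAudioModel // default to local for cost savings
	}
	return hostedAudioModel // today's primary, and the future fallback
}

func main() {
	fmt.Println(pickAudioModel(false, false)) // before VOICE-52
	fmt.Println(pickAudioModel(true, false))  // after VOICE-52, ordinary job
	fmt.Println(pickAudioModel(true, true))   // after VOICE-52, high-quality batch
}
```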

8. Embeddings · alias `embed`

Primary	`openai/text-embedding-3-large`
Alternatives	`mistral/mistral-embed`, `google/embedding-001`
Why primary	3072-dim, well-supported by every vector DB. Mature SDKs.

9. Image generation · alias `image`

Primary	`openai/dall-e-3`
Alternatives	`xai/grok-imagine-image`, `xai/grok-imagine-image-pro`, `stability/...` (when key added)
Why primary	Best instruction-following on text-in-image, branded layouts.

10. Search-augmented · alias `search`

Primary	`xai/grok-3`
Alternatives	`perplexity/sonar-pro` (when key added)
Why primary	Built-in real-time search, no extra setup. Already paid (xAI auto-reload active).
Why not Perplexity yet	Card declined 2026-05-06; pending payment fix. When paid, `sonar-pro` likely takes primary because it's purpose-built for citations.

11. Long context (>200K tokens) · alias `long-context`

Primary	`google/gemini-2.5-pro`
Alternatives	`anthropic/claude-sonnet-4-6` (~1M ctx), `kimi/moonshot-v1-128k`
Why primary	~2M token context window; cheapest per token at that scale; still on free tier today.
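A caller that knows its prompt size could switch aliases at the 200K-token boundary named in this section's heading. The sketch below is illustrative only: `aliasForPrompt` is a hypothetical helper, and it assumes the caller already has a token count (how tokens are counted is out of scope here).

```go
package main

import "fmt"

// longContextThreshold is the boundary from the section heading (>200K tokens).
const longContextThreshold = 200_000

// aliasForPrompt returns "long-context" when the prompt exceeds the
// threshold, and otherwise keeps the alias the caller asked for.
// promptTokens is assumed to be computed by the caller.
func aliasForPrompt(requested string, promptTokens int) string {
	if promptTokens > longContextThreshold {
		return "long-context"
	}
	return requested
}

func main() {
	fmt.Println(aliasForPrompt("general", 150_000)) // fits: keep requested alias
	fmt.Println(aliasForPrompt("general", 250_000)) // too large: switch alias
}
```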

12. Local zero-cost · alias `local`

Primary	`ollama/qwen2.5:3b`
Alternatives	future ollama models pulled on demand
Why primary	Already running on Tesla T4 GPU at ~172 tok/s. Free per-call. Use for: low-stakes classification, embeddings of bulk data, dev/staging defaults.

Tier shortcuts (legacy — back-compat)

These predate the use-case aliases. Kept so existing code doesn't break.

Alias	Resolves to	Notes
`fast`	`anthropic/claude-haiku-4-5-20251001`	cheap fast Anthropic
`smart`	`anthropic/claude-sonnet-4-6`	balanced Anthropic
`best`	`anthropic/claude-opus-4-7`	flagship Anthropic
`claude`	`anthropic/claude-sonnet-4-6`	"I want Claude"

Update protocol

When updating this registry, *also update the Aliases map in services/ai/gateway/internal/config/config.go* in the same commit. The gateway's binary needs a rebuild + redeploy for the alias change to take effect (the model name is still hard-coded there).

Trigger conditions for revision:

New flagship model from any provider → reassess primary in relevant categories
/k-bench results show alternative beats primary by ≥10% on quality OR is ≥2× cheaper at same quality
A provider deprecates the current primary model
A new use case emerges that doesn't fit any existing alias

Quarterly review minimum even if no triggers fire.
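The k-bench trigger above can be expressed as a small predicate. This Go sketch is an interpretation, not an existing tool: the `benchResult` fields, their units, and especially the 1% tolerance used to decide "same quality" are assumptions, since the document does not define them.

```go
package main

import "fmt"

// benchResult holds one model's k-bench outcome.
// Field names and units are assumptions for illustration.
type benchResult struct {
	Quality     float64 // quality score, higher is better
	CostPerMTok float64 // cost per million tokens, lower is better
}

const (
	minQualityGain = 0.10 // alternative beats primary by >= 10% on quality
	minCostFactor  = 2.0  // alternative is >= 2x cheaper
	sameQualityTol = 0.01 // assumed tolerance for "same quality"
)

// shouldReassess reports whether an alternative meets either k-bench
// trigger: a >= 10% quality lead, or >= 2x lower cost at the same quality.
func shouldReassess(primary, alt benchResult) bool {
	beatsOnQuality := alt.Quality >= primary.Quality*(1+minQualityGain)
	sameQuality := alt.Quality >= primary.Quality*(1-sameQualityTol)
	muchCheaper := alt.CostPerMTok*minCostFactor <= primary.CostPerMTok
	return beatsOnQuality || (sameQuality && muchCheaper)
}

func main() {
	primary := benchResult{Quality: 80, CostPerMTok: 10}
	fmt.Println(shouldReassess(primary, benchResult{Quality: 90, CostPerMTok: 10}))
	fmt.Println(shouldReassess(primary, benchResult{Quality: 80, CostPerMTok: 4}))
	fmt.Println(shouldReassess(primary, benchResult{Quality: 70, CostPerMTok: 1}))
}
```

A cheaper model with clearly lower quality does not fire the trigger; that case stays a human judgment call at the quarterly review.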

Future: auto-tuned routing

AIGW-31 (smart-router auto-select) plans to populate this dynamically from nightly /k-bench runs against a task_categories SQLite table (currently missing; see AIGW-38 sub-issue 3). When that lands, this file becomes the *fallback / human-readable explanation*, with the SQLite table being the runtime source of truth.
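The planned precedence (dynamic table first, this registry as fallback) might look like the Go sketch below. Everything here is hypothetical: the `router` type, its fields, and the use of an in-memory map as a stand-in for rows loaded from the task_categories table, whose schema is not described in this document.

```go
package main

import "fmt"

// router resolves aliases with the planned precedence.
type router struct {
	// dynamic stands in for picks loaded from the task_categories
	// SQLite table (schema unknown; a plain map is an assumption).
	dynamic map[string]string
	// static holds the picks from this registry, used as fallback.
	static map[string]string
}

// resolve prefers a non-empty dynamic pick and otherwise falls back to
// the static registry. The bool reports whether the alias is known at all.
func (r router) resolve(alias string) (string, bool) {
	if model, ok := r.dynamic[alias]; ok && model != "" {
		return model, true
	}
	model, ok := r.static[alias]
	return model, ok
}

func main() {
	r := router{
		dynamic: map[string]string{"cheap": "google/gemini-2.0-flash"},
		static: map[string]string{
			"cheap":     "deepseek/deepseek-chat",
			"reasoning": "anthropic/claude-opus-4-7",
		},
	}
	fmt.Println(r.resolve("cheap"))     // dynamic pick wins
	fmt.Println(r.resolve("reasoning")) // no dynamic row: static fallback
}
```

With `dynamic` left nil, which matches today's state where the table is missing, every lookup falls through to the static registry.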
