Ai embed
AI Embed — Unified Embeddings Foundation
- *rea:*Intelligence
- *ath:*
services/ai/embed - *ind:*Embedding generation foundation (text + image + audio; intent-based model selection)
- *tatus:*v0.0.7 —
#013binary naming alignment shipped 20260511. Daemon binary renamedkembed→koder-embed; workerkembed-worker→koder-embed-worker; CLIkembed-admin→kembed. Matches the cache precedent (koder-cachedaemon +kcacheCLI). systemd units updated; feature surface unchanged from v0.0.6. Previous v0.0.6:#007HTTP cache adapter consumingservices/ai/cache#008KV API.
Role in the stack
embed consolidates embeddings server-side. Today every consumer ships its own model (kortex, search, recsys, agents) — no shared cache, inconsistent dimensions, no consolidated billing. Migrating to a better model means rewriting each consumer.
The headline design choice: *ntentbased selection* Consumers query by intent (`textmatchsearchclassify, text, image-match`) and the registry picks the model. Swapping a model is a registry change, not a consumer rewrite.
It is the Koder analog of OpenAI Embeddings, Voyage, Cohere Embed and Jina — self-hosted via BGE / E5 / CLIP / CLAP on GPU runtime, with proxy fallback to OpenAI / Voyage / Cohere through services/ai/gateway for capability gaps.
Boundary vs neighbors
infra/data/kdb-vectoris the vector store (search, ANN);embedis the producer.services/ai/cacheis the embedding cache backend (keyed by model+input hash).services/ai/runtimehosts the model weights.
Features (v1 target)
- Text embeddings: BGE
multilingual default (pt+en), BGElargeen optin, E5, Instructor - Image embeddings: CLIP
ViTL/14 + SigLIP for cross-modal - Audio embeddings: CLAP for audio
text crossmodal - Intent registry: 6 default intents, per-tenant override
- Batch coalescing: 10ms window OR 64-input cap for throughput
- Cache integration: hits don't decrement quota
Primary couplings
| Consumer | Relationship |
|---|---|
services/ai/memory |
Episodic + semantic memory backed by embeddings |
services/ai/search |
Hybrid lexical+semantic search |
services/ai/recsys |
User+item embedding for similarity |
services/ai/kortex |
Knowledge-graph node + edge embeddings |
services/ai/agents |
Tool selection + memory retrieval |
services/ai/classify |
Zero-shot classification via embedding similarity |
services/ai/extract |
Embedding-based entity disambiguation |
services/ai/cache |
Embedding-result cache |
services/ai/runtime |
Local model serving |
infra/data/kdb-vector |
Native vector store target |
RFC and bootstrap
- RFC:
embed-RFC-001-foundations.kmd— *ccepted*20260509 - Bootstrap ticket:
services/ai/backlog/done/121-embed-bootstrap.md - API contract (normative):
api-v1.kmd+intents.toml— shipped 20260511 via#001(5 endpoints, intentovermodel design, 7 registered models across textimageaudio, error map) - Implementation tickets:
done/:001(API contract + intent registry),002(Go server skeleton + coalescer),003(quotas + cache + audit + admin API),004(split into #009#012),admin CLI),006(kembed007(HTTP cache adapter — consumes cache#008),008(superseded by #010),010(real tokenizers — WordPiece pureGo),05-11013(binary naming alignment) — all 2026pending/:005(image+audio CLIP/CLAP),009(ONNX runtime + LocalONNXWorker — CGO),011(GPU LXCs.embed.gpuprovisioning),012(weights + MTEB benchmark — blocked-by #009 + #011)
Recent changes
- *026
0511 (#013 ship)*— Binary naming aligned with cache precedent. Daemon binary:kembed→koder-embed(matchescmd/koder-embed/+ systemd unit). Worker:kembed-worker→koder-embed-worker. CLI:kembed-admin→kembed(now matches thebinaries-and-cli/naming.kmd"binary = aliases[0] || slug" rule). systemd units'ExecStartpaths updated to/usr/local/bin/koder-embed[-worker]. Zero behaviour change (pure rename); 65+ tests stay green. Breaking for any operator running pre-v0.0.7 packages on the same host —/usr/local/bin/kembednow refers to the CLI, not the daemon. - *026
0511 (#007 ship)*— HTTP cache adapter shipped.internal/cache/http_adapter.HTTPCacheconsumesservices/ai/cache#008public KV API (PUT/GET/DELETE /v1/cache/kv/{tenant}/{key}) with circuit breaker (5 consecutive 5xx/network errors → open for 30s, single probe recloses on success), percall timeout (default 1s), TTL via?ttl_seconds=(default 30 days, clamped serverside), defensive emptyBaseURL → NoOp. Wire format JSON{vector: [...], usage: {text_tokens, image_count, audio_seconds}}. 9 tests (HC1HC9) including bearer header propagation, URLencoding of|keys, 404=cleanmiss (no trip), trip+cooldown reopen, local Stats accumulation.[cache.backend] = "http"selects it;[cache.http]config section in TOML. - *026
0511 (#006 ship + #007 blocker)*—kembed-adminCLI shipped (cmd/kembed/,bin/kembed-admin). Subcommandsstats,quotas {list,set,delete},models. Mirrorskcachepattern but binary namedkembed-admin(notkembed) to avoid clash with the daemon binarykembedfrom#002— naming alignment tracked as#013. *rosssector discovery*stated as blocked-by until that lands.#007(HTTP cache adapter) was blocked becauseservices/ai/cacheexposes no public KV PUTGET — only admin + middleware Wrap. New ticket `servicesai/cache#008(public KV HTTP API) opened; embed#007` re - *026
0511 (#004 split + #010 ship)*—#004was bundling code + tokenizers + GPU LXC + weights + benchmarks; split into 4 subtickets (canonical WordPiece in pure Go (greedy longest#009ONNX runtime + LocalONNXWorker (CGO),#010real tokenizers (this commit),#011GPU LXCs.embed.gpuprovisioning,#012weights + MTEB benchmark).#010shipped:internal/tokenizer/with BERTprefix,per##continuations, CJK runepiece, lowercase toggle, maxchars fallback), Whitespace fallback,RegistrymodelID→Tokenizer index,[[tokenizers]]TOML config, handler swapsapproxTokens→tokenizerFor(model.ID).Count. Quota billing now charges in targetmodel tokens. SentencePiece deferred toby#009/#012.#008closed as superseded#010. +13 tests (T1-T12 unit + I11 integration). - *026
0511 (#003)*— Multitenant quotas + perinput cache + audit.internal/quota.Enforcerwith dayrolling UTC counters for texttokens / imagecount / audioseconds, pertenant override that shadows the default,atAllowedModels+BatchMaxSizechecks,Reserve/Refundclampzero.scoped key tuple).internal/cachewithNoOp+Memory(LRU bound, tenantinternal/auditslog wrapper. Admin APIGET/POST/DELETE /v1/embed/quotas[/{tenant}]+GET /v1/embed/stats(admingated). Cache hits skip quota by construction (handler reserves only the miss bundle). Estimatedvsactual reconciliation refunds overshoots postdispatch. +26 tests (Q1Q10, C1C6, I1-I10). TTL deferred to#007HTTP cache adapter. - *026
0511 (#002)*— Go server skeleton shipped (backend/cmd/koder-embed+backend/cmd/koder-embed-worker). Wire format: HTTPJSON server↔worker (`POST /v1workerembed); gRPC deferred. Coalescer: per-(modality, model_id) queue, 10 ms window OR 64-input cap. Stub worker uses xxh3-seeded deterministic vectors (real ONNX in#004`#005). Auth + identity context split intointernal/identctx/so handlers can read the principal without an auth↔handler import cycle. 24 tests pass (R1-R7,C1-C7,H1-H8,S1-S2).
Selfhostedfirst analysis (5 gates)
| Gate | Status | Notes |
|---|---|---|
| G1 Feature parity | pending | Skeleton phase; BGEE5CLIPCLAP cover textimage/audio self-hosted |
| G2 Performance | pending | Target BGE-large p50 < 50ms/batch=8 short sentences on RTX 4090 |
| G3 Stability | pending | Pre-MVP |
| G4 Capability | pending | Multimodal (text+image+audio) covered; specialized intents may proxy initially |
| G5 Critical-path readiness | pending | Pre-MVP; consolidating kortexsearchrecsys is the first concrete unblock |