AI Memory — Long-term Agent Memory

  • Area: Intelligence
  • Path: services/ai/memory
  • Kind: Self-hosted episodic memory + semantic recall (per user/tenant)
  • Status: v0.0.2. #008 was rescoped and shipped on 2026-05-11. ServiceEmbedder now speaks the real services/ai/embed/v1/embed/text HTTP API (it was previously a stub returning ErrEmbedServiceUnavailable). [recall.backend] = "service" enables semantic recall over the embed sector; InlineEmbedder remains as the deterministic test fallback. The original BGE-local-ONNX path moved to #012 (gated by embed #009).

Role in the stack

memory is the foundation for persistent agents. Without a memory service, every product reinvents continuity (Kode CURRENT.md, Kortex indexing, agent ad-hoc state) and loses context across sessions. This sector fills that gap.

It is the Koder analog of Mem0/MemGPT/Letta and Anthropic's Memory Tool: self-hosted, tenant-isolated, and integrated with services/ai/embed for semantic recall and infra/data/kdb-{doc,vector} for storage.

Features (v1 target)

  • Episodic write/read API
  • Semantic recall (top-K + score)
  • Tenant isolation (handler-enforced)
  • Forgetting policies: TTL, decay, explicit erase, GDPR sweep
  • Pluggable embedding backend (inline ONNX or services/ai/embed)
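
To make the recall and isolation features concrete, here is a minimal sketch of tenant-scoped top-K recall. Everything in it is an assumption made for illustration: the Episode and Hit types, the Recall function, the use of cosine similarity as the score, and the in-process linear scan (which stands in for a real vector index such as kdb-vector). It is not the service's actual implementation.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Episode is a hypothetical stored memory with its embedding.
type Episode struct {
	ID     string
	Tenant string
	Vector []float32
}

// Hit pairs an episode ID with its similarity score.
type Hit struct {
	ID    string
	Score float64
}

// cosine returns the cosine similarity of a and b, or 0 when the
// vectors differ in length, are empty, or have zero norm.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		na += x * x
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Recall scores only the caller's tenant and returns at most k hits,
// best score first. Episodes of other tenants are never scored.
func Recall(store []Episode, tenant string, query []float32, k int) []Hit {
	hits := make([]Hit, 0, len(store))
	for _, ep := range store {
		if ep.Tenant != tenant {
			continue // isolation: skip foreign tenants before scoring
		}
		hits = append(hits, Hit{ID: ep.ID, Score: cosine(query, ep.Vector)})
	}
	sort.SliceStable(hits, func(i, j int) bool { return hits[i].Score > hits[j].Score })
	if k < 0 {
		k = 0
	}
	if len(hits) > k {
		hits = hits[:k]
	}
	return hits
}

func main() {
	store := []Episode{
		{ID: "a1", Tenant: "acme", Vector: []float32{1, 0}},
		{ID: "a2", Tenant: "acme", Vector: []float32{0, 1}},
		{ID: "b1", Tenant: "globex", Vector: []float32{1, 0}},
	}
	for _, h := range Recall(store, "acme", []float32{1, 0}, 2) {
		fmt.Printf("%s %.2f\n", h.ID, h.Score)
	}
}
```

Filtering by tenant before scoring, rather than after ranking, means a foreign episode can never occupy one of the K slots, which is the property a handler-enforced scope is meant to guarantee.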

Primary couplings

Consumer                 Relationship
services/ai/agents       Reads pre-prompt; writes post-tool-call
services/ai/kode         Replaces ad-hoc session continuity files
services/ai/runtime      Optional middleware read-through
services/ai/embed        Vector backbone
infra/data/kdb-doc       Episode metadata
infra/data/kdb-vector    Vector index

RFC and bootstrap

  • RFC: memory-RFC-001-foundations.kmd (Accepted 2026-05-09)
  • Bootstrap ticket: services/ai/backlog/done/129-memory-bootstrap.md
  • Implementation tickets:
    • done/: 001 (OpenAPI), 002 (skeleton), 003 (tenant scope + audit), 004 (embed integration), 005 (forgetting: TTL/decay/erase/GDPR), 006 (real JWT via engines/sdk/go/auth.JWKSValidator), 008 (rescoped: ServiceEmbedder HTTP client to services/ai/embed shipped 2026-05-11; BGE local ONNX moved to #012), 009 (decay auto-prune + quarantine), 010 (bench harness inline+memory)
    • pending/: 007 (kdb-doc + kdb-vector swap, blocked on kdb#710-#713), 011 (bench at 100K against kdb+BGE, blocked on 007+012), 012 (BGE local ONNX embedder, split from #008, blocked on services/ai/embed#009 ONNX runtime)
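
Tickets 005 and 009 cover TTL, decay, and auto-prune with quarantine. The following is a rough sketch of how such a policy could be evaluated, not the shipped logic. The exponential half-life formula, the MinStrength threshold, the Verdict names, and the Episode fields are all assumptions introduced for illustration; this document does not specify the actual decay function or quarantine rules.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// Episode carries only the fields this sketch needs (hypothetical shape).
type Episode struct {
	ID        string
	CreatedAt time.Time
	TTL       time.Duration // zero means "no TTL"
	Weight    float64       // initial strength, assumed to start at 1.0
}

// Verdict is the outcome of one policy evaluation.
type Verdict int

const (
	Keep       Verdict = iota
	Quarantine         // decayed below threshold; held back but recoverable
	Expire             // past its TTL
)

// Policy holds assumed decay parameters.
type Policy struct {
	HalfLife    time.Duration
	MinStrength float64
}

// Strength applies exponential decay: the weight halves every HalfLife.
func (p Policy) Strength(ep Episode, now time.Time) float64 {
	age := now.Sub(ep.CreatedAt)
	if age <= 0 || p.HalfLife <= 0 {
		return ep.Weight
	}
	return ep.Weight * math.Pow(0.5, float64(age)/float64(p.HalfLife))
}

// Evaluate checks the TTL first, then the decayed strength.
func (p Policy) Evaluate(ep Episode, now time.Time) Verdict {
	if ep.TTL > 0 && now.Sub(ep.CreatedAt) >= ep.TTL {
		return Expire
	}
	if p.Strength(ep, now) < p.MinStrength {
		return Quarantine
	}
	return Keep
}

// demoVerdicts evaluates three fixed episodes against a 24h half-life.
func demoVerdicts() []Verdict {
	now := time.Date(2026, 5, 11, 0, 0, 0, 0, time.UTC)
	p := Policy{HalfLife: 24 * time.Hour, MinStrength: 0.3}
	eps := []Episode{
		{ID: "fresh", CreatedAt: now.Add(-1 * time.Hour), Weight: 1},
		{ID: "faded", CreatedAt: now.Add(-48 * time.Hour), Weight: 1},
		{ID: "stale", CreatedAt: now.Add(-48 * time.Hour), TTL: 24 * time.Hour, Weight: 1},
	}
	out := make([]Verdict, len(eps))
	for i, ep := range eps {
		out[i] = p.Evaluate(ep, now)
	}
	return out
}

func main() {
	fmt.Println(demoVerdicts())
}
```

In this sketch a quarantined episode is a separate state from an expired one, so an auto-prune pass could hide weak memories from recall without deleting them immediately.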

Recent changes

  • 2026-05-11 (#008 rescope + ship): ServiceEmbedder was rewritten from a 3-line stub returning ErrEmbedServiceUnavailable into a full HTTP client of services/ai/embed/v1/embed/text (~165 lines). It uses the options pattern: WithEmbedderBearer, WithEmbedderIntent, WithEmbedderModel, WithEmbedderTimeout. Dimension verification on every response prevents silent contract drift. [recall.backend] = "service" now consumes the new config keys: embed_url, embed_token, embed_model, embed_dim, embed_timeout_ms. 10 tests, SE1-SE10, cover: happy path, 503, 401, bearer, dim mismatch, count mismatch, empty endpoint, empty inputs, model override, timeout. InlineEmbedder is retained as the test/bench fallback. This closes the AI cross-sector loop: memory now joins cache #003 semantic + embed #007 HTTP cache as a consumer of an AI sibling sector. The original BGE-local-ONNX path was split off as #012 (blocked by embed #009).
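
As a hedged illustration of the options pattern and the per-response dimension check described above, the sketch below reconstructs the general shape such a client might take. Only the four With* option names and the stub-to-client story come from this document. The struct fields, the constructor signature, the checkVectors helper, the JSON request and response shapes, the 5-second default timeout, and the localhost URL are assumptions, and the real ServiceEmbedder may differ on every one of them.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// ServiceEmbedder is an illustrative reconstruction, not the shipped type.
type ServiceEmbedder struct {
	endpoint, bearer, intent, model string
	dim                             int
	client                          *http.Client
}

// Option configures a ServiceEmbedder at construction time.
type Option func(*ServiceEmbedder)

func WithEmbedderBearer(t string) Option { return func(e *ServiceEmbedder) { e.bearer = t } }
func WithEmbedderIntent(i string) Option { return func(e *ServiceEmbedder) { e.intent = i } }
func WithEmbedderModel(m string) Option  { return func(e *ServiceEmbedder) { e.model = m } }
func WithEmbedderTimeout(d time.Duration) Option {
	return func(e *ServiceEmbedder) { e.client.Timeout = d }
}

// NewServiceEmbedder rejects an empty endpoint, then applies options in order.
func NewServiceEmbedder(endpoint string, dim int, opts ...Option) (*ServiceEmbedder, error) {
	if endpoint == "" {
		return nil, errors.New("embedder: empty endpoint")
	}
	e := &ServiceEmbedder{endpoint: endpoint, dim: dim, client: &http.Client{Timeout: 5 * time.Second}}
	for _, opt := range opts {
		opt(e)
	}
	return e, nil
}

// checkVectors enforces the count and dimension contract on a response.
func checkVectors(vecs [][]float32, wantCount, wantDim int) error {
	if len(vecs) != wantCount {
		return fmt.Errorf("embedder: got %d vectors, want %d", len(vecs), wantCount)
	}
	for i, v := range vecs {
		if len(v) != wantDim {
			return fmt.Errorf("embedder: vector %d has dim %d, want %d", i, len(v), wantDim)
		}
	}
	return nil
}

// embedRequest is an assumed wire format; the real schema is not shown here.
type embedRequest struct {
	Model  string   `json:"model,omitempty"`
	Intent string   `json:"intent,omitempty"`
	Inputs []string `json:"inputs"`
}

// Embed posts the inputs and returns one verified vector per input.
func (e *ServiceEmbedder) Embed(ctx context.Context, inputs []string) ([][]float32, error) {
	if len(inputs) == 0 {
		return nil, nil // assumption: empty input is a no-op, not an error
	}
	body, err := json.Marshal(embedRequest{Model: e.model, Intent: e.intent, Inputs: inputs})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, e.endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if e.bearer != "" {
		req.Header.Set("Authorization", "Bearer "+e.bearer)
	}
	resp, err := e.client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("embedder: unexpected status %d", resp.StatusCode)
	}
	var out struct {
		Vectors [][]float32 `json:"vectors"` // assumed response field
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	if err := checkVectors(out.Vectors, len(inputs), e.dim); err != nil {
		return nil, err
	}
	return out.Vectors, nil
}

func main() {
	e, err := NewServiceEmbedder("http://127.0.0.1:8080/v1/embed/text", 384, WithEmbedderBearer("example-token"))
	fmt.Println(e != nil, err)
}
```

Checking the count and dimension on every response, instead of once at startup, is what lets a client like this fail loudly if the embed service later switches models behind the same URL.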

Self-hosted-first analysis (5 gates)

Gate                           Status   Notes
G1 Feature parity              partial  Episodic + semantic + 4 forgetting policies (TTL/decay/erase/GDPR) + decay auto-prune with quarantine; reflection not yet
G2 Performance                 partial  Inline+memory @ 10K: p50 5.3 ms, p99 10.5 ms (≈10× under target); 100K + kdb-vector + BGE pending 007+012
G3 Stability                   pending  Pre-MVP
G4 Capability                  partial  Episodic + semantic + forgetting (incl. auto-prune); no fancy reflection yet
G5 Critical-path readiness     pending  Pre-MVP; agents/kode can adopt once v1 ships

Performance baseline

Latest bench numbers (and per-metric trend) live in registries/perf-baseline.md. Re-run via make bench-full from services/ai/memory/backend/.

Source: ../home/koder/dev/koder/meta/docs/stack/modules/ai-memory.md