Trace (AI Tracing/Observability): foundations

accepted

Trace (AI Tracing/Observability) — foundations RFC

Status

*Accepted*: ratified 2026-05-09 (the same day it was opened) as part of the services/ai bootstrap pilot wave. Implementation has started in `services/ai/trace/`; tickets are in `services/ai/trace/backlog/pending/{001..005}`.

Summary

Runtime tracing of chains/agents, analogous to LangSmith/Langfuse/Helicone traces. `eval/` handles offline evaluation; `trace/` handles runtime.

Motivation

Complex agents make multiple calls (LLM + tool + LLM …). Without tracing, debugging is impossible, the root cause of latency is invisible, and regressions go unnoticed. Runtime tracing is the industry standard today.

Scope

In

  • Span ingestion via OTel
  • Minimal trace UI
  • Query API
  • Integration with observe (Grafana)

Out (yet)

  • Full APM (within the scope of observe)
  • Generic distributed tracing

Initial design

Surfaces

  • backend/ — OTel-compatible Go API
  • app/web — minimal trace UI

Key APIs

  • POST /v1/traces — ingest
  • GET /v1/traces/{id} — fetch
  • GET /v1/traces?filter= — query
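As a rough illustration of the three routes above, the following sketch wires them into a Go `net/http` mux backed by an in-memory map. The `Trace` JSON shape, the `Store` type, and the reading of `filter` as an exact service-name match are assumptions made for this example; the RFC calls for OTel-compatible ingestion and does not define the payload or the filter syntax.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
	"sync"
)

// Trace is a hypothetical, simplified payload (not the OTel wire format).
type Trace struct {
	ID      string `json:"id"`
	Service string `json:"service"`
}

// Store is an in-memory stand-in for the real span storage.
type Store struct {
	mu     sync.RWMutex
	traces map[string]Trace
}

func NewStore() *Store { return &Store{traces: make(map[string]Trace)} }

func (s *Store) Put(t Trace) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.traces[t.ID] = t
}

func (s *Store) Get(id string) (Trace, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	t, ok := s.traces[id]
	return t, ok
}

// Query returns traces whose Service equals filter; an empty filter matches all.
func (s *Store) Query(filter string) []Trace {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := []Trace{}
	for _, t := range s.traces {
		if filter == "" || t.Service == filter {
			out = append(out, t)
		}
	}
	return out
}

// NewMux registers the ingest, fetch, and query routes.
func NewMux(s *Store) *http.ServeMux {
	mux := http.NewServeMux()
	// POST /v1/traces (ingest) and GET /v1/traces?filter= (query).
	mux.HandleFunc("/v1/traces", func(w http.ResponseWriter, r *http.Request) {
		switch r.Method {
		case http.MethodPost:
			var t Trace
			if err := json.NewDecoder(r.Body).Decode(&t); err != nil || t.ID == "" {
				http.Error(w, "invalid trace", http.StatusBadRequest)
				return
			}
			s.Put(t)
			w.WriteHeader(http.StatusAccepted)
		case http.MethodGet:
			json.NewEncoder(w).Encode(s.Query(r.URL.Query().Get("filter")))
		default:
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		}
	})
	// GET /v1/traces/{id} (fetch).
	mux.HandleFunc("/v1/traces/", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		t, ok := s.Get(strings.TrimPrefix(r.URL.Path, "/v1/traces/"))
		if !ok {
			http.NotFound(w, r)
			return
		}
		json.NewEncoder(w).Encode(t)
	})
	return mux
}

func main() {
	s := NewStore()
	s.Put(Trace{ID: "t1", Service: "agents"})
	fmt.Println("traces for agents:", len(s.Query("agents")))
	_ = NewMux(s) // pass to http.ListenAndServe to actually serve requests
}
```

Keeping storage behind a small `Store` type means the in-memory map could later be swapped for whichever backend the storage question settles on without touching the handlers.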

Dependencies

  • infra/observe — Grafana/Tempo backend
  • infra/data/kdb-time — span storage
  • services/ai/prompt — link span → prompt version
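One way to picture the span → prompt version link is as an attribute carried on each LLM span. The sketch below assumes a simplified `Span` struct and a hypothetical attribute key `prompt.version`; neither is defined by this RFC, and the sample span names and version strings are made up for illustration. It also shows how such spans could answer the latency question raised in Motivation by picking the slowest child span of a trace.

```go
package main

import (
	"fmt"
	"time"
)

// promptVersionKey is a hypothetical attribute name, not one defined by the RFC.
const promptVersionKey = "prompt.version"

// Span is a simplified stand-in for an OTel span.
type Span struct {
	SpanID   string
	ParentID string // empty for the root span
	Name     string
	Duration time.Duration
	Attrs    map[string]string
}

// PromptVersion returns the prompt version a span is linked to, if any.
func PromptVersion(s Span) (string, bool) {
	v, ok := s.Attrs[promptVersionKey] // reading a nil map is safe in Go
	return v, ok
}

// SlowestChild returns the longest-running non-root span of a trace.
func SlowestChild(spans []Span) (Span, bool) {
	var best Span
	found := false
	for _, s := range spans {
		if s.ParentID == "" {
			continue // skip the root, which always spans the whole run
		}
		if !found || s.Duration > best.Duration {
			best, found = s, true
		}
	}
	return best, found
}

// sampleTrace builds an illustrative LLM + tool + LLM agent run.
func sampleTrace() []Span {
	return []Span{
		{SpanID: "1", Name: "agent.run", Duration: 900 * time.Millisecond},
		{SpanID: "2", ParentID: "1", Name: "llm.plan", Duration: 300 * time.Millisecond,
			Attrs: map[string]string{promptVersionKey: "planner@v3"}},
		{SpanID: "3", ParentID: "1", Name: "tool.search", Duration: 450 * time.Millisecond},
		{SpanID: "4", ParentID: "1", Name: "llm.answer", Duration: 120 * time.Millisecond,
			Attrs: map[string]string{promptVersionKey: "answer@v7"}},
	}
}

func main() {
	spans := sampleTrace()
	if s, ok := SlowestChild(spans); ok {
		fmt.Printf("slowest: %s (%s)\n", s.Name, s.Duration)
	}
	for _, s := range spans {
		if v, ok := PromptVersion(s); ok {
			fmt.Printf("%s used prompt %s\n", s.Name, v)
		}
	}
}
```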

Relation to existing sectors

  • Runtime counterpart of eval/ (offline)
  • Cross-cutting: consumed by all AI sectors

Self-hosted-first analysis (5 gates)

  • *1 Feature parity*: zero
  • *2 Performance*: N/A
  • *3 Stability*: N/A
  • *4 Capability*: OTel + Tempo open
  • *5 Critical-path readiness*: blocks debugging of agents/kode in production

Open questions

  • Q1: Storage. Use Tempo directly, or put it behind a wrapper?
  • Q2: Sample rate default?
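Whatever default Q2 settles on, the rate would typically be applied per trace rather than per span, so that a trace is either kept whole or dropped whole. The sketch below shows one common way to do that: hash the trace ID and compare it against the rate, similar in spirit to the ratio-based samplers found in OpenTelemetry SDKs. The function name `ShouldSample` and the choice of FNV-1a are assumptions for illustration, not decisions made by this RFC.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
)

// ShouldSample reports whether a trace is kept at the given rate (0.0 to 1.0).
// The decision depends only on the trace ID, so every span of a trace gets
// the same answer, and a trace kept at a low rate is also kept at any higher rate.
func ShouldSample(traceID string, rate float64) bool {
	if rate <= 0 {
		return false
	}
	if rate >= 1 {
		return true
	}
	h := fnv.New64a()
	h.Write([]byte(traceID))
	// Map the 64-bit hash onto [0, 1) by comparing against rate * 2^64.
	return float64(h.Sum64()) < rate*float64(math.MaxUint64)
}

func main() {
	ids := []string{"trace-a", "trace-b", "trace-c", "trace-d"}
	for _, id := range ids {
		fmt.Printf("%s kept at 10%%: %v, at 50%%: %v\n",
			id, ShouldSample(id, 0.1), ShouldSample(id, 0.5))
	}
}
```

Because the decision is a pure function of the trace ID, independent services can reach the same keep/drop verdict without coordinating, which matters if ingestion is ever split across processes.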

Next steps

  1. Ratify this RFC (one round of comments).
  2. Create the sector directory services/ai/trace/ with koder.toml, README.md, and a skeleton.
  3. Open implementation tickets in services/ai/trace/backlog/pending/.
  4. Register in meta/docs/stack/registries/self-hosted-pairs.md if it replaces an external service.

Source: ../home/koder/dev/koder/meta/docs/stack/rfcs/trace-RFC-001-foundations.kmd