AI Workflow — Agent DAG / State Machine
- **Area:** Intelligence
- **Path:** services/ai/workflow
- **Kind:** DAG / state machine orchestration for multi-agent / multi-step AI flows
- **Status:** v0.0.1, sector bootstrapping (2026-05-09)
Role in the stack
workflow is the durable execution layer for AI flows. agents/ orchestrates a single agent loop; real production AI workloads need more: multi-step pipelines that route, retry, branch, pause for humans, and survive crashes. Today every team builds this ad hoc with brittle state-on-disk hacks, with no observability and no resume-from-failure. This sector consolidates the capability.
It is the Koder analog of LangGraph, Inngest, Temporal AI, and Restate. **Distinct from services/foundation/bpm:** bpm handles BPMN-style human/form-driven business processes; workflow handles agent-heavy execution where the actors are LLMs, tools, and sandboxed code. Cross-sector composition is allowed.
Boundary vs neighbors
- services/foundation/bpm is the business-process sibling (humans + forms + approvals); workflow is the AI-execution sibling.
- services/ai/agents is the single-agent loop; workflow steps invoke agents as actors.
- services/ai/sandbox, services/ai/tools, services/ai/runtime, services/ai/gateway are step backends (code, tool, model calls).
- services/ai/trace receives per-run + per-step spans.
- infra/data/kdb-doc (state) and infra/data/kdb-blob (large outputs) are storage layers.
Features (v1 target)
- YAML wire-format workflow definitions + Go typed step SDK
- 9 step kinds: llm, tool, code, agent, subflow, human, branch, parallel, aggregate
- CEL expressions for conditional edges
- Durable runs (kdb-doc state, kdb-blob payloads, lease-based worker assignment)
- Crash recovery with idempotency keys
- Per-step retry policies (exponential backoff, max attempts/duration)
- Human-in-the-loop pauses with signal-based resume + Koder ID notification routing
- Fan-out (parallel/branch) + fan-in (aggregate) + sub-workflow composition
- kworkflow CLI (lint, render DOT, trigger)
- Run-monitoring UI at workflow.koder.dev (graph + step inspect + signal form)
- Per-tenant concurrent-step quota
- Trace span per run + child span per step attempt
Primary couplings
| Producer | Relationship |
|---|---|
| services/ai/agents | Step actor inside workflows |
| services/ai/sandbox | Code-step backend |
| services/ai/tools | Tool-step schema lookup |
| services/ai/runtime, services/ai/gateway | LLM-step backends |

| Consumer | Relationship |
|---|---|
| services/ai/kode | Multi-step conversation flows |
| products/dev/kortex | Code-review-then-apply |
| Product backends | Domain workflows |
| services/ai/dataset, services/ai/training | Pipeline orchestration |
RFC and bootstrap
- RFC: workflow-RFC-001-foundations.kmd (**Accepted** 2026-05-09)
- Bootstrap ticket: services/ai/backlog/done/136-workflow-bootstrap.md
- Implementation tickets: services/ai/workflow/backlog/pending/{001..005}
Self-hosted-first analysis (5 gates)
| Gate | Status | Notes |
|---|---|---|
| G1 Feature parity | pending | YAML DSL + 9 step kinds covers LangGraph/Temporal AI surface |
| G2 Performance | pending | Targets: < 50 ms step dispatch, > 1k concurrent runs per cluster |
| G3 Stability | pending | Pre-MVP; durability is a core requirement |
| G4 Capability | pending | Full BPMN out of scope (delegated to foundation/bpm) |
| G5 Critical-path readiness | pending | Unblocks complex agent + multi-step product flows |