AI Workflow — Agent DAG / State Machine

  • Area: Intelligence
  • Path: services/ai/workflow
  • Kind: DAG / state-machine orchestration for multi-agent / multi-step AI flows
  • Status: v0.0.1, sector bootstrapping (2026-05-09)

Role in the stack

workflow is the durable execution layer for AI flows. agents/ orchestrates a single agent loop; real production AI workloads need more: multi-step pipelines that route, retry, branch, pause for humans, and survive crashes. Today every team builds this ad hoc with brittle state-on-disk hacks, with no observability and no resume-from-failure. This sector consolidates the capability.

It is the Koder analog of LangGraph, Inngest, Temporal AI, and Restate. It is distinct from services/foundation/bpm: bpm handles BPMN-style, human- and form-driven business processes; workflow handles agent-heavy execution where the actors are LLMs, tools, and sandboxed code. Cross-sector composition is allowed.

Boundary vs neighbors

  • services/foundation/bpm is the business-process sibling (humans + forms + approvals); workflow is the AI-execution sibling.
  • services/ai/agents is the single-agent loop — workflow steps invoke agents as actors.
  • services/ai/sandbox, services/ai/tools, services/ai/runtime, services/ai/gateway are step backends (code, tool, model calls).
  • services/ai/trace receives per-run + per-step spans.
  • infra/data/kdb-doc (state) and infra/data/kdb-blob (large outputs) are storage layers.

Features (v1 target)

  • YAML wire-format workflow definitions + Go typed step SDK
  • 9 step kinds: llm, tool, code, agent, subflow, human, branch, parallel, aggregate
  • CEL expressions for conditional edges
  • Durable runs (kdb-doc state, kdb-blob payloads, lease-based worker assignment)
  • Crash recovery with idempotency keys
  • Per-step retry policies (exponential backoff, max attempts/duration)
  • Human-in-the-loop pauses with signal-based resume + Koder ID notification routing
  • Fan-out (parallel/branch) + fan-in (aggregate) + sub-workflow composition
  • kworkflow CLI (lint, render DOT, trigger)
  • Run-monitoring UI at workflow.koder.dev (graph + step inspect + signal form)
  • Per-tenant concurrent-step quota
  • Trace span per run + child span per step attempt

Primary couplings

| Producer | Relationship |
|---|---|
| services/ai/agents | Step actor inside workflows |
| services/ai/sandbox | Code-step backend |
| services/ai/tools | Tool-step schema lookup |
| services/ai/runtime, services/ai/gateway | LLM-step backends |

| Consumer | Relationship |
|---|---|
| services/ai/kode | Multi-step conversation flows |
| products/dev/kortex | Code-review-then-apply |
| Product backends | Domain workflows |
| services/ai/dataset, services/ai/training | Pipeline orchestration |

RFC and bootstrap

  • RFC: workflow-RFC-001-foundations.kmd (Accepted, 2026-05-09)
  • Bootstrap ticket: services/ai/backlog/done/136-workflow-bootstrap.md
  • Implementation tickets: services/ai/workflow/backlog/pending/{001..005}

Self-hosted-first analysis (5 gates)

| Gate | Status | Notes |
|---|---|---|
| G1 Feature parity | pending | YAML DSL + 9 step kinds covers LangGraph/Temporal AI surface |
| G2 Performance | pending | Targets: < 50 ms step dispatch, > 1k concurrent runs per cluster |
| G3 Stability | pending | Pre-MVP; durability is core requirement |
| G4 Capability | pending | Full BPMN out of scope (delegated to foundation/bpm) |
| G5 Critical-path readiness | pending | Unblocks complex agent + multi-step product flows |

Source: ../home/koder/dev/koder/meta/docs/stack/modules/ai-workflow.md