Id
Koder ID
- **Area:** Foundation
- **Path:** `services/foundation/id/engine` (canonical; legacy `platform/` symlinked per RFC-003)
- **Kind:** Identity provider (OIDC / OAuth2 / SAML), custom Go microservice
- **Version:** 0.9.3
- **Status:** In production at `id.koder.dev` since 2026-04-09
Role in the stack
Koder ID is the centralized identity layer for every Koder product. Every product that needs authentication redirects to https://id.koder.dev/oauth/v2/authorize, gets an OIDC token back, and uses it to authorize the user. Products never implement their own login flow — they consume tokens minted by ID.
The implementation is a from-scratch Go IAM (not Zitadel). It targets hyperscale tenant density on the kdb-next substrate and is the production identity provider. The codebase lives in `services/foundation/id/platform/`.
Primary couplings
| Consumer | Relationship |
|---|---|
| `foundation/flow` | Uses Koder ID as OIDC provider for Koder Flow git forge login |
| `foundation/kompass` | Organizations and tenants scoped via Koder ID |
| `apps/pass` / `foundation/pass` | Companion app for passkeys and identity management |
| `products/horizontal/chat/engine` | `dev.koder.chat` OIDC client; exchanges Koder ID tokens for Chat sessions |
| `products/vertical/lex/app` | `dev.koder.lex` OIDC client; PKCE flow for legal app auth |
| Every Koder SaaS product | OIDC client; redirects to ID for login |
Auth Flow — Handle Lookup
`CreateFlow` resolves the login identifier in two steps: first by email (`GetUserByEmail`), then by handle/username (`GetUserByHandle`) if the email lookup fails. Both strategies produce the same `flow.UserID`. If both lookups fail, the flow is still created, with an empty `UserID`, and fails generically at the password step (timing-safe: existence is never revealed to the caller).
The by_handle secondary index on identity_users (IndexID 2) was added alongside the existing by_email index (IndexID 1). The gRPC method GetUserByHandle was added to IdentityService.
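The two-step resolution described above can be sketched as follows. This is a minimal illustration, not the engine's code: the `UserLookup` interface, the `ErrNotFound` sentinel, the `ResolveUserID` helper, and the in-memory `memLookup` are all hypothetical stand-ins; only the method names `GetUserByEmail` and `GetUserByHandle` come from the text.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel; the real repository error is not shown here.
var ErrNotFound = errors.New("user not found")

// UserLookup is an assumed, simplified view of the two identity lookups.
type UserLookup interface {
	GetUserByEmail(email string) (userID string, err error)
	GetUserByHandle(handle string) (userID string, err error)
}

// ResolveUserID tries the email index first, then the handle index.
// It returns "" when both lookups fail, so the caller can still create the
// flow and reject generically at the password step.
func ResolveUserID(l UserLookup, identifier string) string {
	if id, err := l.GetUserByEmail(identifier); err == nil {
		return id
	}
	if id, err := l.GetUserByHandle(identifier); err == nil {
		return id
	}
	return ""
}

// memLookup is an in-memory stand-in for the by_email and by_handle indexes.
type memLookup struct {
	byEmail  map[string]string
	byHandle map[string]string
}

func (m memLookup) GetUserByEmail(email string) (string, error) {
	if id, ok := m.byEmail[email]; ok {
		return id, nil
	}
	return "", ErrNotFound
}

func (m memLookup) GetUserByHandle(handle string) (string, error) {
	if id, ok := m.byHandle[handle]; ok {
		return id, nil
	}
	return "", ErrNotFound
}

func main() {
	l := memLookup{
		byEmail:  map[string]string{"ana@example.com": "u1"},
		byHandle: map[string]string{"ana": "u1"},
	}
	fmt.Println(ResolveUserID(l, "ana@example.com")) // found via email
	fmt.Println(ResolveUserID(l, "ana"))             // found via handle
	fmt.Printf("%q\n", ResolveUserID(l, "nobody"))   // empty: flow proceeds and fails later
}
```

Returning an empty ID instead of an error keeps the caller on the same code path for known and unknown identifiers, which is what makes the generic failure at the password step possible.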
Interfaces
- OIDC Discovery: `https://id.koder.dev/.well-known/openid-configuration`; `claims_supported` includes `tenant` (added 2026-05-18, engine#069) for RP-side tenant validation
- Authorization: `GET /oauth/v2/authorize`
- Token: `POST /oauth/v2/token`; the id_token carries a `tenant` claim matching `AccessTokenClaims.TenantID`
- UserInfo: `GET /api/v1/me`; returns `sub`, `email`, `full_name`, `preferred_username`
- Admin gRPC: `RegisterClientRequest` accepts optional `desired_client_id` (field 7) for first-party clients with stable reverse-domain IDs
- Migration endpoint: `POST /v1/admin/migrate/oauth-client` (requires `KODER_ID_MIGRATION_MODE=true`); registers clients with specific IDs
- Admin REST + CLI surface fully documented in `engine/docs/rfcs/RFC-007-admin-service-and-cli.md` (revised 2026-05-18 to reflect shipped reality from engine#073)
Self-conformance to oauth-flow.kmd (R1.E1)
Koder ID's own administration UIs (`engine/account-ui/`, `engine/admin-ui/`) ride a cookie-session direct from the engine; they do NOT perform OAuth round-trips against the same issuer. This is the **R1.E1 exception** ratified 2026-05-18 in `specs/auth/oauth-flow.kmd` and tracked in `registries/koder-id-auth-coverage.md` as SKIP (R1.E1). External RPs (Flow, Kall, Dek, …) remain subject to the full R1+R6 + T-suite.
GeoIP enrichment (tickets 075/083/084)
`pkg/geoip` ships a `Resolver` interface with two implementations: `NoOpResolver` (private/unknown short-circuit) and `MaxMindResolver` (reads `GeoLite2-City.mmdb` via `github.com/oschwald/maxminddb-golang`).
Both auth and sso services consume `pkg/geoip`, which was moved out of `services/auth/internal/geoip` into the shared package in #084 to anticipate the ASN follow-ups (#077/#085) and other future consumers.
Wiring: both `services/auth/cmd/main.go` and `services/sso/cmd/main.go` read env `KODER_ID_GEOIP_DB` (default empty → `NoOpResolver`) and pass the resolver to `AuthService.SetGeoResolver` / `SSOService.SetGeoResolver`. Schema columns shipped on `auth_events` (migrations 17-23) and `sso_sessions` (migrations 8-14): country code + name + city + lat + lon + source + lookup_at. The id_token discovery + tenant claim are unaffected; these fields live on persisted events/sessions only.
ASN / network-type enrichment (tickets 077/085)
`pkg/asn` mirrors the `pkg/geoip` design: `Resolver` interface + `NoOpResolver` + `MaxMindResolver` (GeoLite2-ASN / GeoIP2-ISP). The resolver layers a curated **datacenter ASN** table (`pkg/asn/datacenter.go`: AWS/GCP/Azure/DO/Hetzner/OVH/Linode/Vultr/Oracle/Alibaba/IBM/Tencent/Cloudflare/Fastly/Scaleway/Contabo/Leaseweb) and a curated **VPN provider ASN** table (Nord/Mullvad/Express/Proton/Surfshark/IPXO) on top of the raw lookup. The `TorExitTracker` pulls `https://check.torproject.org/torbulkexitlist` once a day and pushes the set into the resolver via `SetTorExits` (override before lookup).
Wiring: `services/auth/cmd/main.go` reads env `KODER_ID_ASN_DB` to load the `.mmdb`. `KODER_ID_TOR_LIST=1` opt-in starts the refresh goroutine. Schema columns shipped on `auth_events` (migrations 24-27): asn, asn_org, network_type, asn_lookup_at. `network_type` is a closed taxonomy: unknown / private / residential / mobile / datacenter / vpn / tor.
Risk scoring (per `policies/multi-tenant-by-default.kmd` + future #086) will compose `ParsedASN.RiskWeight()` (residential 0, vpn 20, datacenter 50, tor 80) with geo-anomaly and device-mismatch signals.
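To make the layering concrete, here is a small sketch of a classifier that applies the Tor override first and then the curated ASN tables, followed by the risk weights quoted above. Several parts are assumptions rather than documented behavior: the `Classifier` type, the precedence of the VPN table over the datacenter table, the `residential` default for unlisted ASNs, and a weight of 0 for `unknown`, `private`, and `mobile`. The ASNs in `main` come from the private-use range and are purely illustrative.

```go
package main

import "fmt"

// NetworkType is the closed taxonomy stored in the network_type column.
type NetworkType string

const (
	NetUnknown     NetworkType = "unknown"
	NetPrivate     NetworkType = "private"
	NetResidential NetworkType = "residential"
	NetMobile      NetworkType = "mobile"
	NetDatacenter  NetworkType = "datacenter"
	NetVPN         NetworkType = "vpn"
	NetTor         NetworkType = "tor"
)

// Classifier is a hypothetical stand-in for the curated tables layered on the raw lookup.
type Classifier struct {
	TorExits      map[string]bool // IPs from the Tor bulk exit list
	DatacenterASN map[uint32]bool
	VPNASN        map[uint32]bool
}

// Classify applies the Tor override before any ASN-based decision.
// Table precedence and the residential default are assumptions of this sketch.
func (c Classifier) Classify(ip string, asn uint32) NetworkType {
	switch {
	case c.TorExits[ip]:
		return NetTor
	case c.VPNASN[asn]:
		return NetVPN
	case c.DatacenterASN[asn]:
		return NetDatacenter
	case asn == 0:
		return NetUnknown
	default:
		return NetResidential
	}
}

// RiskWeight uses the four weights given in the text; other types default to 0 (assumed).
func RiskWeight(t NetworkType) int {
	switch t {
	case NetVPN:
		return 20
	case NetDatacenter:
		return 50
	case NetTor:
		return 80
	default:
		return 0
	}
}

func main() {
	c := Classifier{
		TorExits:      map[string]bool{"203.0.113.7": true},
		DatacenterASN: map[uint32]bool{64512: true}, // private-use ASN, illustrative only
		VPNASN:        map[uint32]bool{64513: true}, // private-use ASN, illustrative only
	}
	for _, probe := range []struct {
		ip  string
		asn uint32
	}{{"203.0.113.7", 64512}, {"198.51.100.1", 64513}, {"198.51.100.2", 64512}, {"198.51.100.3", 64999}} {
		t := c.Classify(probe.ip, probe.asn)
		fmt.Println(probe.ip, t, RiskWeight(t))
	}
}
```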
Personal Access Tokens (#104)
PAT issuance lands as `pat_tokens` (migrations 38-40) + `PATRepository` + `PATService` (Issue/List/Revoke/Validate) + `/v1/me/pats` HTTP endpoints + `koderidcli pat {create,list,revoke}`. Tokens are formatted as `kpat_<43 base64 chars>` (32 bytes entropy), stored as SHA-256 hex with a unique index. Scope validation at issuance gates against `pkg/scopes.WellKnownPATScopes` (`read:usage` in Wave 1, more as consumers land). Plaintext is returned exactly once (copy-now UX in CLI/UI). The Flutter Settings UI lives in follow-up #106.
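The token format above can be reproduced with the standard library. The sketch below assumes the unpadded URL-safe base64 alphabet (the text only says "43 base64 chars", which implies no padding) and assumes the stored hash covers the full plaintext including the `kpat_` prefix; `NewPAT` and `HashPAT` are hypothetical names, not the `PATService` API.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// HashPAT returns the SHA-256 hex digest that would be stored instead of the plaintext.
// Hashing the whole token, prefix included, is an assumption of this sketch.
func HashPAT(plaintext string) string {
	sum := sha256.Sum256([]byte(plaintext))
	return hex.EncodeToString(sum[:])
}

// NewPAT draws 32 random bytes and encodes them as 43 unpadded base64 characters.
func NewPAT() (plaintext, hashHex string, err error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", "", err
	}
	plaintext = "kpat_" + base64.RawURLEncoding.EncodeToString(buf)
	return plaintext, HashPAT(plaintext), nil
}

func main() {
	tok, hash, err := NewPAT()
	if err != nil {
		panic(err)
	}
	// The plaintext is shown once; only the hash is persisted.
	fmt.Println(len(tok), len(hash)) // prints "48 64"
}
```

Validation then only needs to hash the presented token and look it up through the unique index, so the plaintext never has to be stored.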
Admin erasure-replay endpoint (#098)
POST /admin/v1/erasure-replay triggers ErasureService.ReplayErasures in a goroutine and returns 202 + job_id. Companion to the env-gated boot replay (KODER_ID_ERASURE_REPLAY_ON_BOOT=1 from #094); same loopback-only posture as the rest of admin REST (LXC firewall is the enforcement layer until admin auth ratifies). Optional body {"reason": "post-restore 2026-05-14"} for audit.
Per-service observability (#102 closed + RFC-015)
Every sub-service exposes `/health` + `/metrics` via the shared `pkg/observability` helper (Wave 1 ratified 2026-05-16, Wave 2 covered 15 sub-services). `RFC-015-observability.md` (this batch) codifies the wiring pattern, response shape, and rollout map; the helper gains `CheckKDB`/`CheckRedis`/`CheckPinger`/`CheckFunc` dependency-check constructors so per-service `main.go` can wire realistic health checks in one line. Auth service wired as exemplar (`observability.CheckKDB("kdb", kdbClient)`). RED-style histograms + gRPC interceptor tracked in `engine#107`; admin metrics aggregator update in `engine#108`.
**Wave 3b (2026-05-20, #113 + #114 closed):** auth + identity `cmd/main.go` now wrap the parent HTTP mux with `observability.WrapHandler("", handler)` so every route emits `koder_id_http_request_duration_seconds{handler=<URL.Path>,method,status,quantile}` samples. Their `grpc.NewServer` is constructed with `grpc.UnaryInterceptor(observability.UnaryServerInterceptor())`, which records `koder_id_grpc_server_handled_seconds{grpc_service,grpc_method,grpc_code,quantile}` into a sibling registry exposed by the same `/metrics` endpoint. The reservoir is capped at 1024 samples per (handler, method, status) and per (grpc_service, grpc_method, grpc_code) key, which bounds memory per series. The interceptor's self-conformance test (`grpc_interceptor_test.go`) drives an in-process gRPC server through the standard health probe to assert sample emission for both OK and PermissionDenied codes.
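A capped per-key reservoir like the one described can be sketched as below. The text only states the 1024-sample cap; the overwrite-oldest (ring buffer) eviction and the `Reservoir` type with its methods are assumptions made for illustration, not the `pkg/observability` implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// maxSamples is the per-key cap quoted in the text.
const maxSamples = 1024

// Reservoir keeps at most maxSamples durations per label key.
type Reservoir struct {
	mu      sync.Mutex
	samples map[string][]float64
	next    map[string]int // next slot to overwrite once a key is full
}

func NewReservoir() *Reservoir {
	return &Reservoir{samples: map[string][]float64{}, next: map[string]int{}}
}

// Observe appends until the cap is reached, then overwrites the oldest slot.
// Ring-buffer eviction is an assumption; the real policy is not documented here.
func (r *Reservoir) Observe(key string, seconds float64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	s := r.samples[key]
	if len(s) < maxSamples {
		r.samples[key] = append(s, seconds)
		return
	}
	i := r.next[key]
	s[i] = seconds
	r.next[key] = (i + 1) % maxSamples
}

// Len reports how many samples are currently held for a key.
func (r *Reservoir) Len(key string) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.samples[key])
}

func main() {
	r := NewReservoir()
	key := fmt.Sprintf("%s|%s|%d", "/oauth/v2/token", "POST", 200)
	for i := 0; i < 5000; i++ {
		r.Observe(key, float64(i)/1000)
	}
	fmt.Println(r.Len(key)) // prints 1024: memory per key stays bounded
}
```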
OAuth Device Authorization Grant (RFC 8628, #116)
Headless CLI / AI / CI authentication landed 2026-05-20 (#116 phases 1-2, 4-5, 7-partial; phases 3 + 6 open):
- **Start endpoint** `POST /oauth/v2/device_authorization` mints a 32-byte URL-safe `device_code` + 8-char `XXXXXXXX` `user_code` from the BCP-47-safe alphabet (`ABCDEFGHJKLMNPQRSTUVWXYZ23456789`, no 0O1IL). Default lifetime 600s; recommended poll interval 5s.
- **Storage** lives in `oauth_device_authorizations` (kdb1 table) / `DeviceAuthorizationRow` proto (kdb-next). The kdb-next repo uses Scan + filter for `user_code` lookups (bounded table; index added later if perf requires it).
- **Token endpoint** `POST /oauth/v2/token` accepts both `grant_type=urn:ietf:params:oauth:grant-type:device_code` and the short alias `device_code`. Implements the §3.5 polling protocol in `service.ExchangeDeviceCode`: `authorization_pending` / `slow_down` (back-to-back polls under `interval/2`) / `access_denied` / `expired_token` / `invalid_grant` (client_id mismatch). On success: mints access + id_token (15 min, no refresh yet) and **deletes the row** for single-use enforcement.
- **OIDC discovery** advertises the new grant in `grant_types_supported` and exposes `device_authorization_endpoint` per RFC 8628 §4.
- **Verification page** `GET/POST /oauth/v2/device` is a vanilla-HTML form (no template engine dependency) that prefills the user_code from `?user_code=` (matches `verification_uri_complete`) and accepts `email + password + decision` in a single submit. Allow → `service.AuthorizeDevice`; Deny → `service.DenyDevice`; surfaces friendly errors for unknown / expired / MFA-locked branches. Multi-step "if already logged in" flow deferred until the oauth service gains session-cookie detection.
- **Hub CLI** (`products/dev/hub/cli/main.go::cmdLogin`) now talks to `https://id.koder.dev/oauth/v2/device_authorization` directly (form-encoded, no Authorization header), polls `/oauth/v2/token` with proper `slow_down` / `expired_token` / `access_denied` / `invalid_grant` handling, and parses the returned `id_token` JWT to populate `cfg.UserID` (sub) + `cfg.Username` (name → email fallback). Override via `KODER_ID_URL` env for self-hosted Koder ID instances.
- **Seed client `khub-cli`** is registered as a public client with `device_code + refresh_token` grants via `engine/tools/seed-clients/main.go` (the catalog now supports per-entry `grantTypes` overrides).
- **T-suite + handler tests**
  - `services/oauth/internal/service/device_grant_test.go`: T1-T8 (start fields, alphabet, pending, slow_down, success+delete, denied, expired, single-use) + client_id mismatch + unauthorized_client. 10/10 PASS.
  - `services/oauth/internal/handler/device_grant_verification_test.go`: 7 tests on the verification page (GET render, GET pre-fill, POST missing/unknown/expired user_code, no-auth-client, method-not-allowed). 7/7 PASS.
OAuth client_credentials grant — service accounts (#115)
Machine-to-machine auth path for CI / /k-ship / AI agents that need to publish without a human in the loop. The client_credentials grant is OAuth 2.0 §4.4; this section documents the Koder ID-specific contracts that #115 hardened.
- **Confidential clients only.** `tokenClientCredentials` rejects public clients up front: a public client without a registered secret cannot prove possession of the credential, so granting it a JWT would defeat the grant's purpose.
- **Constant-time secret compare** via `subtle.ConstantTimeCompare`.
- **Grant-list enforcement.** Client must list `client_credentials` in `client.GrantTypes`; otherwise the endpoint returns `unauthorized_client` (400). Catches the case where a client was provisioned for one grant and is being abused with another.
- **Scope subset check.** Requested scopes must all be present in `client.Scopes`; an unregistered scope yields `invalid_scope`. An omitted `scope` parameter grants the full registered set.
- **Subject convention.** JWTs carry `sub=service-account:<client_id>` so downstream middleware (notably the Hub's `requireDev`) can recognize service-account tokens without a schema change. The user-table-row materialization (`kind='service'` on `koder_users`) remains an open slice; the prefix convention is enough for the immediate publishing path.
- **No refresh token.** Clients re-mint when the access token expires (1h TTL).
- **Hub CLI** (`products/dev/hub/cli`) recognizes three env vars in precedence order: `KHUB_CLIENT_CREDENTIALS` (file with `client_id:client_secret`), `KHUB_CLIENT_ID + KHUB_CLIENT_SECRET` (paired vars), and falls back to the cached human token from `khub login` when none are set. `KODER_ID_URL` overrides the issuer for self-hosted instances.
- **Test suite** `services/oauth/internal/handler/client_credentials_test.go`: T1 success round-trip + JWT shape, T2 wrong secret, public-client reject, unauthorized-client guard, invalid-scope, omitted-scope defaults, unknown client, form-body credentials. 8/8 PASS.
- **Operator runbook** `meta/context/runbooks/hub-publishing-service-account.kmd`.

Open follow-ups: service-account materialization on `koder_users` (`kind='service'` column + auto-Apply to developer table) and automated 90-day rotation cron.
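The checks listed in this section compose naturally into a single validation function. The sketch below is hedged: the `Client` struct, the `CheckClientCredentials` name, the `invalid_client` code for public clients and wrong secrets, and the plaintext secret field are simplifying assumptions. The `unauthorized_client` and `invalid_scope` codes, the full-set default, and the `service-account:` subject prefix come from the text; the client ID and scope names in `main` are made up.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// Client is a hypothetical, reduced view of a registered OAuth client.
type Client struct {
	ID         string
	Secret     string // plaintext only to keep the sketch short
	Public     bool
	GrantTypes []string
	Scopes     []string
}

func contains(list []string, v string) bool {
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

// CheckClientCredentials returns the granted scopes and JWT subject, or an OAuth error code.
func CheckClientCredentials(c Client, secret string, requested []string) (granted []string, sub string, oauthErr string) {
	// Confidential clients only: a public client has no secret to prove possession of.
	if c.Public || c.Secret == "" {
		return nil, "", "invalid_client" // error code assumed
	}
	// Constant-time compare so response timing does not leak how much of the secret matched.
	if subtle.ConstantTimeCompare([]byte(c.Secret), []byte(secret)) != 1 {
		return nil, "", "invalid_client"
	}
	// Grant-list enforcement.
	if !contains(c.GrantTypes, "client_credentials") {
		return nil, "", "unauthorized_client"
	}
	// An omitted scope parameter grants the full registered set.
	if len(requested) == 0 {
		requested = c.Scopes
	}
	// Scope subset check.
	for _, s := range requested {
		if !contains(c.Scopes, s) {
			return nil, "", "invalid_scope"
		}
	}
	return requested, "service-account:" + c.ID, ""
}

func main() {
	c := Client{
		ID:         "ci-publisher", // hypothetical client ID
		Secret:     "s3cret",
		GrantTypes: []string{"client_credentials"},
		Scopes:     []string{"hub:publish", "hub:read"}, // hypothetical scopes
	}
	fmt.Println(CheckClientCredentials(c, "s3cret", nil))
	fmt.Println(CheckClientCredentials(c, "s3cret", []string{"admin"}))
}
```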
Multi-tenant isolation (RLS, tickets 100 + 105)
**Phase 1** (#100, 2026-05-18) shipped Postgres-only RLS DDL helpers in `pkg/kdb` (`AddPostgresOnly`, `BackendKind`, `WithTenantTx`) and `CREATE POLICY` migrations for the admin service's tenant-scoped tables (`api_keys`, `audit_log`). All behind `KODER_ID_RLS_ENABLE=true`.
**Phase 2 slice 1** (#105, this batch) wires the admin service to actually serve requests under RLS:
- `pkg/kdb.ContextWithStore` + `StoreFromContext` propagate the request-scoped tx-store via context. `WithTenantTx` stashes the tx-store automatically so unmodified callsites pick it up.
- `apikey_repo` + `audit_repo` read every DB op via `kdb.StoreFromContext(ctx, r.client.Raw())`: tx-aware by default, with fallback to the long-lived store when no ctx-store is set.
- `services/admin/internal/handler/rls_middleware.go` wraps every admin HTTP request in `WithTenantTx` when the env gate is on, using the request's `X-Tenant-ID`. Requests without a tenant header (system-admin ops like list-all-tenants) skip the wrap.
- Wired in `services/admin/cmd/main.go` between mux and ListenAndServe.
**Phase 2 slice 2** (#105, this batch) extends the same pattern to the auth service:
- `services/auth/internal/repository/migrations_rls.go` registers RLS for `auth_events` + `auth_flows` + `auth_lockouts` (the 3 hot-path tables) at versions 1200/1201, 1210/1211, 1220/1221.
- `event_repo`, `flow_repo`, `lockout_repo` route every DB op via `kdb.StoreFromContext(ctx, r.client.Raw())`.
- `services/auth/internal/handler/rls_middleware.go` mirrors the admin helper; `services/auth/cmd/main.go` wires both the RLS migrations (behind the env gate) and the middleware.
**Phase 2 slice 3** (#105, this batch) covers oauth + session:
- oauth: `client_repo`, `authcode_repo`, `consent_repo`, `key_repo` routed via `kdb.StoreFromContext`; RLS for `clients`, `authorization_codes`, `consents`, `signing_keys` at versions 1300/1301, 1310/1311, 1320/1321, 1330/1331.
- session: `session_repo` routed; RLS for `sessions`, `access_tokens` at 1400/1401, 1410/1411.
- Both services ship a mirror `rls_middleware.go` and wire it in `main.go` behind `KODER_ID_RLS_ENABLE`.
**Phase 2 slice 4** (#105, this batch): narrower than originally scoped. Auditing the remaining sub-services revealed that **audit, behavior, device, access, qr, sync, telephony, webhooks, and sso use org_id / user_id / no isolation column, NOT tenant_id.** The RLS template from #100 / #105 slices 1-3 doesn't fit them as-is. Captured in engine#109: each needs categorization (retroactive tenant-scope, org-scope with a new `WithOrgTx` helper, user-scope with WHERE-clause-only, or genuinely system-wide) before RLS adoption proceeds.
The slice 4 work that did ship: **saml** (legitimately tenant-scoped, 2 tables). `sp_repo` + `idpkey_repo` routed via `kdb.StoreFromContext`; RLS migrations 1500-1511 for `saml_sps` + `saml_idp_keys`; `rls_middleware.go` mirror; `main.go` wire.
**Phase 2 slice 5** (#105, this batch): finishes the truly tenant-scoped sub-services:
- **auth completion:** remaining 6 repos (magiclink, socialprovider, socialstate, workspacesso, ldap, identity) routed via `StoreFromContext`; `migrations_rls.go` extended to versions 1230-1289 covering magic_links / social_providers / social_login_states / workspace_sso_configs / ldap_servers / user_identities.
- **identity** (biggest single sub-service: 15 tenant-scoped tables out of 32 total): 17 repos refactored, `migrations_rls.go` (new) registers RLS at versions 1600-1696 for users, credentials, mfa_devices, verification_tokens, groups, group_members, scim_tokens, subtenantrole_bindings, invites, roles, role_assignments, webhooks, webhook_deliveries, pat_tokens, erasure_requests. `rls_middleware.go` mirror + `main.go` wire.

**RLS adoption count after slice 5:** admin + auth + oauth + session + saml + identity = **6/6 of the confirmed tenant-scoped sub-services (100%)**. Every sub-service that genuinely uses `tenant_id` is now ctx-aware and gated behind `KODER_ID_RLS_ENABLE`.
**Phase 2 infra extension** (batch 11, 2026-05-18): `pkg/kdb` now ships `WithOrgTx` + `WithUserTx` helpers (mirrors of `WithTenantTx`, setting the `app.current_org_id` / `app.current_user_id` GUCs instead). #109 carries the schema audit + classification of the remaining 8 sub-services: audit + access = org-scoped; behavior + qr + sync + telephony + webhooks + sso = user-scoped (with a handful of system-wide tables, `rate_limits`, `sso_clients`, `sso_revocations`, that don't need RLS at all). Mixed-scope sub-services (device) split per-table or use composite predicates.
**Phase 2 slice 6** (#105, batch 12): first adoption of the new scopes:
- **audit** (org-scoped exemplar): 9 repository callsites routed via `kdb.StoreFromContext(ctx, r.db)`; `migrations_rls.go` registers RLS for `audit_events`, `mfa_policies`, `org_settings` with the `org_id = current_setting('app.current_org_id')` predicate (versions 1700-1721); `rls_middleware.go` wraps in `WithOrgTx` using the `X-Org-ID` header; `main.go` wires both gated by env.
- **qr** (user-scoped exemplar): 4 callsites routed; RLS for `qr_sessions` with a `user_id` predicate (1800-1801); the `WithUserTx` middleware extracts `X-User-ID` (already injected by the gateway for authenticated requests).

Remaining work: behavior, sync, telephony, webhooks (user-scoped, mechanical now), access (org-scoped), device (mixed, split per table). Cross-tenant + cross-org + cross-user T1 integration tests against a real Postgres, plus the default-on cutover after staging burn-in, remain as gating items before flipping the env-gate default.
**Phase 2 slice 7** (#105, batch 13 / 2026-05-20): wires the **webhooks** sub-service (user-scoped, mirrors the qr exemplar):
- 18 repository callsites routed via `kdb.StoreFromContext(ctx, r.db)`.
- `migrations_rls.go` registers RLS for `webhooks`, `webhook_deliveries`, and `api_keys` with the `user_id = current_setting('app.current_user_id')` predicate (versions 1900-1921).
- `rls_middleware.go` wraps in `WithUserTx` using `X-User-ID`.
- `main.go` gates both behind `KODER_ID_RLS_ENABLE`.

**RFC-016 ratified in the same commit** (`docs/rfcs/RFC-016-row-level-security.md`). Codifies the two-layer model (Postgres RLS + middleware), the three-scope taxonomy (tenant / org / user), the three-file per-sub-service layout, the migration-version namespace table (100-block per sub-service), the `koder-spec-audit multi-tenancy --strict` audit-hook contract, and the four-criterion default-on cutover plan.

**RLS adoption count after slice 7:** admin + auth + oauth + session + saml + identity + sso = tenant-scoped (7); audit = org-scoped (1); qr + webhooks = user-scoped (2). **10 sub-services wired.** Remaining mechanical work: behavior, sync, telephony (user-scoped), access (org-scoped), device (mixed); each follows an existing exemplar.
**Phase 2 slice 8** (#105, batch 14 / 2026-05-20): wires the **access** sub-service (org-scoped, mirrors the audit exemplar):
- 8 repository callsites routed via `kdb.StoreFromContext(ctx, r.db)`.
- `migrations_rls.go` registers RLS for `access_policies` and `access_evaluations` (versions 2300-2311 per RFC-016 §5's per-sub-service 100-block). The policy predicate admits both `org_id = current_setting('app.current_org_id', true)` rows AND `org_id = ''` global rows so system-wide access policies remain visible to every org.
- `rls_middleware.go` wraps in `WithOrgTx` using `X-Org-ID`.
- `main.go` gates both behind `KODER_ID_RLS_ENABLE`.
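For readers unfamiliar with the shape of these migrations, the sketch below builds the two Postgres statements such a migration would typically contain, including the global-row variant used by the access sub-service. It is illustrative only: `PolicyDDL` and the `<table>_isolation` policy name are hypothetical, the identifiers are assumed to be trusted constants (no escaping is done), and the actual contents of `migrations_rls.go` are not shown in this document.

```go
package main

import "fmt"

// PolicyDDL returns the statements that enable RLS on a table and add an
// isolation policy keyed on a session GUC. When admitGlobal is true, rows with
// an empty scope column stay visible to every scope.
// Inputs must be trusted identifiers; this sketch does not quote or escape them.
func PolicyDDL(table, column, guc string, admitGlobal bool) []string {
	pred := fmt.Sprintf("%s = current_setting('%s', true)", column, guc)
	if admitGlobal {
		pred = fmt.Sprintf("%s OR %s = ''", pred, column)
	}
	return []string{
		fmt.Sprintf("ALTER TABLE %s ENABLE ROW LEVEL SECURITY", table),
		fmt.Sprintf("CREATE POLICY %s_isolation ON %s USING (%s)", table, table, pred),
	}
}

func main() {
	// Org-scoped table that also admits global rows, as described for access_policies.
	for _, stmt := range PolicyDDL("access_policies", "org_id", "app.current_org_id", true) {
		fmt.Println(stmt + ";")
	}
	// User-scoped table without global rows, as described for qr_sessions.
	for _, stmt := range PolicyDDL("qr_sessions", "user_id", "app.current_user_id", false) {
		fmt.Println(stmt + ";")
	}
}
```

The second argument `true` to `current_setting` makes Postgres return NULL instead of raising an error when the GUC is unset, so a request that skipped the middleware matches no scoped rows rather than failing.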
**RLS adoption count after slice 8:** 7 tenant + 2 org + 2 user = **11 sub-services wired.** Remaining mechanical work: behavior, sync, telephony (user-scoped), device (mixed, split per table).
**Phase 2 slices 9-11** (#105, batch 15 / 2026-05-20): bundles three user-scoped sub-services in one batch (mechanical, all mirror the qr/webhooks exemplar):
- **behavior** (slice 9, versions 2000-2099): RLS on `behavior_events`, `user_risk_profiles`, `anomaly_records`. 9 repository callsites routed.
- **sync** (slice 10, versions 2100-2199): RLS on `sync_versions`, `sync_entries`, `sync_devices`, `sync_snapshots`. 16 callsites routed.
- **telephony** (slice 11, versions 2200-2299): RLS on `otp_requests`, `verified_phones`. `rate_limits` is intentionally excluded: it is a composite key-only table without `user_id`, global by design (wrapping it in `WithUserTx` would silently drop all rows). 13 callsites routed.

**RLS adoption count after slice 11:** 7 tenant + 2 org + 5 user = **14 sub-services wired.** Only **device** (mixed, split per table) remains in the mechanical-wiring queue.
Federated Identity (Social Login)
Social login is implemented inside services/auth/ (shares the auth service, reuses IdentityClient/SessionClient).
- **Providers built-in:** Google, GitHub, Microsoft, GitLab; any OIDC-generic provider can be configured per tenant.
- **Flow:** `GET /v1/auth/social/login/{provider}/start` → redirect to provider → `GET /v1/auth/social/login/{provider}/callback` → session tokens.
- **Account linking:** on first social login, links by existing email or creates a new account. Provider profile picture imported to user avatar.
- **Unlink:** `DELETE /v1/auth/identities/{user_id}/{provider}`; denied if it's the last auth method and no password credential exists.
- **UI:** Login and signup pages show social buttons when providers are configured. The `account.html` template shows linked identities with link/unlink controls.
- **Storage:** `user_identities` table (provider, provider_user_id, user_id snapshot). `social_providers` per tenant. `social_login_states` for CSRF (TTL 10 min).
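The unlink guard in the list above reduces to a single predicate. The sketch below is a hypothetical helper, not the service's code: it assumes the caller already knows how many provider identities are linked and whether a password credential exists.

```go
package main

import "fmt"

// CanUnlink reports whether removing one linked identity still leaves the
// user with a way to sign in: either a password or another linked provider.
func CanUnlink(linkedIdentities int, hasPassword bool) bool {
	return hasPassword || linkedIdentities > 1
}

func main() {
	fmt.Println(CanUnlink(1, false)) // false: last auth method, no password
	fmt.Println(CanUnlink(1, true))  // true: password remains
	fmt.Println(CanUnlink(2, false)) // true: another provider remains
}
```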
Proto
- `services/foundation/id/platform/proto/koder/id/admin/v1/admin.proto`: `RegisterClientRequest.desired_client_id` (string, field 7), optional stable ID for first-party clients
Registered OIDC Clients
| Client ID | Product | Notes |
|---|---|---|
| `dev.koder.chat` | Koder Chat | Registered 2026 |
| `dev.koder.lex` | Koder Lex | Registered 2026 |
Related
- `services/foundation/id/platform/`: production Go source
- `apps/pass` / `foundation/pass`: identity companion app