Multi-tenancy contract

Spec — Multi-tenancy contract

Concrete mechanisms for implementing policies/multi-tenant-by-default.kmd. This spec is normative: every Koder Stack module that stores user data must pass checks T1–T9 below; foundation tables keyed by org_id or user_id must also pass T10–T12.

Identity model

koder_user_id   BIGINT NOT NULL    -- FK to services/foundation/id.user(id)
workspace_id    BIGINT             -- nullable; FK to id.workspace(id)

koder_user_id is the canonical partial PK (the leading primary-key column) on every table that holds PII. workspace_id widens the scope: NULL means "personal"; non-NULL means "owned by the workspace, visible to all members through membership".

Canonical membership table (Koder ID):

CREATE TABLE workspace_member (
  workspace_id BIGINT NOT NULL,
  koder_user_id BIGINT NOT NULL,
  role TEXT NOT NULL,             -- 'owner' | 'admin' | 'member' | 'viewer'
  joined_unix BIGINT NOT NULL,
  PRIMARY KEY (workspace_id, koder_user_id)
);

Every cross-workspace query goes through this table. *Do not cache membership client-side*: it is on the auth hot path and lives in services/foundation/id with a 60s server-side cache.

Scope model — 3 categories + system-wide

The koder_user_id + workspace_id identity model above is the *tenant scope*: every product table that holds PII goes here. Foundation services (auth, audit, behavior, device, telephony, SSO …) also need isolation, but along axes that are not the product-tenant axis. The Koder Stack therefore defines three isolation scopes plus an explicit no-RLS bucket:

| Scope | Discriminator column | GUC setting | Helper | When to use |
|---|---|---|---|---|
| *Tenant-scoped* | koder_user_id + optional workspace_id | app.current_tenant_id | kdb.WithTenantTx | Default for product data with PII (Dek recordings, Drive blobs, Talk messages, Kruze bookmarks, …) |
| *Org-scoped* | org_id | app.current_org_id | kdb.WithOrgTx | Foundation tables owned by an org: audit events, MFA policies, access policies, org settings |
| *User-scoped* | user_id (= koder_user_id at the foundation layer) | app.current_user_id | kdb.WithUserTx | Foundation tables owned by a single user: behavior events, QR sessions, sync state, telephony OTPs, user-scoped webhooks/API keys, SSO grants/sessions |
| *System-wide* | (none) | (none) | (none; admin auth) | Tables shared across all tenants by design: rate-limit buckets, SSO client registry, SSO revocation list. Access is governed by admin auth, not RLS |

Selection rules:

  1. *Tenant-scoped is the default for product data.* Tables that surface in a product UI carry koder_user_id.

  2. *Org-scoped applies when the primary owner is an org*, not a user, even if individual user IDs are recorded. Audit events for an org are visible to every admin of that org regardless of who triggered the event.

  3. *User-scoped applies when the row is per-user inside a foundation service* and there is no org or product tenant in play. Examples: behavior risk profiles, QR pairing sessions, sync state.

  4. *System-wide is the explicit exception.* It must be justified in the table's migration comment (e.g. "global rate-limit counters; no per-tenant cardinality"). Drift toward this scope without justification is a spec violation.

Mixed-scope tables (e.g. device carrying both user_id and org_id columns) either split into two migrations_rls files *or* use a composite policy:

-- Cast the text GUC to BIGINT, as the single-scope policies do.
USING (org_id   = current_setting('app.current_org_id',   true)::BIGINT
    OR user_id = current_setting('app.current_user_id', true)::BIGINT)

The composite form is acceptable when both isolation axes apply to the same row simultaneously; otherwise prefer the split.

PAT scope grammar

A PAT (Personal Access Token) issued by Koder ID carries a scope. Canonical syntax (inherited from Flow RFC-003 credentials/backups):

<verb>:<resource>[:<modifier>]

verbs:    read | write | admin
resources: user | workspace | repo | credentials | backups | …
modifier:  optional, e.g. "self" or "<id>"

Examples:

  • read:user - read own profile
  • write:credentials - write credentials in the scope the PAT inherited
  • read:workspace:<id> - read data specific to one workspace
  • admin:user - privileged self-management

A PAT is *scoped to a single koder_user_id* (the owner). Workspace access is resolved through workspace_member at request time, not through the PAT scope. PATs do *not* cross tenants.
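A minimal parsing sketch for the `<verb>:<resource>[:<modifier>]` grammar follows. The Scope type and ParseScope function are illustrative names, not part of Koder ID. The resource is not checked against a registry because the resource list above is open-ended.

```go
package main

import (
	"fmt"
	"strings"
)

// Scope is one parsed PAT scope: <verb>:<resource>[:<modifier>].
type Scope struct {
	Verb     string
	Resource string
	Modifier string // empty when the optional modifier is omitted
}

// validVerbs mirrors the verb list of the grammar.
var validVerbs = map[string]bool{"read": true, "write": true, "admin": true}

// ParseScope splits a scope string and validates its shape and verb.
func ParseScope(s string) (Scope, error) {
	parts := strings.Split(s, ":")
	if len(parts) < 2 || len(parts) > 3 {
		return Scope{}, fmt.Errorf("scope %q: want <verb>:<resource>[:<modifier>]", s)
	}
	if !validVerbs[parts[0]] {
		return Scope{}, fmt.Errorf("scope %q: unknown verb %q", s, parts[0])
	}
	for _, p := range parts {
		if p == "" {
			return Scope{}, fmt.Errorf("scope %q: empty segment", s)
		}
	}
	sc := Scope{Verb: parts[0], Resource: parts[1]}
	if len(parts) == 3 {
		sc.Modifier = parts[2]
	}
	return sc, nil
}
```

ParseScope("read:workspace:42") yields verb "read", resource "workspace", and modifier "42". Whether that PAT owner may actually read workspace 42 is still decided by workspace_member at request time.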

RLS template (Postgres / kdb-next)

Every table with PII has RLS. Helper migration:

-- 1. Schema with tenant fields
CREATE TABLE my_resource (
  id BIGSERIAL,
  koder_user_id BIGINT NOT NULL REFERENCES koder_id.user(id),
  workspace_id BIGINT REFERENCES koder_id.workspace(id),
  payload JSONB NOT NULL,
  created_unix BIGINT NOT NULL DEFAULT (extract(epoch from now())::BIGINT),
  PRIMARY KEY (koder_user_id, id)
);

-- 2. Index on tenant + recent
CREATE INDEX ix_my_resource_user_recent
  ON my_resource (koder_user_id, created_unix DESC);

-- 3. RLS enable + policy
ALTER TABLE my_resource ENABLE ROW LEVEL SECURITY;

CREATE POLICY p_owner ON my_resource
  USING (koder_user_id = current_setting('app.current_tenant_id', true)::BIGINT);

CREATE POLICY p_workspace_member ON my_resource
  USING (workspace_id IS NOT NULL
         AND EXISTS (
           SELECT 1 FROM koder_id.workspace_member m
           WHERE m.workspace_id = my_resource.workspace_id
             AND m.koder_user_id = current_setting('app.current_tenant_id', true)::BIGINT
         ));

Connection setup (per request):

// Tenant-scoped (default for product data with PII)
err := kdb.WithTenantTx(ctx, store, auth.UserID, func(ctx context.Context, s kdb.Store) error {
    return repo.List(ctx)
})

The helper opens a transaction, runs SELECT set_config('app.current_tenant_id', $1, true), and dispatches the callback through a *sql.Tx. Empty scope ID is fail-closed (returns an error rather than running unscoped).

*Bypass admin path* (rare): an explicit RESET app.current_tenant_id is a privilege of the koder_admin role only. An audit log entry is mandatory for every reset.

Org-scoped variant

For foundation tables owned by an org (audit events, MFA policies, access policies):

ALTER TABLE org_resource ENABLE ROW LEVEL SECURITY;

CREATE POLICY p_org ON org_resource
  USING (org_id = current_setting('app.current_org_id', true)::BIGINT);

err := kdb.WithOrgTx(ctx, store, auth.OrgID, func(ctx context.Context, s kdb.Store) error {
    return repo.List(ctx)
})

User-scoped variant

For foundation tables owned by a single user (behavior events, QR sessions, sync state, telephony OTPs, user-scoped webhooks):

ALTER TABLE user_resource ENABLE ROW LEVEL SECURITY;

CREATE POLICY p_user ON user_resource
  USING (user_id = current_setting('app.current_user_id', true)::BIGINT);

err := kdb.WithUserTx(ctx, store, auth.UserID, func(ctx context.Context, s kdb.Store) error {
    return repo.List(ctx)
})

All three helpers share the same body in services/foundation/id/engine/pkg/kdb/rls.go::withScopeTx: fail-closed on empty scope, ctx-stash propagated to nested calls, test coverage in rls_scope_test.go.

KV / cache template (Redis-style)

Every key carries a tenant prefix:

<namespace>:<tenant-key>:<resource-key>

examples:
  rate_limit:user:<uid>:5h_window     → counter
  session:user:<uid>:<session_id>     → JSON
  presence:workspace:<wid>:<uid>      → boolean

Helper:

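// TenantKey returns the <tenant-key>:<resource-key> part of a cache key for a user.
// The <namespace> prefix is not added here. Needs the fmt and strings packages.
func TenantKey(uid int64, parts ...string) string {
    return fmt.Sprintf("user:%d:%s", uid, strings.Join(parts, ":"))
}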

A key without the tenant prefix is a *critical bug* (cross-tenant leak via cache).
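One way to make an unprefixed key hard to produce is to build the full `<namespace>:<tenant-key>:<resource-key>` string in a single place. The ScopedKey function below is a hypothetical sketch, not an existing helper. It assumes tenant IDs are positive (they come from BIGSERIAL columns) and that "user" and "workspace" are the only tenant kinds, as in the examples above.

```go
package main

import (
	"fmt"
	"strings"
)

// ScopedKey builds <namespace>:<kind>:<id>[:<parts...>] and rejects any
// call that would yield a key without a complete tenant prefix.
func ScopedKey(namespace, kind string, id int64, parts ...string) (string, error) {
	if namespace == "" || (kind != "user" && kind != "workspace") || id <= 0 {
		return "", fmt.Errorf("cache key: invalid tenant prefix (%q, %q, %d)", namespace, kind, id)
	}
	key := fmt.Sprintf("%s:%s:%d", namespace, kind, id)
	if len(parts) > 0 {
		key += ":" + strings.Join(parts, ":")
	}
	return key, nil
}
```

For example, ScopedKey("rate_limit", "user", 42, "5h_window") returns "rate_limit:user:42:5h_window", matching the first example key.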

S3 / object storage template

Path:

<bucket>/<koder_user_id>/<workspace_id|"personal">/<resource_id>/<file>

IAM / signed URL: per-request, restricted to the tenant prefix.
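The path layout and the prefix restriction can be sketched as below. All three function names are hypothetical, and a workspaceID of 0 is an assumed stand-in for a NULL workspace_id. The trailing slash in the prefix matters: without it, user 42's prefix would also match user 421's objects.

```go
package main

import (
	"fmt"
	"strings"
)

// TenantPrefix returns "<koder_user_id>/<workspace_id|personal>/".
// workspaceID == 0 stands for a NULL workspace_id (personal data).
func TenantPrefix(userID, workspaceID int64) string {
	ws := "personal"
	if workspaceID != 0 {
		ws = fmt.Sprintf("%d", workspaceID)
	}
	return fmt.Sprintf("%d/%s/", userID, ws)
}

// ObjectKey builds the key below the bucket and rejects segments that
// could escape the tenant prefix.
func ObjectKey(userID, workspaceID int64, resourceID, file string) (string, error) {
	for _, seg := range []string{resourceID, file} {
		if seg == "" || seg == "." || seg == ".." || strings.Contains(seg, "/") {
			return "", fmt.Errorf("object key: invalid segment %q", seg)
		}
	}
	return TenantPrefix(userID, workspaceID) + resourceID + "/" + file, nil
}

// InTenantPrefix reports whether key lives under the tenant's prefix. A
// signing routine would call it before issuing a per-request URL.
func InTenantPrefix(key string, userID, workspaceID int64) bool {
	return strings.HasPrefix(key, TenantPrefix(userID, workspaceID))
}
```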

Test contract — T1..T12

Every multi-tenant module has a suite that covers:

Tenant-scoped baseline (T1..T9 — every product table)

| ID | Test | Description |
|---|---|---|
| *T1* | Auth required | GET /resource without a PAT → 401 |
| *T2* | Self read | A's PAT, GET /my-resource → A's data only |
| *T3* | Cross-tenant read denied | A's PAT, GET /resource/<B's id> → *404 (not 403)* |
| *T4* | Cross-tenant write denied | A's PAT, POST /resource setting koder_user_id=B → 400 or silent override to A |
| *T5* | Workspace member read | A in workspace W, GET /resource?workspace=W → all members' data |
| *T6* | Workspace non-member read | A not in W, GET /resource?workspace=W → 404 |
| *T7* | RLS isolation | Direct DB query without SET LOCAL app.current_tenant_id → returns nothing (or error) |
| *T8* | Index efficiency | EXPLAIN of A's read uses tenant index, not seq scan |
| *T9* | Tenant deletion | When user A is deleted, all WHERE koder_user_id = A rows are removed within retention window |

Org/user-scoped extension (T10..T12 — foundation services)

Required for any table whose RLS policy is keyed by org_id or user_id (not koder_user_id). Sibling of T3+T7 but along the secondary axis.

| ID | Test | Description |
|---|---|---|
| *T10* | Cross-org read denied | Admin of org A, GET /audit-events from org B context → *404 (not 403)*. Mirrors T3 for app.current_org_id. |
| *T11* | Cross-user read denied (user-scoped foundation) | User A's session, GET /qr-sessions/<B's id> → 404. Mirrors T3 for app.current_user_id. |
| *T12* | Scope isolation in RLS | Direct DB query without the appropriate SET LOCAL app.current_org_id / app.current_user_id → returns nothing (or error). Mirrors T7 for the secondary axis. |

System-wide tables (rate-limit buckets, SSO client registry, SSO revocation list) intentionally have no T10..T12 row: they are not scope-protected by RLS and instead rely on admin auth. Their migration MUST carry a -- system-wide: <justification> comment that audit will read.

Every implementation ships with tests/multi-tenant/T1..T9_test.go (product) or tests/multi-tenant/T10..T12_test.go (foundation org/user-scoped). Audit: a PR whose applicable subset is not green *blocks merge* (see policies/regression-tests.kmd co-enforcement).

Error model

| Scenario | HTTP | gRPC | Body |
|---|---|---|---|
| No auth | 401 | UNAUTHENTICATED | {"error": "auth required"} |
| Invalid token | 401 | UNAUTHENTICATED | {"error": "invalid token"} |
| Resource does not exist OR belongs to another tenant | 404 | NOT_FOUND | {"error": "not found"} |
| Resource exists but role is insufficient (workspace member without write) | 403 | PERMISSION_DENIED | {"error": "insufficient role"} |
| Bad input | 400 | INVALID_ARGUMENT | {"error": "<details>"} |
| Server error | 500 | INTERNAL | {"error": "internal"} |

Critical: *404, not 403* in cross-tenant cases. A 403 leaks existence ("this id exists but you cannot read it" tells the attacker that it exists).
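The 404-versus-403 rule reduces to a small decision function. The Lookup type and StatusFor below are illustrative names. The point of the sketch is that "does not exist" and "belongs to another tenant" share one branch, so the two cases cannot be told apart from the response.

```go
package main

import "net/http"

// Lookup describes what the store found for a requested resource ID.
type Lookup struct {
	Exists     bool // a row with this ID exists at all
	SameTenant bool // the caller owns it or is a member of its workspace
	RoleAllows bool // the caller's role permits the requested action
}

// StatusFor maps a lookup result to the HTTP status of the error model.
func StatusFor(l Lookup) int {
	switch {
	case !l.Exists || !l.SameTenant:
		return http.StatusNotFound // 404 for both, never 403
	case !l.RoleAllows:
		return http.StatusForbidden // 403 only inside the caller's own tenant
	default:
		return http.StatusOK
	}
}
```

A 403 is therefore only reachable once the caller has already been shown to share a tenant with the resource, where existence is not a secret.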

Audit log

Every *mutating* operation that touches PII writes an audit row:

CREATE TABLE audit_log (
  id BIGSERIAL PRIMARY KEY,
  actor_user_id BIGINT NOT NULL,    -- the PAT owner
  target_user_id BIGINT,            -- tenant being acted on (often = actor)
  action TEXT NOT NULL,             -- 'create' | 'update' | 'delete' | 'read_admin'
  resource TEXT NOT NULL,           -- 'credentials' | 'usage' | …
  resource_id BIGINT,
  payload JSONB,
  created_unix BIGINT NOT NULL
);
CREATE INDEX ix_audit_actor ON audit_log (actor_user_id, created_unix DESC);
CREATE INDEX ix_audit_target ON audit_log (target_user_id, created_unix DESC);

The audit row is a *best-effort write* (a failure is logged but does not abort the action; see flow#056b policy).

Sharding model (future, hyperscale)

When a table exceeds ~10M rows or ~100K active tenants:

  • Range-shard by koder_user_id (TiKV PD does this automatically in kdb-next)

  • Hash-shard via hash(koder_user_id) % N (alternative on Postgres with Citus / pg_partman)

  • Geo-shard by tenant region (multi-region future, see stackRFC001 §faseamento)

Do not pre-optimize. Trigger: monitoring reports p99 latency > 50ms or table size > 1TB.
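The hash-shard option, hash(koder_user_id) % N, can be illustrated as follows. The spec does not name a hash function; FNV-1a is an assumption chosen here because it ships with the Go standard library, and ShardFor is a hypothetical name.

```go
package main

import (
	"encoding/binary"
	"hash/fnv"
)

// ShardFor maps a koder_user_id to one of n shards using FNV-1a over the
// big-endian bytes of the ID. It returns 0 when n is 0 to avoid dividing by zero.
func ShardFor(userID int64, n uint32) uint32 {
	if n == 0 {
		return 0
	}
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], uint64(userID))
	h := fnv.New32a()
	h.Write(buf[:]) // Write on a hash.Hash never returns an error
	return h.Sum32() % n
}
```

Because the shard is derived from koder_user_id alone, all rows of one tenant land on the same shard. Changing N remaps most tenants, which is one reason to defer sharding until the trigger fires.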

Edge cases

User rename / handle change

koder_user_id is *immutable*: the handle (@username) can change; the ID cannot. Every cross-table reference uses koder_user_id (BIGINT), never the handle.

Workspace transfer

When a workspace changes owner, workspace.owner_id changes and workspace_id stays the same. Resources with workspace_id = X remain accessible to the current members.

Account deletion (GDPR-style)

When a user requests deletion:

  1. Set user.deleted_unix to the current Unix time (soft delete)
  2. A retention cron job sweeps the tables and deletes rows WHERE koder_user_id = X AND <table-specific retention>
  3. An audit row in audit_log records "user_deleted" before the cleanup
  4. Default retention: 30 days (configurable per tenant for compliance).

Account merge

Out of scope: Koder Stack does not support automatic account merges. Admin-only manual operation if needed.

Spec audit

Automated enforcement (future: koder-spec-audit multi-tenancy):

  • Scans migrations: tables with PII columns (email, name, password*, key*) that have neither koder_user_id nor user_id / org_id (3-scope model) → flag

  • Scans routers: endpoints without auth middleware → flag
  • Scans code: SELECT * FROM <pii-table> without WHERE → flag
  • Scans env vars: shared cache keys without a prefix → flag
  • Scans migrations lacking ALTER TABLE … ENABLE ROW LEVEL SECURITY for tables that carry a koder_user_id / user_id / org_id column → flag

  • Scans system-wide migrations (`-- system-wide:` comment): every table without RLS must have a one-line justification clause; otherwise the flag marks it as a candidate for auto-promotion to org-scoped.

Severity: error (blocks release) from the first release that adopts the audit; advisory before that.
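Two of the migration checks (scope column without RLS, and no RLS without a system-wide justification) can be approximated with regular expressions. This is a rough sketch and not koder-spec-audit itself: AuditMigration is a hypothetical name, and the patterns assume one CREATE TABLE name (…); statement per table and a file-level `-- system-wide:` comment. A real audit would use an SQL parser.

```go
package main

import "regexp"

var (
	// Captures the table name and the column list up to the first ");".
	createTableRe = regexp.MustCompile(
		`(?is)CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?([A-Za-z_][\w.]*)\s*\((.*?)\)\s*;`)
	scopeColRe   = regexp.MustCompile(`(?i)\b(koder_user_id|user_id|org_id)\b`)
	systemWideRe = regexp.MustCompile(`(?m)^\s*--\s*system-wide:\s*\S+`)
)

// AuditMigration returns one finding per table that violates the RLS rules.
func AuditMigration(migration string) []string {
	var findings []string
	justified := systemWideRe.MatchString(migration)
	for _, m := range createTableRe.FindAllStringSubmatch(migration, -1) {
		name, body := m[1], m[2]
		rlsRe := regexp.MustCompile(`(?i)ALTER\s+TABLE\s+(?:ONLY\s+)?` +
			regexp.QuoteMeta(name) + `\s+ENABLE\s+ROW\s+LEVEL\s+SECURITY`)
		switch {
		case rlsRe.MatchString(migration):
			// RLS is enabled; nothing to flag.
		case scopeColRe.MatchString(body):
			findings = append(findings, name+": scope column without RLS")
		case !justified:
			findings = append(findings, name+": no RLS and no system-wide justification")
		}
	}
	return findings
}
```

The word boundary in scopeColRe keeps columns such as actor_user_id from counting as a scope discriminator, since the underscore before user_id is a word character.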

Source: ../home/koder/dev/koder/meta/docs/stack/specs/multi-tenancy/contract.kmd