Spec — Multi-tenancy contract
Concrete mechanisms for implementing policies/multi-tenant-by-default.kmd. This spec is normative: every Koder Stack module that stores user data must pass checks T1–T9 below (foundation services with org- or user-scoped tables also pass T10..T12).
Identity model
    koder_user_id BIGINT NOT NULL  -- FK to services/foundation/id.user(id)
    workspace_id  BIGINT           -- nullable; FK to id.workspace(id)

`koder_user_id` is the canonical partial PK (a component of the primary key) in every table that holds PII. `workspace_id` widens the scope: NULL means "personal"; non-null means "owned by the workspace, visible to all members via membership".
Canonical membership table (Koder ID):

    CREATE TABLE workspace_member (
        workspace_id  BIGINT NOT NULL,
        koder_user_id BIGINT NOT NULL,
        role          TEXT   NOT NULL,  -- 'owner' | 'admin' | 'member' | 'viewer'
        joined_unix   BIGINT NOT NULL,
        PRIMARY KEY (workspace_id, koder_user_id)
    );

Every cross-workspace query goes through this table. **Do not cache membership client-side**: it is on the auth hot path and lives in services/foundation/id with a 60s server-side cache.
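The 60s server-side cache could look roughly like the sketch below. It is an illustration only: `MembershipCache`, its fields, and the `lookup` callback (standing in for the SQL query against `workspace_member`) are hypothetical names, not the Koder ID implementation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// memberKey identifies one (workspace, user) pair.
type memberKey struct {
	WorkspaceID, UserID int64
}

type cacheEntry struct {
	role    string // "" means "not a member"
	expires time.Time
}

// MembershipCache is a hypothetical server-side cache in front of workspace_member.
type MembershipCache struct {
	mu     sync.Mutex
	ttl    time.Duration
	lookup func(workspaceID, userID int64) (string, error) // stands in for the SQL query
	items  map[memberKey]cacheEntry
}

func NewMembershipCache(ttl time.Duration, lookup func(int64, int64) (string, error)) *MembershipCache {
	return &MembershipCache{ttl: ttl, lookup: lookup, items: map[memberKey]cacheEntry{}}
}

// Role returns the member's role, or "" if the user is not a member.
func (c *MembershipCache) Role(workspaceID, userID int64) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	k := memberKey{workspaceID, userID}
	if e, ok := c.items[k]; ok && time.Now().Before(e.expires) {
		return e.role, nil
	}
	role, err := c.lookup(workspaceID, userID)
	if err != nil {
		return "", err // fail closed: an expired entry is never served on error
	}
	c.items[k] = cacheEntry{role: role, expires: time.Now().Add(c.ttl)}
	return role, nil
}

func main() {
	calls := 0
	cache := NewMembershipCache(60*time.Second, func(w, u int64) (string, error) {
		calls++
		if w == 1 && u == 42 {
			return "admin", nil
		}
		return "", nil
	})
	role, _ := cache.Role(1, 42)
	cache.Role(1, 42) // second call is served from the cache
	fmt.Println(role, calls)
}
```

Negative results ("not a member") are cached too in this sketch, so a revoked or newly granted membership can take up to one TTL to become visible.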
Scope model — 3 categories + system-wide
The `koder_user_id` + `workspace_id` identity model above is the **tenant scope**: every product table that holds PII goes here. Foundation services (auth, audit, behavior, device, telephony, SSO, …) also need isolation, but along axes that are not the product-tenant axis. The Koder Stack therefore defines three isolation scopes plus an explicit no-RLS bucket:
| Scope | Discriminator column | GUC setting | Helper | When to use |
|---|---|---|---|---|
| **Tenant-scoped** | `koder_user_id` + optional `workspace_id` | `app.current_tenant_id` | `kdb.WithTenantTx` | Default for product data with PII (Dek recordings, Drive blobs, Talk messages, Kruze bookmarks, …) |
| **Org-scoped** | `org_id` | `app.current_org_id` | `kdb.WithOrgTx` | Foundation tables owned by an org: audit events, MFA policies, access policies, org settings |
| **User-scoped** | `user_id` (= `koder_user_id` at the foundation layer) | `app.current_user_id` | `kdb.WithUserTx` | Foundation tables owned by a single user: behavior events, QR sessions, sync state, telephony OTPs, user-scoped webhooks/API keys, SSO grants/sessions |
| **System-wide** | (none) | (none) | (none; admin auth) | Tables shared across all tenants by design: rate-limit buckets, SSO client registry, SSO revocation list. Access is governed by admin auth, not RLS |
Selection rules:
- **Tenant-scoped is the default for product data.** Tables that surface in a product UI carry `koder_user_id`.
- **Org-scoped applies when the primary owner is an org**, not a user, even if individual user IDs are recorded. Audit events for an org are visible to every admin of that org regardless of who triggered the event.
- **User-scoped applies when the row is per-user inside a foundation service** and there is no org or product tenant in play. Examples: behavior risk profiles, QR pairing sessions, sync state.
- **System-wide is the explicit exception.** It must be justified in the table's migration comment (e.g. "global rate-limit counters; no per-tenant cardinality"). Drift toward this scope without justification is a spec violation.
Mixed-scope tables (e.g. device carrying both `user_id` and `org_id` columns) either split into two migrations_rls files **or** use a composite policy:

    -- current_setting() returns text, so cast before comparing with BIGINT columns
    USING (org_id  = current_setting('app.current_org_id', true)::BIGINT
        OR user_id = current_setting('app.current_user_id', true)::BIGINT)

The composite form is acceptable when both isolation axes apply to the same row simultaneously; otherwise prefer the split.
PAT scope grammar
A PAT (Personal Access Token) issued by Koder ID carries a scope. Canonical syntax (inherited from Flow RFC-003 credentials/backups):

    <verb>:<resource>[:<modifier>]
    verbs:     read | write | admin
    resources: user | workspace | repo | credentials | backups | …
    modifier:  optional, e.g. "self" or "<id>"

Examples:
- `read:user` → read own profile
- `write:credentials` → write credentials in the scope the PAT inherited
- `read:workspace:<id>` → read data of one specific workspace
- `admin:user` → privileged self-management

A PAT is **scoped to a single `koder_user_id`** (the owner). Workspace access is resolved via `workspace_member` at request time, not via PAT scope. PATs do **not** cross tenants.
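A parser for the scope grammar above might be sketched as follows. `Scope`, `ParseScope`, and `ErrBadScope` are illustrative names, not part of Koder ID. Because the resource list in the spec ends with "…", the sketch validates only the verb and the overall shape.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Scope is one parsed PAT scope: <verb>:<resource>[:<modifier>].
type Scope struct {
	Verb, Resource, Modifier string
}

// ErrBadScope is returned for anything outside the grammar.
var ErrBadScope = errors.New("malformed PAT scope")

// validVerbs mirrors the verbs listed in the grammar.
var validVerbs = map[string]bool{"read": true, "write": true, "admin": true}

// ParseScope splits a scope string into its parts.
// The resource set is open-ended in the spec, so it is only checked for non-emptiness.
func ParseScope(s string) (Scope, error) {
	parts := strings.Split(s, ":")
	if len(parts) < 2 || len(parts) > 3 {
		return Scope{}, ErrBadScope
	}
	for _, p := range parts {
		if p == "" {
			return Scope{}, ErrBadScope
		}
	}
	if !validVerbs[parts[0]] {
		return Scope{}, ErrBadScope
	}
	sc := Scope{Verb: parts[0], Resource: parts[1]}
	if len(parts) == 3 {
		sc.Modifier = parts[2]
	}
	return sc, nil
}

func main() {
	for _, s := range []string{"read:user", "read:workspace:17", "delete:user"} {
		sc, err := ParseScope(s)
		fmt.Printf("%q -> %+v, err=%v\n", s, sc, err)
	}
}
```

Parsing a scope says nothing about tenancy: even a valid `read:workspace:<id>` still needs the `workspace_member` lookup at request time.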
RLS template (Postgres / kdb-next)
Every table with PII has RLS. Helper migration:

    -- 1. Schema with tenant fields
    CREATE TABLE my_resource (
        id            BIGSERIAL,
        koder_user_id BIGINT NOT NULL REFERENCES koder_id.user(id),
        workspace_id  BIGINT REFERENCES koder_id.workspace(id),
        payload       JSONB  NOT NULL,
        created_unix  BIGINT NOT NULL DEFAULT extract(epoch from now()),
        PRIMARY KEY (koder_user_id, id)
    );

    -- 2. Index on tenant + recent
    CREATE INDEX ix_my_resource_user_recent
        ON my_resource (koder_user_id, created_unix DESC);

    -- 3. RLS enable + policy (permissive policies are OR-ed together)
    ALTER TABLE my_resource ENABLE ROW LEVEL SECURITY;
    CREATE POLICY p_owner ON my_resource
        USING (koder_user_id = current_setting('app.current_tenant_id', true)::BIGINT);
    CREATE POLICY p_workspace_member ON my_resource
        USING (workspace_id IS NOT NULL
            AND EXISTS (
                SELECT 1 FROM koder_id.workspace_member m
                WHERE m.workspace_id = my_resource.workspace_id
                  AND m.koder_user_id = current_setting('app.current_tenant_id', true)::BIGINT
            ));

Connection setup (per request):
    // Tenant-scoped (default for product data with PII)
    err := kdb.WithTenantTx(ctx, store, auth.UserID, func(ctx context.Context, s kdb.Store) error {
        return repo.List(ctx)
    })

The helper opens a transaction, runs `SELECT set_config('app.current_tenant_id', $1, true)`, and dispatches the callback through a `*sql.Tx`. An empty scope ID fails closed: the helper returns an error rather than running unscoped.

**Bypass admin path** (rare): an explicit `RESET app.current_tenant_id` is a privilege of the `koder_admin` role only. An audit log entry is mandatory for every reset.
Org-scoped variant
For foundation tables owned by an org (audit events, MFA policies, access policies):
    ALTER TABLE org_resource ENABLE ROW LEVEL SECURITY;
    CREATE POLICY p_org ON org_resource
        USING (org_id = current_setting('app.current_org_id', true)::BIGINT);

    err := kdb.WithOrgTx(ctx, store, auth.OrgID, func(ctx context.Context, s kdb.Store) error {
        return repo.List(ctx)
    })

User-scoped variant
For foundation tables owned by a single user (behavior events, QR sessions, sync state, telephony OTPs, user-scoped webhooks):
    ALTER TABLE user_resource ENABLE ROW LEVEL SECURITY;
    CREATE POLICY p_user ON user_resource
        USING (user_id = current_setting('app.current_user_id', true)::BIGINT);

    err := kdb.WithUserTx(ctx, store, auth.UserID, func(ctx context.Context, s kdb.Store) error {
        return repo.List(ctx)
    })

All three helpers share the same body in services/foundation/id/engine/pkg/kdb/rls.go::withScopeTx: fail-closed on empty scope, ctx stash propagated to nested calls, test coverage in rls_scope_test.go.
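As a rough picture of what such a shared body can look like, here is a minimal sketch on top of `database/sql`. It is a hypothetical reconstruction from the description above, not the contents of rls.go: the signature, `ErrEmptyScope`, and the use of a raw `*sql.DB` instead of `kdb.Store` are assumptions, and the ctx-stash propagation is left out.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
)

// ErrEmptyScope is returned instead of running a callback unscoped.
var ErrEmptyScope = errors.New("rls: empty scope id")

// withScopeTx is a hypothetical reconstruction, not the kdb source.
// guc is one of app.current_tenant_id, app.current_org_id, app.current_user_id.
func withScopeTx(ctx context.Context, db *sql.DB, guc, scopeID string,
	fn func(ctx context.Context, tx *sql.Tx) error) error {
	if scopeID == "" {
		return ErrEmptyScope // fail closed before touching the database
	}
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // returns sql.ErrTxDone (ignored) after a successful Commit

	// The third argument (is_local = true) limits the setting to this transaction.
	if _, err := tx.ExecContext(ctx, "SELECT set_config($1, $2, true)", guc, scopeID); err != nil {
		return fmt.Errorf("set %s: %w", guc, err)
	}
	if err := fn(ctx, tx); err != nil {
		return err
	}
	return tx.Commit()
}

func main() {
	// A nil *sql.DB is safe here only because the empty scope returns first.
	err := withScopeTx(context.Background(), nil, "app.current_org_id", "", nil)
	fmt.Println(err)
}
```

Because the setting is transaction-local, a pooled connection does not carry one request's scope into the next.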
KV / cache template (Redis-style)
Every key carries a tenant prefix:

    <namespace>:<tenant-key>:<resource-key>
    examples:
      rate_limit:user:<uid>:5h_window       → counter
      session:user:<uid>:<session_id>       → JSON
      presence:workspace:<wid>:<uid>        → boolean

Helper:

    // TenantKey builds the "user:<uid>:<parts...>" portion of a key (needs fmt and strings).
    func TenantKey(uid int64, parts ...string) string {
        return fmt.Sprintf("user:%d:%s", uid, strings.Join(parts, ":"))
    }

A key without the prefix is a **critical bug** (cross-tenant leak via cache).
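The helper above yields only the tenant and resource portions of a key. A hedged sketch of a builder for the full `<namespace>:user:<uid>:<resource-key>` form, which refuses to emit a key without a tenant, follows. `UserKey` and `ErrUnscopedKey` are hypothetical names, and the check `uid <= 0` assumes user IDs are always positive.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrUnscopedKey signals an attempt to build a key with a missing component.
var ErrUnscopedKey = errors.New("cache key without tenant prefix")

// UserKey builds <namespace>:user:<uid>:<resource-key>.
// Assumption: valid user IDs are positive, so uid <= 0 is treated as "no tenant".
func UserKey(namespace string, uid int64, parts ...string) (string, error) {
	if namespace == "" || uid <= 0 || len(parts) == 0 {
		return "", ErrUnscopedKey
	}
	for _, p := range parts {
		if p == "" {
			return "", ErrUnscopedKey
		}
	}
	return fmt.Sprintf("%s:user:%d:%s", namespace, uid, strings.Join(parts, ":")), nil
}

func main() {
	key, err := UserKey("rate_limit", 42, "5h_window")
	fmt.Println(key, err)

	_, err = UserKey("session", 0, "abc")
	fmt.Println(err)
}
```

Returning an error instead of a partial key keeps the failure visible at the call site rather than in the cache.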
S3 / object storage template
Path:
    <bucket>/<koder_user_id>/<workspace_id|"personal">/<resource_id>/<file>

IAM / signed URL: per-request, restricted to the tenant prefix.
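A sketch of building the key inside the bucket and checking it against the tenant prefix before signing a URL is shown below. `ObjectKey`, `InTenantPrefix`, and the segment validation rules are illustrative assumptions; the spec defines only the path layout.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// ErrBadSegment rejects path segments that could alter the key layout.
var ErrBadSegment = errors.New("invalid object key segment")

func checkSegment(s string) error {
	if s == "" || s == "." || s == ".." || strings.ContainsAny(s, "/\\") {
		return ErrBadSegment
	}
	return nil
}

// ObjectKey builds <koder_user_id>/<workspace_id|"personal">/<resource_id>/<file>.
// A nil workspaceID means the object is personal.
func ObjectKey(userID int64, workspaceID *int64, resourceID, file string) (string, error) {
	for _, s := range []string{resourceID, file} {
		if err := checkSegment(s); err != nil {
			return "", err
		}
	}
	scope := "personal"
	if workspaceID != nil {
		scope = strconv.FormatInt(*workspaceID, 10)
	}
	return fmt.Sprintf("%d/%s/%s/%s", userID, scope, resourceID, file), nil
}

// InTenantPrefix reports whether key sits under userID's prefix.
// The trailing slash matters: user 4 must not match keys of user 42.
func InTenantPrefix(key string, userID int64) bool {
	return strings.HasPrefix(key, fmt.Sprintf("%d/", userID))
}

func main() {
	key, err := ObjectKey(42, nil, "r9", "a.png")
	fmt.Println(key, err)
	fmt.Println(InTenantPrefix(key, 42), InTenantPrefix(key, 4))
}
```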
Test contract — T1..T12
Every multi-tenant module has a test suite that covers:
Tenant-scoped baseline (T1..T9 — every product table)
| ID | Test | Description |
|---|---|---|
| **T1** | Auth required | `GET /resource` without a PAT → 401 |
| **T2** | Self read | A's PAT, `GET /my-resource` → A's data only |
| **T3** | Cross-tenant read denied | A's PAT, `GET /resource/<B's id>` → **404 (not 403)** |
| **T4** | Cross-tenant write denied | A's PAT, `POST /resource` setting `koder_user_id=B` → 400 or silent override to A |
| **T5** | Workspace member read | A in workspace W, `GET /resource?workspace=W` → all members' data |
| **T6** | Workspace non-member read | A not in W, `GET /resource?workspace=W` → 404 |
| **T7** | RLS isolation | Direct DB query without `SET LOCAL app.current_tenant_id` → returns nothing (or error) |
| **T8** | Index efficiency | `EXPLAIN` of A's read uses the tenant index, not a seq scan |
| **T9** | Tenant deletion | When user A is deleted, all rows matching `WHERE koder_user_id = A` are removed within the retention window |
Org/user-scoped extension (T10..T12 — foundation services)
Required for any table whose RLS policy is keyed by org_id or user_id (not koder_user_id). Sibling of T3+T7 but along the secondary axis.
| ID | Test | Description |
|---|---|---|
| **T10** | Cross-org read denied | Admin of org A, `GET /audit-events` from org B context → **404 (not 403)**. Mirrors T3 for `app.current_org_id`. |
| **T11** | Cross-user read denied | User A's session, `GET /qr-sessions/<B's id>` → 404. Mirrors T3 for `app.current_user_id`. |
| **T12** | Scope isolation in RLS | Direct DB query without the appropriate `SET LOCAL app.current_org_id` / `app.current_user_id` → returns nothing (or error). Mirrors T7 for the secondary axis. |
System-wide tables (rate-limit buckets, SSO client registry, SSO revocation list) intentionally have no T10..T12 row: they are not scope-protected by RLS and instead rely on admin auth. Their migration MUST carry a `-- system-wide: <justification>` comment that the audit will read.
Each implementation ships with tests/multi-tenant/T1..T9_test.go (product) or tests/multi-tenant/T10..T12_test.go (foundation org/user-scoped). Audit: a PR without the applicable subset green **blocks merge** (see policies/regression-tests.kmd co-enforcement).
Error model
| Scenario | HTTP | gRPC | Body |
|---|---|---|---|
| No auth | 401 | UNAUTHENTICATED | {"error": "auth required"} |
| Invalid token | 401 | UNAUTHENTICATED | {"error": "invalid token"} |
| Resource does not exist OR belongs to another tenant | 404 | NOT_FOUND | {"error": "not found"} |
| Resource exists but role is insufficient (workspace member without write) | 403 | PERMISSION_DENIED | {"error": "insufficient role"} |
| Bad input | 400 | INVALID_ARGUMENT | {"error": "<details>"} |
| Server error | 500 | INTERNAL | {"error": "internal"} |
Critical: **404, not 403** in cross-tenant cases. A 403 leaks existence ("this id exists but you cannot read it" tells the attacker that it exists).
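The ordering of these checks is what prevents the leak: visibility is decided before role, so a row in another tenant is indistinguishable from a missing row. A minimal sketch follows. The `Access` struct and `StatusFor` are hypothetical names, and `Visible` is assumed to mean "the row came back under RLS for this caller".

```go
package main

import (
	"fmt"
	"net/http"
)

// Access is what a hypothetical auth layer resolved for the caller.
type Access struct {
	Authenticated bool // a valid PAT was presented
	Visible       bool // the row was returned under RLS for this caller
	CanWrite      bool // the caller's role allows writes
}

// StatusFor maps an access decision to the HTTP status from the error model.
func StatusFor(a Access, write bool) int {
	switch {
	case !a.Authenticated:
		return http.StatusUnauthorized
	case !a.Visible:
		// Missing and cross-tenant rows share one answer, so existence never leaks.
		return http.StatusNotFound
	case write && !a.CanWrite:
		// Reached only for rows the caller can already see.
		return http.StatusForbidden
	default:
		return http.StatusOK
	}
}

func main() {
	fmt.Println(StatusFor(Access{}, false))
	fmt.Println(StatusFor(Access{Authenticated: true}, false))
	fmt.Println(StatusFor(Access{Authenticated: true, Visible: true}, true))
	fmt.Println(StatusFor(Access{Authenticated: true, Visible: true, CanWrite: true}, true))
}
```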
Audit log
Every **mutating** operation that touches PII writes an audit row:

    CREATE TABLE audit_log (
        id             BIGSERIAL PRIMARY KEY,
        actor_user_id  BIGINT NOT NULL,  -- the PAT owner
        target_user_id BIGINT,           -- tenant being acted on (often = actor)
        action         TEXT   NOT NULL,  -- 'create' | 'update' | 'delete' | 'read_admin'
        resource       TEXT   NOT NULL,  -- 'credentials' | 'usage' | …
        resource_id    BIGINT,
        payload        JSONB,
        created_unix   BIGINT NOT NULL
    );
    CREATE INDEX ix_audit_actor  ON audit_log (actor_user_id, created_unix DESC);
    CREATE INDEX ix_audit_target ON audit_log (target_user_id, created_unix DESC);

The audit row is a **best-effort write** (a failure is logged but does not abort the action; see flow#056b policy).
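In code, "best-effort" can be read as: run the action, attempt the audit insert, and log rather than propagate an audit failure. The sketch below illustrates that reading. `Mutate`, `AuditRow`, and `AuditWriter` are hypothetical names, and the writer callback stands in for the `INSERT INTO audit_log`.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// AuditRow carries a subset of the audit_log columns.
type AuditRow struct {
	ActorUserID, TargetUserID int64
	Action, Resource          string
}

// AuditWriter stands in for an INSERT into audit_log.
type AuditWriter func(AuditRow) error

// errAuditDown simulates an unavailable audit store in the demo below.
var errAuditDown = errors.New("audit store unavailable")

// Mutate runs the action, then records the audit row best-effort:
// an audit failure is logged but never turns a successful action into an error.
func Mutate(action func() error, write AuditWriter, row AuditRow) error {
	if err := action(); err != nil {
		return err // failed actions are not audited in this sketch
	}
	if err := write(row); err != nil {
		log.Printf("audit write failed (action kept): %v", err)
	}
	return nil
}

func main() {
	row := AuditRow{ActorUserID: 42, TargetUserID: 42, Action: "update", Resource: "credentials"}
	err := Mutate(
		func() error { return nil },
		func(AuditRow) error { return errAuditDown },
		row,
	)
	fmt.Println("action error:", err)
}
```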
Sharding model (future, hyperscale)
When a table exceeds ~10M rows or ~100K active tenants:
- Range-shard by `koder_user_id` (TiKV PD does this automatically in `kdb-next`)
- Hash-shard via `hash(koder_user_id) % N` (alternative in Postgres with Citus / pg_partman)
- Geo-shard by tenant region (multi-region future, see stack RFC-001 §faseamento)
Do not pre-optimize. Trigger: monitoring reports p99 latency > 50ms or table size > 1TB.
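For the hash-shard option, `hash(koder_user_id) % N` can be illustrated as below. The choice of FNV-1a and the name `ShardFor` are assumptions made for the example; the spec does not name a hash function.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// ShardFor maps a koder_user_id to one of n shards using FNV-1a (an assumed choice).
// It returns 0 when n is 0 to avoid a division by zero.
func ShardFor(userID int64, n uint32) uint32 {
	if n == 0 {
		return 0
	}
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], uint64(userID))
	h := fnv.New32a()
	h.Write(buf[:]) // Write on a hash.Hash never returns an error
	return h.Sum32() % n
}

func main() {
	for _, uid := range []int64{1, 2, 42, 1_000_003} {
		fmt.Printf("user %d -> shard %d of 16\n", uid, ShardFor(uid, 16))
	}
}
```

Plain modulo placement moves most tenants when N changes, which is one reason to defer this step until the trigger above fires.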
Edge cases
User rename / handle change
`koder_user_id` is **immutable**: the handle (@username) can change; the ID cannot. Every cross-table reference uses `koder_user_id` (BIGINT), never the handle.
Workspace transfer
When a workspace changes owner, `workspace.owner_id` changes and `workspace_id` stays the same. Resources with `workspace_id = X` remain accessible to the current members.
Account deletion (GDPR-style)
When a user requests deletion:
- Set `user.deleted_unix = NOW()` (soft delete)
- A retention cron job sweeps the tables and deletes rows `WHERE koder_user_id = X AND <table-specific retention>`
- An audit row in `audit_log` records "user_deleted" before the cleanup
- Default retention: 30 days (configurable per tenant for compliance).
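The retention check that the cron job applies per user can be reduced to a small predicate. This is a sketch under assumptions: `PurgeDue` is a hypothetical name, timestamps are Unix seconds as in the `*_unix` columns, and a `deleted_unix` of 0 stands for "not deleted".

```go
package main

import (
	"fmt"
	"time"
)

// DefaultRetention is the 30-day default from the spec.
const DefaultRetention = 30 * 24 * time.Hour

// PurgeDue reports whether the rows of a soft-deleted user may now be hard-deleted.
// Assumption: deletedUnix == 0 means the user has not been deleted.
func PurgeDue(deletedUnix, nowUnix int64, retention time.Duration) bool {
	if deletedUnix == 0 {
		return false
	}
	return nowUnix-deletedUnix >= int64(retention/time.Second)
}

func main() {
	deleted := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC).Unix()
	day29 := deleted + 29*86400
	day30 := deleted + 30*86400
	fmt.Println(PurgeDue(deleted, day29, DefaultRetention))
	fmt.Println(PurgeDue(deleted, day30, DefaultRetention))
}
```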
Account merge
Out of scope: Koder Stack does not support automatic account merges. Admin-only manual operation if needed.
Spec audit
Automated enforcement (future: `koder-spec-audit multi-tenancy`):
- Scans migrations: tables with PII columns (`email`, `name`, `password*`, `key*`) that have neither `koder_user_id` **nor** `user_id`/`org_id` (3-scope model) → flag
- Scans routers: endpoints without auth middleware → flag
- Scans code: `SELECT * FROM <pii-table>` without `WHERE` → flag
- Scans env vars: shared cache keys without a prefix → flag
- Scans migrations without `ALTER TABLE … ENABLE ROW LEVEL SECURITY` for tables that carry a `koder_user_id`/`user_id`/`org_id` column → flag
- Scans system-wide migrations (`-- system-wide:` comment): every table without RLS must have a one-line justification clause, or the flag points to a candidate for auto-elevation to org-scoped.
Severity: error (blocks release) from the first release that adopts the audit; advisory before that.
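Since koder-spec-audit is described as future work, the following is only a rough sketch of how the RLS check in the list above could be approximated with regular expressions. `MissingRLS` and both patterns are hypothetical; a real audit would more likely parse the SQL than pattern-match it.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// createRe captures the table name and the column list of each CREATE TABLE.
	createRe = regexp.MustCompile(`(?i)CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?([\w.]+)\s*\(([^;]*)\)\s*;`)
	// scopeColRe matches any of the three discriminator columns.
	scopeColRe = regexp.MustCompile(`(?i)\b(koder_user_id|user_id|org_id)\b`)
)

// MissingRLS returns tables that carry a scope column but never enable RLS
// within the same migration text.
func MissingRLS(migration string) []string {
	var flagged []string
	for _, m := range createRe.FindAllStringSubmatch(migration, -1) {
		table, body := m[1], m[2]
		if !scopeColRe.MatchString(body) {
			continue // no scope column: not covered by this particular check
		}
		enable := regexp.MustCompile(`(?i)ALTER\s+TABLE\s+` + regexp.QuoteMeta(table) +
			`\s+ENABLE\s+ROW\s+LEVEL\s+SECURITY`)
		if !enable.MatchString(migration) {
			flagged = append(flagged, table)
		}
	}
	return flagged
}

func main() {
	migration := `
CREATE TABLE a (id BIGINT, koder_user_id BIGINT NOT NULL, PRIMARY KEY (koder_user_id, id));
ALTER TABLE a ENABLE ROW LEVEL SECURITY;
CREATE TABLE b (id BIGINT, org_id BIGINT NOT NULL);
`
	fmt.Println(MissingRLS(migration))
}
```

The sketch ignores semicolons inside string literals and RLS statements placed in a separate migration file, both of which a production check would need to handle.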