Regression Tests per Bug

mandatory

Every bug fix requires a regression test (3 categories: behavioral, golden, structural) that fails without the fix and passes with it. Tests live in tests/regression/. Registry: registries/regression-test-cases.md.

Policy — Regression Tests per Bug

When fixing any bug in any module of the monorepo, **you must** create a regression test that:

**Reproduces the bug**: the test fails if the fix is reverted.
**Passes after the fix.**
**Stays permanently in the repository** as regression protection.
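
The first two requirements can be checked mechanically by running the same test against the code with and without the fix. The sketch below is only an illustration: `last_n_buggy`, `last_n_fixed`, and the bug itself are hypothetical and are inlined so the example runs on its own. In a real module the "reverted" side comes from temporarily undoing the fix in the working tree.

```python
def last_n_buggy(items, n):
    # Hypothetical pre-fix code: n == 0 evaluates items[-0:], which is the whole list.
    return items[-n:]


def last_n_fixed(items, n):
    # Hypothetical fix: an explicit guard for n <= 0.
    return items[-n:] if n > 0 else []


def regression_test(last_n):
    """The regression test: asserts the correct behavior for the reported case."""
    assert last_n([1, 2, 3], 0) == []
    assert last_n([1, 2, 3], 2) == [2, 3]


def fails(test, implementation):
    """Return True if the test raises AssertionError for this implementation."""
    try:
        test(implementation)
    except AssertionError:
        return True
    return False


if __name__ == "__main__":
    # Requirement 1: the test fails when the fix is reverted.
    assert fails(regression_test, last_n_buggy)
    # Requirement 2: the test passes with the fix applied.
    assert not fails(regression_test, last_n_fixed)
```

A test that passes against both versions does not reproduce the bug and offers no regression protection.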

Directory Structure (per module)

modulo/
  tests/
    regression/
      NNN-descricao-do-bug.test.{ext}

NNN: sequential number per module (001, 002, ...).
{ext}: language-specific extension of the module (.kd, .go, .ts, .dart, .py, .sh, etc.).
**Language-specific conventions override** when the test runner requires a fixed filename format:
- **Go** (go test discovers only *_test.go): use NNN_descricao_test.go (e.g. 001_sighup_reload_test.go).
- **Rust:** tests live in tests/ with NNN_descricao.rs.
- **Python (pytest):** prefer test_NNN_descricao.py.
- Other languages (.kd, .ts, .dart, .sh, etc.): follow NNN-descricao.test.{ext}.
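
The naming rules above can be summarized as a small helper. This is a minimal sketch, not tooling that exists in the repository: the function name `regression_filename` and the way it normalizes the description into words are assumptions made for illustration.

```python
import re


def regression_filename(number: int, description: str, ext: str) -> str:
    """Build a regression test filename following the conventions listed above."""
    if number < 1:
        raise ValueError("sequence numbers start at 001")
    nnn = f"{number:03d}"
    # Split the description into lowercase alphanumeric words.
    words = [w for w in re.split(r"[^a-z0-9]+", description.lower()) if w]
    if not words:
        raise ValueError("description must contain at least one letter or digit")
    snake = "_".join(words)
    kebab = "-".join(words)
    ext = ext.lstrip(".")
    if ext == "go":   # go test only discovers *_test.go
        return f"{nnn}_{snake}_test.go"
    if ext == "rs":   # Rust tests live in tests/
        return f"{nnn}_{snake}.rs"
    if ext == "py":   # pytest-friendly prefix
        return f"test_{nnn}_{snake}.py"
    return f"{nnn}-{kebab}.test.{ext}"  # default pattern for every other language


if __name__ == "__main__":
    print(regression_filename(1, "sighup reload", "go"))
    print(regression_filename(3, "argv variable mismatch", ".kd"))
```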

Test Header

Every regression test must start with a comment containing:

Reference to the backlog ticket (if any).
Description of the bug in one line.
Expected behavior (correct) vs buggy behavior.
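
As an illustration, a pytest-style regression test with the required header might look like the sketch below. The bug, the function `normalize_path`, and the ticket placeholder are hypothetical. The function under test is inlined only to keep the example self-contained; a real test would import it from the module.

```python
# Ticket: <backlog ticket id, if any>
# Bug: normalize_path("/") returned "" instead of "/".
# Expected: the root path is preserved.
# Buggy: stripping the trailing slash emptied the root path.


def normalize_path(path: str) -> str:
    """Hypothetical function under test (already fixed)."""
    stripped = path.rstrip("/")
    # The fix: fall back to the root when stripping removed everything.
    return stripped or "/"


def test_001_root_path_preserved():
    # The exact case from the bug report.
    assert normalize_path("/") == "/"
    # A neighboring case that must keep working.
    assert normalize_path("/a/b/") == "/a/b"
```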

Rules

*imple and focused*— test only the specific bug, not the whole system.
*o unstable external dependencies*(APIs, network, third-party services).
*o UI tests coupled to layout*— test logic/behavior, not pixels.
If the module has no tests/regression/ yet, create it on the first bug.
Name the file descriptively (e.g. 003-argv-variable-mismatch.test.kd).

Test Categories (preference order)

6.a — Behavioral (default)

Executes the code and validates observable behavior (output, exit code, state, side effects). Default choice. Any bug fix should try this category first.
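
A behavioral test typically runs the program and asserts on what an outside observer can see. The sketch below assumes a hypothetical command-line tool whose source is inlined as `CLI_SOURCE` so that the example runs anywhere; a real test would invoke the module's actual binary or entry point.

```python
import subprocess
import sys

# Hypothetical CLI under test, inlined so the sketch is self-contained.
CLI_SOURCE = """
import sys
args = sys.argv[1:]
if not args:
    print("usage: greet NAME", file=sys.stderr)
    sys.exit(2)
print("hello " + args[0])
"""


def run_cli(*args):
    """Run the CLI in a child process and capture its observable behavior."""
    return subprocess.run(
        [sys.executable, "-c", CLI_SOURCE, *args],
        capture_output=True,
        text=True,
        timeout=30,
    )


def test_missing_argument_exits_with_usage():
    result = run_cli()
    assert result.returncode == 2      # exit code
    assert "usage" in result.stderr    # error output
    assert result.stdout == ""         # nothing on stdout


def test_argument_is_greeted():
    result = run_cli("world")
    assert result.returncode == 0
    assert result.stdout.strip() == "hello world"
```

The assertions touch only exit code and output streams, so the test stays valid if the implementation is refactored.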

6.b — Golden

Validates **invariance/contract between multiple representations of the same data** via roundtrip, diff, or direct comparison. First-class category, not a workaround. Use when the goal isn't "X does Y" but "A and B agree on X". Typical cases: DSL↔generated code, encoder↔decoder, single source of truth with multiple consumers, binary↔text formats. The test runs something and checks exit code, but the "behavior" tested is consistency, not business logic.

**Validate by injecting drift** (e.g. intentionally alter one side and confirm the test fails) before committing. Mark as golden in the "Revisão?" column of the registry. Canonical examples: 023-opcode-dsl-consistency.test.sh, 024-kbcb-roundtrip.test.sh, 025-kode-disasm.test.sh, 026-bytekode-loc-sourcemap.test.sh in engines/lang/lang/tests/regression/.
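
The shape of a golden test, including the drift check, can be sketched as follows. The opcode table and mnemonic list are hypothetical stand-ins for two representations of the same data; they are not taken from the canonical examples listed above.

```python
# Representation A: hypothetical encoder table (mnemonic -> opcode).
OPCODES = {"NOP": 0, "PUSH": 1, "ADD": 2}
# Representation B: hypothetical decoder list (index = opcode).
MNEMONICS = ["NOP", "PUSH", "ADD"]


def encode(names, table):
    return bytes(table[name] for name in names)


def decode(blob, mnemonics):
    return [mnemonics[byte] for byte in blob]


def roundtrip_agrees(table=OPCODES, mnemonics=MNEMONICS):
    """True when encoding with A and decoding with B returns the input."""
    names = list(table)
    return decode(encode(names, table), mnemonics) == names


def test_golden_roundtrip():
    assert roundtrip_agrees()


def test_injected_drift_is_detected():
    # Deliberately alter one side; the consistency check must now fail.
    drifted = ["NOP", "ADD", "PUSH"]
    assert not roundtrip_agrees(mnemonics=drifted)
```

The second test records the drift injection in code. The policy only requires performing that check once before committing, so keeping it as a permanent test is optional.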

6.c — Structural as stopgap (exception)

When the target module has no executable runtime harness and building one is out of scope for the fix, a structural test (grep/AST on source) is acceptable **only if all** of these hold:

(a) the test **fails** if the fix is reverted (verify with git stash before committing);
(b) an **open backlog ticket** tracks the construction of the harness, and is referenced in the test's comment header;
(c) the test's top comment **explains why** a behavioral test isn't viable today;
(d) the registry entry (meta/context/registries/regression-test-cases.md) is marked with estrutural in the "Revisão?" column to ease future auditing.

Structural stopgaps must be replaced with behavioral tests as soon as the harness exists.
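
For illustration, a structural stopgap built on Python's `ast` module might look like the sketch below. The bug (a `subprocess.run` call without a timeout), the source snippets, and the ticket placeholder are hypothetical. A real test would read the module's source file from disk instead of an inlined string.

```python
# Ticket: <open backlog ticket tracking the runtime harness>
# Bug: sync() called subprocess.run without a timeout and could hang forever.
# Why structural: the module has no executable harness yet, so the call
# cannot be exercised; this test inspects the source instead.
import ast

# Hypothetical source with the fix applied.
FIXED_SOURCE = '''
import subprocess
def sync():
    subprocess.run(["rsync", "src", "dst"], timeout=60)
'''
# The same source with the fix reverted, used to confirm the test fails.
REVERTED_SOURCE = FIXED_SOURCE.replace(", timeout=60", "")


def calls_without_timeout(source):
    """Return the line numbers of .run(...) calls lacking a timeout keyword."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "run"
            and not any(kw.arg == "timeout" for kw in node.keywords)
        ):
            missing.append(node.lineno)
    return missing


def test_run_calls_have_timeout():
    assert calls_without_timeout(FIXED_SOURCE) == []


def test_reverting_the_fix_is_detected():
    assert calls_without_timeout(REVERTED_SOURCE) == [4]
```

Once a harness exists, this would be replaced by a behavioral test that runs `sync()` and observes the timeout.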

Registry

Keep the counter up to date in meta/context/registries/regression-test-cases.md.
Each new regression test adds an entry to the registry.
**While the counter ≤ 30 cases:** review this policy and the process at each new case to identify improvements (recurring patterns, bug categories, needed helpers, structural adjustments). Apply improvements to the policy and inform the user.
**After 30 cases:** the policy is considered mature; stop automatic reviews. The user can request a manual review at any time.