Spec: Auto-heal selectors for UI regression tests
Defines the contract for the /k-test-run auto-heal pass, which retries failed UI selectors with progressively weaker matching strategies and surfaces patches that update the tests instead of failing them outright.
Source motivation: TestSprite's auto-heal-ui feature (cited in ticket #010). Goal: cosmetic refactors (renaming a Key, moving a button into a different parent) should not require touching every test that referenced the moved element.
R1 — Trigger conditions
The auto-heal pass MUST activate when, and only when, a UI test fails with one of these signatures:
| Framework | Failure signature |
|---|---|
| Flutter Widget | `flutter_test` matcher fails with "Expected: exactly one matching node in the widget tree" and the finder matched 0 nodes |
| Playwright (web/desktop variants) | `expect(locator).toBeVisible()` times out, OR a `getByRole`/`getByText` locator resolves to zero elements |
| Bubble Tea (TUI golden) | tea.NewProgram snapshot diverges from golden by > N% pixel diff (TBD) |
Other failures (assertion mismatch with non-zero matches, runtime panic, network error) MUST pass through unmodified — auto-heal is selector-drift only, not behavior drift.
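As a minimal sketch of the R1 gate, the Go function below classifies a failure as healable only when it matches one of the signatures in the table. The `FailureInfo` type and its fields (`Framework`, `Message`, `Op`, `MatchCount`, `TimedOut`) are hypothetical simplifications, not R7's `FailureRecord`, and Bubble Tea is treated as non-healable because its threshold is still TBD.

```go
package main

import (
	"fmt"
	"strings"
)

// FailureInfo is a hypothetical, simplified failure record used only in this
// sketch; R7's FailureRecord type is not defined by the spec.
type FailureInfo struct {
	Framework  string // "flutter", "playwright", or "bubbletea"
	Message    string // raw failure message from the test runner
	Op         string // failing assertion or locator call, e.g. "toBeVisible"
	MatchCount int    // nodes/elements the original selector matched
	TimedOut   bool   // whether the assertion timed out
}

const flutterMissMsg = "Expected: exactly one matching node in the widget tree"

// isSelectorMiss reports whether a failure matches an R1 signature.
// Everything else (non-zero matches, panics, network errors) passes through.
func isSelectorMiss(f FailureInfo) bool {
	switch f.Framework {
	case "flutter":
		return strings.Contains(f.Message, flutterMissMsg) && f.MatchCount == 0
	case "playwright":
		visibilityTimeout := f.Op == "toBeVisible" && f.TimedOut
		emptyLookup := (f.Op == "getByRole" || f.Op == "getByText") && f.MatchCount == 0
		return visibilityTimeout || emptyLookup
	default:
		// Bubble Tea's golden-diff threshold is TBD, so TUI failures are
		// treated as non-healable in this sketch.
		return false
	}
}

func main() {
	miss := FailureInfo{Framework: "flutter", Message: flutterMissMsg, MatchCount: 0}
	crash := FailureInfo{Framework: "flutter", Message: "Null check operator used on a null value"}
	fmt.Println(isSelectorMiss(miss), isSelectorMiss(crash)) // true false
}
```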
R2 — Fallback strategy order
When R1 triggers, the heal pass MUST attempt the following selector strategies, **in this order**, against the DOM snapshot taken at the moment of failure:
- **byText**: find a node whose visible text exactly matches the text the original selector targeted (extracted via reverse-lookup from the original Key/role).
- **byRole**: find a node with the same ARIA/Semantics role (Flutter `Semantics` widgets; Playwright `getByRole`).
- **byTooltip**: find a node whose `tooltip:`/`title:` attribute matches.
- **byCoordinates**: find a node at the same relative position (within ±5% of the viewport, scaled) that the original element occupied in the last passing run.
The first strategy that yields **exactly one** match wins; a strategy that yields zero matches falls through to the next one. If a strategy yields multiple matches, the heal pass MUST NOT advance to the next strategy: it aborts and the test fails normally (an ambiguous match is worse than a miss).
R3 — Patch proposal
When a strategy succeeds:
- The test execution MUST emit a `SELF_HEAL_APPLIED` warning row in the test report (stdout + `tests/auto-heal-patches/<test-id>.log`). The test result counts as PASS for CI gating purposes.
- A unified-diff patch MUST be queued at `<module>/tests/auto-heal-patches/<test-id>-<timestamp>.patch` showing the proposed selector update (old selector → new strategy + key/text/role).
- The patch MUST NEVER be auto-applied. Owner review is required before adoption; patches in this directory form a `.gitignore`d review queue.
- Each patch carries a metadata header (`# strategy: byTooltip`, `# old: find.byKey(Key("foo"))`, `# new: find.byTooltip("Save changes")`) for human and tooling inspection.
R4 — Opt-out
/k-test-run --no-auto-heal MUST disable the entire heal pass. When disabled, the original test failure surfaces unchanged.
Env var KODER_TEST_AUTO_HEAL=0 is the CI-friendly equivalent.
Default: auto-heal **enabled** (assumption: most CI failures are selector drift, and when no match is found the cost is the same as today, since the original failure surfaces).
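The R4 switches can be sketched with Go's standard `flag` package. The sketch assumes that either switch disables the pass and that any `KODER_TEST_AUTO_HEAL` value other than `0` leaves the default in place; R4 only specifies the `0` case. Only `--no-auto-heal` is registered, so this is not the real `/k-test-run` argument parser.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// autoHealEnabled applies R4: enabled by default, disabled by either the
// --no-auto-heal flag or KODER_TEST_AUTO_HEAL=0. getenv is injected so the
// logic can be tested without touching the real environment.
func autoHealEnabled(args []string, getenv func(string) string) (bool, error) {
	fs := flag.NewFlagSet("k-test-run", flag.ContinueOnError)
	noHeal := fs.Bool("no-auto-heal", false, "disable the auto-heal pass")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	if *noHeal || getenv("KODER_TEST_AUTO_HEAL") == "0" {
		return false, nil
	}
	return true, nil
}

func main() {
	enabled, err := autoHealEnabled(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("auto-heal enabled:", enabled)
}
```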
R5 — Telemetry
The heal pass MUST emit these counters via the koder-reporter SDK (services/foundation/reporter):
| Counter | Increment when |
|---|---|
| `kt_auto_heal_attempts` | R1 fires for any test |
| `kt_auto_heal_success` | R2 yields a unique match |
| `kt_auto_heal_strategy_<n>` | success at strategy n (1-4) |
| `kt_auto_heal_ambiguous` | R2 yields ≥2 matches at the first strategy level that matched anything |
| `kt_auto_heal_miss` | all 4 strategies failed |
Per-module drift rate is derived in dashboards as attempts / total_runs. A module with a rising drift rate signals that its UI is being refactored faster than its tests keep up; surface this to the module owner as a polish trigger.
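The counter rules can be sketched in Go against a hypothetical `Counter` interface; the koder-reporter SDK's real API is not described in this spec, so `Inc` and the in-memory `memCounter` are stand-ins. `driftRate` shows the dashboard ratio, for example 2 attempts over 40 runs gives 0.05.

```go
package main

import "fmt"

// Counter is a hypothetical stand-in for the koder-reporter SDK.
type Counter interface {
	Inc(name string)
}

// memCounter is an in-memory Counter for illustration and tests.
type memCounter map[string]int

func (m memCounter) Inc(name string) { m[name]++ }

// Outcome summarizes one heal attempt for telemetry purposes.
type Outcome struct {
	Strategy  int  // 1..4 when a unique match won, else 0
	Ambiguous bool // some level returned two or more matches
}

// record increments the R5 counters for a single R1 trigger.
func record(c Counter, o Outcome) {
	c.Inc("kt_auto_heal_attempts")
	switch {
	case o.Ambiguous:
		c.Inc("kt_auto_heal_ambiguous")
	case o.Strategy >= 1 && o.Strategy <= 4:
		c.Inc("kt_auto_heal_success")
		c.Inc(fmt.Sprintf("kt_auto_heal_strategy_%d", o.Strategy))
	default:
		c.Inc("kt_auto_heal_miss")
	}
}

// driftRate is the dashboard-side ratio attempts / total_runs.
func driftRate(attempts, totalRuns int) float64 {
	if totalRuns == 0 {
		return 0
	}
	return float64(attempts) / float64(totalRuns)
}

func main() {
	c := memCounter{}
	record(c, Outcome{Strategy: 2})
	record(c, Outcome{Ambiguous: true})
	fmt.Println(c["kt_auto_heal_attempts"], c["kt_auto_heal_strategy_2"], c["kt_auto_heal_ambiguous"]) // 2 1 1
	fmt.Println(driftRate(c["kt_auto_heal_attempts"], 40))                                              // 0.05
}
```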
R6 — Rate limiting
Per test execution, the heal pass MUST NOT retry the same test more than once. Reruns of the same `<test-id>` within the same CI run MUST short-circuit (returning the cached patch decision) to avoid infinite loops on tests with chains of broken selectors.
Per CI run, the heal pass MUST cap total heal attempts at 100. Beyond the cap, remaining failures pass through unmodified with a HEAL_BUDGET_EXHAUSTED warning.
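A minimal Go sketch of R6, assuming one `Limiter` instance per CI run. The `Limiter` and `Decision` types are hypothetical. Holding the mutex while `heal` runs serializes heal attempts; that keeps the budget check and the cache write atomic at the cost of parallelism.

```go
package main

import (
	"fmt"
	"sync"
)

// Decision is the cached outcome of one heal attempt (illustrative fields).
type Decision struct {
	Healed bool
	Patch  string // queued patch path; empty when nothing was healed
}

// Limiter enforces R6 within one CI run: at most one heal per test ID,
// cached decisions for reruns, and a global attempt budget.
type Limiter struct {
	mu     sync.Mutex
	budget int
	used   int
	cache  map[string]Decision
}

// NewLimiter returns a Limiter with the given budget (R6 sets 100).
func NewLimiter(budget int) *Limiter {
	return &Limiter{budget: budget, cache: make(map[string]Decision)}
}

// Attempt runs heal at most once per test ID. The bool is false when the
// budget is exhausted; the caller then surfaces the original failure with a
// HEAL_BUDGET_EXHAUSTED warning.
func (l *Limiter) Attempt(testID string, heal func() Decision) (Decision, bool) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if cached, seen := l.cache[testID]; seen {
		return cached, true // rerun: short-circuit with the cached decision
	}
	if l.used >= l.budget {
		return Decision{}, false
	}
	l.used++
	d := heal()
	l.cache[testID] = d
	return d, true
}

func main() {
	l := NewLimiter(100)
	calls := 0
	heal := func() Decision {
		calls++
		return Decision{Healed: true, Patch: "example.patch"}
	}
	l.Attempt("t1", heal)
	l.Attempt("t1", heal)                     // rerun within the same CI run hits the cache
	fmt.Println("heal ran", calls, "time(s)") // heal ran 1 time(s)
}
```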
R7 — Per-framework adapter interface
A minimal Go interface (in `products/dev/koder-tools/internal/healselect/`) lets each framework plug in:
```go
package healselect

type Adapter interface {
	// ParseFailure extracts the original selector and the DOM snapshot
	// from a test failure record. Returns (nil, nil, ErrNotHealable) if the
	// failure isn't a selector miss.
	ParseFailure(record FailureRecord) (*OriginalSelector, *Snapshot, error)

	// FindByText / FindByRole / FindByTooltip / FindByCoordinates implement
	// R2's strategy chain. Each returns at most one match; multiple
	// matches MUST yield nil + a non-nil ambiguity error.
	FindByText(s *Snapshot, text string) (*Element, error)
	FindByRole(s *Snapshot, role string) (*Element, error)
	FindByTooltip(s *Snapshot, tip string) (*Element, error)
	FindByCoordinates(s *Snapshot, x, y float64) (*Element, error)

	// EmitPatch produces the unified diff replacing the original
	// selector with one targeting the found element via `strategy`.
	EmitPatch(original *OriginalSelector, found *Element, strategy string) ([]byte, error)
}
```

R7 is normative: implementations MUST satisfy this interface.
T1 — Flutter widget heal: byText hit
A Widget test using `find.byKey(Key("foo"))`, where the Key is removed in a refactor but the button text "Save" is unchanged, MUST pass with `SELF_HEAL_APPLIED` via the byText strategy.
T2 — Flutter widget heal: byCoordinates miss when ambiguous
A test referencing a button at position (300, 200), where the refactor introduces two identical buttons at (300, 195) and (300, 205), MUST NOT heal: assuming byText, byRole, and byTooltip yield no match, the heal pass reaches byCoordinates, finds both buttons within tolerance, and aborts with an ambiguous-match error; the test fails normally.
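To make T2's arithmetic concrete, the Go sketch below applies the ±5% tolerance per axis to pixel positions. Both the per-axis interpretation and the 800×600 viewport are assumptions; R2 says only "within ±5% of viewport, scaled", and this sketch also assumes both positions are already in the same viewport's pixel space. With that viewport the vertical tolerance is 30 px, so both buttons, each 5 px from the original, qualify and the match is ambiguous.

```go
package main

import (
	"fmt"
	"math"
)

// Point is a position in viewport pixels.
type Point struct{ X, Y float64 }

// withinTolerance reports whether cand lies within tol (a fraction of each
// viewport dimension) of orig on both axes.
func withinTolerance(orig, cand Point, vpW, vpH, tol float64) bool {
	return math.Abs(cand.X-orig.X) <= tol*vpW && math.Abs(cand.Y-orig.Y) <= tol*vpH
}

// byCoordinates returns every candidate within ±5%; the R2 chain treats more
// than one result as an ambiguous match.
func byCoordinates(orig Point, cands []Point, vpW, vpH float64) []Point {
	var out []Point
	for _, c := range cands {
		if withinTolerance(orig, c, vpW, vpH, 0.05) {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	// T2: original button at (300, 200); the refactor leaves two buttons.
	orig := Point{300, 200}
	cands := []Point{{300, 195}, {300, 205}}
	got := byCoordinates(orig, cands, 800, 600) // hypothetical 800x600 viewport
	fmt.Println(len(got))                       // 2, so the heal pass aborts as ambiguous
}
```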
T3 — Playwright heal: byRole hit across CSS-class rename
A Playwright test using `page.locator('.btn-primary')`, where the class is renamed to `.button-primary` and `role="button"` survives but the visible text differs slightly from the original "Submit", MUST pass with `SELF_HEAL_APPLIED` via byRole. byText is R2.1 and byRole is R2.2, so the text change is what makes byText miss and lets the chain fall through to byRole.
Implementation note: the test author can override the chain order per test with a `// koder:heal-strategy=byRole,byText` comment, but the default order in R2 is normative.
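Reading the per-test override could look like the Go sketch below. It assumes the directive lists the complete chain to run (strategies not named are skipped) and that an unknown strategy name falls back to R2's default order; neither behavior is specified above, and `chainFor` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultChain is R2's normative order.
var defaultChain = []string{"byText", "byRole", "byTooltip", "byCoordinates"}

// chainFor scans a test's source for the override directive and returns the
// strategies it lists, or R2's default order if none is present.
func chainFor(src string) []string {
	const prefix = "// koder:heal-strategy="
	known := map[string]bool{}
	for _, s := range defaultChain {
		known[s] = true
	}
	for _, line := range strings.Split(src, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, prefix) {
			continue
		}
		var chain []string
		for _, name := range strings.Split(strings.TrimPrefix(line, prefix), ",") {
			name = strings.TrimSpace(name)
			if !known[name] {
				return defaultChain // assumption: unknown names void the override
			}
			chain = append(chain, name)
		}
		return chain
	}
	return defaultChain
}

func main() {
	src := "// koder:heal-strategy=byRole,byText\ntestWidgets('save', (t) async {});"
	fmt.Println(chainFor(src)) // [byRole byText]
}
```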
T4 — Opt-out
Running `KODER_TEST_AUTO_HEAL=0 koder-test-run …` against a known-healable failure MUST report the original failure, with no `SELF_HEAL_APPLIED` row and no patch in `tests/auto-heal-patches/`.
Implementation slices (referenced by ticket #010)
- **Slice 1 (this spec, 2026-05-17 ✓)**: design contract above.
- **Slice 2**: Flutter Widget adapter, `internal/healselect/flutter.go` implementing R7; one fixture per T1–T2.
- **Slice 3**: Playwright adapter, `internal/healselect/playwright.go`; T3.
- **Slice 4**: TUI/Bubble Tea adapter (deferred until R1's golden-diff threshold is settled; TBD by the koder-tui spec).
- **Slice 5**: telemetry wiring (R5) + reporter integration.
- **Slice 6**: rate limiting (R6) + budget cache.
- **Slice 7**: integration with the `/k-test-run` skill (opt-out flag, patch-queue surfacing in the test report).
Slices 2–7 each open as a sub-ticket in products/dev/koder-tools/backlog/pending/ when ready to start. Slice 2 is the smallest cohesive deliverable and the natural next step.
Non-goals
- This spec does NOT propose auto-applying patches. Patches are proposals for human review.
- This spec does NOT govern Widget test generation itself; that is the `/k-test-gen-ui` skill's contract (see `policies/regression-tests.kmd`). Auto-heal kicks in at RUN time, not GEN time.
- This spec does NOT cover non-UI test heal (API contract drift, schema drift, fixture drift). Those are separate problems.
History
- **2026-05-17**: Initial draft v0.1.0. Ratified alongside the `tools/design-gen` session wind-down. Source: ticket `products/dev/koder-tools/backlog/pending/010` (auto-heal-ui feature inspired by testsprite.com).