Auto-heal selectors for UI regression tests

draft

Spec: Auto-heal selectors for UI regression tests

Defines the contract for the /k-test-run auto-heal pass that retries failed UI selectors with progressively-weaker matching strategies, surfacing patches to update tests instead of failing them outright.

Source motivation: TestSprite's auto-heal-ui feature (cited in ticket #010). Goal: cosmetic refactors (renaming a Key, moving a button into a different parent) should not require touching every test that referenced the moved element.

R1 — Trigger conditions

The auto-heal pass MUST activate when, and only when, a UI test fails with one of these signatures:

Framework	Failure signature
Flutter Widget	`flutter_test` matcher fails with "Expected: exactly one matching node in the widget tree" + node count = 0
Playwright (web/desktop variants)	`expect(page.locator(...)).toBeVisible()` timeout OR `getByRole`/`getByText` matches no element
Bubble Tea (TUI golden)	`tea.NewProgram` snapshot diverges from golden by > N% pixel diff (TBD)

Other failures (assertion mismatch with non-zero matches, runtime panic, network error) MUST pass through unmodified — auto-heal is selector-drift only, not behavior drift.
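The R1 gate can be pictured as a small classifier. The following is a minimal Go sketch under stated assumptions: the `FailureRecord` fields and the Playwright message substrings are hypothetical and not defined by this spec, and Bubble Tea is omitted because its threshold is still TBD.

```go
package main

import (
	"fmt"
	"strings"
)

// FailureRecord is a hypothetical shape for a parsed test failure;
// the field names are assumptions made for this sketch.
type FailureRecord struct {
	Framework  string // "flutter" or "playwright"
	Message    string // raw failure text from the test runner
	MatchCount int    // nodes the original selector matched
}

// IsSelectorMiss reports whether a failure matches an R1 signature.
// Bubble Tea is left out because its diff threshold is still TBD.
func IsSelectorMiss(r FailureRecord) bool {
	if r.MatchCount != 0 {
		return false // non-zero matches is behavior drift, not selector drift
	}
	switch r.Framework {
	case "flutter":
		return strings.Contains(r.Message, "exactly one matching node in the widget tree")
	case "playwright":
		// Assumed substrings; real Playwright error text may differ.
		return strings.Contains(r.Message, "toBeVisible") ||
			strings.Contains(r.Message, "getByRole") ||
			strings.Contains(r.Message, "getByText")
	}
	return false
}

func main() {
	miss := FailureRecord{"flutter", "Expected: exactly one matching node in the widget tree", 0}
	crash := FailureRecord{"flutter", "runtime panic: nil dereference", 0}
	fmt.Println(IsSelectorMiss(miss), IsSelectorMiss(crash)) // prints: true false
}
```

Anything the classifier rejects would pass through to the normal failure path untouched.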

R2 — Fallback strategy order

When R1 triggers, the heal pass MUST attempt the following selector strategies, in this order, against the DOM snapshot at the moment of failure:

1. byText: find a node whose visible text exactly matches the text the original selector targeted (extracted via reverse-lookup from the original Key/role).
2. byRole: find a node with the same ARIA/Semantics role (Flutter Semantics widgets; Playwright getByRole).
3. byTooltip: find a node whose `tooltip:` / `title:` attribute matches.
4. byCoordinates: find a node at the same relative position (within ±5% of viewport, scaled) as the original element occupied in the last passing run.

The first strategy that yields exactly one match wins. If a strategy yields multiple matches, the chain MUST NOT advance to the next strategy: the heal pass aborts and the test fails normally (an ambiguous match is worse than a miss).
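The chain semantics above (first unique match wins, ambiguity aborts, zero matches fall through) can be sketched in Go as follows. The `Strategy` type and the two sentinel errors are hypothetical names introduced for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAmbiguous and ErrMiss are hypothetical sentinel errors for this sketch.
var (
	ErrAmbiguous = errors.New("ambiguous match")
	ErrMiss      = errors.New("no strategy matched")
)

// Strategy pairs an R2 strategy name with a lookup returning every
// candidate node ID it finds in the failure snapshot.
type Strategy struct {
	Name string
	Find func() []string
}

// Heal walks the strategies in order. Exactly one match wins; two or
// more abort the whole pass; zero falls through to the next strategy.
func Heal(chain []Strategy) (strategy, node string, err error) {
	for _, s := range chain {
		matches := s.Find()
		switch {
		case len(matches) == 1:
			return s.Name, matches[0], nil
		case len(matches) > 1:
			return s.Name, "", ErrAmbiguous
		}
	}
	return "", "", ErrMiss
}

func main() {
	chain := []Strategy{
		{"byText", func() []string { return nil }},
		{"byRole", func() []string { return []string{"button#save"} }},
		{"byTooltip", func() []string { return []string{"a", "b"} }},
	}
	name, node, err := Heal(chain)
	fmt.Println(name, node, err) // prints: byRole button#save <nil>
}
```

In the demo, byTooltip is never consulted because byRole already produced a unique match.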

R3 — Patch proposal

When a strategy succeeds:

- The test execution MUST emit a SELF_HEAL_APPLIED warning row in the test report (stdout + tests/auto-heal-patches/<test-id>.log). The test result counts as PASS for CI gating purposes.
- A unified-diff patch MUST be queued at <module>/tests/auto-heal-patches/<test-id>-<timestamp>.patch showing the proposed selector update (old selector → new strategy + key/text/role).
- The patch MUST NEVER be auto-applied. Owner review is required before adoption; patches in this directory form a .gitignored review queue.
- Each patch carries a metadata header (`# strategy: byTooltip`, `# old: find.byKey(Key("foo"))`, `# new: find.byTooltip("Save changes")`) for human and tooling inspection.
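As an illustration of the patch queue naming and metadata header, here is a minimal Go sketch. The one-field-per-line header layout, the timestamp format, and the module and test names in `main` are assumptions; R3 fixes only the path template and the three header fields.

```go
package main

import (
	"fmt"
	"strings"
)

// PatchHeader renders the R3 metadata header as comment lines that
// precede the unified diff. The one-field-per-line layout is an assumption.
func PatchHeader(strategy, oldSel, newSel string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# strategy: %s\n", strategy)
	fmt.Fprintf(&b, "# old: %s\n", oldSel)
	fmt.Fprintf(&b, "# new: %s\n", newSel)
	return b.String()
}

// PatchPath builds the review-queue location from R3. The timestamp
// format is not specified by R3, so the caller supplies it as a string.
func PatchPath(module, testID, timestamp string) string {
	return fmt.Sprintf("%s/tests/auto-heal-patches/%s-%s.patch", module, testID, timestamp)
}

func main() {
	// Module, test ID, and timestamp below are hypothetical values.
	fmt.Print(PatchHeader("byTooltip", `find.byKey(Key("foo"))`, `find.byTooltip("Save changes")`))
	fmt.Println(PatchPath("products/dev/koder-tools", "save_button_test", "20260517T120000Z"))
}
```

Nothing in this sketch writes to the test source; it only produces the text and location of a proposal for owner review.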

R4 — Opt-out

/k-test-run --no-auto-heal MUST disable the entire heal pass. When disabled, the original test failure surfaces unchanged.

Env var KODER_TEST_AUTO_HEAL=0 is the CI-friendly equivalent.

Default: auto-heal enabled (assumption: most CI failures are selector drift; the cost of not finding a match is the same as today, since the failure still surfaces).
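The opt-out resolution could look like the Go sketch below. How the `--no-auto-heal` flag is parsed and passed in is an assumption; the environment lookup is injected as a function so the logic can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
)

// AutoHealEnabled resolves R4: heal is on by default and is disabled by
// either the --no-auto-heal flag or KODER_TEST_AUTO_HEAL=0.
// How the flag reaches this function is an assumption of the sketch.
func AutoHealEnabled(noAutoHealFlag bool, getenv func(string) string) bool {
	if noAutoHealFlag {
		return false
	}
	return getenv("KODER_TEST_AUTO_HEAL") != "0"
}

func main() {
	// Result depends on the caller's environment.
	fmt.Println(AutoHealEnabled(false, os.Getenv))
}
```

Treating any value other than `0` (including unset) as enabled matches the stated default, though the spec does not say how other values should be handled.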

R5 — Telemetry

The heal pass MUST emit these counters via the koder-reporter SDK (services/foundation/reporter):

Counter	Increment when
`kt_auto_heal_attempts`	R1 fires for any test
`kt_auto_heal_success`	R2 yields a unique match
`kt_auto_heal_strategy_<n>`	success at strategy `n` (1-4)
`kt_auto_heal_ambiguous`	R2 aborted because a strategy yielded ≥2 matches
`kt_auto_heal_miss`	all 4 strategies failed

Per-module drift rate is derived in dashboards: attempts / total_runs. A module with rising drift rate is a signal that its UI is refactoring faster than its tests track — surface to owner as a polish trigger.
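A small Go sketch of the counter naming and the drift-rate derivation follows. The `Counters` map is an in-memory stand-in; the real koder-reporter SDK API is not shown in this spec, so no SDK calls are assumed here.

```go
package main

import "fmt"

// Counters is an in-memory stand-in for the koder-reporter SDK, whose
// real API is not shown in this spec.
type Counters map[string]int

// Attempt records that R1 fired for a test.
func (c Counters) Attempt() { c["kt_auto_heal_attempts"]++ }

// Success records a unique match at strategy n (1-4).
func (c Counters) Success(n int) {
	c["kt_auto_heal_success"]++
	c[fmt.Sprintf("kt_auto_heal_strategy_%d", n)]++
}

// DriftRate is attempts / total_runs, guarding against division by zero.
func DriftRate(attempts, totalRuns int) float64 {
	if totalRuns == 0 {
		return 0
	}
	return float64(attempts) / float64(totalRuns)
}

func main() {
	c := Counters{}
	c.Attempt()
	c.Success(1)
	c.Attempt()
	c.Success(2)
	rate := DriftRate(c["kt_auto_heal_attempts"], 40)
	fmt.Println(c["kt_auto_heal_strategy_1"], rate) // prints: 1 0.05
}
```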

R6 — Rate limiting

Per test execution, the heal pass MUST NOT retry the same test more than once. Re-runs of the same `<test-id>` within the same CI run MUST short-circuit (return cached patch decision) to avoid infinite loops on tests with chains of broken selectors.

Per CI run, the heal pass MUST cap total heal attempts at 100. Beyond the cap, remaining failures pass through unmodified with a HEAL_BUDGET_EXHAUSTED warning.
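The two limits in R6 (one heal per test ID, a fixed budget per CI run) can be combined in one small structure. This Go sketch is illustrative: the `Budget` type and the decision strings other than `HEAL_BUDGET_EXHAUSTED` are hypothetical, and it is not safe for concurrent use.

```go
package main

import "fmt"

// Budget tracks R6 state for one CI run. Decisions are cached per
// test ID so a re-run short-circuits instead of healing again.
type Budget struct {
	remaining int
	decisions map[string]string // test ID -> cached patch decision
}

// NewBudget creates a budget allowing at most limit heal attempts.
func NewBudget(limit int) *Budget {
	return &Budget{remaining: limit, decisions: map[string]string{}}
}

// Try runs heal at most once per test ID and at most limit times per
// run. Decision strings other than HEAL_BUDGET_EXHAUSTED are illustrative.
func (b *Budget) Try(testID string, heal func() string) string {
	if d, ok := b.decisions[testID]; ok {
		return d // cached: never retry the same test
	}
	if b.remaining == 0 {
		return "HEAL_BUDGET_EXHAUSTED"
	}
	b.remaining--
	d := heal()
	b.decisions[testID] = d
	return d
}

func main() {
	b := NewBudget(1) // the spec's cap is 100; 1 keeps the demo short
	calls := 0
	heal := func() string { calls++; return "healed:byText" }
	first := b.Try("t1", heal)
	again := b.Try("t1", heal)
	other := b.Try("t2", heal)
	fmt.Println(first, again, other, calls)
	// prints: healed:byText healed:byText HEAL_BUDGET_EXHAUSTED 1
}
```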

R7 — Per-framework adapter interface

A minimal Go interface (in products/dev/koder-tools/internal/healselect/) lets each framework plug in:

type Adapter interface {
    // ParseFailure extracts the original selector and the DOM snapshot
    // from a test failure record. Returns (nil, nil, ErrNotHealable) if the
    // failure isn't a selector miss.
    ParseFailure(record FailureRecord) (*OriginalSelector, *Snapshot, error)

    // FindByText / FindByRole / FindByTooltip / FindByCoordinates implement
    // R2's strategy chain. Each returns at most one match — multiple
    // matches MUST yield nil + a non-nil ambiguity error.
    FindByText(s *Snapshot, text string) (*Element, error)
    FindByRole(s *Snapshot, role string) (*Element, error)
    FindByTooltip(s *Snapshot, tip string) (*Element, error)
    FindByCoordinates(s *Snapshot, x, y float64) (*Element, error)

    // EmitPatch produces the unified-diff replacing the original
    // selector with one targeting the found element via `strategy`.
    EmitPatch(original *OriginalSelector, found *Element, strategy string) ([]byte, error)
}

R7 is normative — implementations MUST satisfy this interface.
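To illustrate the "at most one match" rule for the Find methods, here is a Go sketch of a text lookup over an in-memory snapshot. `Element` and `Snapshot` are simplified stand-ins (the spec does not define their fields), `ErrAmbiguous` is a hypothetical error value, and returning `(nil, nil)` on zero matches is an assumption since R7 leaves that case open.

```go
package main

import (
	"errors"
	"fmt"
)

// Element and Snapshot are simplified stand-ins for the R7 types,
// whose real fields are not defined in this spec.
type Element struct{ ID, Text string }
type Snapshot struct{ Elements []Element }

// ErrAmbiguous is a hypothetical ambiguity error for this sketch.
var ErrAmbiguous = errors.New("healselect: ambiguous match")

// FindByText returns the single element whose text equals text.
// Two or more matches yield nil plus ErrAmbiguous, as R7 requires.
// Returning (nil, nil) on zero matches is an assumption.
func FindByText(s *Snapshot, text string) (*Element, error) {
	var found *Element
	for i := range s.Elements {
		if s.Elements[i].Text != text {
			continue
		}
		if found != nil {
			return nil, ErrAmbiguous
		}
		found = &s.Elements[i]
	}
	return found, nil
}

func main() {
	snap := &Snapshot{Elements: []Element{{"btn1", "Save"}, {"btn2", "Cancel"}}}
	el, err := FindByText(snap, "Save")
	fmt.Println(el.ID, err) // prints: btn1 <nil>
}
```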

Acceptance tests

T1 — Flutter heal: byText hit after Key removal

A Widget test using find.byKey(Key("foo")) where the Key is removed in a refactor but the button text "Save" is unchanged MUST pass with SELF_HEAL_APPLIED, byText strategy.

T2 — Flutter heal: ambiguous byCoordinates match aborts

A test referencing a button at position (300, 200) where the refactor introduces two identical buttons at (300, 195) and (300, 205) MUST NOT heal: the heal pass aborts at byCoordinates with an ambiguous-match error, and the test fails normally.
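The ambiguous-coordinates scenario can be checked numerically. This Go sketch reads "±5% of viewport" as a tolerance of 5% of the viewport width on x and 5% of the height on y, which is one possible interpretation; the 800x600 viewport is a hypothetical value.

```go
package main

import (
	"fmt"
	"math"
)

// Point is a position in viewport pixels.
type Point struct{ X, Y float64 }

// Near reports whether p is within 5% of the viewport size of orig on
// both axes. This reading of "±5% of viewport" is an assumption.
func Near(orig, p Point, viewW, viewH float64) bool {
	return math.Abs(p.X-orig.X) <= 0.05*viewW &&
		math.Abs(p.Y-orig.Y) <= 0.05*viewH
}

// CountNear counts the candidates that fall inside the tolerance box.
func CountNear(orig Point, candidates []Point, viewW, viewH float64) int {
	n := 0
	for _, c := range candidates {
		if Near(orig, c, viewW, viewH) {
			n++
		}
	}
	return n
}

func main() {
	orig := Point{300, 200}
	buttons := []Point{{300, 195}, {300, 205}}
	// 800x600 is a hypothetical viewport; the tolerance is 40x30 px.
	fmt.Println(CountNear(orig, buttons, 800, 600)) // prints: 2
}
```

A count of 2 means byCoordinates is ambiguous, so the heal pass would abort rather than pick either button.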

T3 — Playwright heal: byRole hit across CSS-class rename

A Playwright test using page.locator('.btn-primary') where the class is renamed to .button-primary, role="button" survives, and the text "Submit" differs slightly from the original MUST pass with SELF_HEAL_APPLIED, byRole strategy. Under the default R2 order, byText (R2.1) runs before byRole (R2.2), so this test exercises the byRole branch only because the text no longer matches exactly; if "Submit" had survived unchanged, byText would win first.

Implementation note: the test author can override the chain order per test with // koder:heal-strategy=byRole,byText — but the default order is normative R2.
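Parsing the per-test override could be as simple as the Go sketch below. The function name and the choice to return nil for non-directive lines are assumptions; validating strategy names against the four R2 strategies is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// directive is the comment prefix quoted in the implementation note.
const directive = "// koder:heal-strategy="

// ParseOverride returns the per-test chain order from a comment line,
// or nil when the line is not a heal-strategy directive.
func ParseOverride(line string) []string {
	line = strings.TrimSpace(line)
	if !strings.HasPrefix(line, directive) {
		return nil
	}
	var out []string
	for _, name := range strings.Split(strings.TrimPrefix(line, directive), ",") {
		if name = strings.TrimSpace(name); name != "" {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	fmt.Println(ParseOverride("// koder:heal-strategy=byRole,byText")) // prints: [byRole byText]
}
```

When the function returns nil, the caller would fall back to the default R2 order.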

T4 — Opt-out

KODER_TEST_AUTO_HEAL=0 koder-test-run … with a known-healable failure MUST report the original failure, no SELF_HEAL_APPLIED, no patch in tests/auto-heal-patches/.

Implementation slices (referenced by ticket #010)

- Slice 1 (this spec, 20260517 ✓): design contract above.
- Slice 2: Flutter Widget adapter, internal/healselect/flutter.go, implementing R7; one fixture per T1–T2.
- Slice 3: Playwright adapter, internal/healselect/playwright.go; T3.
- Slice 4: TUI/Bubble Tea adapter (deferred until R1's golden-diff threshold is settled; TBD by koder-tui spec).
- Slice 5: telemetry wiring R5 + reporter integration.
- Slice 6: rate limiting R6 + budget cache.
- Slice 7: integration with the /k-test-run skill (opt-out flag, patch queue surfacing in test report).

Slices 2–7 each open as a sub-ticket in products/dev/koder-tools/backlog/pending/ when ready to start. Slice 2 is the smallest cohesive deliverable and the natural next step.

Non-goals

- This spec does NOT propose auto-applying patches. Patches are proposals for human review.
- This spec does NOT govern Widget test generation itself; that is the /k-test-gen-ui skill's contract (see policies/regression-tests.kmd). Auto-heal kicks in at RUN time, not GEN time.
- This spec does NOT cover non-UI test heal (API contract drift, schema drift, fixture drift). Those are separate problems.

History

20260517: Initial draft v0.1.0. Ratified alongside tools/design-gen session winddown. Source: ticket products/dev/koder-tools/backlog/pending/010 (auto-heal-ui feature inspired by testsprite.com).