RFC 015 — Browser Event Source for ChatOps

Approved ChatOps extension

  • Tracking ticket: backlog/pending/015-rfc-browser-event-source.md
  • Depends on:
    • RFC 003 — Common Event Schema (Approved, 2026-04-08): extended here with koder.browser.* event types
    • RFC 004 — Common Control Plane (Approved): defines the koder-jet admin API used to enable/disable browser tap per site
    • RFC 007 — ChatOps Pipeline (Approved, 2026-04-14): this RFC adds a third event source to the pipeline defined there
  • Implementation home: infra/jet (injection + ingestion + enrichment) and platform/kortex (event ingestion + new chatops rules)
  • Status: Approved (2026-04-14)

1. Summary

This RFC defines the Browser Event Source, a new upstream for the Kortex ChatOps pipeline that automatically and proactively captures client-side errors from every web application served by koder-jet. It is the third event source after observe/log (server-side logs) and koder-talk (chat screenshots), and it closes the "client-side blind spot" explicitly identified in RFC 007 §2.3.

The mechanism is straightforward: koder-jet, the reverse proxy that fronts every Koder web property, injects a small JavaScript payload (tap.js, ~3-5 KiB minified and gzipped) into every HTML response. The payload installs error listeners (window.onerror, unhandledrejection, console wrapping, ResourceTiming, CSP violations, Web Vitals) and beacons batched events back to a jet-served endpoint. Jet enriches each event with deployment context (service name, version, deploy id, environment), scrubs PII, deduplicates floods, and forwards to Kortex Senses as standard OTLP events with the new koder.browser.* event type family.

The key design principles are:

  1. Proactive over reactive. Browser events are captured automatically, without the user having to remember to send a screenshot. Most users hit a bug and abandon the page; the bug is gone unless we caught it in the browser.
  2. Zero application code. The mechanism is reverse-proxy-level. Application teams do not need to install an SDK, link a library, or change their build. Opt-in is per-site in sites.toml and applies immediately to the next request.
  3. OTLP all the way. The wire format follows RFC 003 — Common Event Schema. No bespoke protocol. Events flow into Kortex Senses on the same OTLP endpoints already in production.
  4. Privacy by default. No URL query strings, no input values, no cookies, no localStorage. Stack traces are resolved server-side via private source maps, so the maps never leave the jet host. In self-hosted deployments, PII scrubbing happens before the event leaves the user's network.
  5. Bounded cost. Per-site sampling, per-error rate limiting, and per-user circuit breaking ensure that an application bug that triggers thousands of console errors per second cannot fill the Kortex queue or run up an LLM bill.
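
The cost controls in principle 5 can be pictured as independent gates in front of the Kortex queue. The sketch below covers two of them, sampling and per-error rate limiting, as pure TypeScript. The names shouldSample and ErrorRateLimiter, the fixed-window algorithm, and the limits used are assumptions for illustration; this RFC does not prescribe them.

```typescript
// Hypothetical sketch: per-site sampling and per-error rate limiting.
// Names, algorithm (fixed window), and limits are illustrative assumptions.

/** Keep an event when the random draw falls below the site's sample rate. */
function shouldSample(sampleRate: number, draw: number = Math.random()): boolean {
  return draw < sampleRate;
}

/** Fixed-window limiter keyed by error_hash. */
class ErrorRateLimiter {
  private windows = new Map<string, { start: number; count: number }>();
  private maxPerWindow: number;
  private windowMs: number;

  constructor(maxPerWindow: number, windowMs: number) {
    this.maxPerWindow = maxPerWindow;
    this.windowMs = windowMs;
  }

  /** Returns true if the event may be forwarded, false if it should be dropped. */
  allow(errorHash: string, nowMs: number): boolean {
    const w = this.windows.get(errorHash);
    if (w === undefined || nowMs - w.start >= this.windowMs) {
      // First event for this hash, or the previous window has expired.
      this.windows.set(errorHash, { start: nowMs, count: 1 });
      return true;
    }
    if (w.count >= this.maxPerWindow) {
      return false;
    }
    w.count += 1;
    return true;
  }
}
```

Passing the clock and the random draw as arguments keeps both gates deterministic under test; a per-user circuit breaker could follow the same shape, keyed by session instead of error hash.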

The output is a new, fully-supported source feeding RFC 007's existing chatops rules engine. The same autonomous/supervised mode switching, the same Kode invocation, the same chat feedback loop. From the rules engine's perspective, a koder.browser.error event is just another input — the same downstream pipeline that handles screenshot reports handles JS exceptions.

1.1 The pipeline at a glance

[ Browser ]                                          [ koder-jet ]                  [ Kortex Senses ]
   │                                                       │                              │
   │  GET /index.html ─────────────────────────────────────▶                              │
   │  ◀── HTML + injected <script src="/_jet/tap.js"> ─────│                              │
   │                                                       │                              │
   │  GET /_jet/tap.js ─────────────────────────────────────▶                              │
   │  ◀── tap.js (~3-5 KiB minified, gzipped, cached) ─────│                              │
   │                                                       │                              │
   │  [ tap.js installs listeners ]                        │                              │
   │   • window.onerror                                    │                              │
   │   • window.onunhandledrejection                       │                              │
   │   • console.error wrapper                             │                              │
   │   • PerformanceObserver (longtask, paint, lcp)        │                              │
   │   • ResourceTiming (failed CSS/JS/img/fetch)          │                              │
   │   • CSP violation reports                             │                              │
   │   • fetch / XHR monkey-patch (5xx, network errors)    │                              │
   │   • visibilitychange / beforeunload (final flush)     │                              │
   │                                                       │                              │
   │  POST /_jet/beacon ─[batched JSON, gzipped, sendBeacon]▶                             │
   │                                                       │                              │
   │                                                       │  [ jet enrichment ]          │
   │                                                       │   • add resource attrs:      │
   │                                                       │     service, version, env,   │
   │                                                       │     deploy id, build sha     │
   │                                                       │   • resolve stack traces     │
   │                                                       │     via private source maps  │
   │                                                       │   • PII scrub                │
   │                                                       │   • dedupe (error_hash + ttl)│
   │                                                       │   • per-user circuit break   │
   │                                                       │                              │
   │                                                       │  POST /v1/events ────────────▶ ChatOps rules engine
   │                                                       │   event_type:                │     (RFC 007)
   │                                                       │   "koder.browser.error"      │

2. Context and motivation

2.1 The client-side blind spot

RFC 007 §2.3 enumerated the gap explicitly:

  • Some errors are client-side only (JavaScript exceptions, UI glitches, rendering bugs).
  • Some errors are transient and don't leave persistent log traces.
  • Some errors are configuration issues that don't throw exceptions.
  • Users often report behavioral bugs ("this button does nothing") that have no error log.

Today the ChatOps pipeline has two event sources:

  1. observe/log: server-side logs. Catches anything that produces a log line on the backend. Misses everything that happens after the HTML is served: JavaScript exceptions, fetch failures the backend never saw, render bugs, asset 404s served from CDN, CSP violations, third-party widget breakage.
  2. koder-talk (RFC 007 §3): screenshots posted in chat groups. It is reactive, depending on a user noticing, taking a screenshot, opening the right group, and sending it. Even motivated users do this maybe one time in twenty.

The result is that the pipeline is blind to a significant fraction of real production bugs. A JavaScript exception that breaks a form submit will never reach Kortex unless the user manually reports it.

2.2 The PoC precedent (revisited)

The Pouso Alegre - MG PoC (March-April 2026) proved that autonomous monitoring + AI fix + auto-deploy works for server-side log-based detection. The same loop should work for browser-side detection, and arguably better, because client-side errors usually carry richer context (URL, user agent, full stack trace, the exact page state that triggered the bug) than a server log line typically does.

The PoC also established the trust model: users want fixes to happen automatically. The friction of reporting bugs is the bottleneck, not the willingness to receive fixes. Removing the reporting step entirely is the next obvious move.

2.3 Why koder-jet is the right home

koder-jet is the reverse proxy that fronts every Koder web property. Three properties make it the natural injection point:

  1. It already touches every HTML response. Adding a <script> tag to the response body is trivial — the streaming HTML rewriter already exists for the benchmark and observability injections.
  2. It already knows the deployment context. The site config in sites.toml knows which service the request is going to, what version is currently deployed, what environment, and which release artifact. Every browser event can be enriched with this metadata at the edge, with zero application participation.
  3. It already talks to Kortex. Jet sends deploy events to Kortex on every release. The OTLP push channel is established. Adding a new event type is a config change.
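
To make point 1 concrete, the injection step amounts to placing the tap.js script tag from §1.1 into the document head. The sketch below is a deliberately simplified, non-streaming version operating on a whole HTML string; jet's actual rewriter is streaming and is not reproduced here. The function name injectTap and the choice to insert immediately before the closing head tag are assumptions.

```typescript
// Hypothetical, non-streaming sketch of the script injection step.
// The real jet rewriter is streaming; this only illustrates the transformation.

const TAP_TAG = '<script src="/_jet/tap.js"></script>';

function injectTap(html: string): string {
  // Idempotent: never inject twice into the same document.
  if (html.indexOf(TAP_TAG) !== -1) return html;
  // Find the closing head tag, case-insensitively.
  const match = /<\/head\s*>/i.exec(html);
  if (match === null) return html; // no head element: leave the response untouched
  return html.slice(0, match.index) + TAP_TAG + html.slice(match.index);
}
```

Leaving responses without a head element untouched is a conservative choice for the sketch; fragments and non-document HTML would otherwise receive a stray script tag.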

2.4 Why not Sentry / Bugsnag / Datadog RUM

Three reasons:

  1. Self-hosting requirement. Koder products are deployed in environments (PoC servers, on-prem hospital systems, regulated tenants) where sending raw browser data to a SaaS RUM provider is not acceptable on privacy or compliance grounds.
  2. Pipeline integration. Off-the-shelf RUM products produce dashboards. The Koder ChatOps pipeline produces fixes. The output of the RUM tool is the input of the autonomous fix loop; the connector between them is the interesting work, and we may as well own the whole stack.
  3. Cost. SaaS RUM providers charge per session or per event. At Koder's planned scale (every customer site fronted by jet), the per-event cost of a SaaS solution dominates. A homegrown jet-based capture is a fixed cost.

This RFC does not preclude also shipping events to a third-party RUM tool when a customer wants it. The jet endpoint can fan out. But the canonical destination is Kortex.


3. Event categories

The browser tap captures events in the following categories. Each category maps to a specific koder.event.type value under the koder.browser.* namespace.

| Category | Event type | Source API | Triggered by |
|---|---|---|---|
| JavaScript exception | koder.browser.error.js | window.onerror, addEventListener("error") | Uncaught throw, syntax error, runtime error in script |
| Promise rejection | koder.browser.error.unhandled_rejection | window.onunhandledrejection | Async/await error not caught, .then chain without .catch |
| Console error | koder.browser.error.console | console.error monkey-patch | Application code that logs an error without throwing |
| Network failure | koder.browser.error.network | fetch / XMLHttpRequest monkey-patch + PerformanceResourceTiming | 5xx response, CORS error, network unreachable, certificate error, abort |
| Asset load failure | koder.browser.error.asset | error event on <img>, <script>, <link>, <iframe> | 404 / 5xx on CSS, JS, image, font, video, iframe |
| CSP violation | koder.browser.error.csp | securitypolicyviolation event | Inline script blocked, eval blocked, foreign origin blocked |
| Long task | koder.browser.perf.longtask | PerformanceObserver({type:"longtask"}) | Main thread blocked > 50ms (default; configurable) |
| Layout shift (CLS) | koder.browser.perf.layout_shift | PerformanceObserver({type:"layout-shift"}) | Cumulative Layout Shift > 0.25 over the session |
| Slow LCP | koder.browser.perf.lcp_slow | PerformanceObserver({type:"largest-contentful-paint"}) | LCP > 4s (poor per Web Vitals) |
| Page abandon after error | koder.browser.behavior.abandon | visibilitychange + recent error in window | User closes tab within N seconds of an error event firing |
| Rage click (Phase 2) | koder.browser.behavior.rage_click | Click handler heuristic | ≥3 clicks on the same element within 1 second with no resulting state change |
| Dead click (Phase 2) | koder.browser.behavior.dead_click | Click handler + mutation observer | Click on element that has no observable effect within 500ms |

The first 6 categories (errors) are the high-value events that flow into the ChatOps fix pipeline. Categories 7-9 (perf) are signals that feed dashboards and trend analysis but do not normally trigger autonomous fixes. Categories 10-12 (behavior) are user experience events that complement the technical signals: a rage click is the strongest possible signal that something is broken even when no exception fires.
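
Because the family (error, perf, behavior) is encoded in the event type itself, the default routing described above can be derived from the type string alone. The following is a sketch under that assumption; familyOf and feedsFixPipeline are hypothetical helper names, and the actual routing is owned by the RFC 007 rules engine.

```typescript
// Hypothetical sketch: derive the event family from the koder.browser.* type.

type BrowserEventFamily = "error" | "perf" | "behavior";

/** Returns the family segment of a koder.browser.* type, or null for anything else. */
function familyOf(eventType: string): BrowserEventFamily | null {
  const prefix = "koder.browser.";
  if (eventType.indexOf(prefix) !== 0) return null;
  const family = eventType.slice(prefix.length).split(".")[0];
  if (family === "error" || family === "perf" || family === "behavior") {
    return family;
  }
  return null;
}

/** Default assumed here: only the error family feeds the autonomous fix pipeline. */
function feedsFixPipeline(eventType: string): boolean {
  return familyOf(eventType) === "error";
}
```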

3.1 Phasing

| Phase | Categories included | Notes |
|---|---|---|
| Phase 1 | js, unhandled_rejection, console, network, asset, csp | The technical errors. Highest signal, lowest privacy risk, simplest to implement. |
| Phase 2 | longtask, layout_shift, lcp_slow, abandon | Performance and abandonment. Useful for trend analysis and as secondary signals for the fix pipeline. |
| Phase 3 | rage_click, dead_click | Behavioral. Highest privacy risk (requires DOM inspection). Optional even after Phase 3 ships. |
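
The rage-click threshold given in §3 (at least 3 clicks on the same element within 1 second) reduces to a sliding-window check over click timestamps. The sketch below assumes the caller has already grouped clicks by target element and leaves out the "no resulting state change" condition, which needs DOM observation; isRageClick is a hypothetical name.

```typescript
// Hypothetical sketch of the rage-click timing check for ONE target element.
// The "no resulting state change" condition is intentionally not modeled.

function isRageClick(
  clickTimesMs: number[],
  minClicks: number = 3,
  windowMs: number = 1000
): boolean {
  const sorted = clickTimesMs.slice().sort((a, b) => a - b);
  // Slide a window of minClicks consecutive clicks over the sorted timestamps.
  for (let i = 0; i + minClicks - 1 < sorted.length; i++) {
    if (sorted[i + minClicks - 1] - sorted[i] <= windowMs) {
      return true;
    }
  }
  return false;
}
```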

4. The BrowserEvent schema

The schema extends RFC 003 — Common Event Schema. Browser events ride on the standard OTLP Logs records with a structured body and the type discriminator under koder.event.type. This RFC defines the body shape and the resource attributes added by jet at the edge.

4.1 Resource attributes (added by jet)

These are attached to every event by koder-jet after it receives the beacon, before forwarding to Kortex. They are the same attributes already present on every other Koder OTLP event (RFC 003 §4) — reused without extension.

| Attribute | Source | Example |
|---|---|---|
| koder.product.name | sites.toml site definition | "saude-publica" |
| koder.product.version | jet deploy state | "v2.3.0" |
| koder.deployment.env | sites.toml | "prd", "stg", "dev" (3-letter canonical per policies/environments.kmd; legacy "production"/"staging" accepted by the jet validator with a deprecation warning, jet#139) |
| koder.deployment.deploy_id | jet deploy state | "dpl-20260408-153022" |
| koder.deployment.build_sha | jet deploy state | "a8f7c3d" |
| koder.tenant.id | request header (if present) | "tenant-vivver-pousoalegre" |
| koder.event.type | constant per category | "koder.browser.error.js" |
| koder.event.severity | per category default | "error", "warn", "info" |

Note that koder.product.version, deploy_id, and build_sha come from jet's own deploy state, not from the browser. The browser cannot lie about which version is deployed — jet knows because it served the assets.
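
A sketch of the enrichment step may help: the resource attributes are built entirely from jet-side state, so nothing the browser sends can influence them. The DeployState shape and the function names below are hypothetical; the attribute keys and the environment values are the ones listed in the table above.

```typescript
// Hypothetical sketch of jet-side enrichment. Types and function names are
// illustrative; attribute keys and env values come from the table in §4.1.

const CANONICAL_ENVS = ["prd", "stg", "dev"];
const LEGACY_ENVS = ["production", "staging"];

/** Canonical values pass, legacy values pass with a deprecation flag. */
function checkEnv(env: string): "ok" | "deprecated" | "invalid" {
  if (CANONICAL_ENVS.indexOf(env) !== -1) return "ok";
  if (LEGACY_ENVS.indexOf(env) !== -1) return "deprecated";
  return "invalid";
}

interface DeployState {
  productName: string;
  productVersion: string;
  env: string;
  deployId: string;
  buildSha: string;
}

/** Builds the attributes from jet state only; browser input is never consulted. */
function resourceAttributes(
  state: DeployState,
  eventType: string,
  severity: string,
  tenantId?: string
): Record<string, string> {
  const attrs: Record<string, string> = {
    "koder.product.name": state.productName,
    "koder.product.version": state.productVersion,
    "koder.deployment.env": state.env,
    "koder.deployment.deploy_id": state.deployId,
    "koder.deployment.build_sha": state.buildSha,
    "koder.event.type": eventType,
    "koder.event.severity": severity,
  };
  if (tenantId !== undefined) {
    attrs["koder.tenant.id"] = tenantId; // only when the request header was present
  }
  return attrs;
}
```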

4.2 Body fields (browser-supplied)

The event body is a JSON object. Fields are conditionally present based on event category. The following table is the union; per-category required fields are noted.

| Field | Type | Required for | Description |
|---|---|---|---|
| event_id | string (uuid v7) | all | Unique per-event id, generated client-side |
| session_id | string | all | Per-tab session id (lifetime: page load to unload) |
| page_url | string (sanitized) | all | URL with query string and fragment stripped |
| page_route | string | all | Logical route, if the app uses a router (set via tap.js API) |
| referrer | string (sanitized) | all | Same sanitization as page_url |
| user_agent | string | all | Raw UA string |
| viewport | object {w, h} | all | Viewport pixels |
| timestamp_ms | int64 | all | Browser-side ms epoch (best-effort, may drift) |
| error_message | string | js, unhandled_rejection, console | The thrown message / rejection reason / first console arg |
| error_name | string | js, unhandled_rejection | Error.name (e.g., "TypeError") |
| stack_trace | string | js, unhandled_rejection | Raw client-side stack (jet resolves to source coords server-side) |
| source_file | string | js | The file where the error originated (post-resolution) |
| source_line | int | js | Source line (post-resolution) |
| source_column | int | js | Source column (post-resolution) |
| script_url | string | asset | URL of the failing asset |
| network_url | string (sanitized) | network | URL of the failing fetch |
| network_status | int | network | HTTP status code (0 if no response) |
| network_method | string | network | HTTP method |
| csp_directive | string | csp | Violated directive |
| csp_blocked_uri | string | csp | URI that was blocked |
| duration_ms | int | longtask, lcp_slow | Performance metric value |
| cls_value | float | layout_shift | CLS score |
| error_hash | string (sha256[:16]) | all | Hash of the deduplication key (see §6.2) |
| breadcrumbs | array<object> | optional | Last N user actions / navigations / network requests preceding the error (Phase 2) |
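
The "sanitized" URL fields above drop the query string and fragment. A minimal sketch of that rule, together with a TypeScript shape for the fields required on every event, follows. BrowserEventBase, sanitizeUrl, and sanitizeBase are hypothetical names; the canonical definition remains the registered schema (§4.4).

```typescript
// Hypothetical sketch: fields required for "all" categories, plus URL sanitization.

interface BrowserEventBase {
  event_id: string;
  session_id: string;
  page_url: string;
  page_route: string;
  referrer: string;
  user_agent: string;
  viewport: { w: number; h: number };
  timestamp_ms: number;
  error_hash: string;
}

/** Strip everything from the first "?" or "#" onward. */
function sanitizeUrl(raw: string): string {
  const cut = raw.search(/[?#]/);
  return cut === -1 ? raw : raw.slice(0, cut);
}

/** Apply the same sanitization to page_url and referrer. */
function sanitizeBase(e: BrowserEventBase): BrowserEventBase {
  return { ...e, page_url: sanitizeUrl(e.page_url), referrer: sanitizeUrl(e.referrer) };
}
```

String slicing is used instead of a URL parser so that malformed or relative URLs are still truncated rather than rejected.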

4.3 Concrete example — JavaScript exception

{
  "resource_attributes": {
    "koder.product.name": "saude-publica",
    "koder.product.version": "v2.3.0",
    "koder.deployment.env": "production",
    "koder.deployment.deploy_id": "dpl-20260408-153022",
    "koder.deployment.build_sha": "a8f7c3d",
    "koder.event.type": "koder.browser.error.js",
    "koder.event.severity": "error"
  },
  "timestamp_unix_nano": 1712592623123456000,
  "body": {
    "event_id": "01HX4M2AWPY8Z5Q7K4F8R3N0Z6",
    "session_id": "sess-7f3a2b",
    "page_url": "https://saude.poc.vivver.com/agendamento",
    "page_route": "/agendamento",
    "referrer": "https://saude.poc.vivver.com/dashboard",
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) ...",
    "viewport": {"w": 1920, "h": 1080},
    "timestamp_ms": 1712592623123,
    "error_message": "Cannot read properties of null (reading 'patientId')",
    "error_name": "TypeError",
    "stack_trace": "TypeError: Cannot read properties of null (reading 'patientId')\n    at AgendamentoForm.handleSubmit (agendamento.js:142:21)\n    at HTMLFormElement.<anonymous> (agendamento.js:89:12)",
    "source_file": "src/components/AgendamentoForm.tsx",
    "source_line": 142,
    "source_column": 21,
    "error_hash": "a8f7c3d5b2e91047"
  }
}
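
Before jet can resolve a frame through a source map, it has to extract the generated file, line, and column from the raw stack_trace string. The sketch below parses V8-style frames such as the ones in the example above. It is an assumption that a regex-based parser like this is used; other engines format frames differently and are not handled. RawFrame, parseV8Frame, and parseStack are hypothetical names.

```typescript
// Hypothetical sketch: parse V8-style stack frames ("at fn (file:line:col)").
// Firefox/Safari frame formats are not handled here.

interface RawFrame {
  fn: string | null;
  file: string;
  line: number;
  column: number;
}

function parseV8Frame(frameLine: string): RawFrame | null {
  // Matches "at fn (file:line:col)" and the anonymous form "at file:line:col".
  const m = /^\s*at (?:(.+?) \()?(.+?):(\d+):(\d+)\)?\s*$/.exec(frameLine);
  if (m === null) return null;
  return {
    fn: m[1] === undefined ? null : m[1],
    file: m[2],
    line: Number(m[3]),
    column: Number(m[4]),
  };
}

/** Parse every frame line; the leading "TypeError: ..." message line is skipped. */
function parseStack(stack: string): RawFrame[] {
  const frames: RawFrame[] = [];
  for (const line of stack.split("\n")) {
    const frame = parseV8Frame(line);
    if (frame !== null) frames.push(frame);
  }
  return frames;
}
```

The resulting (file, line, column) triples are what a source-map lookup would take as input to produce source_file, source_line, and source_column.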

4.4 Schema registration

This RFC requires registering the new event types in infra/observe/observability/schemas/browser/v1/. The schema lives there per RFC 003 §10. The Protobuf source becomes the canonical definition; jet vendors the generated Go SDK; tap.js uses a hand-authored TypeScript type that mirrors the schema (validated in CI against the JSON Schema generated from the Protobuf).


5. koder-jet — the injection and ingestion side

This section is the implementation contract for the infra/jet side of the work. It is intentionally specific so that ticket 071 (and follow-ups) can be implemented without ambiguity.

5.1 Per-site opt-in

Browser tap is opt-in per site. Sites that already have Sentry/Datadog/etc. or sites that handle highly sensitive content (banking, healthcare under strict compliance) can leave it off. The site config in sites.toml gains a new section:

[sites.routes."/"]
upstream = "http://127.0.0.1:8001"

[sites.browser_tap]
enabled = true
sample_rate = 1.0          # 0.0 to 1.0
exclude_routes = ["/admin

Source: ../home/koder/dev/koder/meta/docs/stack/rfcs/kortex-015-browser-event-source.kmd