
RFC-001: Koder DNS Platform (authoritative DNS-as-a-service for the Koder Stack)

| Field | Value |
|---|---|
| Status | Accepted (2026-04-15) |
| Author(s) | Rodrigo (with Claude as scribe) |
| Date | 2026-04-15 |
| Target module | platform/dns/ |
| Related | apps/domains/ tickets #007–#012 |

1. Summary

This RFC defines the architecture and phased roadmap for Koder DNS, an authoritative DNS-as-a-service platform built into the Koder Stack. The goal is to give Koder customers (and the Koder infrastructure itself) a fully managed DNS service with advanced features (GeoDNS, failover, DDNS, zone analytics, and secondary DNS), backed by a global anycast network operated by Koder.

The project is divided into three phases:

  • Phase 0 (software-only): advanced DNS logic (failover, GeoDNS, DDNS, statistics) delivered on top of existing DNS providers (ClouDNS, Porkbun). Zero infrastructure investment. Shipped via apps/domains.
  • Phase 1 (own anycast): Koder stands up 3–5 PoPs in strategic regions and begins serving authoritative DNS on its own anycast network. As decided in §12 (decision #2), the anycast addresses are leased from a managed provider in Phase 1; acquiring an ASN and a PI address block is deferred to Phase 2.
  • Phase 2 (global expansion): 15+ PoPs across all continents, full competitive parity with ClouDNS, Cloudflare DNS, and Route 53.


2. Goals

  1. DNS-as-a-service for Koder customers: any Koder customer can delegate their domain's DNS to the Koder DNS platform and manage records via apps/domains or the API.
  2. Global low-latency resolution: via anycast, queries resolve at the nearest PoP. Target: p95 < 10 ms for South America, Europe, and North America by end of Phase 1.
  3. Advanced routing: GeoDNS (return different IPs per region/country), failover (auto-switch on server down), and weighted round-robin.
  4. Operational self-sufficiency: the Koder infrastructure itself migrates off ClouDNS to its own DNS platform, eliminating a third-party dependency for a core service.
  5. Developer-grade API: full CRUD for all record types, zone file import/export, DDNS endpoint, and Terraform/OpenTofu provider.
  6. Observability: per-zone query analytics, latency per PoP, anomaly detection.


3. Non-goals

  • Recursive DNS resolver: Koder DNS is authoritative only. It answers for zones it is authoritative for; it does not resolve arbitrary queries on behalf of end users.
  • Registrar services: domain registration stays in apps/domains via Porkbun/Dynadot integrations. Koder DNS is not a registrar.
  • DDoS scrubbing: traffic scrubbing for volumetric attacks is out of scope for Phases 1 and 2. Anycast inherently distributes attack traffic across PoPs, which provides partial mitigation, but dedicated scrubbing is a separate workstream.
  • DNSSEC signing in Phase 0: DNSSEC is planned for Phase 1 once we control the authoritative servers.
  • Full Cloudflare parity: Cloudflare has 300+ PoPs and a WAF, CDN, and Workers platform built on its anycast network. That is a decade of infrastructure investment. This RFC scopes to DNS specifically.


4. Background

Today the Koder Stack uses ClouDNS as its DNS provider. ClouDNS hosts the koder.dev zone and provides a DNS management API that is used by apps/domains and by automation scripts (ACME DNS-01, deployment hooks).

ClouDNS is reliable and inexpensive (~$20/month), but it represents an external dependency for a service that is foundational to every Koder product. More importantly, as Koder grows into a platform company, DNS becomes a product in its own right — one that generates revenue, deepens the platform lock-in, and differentiates the Koder Stack from competitors.

The features that matter most for customers — GeoDNS, failover, DDNS — can be partially delivered in software today (Phase 0) and fully delivered with own infrastructure later (Phase 1+).


5. Architecture overview

┌─────────────────────────────────────────────────────────────┐
│                      Koder DNS Platform                      │
│                                                             │
│  ┌─────────────────┐      ┌──────────────────────────────┐  │
│  │  apps/domains   │      │      platform/dns (engine)   │  │
│  │  (management UI)│─────▶│                              │  │
│  └─────────────────┘      │  - Zone management API       │  │
│                           │  - GeoDNS routing rules      │  │
│  ┌─────────────────┐      │  - Failover monitor          │  │
│  │  REST/gRPC API  │─────▶│  - DDNS endpoint             │  │
│  │  (external)     │      │  - Analytics collector       │  │
│  └─────────────────┘      │  - Zone file import/export   │  │
│                           └──────────┬───────────────────┘  │
└──────────────────────────────────────┼──────────────────────┘
                                       │
              ┌────────────────────────┼──────────────────────┐
              │  Phase 0               │  Phase 1+             │
              │  (provider APIs)       │  (own anycast)        │
              │                        │                       │
         ┌────▼────┐  ┌────────┐  ┌───▼──────────────────┐   │
         │ ClouDNS │  │Porkbun │  │  Koder Anycast Network│   │
         │  API    │  │  API   │  │  (PoPs: GRU, IAD, FRA)│   │
         └─────────┘  └────────┘  └──────────────────────┘   │

The engine (platform/dns) is the single source of truth for zone state. In Phase 0 it synchronizes that state to external providers via their APIs. In Phase 1+ it synchronizes to its own authoritative name servers at each PoP.
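
The fan-out described above can be pictured as one engine pushing zone state to interchangeable sync targets. The following Go sketch is illustrative only: `Zone`, `Backend`, `Engine`, and `memoryBackend` are hypothetical names, not part of the platform/dns design, and the in-memory backend stands in for what would be a provider API adapter in Phase 0 or a PoP feed in Phase 1+.

```go
package main

import (
	"errors"
	"fmt"
)

// Zone is a hypothetical, simplified view of the zone state held by the engine.
type Zone struct {
	Name   string
	Serial uint32
}

// Backend is a hypothetical sync target: a provider API adapter in Phase 0,
// or a PoP name server feed in Phase 1+.
type Backend interface {
	Name() string
	Push(z Zone) error
}

// memoryBackend exists only for this example. It records the last serial it
// received instead of talking to a real provider or PoP.
type memoryBackend struct {
	name   string
	serial map[string]uint32
}

func newMemoryBackend(name string) *memoryBackend {
	return &memoryBackend{name: name, serial: map[string]uint32{}}
}

func (m *memoryBackend) Name() string { return m.name }

func (m *memoryBackend) Push(z Zone) error {
	m.serial[z.Name] = z.Serial
	return nil
}

// Engine holds the authoritative copy and pushes it to every backend.
type Engine struct {
	backends []Backend
}

// Sync pushes the zone to all backends and joins any errors, so one failing
// target does not prevent the others from being updated.
func (e *Engine) Sync(z Zone) error {
	var errs []error
	for _, b := range e.backends {
		if err := b.Push(z); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", b.Name(), err))
		}
	}
	return errors.Join(errs...)
}

func main() {
	provider := newMemoryBackend("provider-api")
	pop := newMemoryBackend("pop-gru")
	e := &Engine{backends: []Backend{provider, pop}}
	if err := e.Sync(Zone{Name: "example.test.", Serial: 2}); err != nil {
		fmt.Println("sync failed:", err)
		return
	}
	fmt.Println(provider.serial["example.test."], pop.serial["example.test."])
}
```

Keeping the backends behind one interface is what would let the Phase 0 provider adapters and the Phase 1+ PoP feed coexist during migration, under the assumptions stated above.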


6. Phase 0 — Software-only (no infrastructure investment)

Objective: deliver the highest-value DNS features immediately, on top of existing DNS providers, with zero infrastructure cost.

6.1 Components

All logic lives in platform/dns (engine) and is surfaced via apps/domains (UI) and a REST API.

| Feature | Implementation |
|---|---|
| Failover DNS | Health check poller (HTTP/TCP/ICMP) + provider API to swap records on failure |
| GeoDNS (approx.) | Push region-based records to ClouDNS GeoDNS API; UI to define rules |
| DDNS | Scoped token endpoint; updates A/AAAA via provider API |
| Zone file import | BIND parser → bulk create via provider API |
| Zone file export | Read records from provider API → generate BIND zone file |
| DNS statistics | Pull query stats from ClouDNS API; store in time-series DB |
| Secondary DNS | Configure slave zones via ClouDNS API |

6.2 Failover engine detail

Health check loop (per monitored endpoint):
  every N seconds → probe primary IP (HTTP 200? TCP connect? ICMP alive?)
  on failure:
    wait M seconds (anti-flap)
    if still down: call provider API → update A record to fallback IP
    emit notification (Koder Talk / email / webhook)
  on recovery:
    wait recovery_delay
    call provider API → restore primary IP
    emit notification

The poller runs as a goroutine pool inside the platform/dns server. State (current active IP per monitored record) is stored in kdb.

6.3 Acceptance criteria

  • Failover detects a down server within 2× the configured check interval and switches the record within 30 seconds of detection.
  • GeoDNS rules defined in the UI are reflected in ClouDNS within 60 seconds.
  • DDNS endpoint updates a record within 5 seconds of receiving a valid request.
  • Zone file round-trip (export → import on a fresh zone) produces identical records.
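
The round-trip criterion can be expressed as a property check: serializing a record and parsing the result should give back the same record. The Go sketch below only illustrates that property on fully expanded, single-line records. `Record`, `format`, and `parse` are hypothetical names, and a real BIND parser would also need to handle `$ORIGIN`, `$TTL`, omitted fields, comments, quoted strings, and multi-line records, none of which are covered here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Record is a hypothetical, flattened resource record.
type Record struct {
	Name string
	TTL  uint32
	Type string
	Data string
}

// format renders one record as a fully expanded zone file line.
func format(r Record) string {
	return fmt.Sprintf("%s %d IN %s %s", r.Name, r.TTL, r.Type, r.Data)
}

// parse reads a line produced by format. Runs of whitespace inside the
// record data are collapsed to a single space.
func parse(line string) (Record, error) {
	f := strings.Fields(line)
	if len(f) < 5 || f[2] != "IN" {
		return Record{}, fmt.Errorf("unsupported line: %q", line)
	}
	ttl, err := strconv.ParseUint(f[1], 10, 32)
	if err != nil {
		return Record{}, fmt.Errorf("bad TTL: %w", err)
	}
	return Record{
		Name: f[0],
		TTL:  uint32(ttl),
		Type: f[3],
		Data: strings.Join(f[4:], " "),
	}, nil
}

func main() {
	in := []Record{
		{"www.example.test.", 300, "A", "192.0.2.10"},
		{"example.test.", 3600, "MX", "10 mail.example.test."},
	}
	for _, r := range in {
		out, err := parse(format(r))
		fmt.Println(out == r, err)
	}
}
```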


7. Phase 1 — Own anycast infrastructure (first PoPs)

Objective: operate Koder's own authoritative DNS servers, served via anycast, eliminating the dependency on ClouDNS for koder.dev and offering the platform to customers.

7.1 Infrastructure prerequisites

Anycast strategy (Phase 1): managed anycast. Use a provider that already has a global anycast network and leases it as a service (Vultr BGP, Hivelocity Anycast, or Cloudflare Spectrum). This avoids operating BGP sessions, IXP memberships, and a NOC before there is paying customer demand. Own BGP infrastructure is deferred to Phase 2 (§8), at which point Koder registers its own ASN and PI block with LACNIC.

| Item | Description | Procurement path |
|---|---|---|
| Managed anycast IPs | Lease anycast-capable IPs from provider | Vultr BGP / Hivelocity |
| PoP GRU (São Paulo) | VPS or dedicated with anycast IP at São Paulo | Vultr São Paulo / Hivelocity |
| PoP IAD (Ashburn) | VPS or dedicated with anycast IP at Ashburn | Vultr Ashburn / Hivelocity |
| PoP FRA (Frankfurt) | VPS or dedicated with anycast IP at Frankfurt | Hetzner Frankfurt + BGP add-on |

7.2 Software stack at each PoP

Phase 1 uses managed anycast: the provider (Vultr/Hivelocity/Hetzner) handles BGP announcement. Koder operates only the DNS software layer.

Each PoP runs:

┌──────────────────────────────────────┐
│  Koder Herald PoP node               │
│                                      │
│  ┌─────────────────┐                 │
│  │  Knot DNS       │ ← authoritative │
│  │  (zone serving) │   name server   │
│  └────────┬────────┘                 │
│           │ zone sync                │
│  ┌────────▼────────┐                 │
│  │  koder-dns-sync │ ← pulls zones   │
│  │  (agent)        │   from engine   │
│  └─────────────────┘                 │
│                                      │
│  Anycast IP: leased from provider    │
│  BGP announcement: provider-managed  │
└──────────────────────────────────────┘
  • Knot DNS: high-performance authoritative name server (written in C, capable of millions of queries per second on suitable multi-core hardware). Chosen for its built-in mod-geoip and rate limiting modules (see §12, decision #3). Receives zone updates from koder-dns-sync.
  • koder-dns-sync: a lightweight Go agent (part of platform/dns) that polls the DNS engine for zone changes and pushes them to the local Knot instance via AXFR or incremental API.

Phase 2 note: own BGP (BIRD 2, ASN, PI /24 block via LACNIC) replaces the managed anycast layer in Phase 2, adding full traffic engineering and IXP peering without changing the Knot DNS + koder-dns-sync software stack.

7.3 Zone distribution flow

User edits record in apps/domains
    → platform/dns engine stores in kdb
    → increments zone serial
    → koder-dns-sync agents at each PoP poll for serial changes
    → pull updated zone → reload Knot
    → TTL governs how long resolvers cache the old answer

Target propagation: record change visible at all PoPs within 30 seconds.
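
The "poll for serial changes" step depends on deciding whether the engine's serial is newer than the one a PoP already serves. A minimal Go sketch of that decision follows. It assumes the engine uses ordinary 32-bit DNS SOA serials that may wrap, compared with RFC 1982 serial number arithmetic; `serialNewer` and `staleZones` are hypothetical helpers, and the maps stand in for whatever the agent would actually fetch from the engine and from the local Knot instance.

```go
package main

import (
	"fmt"
	"sort"
)

// serialNewer reports whether remote is ahead of local under RFC 1982
// serial number arithmetic, so a wrap from 4294967295 to 0 still counts
// as an increment.
func serialNewer(remote, local uint32) bool {
	return remote != local && int32(remote-local) > 0
}

// staleZones returns, in sorted order, the zones the agent should pull:
// zones unknown locally, or zones whose engine serial is newer.
func staleZones(engine, local map[string]uint32) []string {
	var out []string
	for zone, remote := range engine {
		have, ok := local[zone]
		if !ok || serialNewer(remote, have) {
			out = append(out, zone)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	engine := map[string]uint32{"a.test.": 11, "b.test.": 5, "c.test.": 1}
	local := map[string]uint32{"a.test.": 10, "b.test.": 5}
	fmt.Println(staleZones(engine, local)) // [a.test. c.test.]
}
```

With a poll interval well under the 30-second target, a comparison like this keeps each poll cheap: only zones reported as stale need a transfer and reload.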

7.4 GeoDNS with own infrastructure

With own anycast, true GeoDNS works as follows:

  • Each PoP announces the same anycast IP.
  • BGP routing steers each resolver to the topologically nearest PoP, which is usually but not always the geographically closest one.
  • The PoP serves the answer for the resolver's region.
  • Knot DNS supports GeoIP-based answers natively through its built-in mod-geoip module.
  • GeoIP database: MaxMind GeoLite2 (free) or GeoIP2 City (paid, higher accuracy). Updated weekly via automated job.
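
To make "serves the answer for the resolver's region" concrete, the Go sketch below resolves a rule set by specificity: country first, then continent, then a default. It is an illustration of the idea only. `GeoRules`, `Location`, and `Answer` are hypothetical names, the GeoIP lookup that would produce a `Location` is assumed to happen elsewhere, and in Phase 1 the actual selection is expected to be done by Knot's GeoIP module rather than by code like this.

```go
package main

import "fmt"

// GeoRules is a hypothetical rule set for one record.
type GeoRules struct {
	ByCountry   map[string]string // ISO country code -> answer, e.g. "BR"
	ByContinent map[string]string // continent code -> answer, e.g. "SA"
	Default     string            // answer when nothing else matches
}

// Location is what a GeoIP lookup is assumed to yield for the resolver's IP.
type Location struct {
	Country   string
	Continent string
}

// Answer picks the most specific match: country, then continent, then default.
func (r GeoRules) Answer(loc Location) string {
	if ip, ok := r.ByCountry[loc.Country]; ok {
		return ip
	}
	if ip, ok := r.ByContinent[loc.Continent]; ok {
		return ip
	}
	return r.Default
}

func main() {
	rules := GeoRules{
		ByCountry:   map[string]string{"BR": "192.0.2.10"},
		ByContinent: map[string]string{"EU": "192.0.2.20"},
		Default:     "192.0.2.30",
	}
	fmt.Println(rules.Answer(Location{Country: "BR", Continent: "SA"}))
	fmt.Println(rules.Answer(Location{Country: "DE", Continent: "EU"}))
	fmt.Println(rules.Answer(Location{Country: "JP", Continent: "AS"}))
}
```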

7.5 Acceptance criteria

  • koder.dev resolves from GRU, IAD, and FRA PoPs with p95 latency < 5 ms from each respective region.
  • A record change propagates to all PoPs within 30 seconds.
  • Failure of a single PoP is transparent to end users (BGP withdraws the announcement; traffic reroutes to the nearest remaining PoP automatically).
  • Koder DNS is serving at least 10 external customer zones.

8. Phase 2 — Global expansion

Objective: reach 15+ PoPs across all continents, achieving sub-10 ms resolution worldwide and full competitive parity with ClouDNS.

8.1 Target PoP map

| Region | City | Priority |
|---|---|---|
| South America | São Paulo (GRU) | Phase 1 |
| North America East | Ashburn (IAD) | Phase 1 |
| Europe West | Frankfurt (FRA) | Phase 1 |
| North America West | Los Angeles (LAX) | Phase 2 |
| Europe North | Amsterdam (AMS) | Phase 2 |
| Asia Pacific | Singapore (SIN) | Phase 2 |
| Asia Pacific | Tokyo (NRT) | Phase 2 |
| Australia | Sydney (SYD) | Phase 2 |
| South America | Buenos Aires (EZE) | Phase 2 |
| Africa | Johannesburg (JNB) | Phase 2 |
| Middle East | Dubai (DXB) | Phase 2 |
| Europe South | Madrid (MAD) | Phase 2 |
| North America Central | Chicago (ORD) | Phase 2 |
| Asia | Mumbai (BOM) | Phase 2 |
| Europe East | Warsaw (WAW) | Phase 2 |

8.2 Additional Phase 2 features

  • DNSSEC: sign zones at the authoritative server; publish DS records at the registrar.
  • DNS over HTTPS (DoH) and DNS over TLS (DoT): encrypted resolver endpoints for privacy-conscious clients.
  • Terraform / OpenTofu provider: koder/dns provider for infrastructure as code.
  • Webhook on record change: notify external systems when DNS state changes.
  • Rate limiting and abuse protection: per-source IP query rate limiting at the PoP level to mitigate amplification attacks.
  • SLA dashboard: public uptime page per PoP, per zone.

9. Engine module: platform/dns

9.1 Responsibilities

  • Zone store: authoritative state of all zones and records (backed by kdb).
  • Provider sync: for Phase 0, push zone changes to ClouDNS/Porkbun APIs. For Phase 1+, push to PoP agents via zone transfer.
  • Failover engine: health check poller + automatic record switching.
  • GeoDNS rule engine: store and evaluate geographic routing rules.
  • DDNS service: token-gated endpoint to update A/AAAA records.
  • Analytics ingestion: receive query stats from PoP agents; store in time-series DB.
  • Zone file I/O: BIND zone file parser and serializer.
  • Management API: REST + gRPC, consumed by apps/domains and external clients.

9.2 Tech stack

  • Language: Go (consistent with platform/kmail, platform/raven, platform/id)
  • Storage: kdb (zone records, failover state, GeoDNS rules)
  • Queue: for async zone sync jobs
  • Observability: OpenTelemetry traces + metrics, exported to observe/ stack

9.3 API surface (draft)

# Zones
GET    /api/v1/zones
POST   /api/v1/zones
GET    /api/v1/zones/{zone}
DELETE /api/v1/zones/{zone}

# Records
GET    /api/v1/zones/{zone}/records
POST   /api/v1/zones/{zone}/records
PUT    /api/v1/zones/{zone}/records/{id}
DELETE /api/v1/zones/{zone}/records/{id}

# Zone files
GET    /api/v1/zones/{zone}/export          → BIND zone file
POST   /api/v1/zones/{zone}/import          ← BIND zone file

# Failover
GET    /api/v1/zones/{zone}/monitors
POST   /api/v1/zones/{zone}/monitors
PUT    /api/v1/zones/{zone}/monitors/{id}
DELETE /api/v1/zones/{zone}/monitors/{id}

# GeoDNS
GET    /api/v1/zones/{zone}/records/{id}/geo-rules
POST   /api/v1/zones/{zone}/records/{id}/geo-rules
DELETE /api/v1/zones/{zone}/records/{id}/geo-rules/{rule_id}

# DDNS
POST   /api/v1/ddns/update?token=&hostname=&ip=

# Analytics
GET    /api/v1/zones/{zone}/stats?from=&to=&resolution=

10. Infrastructure costs (estimates)

Phase 0

| Item | Cost |
|---|---|
| ClouDNS (keep) | ~$20/month |
| Engineering (internal) | |
| Total | ~$20/month |

Phase 1 (3 PoPs)

| Item | Cost/month (est.) |
|---|---|
| ASN + PI /24 (LACNIC) | R$70 ($13) |
| PoP GRU (1U colo or VPS BGP) | ~$80–150 |
| PoP IAD (Vultr BGP or colo) | ~$80–150 |
| PoP FRA (Hetzner dedicated) | ~$80–150 |
| IXP memberships (PTT.br etc) | ~$30–80 |
| Total | ~$300–550/month |

Phase 2 (15 PoPs)

Estimated $1,500–3,000/month depending on colocation vs. cloud BGP mix. At that scale, DNS-as-a-service revenue from customers should cover costs.


11. Relationship to existing modules

| Module | Relationship |
|---|---|
| apps/domains | Primary UI for managing zones and records. Tickets #007–#012 are Phase 0 deliverables |
| platform/raven | Raven provisions MX + SPF + DKIM records for email tenants; will call platform/dns API |
| platform/id | ID service provisions DNS records for new tenant subdomains |
| infra/ | PoP node configs, BIRD BGP templates, NSD/Knot configs |
| observe/ | PoP health metrics, query rate dashboards, latency per region |
| platform/kdb | Primary storage for zone state, failover state, analytics |

12. Decisions

| # | Question | Decision | Rationale |
|---|---|---|---|
| 1 | Product name | Herald (platform/dns engine · apps/herald product), Brand 84 (Great) | Evaluated 10 candidates; atlas/nexus/signal eliminated by collision; herald leads (84) followed by relay (81) and resolve (75). "Herald" = the one who announces/delivers: a direct DNS metaphor with no major tech collision |
| 2 | Anycast strategy for Phase 1 | Managed anycast first (Vultr BGP or Hivelocity), own BGP deferred to Phase 2 | Faster time-to-market; Phase 1 goal is proving the product, not the infrastructure. Own BGP adds significant operational risk (NOC, BGP incidents, IXP memberships) before there is paying customer demand to justify it |
| 3 | Authoritative name server | Knot DNS | mod-geoip built in (eliminates custom GeoDNS code in Phase 1), built-in rate limiting, actively developed, used in production by large ccTLD operators (CZ.NIC). NSD is excellent but lacks the module system |
| 4 | GeoIP data provider | MaxMind GeoLite2 to start; evaluate upgrade to GeoIP2 City when Phase 1 has >50 customer zones | GeoLite2 is accurate enough for continent-level routing (Phase 1 target). Country-level accuracy justifies the GeoIP2 City cost only at scale |
| 5 | RIR for ASN and PI block | LACNIC | Koder is a Brazilian company; LACNIC is the RIR for Latin America and the Caribbean. ARIN is for North America, so there is no justification to register there first |

All questions resolved

All 5 open questions are now closed. See table above.


13. Backlog summary

platform/dns/backlog/ (new tickets — engine)

  • #001 Zone store and management API (CRUD + kdb backend)
  • #002 Provider sync adapter (ClouDNS + Porkbun) — Phase 0 bridge
  • #003 Failover health check engine
  • #004 GeoDNS rule engine
  • #005 DDNS token service
  • #006 Zone file import/export (BIND)
  • #007 Analytics ingestion and time-series storage
  • #008 koder-dns-sync agent (PoP agent for Phase 1)
  • #009 BIRD BGP config templates and PoP provisioning runbook
  • #010 Knot DNS integration and GeoIP module setup

apps/domains/backlog/ (already created — tickets #007–#012)

  • #007 Failover DNS UI
  • #008 GeoDNS UI
  • #009 DDNS management
  • #010 DNS statistics dashboard
  • #011 Secondary DNS / zone transfer UI
  • #012 Zone file import/export UI

infra/backlog/ (infrastructure)

  • ASN registration (LACNIC)
  • PI /24 IPv4 + /48 IPv6 block (LACNIC)
  • PoP GRU colocation contract
  • PoP IAD colocation/BGP contract
  • PoP FRA colocation/BGP contract
  • IXP memberships (PTT.br, Equinix IX, DE-CIX)
