RFC-003 — Koder Herald Phase 2: Hyperscale Global Network
| Field | Value |
|---|---|
| Status | **Draft** (2026) |
| Author(s) | Rodrigo (with Claude as scribe) |
| Date | 2026 |
| Target module | platform/dns/ |
| Depends on | RFC |
1. Summary
Phase 2 takes Koder Herald from a single-region managed-anycast setup to a fully self-operated global anycast network. Koder acquires its own ASN and PI address block via LACNIC, deploys BIRD 2 at each PoP for BGP announcement, and grows to 15+ PoPs across all continents. At Phase 2 completion, Koder DNS is a competitive alternative to ClouDNS, Cloudflare DNS, and Amazon Route 53 for latency-sensitive customers worldwide.
Phase 2 is not a single sprint — it is a multi-year programme that runs in parallel with the rest of the Koder Stack. It should be approached incrementally: each PoP added independently, each feature shipped as soon as it is ready.
Phase 2 entry criteria:
- Phase 1 is complete (São Paulo PoP live, 3+ paying customers, JWT auth in production)
- koder-herald engine is stable under production load
- Terraform provider is published and used by at least one customer
Phase 2 exit criteria:
- 15+ PoPs on 5 continents
- Own ASN and PI /24 block in production
- p95 query latency < 5 ms from South America and Europe; < 10 ms globally
- 1 000+ customer zones
- Full feature parity with ClouDNS premium tier
- SOC 2 Type II readiness (audit trail, access controls, encryption at rest)
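The latency exit criterion can be checked mechanically once per-region query latencies are collected. The following is a minimal sketch, assuming latencies arrive as plain lists of milliseconds; the region labels and function names are hypothetical and not part of Herald.

```python
# Thresholds taken from the exit criteria above (milliseconds).
REGIONAL_P95_MS = 5.0   # South America and Europe
GLOBAL_P95_MS = 10.0    # every other region

# Hypothetical region labels; the RFC does not define a naming scheme.
STRICT_REGIONS = {"south-america", "europe"}


def p95(samples_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    # 1-based nearest rank, computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_latency_target(region, samples_ms):
    """True if the region's p95 is strictly below its threshold."""
    limit = REGIONAL_P95_MS if region in STRICT_REGIONS else GLOBAL_P95_MS
    return p95(samples_ms) < limit
```

Nearest-rank is only one of several percentile definitions; whichever one the monitoring stack uses should be applied consistently across regions.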
2. Network Architecture
2.1 Own BGP (replaces managed anycast)
┌─────────────────────────────────────────────────────┐
│ Koder ASN (LACNIC PI /24) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ PoP BR │ │ PoP MIA │ │ PoP AMS │ ... │
│ │ BIRD 2 │ │ BIRD 2 │ │ BIRD 2 │ │
│ │ Knot DNS │ │ Knot DNS │ │ Knot DNS │ │
│ │ dns-sync │ │ dns-sync │ │ dns-sync │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │
│ Upstreams + IXP peering (per PoP) │
└─────────────────────────────────────────────────────┘
│
koder-herald (engine)
(control plane: multi-region HA in Phase 2)

Each PoP announces the same anycast /24 prefix via BGP. Queries reach the topologically nearest PoP through BGP route selection and IXP peering. No DNS-level load balancing is needed because BGP handles it.
2.2 Control plane HA
In Phase 1, koder-herald runs as a single instance on s.k.lin. In Phase 2, it runs as an active-passive pair (primary + standby) with automatic failover via Keepalived or a distributed lock in kdb-next. The database also becomes distributed (kdb-next on a TiKV substrate).
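To illustrate the distributed-lock option, here is a minimal sketch of lease-based role selection. `LeaseStore` is an in-memory stand-in written for this example; it is not the kdb-next API, which this RFC does not specify.

```python
import time


class LeaseStore:
    """In-memory stand-in for a distributed lock with a time-to-live."""

    def __init__(self):
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, node, ttl, now):
        """Grant or renew the lease if it is free, expired, or already ours."""
        if self.holder in (None, node) or now >= self.expires_at:
            self.holder = node
            self.expires_at = now + ttl
            return True
        return False


def role(store, node, ttl=10.0, now=None):
    """Return "primary" if this node holds the lease, otherwise "standby"."""
    now = time.monotonic() if now is None else now
    return "primary" if store.try_acquire(node, ttl, now) else "standby"
```

Each instance would call `role` on a timer shorter than the TTL. The standby takes over only after the primary stops renewing and the lease expires, so the TTL bounds the failover time.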
2.3 PoP target list (15 PoPs)
| # | City | Region | IXP | Priority |
|---|---|---|---|---|
| 1 | São Paulo, BR | South America | PTT.br | Phase 1 (done first) |
| 2 | Miami, US | North America | NAP of the Americas | Phase 2 first |
| 3 | Amsterdam, NL | Europe | AMS-IX | Phase 2 |
| 4 | Frankfurt, DE | Europe | DE-CIX | Phase 2 |
| 5 | Singapore, SG | Asia Pacific | SGIX | Phase 2 |
| 6 | Los Angeles, US | North America | CoreSite LA | Phase 2 |
| 7 | London, UK | Europe | LINX | Phase 2 |
| 8 | Tokyo, JP | Asia Pacific | JPIX | Phase 2 |
| 9 | Sydney, AU | Oceania | IX Australia | Phase 2 |
| 10 | Johannesburg, ZA | Africa | NAPAfrica | Phase 2 |
| 11 | Buenos Aires, AR | South America | CABASE | Phase 2 |
| 12 | Toronto, CA | North America | TorIX | Phase 2 |
| 13 | Paris, FR | Europe | France-IX | Phase 2 |
| 14 | Mumbai, IN | Asia | DE-CIX Mumbai | Phase 2 |
| 15 | Warsaw, PL | Europe | THINX | Phase 2 |
3. Technical Decisions
Decision 1 — BGP software: BIRD 2
**Chosen:** BIRD 2.
**Rationale:** BIRD 2 is widely deployed open-source BGP routing software, used by major IXPs, CDNs, and DNS operators worldwide. It is actively maintained, supports the full BGP feature set (communities, AS path prepending, route filtering), and is well documented. The alternative, FRRouting, is viable, but BIRD 2 is simpler to configure for the anycast use case.
Each PoP runs BIRD 2 with:
- Session to the hosting provider's BGP peer
- Session to IXP route server (once peering is established)
- Announces the Koder PI /24 prefix
- AS path prepending for traffic engineering (prefer BR PoP for BR clients)
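The effect of AS path prepending can be shown with a toy model. The sketch below assumes a remote network that compares routes by AS path length only; real BGP best-path selection evaluates other attributes (such as local preference) first. The ASNs are from the documentation range and are placeholders, not Koder's or any upstream's real numbers.

```python
KODER_ASN = 64500  # placeholder from the documentation ASN range


def announce(pop, upstream_asn, prepends=0):
    """AS path a remote network would see for the anycast prefix via this PoP."""
    return {"pop": pop, "as_path": [upstream_asn] + [KODER_ASN] * (1 + prepends)}


def best_path(routes):
    """Pick the route with the shortest AS path (ties broken by PoP name)."""
    return min(routes, key=lambda r: (len(r["as_path"]), r["pop"]))


# Prepending twice at Miami makes the São Paulo path look shorter.
routes = [announce("MIA", 64502, prepends=2), announce("BR", 64501)]
preferred = best_path(routes)["pop"]
```

In this model a network that hears both announcements picks the BR PoP, which is the intent behind "prefer BR PoP for BR clients".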
Decision 2 — ASN and PI block: LACNIC
**Chosen:** LACNIC (already decided in RFC-001 §12, decision #5).
**Steps:**
- Join LACNIC as a new member (Koder Tecnologia Ltda) — ~$1,500/year
- Request a PI /24 block (requires justification — our customer base qualifies)
- ASN assigned (16-bit or 32-bit; 32-bit is sufficient)
- Register route objects in the LACNIC IRR database
- Configure BIRD 2 at São Paulo PoP with the new ASN
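The route-object step can be sketched as a small generator for an RPSL `route` object. The prefix, ASN, and maintainer name below are documentation placeholders, and the `source` value is an assumption; the exact attributes LACNIC's IRR requires should be confirmed during registration.

```python
import ipaddress


def route_object(prefix, origin_asn, maintainer,
                 source="LACNIC", descr="Koder anycast prefix"):
    """Render an RPSL route object for an IPv4 /24 (illustrative only)."""
    net = ipaddress.ip_network(prefix, strict=True)
    if net.version != 4 or net.prefixlen != 24:
        raise ValueError("expected an IPv4 /24")
    fields = [
        ("route", str(net)),
        ("descr", descr),
        ("origin", f"AS{origin_asn}"),
        ("mnt-by", maintainer),   # hypothetical maintainer handle
        ("source", source),       # assumed IRR source name
    ]
    # Pad attribute names so the values line up in a column.
    return "\n".join(f"{name + ':':<12}{value}" for name, value in fields)


example = route_object("192.0.2.0/24", 64500, "MAINT-EXAMPLE")
```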
Decision 3 — IXP strategy: PTT.br first, then regional IXPs
**Chosen:** Join PTT.br (São Paulo) in Phase 2, then join the IXP local to each new PoP.
**Rationale:** IXP peering reduces query latency by removing the transit hop to major networks. PTT.br is free for members and has 600+ peers. For each subsequent PoP, join the regional IXP (AMS-IX, DE-CIX, etc.).
Decision 4 — GeoIP upgrade: MaxMind GeoIP2 City at >50 customer zones
**Chosen:** Upgrade from GeoLite2 (free, country-level) to GeoIP2 City (paid, city-level) when Herald manages >50 customer zones. At that scale the ~$200/year cost is justified.
**Impact:** Unlocks city-level GeoDNS rules (e.g., "São Paulo users → cluster A, Rio de Janeiro users → cluster B").
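A city-level rule set with a country-wide fallback could be evaluated as in the sketch below. `GeoRule` and `resolve_target` are hypothetical names for illustration; the country and city values are assumed to come from a GeoIP lookup that has already been done.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeoRule:
    country: str            # ISO 3166-1 alpha-2 code
    city: Optional[str]     # None means "any city in the country"
    target: str


def resolve_target(rules, country, city, default):
    """Prefer an exact city rule, then a country-wide rule, then the default."""
    for rule in rules:
        if rule.country == country and rule.city is not None and rule.city == city:
            return rule.target
    for rule in rules:
        if rule.country == country and rule.city is None:
            return rule.target
    return default


rules = [
    GeoRule("BR", "São Paulo", "cluster-a"),
    GeoRule("BR", "Rio de Janeiro", "cluster-b"),
    GeoRule("BR", None, "cluster-a"),
]
```

Keeping a country-wide rule next to the city rules means a client whose city cannot be determined still gets a sensible answer.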
Decision 5 — Real-time analytics: PoP agents stream query samples
**Chosen:** Each koder-dns-sync agent emits periodic query count samples (per zone, per minute) to Herald via HTTP POST. Herald aggregates them into zone_stats with 1-minute resolution. This replaces the existing hourly ClouDNS stats pipeline.
**Format:** POST /api/v1/sync/stats with body {zone_id, period_start, query_count, nxdomain, by_type}.
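As an illustration of the payload and the aggregation step, the sketch below builds the JSON body on the agent side and sums samples from several PoPs on the Herald side. The field names come from the format above; the value types (an ISO 8601 string for `period_start`, a record-type-to-count map for `by_type`, and `query_count` being the sum of `by_type`) are assumptions.

```python
import json
from collections import defaultdict


def build_stats_body(zone_id, period_start, by_type, nxdomain):
    """Serialize one per-minute sample for POST /api/v1/sync/stats."""
    return json.dumps({
        "zone_id": zone_id,
        "period_start": period_start,           # assumed ISO 8601, minute-aligned
        "query_count": sum(by_type.values()),   # assumed to equal the by_type total
        "nxdomain": nxdomain,
        "by_type": by_type,
    }, sort_keys=True)


def aggregate(bodies):
    """Sum query_count per (zone_id, period_start) across all PoPs."""
    totals = defaultdict(int)
    for body in bodies:
        sample = json.loads(body)
        totals[(sample["zone_id"], sample["period_start"])] += sample["query_count"]
    return dict(totals)


from_br = build_stats_body("zone-1", "2026-01-01T00:00:00Z", {"A": 10, "AAAA": 5}, 2)
from_mia = build_stats_body("zone-1", "2026-01-01T00:00:00Z", {"A": 3}, 0)
```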
Decision 6 — Control plane HA: active-passive with kdb-next
**Chosen:** Active-passive Herald pair once kdb-next is production-ready. Until then, a single instance is acceptable (planned downtime < 5 min for deployments).
Decision 7 — DNSSEC: full automation with key rotation
**Chosen:** Knot DNS keymgr handles the full key lifecycle (creation, rollover, revocation). Herald only stores DNSSEC metadata (DS records for delegation). Key rollovers happen automatically on a configurable schedule (default: KSK 2 years, ZSK 90 days).
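The default schedule translates into concrete dates as sketched below. This only illustrates the timeline arithmetic; per the decision above, Knot DNS performs the rollovers itself. Treating "2 years" as 730 days is an assumption.

```python
from datetime import date, timedelta

ZSK_LIFETIME = timedelta(days=90)
KSK_LIFETIME = timedelta(days=730)  # "2 years" approximated as 730 days


def next_rollover(created, lifetime, today):
    """First rollover date on or after `today` for a key created on `created`."""
    due = created + lifetime
    while due < today:
        due += lifetime
    return due


created = date(2026, 1, 1)
first_zsk = next_rollover(created, ZSK_LIFETIME, today=created)
first_ksk = next_rollover(created, KSK_LIFETIME, today=created)
```

A KSK rollover also requires updating the DS record at the parent zone, which is why Herald keeps the DS metadata even though it does not manage keys.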
Decision 8 — Secondary DNS as a service
**Chosen:** Phase 2 allows external providers (customers' other DNS providers) to pull zones from Koder Herald as secondaries via AXFR. Herald implements AXFR as a server (RFC 5936). This enables customers to use Koder as primary with a different provider as secondary for redundancy.
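An AXFR server normally restricts which hosts may transfer a zone. The following is a minimal sketch of a per-zone address allowlist check, assuming the customer supplies the secondary provider's networks; `may_transfer` is a hypothetical helper and the addresses are documentation ranges.

```python
import ipaddress


def may_transfer(client_ip, allowed_networks):
    """True if the requesting address falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    # Membership is False when the address and network IP versions differ.
    return any(addr in ipaddress.ip_network(net) for net in allowed_networks)


allowlist = ["198.51.100.0/28", "2001:db8::/32"]
```

Address checks alone are weak against spoofing, so zone transfers are commonly also authenticated with TSIG. Whether Herald does so is not stated in this RFC.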
4. Feature Completeness Targets
By end of Phase 2, Koder Herald must support:
| Feature | Phase 1 | Phase 2 |
|---|---|---|
| A, AAAA, CNAME, MX, TXT, NS, SRV, CAA, PTR | ✅ | ✅ |
| GeoDNS (country + region) | ✅ (ClouDNS) | ✅ (own PoPs) |
| GeoDNS (city-level) | — | ✅ (GeoIP2 City) |
| Failover (health check → swap IP) | ✅ | ✅ |
| DDNS (DynDNS v2 compat) | ✅ | ✅ |
| DNSSEC | — | ✅ |
| Zone import/export (BIND) | ✅ | ✅ |
| Analytics (per-zone query stats) | ✅ (ClouDNS) | ✅ (own PoPs, 1-min buckets) |
| Anomaly detection | — | ✅ |
| Weighted round-robin | — | ✅ |
| Private DNS / split-horizon | — | ✅ |
| DNS firewall / RPZ | — | ✅ |
| Secondary DNS (AXFR server) | — | ✅ |
| Terraform provider | Phase 1 | ✅ |
| Webhooks / event streaming | — | ✅ |
| CLI (koder dns) | — | ✅ |
| Rate limiting per tenant | Phase 1 | ✅ |
| Zone locking / change review | Phase 1 | ✅ |
| SLA monitoring + status page | — | ✅ |
| SOC 2 Type II readiness | — | ✅ |
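For the weighted round-robin row, one deterministic way to spread answers by weight is smooth weighted round-robin. The sketch below is an assumption about how the feature could behave, not Herald's implementation; a randomized weighted choice would be an equally valid design.

```python
def weighted_round_robin(weights, count):
    """Return `count` picks using smooth weighted round-robin."""
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(count):
        # Every target gains its weight, then the leader is picked and penalized.
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


picks = weighted_round_robin({"a": 3, "b": 1}, 4)
```

Over any window of `sum(weights)` picks, each target is chosen exactly as many times as its weight, and heavier targets are interleaved instead of being served in a burst.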
5. Open Questions
- **Control plane HA provider:** Keepalived vs kdb-next distributed lock; decide when kdb-next is near production.
- **NAPTR record support:** Required for VoIP/SIP customers. Add to Phase 2 record type support.
- **CAA and certificate automation:** Integrate with Koder's future CA or Let's Encrypt validation flow.
- **BGP communities for traffic engineering:** Which communities to honour from upstreams (LACNIC region, no-export, etc.); define when BGP sessions are live.
- **IXP peering policy:** Open peering (accept all IX peers) or selective; decide at PTT.br onboarding.
- **Pricing model at scale:** Is a flat per-zone fee adequate at 1 000+ zones, or should pricing shift to per-query above a threshold?
6. Non-Goals for Phase 2
- **Recursive DNS resolver:** Koder DNS remains authoritative only (RFC-001 §3).
- **Domain registrar:** Stays in apps/domains via third-party registrars.
- **DDoS scrubbing:** Use a scrubbing provider (Cloudflare Magic Transit or similar) upstream of PoPs; not built in-house.
- **mDNS / local network discovery:** Not relevant for a public authoritative DNS platform.