
Edge gateway feature bundle: multi-WAN, secure DNS/DDNS, WireGuard, PBR, and smart queueing #1389


Request

Chris Putnam wants an easy, integrated edge-gateway feature set:

  • multi-WAN with failover
  • DHCP leases registered into DNS, with DNS cleanup on lease expiry/reassignment
  • DNS proxy/forwarder with secure upstream DNS over TLS or HTTPS, and secure downstream service where appropriate
  • easy WireGuard setup
  • policy-based routing for multi-WAN
  • fq_codel/CAKE-like smart queueing that is easy to enable or default-on where safe

This issue expands that request into an implementation plan and calls out what xpf already has versus what is still missing.

Current repo status

Already present or partially present

  • HA chassis/failover foundation exists. README documents chassis cluster failover, VRRP, session sync, dual fabric links, fabric cross-chassis forwarding, and ISSU at README.md:111-124. HA validation docs exercise failover and traffic continuity at testing-docs/ha-cluster.md:1-45 and validate userspace RG moves at testing-docs/ha-cluster.md:139-150.
  • Routing instances, VRFs, and PBR exist. README lists VRFs, inter-VRF leaking, GRE/XFRM/PBR at README.md:103-107; feature-gap docs summarize static routes, ECMP, VRFs, GRE/IPIP, rib-groups, next-table route leaking, and PBR at docs/feature-gaps.md:250-252. The kernel/eBPF compiler maps firewall-filter then routing-instance into route action/table IDs at pkg/dataplane/compiler_filter.go:363-371. Userspace snapshot generation imports routing instances and synthetic inter-VRF leak routes at pkg/dataplane/userspace/snapshot.go:1059-1075.
  • DHCP server exists. Kea-backed DHCPv4/v6 config rendering exists in pkg/dhcpserver/dhcpserver.go:183-258 and pkg/dhcpserver/dhcpserver.go:273-329; pool config already models DNS servers, lease time, and domain at pkg/config/types.go:1004-1014. Historical commits include dc3f1621 (DHCP server support), feb927fe (RA + DHCPv6), and 6de466e7 (DHCP lease display via CLI/gRPC).
  • DHCP lease DNS updates are already tracked separately. See #1387, "DHCP server: dynamic DNS updates and stale lease cleanup"; that issue should remain the detailed DDNS implementation tracker.
  • Router Advertisement can advertise recursive DNS. RA config carries DNSServers at pkg/config/types.go:1655-1666, and RA sender builds RDNSS options in pkg/ra/sender.go.
  • DNS proxy syntax/plan exists, but the runtime is missing. #660, "dns-proxy: replace systemd-resolved toggle with real firewall DNS proxy runtime", was the earlier runtime tracker, and docs/next-features/dns-proxy.md:1-80 records the plan. However, current config still warns that system services dns dns-proxy has no real runtime at pkg/config/compiler.go:613-615, and docs/feature-gaps.md:407 still marks DNS proxy as missing.
  • WireGuard has architecture research but no implementation. docs/vpp-dataplane-assessment.md:716-849 evaluates kernel WireGuard + TC/veth, userspace WireGuard + AF_XDP, VPP WG, and DPDK-integrated WG. It recommends kernel WG + veth/XDP as the pragmatic first step, but there is no production config/runtime code for WireGuard today.
  • fq_codel is not implemented as an xpf feature. Current docs mention Linux tc-fq/tc-fq_codel as external/test-host mitigations (docs/per-5-tuple/even-flows-recipe.md:97-100, docs/cross-worker-flow-fairness-research.md:636-641), but there is no first-class WAN smart-queueing config/runtime. Userspace CoS has MQFQ/DRR/fairness work, but that is not the same as operator-facing fq_codel/CAKE-style AQM on WAN egress.

Product direction

This should be treated as an "edge gateway profile" rather than six unrelated knobs. The operator should be able to turn up a multi-WAN site with sane defaults:

  1. define WAN uplinks and health probes;
  2. define LANs/prefixes and DHCP pools;
  3. enable local DNS service and DDNS registration;
  4. optionally enable WireGuard remote/site access;
  5. attach policies that select WANs by source, destination, app, or routing-instance;
  6. get smart queueing automatically where the link is shaped or measured.

The CLI should expose the underlying knobs for advanced users, but the primary workflow should be short and hard to misconfigure.
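
For illustration, the primary workflow could reduce to a handful of statements drawn from the phase sketches below (all syntax tentative), alongside existing DHCP pool and interface configuration:

set services multi-wan uplink wan-a interface reth0.100
set services multi-wan uplink wan-b interface reth0.200
set services multi-wan policy default mode failover primary wan-a backup wan-b
set system services dns dns-proxy listen-interface reth1.0
set services wireguard interface wg0 listen-port 51820
set class-of-service smart-queueing enable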

Proposed work plan

Phase 1: Multi-WAN service model and health state

Add a first-class multi-WAN model above raw routing instances:

set services multi-wan uplink wan-a interface reth0.100
set services multi-wan uplink wan-a routing-instance ISP-A
set services multi-wan uplink wan-a priority 100
set services multi-wan uplink wan-a weight 1
set services multi-wan uplink wan-a health-check target 1.1.1.1 protocol icmp
set services multi-wan uplink wan-b interface reth0.200
set services multi-wan uplink wan-b routing-instance ISP-B
set services multi-wan policy default mode failover primary wan-a backup wan-b

Implementation notes:

  • Reuse existing routing instances/PBR instead of inventing a parallel forwarding plane.
  • Reuse/extend RPM probes for health checks where possible.
  • Keep health state separate from config so route changes are event-driven and observable.
  • Add hysteresis: consecutive-success/consecutive-fail thresholds, hold-down, min-up-time, and flap counters (state-machine sketch after this list).
  • Add show services multi-wan with uplink state, selected primary, failover reason, last probe result, active sessions, and route/PBR installs.
  • HA behavior: only the active RG owner should publish live route/NAT changes for that RG; failover must reconstruct selected uplinks from health state or immediately re-probe.
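
A minimal sketch of the hysteresis state machine, in Go since the repo is Go; UplinkHealth and Observe are hypothetical names, not existing xpf APIs:

package multiwan

import "time"

// UplinkHealth tracks probe results for one uplink with hysteresis.
// All names here are illustrative, not existing xpf code.
type UplinkHealth struct {
	UpThreshold   int           // consecutive successes required to declare up
	DownThreshold int           // consecutive failures required to declare down
	HoldDown      time.Duration // minimum time to stay down after a down transition
	MinUpTime     time.Duration // stability time before the uplink is eligible again

	up         bool
	successRun int
	failureRun int
	lastDownAt time.Time
	lastUpAt   time.Time
	FlapCount  int // counts up/down transitions for show output
}

// Observe folds one probe result into the state machine and reports
// whether the uplink is currently eligible for new-flow selection.
func (h *UplinkHealth) Observe(ok bool, now time.Time) bool {
	if ok {
		h.successRun++
		h.failureRun = 0
		if !h.up && h.successRun >= h.UpThreshold &&
			now.Sub(h.lastDownAt) >= h.HoldDown {
			h.up = true
			h.lastUpAt = now
			h.FlapCount++
		}
	} else {
		h.failureRun++
		h.successRun = 0
		if h.up && h.failureRun >= h.DownThreshold {
			h.up = false
			h.lastDownAt = now
			h.FlapCount++
		}
	}
	// Eligible only once the uplink has been stable for MinUpTime.
	return h.up && now.Sub(h.lastUpAt) >= h.MinUpTime
}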

Acceptance:

  • Losing primary WAN probe removes that uplink from selection within a bounded time.
  • Existing sessions either stay pinned when possible or are explicitly drained/reset with visible reason counters.
  • New sessions use backup WAN after failover.
  • Recovery honors hold-down before moving traffic back.

Phase 2: Policy-based routing for multi-WAN

Build a friendly policy layer on top of existing filter then routing-instance support:

set services multi-wan policy video match application [ zoom teams ] uplink wan-a
set services multi-wan policy backup match source-prefix 10.10.50.0/24 uplink wan-b
set services multi-wan policy default mode load-balance uplinks [ wan-a wan-b ] algorithm weighted-flow-hash

Implementation notes:

  • Compile policies into existing firewall-filter/PBR/routing-instance primitives where possible.
  • For userspace dataplane, carry the selected WAN/uplink in the session so return path, SNAT, and failover behavior are stable.
  • NAT must bind to the chosen egress uplink/pool and remain sticky for session lifetime.
  • Add counters: policy hits, selected uplink, failover-to-backup hits, no-healthy-uplink drops/fallbacks.
  • Support both failover and weighted flow-hash modes (selection sketch below); do not start with per-packet balancing.
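
A sketch of weighted flow-hash selection over healthy uplinks, assuming hypothetical Uplink/SelectUplink names and FNV-1a over the serialized 5-tuple as the flow hash:

package multiwan

import "hash/fnv"

// Uplink is an illustrative stand-in for a configured WAN uplink.
type Uplink struct {
	Name    string
	Weight  uint32
	Healthy bool
}

// SelectUplink picks an uplink for a new flow by hashing the flow key
// into the weighted set of healthy uplinks. Existing sessions keep the
// uplink recorded at session creation, so rebalancing only affects new flows.
func SelectUplink(flowKey []byte, uplinks []Uplink) (Uplink, bool) {
	var healthy []Uplink
	var total uint32
	for _, u := range uplinks {
		if u.Healthy && u.Weight > 0 {
			healthy = append(healthy, u)
			total += u.Weight
		}
	}
	if total == 0 {
		return Uplink{}, false // no-healthy-uplink: caller counts and drops or falls back
	}
	h := fnv.New32a()
	h.Write(flowKey) // e.g. the serialized 5-tuple
	point := h.Sum32() % total
	for _, u := range healthy {
		if point < u.Weight {
			return u, true
		}
		point -= u.Weight
	}
	return healthy[len(healthy)-1], true // unreachable, defensive
}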

Acceptance:

  • Source-prefix policy directs traffic into expected routing instance/uplink.
  • App/class policy works for known app IDs and falls back predictably for unknown app IDs.
  • Unhealthy uplinks are excluded from new-flow selection.
  • Existing PBR behavior remains backwards-compatible.

Phase 3: DHCP DDNS integration

Use #1387 as the detailed tracker. This umbrella issue depends on it for:

  • DHCPv4/v6 active lease watcher over Kea lease CSV/state (parser sketch after this list).
  • hostname/FQDN normalization.
  • A/AAAA/PTR creation.
  • cleanup on expiry, release, reassignment, and pool removal.
  • xpf-owned-record state store so we never delete records we did not create.
  • HA-aware emission only from the active DHCP-serving owner.
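
A parsing sketch for the lease watcher, assuming Kea's memfile lease CSV with a header row; exact columns vary by Kea version, so this resolves them by name rather than position:

package ddns

import (
	"encoding/csv"
	"fmt"
	"os"
	"strconv"
	"time"
)

// Lease is the subset of a Kea lease row that DDNS needs.
type Lease struct {
	Address  string
	Hostname string
	Expire   time.Time
}

// ReadLeases parses a Kea memfile lease CSV by header name so it
// tolerates column reordering across Kea versions. Kea's memfile is
// append-only between cleanups, so the last row per address is the
// authoritative one; deduplication is left out of this sketch.
func ReadLeases(path string) ([]Lease, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	rows, err := csv.NewReader(f).ReadAll()
	if err != nil || len(rows) == 0 {
		return nil, err
	}
	col := map[string]int{}
	for i, name := range rows[0] {
		col[name] = i
	}
	for _, want := range []string{"address", "hostname", "expire"} {
		if _, ok := col[want]; !ok {
			return nil, fmt.Errorf("lease file missing column %q", want)
		}
	}
	var leases []Lease
	for _, row := range rows[1:] {
		exp, _ := strconv.ParseInt(row[col["expire"]], 10, 64)
		leases = append(leases, Lease{
			Address:  row[col["address"]],
			Hostname: row[col["hostname"]],
			Expire:   time.Unix(exp, 0),
		})
	}
	return leases, nil
}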

Multi-WAN-specific requirements:

  • DDNS must work for LANs behind any WAN uplink.
  • DNS update transport should be able to choose an upstream routing-instance/uplink if the DNS server is reachable only through a specific WAN (update sketch after this list).
  • DDNS failures must not block DHCP leases by default; they must surface in counters/status and retry.
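
One plausible shape for the update transport, using the github.com/miekg/dns package (an assumed dependency, not one the repo is known to carry); zone, TTL, and server values are placeholders:

package ddns

import (
	"time"

	"github.com/miekg/dns"
)

// UpsertA sends an RFC 2136 dynamic update that replaces the A record
// for fqdn in zone. server is "host:53" and must be reached via the
// uplink chosen by multi-WAN policy (e.g. a routing-instance-bound
// socket), per the requirement above.
func UpsertA(zone, fqdn, addr, server string) error {
	m := new(dns.Msg)
	m.SetUpdate(dns.Fqdn(zone))

	rr, err := dns.NewRR(dns.Fqdn(fqdn) + " 300 IN A " + addr)
	if err != nil {
		return err
	}
	m.RemoveRRset([]dns.RR{rr}) // clear stale A records we own
	m.Insert([]dns.RR{rr})

	c := &dns.Client{Timeout: 3 * time.Second}
	// Failures surface to the caller for retry and counters; they must
	// not block lease handling.
	_, _, err = c.Exchange(m, server)
	return err
}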

Phase 4: Real DNS proxy with secure upstream/downstream

Extend the #660 / docs/next-features/dns-proxy.md plan into a secure DNS runtime:

Config sketch:

set system services dns dns-proxy listen-interface reth1.0
set system services dns dns-proxy default-domain home.example
set system services dns dns-proxy upstream quad9 address 9.9.9.9 protocol dot tls-server-name dns.quad9.net
set system services dns dns-proxy upstream cloudflare url https://cloudflare-dns.com/dns-query protocol doh
set system services dns dns-proxy cache-size 10000
set system services dns dns-proxy downstream plain enable
set system services dns dns-proxy downstream dot enable certificate local-dns-cert
set system services dns dns-proxy downstream doh enable certificate local-dns-cert path /dns-query

Implementation notes:

  • First runtime target should probably still be managed unbound for plain DNS + DoT upstream and caching (rendering sketch after this list). DoH upstream/downstream may need CoreDNS, dnsdist, or a small purpose-built proxy if unbound is insufficient for the full product goal.
  • Separate host resolver behavior from client-facing firewall DNS proxy behavior. Do not use systemd-resolved as the fake DNS proxy runtime.
  • Bind listeners per interface/routing-instance. Avoid accidentally exposing DNS on WAN.
  • Support ACLs/default-deny by source prefix/zone/interface.
  • Add certificate management integration for downstream DoT/DoH, or clearly require an existing local certificate object for phase 1.
  • Integrate with #1387 (DHCP server: dynamic DNS updates and stale lease cleanup) so DHCP-created local names are served authoritatively before forwarding upstream.
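
A minimal rendering sketch for the managed-unbound option above; RenderUnbound and Upstream are hypothetical, but the emitted directives (interface, access-control, forward-addr with addr@port#authname, forward-tls-upstream) are standard unbound.conf syntax:

package dnsproxy

import (
	"fmt"
	"strings"
)

// Upstream describes one DoT upstream; names are illustrative.
type Upstream struct {
	Address string // e.g. "9.9.9.9"
	SNI     string // e.g. "dns.quad9.net"
}

// RenderUnbound emits a minimal unbound.conf for LAN-facing plain DNS
// with DoT forwarding upstream. Listener binding and ACLs follow the
// default-deny, LAN-only notes above.
func RenderUnbound(listen string, allow []string, ups []Upstream) string {
	var b strings.Builder
	b.WriteString("server:\n")
	fmt.Fprintf(&b, "  interface: %s\n", listen)
	b.WriteString("  access-control: 0.0.0.0/0 refuse\n")
	for _, cidr := range allow {
		fmt.Fprintf(&b, "  access-control: %s allow\n", cidr)
	}
	b.WriteString("  tls-cert-bundle: /etc/ssl/certs/ca-certificates.crt\n")
	b.WriteString("forward-zone:\n  name: \".\"\n")
	for _, u := range ups {
		// unbound's addr@port#authname form pins the TLS server name.
		fmt.Fprintf(&b, "  forward-addr: %s@853#%s\n", u.Address, u.SNI)
	}
	b.WriteString("  forward-tls-upstream: yes\n")
	return b.String()
}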

Acceptance:

  • Clients can query firewall DNS on intended LAN interfaces.
  • Upstream forwarding works over plain DNS and DoT in phase 1; DoH is either implemented or explicitly tracked as phase 2.
  • Downstream secure DNS is either DoT first or DoH first, with a documented cert path and access control.
  • DNS proxy honors routing-instance/uplink selection for upstreams.
  • HA failover moves listener ownership cleanly.

Phase 5: Easy WireGuard setup

Use the existing WireGuard architecture decision in docs/vpp-dataplane-assessment.md:716-849.

Recommended phase-1 architecture:

  • Kernel WireGuard for crypto.
  • veth/tun plumbing so decrypted traffic re-enters xpf policy/routing/NAT path.
  • Zone binding for the decrypted WireGuard side.
  • XDP/userspace dataplane still handles physical NIC traffic and encrypted UDP on WAN.

Config sketch:

set services wireguard interface wg0 listen-port 51820
set services wireguard interface wg0 address 10.44.0.1/24
set services wireguard interface wg0 wan-uplink wan-a
set services wireguard peer phone public-key <key> allowed-address 10.44.0.2/32
set services wireguard peer phone preshared-key <secret>
set services wireguard peer phone route-mode remote-access

Operator tooling:

  • Generate server keypair (provisioning sketch after this list).
  • Add peer with generated client config/QR output.
  • Optional dynamic endpoint/roaming peer support.
  • Automatic firewall policy template for remote-access and site-to-site modes.
  • Optional multi-WAN endpoint failover: publish endpoint DNS, prefer current healthy uplink, support keepalive.
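
A provisioning sketch for the kernel-WireGuard step, using golang.zx2c4.com/wireguard/wgctrl (the library choice is an assumption); it presumes wg0 was already created, e.g. via netlink, and omits the veth plumbing:

package wg

import (
	"net"

	"golang.zx2c4.com/wireguard/wgctrl"
	"golang.zx2c4.com/wireguard/wgctrl/wgtypes"
)

// ProvisionDevice pushes keys and one remote-access peer onto an
// existing wg0-style interface. Illustrative only; the real renderer
// would be driven by xpf config.
func ProvisionDevice(ifname string, listenPort int, peerPub wgtypes.Key, peerIP net.IPNet) error {
	priv, err := wgtypes.GeneratePrivateKey()
	if err != nil {
		return err
	}
	client, err := wgctrl.New()
	if err != nil {
		return err
	}
	defer client.Close()

	return client.ConfigureDevice(ifname, wgtypes.Config{
		PrivateKey:   &priv,
		ListenPort:   &listenPort,
		ReplacePeers: true,
		Peers: []wgtypes.PeerConfig{{
			PublicKey:  peerPub,
			AllowedIPs: []net.IPNet{peerIP}, // e.g. 10.44.0.2/32
		}},
	})
}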

Acceptance:

  • show services wireguard shows interface/peer handshake/bytes/endpoint/allowed IPs.
  • Remote-access peer can reach selected LAN through xpf policy.
  • Site-to-site peer can route prefixes through selected WAN/uplink.
  • Multi-WAN failover either preserves tunnel via endpoint roaming or reconnects within documented bounds.

Phase 6: Smart queueing / fq_codel-like default

The product ask is "fq_codel or similar easily enabled or on by default." For xpf that needs two paths:

  1. Kernel path / management interfaces: render Linux tc qdisc (fq_codel or cake) when traffic really egresses through the kernel qdisc layer (renderer sketch after this list).
  2. Userspace dataplane / AF_XDP path: implement native AQM in the userspace CoS queues because Linux qdisc is bypassed.
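
A sketch of the kernel-path renderer (hypothetical ApplyFqCodel shelling out to tc; a real implementation might drive netlink instead), wired to the profile knobs in the config sketch that follows:

package smartqueue

import (
	"fmt"
	"os/exec"
)

// ApplyFqCodel installs the kernel-path qdisc for interfaces whose
// egress really goes through the Linux qdisc layer. Parameters mirror
// the target/interval/ecn profile knobs.
func ApplyFqCodel(ifname, target, interval string, ecn bool) error {
	args := []string{"qdisc", "replace", "dev", ifname, "root",
		"fq_codel", "target", target, "interval", interval}
	if ecn {
		args = append(args, "ecn")
	}
	out, err := exec.Command("tc", args...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("tc failed: %v: %s", err, out)
	}
	return nil
}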

Config sketch:

set class-of-service smart-queueing enable
set class-of-service smart-queueing default-profile wan
set class-of-service smart-queueing profile wan algorithm fq-codel
set class-of-service smart-queueing profile wan target 5ms
set class-of-service smart-queueing profile wan interval 100ms
set class-of-service smart-queueing profile wan ecn
set interfaces reth0 unit 100 family inet smart-queueing profile wan

Implementation notes:

  • For shaped WANs, default smart queueing should be enabled unless the operator disables it. For unshaped high-speed LAN/datacenter interfaces, default-off or passive telemetry-only is safer.
  • Userspace path should build on existing CoS per-flow fairness buckets, adding sojourn-time tracking and CoDel-style ECN/drop decisions at admission/dequeue (simplified controller sketch after this list).
  • Preserve current CoS exact guarantees: AQM cannot let best-effort steal from exact queues.
  • Export per-queue and per-class counters: sojourn p50/p99/p99.9, ECN marks, CoDel drops, tail drops, queue delay estimate, flow bucket count.
  • Provide a visible show class-of-service smart-queueing command.
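
A simplified CoDel controller for the userspace path, illustrating sojourn-time tracking and the interval/sqrt(count) control law; names are hypothetical and the real CoDel state machine has more edge cases (e.g. dropping-state re-entry shortcuts) omitted here:

package smartqueue

import (
	"math"
	"time"
)

// codelState is a simplified CoDel controller for one userspace CoS
// queue; structure is illustrative, not existing xpf code.
type codelState struct {
	Target   time.Duration // e.g. 5ms
	Interval time.Duration // e.g. 100ms

	firstAboveAt time.Time // when sojourn first exceeded Target
	dropNextAt   time.Time
	dropping     bool
	count        int
}

// OnDequeue returns true if the packet should be ECN-marked or dropped.
// sojourn is now minus the enqueue timestamp carried on the packet.
func (c *codelState) OnDequeue(sojourn time.Duration, now time.Time) bool {
	if sojourn < c.Target {
		// Below target: leave dropping state and reset.
		c.firstAboveAt = time.Time{}
		c.dropping = false
		return false
	}
	if c.firstAboveAt.IsZero() {
		c.firstAboveAt = now
		return false
	}
	if !c.dropping {
		// Sojourn stayed above target for a full interval: start dropping.
		if now.Sub(c.firstAboveAt) >= c.Interval {
			c.dropping = true
			c.count = 1
			c.dropNextAt = now.Add(c.nextInterval())
			return true
		}
		return false
	}
	if now.After(c.dropNextAt) {
		// Control law: drop faster the longer sojourn stays above target.
		c.count++
		c.dropNextAt = c.dropNextAt.Add(c.nextInterval())
		return true
	}
	return false
}

func (c *codelState) nextInterval() time.Duration {
	return time.Duration(float64(c.Interval) / math.Sqrt(float64(c.count)))
}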

Acceptance:

  • Under WAN saturation, latency-sensitive flow p99/p99.9 remains bounded while bulk traffic fills the link.
  • fq_codel/CoDel decisions are visible and attributable.
  • Exact-rate CoS queues keep their guaranteed rates under best-effort pressure.
  • Userspace dataplane and kernel-qdisc paths have separate tests because they exercise different machinery.

Suggested implementation slices

  1. Multi-WAN health/state model: config model, RPM/probe reuse, show command, no dataplane changes yet.
  2. Multi-WAN route/PBR compiler: compile policy to routing-instance/filter/session selection, with failover state gates.
  3. DNS proxy secure runtime: revive #660 (dns-proxy: replace systemd-resolved toggle with real firewall DNS proxy runtime) with a concrete runtime backend, DoT first.
  4. DDNS from DHCP leases: implement #1387 (DHCP server: dynamic DNS updates and stale lease cleanup) and feed the local DNS zone.
  5. WireGuard kernel+veth MVP: key/config renderer, wg lifecycle, zone/interface binding, show command.
  6. Smart queueing kernel path: tc qdisc renderer for non-AF_XDP egress.
  7. Smart queueing userspace path: native CoDel/AQM in CoS queues, with metrics and validation harness.
  8. Integrated edge-gateway validation: one lab profile with two WANs, DHCP+DDNS, DNS proxy, WG remote client, PBR, failover, and smart queueing under load.

Validation matrix

  • Single WAN baseline: DHCP, DNS proxy, DDNS, smart queueing, WireGuard all work.
  • Dual WAN healthy: policy chooses expected uplink; weighted mode distributes new flows by hash/weight.
  • Primary WAN failure: new sessions use backup; DNS upstream and DDNS update path continue if reachable.
  • Primary WAN recovery: hold-down prevents flap; optional preempt moves back only after stable.
  • DHCP lease expires/reassigns: DNS records cleaned.
  • DNS upstream DoT failure: fallback upstream selected; counters show failure.
  • WireGuard peer during WAN failover: reconnect behavior within documented bound.
  • Bufferbloat test: bulk download/upload under shape with ping/TCP echo latency gate.
  • HA RG failover during multi-WAN traffic: active owner owns DNS/DHCP/WG/listeners and route state.

Open design questions

  1. Should multi-WAN be expressed as a new high-level services multi-wan tree, or as templates that compile into existing routing-instances/firewall filters/NAT rules?
  2. Do we want DoT-only first for upstream secure DNS, or is DoH a phase-1 requirement?
  3. Which downstream secure DNS mode matters first: DoT on 853, DoH on 443, or both?
  4. For WireGuard, should phase 1 use TC on wg0 or veth re-entry into the xpf pipeline? Existing docs recommend veth/XDP for fuller pipeline reuse.
  5. For smart queueing, should default-on apply only when an interface has an explicit shape, or should we infer shape from measured WAN speed?
  6. How should smart queueing interact with current CoS exact/surplus/equal-flow enforcement when all are enabled?
