DHCP server: dynamic DNS updates and stale lease cleanup #1387

@psaab

Problem

When xpf hands out DHCP leases through the Kea-backed DHCP server, it does not publish client hostnames into DNS, and it has no mechanism to remove such records when a lease expires or the address is handed to a different client.

Current source of truth:

  • pkg/dhcpserver/dhcpserver.go renders /etc/kea/kea-dhcp{4,6}.conf and restarts kea-dhcp{4,6}-server.
  • pkg/dhcpserver/dhcpserver.go reads /var/lib/kea/kea-leases4.csv and kea-leases6.csv for show dhcp-server lease display.
  • pkg/config/types.go models DHCP pools with Domain, DNSServers, LeaseTime, RangeLow, RangeHigh, and Subnet, but no DDNS policy.
  • docs/next-features/dns-proxy.md tracks a future xpf-managed DNS proxy; today there is no authoritative/dynamic DNS runtime inside xpf.

Operators need DHCP client names to resolve while leases are active, and stale A/AAAA/PTR records must be removed when a lease expires, is released, or the same address is reassigned to a different machine.

Goals

  1. Optional DDNS for DHCPv4 and DHCPv6 server leases.
  2. Add forward records (A / AAAA) and reverse records (PTR) for eligible leases.
  3. Clean records when:
    • a lease expires,
    • a lease is released/declined/reclaimed,
    • a client gets a new address,
    • an address is reassigned to a different client,
    • the DHCP pool/group/interface is removed from config.
  4. Never delete records that xpf did not create.
  5. Work across daemon restart and HA failover.
  6. Keep DHCP serving independent of DNS update failures: log and count failures and retry them, but do not block leases unless an explicit strict mode is added later.

Proposed config surface

Add an opt-in DDNS block under DHCP local server config. Exact syntax can be refined, but the model should support at least:

set system services dhcp-local-server dynamic-dns enable
set system services dhcp-local-server dynamic-dns domain lab.example.net
set system services dhcp-local-server dynamic-dns forward-zone lab.example.net
set system services dhcp-local-server dynamic-dns reverse-zone 80.16.172.in-addr.arpa
set system services dhcp-local-server dynamic-dns ttl 300
set system services dhcp-local-server dynamic-dns hostname-source client-hostname
set system services dhcp-local-server dynamic-dns conflict-policy replace-owned
set system services dhcp-local-server dynamic-dns update-server 192.0.2.53 key <tsig-key-name>

Allow per-pool override later, but start with server/global defaults plus pool domain-name as the default suffix when present.

Suggested config fields:

  • Enabled bool
  • Domain string
  • ForwardZones []string
  • ReverseZones []string
  • TTLSeconds int
  • HostnameSource: client-hostname | fqdn | client-id | mac-fallback
  • ConflictPolicy: replace-owned | skip-existing | strict-fail
  • UpdateServer, TSIGKeyName, TSIGSecret, TSIGAlgorithm for RFC 2136 updates
  • future: Backend: rfc2136 | local-dns-runtime
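As a rough illustration, the suggested fields could map to a Go struct like the one below. This is only a sketch: the type name DynamicDNSConfig, the oneOf helper, and the validation rules (for example, requiring at least one forward zone) are assumptions made for this example, not existing code in pkg/config/types.go.

```go
package main

import (
	"errors"
	"fmt"
)

// DynamicDNSConfig is a hypothetical Go shape for the suggested fields.
type DynamicDNSConfig struct {
	Enabled        bool
	Domain         string
	ForwardZones   []string
	ReverseZones   []string
	TTLSeconds     int
	HostnameSource string // client-hostname | fqdn | client-id | mac-fallback
	ConflictPolicy string // replace-owned | skip-existing | strict-fail
	UpdateServer   string
	TSIGKeyName    string
	TSIGSecret     string
	TSIGAlgorithm  string
}

// oneOf reports whether v equals one of the allowed values.
func oneOf(v string, allowed ...string) bool {
	for _, a := range allowed {
		if v == a {
			return true
		}
	}
	return false
}

// Validate checks only an enabled block, so a disabled or absent block
// cannot change existing DHCP server behavior. The rules are assumptions.
func (c DynamicDNSConfig) Validate() error {
	if !c.Enabled {
		return nil
	}
	if len(c.ForwardZones) == 0 {
		return errors.New("dynamic-dns: at least one forward-zone is required")
	}
	if c.TTLSeconds <= 0 {
		return errors.New("dynamic-dns: ttl must be positive")
	}
	if !oneOf(c.HostnameSource, "client-hostname", "fqdn", "client-id", "mac-fallback") {
		return fmt.Errorf("dynamic-dns: unknown hostname-source %q", c.HostnameSource)
	}
	if !oneOf(c.ConflictPolicy, "replace-owned", "skip-existing", "strict-fail") {
		return fmt.Errorf("dynamic-dns: unknown conflict-policy %q", c.ConflictPolicy)
	}
	if c.UpdateServer == "" {
		return errors.New("dynamic-dns: update-server is required")
	}
	return nil
}

func main() {
	cfg := DynamicDNSConfig{
		Enabled:        true,
		Domain:         "lab.example.net",
		ForwardZones:   []string{"lab.example.net"},
		ReverseZones:   []string{"80.16.172.in-addr.arpa"},
		TTLSeconds:     300,
		HostnameSource: "client-hostname",
		ConflictPolicy: "replace-owned",
		UpdateServer:   "192.0.2.53",
	}
	fmt.Println("validation error:", cfg.Validate())
}
```

Returning nil for a disabled block keeps the feature strictly opt-in, which matches the acceptance criterion that existing DHCP behavior is unaffected when DDNS is off.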

Architecture plan

Phase 1: Lease identity and state store

Add a small DDNS state store at /var/lib/xpf/dhcp-ddns-state.json, or in SQLite if the project already has a preferred local database.

Each owned DNS record set stores:

  • lease family: v4/v6
  • lease identity: v4 client-id/hwaddr; v6 DUID/IAID when available
  • address
  • hostname/FQDN
  • forward zone and reverse zone
  • record types created (A, AAAA, PTR, ownership marker)
  • lease expiry time
  • pool/group/interface metadata
  • last successful update time and retry state

The state store is the protection boundary: cleanup only deletes records that match xpf-owned state. Do not delete arbitrary existing DNS records just because they match a DHCP hostname.
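A minimal sketch of the JSON variant of the state store, showing how the ownership check could gate deletes. All type, field, and function names here (OwnedRecord, StateStore, MayDelete, DecodeState) are hypothetical, and keying records by family plus address is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// OwnedRecord describes one DNS record set that xpf created for a lease.
type OwnedRecord struct {
	Family      string    `json:"family"` // "v4" or "v6"
	ClientID    string    `json:"client_id"`
	Address     string    `json:"address"`
	FQDN        string    `json:"fqdn"`
	ForwardZone string    `json:"forward_zone"`
	ReverseZone string    `json:"reverse_zone"`
	RecordTypes []string  `json:"record_types"`
	Expiry      time.Time `json:"expiry"`
	Pool        string    `json:"pool"`
}

// StateStore holds owned records keyed by "<family>/<address>".
type StateStore struct {
	Records map[string]OwnedRecord `json:"records"`
}

func stateKey(family, address string) string { return family + "/" + address }

// Put records that xpf now owns the DNS records for r.
func (s *StateStore) Put(r OwnedRecord) {
	if s.Records == nil {
		s.Records = map[string]OwnedRecord{}
	}
	s.Records[stateKey(r.Family, r.Address)] = r
}

// MayDelete reports whether a delete for this address and name matches
// owned state. Anything else is treated as a foreign record and left alone.
func (s *StateStore) MayDelete(family, address, fqdn string) bool {
	r, ok := s.Records[stateKey(family, address)]
	return ok && r.FQDN == fqdn
}

// Encode serializes the store for writing to disk.
func (s *StateStore) Encode() ([]byte, error) {
	return json.MarshalIndent(s, "", "  ")
}

// DecodeState loads a store from previously encoded bytes.
func DecodeState(data []byte) (*StateStore, error) {
	s := &StateStore{}
	if err := json.Unmarshal(data, s); err != nil {
		return nil, err
	}
	return s, nil
}

func main() {
	s := &StateStore{}
	s.Put(OwnedRecord{
		Family:      "v4",
		ClientID:    "01:aa:bb:cc:dd:ee:ff",
		Address:     "172.16.80.25",
		FQDN:        "laptop.lab.example.net",
		RecordTypes: []string{"A", "PTR"},
		Expiry:      time.Now().Add(time.Hour),
	})
	data, _ := s.Encode()
	fmt.Println(string(data))
	fmt.Println(s.MayDelete("v4", "172.16.80.25", "printer.lab.example.net"))
}
```

A real implementation would also need an atomic write (temporary file plus rename) so a crash cannot leave a truncated state file.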

Phase 2: Hostname normalization

Implement deterministic name selection:

  1. Prefer DHCP FQDN option when present and allowed.
  2. Else use DHCP host-name option from Kea lease data.
  3. Else fallback to a deterministic generated name, e.g. dhcp-<sanitized-client-id> or dhcp-<mac-without-colons>.
  4. Append the configured domain/pool domain when the name is not fully qualified.
  5. Sanitize to DNS label rules: lower-case, [a-z0-9-], label length <= 63, full name <= 253 characters (255 octets in wire format), no leading/trailing dash.
  6. Reject or fallback on empty/invalid names.
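The selection and sanitizing steps above can be sketched as follows. The function names (sanitizeLabel, normalizeName, leaseFQDN) are hypothetical, and two choices are assumptions rather than decisions from this issue: a name containing a dot is treated as already qualified, and characters outside [a-z0-9-] are replaced with a dash rather than dropped.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeLabel lower-cases s, replaces every character outside [a-z0-9-]
// with '-', trims leading/trailing dashes, and truncates to 63 bytes.
func sanitizeLabel(s string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(s) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') || r == '-' {
			b.WriteRune(r)
		} else {
			b.WriteByte('-')
		}
	}
	out := strings.Trim(b.String(), "-")
	if len(out) > 63 {
		out = strings.TrimRight(out[:63], "-")
	}
	return out
}

// normalizeName appends domain to unqualified names and sanitizes each
// label. It returns false for empty or invalid names.
func normalizeName(raw, domain string) (string, bool) {
	raw = strings.TrimSuffix(strings.TrimSpace(raw), ".")
	if raw == "" {
		return "", false
	}
	if !strings.Contains(raw, ".") && domain != "" {
		raw += "." + strings.TrimSuffix(domain, ".")
	}
	labels := strings.Split(raw, ".")
	for i, l := range labels {
		labels[i] = sanitizeLabel(l)
		if labels[i] == "" {
			return "", false
		}
	}
	name := strings.Join(labels, ".")
	if len(name) > 253 { // 253 text characters fit the 255-octet wire limit
		return "", false
	}
	return name, true
}

// leaseFQDN tries the FQDN option, then the host-name option, then a
// generated dhcp-<mac-without-colons> fallback.
func leaseFQDN(fqdnOpt, hostOpt, mac, domain string) (string, bool) {
	candidates := []string{fqdnOpt, hostOpt}
	if mac != "" {
		candidates = append(candidates, "dhcp-"+strings.ReplaceAll(mac, ":", ""))
	}
	for _, c := range candidates {
		if name, ok := normalizeName(c, domain); ok {
			return name, true
		}
	}
	return "", false
}

func main() {
	fmt.Println(leaseFQDN("", "My_Laptop", "aa:bb:cc:dd:ee:ff", "lab.example.net"))
	fmt.Println(leaseFQDN("", "", "aa:bb:cc:dd:ee:ff", "lab.example.net"))
}
```

Whether the generated fallback runs by default is still an open question in this issue; the sketch enables it whenever a MAC address is supplied.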

Phase 3: DNS update backend

Create a small interface, for example:

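// DNSUpdater publishes and removes the DNS records owned by one DHCP lease.
type DNSUpdater interface {
    // UpsertLease creates or refreshes the forward and reverse records for rec.
    UpsertLease(ctx context.Context, rec LeaseDNSRecord) error
    // DeleteLease removes only the records previously created for rec.
    DeleteLease(ctx context.Context, rec LeaseDNSRecord) error
}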

First backend should be RFC 2136 dynamic update with TSIG using miekg/dns or an equivalent library already acceptable in the repo.

Records:

  • Add/update hostname TTL A address for IPv4 leases.
  • Add/update hostname TTL AAAA address for IPv6 leases.
  • Add/update reverse PTR address -> hostname when a reverse zone is configured or derivable.
  • Add an ownership marker if supported, e.g. TXT alongside the name: xpf-dhcp-ddns=<stable-owner-id>. If we choose not to add TXT, rely on the local state DB and strict matching before delete.
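Deriving the PTR owner name is the one part of these records that needs real computation, so here is a stdlib-only sketch of it. The helper names reverseName and inZone are hypothetical. The sketch only builds names and checks zone membership; sending the update through an RFC 2136 client is out of scope here.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// reverseName returns the fully qualified PTR owner name for ip:
// in-addr.arpa for IPv4 and nibble-reversed ip6.arpa for IPv6.
func reverseName(ip string) (string, error) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return "", err
	}
	addr = addr.Unmap() // treat ::ffff:a.b.c.d as plain IPv4
	if addr.Is4() {
		b := addr.As4()
		return fmt.Sprintf("%d.%d.%d.%d.in-addr.arpa.", b[3], b[2], b[1], b[0]), nil
	}
	const hexDigits = "0123456789abcdef"
	b := addr.As16()
	var sb strings.Builder
	for i := len(b) - 1; i >= 0; i-- {
		// Low nibble first, then high nibble, walking the address backwards.
		sb.WriteByte(hexDigits[b[i]&0x0f])
		sb.WriteByte('.')
		sb.WriteByte(hexDigits[b[i]>>4])
		sb.WriteByte('.')
	}
	sb.WriteString("ip6.arpa.")
	return sb.String(), nil
}

// inZone reports whether name is the zone apex or falls below it.
func inZone(name, zone string) bool {
	name = strings.ToLower(strings.TrimSuffix(name, "."))
	zone = strings.ToLower(strings.TrimSuffix(zone, "."))
	return name == zone || strings.HasSuffix(name, "."+zone)
}

func main() {
	ptr, err := reverseName("172.16.80.25")
	if err != nil {
		panic(err)
	}
	fmt.Println(ptr, inZone(ptr, "80.16.172.in-addr.arpa"))

	ptr6, _ := reverseName("2001:db8::1")
	fmt.Println(ptr6)
}
```

Checking inZone before emitting a PTR update keeps xpf from sending reverse updates for addresses outside the configured reverse zones.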

Future backend can target the xpf-managed DNS runtime from docs/next-features/dns-proxy.md once that exists.

Phase 4: Lease watcher and reconciler

Do not rely on a single event path. Use both:

  1. Near-event trigger: watch Kea lease CSV files with fsnotify or another file-change mechanism.
  2. Periodic reconciliation: rescan every N seconds and on daemon startup.

Reconciler algorithm:

  1. Read current Kea v4/v6 lease files.
  2. Parse expiry times and ignore expired/inactive rows.
  3. Build desired DNS state from active leases and current DHCP config.
  4. Compare desired state with previous xpf-owned state.
  5. For new lease: add forward/reverse records.
  6. For renewed same lease: refresh TTL/expiry metadata; update DNS only if name/address changed.
  7. For same client moved to new address: delete old A/AAAA/PTR, then add new records.
  8. For same address assigned to a new client: delete old xpf-owned records before adding new records.
  9. For expired/missing lease: delete owned records.
  10. Retry failed updates with bounded backoff and expose counters.
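The comparison in steps 4 through 9 can be sketched as a pure function that turns owned and desired state into an ordered action list. The Lease and Action types and the plan function are hypothetical names for this example. Keying both maps by address and emitting all deletes before any upserts are assumptions chosen so that a reassigned address is always cleaned before it is republished.

```go
package main

import (
	"fmt"
	"sort"
)

// Lease is the minimal identity the diff needs.
type Lease struct {
	ClientID string
	Address  string
	FQDN     string
}

// Action is one DNS operation. Op is "delete" or "upsert".
type Action struct {
	Op    string
	Lease Lease
}

// plan compares owned state with desired state, both keyed by address.
// It returns deletes first, then upserts, each sorted by address.
func plan(owned, desired map[string]Lease) []Action {
	var dels, ups []Action
	for addr, old := range owned {
		want, ok := desired[addr]
		// Expired/missing lease, new client on the address, or renamed lease.
		if !ok || want.ClientID != old.ClientID || want.FQDN != old.FQDN {
			dels = append(dels, Action{Op: "delete", Lease: old})
		}
	}
	for addr, want := range desired {
		old, ok := owned[addr]
		// A renewed, unchanged lease produces no DNS traffic.
		if !ok || old != want {
			ups = append(ups, Action{Op: "upsert", Lease: want})
		}
	}
	byAddr := func(a []Action) {
		sort.Slice(a, func(i, j int) bool { return a[i].Lease.Address < a[j].Lease.Address })
	}
	byAddr(dels)
	byAddr(ups)
	return append(dels, ups...)
}

func main() {
	owned := map[string]Lease{
		"10.0.0.5": {ClientID: "c1", Address: "10.0.0.5", FQDN: "a.lab.example.net"},
		"10.0.0.6": {ClientID: "c2", Address: "10.0.0.6", FQDN: "b.lab.example.net"},
		"10.0.0.7": {ClientID: "c3", Address: "10.0.0.7", FQDN: "c.lab.example.net"},
	}
	desired := map[string]Lease{
		"10.0.0.5": {ClientID: "c1", Address: "10.0.0.5", FQDN: "a.lab.example.net"}, // renewed
		"10.0.0.6": {ClientID: "c9", Address: "10.0.0.6", FQDN: "z.lab.example.net"}, // reassigned
		"10.0.0.8": {ClientID: "c3", Address: "10.0.0.8", FQDN: "c.lab.example.net"}, // client moved
	}
	for _, a := range plan(owned, desired) {
		fmt.Println(a.Op, a.Lease.Address, a.Lease.FQDN)
	}
}
```

Because the function has no I/O, it is straightforward to cover with the unit tests listed later for moved clients and reassigned addresses.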

Important: current parseLeaseCSV returns rows from Kea CSV without strong active/expired filtering. The DDNS implementation needs a lease parser that understands Kea expiry/state semantics, not just display parsing.

Phase 5: HA behavior

In chassis cluster mode, DHCP service is only applied on MASTER paths (daemon_ha.go starts/stops Kea with RETH ownership). DDNS must follow the same ownership:

  • Only the node actively serving DHCP for an RG performs DNS updates for that RG.
  • On MASTER transition, run immediate reconciliation.
  • On BACKUP transition, stop update emission but do not delete valid records just because the local node stopped serving; deletion should be lease-state driven or explicit config removal driven.
  • State store should survive restart and failover. If both nodes can become MASTER over time, use deterministic owner IDs so cleanup remains safe.
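One way to picture the ownership rules above is a small per-RG gate that the updater consults before emitting anything. This is purely illustrative: haGate, SetMaster, Emit, and errNotMaster are hypothetical names, and how the gate would be wired to daemon_ha.go is not covered.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

var errNotMaster = errors.New("ddns: node is not MASTER for this RG")

// haGate tracks whether this node currently serves DHCP for one RG.
type haGate struct {
	master    atomic.Bool
	reconcile func() // invoked on every BACKUP -> MASTER transition
}

// SetMaster records the new role. Becoming MASTER triggers an immediate
// reconciliation; becoming BACKUP only stops emission and deletes nothing.
func (g *haGate) SetMaster(isMaster bool) {
	was := g.master.Swap(isMaster)
	if isMaster && !was && g.reconcile != nil {
		g.reconcile()
	}
}

// Emit runs update only while this node is MASTER.
func (g *haGate) Emit(update func() error) error {
	if !g.master.Load() {
		return errNotMaster
	}
	return update()
}

func main() {
	g := &haGate{reconcile: func() { fmt.Println("reconcile now") }}
	fmt.Println(g.Emit(func() error { return nil })) // still BACKUP
	g.SetMaster(true)
	fmt.Println(g.Emit(func() error { return nil }))
	g.SetMaster(false)
}
```

Keeping deletion out of the BACKUP transition leaves record cleanup entirely to lease state and config removal, as the third bullet requires.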

Observability

Add status/counters:

  • dhcp_ddns_updates_total{family,type,result}
  • dhcp_ddns_deletes_total{family,type,result}
  • dhcp_ddns_reconcile_runs_total{result}
  • dhcp_ddns_pending_retries
  • dhcp_ddns_last_success_timestamp_seconds
  • dhcp_ddns_last_error

CLI/operational output:

  • show system services dhcp-server dynamic-dns
  • include per-lease DNS name in show dhcp-server detail if available
  • logs for update/delete failures with pool/group/interface/address/hostname

Failure policy

Default fail-open for DHCP:

  • DHCP lease grant must not block on DNS update.
  • DNS update failures are retried and surfaced in status.
  • Optional future strict mode can fail/withhold leases if DNS update fails, but do not implement strict mode first.
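For the retry part of this policy, a bounded exponential backoff might look like the sketch below. The function name retryDelay and the 1 second base and 5 minute cap are placeholder assumptions, not values proposed by this issue.

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseDelay = time.Second     // assumed first retry delay
	maxDelay  = 5 * time.Minute // assumed upper bound
)

// retryDelay returns the wait before retry number attempt (0-based).
// The delay doubles per attempt and never exceeds maxDelay.
func retryDelay(attempt int) time.Duration {
	if attempt < 0 {
		attempt = 0
	}
	d := baseDelay
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	return d
}

func main() {
	for attempt := 0; attempt <= 10; attempt++ {
		fmt.Println(attempt, retryDelay(attempt))
	}
}
```

The retry loop would run outside the lease-grant path, so a DNS server that stays down only grows the pending-retry count and never delays DHCP.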

Tests

Unit tests:

  • hostname normalization and FQDN/domain handling
  • v4 A + reverse PTR update payload generation
  • v6 AAAA + reverse PTR update payload generation
  • delete only records present in xpf-owned state
  • same client moving addresses deletes old records and adds new records
  • address reassigned to different client cleans old owner before adding new owner
  • expired lease rows are removed from desired state
  • config removal deletes owned records for that pool/group
  • retry/backoff does not block reconciliation forever

Integration-style tests with fake DNS updater:

  • synthetic Kea CSV lease file change produces expected upsert
  • lease expiration/reconciliation produces expected delete
  • daemon restart with existing state and active leases does not duplicate updates
  • HA MASTER/BACKUP transition gates update emission

Lab validation:

  • Configure DHCPv4 pool with domain, hand lease to client with hostname, verify A and reverse PTR resolve.
  • Renew same lease, verify record remains stable.
  • Force lease expiry/reclaim, verify A/PTR removed.
  • Reassign the address to another MAC/hostname, verify old name removed and new name added.
  • Repeat for DHCPv6 AAAA/ip6.arpa where supported.

Acceptance criteria

  • Opt-in config compiles and renders without affecting existing DHCP server behavior when disabled.
  • Active DHCP leases produce forward and reverse DNS records through the configured backend.
  • Stale records are removed on expiry, release/reclaim, reassignment, and config removal.
  • xpf never deletes non-owned DNS records.
  • DHCP service remains available when DNS update backend is down.
  • Status/counters make update failures and pending retries visible.
  • HA mode emits updates only from the active DHCP-serving node.

Open questions

  1. Should the first DNS update backend (Phase 3) use direct RFC 2136 updates from xpfd, Kea's kea-dhcp-ddns (D2) daemon, or both? Direct xpfd updates make ownership-state cleanup and testing easier; Kea D2 is closer to native Kea behavior.
  2. Which DNS runtime should be the first local authoritative/update target if/when docs/next-features/dns-proxy.md lands?
  3. Should generated fallback names be enabled by default, or should leases without hostnames be skipped unless configured?
  4. How much Junos syntax compatibility do we want for DDNS knobs versus an xpf-native config subtree?
