Decision
Retire the eBPF dataplane in favor of the Rust userspace AF_XDP dataplane. The userspace path has had feature parity for months, is the canonical smoke target (loss:xpf-userspace-fw0/fw1), has the active perf/HA development surface (PR #1187 / #1188 / #1206 / #1330 / #1332 / #1341 / #1358 / #1366 all landed against it), and is what every new feature gets built on. The eBPF path is the legacy / regression-only path per CLAUDE.md.
Two dataplanes is twice the maintenance, twice the test surface, twice the documentation. Time to consolidate.
Scope summary
Concrete surface to remove or retire:
| Surface | Count | Path |
| --- | --- | --- |
| BPF C source (XDP + TC programs + headers) | 20 files | bpf/ |
| Go eBPF loader + bpf2go bindings + tests | 65 files | pkg/dataplane/*.go (excluding pkg/dataplane/userspace/) |
| make generate (bpf2go invocation) + related clean rules | 1 target | Makefile |
| Legacy local cluster Make targets (cluster-init/-create/-deploy/-destroy/-ssh/-status/-logs/-start/-stop/-restart) | ~10 targets | Makefile (these drive bpfrx-fw0/1 legacy eBPF cluster) |
| Legacy standalone eBPF reference VM (bpfrx-fw) | 1 VM | test/incus/setup.sh |
| Tests under test/incus/per-stream-diag.sh + cluster-setup.sh that reference bpfrx* cluster names | 3 files | test/incus/ |
| Docs that reference eBPF / BPF / XDP / bpf2go / BPF verifier rules | 75+ docs | docs/, CLAUDE.md, README* |
| show system buffers CLI surface (currently reports BPF map utilization) | 1 surface | pkg/cli/ |
| Userspace-dp Rust code unchanged | 181 files | userspace-dp/src/ |
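The counts in the table drift as the tree changes, so it may help to re-derive them before each phase. The following is a minimal sketch, assuming the pkg/dataplane/*.go row is meant to cover the whole subtree minus userspace/ (the table does not say whether that count is recursive). `count_files` is a hypothetical helper, not something that exists in the repo.

```python
from pathlib import Path


def count_files(root, pattern, exclude=()):
    """Count files under root matching pattern, skipping excluded subtrees.

    `exclude` holds directory names relative to root (hypothetical parameter).
    """
    root = Path(root)
    excluded = [root / e for e in exclude]
    total = 0
    for path in root.rglob(pattern):
        if not path.is_file():
            continue
        # Skip anything that lives inside an excluded directory.
        if any(ex in path.parents for ex in excluded):
            continue
        total += 1
    return total
```

Run from the repo root, `count_files("pkg/dataplane", "*.go", exclude=("userspace",))` would give a number to compare against the 65-file row.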
Feature-parity audit (Phase 0 — gate)
Before removing anything, refresh docs/userspace-dataplane-gaps.md (dated 2026-03-14) against current master. Confirm every eBPF-dataplane feature has a userspace equivalent. Block retirement on any documented gap that production depends on.
Known-implemented in userspace-dp today per the gaps doc:
- Stateful forwarding, zone + global policies, application matching
- SNAT (interface), DNAT, static NAT, NAT64, NPTv6
- Firewall filters, flow export
- TCP MSS clamping, embedded ICMP NAT reversal
- Configurable session timeouts, VLAN, routes + neighbors
Verify also: screen/IDS, SYN cookie flood protection, IPsec, BGP/OSPF integration with FRR, all 7+ CoS classes, RPM probes, NetFlow v9, dynamic feeds, SNMP, DHCP relay/server, syslog, HA chassis cluster state machine + VRRP + sync.
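The gate rule above (block retirement on any gap that production depends on) can be written down as a small check. This is only a sketch: the `Feature` record and its two flags are hypothetical, and the entries are placeholders rather than audit results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    in_userspace: bool          # verified present in userspace-dp
    production_dependent: bool  # production relies on this feature


def retirement_blockers(features):
    """Return names of features that should block retirement."""
    return [
        f.name
        for f in features
        if not f.in_userspace and f.production_dependent
    ]


# Placeholder entries for illustration only, not real audit data.
audit = [
    Feature("feature-a", in_userspace=True, production_dependent=True),
    Feature("feature-b", in_userspace=False, production_dependent=False),
    Feature("feature-c", in_userspace=False, production_dependent=True),
]
blockers = retirement_blockers(audit)
```

Here only feature-c would block: feature-b is a gap, but nothing in production depends on it.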
Phases
Phased to minimize blast radius. Each phase ships as a separate PR with triple-review and a smoke run against the canonical loss:xpf-userspace-fw0/1 cluster.
Phase 0 — feature-parity audit + deprecation announcement
Phase 1 — docs migration (the "ALLLLLLLL documentation" pass)
Phase 2 — test environment consolidation
Phase 3 — build system + Go code removal
Phase 4 — BPF source removal
Phase 5 — CLI + observability cleanup
Phase 6 — branding cleanup (optional, can be deferred)
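Phase 4 is only done when no active code path still references bpf/, with historical PR plan-docs exempt. The following is a rough Python equivalent of that grep check. The docs/pr/ allowlist and the .git skip are assumptions made for this sketch, and `find_bpf_refs` is a hypothetical name.

```python
import os


def find_bpf_refs(root, needle="bpf/", allowed_prefixes=("docs/pr/",),
                  skip_dirs=(".git",)):
    """Return sorted (relative_path, line_number) pairs that mention needle.

    Matching is case-insensitive, like `grep -ri`. Files under
    allowed_prefixes (assumed: historical PR plan-docs) are ignored.
    """
    needle = needle.lower()
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories we never want to descend into.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            if rel.startswith(tuple(allowed_prefixes)):
                continue
            try:
                with open(full, "r", encoding="utf-8", errors="ignore") as fh:
                    for lineno, line in enumerate(fh, 1):
                        if needle in line.lower():
                            hits.append((rel, lineno))
            except OSError:
                continue  # unreadable file: skip rather than fail the scan
    return sorted(hits)
```

An empty result from `find_bpf_refs(".")` would correspond to the grep returning nothing in active code paths.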
Documentation update list (Phase 1 detailed)
The 75+ docs identified in the survey, grouped by category:
Top-level / canonical:
CLAUDE.md, docs/engineering-style.md, docs/testing.md, docs/development-workflow.md, docs/test_env.md
Architecture / pipeline (most BPF-heavy):
docs/userspace-dataplane-architecture.md, docs/afxdp-packet-processing.md, docs/afxdp-module-split.md, docs/shared-umem-plan.md, docs/xdp-io-uring-userspace-dataplane.md, docs/userspace-xdp-pass-bootstrap-and-ipv6.md, docs/userspace-fabric-redirect-fix.md, docs/userspace-master-merge-20260310.md
Feature/decision docs:
docs/dataplane-decision-dpdk-vs-vpp.md, docs/vpp-dataplane-assessment.md, docs/userspace-dnat-plan.md, docs/userspace-icmp-te-debugging.md, docs/syn-cookie-flood-protection.md, docs/embedded-radvd.md, docs/services-application-identification.md, docs/fabric-cross-chassis-fwd.md, docs/userspace-native-gre-plan.md, docs/sync-protocol.md, docs/fabric-performance-optimizations.md
HA / failover:
docs/ha-cluster-test-plan.md, docs/userspace-ha-validation.md, docs/bug-heartbeat-vrf-rebind-split-brain.md, docs/session-sync-architecture.md
Backlogs + audits:
docs/feature-gaps.md, docs/userspace-dataplane-gaps.md, docs/perf-ranked-backlog.md, docs/authoritative-backlog.md, docs/userspace-performance-plan.md, docs/refactoring-audit.md, docs/bugs.md, docs/phases.md
Fairness / CoS validation (recent active):
docs/fairness-regimes.md
Historical PR plans (deprecation banner, don't rewrite):
docs/pr/881-cpu-windows/plan.md, docs/pr/814-max-interfaces/plan.md + reviews
Risk + rollback
Risks:
- Hidden feature regression: a corner-case eBPF behavior we didn't audit (e.g. a specific NAT64 edge case, or a specific BPF screen check) doesn't have a userspace equivalent. Mitigation: Phase 0 audit + smoke matrix per phase.
- External customers: anyone running production on the eBPF dataplane gets bricked. Mitigation: deprecation announcement in Phase 0 with timeline; major version bump on Phase 3+ merge.
- Performance regression for specific workloads: eBPF native XDP on PF passthrough has measurable advantages over userspace AF_XDP for raw packet rate in some scenarios. Mitigation: document the tradeoff; perf-test gate per phase using fairness-eval + iperf3 12-stream baseline.
- HA failover: if userspace-dp has any subtle HA gap vs eBPF (RETH virtual MAC, fabric forwarding, sync protocol), retirement exposes it. Mitigation: make test-failover + make test-ha-crash + make test-restart-connectivity smoke per phase merge.
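The perf-test gate in the risks above could be automated against iperf3's JSON output. This is a sketch under stated assumptions: that the runs use `iperf3 --json`, that the TCP summary exposes `end.sum_sent.retransmits` and `end.sum_received.bits_per_second`, and that "line rate" means a configurable fraction of the nominal link speed. The 0.95 default is a placeholder, not a project threshold.

```python
import json


def passes_perf_gate(report, line_rate_bps, min_fraction=0.95):
    """Check one iperf3 JSON report against a throughput/retransmit gate.

    report: parsed iperf3 --json output (dict).
    line_rate_bps: nominal link speed in bits per second.
    min_fraction: hypothetical tolerance for "line rate".
    """
    end = report["end"]
    # A missing retransmit counter counts as a failure, not a pass.
    retrans = end["sum_sent"].get("retransmits")
    achieved = end["sum_received"]["bits_per_second"]
    return retrans == 0 and achieved >= min_fraction * line_rate_bps


# Synthetic report for illustration; not a measured result.
sample = json.loads(
    '{"end": {"sum_sent": {"retransmits": 0},'
    ' "sum_received": {"bits_per_second": 9.6e9}}}'
)
ok = passes_perf_gate(sample, line_rate_bps=10e9)
```

Treating a missing retransmit field as a failure keeps the gate from passing silently when a run is not TCP or the output format differs from what is assumed here.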
Rollback:
Each phase is a separate PR. Revert is one git revert <phase-PR> away. Phase 4 (BPF source removal) is the point-of-no-return; before that PR merges, verify all prior phases have been stable for at least N days on canonical cluster.
Out of scope
- DPDK (dpdk_worker/, pkg/dataplane/dpdk/): separate experimental path; retirement is independent. Keep DPDK for now.
- cmd/xpfd daemon binary name: stays xpfd.
Smoke gate per phase merge
Required before each phase PR merges:
- Pass A (CoS disabled): v4+v6 × push+reverse × multi-stream -P 12 -R: line rate, 0 retrans
- Pass B (CoS enabled): per-class 5200-5211 + echo 6200-6211: all 24 measurements pass with 0 retrans on unshaped classes, shaped classes hit configured rate cleanly
- make test-failover: 0 packet loss across reboot
- make test-ha-crash: daemon-stop + force-stop + multi-cycle recovery
- make test-restart-connectivity: 0 packet loss during daemon restart
Phase details
Phase 0 — feature-parity audit + deprecation announcement
- docs/userspace-dataplane-gaps.md against current master
Phase 1 — docs migration (the "ALLLLLLLL documentation" pass)
- CLAUDE.md (project root + ~/.claude/CLAUDE.md if present): strip BPF Pipeline section, BPF verifier rules, BPF map serialization rules, byte-order rules, XDP/SR-IOV gotchas, kernel-version constraints
- docs/engineering-style.md: drop BPF-specific rules
- README*: point all examples at userspace-dp
- docs/testing.md, docs/development-workflow.md, docs/test_env.md: userspace-dp only
- docs/ha-cluster-test-plan.md: drop legacy cluster references
- docs/feature-gaps.md: single dataplane column
- docs/*-plan.md PR plan-docs: leave as historical record (don't rewrite history); add deprecation banner at top
- test/incus/cluster-setup.sh README/comments: userspace-dp only
Phase 2 — test environment consolidation
- make cluster-* targets (bpfrx-fw0/1)
- make test-vm (standalone eBPF reference)
- test/incus/cluster-setup.sh to drop bpfrx env support
- loss-cluster-* Make targets to cluster-* (rename to be the canonical, not the alternative)
- test/incus/per-stream-diag.sh to userspace-dp only
- make test-failover + make test-ha-crash + make test-restart-connectivity point at userspace-dp cluster
Phase 3 — build system + Go code removal
- make generate target (no more bpf2go)
- BPF_CFLAGS and BPF object cleanup rules
- pkg/dataplane/*_bpfel.go + *_bpfeb.go (bpf2go-generated)
- pkg/dataplane/loader_ebpf.go + loader_stub.go
- pkg/dataplane/*.go: keep what's shared with userspace (e.g. cpumask.go, types.go, persistent_nat.go if userspace-dp consumes them); remove what's eBPF-only
- daemon lifecycle to not invoke the eBPF loader
- go build ./... clean; go test ./... passes
Phase 4 — BPF source removal
- rm -rf bpf/
- grep -ri "bpf/" . returns nothing in active code paths (historical PR plan-docs OK)
Phase 5 — CLI + observability cleanup
- show system buffers: replace BPF-map-utilization output with userspace-dp equivalent (worker queue depths, UMEM frame counts, etc.) OR remove if redundant with show system userspace
- xpf_bpf_* metrics if any; ensure xpf_userspace_dp_* cover the same operator needs
- bpf-prefixed commands
Phase 6 — branding cleanup (optional, can be deferred)
- bpfrx-* references everywhere to xpf-* (already the project name)
- bpfrxd binary references: if not the current binary name, drop them
Refs
- CLAUDE.md describes the dual pipeline and notes "eBPF cluster (bpfrx-fw0/1) is regression-only"
- docs/userspace-dataplane-gaps.md: current feature parity baseline (last updated 2026-03-14)
- docs/feature-gaps.md: vSRX comparison; should be unaffected by dataplane choice