-
Timestamp: 2026-05-17T15:28:13Z
- Action: PR #1397 follow-up — fixed mouse-latency diagnostics review findings by making
cwnd_settle_oktri-state in manifests (unknown/true/false), correcting cwnd byte-unit parsing to 1024-basedK/M/G/TBytes, recording probe phase timings even on failed/timed-out connect/drain/read attempts, tightening fairness-regimes settle-evidence wording, and extending unit coverage for settle-diagnostics CLI output/status and failure-phase timing counts. - File(s):
test/incus/test-mouse-latency.sh,test/incus/mouse_latency_orchestrate.py,test/incus/mouse_latency_orchestrate_test.py,test/incus/mouse_latency_probe.py,test/incus/mouse_latency_probe_test.py,test/incus/test_mouse_latency_shell_test.py,docs/fairness-regimes.md,_Log.md - Validation:
cd test/incus && python3 -m unittest mouse_latency_orchestrate_test.py mouse_latency_aggregate_test.py test_mouse_latency_shell_test.py && python3 -m unittest mouse_latency_probe_test.py && bash -n test-mouse-latency.sh;python3 -m py_compile test/incus/mouse_latency_orchestrate.py test/incus/mouse_latency_probe.py test/incus/mouse_latency_aggregate.py;git diff --check
- Action: PR #1397 follow-up — fixed mouse-latency diagnostics review findings by making
-
Timestamp: 2026-05-17T15:29:22Z
- Action: Addressed parallel validation follow-up by making
CWND_SETTLE_OK="true"conditional on settle-diagnostics success (explicitelsebranch) and renaming settle-diagnostics test helper variables to descriptive names. - File(s):
test/incus/test-mouse-latency.sh,test/incus/mouse_latency_orchestrate_test.py,_Log.md - Validation:
cd test/incus && python3 -m unittest mouse_latency_orchestrate_test.py test_mouse_latency_shell_test.py;cd test/incus && python3 -m unittest mouse_latency_probe_test.py;bash -n test/incus/test-mouse-latency.sh
- Action: Addressed parallel validation follow-up by making
-
Timestamp: 2026-05-17T15:30:19Z
- Action: Addressed final review nits in orchestrate tests by renaming the settle-diagnostics args mock class for clarity and replacing inline cwnd-byte magic numbers with named KiB constants.
- File(s):
test/incus/mouse_latency_orchestrate_test.py,_Log.md - Validation:
cd test/incus && python3 -m unittest mouse_latency_orchestrate_test.py
-
Timestamp: 2026-05-17T15:31:13Z
- Action: Applied final Copilot review polish in orchestrate tests by making the mock args class name explicitly
Mock*and documenting cwnd KiB fixture constants. - File(s):
test/incus/mouse_latency_orchestrate_test.py,_Log.md - Validation:
cd test/incus && python3 -m unittest mouse_latency_orchestrate_test.py
- Action: Applied final Copilot review polish in orchestrate tests by making the mock args class name explicitly
-
Timestamp: 2026-05-17T08:30:20Z
- Action: PR #1394 round-10 follow-up — fixed standalone userspace event-stream callback wiring by always registering session/full-resync callbacks, and added a regression test that verifies standalone SessionOpen and FullResync frames are ACKed instead of stalling behind an unwired callback queue.
- File(s):
pkg/daemon/daemon_ha_userspace.go,pkg/daemon/userspace_sync_test.go,_Log.md - Validation:
gofmt -w pkg/daemon/daemon_ha_userspace.go pkg/daemon/userspace_sync_test.go;go test ./pkg/daemon ./pkg/dataplane/userspace ./pkg/logging;git diff --check
-
Timestamp: 2026-05-17T06:39:00Z
- Action: PR #1376 housekeeping — restored
go.modafter an unintended direct/indirect dependency classification flip from automation so this branch remains scoped to mirror snapshot fixes. - File(s):
go.mod,_Log.md
- Action: PR #1376 housekeeping — restored
-
Timestamp: 2026-05-17T06:45:00Z
- Action: PR #1376 re-review follow-up — renamed the mirror snapshot fail-closed wrapper to match behavior, rejected negative port-mirroring input rates before uint32 conversion to prevent wraparound, and updated mirror protocol/build tests accordingly.
- File(s):
pkg/dataplane/userspace/snapshot.go,pkg/dataplane/userspace/protocol_test.go,pkg/dataplane/userspace/manager_test.go,_Log.md
-
Timestamp: 2026-05-17T05:06:00Z
- Action: PR #1395 cleanup — reverted an unintended
go.moddirect/indirect dependency classification change introduced by automated tooling so the round-4 fix stays scoped to three-color policer compiler logic/tests/docs. - File(s):
go.mod,_Log.md
- Action: PR #1395 cleanup — reverted an unintended
-
Timestamp: 2026-05-17T05:12:00Z
- Action: Round-5 follow-up fix — in userspace pending-XSK-startup compile path, defer
lastSnapshotcache update until ingress/local/NAT map sync succeeds so sync failures cannot poison cached snapshot state with an unpublished generation. - File(s):
pkg/dataplane/userspace/manager.go,_Log.md
- Action: Round-5 follow-up fix — in userspace pending-XSK-startup compile path, defer
-
Timestamp: 2026-05-17T05:16:00Z
- Action: Restored
go.modafter an unintended direct/indirect dependency classification flip introduced by an automation-only progress update. - File(s):
go.mod,_Log.md
- Action: Restored
-
Timestamp: 2026-05-17T04:48:51Z
- Action: Re-restored
go.modafter a subsequent tooling pass reintroduced the same direct/indirect dependency classification flip. - File(s):
go.mod,_Log.md
- Action: Re-restored
-
Timestamp: 2026-05-17T05:03:00Z
- Action: PR #1395 round-4 follow-up cleanup — moved three-color mode marker assignment outside repeated same-mode child loops to avoid redundant writes while preserving duplicate-sibling merge semantics.
- File(s):
pkg/config/compiler_firewall.go,_Log.md
-
Timestamp: 2026-05-17T04:58:00Z
- Action: PR #1395 round-4 follow-up — fixed three-color policer compiler handling for duplicate same-mode sibling blocks by iterating all
single-rate/two-ratechildren, added hierarchical ambiguity regression coverage in parser/configstore tests, and updated filter module docs to reflect same-mode sibling merge semantics. - File(s):
pkg/config/compiler_firewall.go,pkg/config/parser_ast_test.go,pkg/configstore/store_test.go,userspace-dp/src/filter/README.md,_Log.md
- Action: PR #1395 round-4 follow-up — fixed three-color policer compiler handling for duplicate same-mode sibling blocks by iterating all
-
Timestamp: 2026-05-17T04:58:00Z
- Action: PR #1379 round-4 blocker fix follow-up — removed HA startup ordering race by starting cluster comms only after event fanout wiring, made userspace event-stream callback registration concurrency-safe for post-start wiring, and exposed helper producer sent/dropped counters through status text and Prometheus.
- File(s):
pkg/daemon/daemon_run.go,pkg/dataplane/userspace/eventstream.go,pkg/dataplane/userspace/eventstream_test.go,pkg/dataplane/userspace/protocol.go,pkg/dataplane/userspace/statusfmt.go,pkg/api/metrics.go,pkg/api/metrics_test.go,_Log.md - Validation:
gofmt -w pkg/daemon/daemon_run.go pkg/dataplane/userspace/eventstream.go pkg/dataplane/userspace/eventstream_test.go pkg/dataplane/userspace/protocol.go pkg/dataplane/userspace/statusfmt.go pkg/api/metrics.go pkg/api/metrics_test.go;go test ./pkg/dataplane/userspace ./pkg/logging ./pkg/api ./pkg/daemon;git diff --check
-
Timestamp: 2026-05-17T05:06:00Z
- Action: Addressed automated review nit on the new event-stream lifecycle regression test by replacing fixed sleep + wall-clock polling with timeout contexts and retry loops for listener readiness/connection checks.
- File(s):
pkg/dataplane/userspace/eventstream_test.go,_Log.md - Validation:
gofmt -w pkg/dataplane/userspace/eventstream_test.go;go test ./pkg/dataplane/userspace ./pkg/logging ./pkg/api ./pkg/daemon;git diff --check
-
Timestamp: 2026-05-17T05:13:00Z
- Action: Completed the same timeout-context synchronization pattern in
TestEventStreamDataplaneEventCallbackCanBeSetAfterStartso the new lifecycle regression test no longer relies on fixed sleeps or wall-clock deadline polling. - File(s):
pkg/dataplane/userspace/eventstream_test.go,_Log.md - Validation:
gofmt -w pkg/dataplane/userspace/eventstream_test.go;go test ./pkg/dataplane/userspace ./pkg/logging ./pkg/api ./pkg/daemon;git diff --check
- Action: Completed the same timeout-context synchronization pattern in
-
Timestamp: 2026-05-17T01:12:00Z
- Action: PR #1391 post-smoke follow-up — live q10(24G)+q0(best-effort) contention on
7e7eb07eshowed serviceable-only exact suppression still let q0 drain ~15.6 GB of surplus while exact was backlogged. Reworked the gate from binary serviceability to residual-rate budgeting: non-exact surplus can consume onlyroot_rate - backlogged_exact_guarantee_rates, shared exact queues publish queue masks so one queue's reservation is counted once across workers, and shared interfaces use an interface-global residual token bucket rather than per-worker residual buckets. - File(s):
userspace-dp/src/afxdp/cos/queue_service/mod.rs,userspace-dp/src/afxdp/cos/queue_service/tests.rs,userspace-dp/src/afxdp/cos/tx_completion.rs,userspace-dp/src/afxdp/types/shared_cos_lease.rs,userspace-dp/src/afxdp/types/cos.rs,userspace-dp/src/afxdp/cos/builders.rs,userspace-dp/src/afxdp/worker/cos_tests.rs,userspace-dp/src/afxdp/cos/README.md,userspace-dp/src/afxdp/types/README.md,_Log.md - Validation:
rustfmt --edition 2024on changed Rust files;cargo test --manifest-path userspace-dp/Cargo.toml build_nonexact -- --nocapture;cargo test --manifest-path userspace-dp/Cargo.toml exact_backlog -- --nocapture;cargo test --manifest-path userspace-dp/Cargo.toml apply_cos_send_result_debits_shared_residual_surplus_budget -- --nocapture;git diff --check
- Action: PR #1391 post-smoke follow-up — live q10(24G)+q0(best-effort) contention on
-
Timestamp: 2026-05-17T00:08:28Z
- Action: Issue #1390 — initial CoS best-effort strict-priority fix attempt; round-1 review later narrowed the invariant to surplus-only suppression so explicit non-exact guarantees remain Junos-compatible.
- File(s):
userspace-dp/src/afxdp/cos/queue_service/mod.rs,userspace-dp/src/afxdp/cos/queue_service/tests.rs,userspace-dp/src/afxdp/tx/cos_classify_tests.rs,userspace-dp/src/afxdp/worker/cos_tests.rs,userspace-dp/src/afxdp/cos/README.md,_Log.md - Validation:
cargo fmt --manifest-path userspace-dp/Cargo.toml;cargo test --manifest-path userspace-dp/Cargo.toml build_nonexact -- --nocapture;cargo test --manifest-path userspace-dp/Cargo.toml afxdp::cos::queue_service::tests:: -- --nocapture;git diff --check
-
Timestamp: 2026-05-17T00:35:00Z
- Action: PR #1391 round-1 review follow-up — narrowed exact-over-residual enforcement to the surplus phase so explicit non-exact guarantees keep Junos-compatible service; local and peer suppression now require a serviceable exact queue, peer serviceability uses acquire/release publication, and tests pin non-exact guarantee plus token-starved exact behavior.
- File(s):
userspace-dp/src/afxdp/cos/queue_service/mod.rs,userspace-dp/src/afxdp/cos/queue_service/tests.rs,userspace-dp/src/afxdp/cos/tx_completion.rs,userspace-dp/src/afxdp/types/shared_cos_lease.rs,userspace-dp/src/afxdp/cos/README.md,userspace-dp/src/afxdp/types/README.md,_Log.md - Validation:
cargo test --manifest-path userspace-dp/Cargo.toml build_nonexact -- --nocapture;cargo test --manifest-path userspace-dp/Cargo.toml afxdp::cos::queue_service::tests:: -- --nocapture;cargo test --manifest-path userspace-dp/Cargo.toml exact_backlog -- --nocapture;cargo test --manifest-path userspace-dp/Cargo.toml reset_binding_cos_runtime_clears_shared_exact_backlog_slot -- --nocapture;git diff --check
-
Timestamp: 2026-05-17T00:48:00Z
- Action: PR #1385 round-3 review follow-up — corrected README prose so pool-mode SNAT is no longer described as an explicit userspace capability gate after the pool-mode fixes; the remaining caveat is cross-backend
address-persistentparity under #1377. - File(s):
README.md,_Log.md - Validation:
git diff --check
- Action: PR #1385 round-3 review follow-up — corrected README prose so pool-mode SNAT is no longer described as an explicit userspace capability gate after the pool-mode fixes; the remaining caveat is cross-backend
-
Timestamp: 2026-05-17T01:03:00Z
- Action: PR #1382 round-3 rebase follow-up — aligned the Phase 0 audit with the now-merged #1385 pool-mode SNAT fixes, and kept #1377 as the remaining cross-backend
address-persistentparity blocker. - File(s):
docs/pr/1373-retire-ebpf-dataplane/plan.md,_Log.md - Validation:
git diff --check
- Action: PR #1382 round-3 rebase follow-up — aligned the Phase 0 audit with the now-merged #1385 pool-mode SNAT fixes, and kept #1377 as the remaining cross-backend
-
Timestamp: 2026-05-16T23:58:00Z
- Action: PR #1385 round-2 review follow-up — made Rust pool-mode SNAT skip matched rules whose pool has no address for the packet family so later compatible rules can apply, added wrong-family regression tests, and documented userspace/eBPF/DPDK address-persistent algorithm divergence until #1377 defines a shared contract.
- File(s):
userspace-dp/src/nat.rs,userspace-dp/src/nat_tests.rs,docs/userspace-dataplane-architecture.md,docs/userspace-dataplane-gaps.md,README.md,_Log.md - Validation:
cargo test --manifest-path userspace-dp/Cargo.toml pool_snat_wrong_family;go test ./pkg/dataplane/userspace;git diff --check
-
Timestamp: 2026-05-16T22:18:20Z
- Action: PR #1385 round-1 review fixes — make userspace source-NAT pool snapshots fail closed when the referenced pool is missing, empty, or has an invalid port range; add regression coverage for each unsafe pool-mode case; refresh the userspace dataplane gap date after the pool-mode capability update.
- File(s):
pkg/dataplane/userspace/snapshot.go,pkg/dataplane/userspace/manager_test.go,docs/userspace-dataplane-gaps.md,_Log.md - Validation:
go test ./pkg/dataplane/userspace;git diff --check
-
Timestamp: 2026-05-17T00:06:00Z
- Action: PR #1386 round-2 review follow-up — changed userspace buffer capacity fallback from all-or-nothing to per-row fallback so mixed helper snapshots with one populated
per_binding[]row and one sparse row do not undercount capacity. - File(s):
pkg/dataplane/userspace/buffersfmt.go,pkg/dataplane/userspace/buffersfmt_test.go,_Log.md - Validation:
gofmt -w pkg/dataplane/userspace/buffersfmt.go pkg/dataplane/userspace/buffersfmt_test.go;go test ./pkg/dataplane/userspace;git diff --check
- Action: PR #1386 round-2 review follow-up — changed userspace buffer capacity fallback from all-or-nothing to per-row fallback so mixed helper snapshots with one populated
-
Timestamp: 2026-05-16T22:19:53Z
- Action: PR #1386 round-1 review fixes — preserve the
Active sessionsfooter in userspaceshow system buffers, make non-detail output aggregate-only whilebuffers detailadds per-binding rows, and fall back tobindings[]whenper_binding[]lacks capacity gauges. - File(s):
pkg/dataplane/userspace/buffersfmt.go,pkg/dataplane/userspace/buffersfmt_test.go,pkg/cli/cli_show_system.go,pkg/grpcapi/server_show.go,pkg/grpcapi/server_show_system_buffers_test.go,docs/userspace-dataplane-architecture.md,_Log.md - Validation:
go test ./pkg/dataplane/userspace ./pkg/grpcapi ./pkg/cli ./pkg/fwdstatus;git diff --check
- Action: PR #1386 round-1 review fixes — preserve the
-
Timestamp: 2026-05-16T22:21:02Z
- Action: PR #1388 round-1 review fix — preserve old-status-JSON display compatibility for shaped interfaces whose userspace runtime synthesizes default best-effort queue 0, while keeping residual scheduler-map queues from inheriting false guarantees.
- File(s):
pkg/dataplane/userspace/cosfmt.go,pkg/dataplane/userspace/cosfmt_test.go,_Log.md - Validation:
go test ./pkg/dataplane/userspace;git diff --check
-
Timestamp: 2026-05-17T00:55:00Z
- Action: PR #1383 round-3 review follow-up — added the cluster sync stale-reconcile path to the interface split plan so bulk reconciliation uses the same
SessionStore/NAT-owned companion-delete semantics as GC instead of keeping a second BPF-shaped reverse-session/DNAT cleanup copy. - File(s):
docs/pr/1381-dataplane-interface-split/plan.md,_Log.md - Validation:
git diff --check
- Action: PR #1383 round-3 review follow-up — added the cluster sync stale-reconcile path to the interface split plan so bulk reconciliation uses the same
-
Timestamp: 2026-05-17T00:16:00Z
- Action: PR #1383 round-2 review follow-up — extended the target
ApplyResultcontract with firewall filter IDs/spans and NAT counter IDs needed by current show paths, and documented GC migration ownership for session-change counters, per-IP session-limit publish, persistent-NAT preservation, DNAT reverse cleanup, and backend-neutral stats. - File(s):
docs/pr/1381-dataplane-interface-split/plan.md,_Log.md - Validation:
git diff --check
- Action: PR #1383 round-2 review follow-up — extended the target
-
Timestamp: 2026-05-16T23:10:00Z
- Action: PR #1383 round-1 review follow-up — removed backend-specific userspace DTOs from the proposed public session-delta interface to avoid a
pkg/dataplane↔pkg/dataplane/userspaceimport cycle, added the missing caller inventory, and documented compatibility, risk, architectural-mismatch, import-canary, and Phase 1 acceptance gates. - File(s):
docs/pr/1381-dataplane-interface-split/plan.md,_Log.md
- Action: PR #1383 round-1 review follow-up — removed backend-specific userspace DTOs from the proposed public session-delta interface to avoid a
-
Timestamp: 2026-05-17T00:59:00Z
- Action: PR #1384 round-3 review follow-up — changed the blocker-plan bundle introduction to say plans land before their listed #1373 retirement phase, avoiding a Phase 4 blanket statement now that #1380 is a Phase 5 observability blocker.
- File(s):
docs/pr/1373-retire-ebpf-dataplane/README.md,_Log.md - Validation:
git diff --check
-
Timestamp: 2026-05-17T00:24:00Z
- Action: PR #1384 round-2 review follow-up — aligned #1380 with the Phase 5 observability gate, and made the #1377 plan explicitly cover address-persistent SNAT pool selection plus current userspace/eBPF/DPDK algorithm divergence and required compatibility fixtures.
- File(s):
docs/pr/1373-retire-ebpf-dataplane/README.md,docs/pr/1373-retire-ebpf-dataplane/plan-1377-snat-pools.md,_Log.md - Validation:
git diff --check
-
Timestamp: 2026-05-16T23:24:00Z
- Action: PR #1384 round-1 review follow-up — aligned the #1373 blocker-plan index with #1377/#1380 scope, made all required-before labels explicit, tightened SYN-cookie full-epoch/low-bit-wrap semantics, added risk sections and capability-admission tests to blocker plans, and added SNAT-pool and userspace-buffer parity plan docs.
- File(s):
docs/pr/1373-retire-ebpf-dataplane/README.md,docs/pr/1373-retire-ebpf-dataplane/plan-1374-syn-cookies.md,docs/pr/1373-retire-ebpf-dataplane/plan-1375-three-color-policers.md,docs/pr/1373-retire-ebpf-dataplane/plan-1376-port-mirroring.md,docs/pr/1373-retire-ebpf-dataplane/plan-1377-snat-pools.md,docs/pr/1373-retire-ebpf-dataplane/plan-1378-policy-schedulers.md,docs/pr/1373-retire-ebpf-dataplane/plan-1379-dataplane-events.md,docs/pr/1373-retire-ebpf-dataplane/plan-1380-userspace-buffers.md,_Log.md
-
Timestamp: 2026-05-16T23:39:00Z
- Action: PR #1382 round-1 review follow-up — reconciled Phase 0 retirement docs with #1385/#1386 fix-forward dependencies, removed stale userspace architecture limitations for features already implemented in Rust, changed userspace-dp wording to "being retired", and added Phase 0 exit criteria plus rollback path.
- File(s):
README.md,docs/userspace-dataplane-gaps.md,docs/userspace-dataplane-architecture.md,userspace-dp/README.md,docs/pr/1373-retire-ebpf-dataplane/plan.md,_Log.md - Validation:
git diff --check
-
Timestamp: 2026-05-16T04:36:00Z
- Action: PR #1320 round-3 review fixes — validate typed scheduler leaves against the apply-groups-expanded tree before compile; reject
transmit-rate exactunless the scheduler also has a typed rate; wireConfigClassOfServiceSchedulersinto the real config-modeset class-of-service schedulers <name>completion tree; update module docs to keep the byte-size-only scheduler buffer contract explicit. - File(s):
pkg/cmdtree/schema_validate.go,pkg/cmdtree/tree.go,pkg/cmdtree/tree_test.go,pkg/cmdtree/README.md,pkg/config/schema_validate_test.go,pkg/config/README.md,pkg/configstore/store.go,pkg/configstore/store_test.go,_Log.md - Validation:
go test -count=1 ./pkg/cmdtree/... ./pkg/config/... ./pkg/configstore/...;go test -count=1 ./pkg/...;git diff --check
- Action: PR #1320 round-3 review fixes — validate typed scheduler leaves against the apply-groups-expanded tree before compile; reject
-
Timestamp: 2026-05-16T00:14:30Z
- Action: Issue #1330 — split
userspace-dp/src/bin/fairness-eval.rsinto a thin CLI shell plususerspace-dp/src/fairness_eval/modules for args, inputs, windowing, per-worker aggregation, RSS expectation, verdict construction, and report emission without changing CLI behavior or fairness semantics. - File(s):
userspace-dp/src/bin/fairness-eval.rs,userspace-dp/src/fairness_eval/*,userspace-dp/src/bin/README.md,docs/pr/1330-fairness-eval-library/plan.md,_Log.md - Validation:
cargo test --manifest-path userspace-dp/Cargo.toml --bin fairness-eval;cargo test --manifest-path userspace-dp/Cargo.toml --test fairness_eval_blackbox;git diff --check
- Action: Issue #1330 — split
-
Timestamp: 2026-05-16T00:06:42Z
- Action: PR #1316 Copilot follow-up — narrow the validation TSV auditability claim to the fields present in the checked-in TSV schema, permit TSV summaries in the
docs/pr/evidence convention, and qualify the old dominant-failure heading as a pre-buffer-sizing snapshot. - File(s):
docs/pr/1316-lowrate-cos-buffers/validation.md,docs/pr/README.md,docs/cos-validation-notes.md,_Log.md
- Action: PR #1316 Copilot follow-up — narrow the validation TSV auditability claim to the fields present in the checked-in TSV schema, permit TSV summaries in the
-
Timestamp: 2026-05-15T23:13:00Z
- Action: PR #1316 round-4 review follow-up — filled the full seven-class validation table's high-rate Max CoV values from the committed TSV evidence, and documented that the historical 8-matrix
full-cos.setfile was later amended with q0/q4 buffer-size headroom. - File(s):
docs/pr/1316-lowrate-cos-buffers/validation.md,docs/pr/line-rate-investigation/8matrix-findings.md,_Log.md
- Action: PR #1316 round-4 review follow-up — filled the full seven-class validation table's high-rate Max CoV values from the committed TSV evidence, and documented that the historical 8-matrix
-
Timestamp: 2026-05-15T23:00:00Z
- Action: Restored
go.modto pre-PR state after an unintended direct/indirect dependency classification flip during automation-only progress updates. - File(s):
go.mod,_Log.md - Validation:
go test -count=1 ./pkg/dataplane/userspace;git diff --check
- Action: Restored
-
Timestamp: 2026-05-15T22:56:00Z
- Action: Reverted unintended
go.moddirect/indirect dependency reorder so round-1 fix remains scoped to CoS runtime lookup logic and tests. - File(s):
go.mod,_Log.md - Validation:
go test -count=1 ./pkg/dataplane/userspace;git diff --check
- Action: Reverted unintended
-
Timestamp: 2026-05-15T22:52:00Z
- Action: Round-1 follow-up cleanup — remove duplicate VLAN candidate append path in CoS runtime candidate generation while preserving VLAN-first ordering for unit-zero lookups.
- File(s):
pkg/dataplane/userspace/cosfmt.go,_Log.md - Validation:
go test -count=1 ./pkg/dataplane/userspace;git diff --check
-
Timestamp: 2026-05-15T22:45:00Z
- Action: Round-1 review hardening for issue #1278 — fix unit-zero candidate ordering so VLAN binding ifindex is preferred over parent binding when both exist, preventing wrong runtime CoS counters from being shown.
- File(s):
pkg/dataplane/userspace/cosfmt.go,pkg/dataplane/userspace/cosfmt_test.go,_Log.md - Validation:
go test -count=1 ./pkg/dataplane/userspace;git diff --check
-
Timestamp: 2026-05-15T22:15:00Z
- Action: Issue #1298 — add an idle debug-updater regression test proving wall-clock publishes age active-flow, flow-worker-map, and CoS active-flow snapshots to zero. Round-1 review narrowed the claim to the helper path exercised by the test and tied the aging loop to
ACTIVE_WINDOW_EPOCHSinstead of a hard-coded epoch count. - File(s):
userspace-dp/src/afxdp/flow_cache.rs,userspace-dp/src/afxdp/umem/tests.rs,_Log.md - Validation:
cargo test idle_debug_updater_ages_active_flow_snapshots;cargo test idle_debug_state_publish_cadence_is_wall_clock_based;git diff --check
- Action: Issue #1298 — add an idle debug-updater regression test proving wall-clock publishes age active-flow, flow-worker-map, and CoS active-flow snapshots to zero. Round-1 review narrowed the claim to the helper path exercised by the test and tied the aging loop to
-
Timestamp: 2026-05-15T22:05:00Z
- Action: Issue #1278 — make
show class-of-service interfacejoin configured reverse-egress CoS interfaces to live runtime by configured name first and binding egress ifindex second, so alias drift betweenge-0-0-1.0and the runtime snapshot no longer hides queue counters. - File(s):
pkg/dataplane/userspace/cosfmt.go,pkg/dataplane/userspace/cosfmt_test.go,docs/cos-validation-notes.md,_Log.md - Validation:
go test -count=1 ./pkg/dataplane/userspace;git diff --check
- Action: Issue #1278 — make
-
Timestamp: 2026-05-15T21:23:20Z
- Action: PR #1312 CoS TX-error attribution — round-3 fixes: (1) mirror reset-time CoS queue drains into
binding.live.dbg_cos_queue_overflowinreset_binding_cos_runtimeso the binding-scoped subset stays lifetime-matched withtx_errors; (2) add Rust regression testreset_binding_cos_runtime_mirrors_drops_to_binding_cos_counter; (3) updatedocs/cos-validation-notes.mdto state the binding-scoped subset includes admission rejects AND reset-time queue drains, and rephrase reason-counter lines as aggregate current-runtime sums (not per-queue rows); (4) extractsaturatingAddU64/saturatingSubU64intoformat_math.gowith doc comments; (5) rename ECN accumulator tocosAdmissionEcnMarked(Go-style Ecn casing). - File(s):
userspace-dp/src/afxdp/worker/cos.rs,userspace-dp/src/afxdp/worker/cos_tests.rs,pkg/dataplane/userspace/statusfmt.go,pkg/dataplane/userspace/statusfmt_test.go,pkg/dataplane/userspace/cosfmt.go,pkg/dataplane/userspace/format_math.go,docs/cos-validation-notes.md,_Log.md
- Action: PR #1312 CoS TX-error attribution — round-3 fixes: (1) mirror reset-time CoS queue drains into
-
Timestamp: 2026-05-15T21:42:00Z
- Action: PR #1315 Copilot follow-up — keep the historical
dbg_cos_queue_overflowwire key but relabel the CLI/docs as binding-lifetime CoS queue drops because the subset now includes reset-time CoS queue drains in addition to admission rejects. - File(s):
pkg/dataplane/userspace/statusfmt.go,pkg/dataplane/userspace/statusfmt_test.go,pkg/dataplane/userspace/protocol.go,userspace-dp/src/protocol.rs,docs/cos-validation-notes.md,_Log.md - Validation:
go test -count=1 ./pkg/dataplane/userspace;git diff --check
- Action: PR #1315 Copilot follow-up — keep the historical
-
Timestamp: 2026-05-15T20:33:36Z
- Action: PR #1316 / issue #1312 hostile-review follow-up — clarified low-rate CoS fixture docs with the actual implicit base/cap pipeline (
rate/100+ 96 KB floor, flow-share expansion, #717 delay clamp), documented the queue-residence tradeoff for q0/q4 overrides, added a regression test pin that 1 Gbps q4buffer-size 4mremains above delay-cap clamping, updated canonical fixture buffers, and committed durable validation evidence. - File(s):
test/incus/cos-iperf-config.set,test/incus/cos-iperf-symmetric.set,test/incus/cos-iperf-same-class.set,docs/cos-validation-notes.md,docs/fairness-regimes.md,docs/per-5-tuple/state.md,docs/pr/README.md,docs/pr/1316-lowrate-cos-buffers/validation.md,docs/pr/1316-lowrate-cos-buffers/evidence/focused-summary.tsv,docs/pr/1316-lowrate-cos-buffers/evidence/focused-dataplane-summary.tsv,docs/pr/1316-lowrate-cos-buffers/evidence/focused-equal-flow-summary.tsv,docs/pr/1316-lowrate-cos-buffers/evidence/full-summary.tsv,docs/pr/1316-lowrate-cos-buffers/evidence/full-dataplane-summary.tsv,docs/pr/1316-lowrate-cos-buffers/evidence/full-equal-flow-summary.tsv,docs/pr/line-rate-investigation/full-cos.set,userspace-dp/src/afxdp/cos/admission_tests.rs,_Log.md
- Action: PR #1316 / issue #1312 hostile-review follow-up — clarified low-rate CoS fixture docs with the actual implicit base/cap pipeline (
-
Timestamp: 2026-05-14T19:33:03Z
- Action: PR #1308 round-2 review follow-up — added explicit equal-flow scrape framing tests for nested begin and end-without-begin paths, added the missing
BindingCountersSnapshotround-trip pin fortx_shared_recycle_unknown_slot_drops, appliedgofmttoprotocol.go, and renamed sweep progress output fromwrapper_statustoexit_statuswith a named infrastructure-exit constant. - File(s):
test/incus/fairness_multi_sample_test.py,test/incus/fairness-cos-class-sweep.sh,pkg/dataplane/userspace/protocol.go,pkg/dataplane/userspace/protocol_test.go,_Log.md
- Action: PR #1308 round-2 review follow-up — added explicit equal-flow scrape framing tests for nested begin and end-without-begin paths, added the missing
-
Timestamp: 2026-05-14T19:19:32Z
- Action: PR #1308 round-1 review follow-up — made equal-flow capture reduction fail closed on SIGTERM-truncated marked scrapes and non-integer active-worker counts, and made sweep summary rows report infrastructure exit status
2when equal-flow capture fails after the wrapper succeeds. - File(s):
test/incus/fairness_equal_flow_capture.py,test/incus/fairness-cos-class-sweep.sh,test/incus/fairness_multi_sample_test.py,docs/fairness-regimes.md,docs/per-5-tuple/state.md,_Log.md
- Action: PR #1308 round-1 review follow-up — made equal-flow capture reduction fail closed on SIGTERM-truncated marked scrapes and non-integer active-worker counts, and made sweep summary rows report infrastructure exit status
-
Timestamp: 2026-05-14T18:35:00Z
- Action: Issue #1306 — add first-class per-class equal-flow estimator capture to the CoS class sweep harness. The sweep now brackets each wrapper run with continuous Prometheus scraping, preserves raw scrapes, reduces target-class equal-flow aggregate/worker rows, appends equal-flow evidence to
summary.md, and fails closed on empty/missing/invalid estimator captures. - File(s):
test/incus/fairness-cos-class-sweep.sh,test/incus/fairness_equal_flow_capture.py,test/incus/fairness_multi_sample_test.py,docs/fairness-regimes.md,docs/per-5-tuple/state.md,_Log.md
- Action: Issue #1306 — add first-class per-class equal-flow estimator capture to the CoS class sweep harness. The sweep now brackets each wrapper run with continuous Prometheus scraping, preserves raw scrapes, reduces target-class equal-flow aggregate/worker rows, appends equal-flow evidence to
-
Timestamp: 2026-05-14T17:53:10Z
- Action: PR #1305 round-3 review follow-up — extended artifact-warning wording to the equal-flow capped-bps, worker-cap-bps, and throughput-loss-ratio Prometheus help strings.
- File(s):
pkg/api/metrics.go,_Log.md
-
Timestamp: 2026-05-14T17:38:10Z
- Action: PR #1305 round-2 review follow-up — made the out-of-range worker test assert directly against
bytesByWorkerso the worker-delta cap is independently covered, and added artifact warnings to the equal-flow target/suppression metric help strings. - File(s):
pkg/dataplane/userspace/fairness_throughput_test.go,pkg/api/metrics.go,_Log.md
- Action: PR #1305 round-2 review follow-up — made the out-of-range worker test assert directly against
-
Timestamp: 2026-05-14T16:49:39Z
- Action: PR #1305 review follow-up — bounded equal-flow estimator worker IDs with the existing fairness RSS worker-slot cap, sharpened Prometheus help strings, and added validity boundary tests for single-worker, unsampled, zero-window, and out-of-range-worker cases.
- File(s):
pkg/dataplane/userspace/fairness_throughput.go,pkg/dataplane/userspace/fairness_throughput_test.go,pkg/api/metrics.go,docs/pr/1304-equal-flow-estimator/plan.md,_Log.md
-
Timestamp: 2026-05-14T14:51:46Z
- Action: Issue #1304 Phase 0 — add measurement-only equal-flow rate-suppression estimator telemetry for exact CoS queues, document its invariants, and pin estimator math/Prometheus emission tests.
- File(s):
pkg/dataplane/userspace/fairness_throughput.go,pkg/dataplane/userspace/fairness_throughput_test.go,pkg/api/metrics.go,pkg/api/metrics_test.go,docs/fairness-regimes.md,docs/per-5-tuple/state.md,docs/pr/1304-equal-flow-estimator/plan.md,_Log.md
-
Timestamp: 2026-05-14T04:01:00Z
- Action: PR #1301 review follow-up — removed power-of-two UMEM frame-size assumption in memmove fallback bounds calculation by switching to modulo-based in-frame offset math.
- File(s):
userspace-dp/src/afxdp/frame/mod.rs,_Log.md
-
Timestamp: 2026-05-14T03:52:00Z
- Action: PR #1301 review follow-up — tightened in-frame memmove fallback slice bounds to the current UMEM chunk and added regression coverage for
FillOnSlotWithOffsetrecycle tracking. - File(s):
userspace-dp/src/afxdp/frame/mod.rs,userspace-dp/src/afxdp/tx/transmit_tests.rs,_Log.md
- Action: PR #1301 review follow-up — tightened in-frame memmove fallback slice bounds to the current UMEM chunk and added regression coverage for
-
Timestamp: 2026-05-12T07:50:00Z
- Action: PR #1274 Copilot follow-up — use verdict JSON key names consistently in the Accepted Path publish list.
- File(s):
docs/per-5-tuple/tcp-head-start-floor.md,_Log.md
-
Timestamp: 2026-05-12T06:46:48Z
- Action: PR #1274 review follow-up — wrapped TCP head-start policy prose and made the observed CoV prose/JSON-field distinction explicit.
- File(s):
docs/per-5-tuple/tcp-head-start-floor.md,_Log.md
-
Timestamp: 2026-05-12T06:29:27Z
- Action: PR round-2 review follow-up — expanded AFD acronym at first use (line 5), changed
observed_covtoobserved_CoVin prose formulas (lines 86, 99), made epsilon explicit as 0.05. - File(s):
docs/per-5-tuple/tcp-head-start-floor.md,_Log.md
- Action: PR round-2 review follow-up — expanded AFD acronym at first use (line 5), changed
-
Timestamp: 2026-05-12T07:35:00Z
- Action: PR #1271 round-3 follow-up — add same-VLAN/different-RETH synthetic-ifindex regressions so
reth0.Nandreth1.Ncannot collapse into one logical Rust dataplane state key. - File(s):
pkg/dataplane/userspace/manager_test.go,_Log.md
- Action: PR #1271 round-3 follow-up — add same-VLAN/different-RETH synthetic-ifindex regressions so
-
Timestamp: 2026-05-12T07:20:00Z
- Action: PR #1271 cleanup — removed unrelated
go.moddirect/indirect dependency churn introduced by local test tooling to keep the diff scoped to synthetic-ifindex changes. - File(s):
go.mod,_Log.md
- Action: PR #1271 cleanup — removed unrelated
-
Timestamp: 2026-05-12T07:16:00Z
- Action: PR #1271 validation follow-up — documented synthetic-ifindex range rationale, improved exhaustion panic guidance, deduplicated VLAN test constants, and reverted unrelated
go.moddrift from local test tooling. - File(s):
pkg/dataplane/userspace/snapshot.go,pkg/dataplane/userspace/manager_test.go,go.mod,_Log.md
- Action: PR #1271 validation follow-up — documented synthetic-ifindex range rationale, improved exhaustion panic guidance, deduplicated VLAN test constants, and reverted unrelated
-
Timestamp: 2026-05-12T07:08:00Z
- Action: PR #1271 follow-up — enriched synthetic-ifindex exhaustion panic diagnostics and replaced test magic VLAN bound with named constants during validation pass.
- File(s):
pkg/dataplane/userspace/snapshot.go,pkg/dataplane/userspace/manager_test.go,_Log.md
-
Timestamp: 2026-05-12T06:55:00Z
- Action: PR #1271 round-2 follow-up — made parent-bound RETH VLAN synthetic ifindex allocation deterministic/config-derived, removed kernel-ifindex seeding, switched to high synthetic range with hard-fail on exhaustion, and added sibling-VLAN determinism regression coverage.
- File(s):
pkg/dataplane/userspace/snapshot.go,pkg/dataplane/userspace/manager_test.go,_Log.md
-
Timestamp: 2026-05-12T00:30:00Z
- Action: PR #1267 round-2 review follow-up — fixed fairness throughput window boundary pruning/rate denominator coupling to prevent false-positive saturation at steady sub-cap traffic, and added a regression test for the 10s-scrape/30s-window boundary case.
- File(s):
pkg/dataplane/userspace/fairness_throughput.go,pkg/dataplane/userspace/fairness_throughput_test.go,_Log.md
-
Timestamp: 2026-05-10T15:24:00Z
- Action: PR #1253 review follow-up — corrected
userspace-dp/src/server/README.mdRSS-indirection behavior to matchpkg/daemon/rss_indirection.go(reshape conditions, workers>=queues stale-table cleanup restore path, and queue concentration semantics). - File(s):
userspace-dp/src/server/README.md,_Log.md
- Action: PR #1253 review follow-up — corrected
-
Timestamp: 2026-05-10T15:05:00Z
- Action: PR #1253 review pass — corrected
pkg/configstore/README.mdencryption-key location wording to matchmaster.keyunder the configstore DB directory (db.dir) and removed stale/etc/xpf/config-keypath guidance. - File(s):
pkg/configstore/README.md,_Log.md
- Action: PR #1253 review pass — corrected
-
Timestamp: 2026-05-10T04:06:32Z
- Action: PR comment follow-up review for
docs/per-5-tuple/state.md— replaced a non-existent memory-file reference with in-repo issue/table references (#836/#937/#1215) to keep the fairness section self-contained and verifiable. - File(s):
docs/per-5-tuple/state.md,_Log.md
- Action: PR comment follow-up review for
- Timestamp: 2026-05-07T16:13:00Z
- Action: PR #1211 archive doc follow-up — fixed stale/missing cross-references in closure docs, updated per-5-tuple state to mark Path 2 closed with archive links, and clarified memory-hook wording as external memory (not an in-repo file).
- File(s):
docs/per-5-tuple/path2-archive/CLOSING-RATIONALE.md,docs/per-5-tuple/state.md
- Timestamp: 2026-04-19
- Action: Fold Codex round-1 review into the Architect plan. Close 3 HIGH findings: small-batch amortization collapse (§3.1 per-commit stamping with honest
inserted == 1worst-case), relaxed-atomic cross-CPU visibility (§3.6.a Relaxed+documented, invariants 6/7 rewritten, §8 hard-stop #4 uses bounded-skew delta), sidecar false-sharing (§3.3 per-binding single-writer confirmed byRc<WorkerUmemInner>+shared_umem=falsein code). Rewrite §3.4 overhead budget with three operating-point numbers (inserted=256/64/1) and a correct per-queue denominator (481 ns/pkt at 25 Gbps), not per-worker. Rewrite §11.3 Bonferroni family to match actual composite tests (3/cell × 12 cells = 36, not 192). Close MED #5 (sentinel vs clock-0), MED #6 (sidecar size 192 KiB, not 64 KiB), MED #7 (wire-size growth), MED #8 (Bonferroni family), MED #9 (two-thread test replaced by partial-batch / retry-unwind / bounded-skew tests). Close LOW #10 (nonow_nsreuse), LOW #13 (named const asserts + boundary test).- File(s):
docs/pr/812-tx-latency-histogram/plan.md
- File(s):
- Timestamp: 2026-04-17
- Action: Issue #678 — Architect plan for remaining hot-path CPU cuts. Remeasured on loss userspace cluster (master 7c1e55b9): poll_binding 10.4%/10.6% (down from issue's 13.4%/13.3%), enqueue_pending_forwards 0.71%/<1% (down from 4.3%/3.7%), apply_nat_ipv6 <1% (down from 3.2% IPv6). Recommendation: Option A — split poll_binding into orchestration shell + per-descriptor hot path as a measurement-first structural refactor. Options B/C/D deferred as issues; Option F (close as subsumed) is the expected path post-split.
- File(s): docs/678-hotpath-cuts-plan.md
- Timestamp: 2026-04-03
- Action: Issue #547 — Split pkg/grpcapi/server.go (8411 lines) into 8 domain files. Mechanical move of functions, no logic changes. server.go reduced to 241 lines (types + server lifecycle).
- File(s): pkg/grpcapi/server.go, server_config.go, server_show.go, server_nat.go, server_routing.go, server_diag.go, server_helpers.go, server_dhcp.go, server_cluster.go
-
Timestamp: 2026-04-07T00:00:00Z
- Action: Issue #545 — Split
pkg/config/compiler.go(5878 lines) into 8 domain-specific files. Mechanical refactor, no logic changes. Functions moved to: compiler_security.go, compiler_interfaces.go, compiler_protocols.go, compiler_ipsec.go, compiler_routing.go, compiler_firewall.go, compiler_system.go, compiler_services.go. compiler.go retains top-level dispatch + applications + validators (793 lines). - File(s): pkg/config/compiler.go, pkg/config/compiler_security.go, pkg/config/compiler_interfaces.go, pkg/config/compiler_protocols.go, pkg/config/compiler_ipsec.go, pkg/config/compiler_routing.go, pkg/config/compiler_firewall.go, pkg/config/compiler_system.go, pkg/config/compiler_services.go
- Action: Issue #545 — Split
-
Timestamp: 2026-04-07T00:00:00Z
- Action: Issue #532 — Fix IPv6 TTL-expired (hop limit exceeded) probe responses not being returned in userspace dataplane. Added TTL/hop-limit check with ICMP Time Exceeded generation to both session-hit and flow-cache-hit paths. Previously only the session-miss path generated TE responses; subsequent packets hitting an existing session or flow cache were silently dropped when TTL<=1 because the rewrite functions returned None without generating a response.
- File(s): userspace-dp/src/afxdp.rs
-
Timestamp: 2026-04-05T22:00:00Z
- Action: Issue #485 — Fix TCP stream death on failback (node1→node0). Three fixes: (1) Reorder cluster Primary handler: set rg_active + pre-install neighbors BEFORE ForceRGMaster so BPF can forward the first packet arriving after VRRP installs VIPs. (2) Reorder cluster Secondary handler: run preflight (flow cache flush to FabricRedirect) BEFORE ResignRG so traffic shifts to fabric before VRRP removes VIPs. (3) Add syncMsgPrepareActivation message: demoting node notifies peer to pre-warm neighbor cache after preflight completes, giving the activating node a head start on ARP/NDP resolution.
- File(s): pkg/daemon/daemon_ha.go, pkg/cluster/sync.go
-
Timestamp: 2026-04-05T12:00:00Z
- Action: Fix TCP stream death on failback due to cold ARP cache on standby node. Root cause:
resolveNeighborsInner()usednetlink.RouteGet()to find the outgoing interface for static route next-hops, but on standby nodes the kernel route doesn't exist (FRR only installs it on the active). AddedaddByIPOrConfig()fallback that resolves the outgoing interface from config by matching the next-hop IP against configured interface subnets when the kernel FIB lookup fails. - File(s): pkg/daemon/daemon.go
- Action: Fix TCP stream death on failback due to cold ARP cache on standby node. Root cause:
-
Timestamp: 2026-04-05T10:00:00Z
- Action: Issue #475 — Fix 0 throughput on pre-existing TCP streams after failover+failback. Root cause:
prewarm_reverse_synced_sessions_for_owner_rgspublished USERSPACE_SESSIONS BPF map entries for reverse sessions but not forward sessions during RG activation. Forward sessions relied on async worker processing, creating a window where the XDP shim had no REDIRECT entry. Added synchronous BPF map publishing for forward sessions in prewarm, plus a comprehensiverepublish_bpf_session_entries_for_owner_rgsthat iterates ALL sessions in thesessionsowner-RG index (not just thereverse_prewarmsubset) to ensure no session is missed. - File(s): userspace-dp/src/afxdp/shared_ops.rs, userspace-dp/src/afxdp/ha.rs, userspace-dp/src/afxdp/session_glue.rs
- Action: Issue #475 — Fix 0 throughput on pre-existing TCP streams after failover+failback. Root cause:
-
Timestamp: 2026-04-05T08:00:00Z
- Action: Issue #473 — Fix XSK bindings BPF map going stale after peer crash+reconnect. Added
verifyBindingsMapLocked()watchdog to the 1s status poll loop. AfterapplyHelperStatusLockedruns, the watchdog reads each BPFuserspace_bindingsentry and compares it against the helper's reported binding state. If a queue is Registered+Armed in the helper but the BPF map entry is all zeros, the watchdog rewrites the entry. Also repairs aliased bindings (VLAN children). This prevents silent transit traffic drops when a Compile() or HA transition zeroes the bindings map without repopulating it. - File(s): pkg/dataplane/userspace/manager.go
- Action: Issue #473 — Fix XSK bindings BPF map going stale after peer crash+reconnect. Added
-
Timestamp: 2026-04-05T06:55:00Z
- Action: Issue #466 — Fix bulk sync triggering on every reconnect/fabric-flip. Added
bulkEverCompletedatomic flag to SessionSync that tracks whether a full bulk exchange has ever completed during the daemon's lifetime.handleNewConnectionnow only triggersdoBulkSyncon true cold start (flag is false). Active-fabric changes no longer trigger bulk at all. Daemon'sonSessionSyncPeerConnected/onSessionSyncPeerDisconnectedpreserve primed state and sync readiness whenbulkEverCompletedis true. - File(s): pkg/cluster/sync.go, pkg/cluster/sync_test.go, pkg/daemon/daemon_ha.go, pkg/daemon/session_sync_readiness_test.go
- Action: Issue #466 — Fix bulk sync triggering on every reconnect/fabric-flip. Added
-
Timestamp: 2026-04-04T21:30:00Z
- Action: Issue #467 — Fix bulk-prime retry loop not restarting after failed demotion barrier.
prepareUserspaceRGDemotionWithTimeoutstopped the retry loop by advancingsyncPrimeRetryGenbefore waiting on barriers, but on barrier failure returned without restarting the loop, stranding the peer in an unprimed state. Added a defer that restartsstartSessionSyncPrimeRetryon failure when peer is still connected and not yet primed. - File(s): pkg/daemon/daemon_ha.go
- Action: Issue #467 — Fix bulk-prime retry loop not restarting after failed demotion barrier.
-
Timestamp: 2026-04-04T20:50:00Z
- Action: Issue #458 — Fix session sync barrier timeout on second failover cycle. Root cause:
handleDisconnectresetbarrierSeqto 0, causing sequence collisions between stale goroutines and new barriers. Also closed waiter channels on disconnect to prevent goroutine leaks. AddedbarrierAckSeqcheck inWaitForPeerBarrierto distinguish disconnect from ack. - File(s): pkg/cluster/sync.go, pkg/cluster/sync_bulk.go, pkg/cluster/sync_test.go
- Action: Issue #458 — Fix session sync barrier timeout on second failover cycle. Root cause:
-
Timestamp: 2026-04-03T18:00:00Z
- Action: Issue #457 — Fix standby losing userspace readiness after partial RG demotion. The rgTransitionInFlight flag in UpdateRGActive was unconditionally set for both activation and demotion. During demotion, this caused ctrl.Enabled=0 in the BPF map globally, disrupting forwarding for other active RGs. Now only set rgTransitionInFlight during activation transitions; demotion leaves ctrl enabled so other RGs continue forwarding.
- File(s): pkg/dataplane/userspace/manager_ha.go, pkg/dataplane/userspace/manager_test.go
-
Timestamp: 2026-04-03T12:00:00Z
- Action: Issue #451 — Fix neighbor miss spike after RG failover. Part 1: resolve config-based next-hops synchronously during RG activation (VRRP MASTER and cluster-primary paths) using new
resolveNeighborsImmediatevariant that sends ARP probes without blocking for replies. Part 2: increase failover test neighbor miss threshold from 20 to 60 to accommodate observed spikes of 25-52. - File(s): pkg/daemon/daemon.go, pkg/daemon/daemon_ha.go, scripts/userspace-ha-failover-validation.sh
- Action: Issue #451 — Fix neighbor miss spike after RG failover. Part 1: resolve config-based next-hops synchronously during RG activation (VRRP MASTER and cluster-primary paths) using new
-
Timestamp: 2026-04-03T10:22:00Z
- Action: Issue #418 — Replace bulk session sync with event stream replay on connect. Added
export_all_sessions_to_event_stream()to Rust Coordinator that iterates shared sessions and pushes Open events through the event stream. Added"export_all_sessions"control request handler. Go daemon'sbulkSyncViaEventStreamOrFallback()tries event stream export first, falls back to legacy BulkSync. - File(s): userspace-dp/src/afxdp/ha.rs, userspace-dp/src/main.rs, pkg/dataplane/userspace/manager_ha.go, pkg/daemon/daemon_ha.go, pkg/daemon/userspace_sync_test.go
- Action: Issue #418 — Replace bulk session sync with event stream replay on connect. Added
- Timestamp: 2026-04-02T20:34:00Z
- Action: Issue #403 — Planned failover must not depend on bulk sync. Added priority barrierCh channel to SessionSync so barriers/acks bypass bulk data in sendLoop. Removed syncPeerBulkPrimed gate from demotion prep. Reduced manual failover barrier timeout to 5s.
- File(s): pkg/cluster/sync.go, pkg/cluster/sync_bulk.go, pkg/daemon/daemon_ha.go, pkg/cluster/sync_test.go
-
Timestamp: 2026-04-01T05:30:00Z
- Action: Merged PR #301 (userspace forwarding and failover gap audit doc)
- File(s): docs/userspace-forwarding-and-failover-gap-audit.md
-
Timestamp: 2026-04-01T06:00:00Z
- Action: Implemented strict userspace mode, HA install fence, deterministic reverse companions (PR #313, issues #302-#312)
- File(s): pkg/dataplane/userspace/manager.go, pkg/dataplane/userspace/protocol.go, pkg/cluster/cluster.go, pkg/cluster/sync.go, userspace-dp/src/afxdp.rs, userspace-dp/src/afxdp/session_glue.rs, userspace-dp/src/main.rs, userspace-xdp/src/lib.rs, docs/ha-forwarding-state-inventory.md, docs/bugs.md, docs/phases.md
-
Timestamp: 2026-04-01T06:30:00Z
- Action: Address PR #313 copilot review findings — rename STRICT_PASS_BLOCKED, strict ctrl=0 drop, mode reporting, fallback names, VLAN sub-interface exclusion
- File(s): pkg/dataplane/userspace/manager.go, userspace-xdp/src/lib.rs, docs/phases.md
-
Timestamp: 2026-04-01T13:52:00Z
- Action: Fix HA session sync starvation — async bulk ack, HA sync throttle 5s, 6 retries (ba1c4304)
- File(s): pkg/cluster/sync.go, pkg/daemon/daemon.go, pkg/dataplane/userspace/manager.go
-
Timestamp: 2026-04-01T14:44:00Z
- Action: Replace bulk-sync gate with barrier check for failover readiness (e42c882e)
- File(s): pkg/daemon/daemon.go, pkg/daemon/userspace_sync_test.go
-
Timestamp: 2026-04-01T15:39:00Z
- Action: Explicit refresh_owner_rgs on RG activation + async barrier ack (a9e0501e)
- File(s): pkg/dataplane/userspace/manager.go, userspace-dp/src/afxdp.rs, userspace-dp/src/afxdp/session_glue.rs, userspace-dp/src/afxdp/types.rs, userspace-dp/src/main.rs, pkg/cluster/sync.go
-
Timestamp: 2026-04-01T15:59:00Z
- Action: Re-resolve synced sessions with owner_rg_id=0 on active node (7417144e)
- File(s): userspace-dp/src/afxdp/session_glue.rs
-
Timestamp: 2026-04-01T16:10:00Z
- Action: Add logging rules to CLAUDE.md, remove debug eprintln (12478964)
- File(s): CLAUDE.md, userspace-dp/src/afxdp/session_glue.rs
-
Timestamp: 2026-04-01T16:59:00Z
- Action: Mirror reverse sessions to helper, worker-completion ack, logging rules (#314, #315, #316) (24166737)
- File(s): CLAUDE.md, pkg/daemon/daemon.go, pkg/dataplane/userspace/manager.go, userspace-dp/src/afxdp.rs, userspace-dp/src/afxdp/types.rs, userspace-dp/src/afxdp/session_glue.rs
-
Timestamp: 2026-04-01T17:00:00Z
- Action: Route barrier/bulk acks through sendCh instead of direct writeMu (9d2814c4)
- File(s): pkg/cluster/sync.go
-
Timestamp: 2026-04-01T19:32:00Z
- Action: Fix RefreshOwnerRGs skipped synced sessions — refresh_for_ha_activation (71b80b3d). THE key SNAT fix.
- File(s): userspace-dp/src/session.rs, userspace-dp/src/afxdp/session_glue.rs
-
Timestamp: 2026-04-01T20:20:00Z
- Action: Simplify HA failover — epoch flow cache, resolve-on-receipt, owner_rg_id, demotion (#325, #326, #327, #330) (a21018f3)
- File(s): pkg/daemon/daemon.go, pkg/dataplane/userspace/manager.go, pkg/dataplane/userspace/manager_test.go, userspace-dp/src/afxdp.rs, userspace-dp/src/afxdp/types.rs, userspace-dp/src/afxdp/session_glue.rs
-
Timestamp: 2026-04-01T21:52:00Z
- Action: Write userspace sessions to BPF conntrack map for zone/interface display (fab9230c)
- File(s): pkg/dataplane/dataplane.go, pkg/dataplane/userspace/manager.go, pkg/dataplane/userspace/protocol.go, userspace-dp/src/afxdp.rs, userspace-dp/src/afxdp/bpf_map.rs, userspace-dp/src/afxdp/session_glue.rs, userspace-dp/src/afxdp/types.rs, userspace-dp/src/main.rs
-
Timestamp: 2026-04-01T22:00:00Z
- Action: Use BPF_ANY for conntrack map writes (244912f8)
- File(s): userspace-dp/src/afxdp/bpf_map.rs
-
Timestamp: 2026-04-01T22:30:00Z
- Action: Userspace/eBPF audit — counters, conntrack flush bugs, session visibility (PR #336, issues #332-#335)
- File(s): pkg/conntrack/gc.go, pkg/daemon/daemon.go, pkg/dataplane/dataplane.go, pkg/dataplane/dpdk/dpdk_cgo.go, pkg/dataplane/dpdk/dpdk_stub.go, pkg/dataplane/maps.go, pkg/dataplane/userspace/manager.go, pkg/dataplane/userspace/manager_test.go, userspace-dp/src/afxdp.rs, userspace-dp/src/afxdp/bpf_map.rs
-
Timestamp: 2026-04-01T23:00:00Z
- Action: Address PR #336 copilot review — idle time, BPF_EXIST, counter race, safeDelta, RX counter, flush cutoff (d15d5629)
- File(s): pkg/dataplane/loader.go, pkg/dataplane/maps.go, pkg/dataplane/userspace/manager.go, pkg/dataplane/userspace/manager_test.go, userspace-dp/src/afxdp/bpf_map.rs
-
Timestamp: 2026-04-01T23:15:00Z
- Action: Thread conntrack FDs through DeleteSynced for BPF cleanup (671e5561)
- File(s): userspace-dp/src/afxdp.rs, userspace-dp/src/afxdp/session_glue.rs
-
Timestamp: 2026-04-01T23:30:00Z
- Action: Unify synced flag + adaptive event-first session sync (#328, #320) (dcc59c67)
- File(s): pkg/daemon/daemon.go, userspace-dp/src/afxdp.rs, userspace-dp/src/afxdp/bpf_map.rs, userspace-dp/src/afxdp/forwarding.rs, userspace-dp/src/afxdp/session_glue.rs, userspace-dp/src/afxdp/tunnel.rs, userspace-dp/src/event_stream.rs, userspace-dp/src/main.rs, userspace-dp/src/session.rs
-
Timestamp: 2026-04-02T18:05:00Z
- Action: Start
#400— separate transfer readiness from takeover readiness in cluster status and explicit peer-failover admission, with daemon wiring for session-sync transfer-readiness reasons - File(s): pkg/cluster/cluster.go, pkg/cluster/cluster_test.go, pkg/daemon/daemon_ha.go, pkg/daemon/userspace_sync_test.go
- Action: Start
-
Timestamp: 2026-04-02T17:20:00Z
- Action: Start
#398fix — add explicit session-sync transfer-readiness snapshot and fast-fail manual failover demotion when bulk receive or pending bulk ack proves the sync path is not settled; filed#400for exposing transfer readiness separately from takeover readiness - File(s): pkg/cluster/sync.go, pkg/cluster/sync_bulk.go, pkg/cluster/sync_test.go, pkg/daemon/daemon_ha.go, pkg/daemon/userspace_sync_test.go
- Action: Start
-
Timestamp: 2026-04-02T16:45:00Z
- Action: Validate
#397onloss-userspace-cluster— settled RG0 manual failover now completes on explicit failover ack + commit ack instead of heartbeat observation; filed residual issue#398for failover admission while requester is still in bulk receive - File(s): testing-docs/manual-failover-transfer-commit-validation.md, testing-docs/README.md
- Action: Validate
-
Timestamp: 2026-04-02T13:15:00Z
- Action: Second #390 slice — add explicit sync-channel failover ack handshake so manual RG transfer returns applied/rejected instead of inferring success from send-only behavior
- File(s): pkg/cluster/sync.go, pkg/cluster/sync_test.go, pkg/cluster/cluster.go, pkg/daemon/daemon_ha.go, pkg/cli/cli.go
-
Timestamp: 2026-04-02T13:45:00Z
- Action: Third #390 slice — wait for actual local RG promotion after peer transfer-out ack so CLI/local control returns on observed ownership, not just request delivery
- File(s): pkg/cluster/cluster.go, pkg/cluster/cluster_test.go, pkg/cli/cli.go
-
Timestamp: 2026-04-02T14:15:00Z
- Action: Address PR #396 copilot review — typed remote-failover rejection, failover request IDs, out-of-range RG guard, timeout race guard, active-conn ack routing, and consistent gRPC wording
- File(s): pkg/cluster/sync.go, pkg/cluster/sync_test.go, pkg/daemon/daemon_ha.go, pkg/grpcapi/server.go
-
Timestamp: 2026-04-02T16:30:00Z
- Action: Next #390 slice — replace heartbeat-observed manual failover completion with explicit sync-channel transfer commit, local primary commit, peer transfer-out finalization, and commit-ack coverage
- File(s): pkg/cluster/cluster.go, pkg/cluster/cluster_test.go, pkg/cluster/sync.go, pkg/cluster/sync_test.go, pkg/daemon/daemon_ha.go, pkg/cli/cli.go, pkg/grpcapi/server.go
-
Timestamp: 2026-04-02T17:05:00Z
- Action: Address PR #397 Copilot review — preserve in-flight peer transfer-out state across heartbeat refreshes until transfer commit completes or aborts
- File(s): pkg/cluster/cluster.go, pkg/cluster/cluster_test.go
-
Timestamp: 2026-04-02T12:30:00Z
- Action: First #390 slice — replace weight-zero manual failover with explicit secondary-hold transfer-out state, keep ForceSecondary on zero-weight drain semantics, and teach election to promote on peer transfer-out without mutating monitor weight
- File(s): pkg/cluster/cluster.go, pkg/cluster/election.go, pkg/cluster/cluster_test.go, pkg/cluster/election_test.go, pkg/cluster/sync.go
-
Timestamp: 2026-04-02T01:30:00Z
- Action: Merged PR #337 (HA simple failover design doc). Fixed copilot review — issue reference swap in phases 3/5/6.
- File(s): docs/ha-simple-failover-design.md
-
Timestamp: 2026-04-02T02:00:00Z
- Action: Fix HA activation cleanup — deduplicate refresh, skip resolved, log mirror errors (#341, #342, #345, #346) (31b600d5)
- File(s): pkg/dataplane/userspace/manager.go, userspace-dp/src/afxdp.rs, userspace-dp/src/afxdp/session_glue.rs
-
Timestamp: 2026-04-02T02:30:00Z
- Action: Fix watchdog threshold (2→10s), reverse companion leak on delete, remove debug eprintln (#349, #351, #352) (52254b7e)
- File(s): pkg/dataplane/userspace/manager.go, userspace-dp/src/afxdp.rs, userspace-dp/src/main.rs
-
Timestamp: 2026-04-02T03:30:00Z
- Action: Simplify HA — remove refresh RPC, skip blackhole routes, dead code cleanup, throttle post-transition sync (#353, #354, #355, #356) (5ac423a3)
- File(s): pkg/dataplane/userspace/manager.go, pkg/daemon/daemon.go
-
Timestamp: 2026-04-02T06:00:00Z
- Action: Merged PR #357 (flow cache simplification refactors). Implemented phases 3+4 from docs/flow-cache-simplification.md — explicit is_cacheable() + 10 unit tests (624a1f83)
- File(s): userspace-dp/src/afxdp/types.rs, docs/flow-cache-simplification.md
-
Timestamp: 2026-04-02T11:45:00Z
- Action: Added HA failover implementation plan tying current simplification audit to executable phases and issue dependencies (49eaf9d6)
- File(s): docs/ha-failover-implementation-plan.md, docs/ha-failover-simplification-audit.md
-
Timestamp: 2026-04-03T00:16:04Z
- Action: First #389 slice — add derived owner-RG indexes for helper shared session stores and use them for demotion-time BPF cleanup and shared-session demotion without whole-table scans
- File(s): userspace-dp/src/afxdp.rs, userspace-dp/src/afxdp/types.rs, userspace-dp/src/afxdp/ha.rs, userspace-dp/src/afxdp/shared_ops.rs, userspace-dp/src/afxdp/session_glue.rs, userspace-dp/src/afxdp/forwarding.rs, userspace-dp/src/afxdp/tunnel.rs
-
Timestamp: 2026-04-03T00:34:06Z
- Action: Address PR #404 Copilot review — make owner-RG index updates heal missing same-owner entries and serialize demotion-time key collection against in-flight shared-session publishes
- File(s): userspace-dp/src/afxdp/shared_ops.rs, userspace-dp/src/afxdp/ha.rs, userspace-dp/src/afxdp/session_glue.rs
-
Timestamp: 2026-04-03T00:42:41Z
- Action: Second #389 slice — add reverse-prewarm owner-RG candidate indexes so HA activation prewarm targets only affected synced forward sessions instead of scanning the full shared forward map
- File(s): userspace-dp/src/afxdp/types.rs, userspace-dp/src/afxdp/shared_ops.rs, userspace-dp/src/afxdp/ha.rs, userspace-dp/src/afxdp/session_glue.rs
-
Timestamp: 2026-04-03T01:01:45Z
- Action: Final #389 slice — index worker-local sessions by owner RG and use those indexes for export, demotion, and activation refresh so helper HA apply no longer scans the full live session table
- File(s): userspace-dp/src/session.rs, userspace-dp/src/afxdp/session_glue.rs, _Log.md
-
Timestamp: 2026-04-03T03:00:15Z
- Action: Applied Copilot review fixes for stacked #389 PRs — make reverse-prewarm owner-RG index updates lock once per refresh, restore derived indexes on rejected session updates, and remove unnecessary hot-path clones
- File(s): userspace-dp/src/afxdp/shared_ops.rs, userspace-dp/src/session.rs, userspace-dp/src/afxdp/session_glue.rs, userspace-dp/src/afxdp.rs, _Log.md
- Action: Wire BulkSyncOverride in daemon_ha.go so initial bulk sync uses event stream
- File(s):
pkg/daemon/daemon_ha.go
- File(s):
- Action: Fix stuck bulk receive state on disconnect — reset bulkInProgress in handleDisconnect
- File(s):
pkg/cluster/sync.go
- File(s):
- Action: Add sendBulkMarkers() to send empty BulkStart/BulkEnd after event stream export
- File(s):
pkg/cluster/sync_bulk.go
- File(s):
- Action: Fix HA session promotion — push forward sessions to workers + bump rg_epochs on activation
- File(s):
userspace-dp/src/afxdp/ha.rs,userspace-dp/src/afxdp/shared_ops.rs,userspace-dp/src/afxdp/session_glue.rs
- File(s):
- Bulk sync completes correctly on both nodes (event stream + bulk markers)
- Transfer ready: yes on both nodes after deploy
- Manual failover test PASSES: iperf3 -P2 at 11 Gbps survives RG move with no visible throughput drop
- Automated script reports false failure (samples at exact transition moment)
- Action: Add mark_ecn_ce_ipv4 / mark_ecn_ce_ipv6 / maybe_mark_ecn_ce / apply_cos_admission_ecn_policy; wire into enqueue_cos_item
- File(s):
userspace-dp/src/afxdp/tx.rs
- File(s):
- Action: Add CoSQueueDropCounters.admission_ecn_marked field + protocol/worker/coordinator aggregation
- File(s):
userspace-dp/src/afxdp/types.rs,userspace-dp/src/afxdp/worker.rs,userspace-dp/src/afxdp/coordinator.rs,userspace-dp/src/protocol.rs
- File(s):
- Result: 16 new tests (11 marker, 5 admission); full suite 667 pass / 0 fail (baseline 651); Local variant only, Prepared deferred to #718-followup
- Action: Add
maybe_mark_ecn_ce_prepared(req, umem)helper; extendapply_cos_admission_ecn_policyto handle bothCoSPendingTxItem::Localand::Preparedunder a singleadmission_ecn_markedcounter; take a shared&MmapAreainsideenqueue_cos_itemvia split-borrow and thread it to the policy call- File(s):
userspace-dp/src/afxdp/tx.rs
- File(s):
- Action: Add 5 Prepared-variant admission tests (IPv4 ECT(0), IPv6 ECT(0), NOT-ECT, out-of-range offset, combined Local+Prepared counter pin); remove stale
admission_does_not_mark_prepared_variantnegative pin- File(s):
userspace-dp/src/afxdp/tx.rs
- File(s):
- Result: admission_ecn group 11/11 pass, mark_ecn_ce group 11/11 pass, full suite 680/680 pass. Marker now fires on the XSK-RX→XSK-TX zero-copy hot path (iperf3, NAT'd flows); acceptance target per
docs/cos-validation-notes.mdisecn_markedbecoming non-zero during live 16-flow iperf3
- Action: Add
DRAIN_HIST_BUCKETS = 16const-asserted,bucket_index_for_nsbranchless helper,drain_latency_hist+drain_invocations+drain_noop_invocations+redirect_acquire_hist+redirect_sample_counter+pps_owner_vs_peeronBindingLiveState; addnew_seeded(worker_id)constructor so per-worker redirect samples don't lockstep- File(s):
userspace-dp/src/afxdp/umem.rs
- File(s):
- Action: Time every
drain_shaped_txinvocation with one pair ofmonotonic_nanos()calls; count owner-local vs peer-redirected packets oningest_cos_pending_txsplit-point; sampleenqueue_tx_owned1-in-256 producer-side- File(s):
userspace-dp/src/afxdp/tx.rs,userspace-dp/src/afxdp/umem.rs
- File(s):
- Action: Extend
CoSQueueStatusserde with histograms + owner/peer pps; populate from owner binding's live snapshot inbuild_worker_cos_statuseswithmaxaggregation across workers (only owner writes non-zero). Cross-worker coordinator aggregation mirrorsadmission_ecn_markedshape- File(s):
userspace-dp/src/protocol.rs,userspace-dp/src/afxdp/worker.rs,userspace-dp/src/afxdp/coordinator.rs
- File(s):
- Action: Go-side protocol mirror +
OwnerProfile:line inshow class-of-service interfaceunder the existingDrops:line (only for exact queues with named owner)- File(s):
pkg/dataplane/userspace/protocol.go,pkg/dataplane/userspace/cosfmt.go,pkg/dataplane/userspace/cosfmt_test.go
- File(s):
- Action: Prometheus gauges/counters for
xpf_cos_drain_latency_ns_bucket,xpf_cos_drain_invocations_total,xpf_cos_redirect_acquire_ns_bucket,xpf_cos_owner_pps,xpf_cos_peer_pps. Cardinality ≤ 16896 series (within plan §5 envelope)- File(s):
pkg/api/metrics.go
- File(s):
- Action: New "Reading the owner-profile counters" section with decision tree mapping drain_p99 / redirect_p99 / owner_pps ratio to #709 Option B/C/D follow-ups
- File(s):
docs/cos-validation-notes.md
- File(s):
- Result: 7 new Rust tests (+692 total, baseline 685), 3 new Go tests; full
cargo test+go test ./...green. Telemetry-only: no hot-path allocations, no new syscalls, MPSC invariants preserved, histogram bucket select branchless
- Timestamp: 2026-04-17
- Action: Write #708 enqueue-pacing architect plan — Option B (per-SFQ-bucket token bucket), measurement-first, pacing gate strictly AFTER ECN marker to preserve #718 invariants. Honest framing on residual retrans (most of the ~100k retrans signal is likely ECN-induced recovery entries, not wire loss, so pacing is unlikely to move retrans meaningfully; §3 says so explicitly)
- File(s):
docs/708-enqueue-pacing-plan.md(new)
- Timestamp: 2026-04-21
- Action: Codex HIGH-1 — drop stale
worker-tids.txtbefore launching step1; install SIGINT/SIGTERM trap- File(s):
test/incus/step2-sched-switch-capture.sh
- File(s):
- Action: Codex HIGH-2 — reducer drift halt stamps
suspect_reason: "drift_ge_5s"on every JSONL line and exits 5 (H-STOP-5); classifier detects sentinel and emitsverdict=SUSPECT; optional--drift-halt-markersidecar; summary log line surfacessuspect_reason- File(s):
test/incus/step2-sched-switch-reduce.py,test/incus/step2-sched-switch-classify.py,test/incus/step2-sched-switch-capture.sh
- File(s):
- Action: Codex HIGH-3 — capture adds
perf record -k CLOCK_REALTIMEandperf script --ns; reducer treats perf timestamps as absolute unix wall-clock ns and drops first-event offsetting; PERF_START_NS is diagnostic only (drift measurement)- File(s):
test/incus/step2-sched-switch-capture.sh,test/incus/step2-sched-switch-reduce.py
- File(s):
- Action: Codex MEDIUM-4 — restore plan §4.1
stat_runtime_check±1% accounting check against(block_duration * n_workers - total_off_cpu)- File(s):
test/incus/step2-sched-switch-reduce.py
- File(s):
- Action: Codex LOW-5 — classifier meta.json top-level is plan-contracted
{verdict, rho, pvalue, duty_cycle_pct, warn_blocks}; extras moved todiagnosticsub-object- File(s):
test/incus/step2-sched-switch-classify.py
- File(s):
- Action: Codex LOW-6 — G8.2 grep uses
grep -qEwith whitespace-tolerant pattern; G8.3 perf-record stderr no longer suppressed- File(s):
test/incus/step2-sched-switch-capture.sh
- File(s):
- Action: Codex LOW-7 — add
TestReducerNegativeWakeDeltasuite with wake-before-switch and equal-ts exercises documenting branch unreachability under monotonic perf- File(s):
test/incus/step2-sched-switch-reduce_test.py
- File(s):
- Action: pyshell M1 — SIGINT/SIGTERM trap added in capture.sh
- File(s):
test/incus/step2-sched-switch-capture.sh
- File(s):
- Action: pyshell M2 —
reduce_eventsdocstring moved to first statement per PEP 257- File(s):
test/incus/step2-sched-switch-reduce.py
- File(s):
- Result:
python3 -m py_compileOK on all 4 modified.pyfiles; reducer tests 13/13 green (was 10, +3 new); classifier tests 11/11 green (was 8, +3 new); V8 non-regression preserved (step1-histogram-classify.pyunchanged)
- Action: Codex HIGH-1 — drop stale
- Timestamp: 2026-05-10T03:24:56Z
- Action: Correct stale filename references in module READMEs (
eventengine.go/dhcprelay.go->engine.go/relay.go) so entry-point file:line links resolve. - File(s):
pkg/eventengine/README.md,pkg/dhcprelay/README.md
- Action: Correct stale filename references in module READMEs (
- Timestamp: 2026-05-10T05:18:20Z
- Action: Correct
SessionCloseDataattribution tologging.EventReadersession-close records (not conntrack GC delete callbacks). - File(s):
pkg/flowexport/README.md
- Action: Correct
- Timestamp: 2026-05-10T05:18:20Z
- Action: Correct configstore encryption note to match implementation (
master.key+ HKDF with configured PRF). - File(s):
pkg/configstore/README.md
- Action: Correct configstore encryption note to match implementation (
-
Timestamp: 2026-05-12T06:50:50Z
- Action: Round-3 follow-up — tighten verdict JSON detection to the canonical fairness-eval verdict-key set; remove the
os.getpgidtimeout race by using the process-group leader PID directly; add a bounded post-killcommunicate(); remove a stale threshold-source reference. - File(s): test/incus/fairness_multi_sample.py, test/incus/fairness_multi_sample_test.py, docs/per-5-tuple/v8-multi-sample.md
- Action: Round-3 follow-up — tighten verdict JSON detection to the canonical fairness-eval verdict-key set; remove the
-
Timestamp: 2026-05-12T07:45:00Z
- Action: PR #1273 Copilot follow-up — align multi-sample verdict filtering with the canonical 10-key fairness-eval schema and validate summary numeric fields (
cstruct,gap, optionalaggregate_mbps, and integerstarved_flow_count) instead of onlyobserved_cov. - File(s): test/incus/fairness_multi_sample.py, test/incus/fairness_multi_sample_test.py, docs/per-5-tuple/v8-multi-sample.md, docs/per-5-tuple/state.md, _Log.md
- Action: PR #1273 Copilot follow-up — align multi-sample verdict filtering with the canonical 10-key fairness-eval schema and validate summary numeric fields (
-
Timestamp: 2026-05-12T06:29:25Z
- Action: HIGH1 - Tighten extract_verdict_objects to require verdict+observed_cov+discriminator field
- Action: HIGH2 - Replace subprocess.run with Popen(start_new_session=True)+os.killpg for process-group cleanup on timeout
- Action: MINOR - Replace statistics.fmean with statistics.mean; remove dead timeout_stream_text
- Action: Docs - Add fresh-iperf3 requirement and threshold derivation to v8-multi-sample.md
- Action: Tests - Add schema-incomplete and process-group tests; move import time to top
- File(s): test/incus/fairness_multi_sample.py, test/incus/fairness_multi_sample_test.py, docs/per-5-tuple/v8-multi-sample.md
- Result: 12/12 tests green (was 10)
- Timestamp: 2026-05-12T07:02:16Z
- Action: Fix pgid capture race - capture os.getpgid(proc.pid) immediately after Popen before communicate() can reap the leader; use cached pgid in both _kill_process_group() calls.
- File(s): test/incus/fairness_multi_sample.py
- Result: 14/14 tests green
-
Timestamp: 2026-05-12T06:52:27Z
- Action: PR #1272 round-3 review follow-up — clarify the top-level guard comment to reference
iface_filter_active, and pin guard failure tests onexpected,non-starved, anddir_multsubstrings. - File(s):
userspace-dp/src/bin/fairness-eval.rs,userspace-dp/tests/fairness_eval_blackbox.rs
- Action: PR #1272 round-3 review follow-up — clarify the top-level guard comment to reference
-
Timestamp: 2026-05-12T06:29:24Z
- Action: Fix Harness guard failure message to print
expected_sumanddir_multalongsiden_non_starvedso operators can see the bidirectional expansion factor. Update block comment to correctly describemax(2, floor(10% × expected_sum))formula. - File(s):
userspace-dp/src/bin/fairness-eval.rs
- Action: Fix Harness guard failure message to print
-
Timestamp: 2026-05-12T06:29:24Z
- Action: Rename
guard_low_n_iface_input_accepts_p2_single_direction_recency_undercount→guard_low_n_iface_input_accepts_absolute_floor_p2_gap1; add inline math comment explaining why absolute floor (not recency) is the operative gate. Drop misleading "recency" claim from assertion messages. - File(s):
userspace-dp/tests/fairness_eval_blackbox.rs
- Action: Rename
-
Timestamp: 2026-05-13T20:22:00-07:00
- Action: Enable cross-NIC shared UMEM in the loss userspace HA config, add node-local Phase 0 artifacts, push artifacts during deploy when the config requests shared UMEM, surface shared-UMEM binding mode/role in userspace status, and document the perf/counter contract for copy-free validation.
- File(s):
docs/ha-cluster-userspace.conf,test/incus/cluster-setup.sh,test/incus/loss-userspace-shared-umem-phase0-node0.json,test/incus/loss-userspace-shared-umem-phase0-node1.json,pkg/dataplane/userspace/protocol.go,pkg/dataplane/userspace/statusfmt.go,pkg/dataplane/userspace/statusfmt_test.go,docs/shared-umem-plan.md,docs/userspace-perf-compare.md,_Log.md
-
Timestamp: 2026-05-13T20:44:00-07:00
- Action: Make cross-NIC shared-UMEM selection artifact-driven by default so the HA config no longer hardcodes interface names; add
selected_device_setas the generic artifact key while keepingselected_device_pairas a legacy alias. - File(s):
userspace-dp/src/afxdp/shared_umem.rs,docs/ha-cluster-userspace.conf,docs/shared-umem-plan.md,test/incus/loss-userspace-shared-umem-phase0-node0.json,test/incus/loss-userspace-shared-umem-phase0-node1.json,_Log.md
- Action: Make cross-NIC shared-UMEM selection artifact-driven by default so the HA config no longer hardcodes interface names; add
-
Timestamp: 2026-05-13T20:49:00-07:00
- Action: Make cross-NIC shared UMEM opportunistic by default: no config stanza or Phase 0 artifact is required for normal copy-free forwarding,
mode offremains the debug kill switch, and Phase 0 artifacts are audit-only instead of production gates. - File(s):
userspace-dp/src/afxdp/shared_umem.rs,docs/ha-cluster-userspace.conf,docs/shared-umem-plan.md,pkg/config/ast.go,pkg/config/types.go,test/incus/cluster-setup.sh,README.md,_Log.md
- Action: Make cross-NIC shared UMEM opportunistic by default: no config stanza or Phase 0 artifact is required for normal copy-free forwarding,
-
Timestamp: 2026-05-13T22:30:00-07:00
- Action: Close PR #1301 round-3 blockers: document the intentional PR #1297 contract change, restore Phase 0 as non-blocking runtime audit, retry failed shared-UMEM groups as private UMEM, publish fallback status through live binding snapshots, and route cancellable foreign-slot prepared recycles through the shared recycle queue when worker context is available.
- File(s):
userspace-dp/src/afxdp/shared_umem.rs,userspace-dp/src/afxdp/worker/mod.rs,userspace-dp/src/afxdp/umem/mod.rs,userspace-dp/src/afxdp/types/runtime.rs,userspace-dp/src/afxdp/tx/transmit.rs,userspace-dp/src/afxdp/tx/dispatch.rs,userspace-dp/src/afxdp/tx/drain.rs,userspace-dp/src/afxdp/session_glue/mod.rs,userspace-dp/src/afxdp/session_delta.rs,userspace-dp/src/afxdp/worker/lifecycle.rs,pkg/dataplane/userspace/statusfmt.go,pkg/dataplane/userspace/statusfmt_test.go,pkg/config/types.go,docs/shared-umem-plan.md,_Log.md - Validation:
cargo test --manifest-path userspace-dp/Cargo.toml shared_umem -- --nocapture;cargo test --manifest-path userspace-dp/Cargo.toml remember_prepared_recycle -- --nocapture;go test ./pkg/dataplane/userspace ./pkg/config
-
Timestamp: 2026-05-13T23:35:00-07:00
- Action: Close PR #1301 round-4 recycle-routing and mixed-mode safety blockers: thread the shared recycle accumulator through close-delta purge, pending TX bound/drop, CoS enqueue demotion, cross-binding prepared redirect, queue-service prepared rejection, neighbor retry, CoS runtime reset, and worker-shaped request paths; remove local-only prepared recycle exports; remove arbitrary-binding fallback for unknown shared recycle slots.
- File(s):
userspace-dp/src/afxdp/tx/transmit.rs,userspace-dp/src/afxdp/tx/mod.rs,userspace-dp/src/afxdp/tx/dispatch.rs,userspace-dp/src/afxdp/tx/drain.rs,userspace-dp/src/afxdp/tx/cos_classify.rs,userspace-dp/src/afxdp/tx/tcp_segmentation.rs,userspace-dp/src/afxdp/cos/cross_binding.rs,userspace-dp/src/afxdp/cos/queue_service/mod.rs,userspace-dp/src/afxdp/cos/queue_service/service.rs,userspace-dp/src/afxdp/cos/queue_service/drain.rs,userspace-dp/src/afxdp/session_glue/mod.rs,userspace-dp/src/afxdp/session_delta.rs,userspace-dp/src/afxdp/neighbor_dispatch.rs,userspace-dp/src/afxdp/worker/cos.rs,userspace-dp/src/afxdp/worker/lifecycle.rs,userspace-dp/src/afxdp/worker/mod.rs,userspace-dp/src/afxdp/tx/README.md,userspace-dp/src/afxdp/cos/README.md,docs/shared-umem-plan.md,_Log.md - Validation:
cargo test --manifest-path userspace-dp/Cargo.toml cancelled_prepared --no-run;cargo test --manifest-path userspace-dp/Cargo.toml shared_umem -- --nocapture;cargo test --manifest-path userspace-dp/Cargo.toml cancelled_prepared -- --nocapture;cargo test --manifest-path userspace-dp/Cargo.toml drain_exact_prepared -- --nocapture;cargo test --manifest-path userspace-dp/Cargo.toml demote_prepared -- --nocapture;go test ./pkg/dataplane/userspace ./pkg/config;git diff --check
-
Timestamp: 2026-05-14T18:55:40Z
- Action: #1307 minimal TX-error attribution: add
tx_shared_recycle_unknown_slot_dropsas a per-binding subset oftx_errorsfor shared-UMEM unknown-slot recycle drops, mirror it through Rust/Go status, and make the local fallbackTxError::Droppath incrementtx_submit_error_drops. - File(s):
userspace-dp/src/afxdp/umem/mod.rs,userspace-dp/src/afxdp/worker/mod.rs,userspace-dp/src/afxdp/coordinator/mod.rs,userspace-dp/src/afxdp/tx/dispatch.rs,userspace-dp/src/afxdp/session_glue/mod.rs,userspace-dp/src/afxdp/tx/drain.rs,userspace-dp/src/protocol.rs,pkg/dataplane/userspace/protocol.go,pkg/dataplane/userspace/statusfmt.go,userspace-dp/src/afxdp/tx/README.md,docs/shared-umem-plan.md,_Log.md
- Action: #1307 minimal TX-error attribution: add
-
Timestamp: 2026-05-14T06:05:00Z
- Action: Fix shared-UMEM live-status publication discovered during cluster smoke: the shared bind path now publishes the selected mode/group/role into
BindingLiveStatebefore worker refresh so status snapshots match the kernel bind result instead of reportingShared UMEM bindings: 0/N. - File(s):
userspace-dp/src/afxdp/worker/mod.rs,userspace-dp/src/afxdp/worker/README.md,_Log.md
- Action: Fix shared-UMEM live-status publication discovered during cluster smoke: the shared bind path now publishes the selected mode/group/role into
-
Timestamp: 2026-05-14T06:45:00Z
- Action: PR #1301 round-5 minor follow-up: add regression coverage for stale/wrong/unknown shared-recycle slot routing, increment
tx_errorswhen the all-bindings shared-recycle router drops an unknown slot, and downgrade the external IPv6mtrfinal-hop miss to a warning after the controlled IPv6 dataplane checks pass. - File(s):
userspace-dp/src/afxdp/tx/dispatch.rs,userspace-dp/src/afxdp/tx/dispatch_tests.rs,userspace-dp/src/afxdp/tx/README.md,docs/shared-umem-plan.md,scripts/userspace-ha-validation.sh,_Log.md
- Action: PR #1301 round-5 minor follow-up: add regression coverage for stale/wrong/unknown shared-recycle slot routing, increment
-
Timestamp: 2026-05-14T07:10:00Z
- Action: Harden the userspace HA smoke validator after live PR #1301 smoke: retry preferred-node failover while XSK liveness propagates into RG readiness, set the default throughput shape to
PARALLEL=6so the smoke covers the six-worker RSS set, and document the IPv6 external-mtrwarning semantics. - File(s):
scripts/userspace-ha-validation.sh,docs/userspace-ha-validation.md,.codex/skills/userspace-ha-validation/SKILL.md,_Log.md
- Action: Harden the userspace HA smoke validator after live PR #1301 smoke: retry preferred-node failover while XSK liveness propagates into RG readiness, set the default throughput shape to
-
Timestamp: 2026-05-14T07:35:00Z
- Action: PR #1301 round-6 minor follow-up: make the split-slice shared-recycle router use the same slot-resolution helper as the all-bindings cleanup path and add split-slice helper coverage for stale/unknown lookup behavior.
- File(s):
userspace-dp/src/afxdp/tx/dispatch.rs,userspace-dp/src/afxdp/tx/dispatch_tests.rs,userspace-dp/src/afxdp/tx/README.md,_Log.md
-
Timestamp: 2026-05-14T08:05:00Z
- Action: PR #1301 round-7 polish: aggregate unknown-slot shared-recycle stderr diagnostics to one bounded line per drain while preserving full
tx_errorsaccounting. - File(s):
userspace-dp/src/afxdp/tx/dispatch.rs,userspace-dp/src/afxdp/tx/README.md,_Log.md
- Action: PR #1301 round-7 polish: aggregate unknown-slot shared-recycle stderr diagnostics to one bounded line per drain while preserving full
-
Timestamp: 2026-05-14T09:45:00Z
- Action: PR #1311 round-3 review fixes (on top of remote
4de390d3which already rewrote the stress test with a quadratic schema and added afence(Acquire)to the reader). Sync stale file-level doc that claimed "All atomics use Relaxed" to mention the seqlock + reader fence; reword "60-120s window" doc nits acrossworker_runtime.rs,protocol.rs,pkg/dataplane/userspace/protocol.go, andpkg/dataplane/userspace/statusfmt.goto match the ~1 Hz publish cadence (~60-61s under normal cadence;WindowNScarries exact width); drop the "default Prometheus scrape interval" wording onWR_WINDOW_INTERVAL_NS; fix the stale1.5s CPU over 60scomment instatusfmt_test.goto45s CPU over 60s = 75%. Add anonzero_snapshots > 1_000guard at the end of the stress test so a broken reader returning all zeros can't silently pass (Codex round-3 ask not covered by4de390d3). - File(s):
userspace-dp/src/afxdp/worker_runtime.rs,userspace-dp/src/afxdp/worker_runtime_tests.rs,userspace-dp/src/protocol.rs,pkg/dataplane/userspace/protocol.go,pkg/dataplane/userspace/statusfmt.go,pkg/dataplane/userspace/statusfmt_test.go,_Log.md
- Action: PR #1311 round-3 review fixes (on top of remote
-
Timestamp: 2026-05-15T14:32:39-07:00
- Action: #1318 scoped CoS drain idle fix: gate
drain_shaped_txroot priming on runnable queues or due parked wake ticks, skipping timer-wheel advance and shared-root lease top-up for not-yet-serviceable parked roots while preserving due wake service. - File(s):
userspace-dp/src/afxdp/cos/queue_service/mod.rs,userspace-dp/src/afxdp/cos/queue_service/tests.rs,userspace-dp/src/afxdp/cos/tx_completion.rs,userspace-dp/src/afxdp/cos/tx_completion_tests.rs,userspace-dp/src/afxdp/cos/README.md,_Log.md - Validation:
cargo test --manifest-path userspace-dp/Cargo.toml drain_shaped_tx_ -- --nocapture;cargo test --manifest-path userspace-dp/Cargo.toml root_serviceability_tracks_parked_queue_wakeup_tick -- --nocapture;cargo test --manifest-path userspace-dp/Cargo.toml queue_service;cargo test --manifest-path userspace-dp/Cargo.toml tx_completion
- Action: #1318 scoped CoS drain idle fix: gate
-
Timestamp: 2026-05-14T20:30:00Z
- Action: #1319 Phase 1 + Phase 2 (schedulers subtree only): add typed-leaf schema infrastructure (
ValueTypeenum +ValueDesc+ValueExamples+Validatoroncmdtree.Node); add strict error-returning parsersparseBandwidthLimitStrict/parseBurstSizeLimitStrict/parseScaledDecimalUnitStrictalongside the legacy zero-return parsers; add stateless validators inpkg/config/schema_validators.go(ValidateRate,ValidateByteSize,ValidateInteger,ValidateEnum,ValidatePercent); declare per-leaf schema forclass-of-service schedulers <name> { ... }compiler-consumed leaves (transmit-rate,priority,buffer-size,surplus-sharing,equal-flow-enforcement); wirecmdtree.SchemaValidateintoconfigstore.compileTreesocommit checkrejectstransmit-rate asdwith a human-readable error before the legacy zero-return parser silently writes 0 bps. Surfaces?placeholder + examples for typed leaves. - File(s):
pkg/cmdtree/tree.go,pkg/cmdtree/schema_validate.go(new),pkg/cmdtree/tree_test.go,pkg/cmdtree/README.md,pkg/config/compiler_protocols.go,pkg/config/schema_validators.go(new),pkg/config/schema_validate_test.go(new),pkg/config/README.md,pkg/configstore/store.go,pkg/configstore/store_test.go,_Log.md - Validation:
GOCACHE=/dev/shm/cache GOTMPDIR=/dev/shm go build ./...(clean);go test ./pkg/cmdtree/... ./pkg/config/... ./pkg/configstore/...(PASS); fullgo test ./...(all packages PASS).
- Action: #1319 Phase 1 + Phase 2 (schedulers subtree only): add typed-leaf schema infrastructure (
-
Timestamp: 2026-05-15T22:33:24Z
- Action: PR #1320 round-1 review fixes: preserve split Junos-style
transmit-rate exact; make typed scheduler leaves fail closed on missing values and unknown trailing modifiers; reject sub-byte scheduler rates that compile to zero bytes/sec; alignbuffer-sizevalidation and help with the compiler's byte-size-only representation; remove unsupported scheduler-levelshaping-ratefrom the typed schema; update module docs to match the enforced schema. - File(s):
pkg/cmdtree/schema_validate.go,pkg/cmdtree/tree.go,pkg/cmdtree/README.md,pkg/config/schema_validators.go,pkg/config/schema_validate_test.go,pkg/config/README.md,_Log.md - Validation:
go test -count=1 ./pkg/config ./pkg/cmdtree ./pkg/configstore(PASS);go test -count=1 ./pkg/...(PASS);git diff --check(clean).
- Action: PR #1320 round-1 review fixes: preserve split Junos-style
-
Timestamp: 2026-05-16T14:17:38-07:00
- Action: #1373 Phase 0 documentation/audit update: refresh userspace dataplane gap blockers, add retirement notices to active docs and README entry points, and add an umbrella tracker for the staged eBPF dataplane retirement. Explicitly document that Phase 0 removes no BPF source.
- File(s):
docs/userspace-dataplane-gaps.md,docs/pr/1373-retire-ebpf-dataplane/plan.md,README.md,CLAUDE.md,docs/testing.md,docs/development-workflow.md,bpf/README.md,userspace-dp/README.md,pkg/dataplane/README.md,testing-docs/README.md,_Log.md
-
Timestamp: 2026-05-17T09:34:59-07:00
- Action: #1373 Phase 1 documentation migration: mark Rust AF_XDP userspace as the primary/default dataplane development and validation target, demote eBPF wording to legacy compatibility/regression context, and preserve explicit retirement blockers for #1374-#1381 without claiming unresolved gaps closed.
- File(s):
README.md,CLAUDE.md,docs/testing.md,docs/development-workflow.md,docs/test_env.md,docs/userspace-dataplane-gaps.md,docs/feature-gaps.md,docs/userspace-dataplane-architecture.md,docs/afxdp-packet-processing.md,docs/ha-cluster-test-plan.md,testing-docs/README.md,bpf/README.md,userspace-dp/README.md,pkg/dataplane/README.md,userspace-dp/src/afxdp/README.md,_Log.md