Add #1378 live policy scheduler evidence #1426
Pull request overview
This PR captures and commits live HA evidence artifacts for issue #1378 (policy schedulers in userspace dataplane) collected from the loss userspace HA cluster, and updates tracking docs (#1373/#1378) to mark the scheduler live-evidence gate as closed.
Changes:
- Adds the `evidence-1378-policy-scheduler-live-20260519/` artifact set (configs, counters, raw ip-link/XDP captures, failover/restore stdouts, README) accepted by `test/incus/policy_scheduler_validate.py --rule-id 'lan->wan/scheduled-allow'`.
- Updates the `plan-1378-policy-schedulers.md` closeout section and switches the documented `--rule-id` example from `trust->untrust/scheduled-allow` to `lan->wan/scheduled-allow`.
- Updates `userspace-dataplane-gaps.md`, `userspace-dataplane-architecture.md`, `1373-retire-ebpf-dataplane/README.md`, and `plan.md` to reflect that #1378 is closed for retirement purposes.
Reviewed changes
Copilot reviewed 53 out of 86 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| docs/userspace-dataplane-gaps.md | Marks #1378 as live HA evidence captured; removes it from remaining work narrative. |
| docs/userspace-dataplane-architecture.md | Reflects scheduler live artifact validation closeout. |
| docs/pr/1373-retire-ebpf-dataplane/README.md | Tracker table entry for #1378 set to closed. |
| docs/pr/1373-retire-ebpf-dataplane/plan.md | Plan table/narrative update reflecting #1378 closure. |
| docs/pr/1373-retire-ebpf-dataplane/plan-1378-policy-schedulers.md | Adds 2026-05-19 closeout note, updates rule-id example, adds validation slice. |
| .../evidence-1378-policy-scheduler-live-20260519/* | New evidence artifact set: configs, counters JSONs, raw link/XDP captures, failover/restore stdouts, README, run metadata. |
Claude r1 review on
Round-1 triple-review synthesis on
| Reviewer | Verdict |
|---|---|
| Claude | MERGE-READY |
| Codex | MERGE-READY |
| Gemini Pro 3 | MERGE-READY |
All three converge. Clean evidence-only PR.
Codex independent verification
"Scope: no production-code drift in `1fd17502`. The huge diff is docs-only. Evidence is +37507/-0, five non-evidence docs account for +31/-20. No production files or validator files changed."
"Validator contract passes against the new artifacts: Counters for `lan->wan/scheduled-allow` are active 5, rebuild 5, inactive 5, failover 20; bytes are 490/490/490/1876. That satisfies active > 0, rebuild >= active, inactive == rebuild, failover >= rebuild + 1. Missing-scheduler artifact shows commit-check rejection, not success."
"xdp_userspace evidence present: All status files and raw `*-entry-programs.json` show entries 4/5/6 = `xdp_userspace_p`."
"RG ownership flipped: `failover-cluster-status.txt` shows RG1/RG2 primary on `node1`; `failover-status.json` is from the node1 side (`ge-7-0-*`, peer `10.99.13.1`, RG1/RG2 active), while active/rebuild/inactive are node0 side (`ge-0-0-*`, peer `10.99.13.2`)."
"No validator goalpost movement: `test/incus/policy_scheduler_validate.py` and `policy_scheduler_validate_test.py` blob hashes are identical between `1fd17502^` and `1fd17502`. The docs changed the invoked `--rule-id` to match the live topology, but the validator config/code did not change."
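For readers who want to re-check the counter contract quoted in the verification above (active > 0, rebuild >= active, inactive == rebuild, failover >= rebuild + 1), here is a minimal sketch. It is not the code in `test/incus/policy_scheduler_validate.py`; the `PhaseCounters` type and `contract_violations` function are hypothetical names introduced only for illustration. Applying the same inequalities to the byte counters is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseCounters:
    """Counter value for one rule at each capture phase (hypothetical shape)."""
    active: int
    rebuild: int
    inactive: int
    failover: int


def contract_violations(c: PhaseCounters) -> list[str]:
    """Return one message per broken rule; an empty list means the contract holds."""
    violations = []
    if not c.active > 0:
        violations.append("active must be > 0")
    if not c.rebuild >= c.active:
        violations.append("rebuild must be >= active")
    if c.inactive != c.rebuild:
        violations.append("inactive must equal rebuild")
    if not c.failover >= c.rebuild + 1:
        violations.append("failover must be >= rebuild + 1")
    return violations


# Values quoted in the review for lan->wan/scheduled-allow.
packets = PhaseCounters(active=5, rebuild=5, inactive=5, failover=20)
# Assumption: the same inequalities are applied to bytes as to packets.
byte_counts = PhaseCounters(active=490, rebuild=490, inactive=490, failover=1876)

for name, counters in (("packets", packets), ("bytes", byte_counts)):
    problems = contract_violations(counters)
    print(name, "OK" if not problems else problems)
```

Returning a list of violations rather than a single boolean keeps a failing artifact set debuggable: a counter that moved while the rule was inactive reports only the `inactive must equal rebuild` message.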
Recommendation
Merge-ready. Closes the long-standing #1378 live evidence gap with proper RG-flip + xdp_userspace attachment + monotonically-increasing counters. Validator config unchanged.
Codex task: task-mpc366uv-rpgfzj. Gemini task: task-mpc36fbn-rnsnw8. Not merging — author's decision.
Capture the accepted loss userspace HA evidence set for #1378.

The artifacts cover active, rebuild, inactive, and post-failover status snapshots for lan->wan/scheduled-allow, plus a strict undefined-scheduler commit-check rejection. The failover status was captured from xpf-userspace-fw1 after RG1 and RG2 moved to node1. The scheduled-rule counter advances from 5 packets / 490 bytes before failover to 20 packets / 1876 bytes on the new owner, while inactive remains equal to rebuild.

Add a concise evidence README and update the #1373/#1378 trackers to mark policy schedulers closed for the live HA evidence gate. The lab restore artifacts show node0 primary again and no scheduler-policy residue in the final config.

Validation:
- python3 test/incus/policy_scheduler_validate.py docs/pr/1373-retire-ebpf-dataplane/evidence-1378-policy-scheduler-live-20260519 --rule-id 'lan->wan/scheduled-allow'
- git diff --check
- git diff --cached --check
1fd1750 to b1c1410
Summary
Validation
- `python3 test/incus/policy_scheduler_validate.py docs/pr/1373-retire-ebpf-dataplane/evidence-1378-policy-scheduler-live-20260519 --rule-id 'lan->wan/scheduled-allow'`
- `git diff --check`
- `git diff --cached --check`
- `git show -s --format=%B HEAD | awk 'length($0)>72 { print length($0) ":" $0 }'`
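The last validation command flags commit-message lines longer than 72 characters. As a rough Python equivalent of that awk filter, the sketch below takes the message text as a string (for example the output of `git show -s --format=%B HEAD`). The function name `overlong_lines` is hypothetical and not part of the repository.

```python
def overlong_lines(message: str, limit: int = 72) -> list[str]:
    """Return 'LENGTH:LINE' for every line longer than `limit` characters,
    mirroring awk's `length($0) > 72 { print length($0) ":" $0 }`."""
    return [
        f"{len(line)}:{line}"
        for line in message.splitlines()
        if len(line) > limit
    ]


# Illustrative message: the subject fits within the limit, the second body line does not.
sample = "Add #1378 live policy scheduler evidence\n\n" + "x" * 80
for hit in overlong_lines(sample):
    print(hit)
```

An empty result corresponds to the awk command printing nothing, which is the passing case.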