Merged
20 commits
850c618 plumb userspace policy scheduler inactive state (psaab, May 17, 2026)
572859f userspace: publish policy scheduler state (psaab, May 17, 2026)
9d369d9 userspace: serialize policy scheduler snapshots (psaab, May 17, 2026)
bd59177 userspace: fail closed scheduled policy drift (psaab, May 17, 2026)
94aa3f2 fix: keep scheduler fail-closed and surface protocol mismatch apply e… (Copilot, May 17, 2026)
bdd81d8 chore: tighten protocol error text and test clarity (Copilot, May 17, 2026)
0135c3a chore: restore go.mod classification drift (Copilot, May 17, 2026)
5fe6429 chore: finalize round-4 blocker validation (Copilot, May 17, 2026)
f4c38b0 userspace: avoid caching rejected scheduler snapshots (psaab, May 17, 2026)
f0c503d fix: cache deferred userspace snapshot only after map sync succeeds (Copilot, May 17, 2026)
363f92a chore: restore go.mod after unintended dependency flip (Copilot, May 17, 2026)
32000b8 userspace: harden policy scheduler fail-closed paths (psaab, May 17, 2026)
1074c62 Harden policy scheduler apply lifecycle (psaab, May 17, 2026)
33baa49 fix: keep degraded compile contract except protocol mismatch (Copilot, May 17, 2026)
29a0c27 Preserve degraded compile contract; fail closed only on userspace sch… (Copilot, May 17, 2026)
2fda579 daemon: pin policy scheduler apply edge cases (psaab, May 17, 2026)
e243e51 daemon: preserve scheduler state across aborting applies (psaab, May 17, 2026)
ffa694b fix: align scheduler rule slot updates for ebpf/dpdk (Copilot, May 17, 2026)
fb36b3e dataplane: schedule compiled policy slots (psaab, May 17, 2026)
3a79c99 dataplane: remove obsolete scheduler slot helper (Copilot, May 17, 2026)
12 changes: 12 additions & 0 deletions _Log.md
@@ -11,6 +11,18 @@
- **Action**: PR #1395 cleanup — reverted an unintended `go.mod` direct/indirect dependency classification change introduced by automated tooling so the round-4 fix stays scoped to three-color policer compiler logic/tests/docs.
- **File(s)**: `go.mod`, `_Log.md`

- **Timestamp**: 2026-05-17T05:12:00Z
- **Action**: Round-5 follow-up fix — in userspace pending-XSK-startup compile path, defer `lastSnapshot` cache update until ingress/local/NAT map sync succeeds so sync failures cannot poison cached snapshot state with an unpublished generation.
- **File(s)**: `pkg/dataplane/userspace/manager.go`, `_Log.md`

- **Timestamp**: 2026-05-17T05:16:00Z
- **Action**: Restored `go.mod` after an unintended direct/indirect dependency classification flip introduced by an automation-only progress update.
- **File(s)**: `go.mod`, `_Log.md`

- **Timestamp**: 2026-05-17T04:48:51Z
- **Action**: Re-restored `go.mod` after a subsequent tooling pass reintroduced the same direct/indirect dependency classification flip.
- **File(s)**: `go.mod`, `_Log.md`

- **Timestamp**: 2026-05-17T05:03:00Z
- **Action**: PR #1395 round-4 follow-up cleanup — moved three-color mode marker assignment outside repeated same-mode child loops to avoid redundant writes while preserving duplicate-sibling merge semantics.
- **File(s)**: `pkg/config/compiler_firewall.go`, `_Log.md`
37 changes: 29 additions & 8 deletions docs/pr/1373-retire-ebpf-dataplane/plan-1378-policy-schedulers.md
@@ -8,9 +8,9 @@ scheduled policy rules activate and deactivate correctly without the eBPF

## Dependencies

- #1381 must land first. `UpdatePolicyScheduleState` currently dispatches
through the embedded eBPF `DataPlane`; userspace needs either the split
interface or an explicit stub/snapshot branch.
- The safe slice no longer waits on #1381. The userspace manager now shadows
`UpdatePolicyScheduleState` and republishes a userspace snapshot instead of
falling through to the embedded eBPF manager.
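
Go method promotion is what makes this shadowing work: a method declared on the outer type hides the same-named method promoted from an embedded field, including when the value is called through an interface. A minimal sketch, where `ebpfDataPlane`, `userspaceManager`, `policyScheduleUpdater`, and the `map[string]bool` signature are all hypothetical:

```go
package main

import "fmt"

// ebpfDataPlane is a hypothetical stand-in for the embedded eBPF DataPlane.
type ebpfDataPlane struct{ calls []string }

func (d *ebpfDataPlane) UpdatePolicyScheduleState(active map[string]bool) error {
	d.calls = append(d.calls, "ebpf")
	return nil
}

// userspaceManager embeds the eBPF dataplane but declares its own
// UpdatePolicyScheduleState, which shadows the promoted method.
type userspaceManager struct {
	*ebpfDataPlane
	published []map[string]bool
}

func (m *userspaceManager) UpdatePolicyScheduleState(active map[string]bool) error {
	// Copy so later caller mutations cannot alter the published state.
	snap := make(map[string]bool, len(active))
	for name, on := range active {
		snap[name] = on
	}
	m.published = append(m.published, snap) // stands in for republishing a userspace snapshot
	return nil
}

// policyScheduleUpdater is a hypothetical interface the daemon might call through.
type policyScheduleUpdater interface {
	UpdatePolicyScheduleState(active map[string]bool) error
}

func main() {
	m := &userspaceManager{ebpfDataPlane: &ebpfDataPlane{}}
	var u policyScheduleUpdater = m
	_ = u.UpdatePolicyScheduleState(map[string]bool{"business-hours": true})
	fmt.Println(len(m.published), len(m.calls)) // 1 0
}
```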

## Design

@@ -20,6 +20,14 @@ identity must not depend on transient array position alone; use a config-driven
UUID if available or `(policy_set_id, policy_name, rule_name)`/equivalent
compiled identity.

Safe #1378 slice status: this change wires `rule_id`, `scheduler_name`, and
`inactive` through userspace policy snapshots and Rust policy evaluation. The
daemon reconciles the scheduler lifecycle on every committed config while
holding the apply semaphore; userspace snapshot rebuilds are seeded with that
same active-state map, and runtime scheduler ticks acquire the same semaphore
before publishing one coherent snapshot delta. Missing scheduler references are
compile errors.

On scheduler state changes, publish one atomic userspace snapshot delta that
contains the updated inactive bits for all affected rules. Do not issue
per-rule fast-path toggles because first-match ordering requires same-instant
@@ -31,9 +39,14 @@ existing sessions unless a separate `policy-rematch` feature is implemented.
That matches Junos default behavior: schedulers block new lookups, not existing
sessions.

Scheduler granularity is 60 seconds. Tests and docs must use deterministic
clock injection or windows that span multiple evaluator ticks; the earlier
30-second integration target is invalid.
Scheduler granularity is 60 seconds. The wall clock is used only by the Go
control-plane scheduler to decide the next active-state map; workers receive
booleans in the snapshot and never evaluate wall-clock time in the packet path.
The scheduler compares wall elapsed time with Go's monotonic elapsed time at
each evaluation. Backward wall-clock steps or drift beyond tolerance fail
closed for that evaluation by publishing all scheduler bits inactive.
Tests and docs must use deterministic scheduler inputs or windows that span
multiple evaluator ticks; the earlier 30-second integration target is invalid.

Missing scheduler references fail closed as commit errors. Do not copy the
existing eBPF behavior that can default missing scheduler state to active.
@@ -43,6 +56,13 @@ existing eBPF behavior that can default missing scheduler state to active.
- One inactive-branch per rule on miss path is acceptable; no scheduler clock
evaluation occurs in the packet worker.
- Snapshot publication is ArcSwap-atomic across all rule inactive bits.
- Snapshots carrying scheduler inactive bits require protocol version 2; the
Rust control server rejects older/unknown snapshot versions instead of
silently ignoring scheduling fields, and status exposes the helper's supported
snapshot protocol so new Go refuses to publish scheduled-policy snapshots to
an old helper before the fail-open path can occur. The refusal actively
disarms helper forwarding with `set_forwarding_state armed=false`; recording
a compile error while leaving the old helper armed is not fail-closed.
- Hit counters are keyed by stable rule identity outside rebuilt rule structs so
counters survive scheduler snapshot rebuilds.
- Do not copy the existing eBPF indexing bug in
@@ -64,8 +84,9 @@ existing eBPF behavior that can default missing scheduler state to active.
- Scheduler atomicity: first-match policy ordering requires affected inactive
bits to publish as one coherent snapshot. Per-rule toggles can expose an
impossible mixed policy state.
- Clock drift: scheduler state is daemon-clock derived. HA peers must recompute
after failover rather than trusting stale peer-local state.
- Clock drift: scheduler state is daemon-clock derived. The scheduler must
fail closed on wall-clock discontinuity, and HA peers must recompute after
failover rather than trusting stale peer-local state.
- Counter continuity: stable rule identity is mandatory because inactive flips
and snapshot rebuilds must not reset operator-visible hit counters.
- Missing scheduler references: fail-open behavior admits traffic outside the
2 changes: 1 addition & 1 deletion go.mod
@@ -9,6 +9,7 @@ require (
github.com/insomniacslk/dhcp v0.0.0-20251020182700-175e84fbb167
github.com/mdlayher/ndp v1.1.0
github.com/prometheus/client_golang v1.23.2
github.com/prometheus/client_model v0.6.2
github.com/vishvananda/netlink v1.3.1
golang.org/x/net v0.47.0
golang.org/x/sync v0.18.0
@@ -27,7 +28,6 @@ require (
github.com/mdlayher/socket v0.5.0 // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/pierrec/lz4/v4 v4.1.14 // indirect
github.com/prometheus/client_model v0.6.2 // indirect
github.com/prometheus/common v0.66.1 // indirect
github.com/prometheus/procfs v0.16.1 // indirect
github.com/u-root/uio v0.0.0-20230220225925-ffce2a382923 // indirect
40 changes: 39 additions & 1 deletion pkg/config/compiler.go
@@ -221,7 +221,6 @@ func compileExpanded(tree *ConfigTree) (*Config, error) {
if err := validateThreeColorPolicersStrict(cfg.Firewall.ThreeColorPolicers); err != nil {
return nil, err
}

if warnings := ValidateConfig(cfg); len(warnings) > 0 {
for _, w := range warnings {
cfg.Warnings = append(cfg.Warnings, w)
@@ -273,6 +272,37 @@ func validateThreeColorPolicersStrict(policers map[string]*ThreeColorPolicerConf
return nil
}

func validatePolicySchedulerReferencesStrict(cfg *Config) error {
if cfg == nil {
return nil
}
check := func(pol *Policy) error {
if pol == nil || pol.SchedulerName == "" {
return nil
}
if _, ok := cfg.Schedulers[pol.SchedulerName]; ok {
return nil
}
return fmt.Errorf("policy %q references undefined scheduler %q", pol.Name, pol.SchedulerName)
}
for _, zpp := range cfg.Security.Policies {
if zpp == nil {
continue
}
for _, pol := range zpp.Policies {
if err := check(pol); err != nil {
return err
}
}
}
for _, pol := range cfg.Security.GlobalPolicies {
if err := check(pol); err != nil {
return err
}
}
return nil
}

func validateClassOfServiceStrict(cos *ClassOfServiceConfig) error {
if cos == nil {
return nil
@@ -518,6 +548,14 @@ func ValidateConfig(cfg *Config) []string {
}
}
}
for _, p := range cfg.Security.GlobalPolicies {
if p.SchedulerName != "" {
if _, ok := cfg.Schedulers[p.SchedulerName]; !ok {
warnings = append(warnings, fmt.Sprintf(
"global policy %q: scheduler %q not defined", p.Name, p.SchedulerName))
}
}
}

// Validate routing-instance interface references
for _, ri := range cfg.RoutingInstances {
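
`validatePolicySchedulerReferencesStrict` returns at the first undefined reference. If a report of every bad reference is wanted, one alternative (not what this PR does) is to collect them with `errors.Join` (Go 1.20+). The types below are hypothetical minimal mirrors of the fields the validator reads, and the zone-pair scope prefix in the message is an assumption:

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical minimal mirrors of the config types the validator walks.
type Policy struct {
	Name          string
	SchedulerName string
}

type ZonePairPolicies struct {
	FromZone, ToZone string
	Policies         []*Policy
}

type Config struct {
	Schedulers map[string]struct{}
	Security   struct {
		Policies       []*ZonePairPolicies
		GlobalPolicies []*Policy
	}
}

// undefinedSchedulerRefs reports every undefined scheduler-name reference as
// one joined error instead of returning at the first one.
func undefinedSchedulerRefs(cfg *Config) error {
	if cfg == nil {
		return nil
	}
	var errs []error
	check := func(scope string, pol *Policy) {
		if pol == nil || pol.SchedulerName == "" {
			return
		}
		if _, ok := cfg.Schedulers[pol.SchedulerName]; !ok {
			errs = append(errs, fmt.Errorf("%s policy %q references undefined scheduler %q",
				scope, pol.Name, pol.SchedulerName))
		}
	}
	for _, zpp := range cfg.Security.Policies {
		if zpp == nil {
			continue
		}
		for _, pol := range zpp.Policies {
			check(fmt.Sprintf("from-zone %s to-zone %s", zpp.FromZone, zpp.ToZone), pol)
		}
	}
	for _, pol := range cfg.Security.GlobalPolicies {
		check("global", pol)
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	cfg := &Config{Schedulers: map[string]struct{}{"business-hours": {}}}
	cfg.Security.Policies = []*ZonePairPolicies{{
		FromZone: "trust", ToZone: "untrust",
		Policies: []*Policy{
			{Name: "sched-test", SchedulerName: "missing-sched"},
			{Name: "office", SchedulerName: "business-hours"},
		},
	}}
	cfg.Security.GlobalPolicies = []*Policy{{Name: "sched-global", SchedulerName: "missing-sched"}}
	fmt.Println(undefinedSchedulerRefs(cfg))
}
```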
63 changes: 56 additions & 7 deletions pkg/config/parser_ast_test.go
@@ -1473,7 +1473,6 @@ security {
policy sched-test {
match { source-address any; destination-address any; application any; }
then { permit; }
scheduler-name missing-sched;
}
}
}
@@ -1488,26 +1487,76 @@ security {
if err != nil {
t.Fatalf("CompileConfig: %v", err)
}
var foundIfaceWarn, foundPoolWarn, foundSchedWarn bool
var foundIfaceWarn, foundPoolWarn bool
for _, w := range cfg.Warnings {
if strings.Contains(w, "missing-iface") && strings.Contains(w, "not in interfaces") {
foundIfaceWarn = true
}
if strings.Contains(w, "missing-pool") && strings.Contains(w, "not defined") {
foundPoolWarn = true
}
if strings.Contains(w, "missing-sched") && strings.Contains(w, "not defined") {
foundSchedWarn = true
}
}
if !foundIfaceWarn {
t.Errorf("missing warning for zone referencing unconfigured interface, got: %v", cfg.Warnings)
}
if !foundPoolWarn {
t.Errorf("missing warning for SNAT referencing undefined pool, got: %v", cfg.Warnings)
}
if !foundSchedWarn {
t.Errorf("missing warning for policy referencing undefined scheduler, got: %v", cfg.Warnings)
}

func TestPolicySchedulerMissingReferenceWarns(t *testing.T) {
input := `security {
policies {
from-zone trust to-zone untrust {
policy sched-test {
match { source-address any; destination-address any; application any; }
then { permit; }
scheduler-name missing-sched;
}
}
}
}
`
parser := NewParser(input)
tree, errs := parser.Parse()
if len(errs) > 0 {
t.Fatalf("parse errors: %v", errs)
}
cfg, err := CompileConfig(tree)
if err != nil {
t.Fatalf("CompileConfig returned error for warning-only missing scheduler reference: %v", err)
}
warnings := strings.Join(cfg.Warnings, "\n")
if !strings.Contains(warnings, `policy "sched-test": scheduler "missing-sched" not defined`) {
t.Fatalf("CompileConfig warnings = %v, want missing scheduler warning", cfg.Warnings)
}
}

func TestGlobalPolicySchedulerMissingReferenceWarns(t *testing.T) {
input := `security {
policies {
global {
policy sched-global {
match { source-address any; destination-address any; application any; }
then { permit; }
scheduler-name missing-sched;
}
}
}
}
`
parser := NewParser(input)
tree, errs := parser.Parse()
if len(errs) > 0 {
t.Fatalf("parse errors: %v", errs)
}
cfg, err := CompileConfig(tree)
if err != nil {
t.Fatalf("CompileConfig returned error for warning-only missing global scheduler reference: %v", err)
}
warnings := strings.Join(cfg.Warnings, "\n")
if !strings.Contains(warnings, `global policy "sched-global": scheduler "missing-sched" not defined`) {
t.Fatalf("CompileConfig warnings = %v, want missing global scheduler warning", cfg.Warnings)
}
}

21 changes: 21 additions & 0 deletions pkg/daemon/compile_error_policy_test.go
@@ -0,0 +1,21 @@
package daemon

import (
"errors"
"fmt"
"testing"

dpuserspace "github.com/psaab/xpf/pkg/dataplane/userspace"
)

func TestCompileErrorMustAbortApply(t *testing.T) {
if !compileErrorMustAbortApply(dpuserspace.ErrPolicySchedulerProtocolIncompatible) {
t.Fatal("protocol incompatibility must abort apply")
}
if !compileErrorMustAbortApply(fmt.Errorf("wrapped: %w", dpuserspace.ErrPolicySchedulerProtocolIncompatible)) {
t.Fatal("wrapped protocol incompatibility must abort apply")
}
if compileErrorMustAbortApply(errors.New("compile failed for unrelated dataplane reason")) {
t.Fatal("non-protocol compile failures must not abort apply")
}
}
3 changes: 3 additions & 0 deletions pkg/daemon/daemon.go
@@ -79,6 +79,9 @@ type Daemon struct {
snmpAgent *snmp.Agent
lldpMgr *lldp.Manager
scheduler *scheduler.Scheduler
schedulerCancel context.CancelFunc
policySchedulerConfigHash [32]byte
policySchedulerEpoch atomic.Uint64
cluster *cluster.Manager
sessionSync *cluster.SessionSync
syncBulkPrimed atomic.Bool
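
The new `Daemon` fields suggest a change-detection and staleness guard for the scheduler loop. The sketch below shows one plausible way such fields could be used, not the PR's implementation: a SHA-256 digest of the scheduler definitions decides whether to restart the loop, and an epoch counter lets the daemon drop ticks computed under an older config. The `daemon` type, the `applySem` channel, and the string-spec map are hypothetical; Go 1.19+ is assumed for `atomic.Uint64`.

```go
package main

import (
	"context"
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
	"sync/atomic"
)

// daemon holds hypothetical copies of the three new fields plus an apply semaphore.
type daemon struct {
	schedulerCancel           context.CancelFunc
	policySchedulerConfigHash [32]byte
	policySchedulerEpoch      atomic.Uint64
	applySem                  chan struct{} // hypothetical apply semaphore (capacity 1)
	published                 []uint64
}

// hashSchedulers derives a stable digest of scheduler definitions; the
// name->spec map is a hypothetical simplification of the real config.
func hashSchedulers(specs map[string]string) [32]byte {
	names := make([]string, 0, len(specs))
	for n := range specs {
		names = append(names, n)
	}
	sort.Strings(names) // map iteration order is random; sort for a stable hash
	var b strings.Builder
	for _, n := range names {
		b.WriteString(n)
		b.WriteByte(0)
		b.WriteString(specs[n])
		b.WriteByte(0)
	}
	return sha256.Sum256([]byte(b.String()))
}

// reconcile runs on every committed config under the apply semaphore. It
// restarts the scheduler and bumps the epoch only when definitions changed.
func (d *daemon) reconcile(specs map[string]string) (restarted bool) {
	d.applySem <- struct{}{}
	defer func() { <-d.applySem }()
	h := hashSchedulers(specs)
	if h == d.policySchedulerConfigHash && d.schedulerCancel != nil {
		return false
	}
	if d.schedulerCancel != nil {
		d.schedulerCancel() // stop the loop built from the previous config
	}
	_, cancel := context.WithCancel(context.Background()) // the context would drive the tick goroutine
	d.schedulerCancel = cancel
	d.policySchedulerConfigHash = h
	d.policySchedulerEpoch.Add(1)
	return true
}

// onTick publishes only if the tick belongs to the current epoch, so a tick
// computed against an older scheduler config is discarded.
func (d *daemon) onTick(epoch uint64) bool {
	d.applySem <- struct{}{}
	defer func() { <-d.applySem }()
	if epoch != d.policySchedulerEpoch.Load() {
		return false
	}
	d.published = append(d.published, epoch)
	return true
}

func main() {
	d := &daemon{applySem: make(chan struct{}, 1)}
	fmt.Println(d.reconcile(map[string]string{"business-hours": "daily 09:00-17:00"})) // true
	stale := d.policySchedulerEpoch.Load()
	fmt.Println(d.reconcile(map[string]string{"business-hours": "daily 09:00-17:00"})) // false
	fmt.Println(d.reconcile(map[string]string{"business-hours": "daily 08:00-18:00"})) // true
	fmt.Println(d.onTick(stale), d.onTick(d.policySchedulerEpoch.Load()))              // false true
}
```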