13 changes: 7 additions & 6 deletions README.md
@@ -85,7 +85,7 @@ TC Egress: main -> screen_egress -> conntrack -> nat -> forward
| NAT64 (IPv6↔IPv4) | Yes | Yes |
| NPTv6 (RFC 6296) | Yes | Yes |
| Screen/IDS (11 checks) | Yes | Most checks yes; SYN-cookie behavior falls back |
| Firewall filters + policers | Yes | Filters yes; three-color policers still gated |
| Firewall filters + policers | Yes | Filters yes; three-color policers admitted for color-blind `then discard` slice, with remaining #1375 hardening still open |
| TCP MSS clamping | Yes | Yes |
| GRE tunnel transit | Yes | Yes (passthrough) |
| IPsec / XFRM | Yes | Yes (passthrough) |
@@ -97,11 +97,12 @@ TC Egress: main -> screen_egress -> conntrack -> nat -> forward

The userspace dataplane now covers most of the transit feature set in native
Rust, but it is not "fallback-free". Current explicit gates in code still
include SYN-cookie-dependent screen behavior, three-color policers, and port
mirroring. Pool-mode SNAT is admitted, and #1385 added userspace-v1
`address-persistent` selection; #1377 still owns per-pool `persistent-nat`
lease reuse, allocator/exhaustion counters, and the mixed-backend rollback
boundary. The exact admission boundary is documented in
include SYN-cookie-dependent screen behavior and port mirroring. Three-color
policers are admitted only for the bounded color-blind `then discard` runtime
slice while #1375 hardening remains. Pool-mode SNAT is admitted, and #1385
added userspace-v1 `address-persistent` selection; #1377 still owns per-pool
`persistent-nat` lease reuse, allocator/exhaustion counters, and the
mixed-backend rollback boundary. The exact admission boundary is documented in
[`docs/userspace-dataplane-gaps.md`](docs/userspace-dataplane-gaps.md).

## Architecture
22 changes: 22 additions & 0 deletions _Log.md
@@ -17,6 +17,28 @@
- **File(s)**: `userspace-dp/src/afxdp/umem/mod.rs`, `_Log.md`
- **Validation**: `go test ./pkg/dataplane/userspace`; `cargo test --manifest-path userspace-dp/Cargo.toml mirror::` (expected environment failure: missing libelf headers/pkg-config); `git diff --check`

- **Timestamp**: 2026-05-17T21:58:24Z
- **Action**: PR #1410 residual review follow-up — made emit-on-wire inject tuple identity an explicit Go/Rust control-wire contract, added helper status gating for mixed-version fail-closed behavior, and stopped Rust from synthesizing source tuple fields for emitted inject packets.
- **File(s)**: `pkg/dataplane/userspace/protocol.go`, `pkg/dataplane/userspace/inject.go`, `pkg/dataplane/userspace/inject_test.go`, `pkg/dataplane/userspace/protocol_test.go`, `pkg/cmdtree/tree.go`, `userspace-dp/src/protocol.rs`, `userspace-dp/src/server/lifecycle.rs`, `userspace-dp/src/server/README.md`, `userspace-dp/src/afxdp/coordinator/inject.rs`, `userspace-dp/src/afxdp/coordinator/tests.rs`, `userspace-dp/src/afxdp/frame/mod.rs`, `_Log.md`
- **Validation**: `go test ./pkg/dataplane/userspace ./pkg/cmdtree`; `cargo test --manifest-path userspace-dp/Cargo.toml inject_packet -- --nocapture`; `cargo test --manifest-path userspace-dp/Cargo.toml injected_packet -- --nocapture`; `cargo check --manifest-path userspace-dp/Cargo.toml`; `git diff --check`

- **Timestamp**: 2026-05-17T21:39:00Z
- **Action**: PR #1410 review follow-up — reconciled README userspace capability wording so three-color policers are described as partially admitted (color-blind `then discard` slice) rather than fully gated, matching current userspace capability documentation and runtime admission behavior.
- **File(s)**: `README.md`, `_Log.md`

- **Timestamp**: 2026-05-17T21:23:55Z
- **Action**: PR #1410 round-3 blocker follow-up — added explicit pending-forward CoS resolution state so resolved `None`/`None` selections are not metered again, carried metadata-derived ICMP flow keys through local and embedded ICMP prebuilt-forward paths, stamped emitted inject packets with synthetic ICMP tuples before TX selection, and preserved local tunnel tuple metadata through TX.
- **File(s)**: `userspace-dp/src/afxdp/types/tx.rs`, `userspace-dp/src/afxdp/forward_request.rs`, `userspace-dp/src/afxdp/tx/dispatch.rs`, `userspace-dp/src/afxdp/icmp.rs`, `userspace-dp/src/afxdp/poll_descriptor.rs`, `userspace-dp/src/afxdp/coordinator/inject.rs`, `userspace-dp/src/afxdp/tunnel.rs`, `userspace-dp/src/afxdp/tx/dispatch_tests.rs`, `userspace-dp/src/afxdp/tests.rs`, `userspace-dp/src/afxdp/frame/tests.rs`, `userspace-dp/src/afxdp/coordinator/tests.rs`, `_Log.md`
- **Validation**: `cargo test --manifest-path userspace-dp/Cargo.toml build_local_time_exceeded_request -- --nocapture`; `cargo test --manifest-path userspace-dp/Cargo.toml pending_forward_cos_resolution -- --nocapture`; `cargo test --manifest-path userspace-dp/Cargo.toml stamp_injected_packet_tuple -- --nocapture`; `cargo test --manifest-path userspace-dp/Cargo.toml build_live_forward_request_marks_empty_cos_selection_resolved -- --nocapture`; `cargo test --manifest-path userspace-dp/Cargo.toml build_live_forward_request_meters_non_l4_metadata_flow -- --nocapture`; `cargo test --manifest-path userspace-dp/Cargo.toml three_color -- --nocapture`; `cargo check --manifest-path userspace-dp/Cargo.toml`; `git diff --check`

- **Timestamp**: 2026-05-17T20:30:00Z
- **Action**: PR #1410 round-1 review follow-up — removed flow-cache hit TX-selection cloning from the packet fast path, switched local ICMP/tunnel/control-packet CoS resolution to timestamped `_at` evaluation with flow-key fallback, and enforced `cos.drop` handling on those paths so three-color policer drops are not bypassed when metadata-only classification is used.
- **File(s)**: `userspace-dp/src/afxdp/poll_descriptor.rs`, `userspace-dp/src/afxdp/icmp.rs`, `userspace-dp/src/afxdp/tunnel.rs`, `userspace-dp/src/afxdp/coordinator/inject.rs`, `_Log.md`

- **Timestamp**: 2026-05-17T20:37:00Z
- **Action**: Addressed post-validation review nit by lazily constructing cached precomputed TX-selection descriptors only on flow-cache fallback forwarding, avoiding unnecessary per-hit descriptor construction on successful in-place TX hits.
- **File(s)**: `userspace-dp/src/afxdp/poll_descriptor.rs`, `_Log.md`

- **Timestamp**: 2026-05-17T15:28:13Z
- **Action**: PR #1397 follow-up — fixed mouse-latency diagnostics review findings by making `cwnd_settle_ok` tri-state in manifests (unknown/true/false), correcting cwnd byte-unit parsing to 1024-based `K/M/G/TBytes`, recording probe phase timings even on failed/timed-out connect/drain/read attempts, tightening fairness-regimes settle-evidence wording, and extending unit coverage for settle-diagnostics CLI output/status and failure-phase timing counts.
- **File(s)**: `test/incus/test-mouse-latency.sh`, `test/incus/mouse_latency_orchestrate.py`, `test/incus/mouse_latency_orchestrate_test.py`, `test/incus/mouse_latency_probe.py`, `test/incus/mouse_latency_probe_test.py`, `test/incus/test_mouse_latency_shell_test.py`, `docs/fairness-regimes.md`, `_Log.md`
2 changes: 1 addition & 1 deletion cmd/cli/request.go
@@ -177,7 +177,7 @@ func (c *ctl) handleRequestChassisClusterDataPlane(args []string) error {
return err
}
action = fmt.Sprintf("userspace-inject:%d:%s", slot, mode)
target = extra["destination-ip"]
target = dpuserspace.EncodeInjectPacketTarget(extra)
case len(args) > 0 && args[0] == "forwarding":
armed, err := dpuserspace.ParseForwardingCommand(args)
if err != nil {
@@ -5,6 +5,37 @@
Add userspace support for Junos three-color policers so configs under
`firewall three-color-policer` no longer require the eBPF dataplane.

## Current Status

The bounded runtime slice is implemented after #1395:

- Rust compiles three-color policer snapshots into stable name-sorted runtime
IDs and links filter terms to shared runtime handles.
- Live forwarding-path TX selection meters srTCM/trTCM policers, applies red
drops for `then discard`, and records green/yellow/red/drop packet and byte
counters.
- Flow-cache hits carry cached policer handles and meter them before cached
forwarding.
- Rust status, Go protocol, status formatting, and Prometheus expose
per-color/drop counters.
- `deriveUserspaceCapabilities()` admits the current color-blind `then
discard` runtime slice for `firewall three-color-policer` configs.
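
The name-sorted runtime ID assignment described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `PolicerId` and `assign_runtime_ids` are hypothetical names, and the real compiler may carry more state per policer.

```rust
use std::collections::BTreeMap;

/// Hypothetical runtime ID: an index into a name-sorted policer table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PolicerId(u32);

/// Assigns IDs by sorted policer name, so the same set of names always
/// yields the same IDs regardless of the order the snapshot lists them in.
fn assign_runtime_ids(names: &[&str]) -> BTreeMap<String, PolicerId> {
    let mut sorted: Vec<&str> = names.to_vec();
    sorted.sort_unstable();
    // Duplicate names collapse to one logical policer.
    sorted.dedup();
    sorted
        .into_iter()
        .enumerate()
        .map(|(idx, name)| (name.to_string(), PolicerId(idx as u32)))
        .collect()
}
```

Filter terms that reference the same policer name would then resolve to the same `PolicerId`, which is one way to share a single runtime handle between terms.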

Remaining #1375 work is validation and hardening rather than admission:

- Color-aware inherited-color handling remains fail-closed until packet
metadata carries trusted incoming color end-to-end. This avoids silently
promoting yellow/red traffic to green.
- Replace the per-policer mutex runtime with the approved sharded or packed
atomic state if throughput testing shows contention.
- Preserve counters and token state across snapshot rebuilds if operator
continuity is required for #1373 removal.
- Wire non-drop per-color actions, especially loss-priority propagation, into
the downstream forwarding/CoS path. Until then, non-`discard` three-color
actions remain fail-closed.
- Run integration traffic, failover, and performance evidence for
green/yellow/red classification and red drops.

## Dependencies

- #1381 should land first so userspace capability removal and snapshot delivery
@@ -16,7 +47,9 @@ Add userspace support for Junos three-color policers so configs under

Extend the userspace policer snapshot and Rust types with srTCM, trTCM,
`color_blind`, color-aware input handling, and per-color actions for DSCP
rewrite plus red drop/count behavior.
rewrite plus red drop/count behavior. The current runtime enables only the
subset with enforceable semantics: color-blind metering and red drop/count
for `then discard`.

Use `u128` token refill math with `monotonic_nanos`. Reject invalid config at
compile/commit time: zero rate, zero burst, `PIR < CIR`, `PBS < CBS`, and
@@ -30,17 +63,21 @@ requires both C and P tokens, yellow requires only P tokens, red otherwise.

Color-aware mode must respect incoming color and never promote packets above
their incoming color. Color-blind mode evaluates each packet without inherited
color.
color. Until inherited color is carried in trusted packet metadata, userspace
must reject color-aware three-color policers rather than defaulting every
packet to green.
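
A minimal sketch of the color-blind two-rate path, assuming rates in bytes per second and bursts in bytes, may help make the refill and classification rules concrete. `TrTcm`, `ConfigError`, and `meter` are illustrative names rather than the types in this PR, `now_ns` stands in for a `monotonic_nanos` reading, and only the four validation rules visible above are checked.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Color { Green, Yellow, Red }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ConfigError { ZeroRate, ZeroBurst, PirBelowCir, PbsBelowCbs }

const NANOS_PER_SEC: u128 = 1_000_000_000;

/// Hypothetical color-blind two-rate meter. Token levels are kept in
/// byte-nanoseconds (bytes * 1e9) so refill needs no division and no `f64`.
#[derive(Debug)]
struct TrTcm {
    cir: u128,
    pir: u128,
    c_cap: u128,
    p_cap: u128,
    c_tokens: u128,
    p_tokens: u128,
    last_ns: u64,
}

impl TrTcm {
    /// Rejects the invalid shapes named in the plan before any state exists.
    fn new(cir: u64, cbs: u64, pir: u64, pbs: u64, now_ns: u64) -> Result<Self, ConfigError> {
        if cir == 0 || pir == 0 { return Err(ConfigError::ZeroRate); }
        if cbs == 0 || pbs == 0 { return Err(ConfigError::ZeroBurst); }
        if pir < cir { return Err(ConfigError::PirBelowCir); }
        if pbs < cbs { return Err(ConfigError::PbsBelowCbs); }
        let c_cap = cbs as u128 * NANOS_PER_SEC;
        let p_cap = pbs as u128 * NANOS_PER_SEC;
        // Both buckets start full.
        Ok(Self { cir: cir as u128, pir: pir as u128, c_cap, p_cap,
                  c_tokens: c_cap, p_tokens: p_cap, last_ns: now_ns })
    }

    /// Refills both buckets, then colors the packet without looking at any
    /// inherited color. A `then discard` action would drop on `Color::Red`.
    fn meter(&mut self, len_bytes: u32, now_ns: u64) -> Color {
        let elapsed = now_ns.saturating_sub(self.last_ns) as u128;
        self.last_ns = self.last_ns.max(now_ns);
        self.c_tokens = self.c_tokens
            .saturating_add(elapsed.saturating_mul(self.cir))
            .min(self.c_cap);
        self.p_tokens = self.p_tokens
            .saturating_add(elapsed.saturating_mul(self.pir))
            .min(self.p_cap);

        let cost = len_bytes as u128 * NANOS_PER_SEC;
        if self.p_tokens < cost {
            Color::Red
        } else if self.c_tokens < cost {
            // Yellow consumes only P tokens.
            self.p_tokens -= cost;
            Color::Yellow
        } else {
            // Green consumes both C and P tokens.
            self.c_tokens -= cost;
            self.p_tokens -= cost;
            Color::Green
        }
    }
}
```

Scaling the token level instead of dividing the refill keeps sub-byte remainders from being lost between packets, at the cost of wider integers, which is the kind of trade-off the `u128` requirement allows.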

## Hot-Path Invariants

- Flow-cache hits still execute the policer before forwarding.
- No `f64` token math in the dataplane.
- No `FxHashMap<String, PolicerState>` mutable hot-path lookup as the final
production model; use stable rule IDs with sharded or packed atomic state.
- No `FxHashMap<String, PolicerState>` mutable hot-path lookup. The current
runtime uses stable name-sorted IDs with shared handles; sharded or packed
atomic state remains the scaling follow-up.
- Per-color DSCP rewrite and red drop decisions happen in the same forwarding
decision that accounts tokens.
- Per-color counters are updated without central hot atomics.
- Per-color counters are attached to the stable policer runtime. The current
counters use relaxed atomics per logical policer/color.
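
As a rough illustration of the last invariant, per-color counters with relaxed atomics could look like the sketch below. `ColorCounters`, `record`, and `snapshot` are hypothetical names, and the layout is an assumption rather than the runtime's actual structure.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Debug, Clone, Copy)]
enum Color { Green = 0, Yellow = 1, Red = 2 }

/// Hypothetical counter block owned by one logical policer.
#[derive(Default)]
struct ColorCounters {
    packets: [AtomicU64; 3],
    bytes: [AtomicU64; 3],
    drop_packets: AtomicU64,
    drop_bytes: AtomicU64,
}

/// Plain-value copy of the counters for status reporting.
#[derive(Debug, PartialEq, Eq)]
struct CounterSnapshot {
    packets: [u64; 3],
    bytes: [u64; 3],
    drop_packets: u64,
    drop_bytes: u64,
}

impl ColorCounters {
    /// Records one metered packet. Relaxed ordering is enough here because
    /// the counters are statistics and do not guard any other memory.
    fn record(&self, color: Color, len_bytes: u64, dropped: bool) {
        let idx = color as usize;
        self.packets[idx].fetch_add(1, Ordering::Relaxed);
        self.bytes[idx].fetch_add(len_bytes, Ordering::Relaxed);
        if dropped {
            self.drop_packets.fetch_add(1, Ordering::Relaxed);
            self.drop_bytes.fetch_add(len_bytes, Ordering::Relaxed);
        }
    }

    /// Reads each counter independently; the result is not an atomic
    /// cross-counter snapshot, which is normally acceptable for status output.
    fn snapshot(&self) -> CounterSnapshot {
        let load = |a: &AtomicU64| a.load(Ordering::Relaxed);
        CounterSnapshot {
            packets: [load(&self.packets[0]), load(&self.packets[1]), load(&self.packets[2])],
            bytes: [load(&self.bytes[0]), load(&self.bytes[1]), load(&self.bytes[2])],
            drop_packets: load(&self.drop_packets),
            drop_bytes: load(&self.drop_bytes),
        }
    }
}
```

Counters laid out this way still share cache lines across workers, which is why sharded or packed state remains a reasonable follow-up if throughput testing shows contention.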

## State and HA Behavior

@@ -49,8 +86,8 @@ color.
adds token sync.
- Config snapshots carry stable policer/rule identity so counters can survive
snapshot rebuilds where practical.
- Status exposes green/yellow/red packet and byte counters, DSCP rewrites, and
red drops through Rust status, Go protocol, CLI, and Prometheus.
- Status exposes green/yellow/red packet and byte counters plus red drops
through Rust status, Go protocol, CLI, and Prometheus.

## Risks

@@ -61,9 +98,9 @@ color.
preserving one logical bucket per configured policer identity.
- Color semantics: color-aware mode must never promote incoming yellow/red
traffic; one wrong branch turns a security control into a bandwidth grant.
- Counter attribution: green/yellow/red/drop counters must survive snapshot
rebuilds by stable identity, or operators cannot audit policer behavior after
commits.
- Counter attribution: green/yellow/red/drop counters are stable inside a
compiled runtime. Carrying them across snapshot rebuilds remains a follow-up
if operators need continuity across commits.

## Exact Tests

@@ -74,14 +111,16 @@ color.
- Cargo: `policer::color_blind_ignores_incoming_color`.
- Cargo: `policer::u128_bucket_math_boundary_inputs`.
- Cargo: `policer::three_color_dscp_rewrite`.
- Cargo: `policer::flow_cache_hits_run_policer`.
- Cargo: `filter::tests::three_color_runtime_ids_and_miss_path_counters_are_stable`.
- Cargo: `filter::tests::flow_cache_hits_run_three_color_policer`.
- Go: userspace snapshot round-trip for three-color policer fields, per-color
actions, and `ColorBlind`.
- Go: compiler validation rejects zero rates/bursts, `PIR < CIR`, and
`PBS < CBS`.
- Go: `deriveUserspaceCapabilities()` admits three-color policer configs only
after the userspace snapshot and Rust runtime support are wired, and rejects
them before that point.
- Go: `deriveUserspaceCapabilities()` admits three-color policer configs after
the userspace snapshot and Rust runtime support are wired.
- Go: ProcessStatus, status formatting, and Prometheus tests cover
three-color per-color/drop counters.
- Integration: controlled-rate traffic against userspace cluster verifies
green/yellow/red classification, DSCP rewrite, red drop behavior, and
per-color counters.
23 changes: 13 additions & 10 deletions docs/userspace-dataplane-gaps.md
@@ -35,6 +35,7 @@ These capabilities exist in the current Rust userspace dataplane code path:
| NPTv6 | Implemented | Stateless prefix translation |
| Firewall filters | Implemented | Filter snapshots and evaluation in Rust |
| Flow export | Implemented | Userspace flow export snapshot and runtime |
| Three-color policers | Implemented with caveats | srTCM/trTCM runtime, forwarding-path and flow-cache-hit metering, red drops for `then discard`, status/CLI/Prometheus counters. Sharded state, cross-snapshot continuity, non-drop color actions, and integration evidence remain #1375 follow-up work. |
| TCP MSS clamping | Implemented | Flow snapshot fields are delivered and used in Rust |
| Embedded ICMP NAT reversal | Implemented | Includes reverse-session repair paths |
| Configurable session timeouts | Implemented | Snapshot-driven timeouts in `session.rs` |
@@ -52,7 +53,6 @@ These are the remaining explicit configuration gates in
|----------------------|-------------|--------------------|
| Unsupported policy shapes | Gated | Address/application expansion must succeed for userspace |
| Screen behavior requiring SYN cookies | Gated; userspace screen runtime has fail-closed cookie challenge/ACK-validation/cache scaffolding, but no HA key publication or SYN-ACK/RST TX yet | #1374 |
| Three-color policers | Gated | #1375 |
| Port mirroring | Gated; partial runtime | #1376 still needs full path coverage and integration evidence before the gate is removed |

Port mirroring now has snapshot/wire plumbing plus a bounded forwarded-path
@@ -87,7 +87,7 @@ The current #1373 audit produced these tracked blockers:
| #1378 | Finish the policy-scheduler retirement contract after #1396 userspace propagation: hit-counter survival across scheduler snapshot rebuilds and strict missing-scheduler commit behavior landed in the 2026-05-17 closeout slice; remaining blocker is integration/failover validation evidence | Phase 4 BPF source removal |
| #1379 | Emit policy-deny, screen-drop, and filter-log dataplane events from userspace | Phase 4 BPF source removal |
| #1374 | Implement userspace SYN-cookie flood protection or an approved equivalent. #1393 and the 2026-05-17 runtime slice cover deterministic cookie codec/layout, snapshot propagation, fail-closed screen challenge selection, session-miss ACK validation, and a bounded validated-client cache. Lower-layer coverage in `userspace-dp/src/screen_tests.rs` pins 4-way validated-client cache replacement; poll-stage tests only pin the operational invalid-ACK drop/bypass semantics. Remaining: validated-client cache expiration semantics, secret-epoch rotation, bounded SYN-ACK TX, ACK RST emission, HA-safe secret publication/cache survivability, counters/status, integration/failover validation, and userspace capability gate removal. | Phase 4 BPF source removal |
| #1375 | Implement userspace RFC 2697/2698 three-color policers | Phase 4 BPF source removal |
| #1375 | Finish userspace RFC 2697/2698 three-color policer hardening: sharded/packed state decision, cross-snapshot counter continuity decision, non-drop color action handling, and integration/failover/performance evidence | Phase 4 BPF source removal |
| #1376 | Implement userspace port mirroring or explicitly retire the feature | Phase 4 BPF source removal |
| #1380 | Retire the remaining BPF-map-oriented `show system buffers` operator surface. Userspace now renders the bounded helper status that exists; only optional new helper capacity denominators for session-table / flow-cache / neighbor-cache fill remain undecided. | Phase 5 CLI / observability cleanup |

@@ -101,11 +101,11 @@ Recommended dependency order:
userspace-v1 selector plus mixed-backend rollback boundary, but per-pool
`persistent-nat` and allocator exhaustion counters remain #1377 runtime
gaps. #1378 is no longer missing basic userspace propagation after #1396,
and the 2026-05-17 closeout slice added counter continuity plus strict
missing-scheduler commit rejection; keep it open for the remaining
integration/HA failover evidence.
3. #1374, #1375, and #1376 before Phase 4, because these are explicit feature
gaps currently protected by the legacy eBPF fallback.
but its remaining counter/validation/evidence contract still blocks BPF
source removal.
3. #1374 and #1376 before Phase 4, because these are explicit feature gaps
currently protected by the legacy eBPF fallback. Keep #1375 on the Phase 4
list for validation and hardening evidence, not as a capability gate.
4. #1380 in Phase 5, after the dataplane boundary is settled but before the
remaining operator-facing BPF map surface disappears.

@@ -149,9 +149,12 @@ The highest-value remaining work on current `master` is:
2. fix #1377 and #1379 to remove silent correctness and visibility
regressions; keep #1385 plus the userspace-v1 fixtures as evidence of the
current AF_XDP SNAT pool selector, not full persistent-NAT parity. Keep
#1378 open only for the remaining policy-scheduler integration/HA failover
evidence.
3. close #1374, #1375, and #1376 before any BPF source removal
#1378 open for the remaining policy-scheduler counter/validation/evidence
contract after #1396.
3. close #1374 and #1376 before any BPF source removal, and finish the #1375
hardening/evidence checklist. The three-color capability gate is removed
only for the current color-blind `then discard` slice; color-aware and
non-drop treatments stay fail-closed.
4. carry the narrowed #1380 denominator decision into Phase 5; the current
userspace command already avoids BPF-map fallback when helper status is
available