Emit non-PBR userspace filter log events #1428
Complete the next #1379 event slice by carrying filter-log identity through the Rust filter engine and TX selection path. Emit non-PBR input, output, cached output, and lo0 filter logs without double-emitting PBR routing-instance terms. Policy deny events now use the numeric userspace snapshot policy ID, matching the compiled policy slot instead of losing rule identity. Add deterministic Go syslog fanout coverage for raw userspace RT_FLOW policy, screen, and filter frames. Update the #1373/#1379 docs with the implemented call sites and the remaining live-cluster evidence gap.
Pull request overview
Adds non-PBR userspace filter-log emission and stable compiled policy IDs to the userspace AF_XDP dataplane, extending the existing event-stream wiring so RT_FLOW filter-log frames now cover non-PBR input filters, live/cached output filters, and lo0/local-delivery filters, and so RT_FLOW policy-deny frames carry the snapshot's numeric policy ID. Adds deterministic Go syslog fanout coverage for raw userspace event-stream frames and updates the #1379/#1373 plan/docs.
Changes:
- Carry compiled filter ID, term ID, and action through filter evaluation, live TX selection, and cached TX descriptors; emit RT_FLOW filter-log events at non-PBR input, live/cached output, and lo0/local-delivery sites in `poll_descriptor.rs` and `forward_request.rs` (the non-PBR helper skips routing-instance terms to avoid double emit against the PBR path).
- Plumb `policy_id` through `PolicyRuleSnapshot`, `PolicyRule`, and a new `evaluate_policy_result_with_len` so `emit_policy_deny_event` carries a stable compiled ID; userspace snapshot builds the ID with `policySetID * MaxRulesPerPolicy + ruleIndex` and accounts for per-application expansion.
- Add a deterministic UDP syslog fanout test that feeds raw policy-deny, screen-drop, and filter-log frames through `EventReader.ProcessRawEvent`, plus docs/README updates describing the new call sites and remaining live-evidence work.
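The compiled-ID formula named above, `policySetID * MaxRulesPerPolicy + ruleIndex`, can be sketched as follows. This is a minimal illustration, not the code in `snapshot.go`: the constant value `256`, the function names, and the bounds checks are assumptions, and the per-application expansion stride is deliberately left out.

```rust
/// Hypothetical cap on rules per policy set. The real `MaxRulesPerPolicy`
/// value is not shown in this PR; 256 is an assumption for illustration.
pub const MAX_RULES_PER_POLICY: u32 = 256;

/// Compute a compiled policy ID as `policy_set_id * MAX_RULES_PER_POLICY + rule_index`.
/// Returns `None` when the rule index would spill into the next policy set's
/// ID range, or when the arithmetic would overflow `u32`.
pub fn compiled_policy_id(policy_set_id: u32, rule_index: u32) -> Option<u32> {
    if rule_index >= MAX_RULES_PER_POLICY {
        return None;
    }
    policy_set_id
        .checked_mul(MAX_RULES_PER_POLICY)?
        .checked_add(rule_index)
}

/// Inverse mapping: recover `(policy_set_id, rule_index)` from a compiled ID.
pub fn split_policy_id(policy_id: u32) -> (u32, u32) {
    (
        policy_id / MAX_RULES_PER_POLICY,
        policy_id % MAX_RULES_PER_POLICY,
    )
}
```

Because each policy set owns a disjoint block of `MAX_RULES_PER_POLICY` IDs, the ID is stable as long as the set ID and rule position are stable, which is what lets a policy-deny event be mapped back to its rule.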
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| userspace-dp/src/policy.rs | Adds PolicyEvaluationResult and a _result_with_len evaluator that returns policy_id alongside the action. |
| userspace-dp/src/policy_tests.rs | Adds a test that asserts the new evaluator returns the snapshot's policy_id. |
| userspace-dp/src/protocol.rs | Adds policy_id to PolicyRuleSnapshot with default/round-trip coverage. |
| userspace-dp/src/filter/mod.rs | Adds FilterLogMatch, has_log_terms, and log-match fields on filter results / cached TX descriptors. |
| userspace-dp/src/filter/compiler.rs | Sets has_log_terms and includes log-only filters in fast-path indexes. |
| userspace-dp/src/filter/engine.rs | Adds evaluate_lo0_filter_log_match / evaluate_interface_filter_log_match and threads log matches into TX-selection paths. |
| userspace-dp/src/filter/tests.rs | Covers the new log-match helper, including skip-routing-instance behavior. |
| userspace-dp/src/filter/README.md | Documents non-PBR/lo0/output filter-log behavior and ID stability. |
| userspace-dp/src/afxdp/poll_descriptor.rs | Wires non-PBR input, cached-output, and lo0 filter-log emissions and passes policy_id into policy-deny events. |
| userspace-dp/src/afxdp/forward_request.rs | Adds an event-stream parameter; emits live output filter-log events from the live TX-selection path. |
| userspace-dp/src/afxdp/flow_cache.rs | Adds filter_log to CachedTxSelectionDescriptor. |
| userspace-dp/src/afxdp/tx/cos_classify.rs | Plumbs filter_log through live and cached CoS selection. |
| userspace-dp/src/afxdp/event_emit.rs | emit_policy_deny_event now takes and emits a policy_id. |
| userspace-dp/src/afxdp/forwarding/tests.rs | Asserts policy-deny event carries the new policy_id. |
| userspace-dp/src/afxdp/frame/tests.rs | Adds output-filter-log emission test for the live request builder. |
| userspace-dp/src/afxdp/tests.rs / umem/tests.rs | Updates fixtures for the new optional filter_log / event-stream arguments. |
| userspace-dp/src/afxdp/README.md | Documents the broader RT_FLOW filter-log producer scope. |
| pkg/dataplane/userspace/protocol.go | Adds PolicyID to the Go PolicyRuleSnapshot. |
| pkg/dataplane/userspace/snapshot.go | Computes stable compiled PolicyIDs (with application-expansion stride) into the snapshot. |
| pkg/dataplane/userspace/manager_test.go | Asserts the new PolicyID values across round-trips. |
| pkg/dataplane/userspace/eventstream_test.go | Adds the deterministic raw-frame → syslog fanout coverage with a typed payload builder. |
| pkg/logging/README.md | Documents the new userspace syslog harness as the canonical place for raw-frame coverage. |
| docs/userspace-dataplane-gaps.md, docs/pr/1373-retire-ebpf-dataplane/plan-1379-dataplane-events.md | Updates #1379/#1373 status, call sites, and remaining evidence/gaps. |
Claude r1 review on
Round-1 triple-review synthesis on
| Reviewer | Verdict |
|---|---|
| Claude | MERGE-READY → MERGE-NEEDS-MAJOR (self-correcting) |
| Codex | MERGE-NEEDS-MAJOR |
| Gemini Pro 3 | MERGE-NEEDS-MAJOR |
Gemini MAJOR: hot-path O(N) filter scan per packet
"`emit_non_pbr_input_filter_log` calls `evaluate_interface_filter_log_match` directly in the RX loop before the flow cache lookup. This linearly scans all firewall filter terms for every packet on the hot path."
Quoted from poll_descriptor.rs:339-347:

    emit_non_pbr_input_filter_log(
        worker_ctx.forwarding,
        worker_ctx.event_stream,
        flow.as_ref(),
        meta,
        ingress_zone_override,
        now_ns,
    );
    // ── Flow cache fast path ────────────────────────────

The emit fn runs BEFORE the cache check. So every packet, even ones that would hit the cache and bypass the full filter pipeline, pays the O(N) filter-term-scan cost. This is exactly the kind of regression where cached flows lose their fast-path advantage.
Codex MAJOR 1: operator can't disambiguate PBR vs non-PBR
"PBR and non-PBR paths all emit the same `DataplaneEventKind::FilterLog`, with no chain/stage/PBR bit in the payload. Go logging path then makes this worse: `FILTER_LOG` syslog prints src/dst/proto/action/zone only, dropping filter id, term id, egress zone, and stage in `ringbuf.go:626`."
Operator sees FILTER_LOG events but can't tell which chain (input/output/cached output/lo0/PBR) matched. Triage breaks.
Codex MAJOR 2: End-to-end production test gap STILL not closed
"I do not see a test driving non-PBR filter logging through `poll_binding_process_descriptor`, which is the production integration point for input, cached output, and lo0 logging."
Same class of failure as #1418 r3 — the prior round caught "helper-level test would still pass if production call site removed", and r2 added an end-to-end test for the policy-deny path. The non-PBR filter-log path needs the same.
Codex MEDIUM: snapshot protocol versioning fuzzy
"Filter-log itself does not add snapshot fields, so it does not require a version bump. But this same commit adds policy identity into snapshots while Go `ProtocolVersion` remains `2` in `protocol.go:10`. If policy id correctness is contractual, this needs versioning or an explicit compatibility story."
Both MINORs converged
IP assertions (#1418 r3 finding): Gemini quotes eventstream_test.go:816-820 — only substring match on "RT_FLOW POLICY_DENY", doesn't verify the actual src/dst IPs in the emitted frame. Codex flagged the same at event_emit.rs:378 and frame/tests.rs:2304.
Per-filter rate-limit: neither author found a per-filter throttle. Rate-limiting is bucketed per (zone × kind). A single noisy filter blinds the entire zone's FilterLog bucket.
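One way to picture the per-filter throttle that both reviewers found missing is a limiter keyed by `(zone, filter_id)` rather than `(zone, kind)`. The sketch below is hypothetical: `PerFilterLimiter`, its fixed-window policy, and the `HashMap` storage are all assumptions for illustration and do not reflect the existing rate-limit code in this repository.

```rust
use std::collections::HashMap;

/// Hypothetical fixed-window limiter keyed by (zone, filter_id), so one noisy
/// filter only exhausts its own bucket instead of the whole zone's bucket.
pub struct PerFilterLimiter {
    window_ns: u64,
    max_per_window: u32,
    /// key -> (window start in ns, events admitted in that window)
    buckets: HashMap<(u16, u32), (u64, u32)>,
}

impl PerFilterLimiter {
    pub fn new(window_ns: u64, max_per_window: u32) -> Self {
        Self {
            window_ns,
            max_per_window,
            buckets: HashMap::new(),
        }
    }

    /// Returns true if a filter-log event for this (zone, filter) may be emitted.
    pub fn allow(&mut self, zone: u16, filter_id: u32, now_ns: u64) -> bool {
        let entry = self.buckets.entry((zone, filter_id)).or_insert((now_ns, 0));
        // Start a fresh window once the current one has fully elapsed.
        if now_ns.saturating_sub(entry.0) >= self.window_ns {
            *entry = (now_ns, 0);
        }
        if entry.1 < self.max_per_window {
            entry.1 += 1;
            true
        } else {
            false
        }
    }
}
```

A `HashMap` can allocate on first insert, so a real hot-path version would more likely use a preallocated table indexed by compiled filter ID; the map is used here only to keep the sketch short.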
Hot-path verification on the budget pattern
Codex AND Gemini both verify: budget acquire-encode-release lifecycle from #1404 is preserved. No leak on Full/Disconnected error paths.
But the hot-path FILTER EVALUATION (Gemini's MAJOR) runs BEFORE the budget try_acquire — so on cached-flow hits, even when the event would be dropped or rate-limited, the filter term scan still happens. That's the perf killer.
Self-correction
My r1 review said MERGE-READY structurally. I flagged hot-path concerns to verify but didn't check the call-site placement relative to the flow cache. Gemini did and found that filter scan runs pre-cache. Codex caught the operator disambiguation issue. Both are real.
Recommendation
Block on:
- Move filter-log emit AFTER flow cache hit check — for cached flows, the previously-cached filter-log decision should re-fire without re-scanning all terms. Or gate the emit by a stored "this flow has a logged filter" flag in the flow cache entry.
- Add PBR vs non-PBR distinction in DataplaneEventPayload (e.g., new chain enum or PBR bit). Update Go ringbuf.go to render filter_id/term_id/stage in the syslog output.
- Add end-to-end production-call-site test via `poll_binding_process_descriptor`, following the same pattern as the #1418 (Emit userspace security dataplane events) r3 fix for policy-deny.
- Pin IP assertions in the new test: assert `event.src_ip` and `event.dst_ip` match the encoded frame's actual IPs, not just a substring match.
Strongly consider: per-filter rate-limit (separate bucket per matched filter ID) to prevent zone-wide blinding.
Defer: snapshot protocol versioning audit for the policy-identity field (consolidate with whichever PR bumps the snapshot version next).
Codex task: task-mpc374ys-jgo838. Gemini task: task-mpc37fjf-kl2lil. Not merging — author's decision.
Filter-log events need to tell operators which runtime surface emitted the log. A PBR hit and a non-PBR input, output, or lo0 hit can otherwise look identical once they arrive as RT_FLOW FILTER_LOG records. Add a small Rust FilterLogSource enum and place its wire value in the RT_FLOW reason byte for filter-log events. PBR, input, output, cached-output, and lo0 producers now pass their source explicitly. Move non-PBR input filter-log emission off the pre-cache hot path. The helper now emits those logs only on the slow path after a flow-cache miss, so established cache hits do not scan input filters just to find a log-only term. Teach the Go userspace event adapter and SSE formatter to render the filter-log source, filter id, and term id. Extend the UDP syslog fanout test so a raw userspace FILTER_LOG frame must surface source=pbr end to end. Update the #1379 docs with the source-disambiguated contract and the slow-path-only input-filter invariant.
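The source enum described in this commit message might look roughly like the sketch below. Only the type name `FilterLogSource`, the method name `wire_reason()`, the `Copy` property, and the value `1` for PBR appear elsewhere in this conversation; the remaining wire values, the `from_wire_reason` decoder, and the labels other than `pbr` and `input` are assumptions made for illustration.

```rust
/// Which runtime surface produced a FILTER_LOG event.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FilterLogSource {
    Pbr,
    Input,
    Output,
    CachedOutput,
    Lo0,
}

impl FilterLogSource {
    /// Value placed in the RT_FLOW reason byte for filter-log events.
    /// Only `Pbr => 1` is taken from the review text; the rest are assumed.
    #[inline]
    pub const fn wire_reason(self) -> u8 {
        match self {
            FilterLogSource::Pbr => 1,
            FilterLogSource::Input => 2,
            FilterLogSource::Output => 3,
            FilterLogSource::CachedOutput => 4,
            FilterLogSource::Lo0 => 5,
        }
    }

    /// Hypothetical decoder for the reader side; unknown bytes yield `None`.
    pub const fn from_wire_reason(byte: u8) -> Option<Self> {
        match byte {
            1 => Some(FilterLogSource::Pbr),
            2 => Some(FilterLogSource::Input),
            3 => Some(FilterLogSource::Output),
            4 => Some(FilterLogSource::CachedOutput),
            5 => Some(FilterLogSource::Lo0),
            _ => None,
        }
    }

    /// Text rendered after `source=` in the log line (labels partly assumed).
    pub const fn label(self) -> &'static str {
        match self {
            FilterLogSource::Pbr => "pbr",
            FilterLogSource::Input => "input",
            FilterLogSource::Output => "output",
            FilterLogSource::CachedOutput => "cached-output",
            FilterLogSource::Lo0 => "lo0",
        }
    }
}
```

Passing the source explicitly at each producer keeps the mapping in one place, and a plain `Copy` enum with a `match` adds no allocation on the packet path.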
Claude round-2 review on
Round-2 triple-review synthesis on
| Reviewer | Verdict |
|---|---|
| Claude (self-corrected) | MERGE-NEEDS-MAJOR (was MINOR; Codex caught 4 verified additional MAJORs) |
| Codex | MERGE-NEEDS-MAJOR (4 MAJORs + 1 MINOR) |
| Gemini Pro 3 | pending (task-mpc5i0ml-idmdas, still running at 4:30) |
Self-correction (Claude r2 → MAJOR)
My round-2 Claude review marked this MINOR after verifying the structural fixes for r1 MAJORs. Codex r2 found additional verified MAJORs I missed:
Verified Codex MAJORs
1. Junos parity broken: input filter logs under-emitted on cached flows. emit_non_pbr_input_filter_log only fires on the session-miss slow path at poll_descriptor.rs:944. Cache hits call emit_cached_output_filter_log at :391, which only emits FilterLogSource::CachedOutput. So if a packet matches both input and output filters with then log, INPUT events fire once per session (first slow-path packet) and are silent on subsequent cached hits. Operators expecting Junos "log every matched packet" semantics see one-per-session for INPUT but per-packet for OUTPUT — asymmetric and contradicting Junos behavior.
2. "After session creation" claim in PR description is false. emit_non_pbr_input_filter_log at poll_descriptor.rs:944 fires BEFORE policy evaluation, NAT finalization, route resolution, AND session install. Forward install at :1498, reverse at :1627, missing-neighbor seed at :2399 — all later. The emit is "after flow-cache miss," not "after session creation." Packets emit FILTER_LOG events even when session install or policy evaluation later fails.
3. Wire semantic change unversioned. userspace-dp/src/protocol.rs:10 and pkg/dataplane/userspace/protocol.go:10 both still = 2. No event-stream version exists. payload[134] now means filter-log source for FILTER_LOG and close reason elsewhere. Current Go gates correctly at ringbuf.go:324 and :566 by EventType, but an older Go reader without that gate would decode source byte 2 as "TCP FIN", 3 as "TCP RST", etc. Acceptable internally if helper/daemon lockstep is enforced; needs to be documented as a hard invariant.
4. r1 end-to-end test gap remains. pkg/dataplane/userspace/eventstream_test.go:773 injects synthetic Go-built payloads into the Go decode path; does NOT exercise Rust emit → wire → Go decode. Only asserts source=pbr at :896; no source=input end-to-end assertion. PBR-vs-input disambiguation is not proven through production paths.
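Finding 3 hinges on one byte having two meanings depending on the event type. A minimal sketch of the gating a reader needs is shown below; the offset `134` comes from the finding, while the `EventType` enum, the `ReasonByte` wrapper, and `decode_reason` are hypothetical names introduced only for this illustration.

```rust
/// Offset of the shared reason byte inside the event payload (from finding 3).
pub const REASON_OFFSET: usize = 134;

/// Hypothetical subset of event types; the real set is larger.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EventType {
    SessionClose,
    FilterLog,
}

/// The same byte, tagged with the meaning it has for the given event type.
#[derive(Debug, PartialEq, Eq)]
pub enum ReasonByte {
    CloseReason(u8),
    FilterLogSource(u8),
}

/// Interpret `payload[134]` only after checking the event type.
/// A reader that skips this check would read a filter-log source value
/// such as 2 as a close reason instead.
pub fn decode_reason(event_type: EventType, payload: &[u8]) -> Option<ReasonByte> {
    let byte = *payload.get(REASON_OFFSET)?;
    Some(match event_type {
        EventType::FilterLog => ReasonByte::FilterLogSource(byte),
        EventType::SessionClose => ReasonByte::CloseReason(byte),
    })
}
```

This is why the lockstep invariant matters: the gate lives in the reader, so an older reader paired with a newer helper has no way to know the byte was repurposed.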
Verified Codex MINOR
5. Lo0 tag is local-delivery logging, not lo0 ingress filter logging. emit_lo0_filter_log at poll_descriptor.rs:2218 is in the LocalDelivery branch and evaluates configured lo0 filter state. The actual evaluate_lo0_filter(_counted) enforcement path is not used in production Rust — kernel/nft applyLo0Filter handles real lo0 ingress filtering. The Lo0 tag means "userspace local-delivery matched configured lo0 log term", not proof of lo0 ingress filter behavior. Document operator semantics accordingly.
Codex Confirmed correct
- Cache fast path no longer calls `emit_non_pbr_input_filter_log` (verified).
- Current Go decoding is correctly gated by event type.
- `FilterLogSource` is Copy, `wire_reason()` is inline, no hot-path allocation.
- No downstream FILTER_LOG regex/parser dependency in `pkg/api`, `pkg/grpcapi`, or `cmd/cli` would break on the new format additions.
Recommendation
Block on MAJORs 1-4:
- M1 (Junos parity): decide intent — either accept under-emit as a Junos-parity divergence (document it explicitly in the PR / gaps.md so operators know) or extend cached-flow emission to also fire INPUT events.
- M2 (PR description correctness): update the PR description to say "after flow-cache miss" instead of "after session creation"; the audit/intent docs should accurately describe when emission happens.
- M3 (wire version): either bump `CONFIG_SNAPSHOT_PROTOCOL_VERSION` / introduce an event-stream version, OR document the helper/daemon lockstep invariant explicitly. Right now the version says "no protocol change" but `payload[134]` has changed semantics.
- M4 (end-to-end test): add a test that exercises Rust emit → wire encoding → Go decode → asserts `source=input` (PBR-vs-input disambiguation proven through the production path).
Strongly consider M5 (Lo0 tag): document the actual semantic ("local-delivery match against configured lo0 log term") so operators don't misinterpret it.
Codex task: task-mpc55oqs-o478g3. Gemini Pro 3 re-running (task-mpc5i0ml-idmdas); will update synthesis when complete.
Not merging — author's decision.
Round-2 Gemini Pro 3 verification: independently verified all 5 Codex findings
Gemini Pro 3 (
Note on Gemini's "Codex hallucinated" sub-claim
Gemini's r2 finding 3 included this hostile assertion:
Verified false. At
Both names and values match Codex's r2 citation exactly. Gemini's "Codex hallucinated" sub-claim is itself a hallucination: the cited constants and values are correct. The core architectural finding (wire change is unversioned) stands.
Net
All 5 Codex findings independently verified. The MERGE-NEEDS-MAJOR verdict is solid. Recommendations from the earlier round-2 synthesis still apply (M1 Junos parity / M2 description accuracy / M3 wire versioning or lockstep doc / M4 end-to-end test / M5 Lo0 semantics).
Codex task:
Not merging, author's decision.
The previous filter-log closeout moved input-filter matching off the cached-hit hot path, but it also changed semantics: input logs fired only on the first packet that missed the flow cache. Cached packets could still emit output logs, so operators saw asymmetric FILTER_LOG records. Compute the input log match on the slow path only, after a flow-cache miss. A successful session install emits the input log and stores the matched term in the flow-cache descriptor. Later cached hits replay that cached input log without rescanning firewall terms. Keep source disambiguation explicit in the RT_FLOW reason byte and document the helper/daemon lockstep requirement for that event-stream semantic. The config snapshot protocol is unchanged because this is not a snapshot field. Add a poll-descriptor production-path test for input FILTER_LOG emission and a flow-cache constructor test proving the cached input log match is preserved for cached hits. Document that the lo0 source label means userspace local-delivery log matching, not kernel lo0 ingress enforcement.
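The scan-once, replay-on-hit behavior described in this commit message can be sketched with a toy flow cache. Everything here is a simplified stand-in: `InputLogger`, `Term`, the port-only match, and the `u64` flow key are hypothetical, and the sketch omits the session-install gate that the real slow path applies before emitting.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FilterLogMatch {
    pub filter_id: u32,
    pub term_id: u32,
}

/// Toy filter term: matches on destination port only.
pub struct Term {
    pub term_id: u32,
    pub dst_port: u16,
    pub log: bool,
}

pub struct InputLogger {
    pub filter_id: u32,
    pub terms: Vec<Term>,
    /// Flow key -> input log decision computed on the slow path.
    pub cache: HashMap<u64, Option<FilterLogMatch>>,
    /// Number of full term scans performed (slow-path cost).
    pub term_scans: u64,
    /// Stand-in for emitted FILTER_LOG events.
    pub emitted: Vec<FilterLogMatch>,
}

impl InputLogger {
    pub fn new(filter_id: u32, terms: Vec<Term>) -> Self {
        Self { filter_id, terms, cache: HashMap::new(), term_scans: 0, emitted: Vec::new() }
    }

    /// Slow path: first matching term wins; it yields a log match only if it logs.
    fn scan(&mut self, dst_port: u16) -> Option<FilterLogMatch> {
        self.term_scans += 1;
        let filter_id = self.filter_id;
        self.terms
            .iter()
            .find(|t| t.dst_port == dst_port)
            .filter(|t| t.log)
            .map(|t| FilterLogMatch { filter_id, term_id: t.term_id })
    }

    /// Per-packet entry point: cache hits replay the stored decision
    /// without rescanning; misses scan once and store the result.
    pub fn on_packet(&mut self, flow_key: u64, dst_port: u16) {
        let decision = match self.cache.get(&flow_key).copied() {
            Some(cached) => cached,
            None => {
                let computed = self.scan(dst_port);
                self.cache.insert(flow_key, computed);
                computed
            }
        };
        if let Some(log_match) = decision {
            self.emitted.push(log_match);
        }
    }
}
```

With this shape, three packets on one logged flow produce three log events but only one term scan, which is the per-packet emission and per-flow evaluation cost the commit aims for.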
Claude round-3 review on
Round-3 triple-review synthesis on
| Reviewer | Verdict |
|---|---|
| Claude r3 (self-corrected) | MERGE-NEEDS-MAJOR (was MERGE-READY) |
| Codex r3 | MERGE-NEEDS-MAJOR (3 new MAJORs) |
| Gemini Pro 3 | pending (re-dispatched: task-mpc7ygfl-ttud6l) |
Self-correction (Claude r3 → MAJOR)
My round-3 Claude review marked this MERGE-READY after verifying the post-install gate and cached input log replay. Codex r3 found 3 verified MAJORs my review missed. I read the structural fixes but didn't walk the consistency between slow-path and cached emission paths, or check whether filter ACTIONS (not just LOG matches) are enforced.
Verified Codex r3 MAJORs
M1 — Cached ingress_zone mismatch (cached emit always uses 0):
- Slow-path emission at `poll_descriptor.rs:1576-1587` uses `ingress_zone_id` from `evaluate_non_pbr_input_filter_log()` (which calls `filter_log_ingress_zone_id(forwarding, meta, ingress_zone_override, ingress_ifindex)`).
- Cache population at `:2221-2235` passes `session_ingress_zone: Option<u16>` into `FlowCacheEntry::from_forward_decision`.
- `session_ingress_zone` is initialized `None` at `:707` and only set at `:771` inside a different `match resolved` branch, NOT on the session-miss install path that populates cache for the slow-path-installed flow.
- `flow_cache.rs:318-320` stores `metadata.ingress_zone = ingress_zone.unwrap_or(0)` → 0 for None.
- Cached replay at `:164-169` reads `cached_metadata.ingress_zone` (= 0) and passes to `emit_input_filter_log_match`.
Result: first slow-path packet emits RT_FLOW FILTER_LOG ... zone=<from_zone>->...; subsequent cached packets emit RT_FLOW FILTER_LOG ... zone=0->.... Inconsistent operator-visible field. Verified against the code at 5dc6b1d4.
M2 — End-to-end test stops before Go syslog formatter:
- `poll_descriptor_input_filter_log_path_emits_rt_flow_event` drives real `poll_binding_process_descriptor` into `event_stream::test_worker_handle` and decodes the Rust event-stream frame. It does NOT extend through the Go event-stream callback / `EventReader` / syslog formatter.
- Existing Go syslog test (`eventstream_test.go:890-894`) only asserts `source=pbr`, not `source=input`.
- Result: PBR-vs-input disambiguation is not proven through production paths.
M3 — then log discard action not enforced:
- `evaluate_non_pbr_input_filter_log` returns `FilterLogMatch{action: Discard}` for `then discard log` terms, but `poll_descriptor.rs` only consults the match for log emission, not for drop enforcement.
- Verified by grep at `5dc6b1d4`: `crate::filter::` calls in `poll_descriptor.rs` are limited to `evaluate_interface_filter_log_match` (log-only), `evaluate_lo0_filter_log_match` (log-only), `record_filter_counter`, and `apply_cached_three_color_policers`. No call to `evaluate_filter` or `evaluate_filter_counted` (the action-returning variants) for input filters.
- Result: a packet matching `then discard log` is logged as `action=DENY` but still forwarded. Cached hits replay the same misleading log while still forwarding.
This is likely pre-existing (input filter action enforcement is a separate gap, not introduced by #1428), but #1428's new logging makes it visible — operators reading the new FILTER_LOG action=DENY events will reasonably assume the packet was dropped when it wasn't. Either:
- (a) Block on this PR and add input filter action enforcement, OR
- (b) Land #1428 and file an explicit follow-up issue + document the gap in plan-1379 / gaps.md so operators understand the current semantic.
Codex confirmed correct
- Post-install gating at `:1576-1587` (only emits after `install_with_protocol_with_origin` returns true).
- Cached emission order at `:420-427` (input log emit BEFORE output log emit, BEFORE continue/break).
- Wire semantics doc at `event_stream/README.md:25-30` (lockstep invariant).
- Lo0 wording at `plan-1379:56-58` (userspace local-delivery, not kernel/nft).
Recommendation
Block on M1 (ingress_zone mismatch) — purely a fix-the-cache-stamp problem; can be addressed in the same PR. Either set session_ingress_zone = Some(from_zone_id) before cache population, OR store the resolved ingress_zone_id alongside input_filter_log in RewriteDescriptor and replay it on cached hit.
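The second option, storing the resolved zone next to the cached log match, can be sketched as below, alongside the defaulting behavior that causes the mismatch. The struct and function names are illustrative assumptions, not the layout in `flow_cache.rs`.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FilterLogMatch {
    pub filter_id: u32,
    pub term_id: u32,
}

/// Illustrative cached descriptor: the log match plus the zone the slow path resolved.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CachedInputFilterLog {
    pub log_match: FilterLogMatch,
    pub ingress_zone_id: u16,
}

/// The problematic pattern: the cached replay borrows the session metadata zone,
/// which collapses to 0 when the session-miss install path never set it.
pub fn zone_from_session_metadata(session_ingress_zone: Option<u16>) -> u16 {
    session_ingress_zone.unwrap_or(0)
}

/// The proposed pattern: stamp the zone the slow path actually used for its
/// own emission, so there is no default to fall back to.
pub fn stamp_cached_input_log(log_match: FilterLogMatch, slow_path_zone_id: u16) -> CachedInputFilterLog {
    CachedInputFilterLog { log_match, ingress_zone_id: slow_path_zone_id }
}

/// Cached hits read the zone from the stamped descriptor only.
pub fn replay_zone(cached: &CachedInputFilterLog) -> u16 {
    cached.ingress_zone_id
}
```

Carrying the zone as a plain `u16` inside the descriptor removes the `Option` from the replay path entirely, so the first packet and every cached packet report the same zone by construction.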
Block-or-document M3 (discard not enforced) — author's call:
- If this is a pre-existing scope-defer, update `plan-1379-dataplane-events.md` to explicitly document that `FILTER_LOG action=DENY` events do NOT imply the packet was dropped under the current userspace AF_XDP path; drop enforcement is tracked separately.
- If the goal is parity, add input filter action enforcement in the same PR (the post-install gate makes this straightforward: short-circuit before install when action is Discard).
Strongly consider M2 (end-to-end test) — extend eventstream_test.go to drive a real Rust emit → Go decode for source=input and assert the syslog text matches.
Codex task: task-mpc7gbcx-osossq. Gemini Pro 3 re-dispatched (task-mpc7ygfl-ttud6l).
Not merging — author's decision.
Round-3 Gemini infrastructure unavailable: final r3 verdict on Codex + Claude
All 4 Gemini r3 re-dispatches hit
Final r3 verdict: MERGE-NEEDS-MAJOR
Three Codex MAJORs verified independently against the code at
See the earlier synthesis comment for quoted-line evidence on each. Block on M1 (purely a cache-stamp fix); M3 is author's-call (block-and-fix vs document-and-defer).
Not merging, author's decision.
Input FILTER_LOG support still treated non-PBR input filters as telemetry-only. A logged discard term could produce an RT_FLOW record while the packet continued into route lookup, policy evaluation, and session install. Evaluate the first non-routing input-filter term on the slow path and treat discard or reject as terminal before routing. Logged terminal terms emit source=input with the deny or reject RT_FLOW action and do not enqueue a forward request or install a session. Carry the computed ingress-zone ID inside the cached input-log descriptor. Cached hits now replay the same source=input zone that the slow path computed, instead of borrowing SessionMetadata.ingress_zone and risking a zero-zone or fabric-override mismatch. Keep cache population free of extra counter side effects by using a log-only lookup for the descriptor after the counted slow-path input filter evaluation. Update the #1379 plan to document the terminal action and cached-zone contracts.
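The terminal-action gate described in this commit message can be sketched as a small decision function. The `FilterAction` variant names match the enum quoted in review; `InputTerm`, `SlowPathStep`, `evaluate_input`, and the port-only match are hypothetical simplifications, and the routing-instance handling follows the first-match-wins behavior the reviewers describe rather than the actual evaluator code.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FilterAction {
    Accept,
    Discard,
    Reject,
}

/// Toy input-filter term: matches on destination port only.
pub struct InputTerm {
    pub term_id: u32,
    pub dst_port: u16,
    pub action: FilterAction,
    pub log: bool,
    /// True for PBR (routing-instance) terms, which this helper defers on.
    pub routing_instance: bool,
}

/// What the slow path should do after input-filter evaluation.
#[derive(Debug, PartialEq, Eq)]
pub enum SlowPathStep {
    /// Proceed to routing, policy, and session install; optionally log.
    Continue { log_term: Option<u32> },
    /// Stop here: no forward request, no session; optionally log.
    Drop { action: FilterAction, log_term: Option<u32> },
}

pub fn evaluate_input(terms: &[InputTerm], dst_port: u16) -> SlowPathStep {
    match terms.iter().find(|t| t.dst_port == dst_port) {
        // No match: default accept with nothing to log.
        None => SlowPathStep::Continue { log_term: None },
        // PBR term matched first: defer to the PBR path, which logs on its own.
        Some(t) if t.routing_instance => SlowPathStep::Continue { log_term: None },
        Some(t) => {
            let log_term = if t.log { Some(t.term_id) } else { None };
            match t.action {
                FilterAction::Accept => SlowPathStep::Continue { log_term },
                // Discard and Reject are both terminal before routing.
                FilterAction::Discard | FilterAction::Reject => {
                    SlowPathStep::Drop { action: t.action, log_term }
                }
            }
        }
    }
}
```

Returning the action together with the log term from a single evaluation is what keeps the logged action and the enforced action from drifting apart.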
Claude round-4 review on
Round-4 triple-review synthesis on
| Reviewer | Verdict |
|---|---|
| Claude r4 (self-corrected) | MERGE-NEEDS-MAJOR (was MERGE-READY) |
| Codex r4 | MERGE-NEEDS-MAJOR (3 MAJORs) |
| Gemini Pro 3 | infrastructure outage; re-dispatched |
Self-correction (Claude r4 → MAJOR)
My round-4 Claude review marked this MERGE-READY with two non-blocking observations (Reject treated as Discard, cache-stamp double-eval). Codex r4 elevated Reject from non-blocking to MAJOR with a stronger argument I missed, and found a NEW MAJOR I didn't probe (lo0 path).
Codex r4 verified MAJORs
M2 — Still not fixed: end-to-end test gap. f967a112 adds Rust-side event-stream assertions only. The new poll_descriptor_input_filter_discard_drops_and_logs test decodes event.decode_dataplane_event() directly without going through Go EventReader → syslog formatter. Existing Go syslog fanout test at pkg/dataplane/userspace/eventstream_test.go:878-898 still uses reason: 1 (pbr) and asserts wantExtra: "source=pbr" — never proves source=input renders correctly through the Go path. The r3 M2 finding remains open.
Reject is silent drop — contract divergence (new MAJOR). enum FilterAction { ..., Reject /* Drop with ICMP unreachable */ } at userspace-dp/src/filter/mod.rs:35-41. Slow-path gate at poll_descriptor.rs:1038-1049 treats Reject identically to Discard (drop + continue, no ICMP emitted). But event_emit.rs:200-205 maps Reject → RT_FLOW action 2 (reject). Operators reading the log see "reject" but the dataplane silently dropped — log says one thing, wire behavior is another. Either implement ICMP-unreachable on Reject OR change the enum doc-comment + emit mapping so Reject behaves as Discard explicitly. My Claude r4 flagged this as non-blocking; Codex correctly elevates it to MAJOR because the contract divergence misleads operators.
lo0/local-delivery filter action bypassed (new MAJOR I missed). poll_descriptor.rs:255-282 evaluate_lo0_filter_log_match is log-only — returns FilterLogMatch but never an action. Local delivery at :2331-2347 emits the log then continues into maybe_reinject_slow_path. A then log discard/reject on a lo0 filter logs but does NOT drop. Same class of bug as r3 M3 was for non-PBR input filters, but on the lo0 path. The r4 fix didn't address it.
Codex confirmed correct
- M1 ingress_zone fix: `CachedInputFilterLog { log_match, ingress_zone_id }` stored in `RewriteDescriptor`; cached emit uses `cached_log.ingress_zone_id`, no longer `cached_metadata.ingress_zone`. (`flow_cache.rs:35-40`)
- M3 discard enforcement on non-PBR input: action gate at `poll_descriptor.rs:1029-1050` is terminal before route override, policy eval, and session install. Verified by Codex against the surrounding control flow.
- `evaluate_interface_filter_non_routing_counted` is correct: first-match-wins, PBR-first returns default Accept (defers to PBR routing), non-PBR returns action+log.
- Cache-stamping consistency between `evaluate_non_pbr_input_filter` (slow path) and `evaluate_non_pbr_input_filter_log_only` (cache stamp): both walk terms in declaration order, return None on PBR-first-match, return same log_match on non-PBR-first-match.
- New Rust test asserts the right fields (event.reason, action=0, ingress_zone=TEST_LAN_ZONE_ID NOT 0, scratch_forwards.is_empty()).
- PBR + non-PBR mixed filters: PBR-first defers correctly.
- IPv6 covered by the same evaluator path.
- Fragment / non-flow packets return Accept default (no flow → no eval).
Recommendation
Block on three MAJORs:
- M2 (Go syslog test): add a Go-side test that asserts the Go decode + syslog format produces `source=input` end-to-end. The existing `EventReader → syslog` path is untested for non-PBR input source.
- Reject contract: decide one of: (a) implement Reject as ICMP-unreachable (match Junos), OR (b) update enum doc-comment + RT_FLOW action mapping to make Reject behave identically to Discard explicitly. Don't ship the log/wire divergence.
- lo0 action enforcement: mirror the non-PBR fix on the lo0 path. Either evaluate action there too, or document that lo0 filter action is currently log-only.
Codex task: task-mpcpwcvt-g02bch. Gemini Pro 3 re-dispatched after r4 task silently dropped from the queue.
Not merging — author's decision.
Round-4 Gemini Pro 3 verification: all 3 Codex MAJORs independently VERIFIED
Gemini Pro 3 re-dispatch (
Triple-review fully converged on MERGE-NEEDS-MAJOR
Gemini's quoted-line verification
M2: Rust/Go test sync gap. Gemini quoted the new Rust test (
Reject is silent drop: contract divergence. Gemini quoted three lines:
Operators see
lo0 filter action bypassed. Gemini quoted both
Gemini's framing on M3
Sharper than Codex's framing on lo0: calls out the security boundary issue. Worth keeping in mind for the fix:
Recommendations (unchanged from earlier r4 synthesis)
Codex task:
Not merging, author's decision.
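Summary
- Carry compiled filter ID, term ID, and action through filter evaluation, live TX selection, and cached TX descriptors.
- Emit RT_FLOW filter-log events for non-PBR input filters, live output filters, cached output-filter hits, and lo0/local-delivery filters while skipping routing-instance terms in the non-PBR helper.
- Carry the numeric policy ID on policy-deny events from the userspace snapshot.
- Add deterministic Go syslog fanout coverage for raw policy-deny, screen-drop, and filter-log event-stream frames.
- Update the #1379/#1373 docs with the implemented call sites and remaining evidence scope.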
Validation
- `cargo test --manifest-path userspace-dp/Cargo.toml`
- `go test -count=1 ./pkg/dataplane/userspace ./pkg/logging`
- `git diff --check`
One earlier full Rust run hit the existing timing-sensitive `afxdp::umem::tests::tx_latency_hist_cross_thread_snapshot_skew_within_bound` flap; rerunning the same full suite passed.
Remaining #1379 Evidence Gap
Live userspace-cluster syslog evidence is still needed if #1373 Phase 4 wants
operator artifacts beyond the deterministic local UDP syslog harness. That
should include deny policy, screen drop, PBR filter log, non-PBR input/output
filter log, lo0/local-delivery filter log, and deny-storm starvation/loss
counter checks.