events: close filter-log enforcement gaps #1430

Merged
psaab merged 1 commit into master from codex/d-1379-filter-log-fix
May 19, 2026

@psaab psaab commented May 19, 2026

Summary

Why

#1428 was merged at f967a112 before the round-4 review findings were fully closed. This fix-forward PR carries the follow-up commit that closes the three verified blockers:

  • M2 Go syslog test gap
  • Reject log/wire action contract gap
  • lo0 filter action bypass before slow-path reinjection

Validation

  • RUSTFLAGS='-Awarnings' cargo test input_filter -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test lo0_filter -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test filter_log_can_encode_reject_action -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test filter_log_event_emit_preserves_reject_action -- --nocapture
  • go test ./pkg/dataplane/userspace ./pkg/logging -run 'TestEventStreamRawDataplaneEventsFeedSyslogFanout|TestDecodeDataplaneEventPolicyDenyRTFlow|TestFormatBinaryRecord_Basic'
  • git diff --check HEAD^
  • commit-message line-length check

Refs #1379
Refs #1373

Copilot AI review requested due to automatic review settings May 19, 2026 15:02

Copilot AI left a comment

Pull request overview

This PR closes several remaining gaps in userspace dataplane event emission and enforcement, ensuring FILTER_LOG actions (including reject) are preserved end-to-end and that lo0/local-delivery filter terminal actions are enforced before slow-path reinjection or pass-to-kernel session installation.

Changes:

  • Enforce lo0/local-delivery filter discard/reject actions as terminal during packet processing (preventing slow-path reinjection/session installation) while still emitting FILTER_LOG when configured.
  • Pin/validate FILTER_LOG reject action handling across the Rust wire codec + encoder and the Go event-stream → syslog fanout path (with added tests).
  • Update #1379 event plan documentation to reflect the enforced lo0 terminal behavior.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| userspace-dp/src/event_stream/codec.rs | Adds RT_FLOW reject action constant to the codec contract. |
| userspace-dp/src/event_stream/codec_tests.rs | Adds round-trip test ensuring FILTER_LOG can encode/decode reject. |
| userspace-dp/src/afxdp/tests.rs | Adds regression test asserting lo0 discard drops without slow-path reinjection/session install. |
| userspace-dp/src/afxdp/poll_descriptor.rs | Applies lo0 filter action before LocalDelivery slow-path reinjection/session caching; emits lo0 filter-log when matched. |
| userspace-dp/src/afxdp/event_emit.rs | Adds test ensuring emit path preserves reject action in emitted FILTER_LOG events. |
| pkg/logging/ringbuf.go | Corrects binary record documentation for action encoding (0=deny, 1=permit, 2=reject). |
| pkg/dataplane/userspace/eventstream_test.go | Extends syslog fanout test coverage to include FILTER_LOG reject action. |
| docs/pr/1373-retire-ebpf-dataplane/plan-1379-dataplane-events.md | Updates plan text to match implemented lo0 terminal behavior and wording. |

)
{
telemetry.dbg.local += 1;
telemetry.dbg.policy_deny += 1;

psaab commented May 19, 2026

Claude round-1 review on a911d147

Verdict: MERGE-READY (pending Codex + Gemini r1 hostile verification, with four non-blocking observations)

Fix-forward correctly addresses all 3 r4 MAJORs from PR #1428 (which was merged at f967a112 before the round-4 closeout review landed).

Verified fixes

M3 — lo0 filter action enforcement is now terminal.

  • emit_lo0_filter_log renamed to apply_lo0_filter_action at poll_descriptor.rs:263-300. Now calls evaluate_lo0_filter_counted (action+log) instead of evaluate_lo0_filter_log_match (log-only). Returns bool = !matches!(result.action, FilterAction::Accept).
  • New call site at poll_descriptor.rs:1237-1252: when disposition == LocalDelivery && apply_lo0_filter_action(...) == true → bump telemetry counters, scratch_recycle.push, continue. Terminal BEFORE the LocalDelivery cache install at :1242+ should_cache_local_delivery_session_on_miss.
  • Old emit call at :2379 removed (the post-decision LocalDelivery dispatch no longer logs because the new gate already logged + dropped).
  • Single evaluation per packet: grep confirms only :275 calls evaluate_lo0_filter_counted; no double-count.
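The gate described above can be sketched roughly as follows. This is a minimal model, not the code in poll_descriptor.rs: the types `Disposition`, `Outcome`, and the functions `lo0_action_is_terminal` and `process` are hypothetical stand-ins, and only the `!matches!(action, FilterAction::Accept)` rule and the "drop before session install" ordering come from the review.

```rust
// Simplified stand-ins for the dataplane types (hypothetical, for illustration).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FilterAction {
    Accept,
    Discard,
    Reject,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Disposition {
    LocalDelivery,
    Forward,
}

#[derive(Default, Debug)]
pub struct Outcome {
    pub dropped: usize,
    pub sessions_installed: usize,
}

/// True when the lo0 verdict is terminal, i.e. anything other than Accept.
pub fn lo0_action_is_terminal(action: FilterAction) -> bool {
    !matches!(action, FilterAction::Accept)
}

/// Walks (disposition, lo0 verdict) pairs. The gate runs before the
/// session-install step, so terminal verdicts never install a session.
pub fn process(packets: &[(Disposition, FilterAction)]) -> Outcome {
    let mut out = Outcome::default();
    for &(disposition, action) in packets {
        if disposition == Disposition::LocalDelivery && lo0_action_is_terminal(action) {
            // Terminal: count the drop and skip install / reinjection.
            out.dropped += 1;
            continue;
        }
        out.sessions_installed += 1;
    }
    out
}
```

In this model a Reject verdict on a non-LocalDelivery packet is not affected by the gate, which mirrors the `disposition == LocalDelivery` condition quoted in the review.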

Test poll_descriptor_lo0_filter_discard_drops_without_reinject asserts:

  • event.kind=FilterLog, reason=Lo0.wire_reason(), action=0 (DENY)
  • ingress_zone_id == TEST_LAN_ZONE_ID (proves correct zone)
  • scratch_forwards.is_empty() AND sessions.len() == 0 AND slow_path_drops == 0 AND recent_exceptions.is_empty() — proves packet was dropped at the new gate, NOT reinjected to slow path, NOT installed as a session.

Reject action codec pinned (M2-part-1).

  • New RT_FLOW_ACTION_REJECT = 2 const in event_stream/codec.rs:48 (was unnamed).
  • New test_filter_log_can_encode_reject_action (codec_tests.rs:286-301): builds payload with action = RT_FLOW_ACTION_REJECT, asserts wire byte payload[54] == 2 AND decoded action == RT_FLOW_ACTION_REJECT.
  • New filter_log_event_emit_preserves_reject_action (event_emit.rs:433-460): calls emit_filter_log_event with FilterAction::Reject, asserts decoded event action == RT_FLOW_ACTION_REJECT.
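A round-trip check of the kind these tests perform might look like the sketch below. The action values (0=deny, 1=permit, 2=reject) and the byte offset 54 are taken from this review; the fixed `PAYLOAD_LEN` and the helper names `encode_action` and `decode_action` are assumptions made only for this example and do not reflect the real codec layout.

```rust
pub const RT_FLOW_ACTION_DENY: u8 = 0;
pub const RT_FLOW_ACTION_PERMIT: u8 = 1;
pub const RT_FLOW_ACTION_REJECT: u8 = 2;

/// Offset of the action byte, per the `payload[54]` assertion in the review.
pub const ACTION_OFFSET: usize = 54;
/// Hypothetical payload size, chosen only so the sketch has a buffer to fill.
pub const PAYLOAD_LEN: usize = 64;

/// Builds a zeroed payload carrying only the action byte.
pub fn encode_action(action: u8) -> [u8; PAYLOAD_LEN] {
    let mut payload = [0u8; PAYLOAD_LEN];
    payload[ACTION_OFFSET] = action;
    payload
}

/// Reads the action byte back; returns None for short payloads or
/// values outside the three known actions.
pub fn decode_action(payload: &[u8]) -> Option<u8> {
    payload
        .get(ACTION_OFFSET)
        .copied()
        .filter(|a| *a <= RT_FLOW_ACTION_REJECT)
}
```

Pinning both the raw byte and the decoded value, as the real tests do, guards against the encoder and decoder drifting together to a different numbering.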

M2 — Go syslog test for source=input.

  • New input-filter-log-reject frame in TestEventStreamRawDataplaneEventsFeedSyslogFanout (eventstream_test.go:903-924): dataplane.ActionReject, source byte 2 (Input), filter_id=42, term_id=9.
  • Assertion at :951-953 adds: frame.label != "input-filter-log-reject" || strings.Contains(msg, "action=reject") — proves the syslog text correctly includes BOTH source=input AND action=reject end-to-end through Rust emit → wire → Go EventReader → syslog.
  • Channel buffer increased 3→4; seen map keyed by frame.label (was frame.want) to disambiguate two FILTER_LOG entries; FilterLogEvents counter check == 1 → == 2.

ringbuf.go comment correction. [7] Action (0=permit, 1=deny, 2=reject) → (0=deny, 1=permit, 2=reject), matching the actual encoding (deny=0 per RT_FLOW_ACTION_DENY). This was an unrelated doc bug fixed in passing.

Plan-1379 doc updated. Now explicitly states lo0 discard/reject is terminal and prevents slow-path reinjection.

Hostile observations (non-blocking)

H1 — Reject still silent drop; enum docstring divergence remains.
The fix pins the WIRE codec for Reject (logs correctly say action=reject), and routes Reject through the same action != Accept drop gate as Discard. But the enum docstring at filter/mod.rs:35-41:

/// Drop with ICMP unreachable.
Reject,

…still claims ICMP-unreachable behavior that the dataplane does NOT implement. The fix is partial: logs now correctly say reject, but the wire behavior is silent drop, NOT ICMP-back. Either:

  • (a) implement ICMP unreachable for Reject (proper Junos parity), OR
  • (b) update enum docstring to /// Drop (ICMP unreachable not yet implemented). and document explicitly in plan-1379

Follow-up issue, not blocking.

H2 — Lo0 test only covers discard, not reject. Symmetric coverage would add a second lo0 test with then reject log to verify the gate handles Reject identically. Not blocking; the codec/emit tests for Reject already cover the wire side.

H3 — telemetry.dbg.policy_deny on filter drops. apply_lo0_filter_action triggers telemetry.dbg.policy_deny += 1 when a lo0 filter discards. Filter drops are semantically distinct from policy denies; operators reading the policy_deny counter would over-count by lo0-filter rejections. Worth either a separate filter_drop counter or documenting that policy_deny includes filter rejections.
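One possible shape for the counter split suggested in H3 is sketched here. The `filter_drop` field, the `DropCause` enum, and `record_drop` are hypothetical names introduced for illustration; only `local` and `policy_deny` appear in the PR.

```rust
#[derive(Default, Debug)]
pub struct DebugCounters {
    pub local: u64,
    pub policy_deny: u64,
    /// Hypothetical counter: lo0 filter drops tracked separately from policy denies.
    pub filter_drop: u64,
}

#[derive(Clone, Copy, Debug)]
pub enum DropCause {
    PolicyDeny,
    Lo0Filter,
}

impl DebugCounters {
    /// Attributes a drop to exactly one cause-specific counter.
    pub fn record_drop(&mut self, cause: DropCause) {
        match cause {
            DropCause::PolicyDeny => self.policy_deny += 1,
            DropCause::Lo0Filter => {
                // The lo0 path in the PR also bumps `local`; that is kept here.
                self.local += 1;
                self.filter_drop += 1;
            }
        }
    }
}
```

With this split, an operator reading `policy_deny` would see only policy decisions, at the cost of one more counter to export.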

H4 — Cached LocalDelivery flow bypass. The cache fast path at :420+ emits cached input/output filter logs but does NOT call apply_lo0_filter_action. If a LocalDelivery flow is cached when lo0 was Accept, then config changes lo0 to Discard, would cached subsequent packets still pass to the kernel? Flow cache rg_epochs invalidation should catch config changes — worth verifying that lo0 filter changes bump the relevant epoch.

Recommendation

MERGE-READY with two minor follow-ups suggested:

  • H1: update Reject enum docstring to match actual silent-drop behavior, OR file a follow-up to implement ICMP-back.
  • H2: add a lo0 reject test for symmetric coverage.

H3 (counter naming) and H4 (cached path) are operational observations, not blockers.

Awaiting Codex (task-mpct579q-l3ges5) and Gemini Pro 3 (task-mpct5w6y-8a5q18).

Not merging — author's decision.

psaab commented May 19, 2026

Round-1 synthesis on a911d147 — Claude self-correction; Codex 2 MAJORs (Gemini pending)

| Reviewer | Verdict |
| --- | --- |
| Claude r1 (self-corrected) | MERGE-NEEDS-MAJOR (was MERGE-READY) |
| Codex r1 | MERGE-NEEDS-MAJOR (2 MAJORs) |
| Gemini Pro 3 | pending (still in tool_call at 5m+; task-mpct5w6y-8a5q18) |

Self-correction (Claude r1 → MAJOR)

My round-1 Claude review marked this MERGE-READY with H1 (Reject contract divergence) and H4 (cached bypass) as non-blocking observations. Codex r1 elevated both to MAJOR, and reframed H4 from "userspace flow cache bypass" (which Codex correctly notes is a non-issue — LocalDelivery isn't cacheable in the flow cache) to "BPF session-map PASS_TO_KERNEL bypass" — a deeper issue I should have probed.

Verified Codex r1 MAJORs

M1 — BPF session-map PASS_TO_KERNEL bypass for existing LocalDelivery sessions.

The new apply_lo0_filter_action gate runs BEFORE session install — correct for NEW sessions. But the install path explicitly publishes a BPF session-map entry so subsequent packets bypass userspace entirely:

poll_descriptor.rs:1271-1274 (verified)
// Keep firewall-local sessions in the helper only for HA
// state. Publish only the exact observed key back into the
// BPF session map so subsequent established packets bypass
// userspace and return directly to the kernel.

And forwarding/mod.rs:1106-1107 confirms:

let _ = publish_session_map_entry_for_session(session_map_fd, key, decision, &local_entry.metadata);

No mechanism flushes the BPF session map on lo0 filter config change. I grepped userspace-dp/src/afxdp/ for session_map.*delete, clear_session_map, flush_session_map, invalidate.*session and found nothing. coordinator/worker_manager.rs:55-75 (worker stop) clears only XSK and heartbeat slots, not session-map entries.

Real consequence: if an operator establishes a flow when lo0 was Accept, then changes config to add a lo0 Discard filter, existing long-lived sessions (BGP, SSH, established TCP) continue to bypass userspace via the BPF session map. The new gate is never reached. The PR's plan-1379 update claims "discard/reject actions are terminal and prevent slow-path reinjection" — true only for NEW sessions, not existing ones.

M2 — Reject is still a contract lie.

filter/mod.rs:40 enum docstring /// Drop with ICMP unreachable. is unchanged. The new lo0 path drops Reject identically to Discard (scratch_recycle + continue, no ICMP-unreachable generated). The new tests pin RT_FLOW_ACTION_REJECT in the wire codec, so the LOG correctly says reject, but the wire behavior is silent drop. Either implement ICMP-unreachable OR update enum doc + plan-1379 to say "currently silent drop, ICMP unreachable not yet implemented".

Codex r1 confirmed

  • M3 lo0 placement on miss path: fixed. Old removed call at :2368, new placement at :1240-1253 BEFORE LocalDelivery cache install.
  • No double-eval of lo0 filter; single call at poll_descriptor.rs:275.
  • Scope OK: 8 files, 321/53, all filter/log/codec/test related.

Codex r1 found additional MINORs (matching my Claude r1 H2/H3)

  • Go syslog test isolation gap: the PBR FILTER_LOG frame doesn't assert ABSENCE of action=reject, so a regression that mistakenly emits reject in the PBR path wouldn't fail this test.
  • No lo0 reject path test: only discard is tested (poll_descriptor_lo0_filter_discard_drops_without_reinject).
  • Counter taxonomy: telemetry.dbg.policy_deny += 1 for lo0 filter drops conflates filter rejections with policy denies in debug telemetry. Should be a separate filter_drop counter.

Recommendation

Block on M1 and M2:

  • M1 (BPF session-map bypass): critical. Either (a) flush PASS_TO_KERNEL session-map entries when the lo0 filter config changes, or (b) add an explicit "filter generation" check at session-map publish time so stale entries auto-invalidate, or (c) document the gap explicitly in plan-1379 (operators with long-lived local sessions need to know that filter config changes don't take effect on established flows). Option (c) is the minimum acceptable; (a)/(b) are the proper fix.
  • M2 (Reject contract): pick one — implement ICMP-unreachable OR update enum docstring + plan-1379 to say "silent drop, ICMP unreachable not yet implemented." Don't ship the log/wire divergence.

Strongly consider in this PR (MINORs):

  • Add a lo0 reject action test (symmetric coverage).
  • Tighten the Go syslog test isolation: PBR frame should assert ABSENCE of action=reject.
  • Rename or split telemetry.dbg.policy_deny to distinguish filter rejections from policy denies.

Codex task: task-mpct579q-l3ges5. Gemini Pro 3 still in tool_call at 5m+ (task-mpct5w6y-8a5q18); will update synthesis when it lands.

Not merging — author's decision.

psaab commented May 19, 2026

Round-1 Gemini Pro 3 verification: both Codex MAJORs INDEPENDENTLY VERIFIED

Gemini Pro 3 (task-mpct5w6y-8a5q18) landed and verified both Codex r1 MAJORs with quoted-line evidence at a911d147.

Triple-review fully converged on MERGE-NEEDS-MAJOR

| Reviewer | Verdict |
| --- | --- |
| Claude r1 (self-corrected) | MERGE-NEEDS-MAJOR |
| Codex r1 | MERGE-NEEDS-MAJOR (2 MAJORs) |
| Gemini Pro 3 r1 | MERGE-NEEDS-MAJOR (both verified) |

Gemini's quoted verification

M2 — Reject contract divergence (Gemini's "DIVERGENCE FOUND").

filter/mod.rs:39-40 defines Reject as /// Drop with ICMP unreachable. However, poll_descriptor.rs executes a silent recycle drop for lo0 regardless of whether the action is Discard or Reject. NO ICMP unreachable logic is triggered.

Identical to Codex M2.

M1 — Cached LocalDelivery bypass (Gemini's "BYPASS CONFIRMED").

Gemini reframes the same gap from a complementary angle:

By moving apply_lo0_filter_action exclusively into the [session-miss] block, you have created an architectural bypass. If a LocalDelivery session is already established and processed by userspace (e.g. BPF offload disabled or map evicted), the session hit path directly evaluates maybe_reinject_slow_path and never calls apply_lo0_filter_action.

Additionally, flow cache epoch invalidation does not save you here because LocalDelivery utilizes the stateful SessionTable — not the fast-path binding.flow.flow_cache (which explicitly limits entries to ForwardCandidate or FabricRedirect via FlowCacheEntry::from_forward_decision). If config changes to Discard, existing LocalDelivery userspace sessions will completely evade the new gate.

Codex framed the bypass via BPF session-map PASS_TO_KERNEL (kernel direct path). Gemini framed it via userspace SessionTable hit (when BPF map fails / is bypassed). Both describe the same root cause from different angles: the new gate runs only on session-miss; established sessions in EITHER bypass mechanism (BPF map OR userspace SessionTable hit) never re-evaluate the lo0 filter.

Gemini also confirmed

  • Lo0 placement (M3): correctly moved to miss path before LocalDelivery cache install and slow-path reinject.
  • evaluate_lo0_filter_counted: correctly used with action != Accept enforcement.
  • Go syslog test fix: channel buffer 3→4, seen map keyed by label, action=reject assertion correctly added.
  • No double-eval: apply_lo0_filter_action is the sole runtime caller of evaluate_lo0_filter_counted.
  • Counter overload: confirmed telemetry.dbg.policy_deny poisons policy enforcement metrics with filter drops.

Gemini also confirmed coverage gap

  • No lo0 reject test: poll_descriptor_lo0_filter_discard_drops_without_reinject uses discard only; no equivalent reject test.

Recommendation (unchanged from earlier synthesis)

Block on M1 and M2. Both reviewers agree on root cause + severity; the only divergence is which bypass mechanism each emphasizes (and both apply).

For M1, the fix needs to either:

  • (a) Invalidate sessions in BOTH the BPF session map AND the userspace SessionTable on filter config change (proper fix).
  • (b) Add a "filter generation" check at session hit time (BPF map and SessionTable) so stale entries get re-evaluated.
  • (c) Document the gap explicitly in plan-1379 — operators with long-lived local sessions need to know filter config changes don't take effect on established flows.
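Option (b) could be modeled along these lines. This is a sketch under stated assumptions: `Table`, `Session`, `Hit`, and the `u32` key are hypothetical simplifications, and the sketch covers only a userspace table. The BPF session map would need an equivalent check or a flush, since packets that take PASS_TO_KERNEL never reach userspace.

```rust
use std::collections::HashMap;

pub struct Session {
    /// Filter generation that was current when the session was installed.
    pub installed_generation: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Hit {
    Fresh,
    /// Entry predates the current filter config and was evicted.
    Stale,
    Miss,
}

#[derive(Default)]
pub struct Table {
    pub filter_generation: u64,
    pub sessions: HashMap<u32, Session>,
}

impl Table {
    pub fn install(&mut self, key: u32) {
        let installed_generation = self.filter_generation;
        self.sessions.insert(key, Session { installed_generation });
    }

    /// Called whenever the lo0 filter config changes.
    pub fn on_filter_config_change(&mut self) {
        self.filter_generation += 1;
    }

    /// A hit counts only if the entry matches the current generation;
    /// stale entries are removed so the next packet takes the miss path
    /// and is re-evaluated against the current filter.
    pub fn lookup(&mut self, key: u32) -> Hit {
        let stored = self.sessions.get(&key).map(|s| s.installed_generation);
        match stored {
            None => Hit::Miss,
            Some(g) if g == self.filter_generation => Hit::Fresh,
            Some(_) => {
                self.sessions.remove(&key);
                Hit::Stale
            }
        }
    }
}
```

The trade-off in this design is that every config change invalidates all established local sessions lazily, including ones the new filter would still accept.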

For M2, pick: (a) implement ICMP-unreachable for Reject, OR (b) update enum docstring + plan-1379 to say "currently silent drop, ICMP unreachable not yet implemented".

Strongly consider in this PR (Codex+Gemini both flagged):

  • Add lo0 reject path test
  • Fix counter taxonomy (policy_deny vs filter_drop)

Strongly consider (Codex only):

  • Tighten Go syslog test isolation: PBR frame should assert ABSENCE of action=reject.

Codex task: task-mpct579q-l3ges5. Gemini Pro 3 task: task-mpct5w6y-8a5q18.

Not merging — author's decision.

psaab force-pushed the codex/d-1379-filter-log-fix branch from a911d14 to e36ba81 May 19, 2026 17:14

psaab commented May 19, 2026

Round-2 fix pushed on e36ba814.

Fixes addressed:

  • M1 cached local-delivery bypass: session-hit LocalDelivery now re-runs the current lo0 filter before reinjection. Terminal lo0 actions emit FILTER_LOG, delete stale session/BPF state, recycle the frame, and avoid slow-path reinjection. HA-published LocalDelivery entries publish helper-visible state instead of PASS_TO_KERNEL while a lo0 filter is configured for that address family.
  • M2 reject contract: filter-path Reject is now logged as RT_FLOW deny because this userspace path currently fails closed as a silent drop and does not synthesize ICMP/RST reject packets. Policy Reject still encodes RT_FLOW reject.

Validation:

  • RUSTFLAGS='-Awarnings' cargo test lo0_filter -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test input_filter -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test filter_log -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test test_policy_event_can_encode_reject_action -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test poll_descriptor_lo0_filter_drops_cached_local_delivery_session_hit -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test session_key_has_lo0_filter_matches_packet_family -- --nocapture
  • go test ./pkg/dataplane/userspace ./pkg/logging -run 'TestEventStreamRawDataplaneEventsFeedSyslogFanout|TestDecodeDataplaneEventPolicyDenyRTFlow|TestFormatBinaryRecord_Basic'
  • git diff --check HEAD^
  • commit-message line-length check: no lines over 72 chars

psaab commented May 19, 2026

Claude round-2 review on e36ba814

Verdict: MERGE-READY (pending Codex + Gemini r2 hostile verification, with one operational observation)

Fix-forward correctly addresses both r1 MAJORs with two-pronged M1 fix + fail-closed M2 semantic.

Verified fixes

M1 — Session-hit re-eval at poll_descriptor.rs:832-855. After session resolution sets session_ingress_zone and apply_nat_on_fabric, the new code runs BEFORE the TTL check:

if resolved.decision.resolution.disposition == ForwardingDisposition::LocalDelivery
    && apply_lo0_filter_action(
        worker_ctx.forwarding,
        worker_ctx.event_stream,
        Some(flow),
        meta,
        Some(resolved.metadata.ingress_zone),
        now_ns,
    )
{
    delete_session_map_entry_for_removed_session_with_origin(...);
    sessions.delete(&resolved.key);
    telemetry.dbg.local += 1;
    telemetry.dbg.policy_deny += 1;
    binding.scratch.scratch_recycle.push(desc.addr);
    continue;
}

ResolvedFlowSessionDecision extended with key and origin fields (shared_ops.rs:354-360) to enable the cleanup. Cleanup removes both the BPF session-map entry AND the userspace SessionTable entry — closes both Codex's and Gemini's r1 framings (BPF map vs userspace SessionTable hit).

M1 — Publish-time gate at session_glue/mod.rs:223-263. New helper:

fn session_key_has_lo0_filter(forwarding: &ForwardingState, key: &SessionKey) -> bool {
    match key.addr_family {
        family if family == libc::AF_INET as u8 =>
            forwarding.filter_state.lo0_filter_v4_fast.is_some(),
        family if family == libc::AF_INET6 as u8 =>
            forwarding.filter_state.lo0_filter_v6_fast.is_some(),
        _ => false,
    }
}

publish_worker_session_map_entry now takes forwarding: &ForwardingState. When the session would be PASS_TO_KERNEL AND session_key_has_lo0_filter is true, the entry is published as userspace-live (publish_live_session_entry) instead, forcing every packet through userspace where the session-hit re-eval can fire. All 3 caller sites in apply_worker_commands (lines 423, 509, 588) updated to thread forwarding.
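The publish-time decision can be illustrated with a small model. Everything here is a hypothetical simplification: `AddrFamily`, `Lo0Filters`, `PublishMode`, and `publish_mode` are invented for the sketch, and the real helper matches on `libc::AF_INET`/`AF_INET6` and inspects `filter_state` rather than plain booleans.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AddrFamily {
    V4,
    V6,
    Other,
}

/// Whether a lo0 filter is configured, per address family.
#[derive(Clone, Copy, Default)]
pub struct Lo0Filters {
    pub v4: bool,
    pub v6: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PublishMode {
    /// Established packets return directly to the kernel.
    PassToKernel,
    /// Packets keep flowing through userspace, so the session-hit
    /// lo0 re-evaluation can run.
    UserspaceLive,
}

pub fn key_has_lo0_filter(filters: &Lo0Filters, family: AddrFamily) -> bool {
    match family {
        AddrFamily::V4 => filters.v4,
        AddrFamily::V6 => filters.v6,
        AddrFamily::Other => false,
    }
}

/// Downgrades PASS_TO_KERNEL to userspace-live while a lo0 filter exists
/// for the session's family. In this sketch, sessions that never wanted
/// PASS_TO_KERNEL are treated as userspace-live too (an assumption).
pub fn publish_mode(
    filters: &Lo0Filters,
    family: AddrFamily,
    wants_pass_to_kernel: bool,
) -> PublishMode {
    if wants_pass_to_kernel && !key_has_lo0_filter(filters, family) {
        PublishMode::PassToKernel
    } else {
        PublishMode::UserspaceLive
    }
}
```

Note that the decision is taken once, at publish time: an entry published before a filter is configured keeps whatever mode it was given.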

M2 — Reject contract resolved as fail-closed at filter/mod.rs:39-42:

/// Request reject behavior.
///
/// Callers that cannot synthesize the reject packet must fail closed as a
/// silent drop and must not log that an ICMP/RST reject was generated.
Reject,

event_emit.rs:200-209 filter_action_to_rt_flow maps FilterAction::Reject → RT_FLOW_ACTION_DENY for filter logs (with explicit comment: "RT_FLOW must report deny rather than claiming an ICMP/RST reject happened"). Test filter_log_event_emit_fails_closed_reject_as_deny asserts the new behavior.
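A minimal sketch of the fail-closed mapping follows. The Reject → DENY arm reflects what the review describes; the Accept → PERMIT and Discard → DENY arms are assumptions, and the real `filter_action_to_rt_flow` in event_emit.rs may differ in signature and detail.

```rust
pub const RT_FLOW_ACTION_DENY: u8 = 0;
pub const RT_FLOW_ACTION_PERMIT: u8 = 1;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FilterAction {
    Accept,
    Discard,
    Reject,
}

/// Maps a filter verdict to the RT_FLOW action byte used in filter logs.
pub fn filter_action_to_rt_flow(action: FilterAction) -> u8 {
    match action {
        // Assumed mapping for Accept; not stated in the review.
        FilterAction::Accept => RT_FLOW_ACTION_PERMIT,
        // Reject is logged as deny because this path drops silently and
        // does not synthesize an ICMP/RST reject packet.
        FilterAction::Discard | FilterAction::Reject => RT_FLOW_ACTION_DENY,
    }
}
```

The point of the mapping is that the log never claims more than the dataplane did: a silently dropped packet is reported as deny regardless of the configured verdict.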

Codec test renamed test_filter_log_can_encode_reject_action → test_policy_event_can_encode_reject_action with DataplaneEventKind::PolicyDeny. Semantic split: filter-path Reject = silent drop logged as deny; policy-path Reject = real reject (synthesizes ICMP/RST) logged as reject. Wire byte RT_FLOW_ACTION_REJECT is now policy-only.

Go test eventstream_test.go updated: input-filter-log-reject → input-filter-log-deny (uses dataplane.ActionDeny, asserts msg contains action=deny).

plan-1379 doc explicitly states:

"logged terminal actions emit source=input with deny RT_FLOW action because this userspace path fails closed without synthesizing an ICMP/RST reject packet yet."
"Local-delivery session hits also re-run lo0 enforcement, and HA-published local-delivery BPF map entries use the userspace-visible redirect path while a lo0 filter is configured so cached PASS_TO_KERNEL state cannot bypass the filter."

Hostile observation (non-blocking)

H1 — Retroactive PASS_TO_KERNEL flush on filter config change. The fix has two complementary paths:

  • Session-hit re-eval handles packets that reach userspace.
  • Publish-time gate prevents NEW PASS_TO_KERNEL when lo0 filter is configured at install time.

But: a session established BEFORE lo0 filter is configured publishes as PASS_TO_KERNEL. If the operator adds an lo0 Discard filter AFTER establishment, the existing PASS_TO_KERNEL entry remains. Packets for that session bypass userspace at the BPF XDP level and never trigger the session-hit re-eval. The new gate is never reached for these pre-existing sessions.

Mitigation paths:

  • Session refresh (HA sync or periodic re-publish) would re-trigger the publish-time gate and convert to userspace-live.
  • Long-lived local sessions without traffic refresh (idle BGP, idle SSH) remain bypassed until expiry.

Operationally, this is likely rare (filter config changes during established long-lived local sessions), but worth either:

  • (a) Adding a snapshot-apply handler that re-publishes existing sessions when lo0 filter config changes, OR
  • (b) Documenting the gap in plan-1379 as a known operator limitation.

I do not see an implementation of (a) in this PR. The plan-1379 phrasing "while a lo0 filter is configured" is ambiguous: does it mean "configured at install time" (current behavior) or "configured at any time"? Tightening the doc would be the lowest-effort fix.

Test coverage gaps (non-blocking)

  • No test for publish-time gate: session_key_has_lo0_filter_matches_packet_family unit-tests only the helper. No test exercises publish_worker_session_map_entry with lo0 filter present to verify the userspace-live path is taken.
  • No lo0 reject path test: only discard is tested in both r1 and r2.

Recommendation

MERGE-READY with three minor follow-ups:

  • H1: tighten plan-1379 wording on "while a lo0 filter is configured" OR add retroactive re-publish on filter config change.
  • Add a lo0 reject path test for symmetric coverage.
  • Add an integration test exercising publish_worker_session_map_entry with lo0 filter present.

Awaiting Codex (task-mpcwbybx-2hrpuy) and Gemini Pro 3 (task-mpcwcn3d-7wkz7c).

Not merging — author's decision.

psaab commented May 19, 2026

Round-2 synthesis on e36ba814 — Claude self-correction; Codex found 3 MAJORs (Gemini pending)

| Reviewer | Verdict |
| --- | --- |
| Claude r2 (self-corrected) | MERGE-NEEDS-MAJOR (was MERGE-READY) |
| Codex r2 | MERGE-NEEDS-MAJOR (3 MAJORs + 1 MEDIUM) |
| Gemini Pro 3 | pending (still in tool_call at 6m+; task-mpcwcn3d-7wkz7c) |

Self-correction (Claude r2 → MAJOR)

My round-2 Claude review marked this MERGE-READY with the retroactive PASS_TO_KERNEL flush gap (H1) as non-blocking. Codex r2 elevates it to MAJOR and found two additional MAJORs I missed (shared_sessions cleanup, PolicyAction::Reject parallel divergence).

Verified Codex r2 MAJORs

M1 — Pre-existing PASS_TO_KERNEL not retroactively flushed (elevation of my H1).
refresh_runtime_snapshot() rotates the forwarding Arc but the worker update path only refreshes screen profiles and session timeouts. No code calls publish_worker_session_map_entry on filter config change, no session-map scan, no delete of old kernel-local entries. A LocalDelivery session installed BEFORE lo0 filter was configured continues bypassing userspace indefinitely until normal expiry.

The session-hit re-eval doesn't help because BPF PASS_TO_KERNEL packets never reach userspace. The publish-time gate only catches new sessions. Pre-existing sessions remain bypassed.

I called this "operationally rare" in my r2; Codex correctly treats it as a real security gap. Long-lived local sessions (BGP, SSH, idle TCP) are exactly the scenario where filter config changes happen during established sessions.

M2 — Session-hit lo0 drop doesn't clean shared_sessions (NEW).
At poll_descriptor.rs:843-852 the cleanup is:

delete_session_map_entry_for_removed_session_with_origin(...);
sessions.delete(&resolved.key);

SessionTable::delete() in session/mod.rs is literally pub fn delete(&mut self, key: &SessionKey) { self.remove_entry(key); } — just removes the local entry, no close delta emission, no shared_sessions removal. Shared cleanup is only in flush_session_deltas() or explicit remove_shared_session(...); the new path calls neither.

Consequence: lookup_session_across_scopes() can re-find the entry in shared_sessions and rematerialize it. A lo0 filter drop is not actually durable — the next packet can re-resolve the same session via the shared map.
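The rematerialization problem can be reproduced in a toy model. The `Sessions` struct, its `u32` keys and `u64` state, and the method names `delete_local_only` and `delete_durable` are hypothetical; only the idea that a cross-scope lookup can restore a locally deleted entry from the shared map comes from the review.

```rust
use std::collections::HashMap;

#[derive(Default)]
pub struct Sessions {
    local: HashMap<u32, u64>,
    shared: HashMap<u32, u64>,
}

impl Sessions {
    /// Installs the session in both the worker-local and shared scopes.
    pub fn install(&mut self, key: u32, state: u64) {
        self.local.insert(key, state);
        self.shared.insert(key, state);
    }

    /// Mirrors a delete that only touches the worker-local table.
    pub fn delete_local_only(&mut self, key: u32) {
        self.local.remove(&key);
    }

    /// Removes the session from both scopes so it cannot come back.
    pub fn delete_durable(&mut self, key: u32) {
        self.local.remove(&key);
        self.shared.remove(&key);
    }

    /// Looks in the local table first, then falls back to the shared map
    /// and rematerializes the entry locally when it is found there.
    pub fn lookup_across_scopes(&mut self, key: u32) -> bool {
        if self.local.contains_key(&key) {
            return true;
        }
        if let Some(state) = self.shared.get(&key).copied() {
            self.local.insert(key, state);
            return true;
        }
        false
    }
}
```

In this model a local-only delete is undone by the very next lookup, which is the durability gap described above. The sketch does not model close-delta emission toward HA peers.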

M3 — PolicyAction::Reject has the same contract divergence as FilterAction::Reject had (NEW).
The M2 fix from r1 made FilterAction::Reject fail closed and log as deny. But PolicyAction::Reject is a separate type with the same issue:

  • policy.rs:38 defines PolicyAction::Reject
  • policy.rs:515 parses "reject" → PolicyAction::Reject
  • event_emit.rs:191-196 maps emit_policy_deny_event policy reject → RT_FLOW_ACTION_REJECT in syslog
  • poll_descriptor.rs:1944-1955 policy deny path just emits the event and sets disposition = PolicyDenied. No ICMP unreachable or TCP RST generated anywhere in the userspace policy path.

So a then reject policy rule emits RT_FLOW ... action=reject in syslog while the dataplane silently drops the packet. Same contract divergence as the filter path had pre-r1. Either:

  • (a) implement ICMP/RST synthesis on the policy reject branch, OR
  • (b) update PolicyAction enum doc + emit_policy_deny_event to map policy reject → DENY (parallel to the filter fix).

Verified Codex MEDIUM

Output filter terminal action: TxSelectionFilterResult drops the terminal action; output filters can log reject while forwarding unless a policer drops. The r1 fix closed input/lo0 filter terminals but didn't audit the output-filter terminal action.

Confirmed correct

  • Session-hit lo0 re-eval placement: correctly placed before TTL check on session-hit branch; cleans BPF map + local SessionTable.
  • Publish-time gate coverage: all 3 sites in apply_worker_commands (session_glue/mod.rs:422, 507, 586) thread forwarding and use session_key_has_lo0_filter.
  • FilterAction::Reject fix: filter path correctly maps Reject → DENY in syslog; tests updated.
  • Codec wire: RT_FLOW_ACTION_REJECT reserved for policy events; filter events forced to DENY when action == Reject.

Codex coverage gaps (matching my r2 observations)

  • No integration test for publish-time gate (only the unit test for session_key_has_lo0_filter helper).
  • The new poll_descriptor_lo0_filter_drops_cached_local_delivery_session_hit preinstalls only a SessionOrigin::LocalMiss session, not a BPF PASS_TO_KERNEL peer-synced local session.
  • No lo0 reject path test.
  • No Go-side end-to-end test for policy reject.

Codex scope correction

Codex reports actual diff size is 12 files, +645/-55, not the +335/-13 I quoted in the prompt. Files are still all filter/event/session-glue/docs/tests/log-formatting related.

Recommendation

Block on M1, M2, M3.

For M1 (retroactive flush): add a snapshot-apply handler that re-publishes existing kernel-local sessions when lo0 filter config changes (or any filter config change). OR document the gap as a known operator limitation in plan-1379 with explicit operator guidance (e.g., "filter config changes may not take effect on existing long-lived local sessions until session expiry; flush sessions manually via clear flow session").
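A snapshot-apply handler of the kind proposed for M1 might be structured like this. It is only a sketch: `Lo0Config`, `MapAction`, `Family`, and `republish_on_lo0_change` are hypothetical, and an in-memory `HashMap` stands in for the BPF session map, whose real update and iteration APIs are not shown in this PR.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MapAction {
    PassToKernel,
    UserspaceLive,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Family {
    V4,
    V6,
}

/// Whether a lo0 filter is configured, per address family.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Lo0Config {
    pub v4: bool,
    pub v6: bool,
}

/// Runs when a new config snapshot is applied. If the lo0 filter config
/// changed, every PASS_TO_KERNEL entry whose family is now filtered is
/// rewritten to userspace-live. Returns the number of rewritten entries.
pub fn republish_on_lo0_change(
    old: Lo0Config,
    new: Lo0Config,
    session_map: &mut HashMap<(Family, u32), MapAction>,
) -> usize {
    if old == new {
        return 0;
    }
    let mut rewritten = 0;
    for ((family, _), action) in session_map.iter_mut() {
        let filtered = match family {
            Family::V4 => new.v4,
            Family::V6 => new.v6,
        };
        if filtered && *action == MapAction::PassToKernel {
            *action = MapAction::UserspaceLive;
            rewritten += 1;
        }
    }
    rewritten
}
```

Once rewritten, packets for those sessions reach userspace again, where the session-hit lo0 re-evaluation can apply the current filter. The sketch never restores PASS_TO_KERNEL when a filter is removed; whether that is worth doing is a separate design choice.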

For M2 (shared_sessions cleanup): the lo0 drop path needs remove_shared_session(...) in addition to sessions.delete(...), and should emit a close delta so HA peers see the removal. Otherwise the drop is not durable.

For M3 (policy reject parity): pick (a) or (b) per the filter-path precedent. Simplest is (b) — map policy reject → DENY in emit_policy_deny_event until ICMP/RST synthesis is wired.

Strongly consider (MEDIUM): audit output-filter terminal action handling.

Codex task: task-mpcwbybx-2hrpuy. Gemini Pro 3 still in tool_call at 6m+; will update synthesis when it lands.

Not merging — author's decision.

psaab commented May 19, 2026

Round-2 Gemini Pro 3 verification — 2 Codex MAJORs INDEPENDENTLY VERIFIED, 1 not covered

Gemini Pro 3 (task-mpcwcn3d-7wkz7c) landed and verified the two most critical Codex r2 MAJORs with quoted-line evidence. Triple-review converged on MERGE-NEEDS-MAJOR.

| Reviewer | Verdict |
| --- | --- |
| Claude r2 (self-corrected) | MERGE-NEEDS-MAJOR |
| Codex r2 | MERGE-NEEDS-MAJOR (3 MAJORs + 1 MEDIUM) |
| Gemini Pro 3 r2 | MERGE-NEEDS-MAJOR (2 FAILs, agreeing on top 2 MAJORs) |

Gemini quoted verification

M1 — Retroactive PASS_TO_KERNEL flush gap (Gemini "FAIL (Blocker)").

The PR only prevents PASS_TO_KERNEL from being written while a filter is configured. If an active session is published to the BPF map as PASS_TO_KERNEL before the user configures the lo0 filter, no code sweeps or refreshes the session map on config change. The BPF map retains the PASS_TO_KERNEL action, meaning the kernel continues to return XDP_PASS for these packets. They indefinitely bypass the new lo0 filter without ever hitting the new userspace enforcement path.

Gemini directly refutes the commit message claim: "'BPF cached state cannot bypass userspace lo0 enforcement after a filter change.' This logic is flawed." Matches Codex's MAJOR #1 framing.

M2 — shared_sessions cleanup missing (Gemini "FAIL (Minor)").

The code calls sessions.delete(&resolved.key); which only removes the session from the local worker's cache. It completely neglects to remove the session from shared_sessions (or shared_nat_sessions), resulting in a split-brain state. The HA peer/background GC still holds the session as active while the local worker drops it.

Codex labeled this MAJOR (rematerialization via lookup_session_across_scopes); Gemini frames it as Minor (split-brain state). Both agree on the bug. The HA-split-brain framing is sharper than Codex's framing.

Gemini didn't probe (Codex unique)

M3 — PolicyAction::Reject contract divergence. Codex independently found that policy reject has the same log-vs-wire divergence the original M2 fix closed for the filter path. Gemini's hostile checks focused on the filter side and confirmed PolicyAction::Reject → RT_FLOW_ACTION_REJECT "as expected" — Gemini didn't drill into whether the policy path actually synthesizes the reject packet. Codex did and found no ICMP/RST generation at poll_descriptor.rs:1944-1955.

This finding stands on Codex's single-reviewer verification only, but it's structurally verifiable: emit_policy_deny_event maps policy reject to RT_FLOW_ACTION_REJECT, and grep finds no ICMP/RST synthesis on the PolicyDenied branch. Same class of bug as the original M2; same fix pattern applies.

MEDIUM — output filter terminal action. Codex flagged that TxSelectionFilterResult drops the terminal action; not covered by Gemini.

Gemini also confirmed

  • Session-hit re-eval placement: correctly positioned before TTL check, intercepts all LocalDelivery hits (flow cache fast-path doesn't handle LocalDelivery).
  • Publish-time gate coverage: all 3 sites in apply_worker_commands correctly threaded with forwarding.
  • NAT64 cross-family: gate and re-eval both consistently test pre-translation family.
  • FilterAction::Reject fix: the filter path is the only path responsible for reject synthesis, and its RT_FLOW_ACTION_DENY mapping is confirmed.
  • Test exercises HIT path: poll_descriptor_lo0_filter_drops_cached_local_delivery_session_hit pre-installs the session before pushing the descriptor.

Both reviewers flag test gaps

  • No integration test verifies the publish-time gate actually drops PASS_TO_KERNEL when lo0 is configured.
  • No lo0 reject path test (only discard).

Note on Codex/Gemini scope difference

Codex measured the PR vs parent commit (a911d147^..e36ba814) = 12 files, +645/-55. Gemini matched my prompt's incremental r1→r2 numbers (+335/-13). Both are correct measurements of different bases. The r2-only delta is +335/-13; the full PR delta is +645/-55. No scope concern either way.

Recommendation (unchanged)

Block on M1, M2, M3:

  • M1 (retroactive PASS_TO_KERNEL flush): add a snapshot-apply hook that re-publishes existing kernel-local sessions when lo0 filter config transitions from empty → configured. OR document the gap explicitly in plan-1379 as a known operator limitation (e.g., "filter config changes do not take effect on existing established local sessions; use clear flow session to force re-evaluation").
  • M2 (shared_sessions cleanup): session-hit lo0 drop must remove from shared_sessions (and shared_nat_sessions) in addition to local SessionTable, and emit a close delta so HA peers see the removal. Otherwise the drop is not durable across worker boundaries / HA.
  • M3 (policy reject contract): mirror the filter fix — map PolicyAction::Reject → RT_FLOW_ACTION_DENY in emit_policy_deny_event until ICMP/RST synthesis is wired, OR implement reject packet synthesis on the PolicyDenied branch.

Strongly consider:

  • MEDIUM: audit output-filter terminal action handling.
  • Add integration test for publish-time gate.
  • Add lo0 reject path test.

Codex task: task-mpcwbybx-2hrpuy. Gemini Pro 3 task: task-mpcwcn3d-7wkz7c.

Not merging — author's decision.

Copilot AI review requested due to automatic review settings May 19, 2026 17:38
@psaab psaab force-pushed the codex/d-1379-filter-log-fix branch from e36ba81 to 053537c Compare May 19, 2026 17:38
@psaab
Owner Author

psaab commented May 19, 2026

Round-3 fix-forward on 053537ce

Addressed the three round-2 MAJOR findings from Codex/Claude.

Fixes

  1. Retroactive LocalDelivery PASS_TO_KERNEL sessions

    • Added a forwarding-Arc reload hook that scans existing LocalDelivery sessions when lo0 filters are present and republishes matching sessions into the helper-visible session map.
    • This closes the stale-session gap where sessions created before a lo0 filter was added could keep bypassing forever.
  2. Shared-session cleanup on cached lo0 drops

    • Replaced the bare local sessions.delete() path with terminal filtered-session cleanup.
    • The drop path now removes the local session, BPF/session-map state, shared session/NAT/forward-wire indexes, peer state, and emits the close delta.
    • Added regression coverage proving the cached hit removes shared_sessions, shared_nat_sessions, shared_forward_wire_sessions, and emits the matching close delta.
  3. PolicyAction::Reject log/dataplane contract divergence

Validation

Passed focused and related coverage:

RUSTFLAGS='-Awarnings' cargo test policy_deny_event_emit_fails_closed_reject_as_deny -- --nocapture
RUSTFLAGS='-Awarnings' cargo test republish_local_delivery_sessions_for_lo0_filter_selects_existing_hits -- --nocapture
RUSTFLAGS='-Awarnings' cargo test poll_descriptor_lo0_filter_drops_cached_local_delivery_session_hit -- --nocapture
RUSTFLAGS='-Awarnings' cargo test lo0_filter -- --nocapture
RUSTFLAGS='-Awarnings' cargo test filter_log -- --nocapture
RUSTFLAGS='-Awarnings' cargo test policy_deny_event_emit -- --nocapture
RUSTFLAGS='-Awarnings' cargo test input_filter -- --nocapture
go test ./pkg/dataplane/userspace ./pkg/logging -run 'TestEventStreamRawDataplaneEventsFeedSyslogFanout|TestDecodeDataplaneEventPolicyDenyRTFlow|TestFormatBinaryRecord_Basic'

Final hygiene:

git diff --check HEAD^
commit message line-length check <= 72 chars

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

Comment on lines 1279 to +1284
forwarding = new_forwarding;
let republished = republish_local_delivery_sessions_for_lo0_filter(
&sessions,
session_map_fd,
&forwarding,
);
resolved.origin,
);
telemetry.dbg.local += 1;
telemetry.dbg.policy_deny += 1;
)
{
telemetry.dbg.local += 1;
telemetry.dbg.policy_deny += 1;
@psaab
Owner Author

psaab commented May 19, 2026

Claude round-3 review on 053537ce

Verdict: MERGE-READY (pending Codex + Gemini r3 hostile verification)

All 3 r2 MAJORs have direct, surgical fixes with end-to-end test coverage.

Verified fixes

M1 — Retroactive PASS_TO_KERNEL republish on filter config change.

New republish_local_delivery_sessions_for_lo0_filter at session_glue/mod.rs:235-258:

sessions.iter_with_origin(|key, decision, metadata, _origin| {
    if metadata.is_reverse
        || decision.resolution.disposition != ForwardingDisposition::LocalDelivery
        || !session_key_has_lo0_filter(forwarding, key)
    {
        return;
    }
    if session_map_fd >= 0 {
        let _ = publish_live_session_entry(...);
    }
    republished += 1;
});

Called from worker/mod.rs:1278-1290 inside the if let Some(new_forwarding) = load_arc_if_changed(...) branch — fires whenever the forwarding Arc rotates (which includes filter state changes). For every LocalDelivery+forward-direction session with lo0 filter present in the new forwarding state, the BPF map entry is re-published as userspace-live, overwriting any PASS_TO_KERNEL entry.

New test republish_local_delivery_sessions_for_lo0_filter_selects_existing_hits verifies the helper selects 1 session when filter is configured and 0 after the filter is removed. session_key_has_lo0_filter is now pub(super) so tests can call it.
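
The selection rule in the helper above can be modeled in isolation. The following is a minimal sketch, not the PR's code: `Key`, `Session`, `MapAction`, `Lo0Filters`, and the `Forward` variant are hypothetical stand-ins, and a `HashMap` plays the role of the BPF session map, where an insert overwrites an existing entry in the way the review expects `publish_live_session_entry` to do.

```rust
use std::collections::HashMap;

/// Hypothetical stand-ins for the real types; all names are illustrative.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Disposition {
    LocalDelivery,
    Forward,
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum MapAction {
    PassToKernel,
    UserspaceLive,
}

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Key {
    pub is_v6: bool,
    pub dst_port: u16,
}

pub struct Session {
    pub key: Key,
    pub disposition: Disposition,
    pub is_reverse: bool,
}

/// Which address families currently have an lo0 filter configured.
pub struct Lo0Filters {
    pub v4: bool,
    pub v6: bool,
}

impl Lo0Filters {
    /// Per-family check on the session key's own family.
    pub fn covers(&self, key: &Key) -> bool {
        if key.is_v6 { self.v6 } else { self.v4 }
    }
}

/// Overwrite the map entry of every forward-direction LocalDelivery session
/// whose family has an lo0 filter, and return how many were republished.
pub fn republish_for_lo0_filter(
    sessions: &[Session],
    filters: &Lo0Filters,
    map: &mut HashMap<Key, MapAction>,
) -> usize {
    let mut republished = 0;
    for s in sessions {
        if s.is_reverse
            || s.disposition != Disposition::LocalDelivery
            || !filters.covers(&s.key)
        {
            continue;
        }
        // Insert replaces any stale PassToKernel entry for the same key.
        map.insert(s.key, MapAction::UserspaceLive);
        republished += 1;
    }
    republished
}
```

In this model a v4-only filter republishes only the v4 session and leaves the v6 entry as `PassToKernel`, which is the per-family behavior the later NAT64 discussion turns on.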

M2 — Full terminal cleanup on session-hit lo0 drop.

New delete_terminal_filtered_session at session_glue/mod.rs:303-345 performs all 5 cleanup steps r2 reviewers cited:

  1. delete_session_map_entry_for_removed_session_with_origin — BPF map
  2. sessions.delete(key) — local SessionTable
  3. remove_shared_session(...) — shared_sessions, shared_nat_sessions, shared_forward_wire_sessions, shared_owner_rg_indexes
  4. replicate_session_delete(peer_worker_commands, key) — peer worker delete
  5. sessions.emit_close_delta_with_origin(...) — close delta for HA sync

New helper SessionTable::emit_close_delta_with_origin at session/mod.rs:951-967 correctly skips is_reverse sessions (matches existing push_delta patterns).

poll_descriptor.rs:843-859 now calls delete_terminal_filtered_session instead of the partial delete_session_map_entry + sessions.delete combo.

Enhanced test poll_descriptor_lo0_filter_drops_cached_local_delivery_session_hit at tests.rs:3392-3460:

  • Pre-installs session in BOTH local SessionTable AND shared_sessions (via publish_shared_session)
  • Drains the initial open delta to clear state
  • Triggers poll_descriptor → filter discard
  • Asserts: sessions.len() == 0, shared_sessions empty, shared_nat_sessions empty, shared_forward_wire_sessions empty, drain_deltas returns 1 Close delta with the right key

This proves the full cleanup path end-to-end, addressing exactly Gemini's r2 "split-brain" finding.
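
The five cleanup steps can be sketched against toy state to show why a bare local delete is not enough. This is an illustrative model under stated assumptions: `Worker`, `Delta`, and the `u64` key are hypothetical, and the real function takes different arguments. Only the ordering and the reverse-session skip follow the description above.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the real session key.
pub type Key = u64;

#[derive(Debug, PartialEq, Eq)]
pub enum Delta {
    Close(Key),
}

/// Toy model of the state a terminal filtered-session cleanup must touch.
#[derive(Default)]
pub struct Worker {
    pub bpf_map: HashSet<Key>,
    /// Local session table; the value records whether the entry is reverse.
    pub local_sessions: HashMap<Key, bool>,
    pub shared_sessions: HashSet<Key>,
    pub shared_nat_sessions: HashSet<Key>,
    pub shared_forward_wire_sessions: HashSet<Key>,
    pub peer_deletes: Vec<Key>,
    pub deltas: Vec<Delta>,
}

pub fn delete_terminal_filtered_session(w: &mut Worker, key: Key) {
    // 1. BPF/session-map entry.
    w.bpf_map.remove(&key);
    // 2. Local session table.
    let removed = w.local_sessions.remove(&key);
    // 3. Shared indexes, so another scope cannot rematerialize the session.
    w.shared_sessions.remove(&key);
    w.shared_nat_sessions.remove(&key);
    w.shared_forward_wire_sessions.remove(&key);
    // 4. Tell peer workers to delete their copy.
    w.peer_deletes.push(key);
    // 5. Close delta for HA sync, skipped for reverse entries.
    if removed == Some(false) {
        w.deltas.push(Delta::Close(key));
    }
}
```

Dropping step 3 from this model reproduces the split-brain state from round 2: the local table is empty while `shared_sessions` still holds the key.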

M3 — Policy reject fail-closed.

event_emit.rs:192-197 policy_action_to_rt_flow now maps:

PolicyAction::Reject => RT_FLOW_ACTION_DENY,  // fail-closed comment

Mirrors the filter-path fix from r1. New test policy_deny_event_emit_fails_closed_reject_as_deny asserts event.action == RT_FLOW_ACTION_DENY for a PolicyAction::Reject emission.

plan-1379 updated:

"Policy reject is logged as deny until the userspace PolicyDenied path synthesizes ICMP/RST reject packets."

Matches the existing filter-path doc — both contracts now explicitly fail-closed with deferred ICMP/RST synthesis.
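
A fail-closed mapping of this kind can be sketched as follows. The enums and the `reject_synthesis_wired` flag are hypothetical, added only to make the "until synthesis is wired" condition explicit; the PR's `policy_action_to_rt_flow` is described as mapping Reject to deny unconditionally.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PolicyAction {
    Permit,
    Deny,
    Reject,
}

/// Hypothetical enum form of the RT_FLOW action codes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RtFlowAction {
    Permit,
    Deny,
    Reject,
}

/// Log an action only if the dataplane can actually perform it.
/// `reject_synthesis_wired` is an illustrative flag, not a real parameter.
pub fn policy_action_to_rt_flow(
    action: PolicyAction,
    reject_synthesis_wired: bool,
) -> RtFlowAction {
    match action {
        PolicyAction::Permit => RtFlowAction::Permit,
        PolicyAction::Deny => RtFlowAction::Deny,
        // Only report REJECT once ICMP/RST packets are really sent.
        PolicyAction::Reject if reject_synthesis_wired => RtFlowAction::Reject,
        // Fail closed: the wire behavior is a silent drop, so log it as deny.
        PolicyAction::Reject => RtFlowAction::Deny,
    }
}
```

The point of the design is that the log never claims more than the wire does: a silent drop is reported as deny, not as a reject that was never sent.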

Hostile observations (non-blocking)

H1 — Republish triggers on every forwarding-Arc rotation, not just lo0 filter transitions. The load_arc_if_changed check fires on any forwarding change (FIB updates, zone updates, policy updates, filter updates). Each rotation triggers an O(N) scan of sessions. For typical operator scenarios (few config changes, modest LocalDelivery session count) this is fine. With many local-delivery sessions and frequent forwarding rotation, the cost could become noticeable. Minor optimization opportunity: track lo0_filter_v4_fast/v6_fast separately from full forwarding Arc and only republish when those specifically change.

H2 — Reverse direction not handled. If lo0 filter is configured then later removed, existing LocalDelivery sessions stay userspace-live (the function only publishes when session_key_has_lo0_filter == true). No code demotes back to PASS_TO_KERNEL. This is a permanent userspace-routing cost for those sessions — minor performance regression for filter-add-then-remove scenarios, not a correctness issue.

H3 — MEDIUM from r2 not addressed. Codex's r2 MEDIUM finding ("output filter terminal action drops the action in TxSelectionFilterResult, can log reject while forwarding") is not touched in r3. Author chose to defer; reasonable since it's a separate path with different code structure. Worth filing as a follow-up.

H4 — publish_live_session_entry overwrite semantics. The fix relies on publish_live_session_entry overwriting an existing PASS_TO_KERNEL entry at the same key. BPF maps typically use bpf_map_update_elem with BPF_ANY which overwrites. Worth Codex/Gemini verification that this is actually the case for the session map.

Recommendation

MERGE-READY pending Codex + Gemini r3 verification of:

  • M1 retroactive republish actually overwrites PASS_TO_KERNEL (H4).
  • M2 cleanup path test coverage exercises all 5 steps (verified by my reading; reviewers should independently confirm).
  • M3 PolicyAction::Reject is now consistent everywhere (no other code path overrides).

Strongly consider (non-blocking):

  • File a follow-up for MEDIUM (output filter terminal action).
  • Optimize republish trigger to filter-state-transition only (H1).

Awaiting Codex (task-mpcxo2ir-cm9ttp) and Gemini Pro 3 (task-mpcxot2g-rsz0l3).

Not merging — author's decision.

@psaab
Owner Author

psaab commented May 19, 2026

Round-3 partial synthesis on 053537ce — Claude self-correction; Gemini flags 3 MAJORs (Codex pending)

| Reviewer | Verdict |
| --- | --- |
| Claude r3 (self-corrected) | MERGE-NEEDS-MAJOR (was MERGE-READY) |
| Codex r3 | running 6m+ (task-mpcxo2ir-cm9ttp) |
| Gemini Pro 3 r3 | MERGE-NEEDS-MAJOR (3 FAILs/MAJORs) |

Self-correction (Claude r3 → MAJOR)

My round-3 review marked this MERGE-READY with H1 (republish triggers on every forwarding change), H2 (reverse direction not handled), and H4 (overwrite semantics) as non-blocking observations. Gemini r3 elevated H1 and H2 to MAJOR, confirmed H4 (verified via libbpf BPF_ANY), and found a NAT64 cross-family edge case I didn't probe.

Verified Gemini r3 findings

Verified PASS (5/10):

  • publish_live_session_entry overwrite: cascades to libbpf_sys::bpf_map_update_elem with BPF_ANY — reliably overwrites PASS_TO_KERNEL.
  • M2 cleanup: all 5 components present and called in the right order at session_glue/mod.rs:303-345.
  • M3 PolicyAction::Reject: consistent everywhere; policy.rs parses "reject" → PolicyAction::Reject; poll_descriptor.rs treats Deny+Reject identically via if let PolicyAction::Permit ....
  • Test cleanup assertions: pre-installs to both local + shared, asserts all 4 maps cleared and close delta emitted.
  • emit_close_delta_with_origin skips is_reverse: guard at line 956.
  • HA peer race: mitigated by apply_lo0_filter_action re-evaluating on session-hit per worker — even if peer B doesn't see delete command yet, it independently drops + cleans up.

Verified MAJOR (3):

Gemini M1 (republish O(N) on every forwarding-Arc rotation). Quote at worker/mod.rs:1278-1290 confirms placement inside load_arc_if_changed branch. Forwarding rotates on snapshot install, FIB updates, HA config — not just lo0 filter changes. With many LocalDelivery sessions, every unrelated rotation incurs an O(N) iteration. Gemini calls this a "jitter bomb under dynamic conditions."

My read: forwarding rotation is typically rare (config events, batched netlink), but the over-broad trigger is real. Worth either:

  • (a) Tracking lo0_filter_v4_fast / lo0_filter_v6_fast separately and only iterating when those change.
  • (b) Documenting the operational expectation that forwarding rotations are rare.

Reasonable to flag as MAJOR if operator-environment forwarding rotates frequently; downgradeable to MINOR if it doesn't.
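
Option (a) could look roughly like the sketch below. It assumes the republish is only needed when a family gains an lo0 filter, since sessions published while a filter is present are already gated at publish time. `Lo0FilterState` and `RepublishGate` are hypothetical names, not part of the PR.

```rust
/// Hypothetical snapshot of per-family lo0 filter presence.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Lo0FilterState {
    pub v4: bool,
    pub v6: bool,
}

/// Remembers the last observed state so unrelated forwarding rotations
/// (FIB updates, zone or policy changes) skip the O(N) session scan.
#[derive(Default)]
pub struct RepublishGate {
    last: Lo0FilterState,
}

impl RepublishGate {
    /// Call once per forwarding rotation. Returns true only when at least
    /// one family went from "no lo0 filter" to "lo0 filter configured".
    pub fn on_forwarding_rotation(&mut self, current: Lo0FilterState) -> bool {
        let gained = (current.v4 && !self.last.v4) || (current.v6 && !self.last.v6);
        self.last = current;
        gained
    }
}
```

A worker would call `on_forwarding_rotation` inside the existing rotation branch and run the scan only when it returns true. Whether presence alone is the right trigger is an assumption that would need checking against the publish-time gate.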

Gemini M3 (reverse direction not handled — lo0 filter removal). Quote at session_glue/mod.rs:242 shows the early return when !session_key_has_lo0_filter. If filter is removed after sessions were demoted to userspace-live, those sessions stay userspace-live forever. Permanent performance regression for long-lived local sessions during filter-add-then-remove cycles.

My read: real but bounded. Operators rarely add-then-remove lo0 filters; sessions eventually age out. Author can either implement the reverse (publish_kernel_local_session_key on filter removal) or document as accepted permanent demotion.

Gemini M10 (NAT64 cross-family bypass) — NEW MAJOR I didn't probe. session_key_has_lo0_filter checks addr_family of the pre-translation session key. A NAT64 session arrives v6 (key=AF_INET6), gets translated to v4 for local delivery. If operator only configured v4 lo0 filter (typical since host lo0 is v4), lo0_filter_v6_fast.is_some() returns false → no lo0 detected → PASS_TO_KERNEL bypass for the v6 session despite the v4 lo0 filter being configured for the post-translation destination.

My read: this is operator-surprising but arguably expected Junos semantics (filters are per-family, NAT64 changes family). However, if the threat model is "block all local-delivery via lo0 filter", a single-family filter doesn't catch NAT64. Either:

  • (a) Apply lo0 filter check to the POST-translation family in the NAT64 path.
  • (b) Document explicitly that NAT64 + single-family lo0 filter has a bypass — operator must configure both families.

If choice is (b), I'd recommend a commit warning or status message when the operator configures a v4-only lo0 filter while NAT64 is active.
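
Both options can be sketched with a small model. All names here (`Family`, `Lo0Filters`, `session_has_lo0_filter`, `nat64_single_family_warning`) are hypothetical; the real `session_key_has_lo0_filter` is described as checking only the pre-translation family.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Family {
    Inet,
    Inet6,
}

/// Hypothetical per-family lo0 filter presence.
#[derive(Clone, Copy, Debug)]
pub struct Lo0Filters {
    pub inet: bool,
    pub inet6: bool,
}

impl Lo0Filters {
    pub fn has(&self, family: Family) -> bool {
        match family {
            Family::Inet => self.inet,
            Family::Inet6 => self.inet6,
        }
    }
}

/// Option (a): treat a session as lo0-filtered if either the observed
/// family or the post-translation family (for NAT64) has a filter.
pub fn session_has_lo0_filter(
    filters: &Lo0Filters,
    observed: Family,
    translated: Option<Family>,
) -> bool {
    filters.has(observed) || translated.map_or(false, |f| filters.has(f))
}

/// Option (b): signal a commit-time warning when NAT64 is active and
/// exactly one family has an lo0 filter.
pub fn nat64_single_family_warning(filters: &Lo0Filters, nat64_active: bool) -> bool {
    nat64_active && (filters.inet != filters.inet6)
}
```

With a v4-only filter, a v6-observed NAT64 session is caught under option (a) only because the translated family is consulted. Under option (b) the same configuration produces a warning instead of enforcement.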

Recommendation

Block on Gemini's 3 MAJORs:

  • Gemini M1 (republish trigger scope): narrow trigger to lo0-filter-state-transition only, OR document operational cost expectation.
  • Gemini M3 (reverse demotion): implement reverse republish on filter removal, OR document permanent demotion.
  • Gemini M10 (NAT64 cross-family): either fix the filter check to consider post-translation family, OR add operator-facing warning + plan-1379 doc.

Strongly consider (Gemini caught): add integration test exercising the real BPF map update (currently unit test uses fd=-1).

Codex r3 still running; will update synthesis when it lands. Self-correction documented above.

Not merging — author's decision.

@psaab
Owner Author

psaab commented May 19, 2026

Round-3 final synthesis on 053537ce — Codex MINOR vs Gemini MAJOR split on residuals

| Reviewer | Verdict |
| --- | --- |
| Claude r3 (self-corrected) | MERGE-NEEDS-MAJOR |
| Codex r3 | MERGE-NEEDS-MINOR |
| Gemini Pro 3 r3 | MERGE-NEEDS-MAJOR |

All reviewers converge: 3 r2 MAJORs materially fixed

Codex and Gemini both verify with quoted-line evidence:

  • M2 cleanup (full terminal cleanup across local + 3 shared maps + owner_rg + peer + close delta): complete at session_glue/mod.rs:303-345.
  • M3 PolicyAction::Reject fail-closed: event_emit.rs:191 maps Reject → RT_FLOW_ACTION_DENY. RT_FLOW_ACTION_REJECT remains only as a producer-side const + codec compat test.
  • M1 retroactive republish overwrites PASS_TO_KERNEL: publish_live_session_entry → bpf_map_update_elem(BPF_ANY) reliably overwrites the BPF map entry.
  • emit_close_delta_with_origin skips reverse: verified.
  • HA peer race: mitigated by per-worker apply_lo0_filter_action on session-hit.

Codex vs Gemini severity split on residuals

| Residual gap | Codex r3 | Gemini r3 | Claude read |
| --- | --- | --- | --- |
| M1 republish scope (every forwarding rotation) | MINOR | MAJOR | MINOR-leaning |
| Reverse demotion (lo0 filter removal) | MINOR/MEDIUM | MAJOR | MINOR |
| Output filter terminal action (r2 MEDIUM open) | MEDIUM | not flagged | real bug class |
| NAT64 cross-family bypass | not flagged | MAJOR | doc-required |

M1 republish scope. Codex: "fires on every forwarding snapshot change observed by the worker, not every loop tick." Gemini: "jitter bomb under dynamic conditions." Both correct framings. Forwarding rotations are typically rare (config events, batched netlink), but environments with frequent FIB updates or HA transitions would see O(N) scans more often. I lean MINOR for the default case with a guidance note for pathological scenarios.

Reverse demotion. Codex: "no bulk demotion for existing live entries after filter removal." Gemini: "permanent userspace helper traversal until session ages out naturally." Both correct. Performance regression, not correctness. MINOR with optional follow-up.

Output filter terminal action (Codex's MEDIUM). This is a REAL bug class still open from r2:

Output tx evaluation records log_match and policer_drop, but does not carry FilterAction::Discard/Reject into drop; build_live_forward_request_from_frame emits the output filter log, then only returns None when cos.drop is true. That means a logged output discard/reject can produce a deny-looking event while the packet still forwards.

This is the same class of bug as the original M2/M3 findings on the input-filter and lo0 paths: there is still ONE filter path where syslog says "deny" while the wire behavior is "forward". Cited files: filter/engine.rs:379, tx/cos_classify.rs:321, forward_request.rs:152.

NAT64 cross-family bypass (Gemini-only). session_key_has_lo0_filter checks pre-translation addr_family. A v6 NAT64 session destined for v4 lo0 with v4-only filter configured → no lo0 detected → PASS_TO_KERNEL bypass. Operator-surprising. Either:

  • (a) Apply filter check to post-translation family in NAT64 path.
  • (b) Document as known limitation in plan-1379 + commit warning when v4-only lo0 filter is configured while NAT64 rules exist.

Codex didn't probe this; Gemini found it on hostile check #10. Doc-required at minimum.

Test gaps flagged by both reviewers

  • No integration test exercising real BPF map fd for the republish helper (unit test uses fd=-1).
  • Enhanced cached-hit test doesn't verify peer command replication, owner-RG removal with owner > 0, or BPF map deletion (peer commands empty, owner_rg = 0, fds = -1 in the test).

Recommendation

I'm splitting Codex/Gemini's verdicts — neither extreme is right.

Block on the output filter terminal action gap (Codex's MEDIUM): this is a real bug of the same class as the original M3. It should be fixed in this PR for completeness, OR explicitly tracked as a follow-up with a plan-1379 doc note stating that an output-filter logged deny does NOT imply a drop until that path is fixed. Without the doc note, operators reading the new RT_FLOW_FILTER_LOG events will misinterpret output-filter deny logs.

Strongly consider (Gemini-flagged):

  • NAT64 cross-family: at minimum, plan-1379 doc note explaining single-family lo0 filters + NAT64 produces bypass; operator must configure both families.
  • Reverse demotion: either implement publish_kernel_local_session_key on filter removal OR doc as accepted permanent demotion (low operational impact).

Strongly consider (both reviewers flagged):

  • Real-BPF-map integration test for republish.
  • Owner_rg > 0 / peer command / BPF map deletion coverage in the cached-hit test.

Defer-acceptable (Gemini-elevated to MAJOR but Codex-minor):

  • Republish scope (forwarding-rotation trigger). Optimize only if operational evidence shows it.

Codex task: task-mpcxo2ir-cm9ttp. Gemini Pro 3 task: task-mpcxot2g-rsz0l3.

Not merging — author's decision.

@psaab psaab force-pushed the codex/d-1379-filter-log-fix branch from 053537c to d0f863b Compare May 19, 2026 18:22
@psaab
Owner Author

psaab commented May 19, 2026

Round-4 fix-forward on d0f863b3

Addressed the round-3 residual blocker and documented the remaining lo0 caveats.

Fixes

  1. Output filter terminal actions now actually drop

    • Added has_terminal_action_terms to compiled filters.
    • Output filters with discard or reject terms now force TX-path evaluation even when the term has no log, count, policer, forwarding-class, or DSCP rewrite side effect.
    • TX-selection results now carry the matched FilterAction; live and cached TX descriptors drop before enqueue when the output term is terminal.
    • build_live_forward_request_from_frame now respects a precomputed cached drop bit instead of treating cached TX selection as always forwardable.
  2. Regression coverage

    • Added selector-level coverage for a no-log output discard term. This pins the case that previously skipped TX evaluation completely.
    • Added live forward-request coverage proving a logged output discard emits a FILTER_LOG deny and returns None instead of forwarding.
  3. Docs aligned with behavior

    • Updated userspace-dp/src/afxdp/README.md and userspace-dp/src/filter/README.md for terminal output-filter enforcement.
    • Updated plan-1379-dataplane-events.md to state that output discard/reject is terminal in TX selection.
    • Documented the r3 lo0 residuals explicitly: filter removal can leave already-demoted sessions helper-visible until ageout, and NAT64 deployments need both inet and inet6 lo0 filters for uniform local-delivery filtering.

Validation

Passed:

RUSTFLAGS='-Awarnings' cargo test resolve_cos_tx_selection_drops_terminal_output_filter_without_log -- --nocapture
RUSTFLAGS='-Awarnings' cargo test build_live_forward_request_from_frame_drops_logged_output_filter_discard -- --nocapture
RUSTFLAGS='-Awarnings' cargo test filter_log -- --nocapture
RUSTFLAGS='-Awarnings' cargo test output_filter -- --nocapture
RUSTFLAGS='-Awarnings' cargo test lo0_filter -- --nocapture
RUSTFLAGS='-Awarnings' cargo test input_filter -- --nocapture
go test ./pkg/dataplane/userspace ./pkg/logging -run 'TestEventStreamRawDataplaneEventsFeedSyslogFanout|TestDecodeDataplaneEventPolicyDenyRTFlow|TestFormatBinaryRecord_Basic'
git diff --check HEAD^
commit message line-length check <= 72 chars

Note: I intentionally avoided keeping broad cargo fmt churn; the pushed diff is scoped to the behavior, docs, and tests above.

@psaab
Owner Author

psaab commented May 19, 2026

Claude round-4 review on d0f863b3

Verdict: MERGE-READY (pending Codex + Gemini r4 hostile verification, with two doc-only resolutions noted)

Addresses the Codex r3 MEDIUM (output filter terminal action). Reverse demotion + NAT64 cross-family resolved by explicit doc acknowledgment rather than code fixes — both are acceptable trade-offs if operator-facing guidance is clear.

Verified fixes

Output filter terminal action enforcement (Codex r3 MEDIUM). Full plumbing:

  1. Filter struct gains has_terminal_action_terms: bool (filter/mod.rs:113-118) — computed by compiler as terms.iter().any(|t| t.action != FilterAction::Accept) at filter/compiler.rs:107-110.

  2. Eligibility check at 4 sites now includes the flag:

    • filter/compiler.rs:147+ (input filter registration)
    • filter/compiler.rs:191+ (output filter registration)
    • tx/cos_classify.rs:81 (cached eval eligibility)
    • tx/cos_classify.rs:294 (live eval eligibility)

    Filters with ONLY a terminal action (no log, no counter, no policer) no longer get short-circuited as non-tx-affecting.

  3. TxSelectionFilterResult + CachedTxSelectionFilterResult carry action: FilterAction (filter/mod.rs:484-510). Default impls explicit (no longer #[derive(Default)] since FilterAction isn't Default).

  4. resolve_cos_tx_selection at tx/cos_classify.rs:155 (cached path) and :323 (live path):

    drop: output_result.action != crate::filter::FilterAction::Accept
    // or, on live path:
    let mut drop = output_result.policer_drop || output_result.action != crate::filter::FilterAction::Accept;

    Drop ORs the action terminal-ness with policer drop. All CoSTxSelection return sites carry the drop field.

  5. flow_cache.rs:30 adds CachedTxSelectionDescriptor.drop: bool. Cache install at poll_descriptor.rs:716 stamps cached_descriptor.tx_selection.drop into the descriptor. Cached fast path at :493:

    if cached_descriptor.tx_selection.drop || policer_action.drop {
        binding.scratch.scratch_recycle.push(desc.addr);
        continue;
    }

  6. forward_request.rs:125 propagates selection.drop (was hardcoded false):

    .map(|selection| CoSTxSelection {
        queue_id: selection.queue_id,
        dscp_rewrite: selection.dscp_rewrite,
        drop: selection.drop,  // was: false
        filter_log: selection.filter_log,
    })
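
The plumbing in the steps above reduces to two predicates, sketched here with simplified stand-in types. `Term`, `Filter::compile`, `affects_tx`, `OutputResult`, and `tx_drop` are hypothetical names; only the `action != FilterAction::Accept` rule and the OR with the policer drop come from the quoted code.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FilterAction {
    Accept,
    Discard,
    Reject,
}

/// Simplified term: an action plus one side effect (log).
pub struct Term {
    pub action: FilterAction,
    pub log: bool,
}

pub struct Filter {
    pub terms: Vec<Term>,
    pub has_terminal_action_terms: bool,
}

impl Filter {
    /// Compute the flag once at compile time, as the review describes.
    pub fn compile(terms: Vec<Term>) -> Self {
        let has_terminal_action_terms =
            terms.iter().any(|t| t.action != FilterAction::Accept);
        Filter { terms, has_terminal_action_terms }
    }

    /// Eligibility for TX-path evaluation: side effects OR terminal terms.
    pub fn affects_tx(&self) -> bool {
        self.has_terminal_action_terms || self.terms.iter().any(|t| t.log)
    }
}

/// Hypothetical stand-in for the TX-selection filter result.
pub struct OutputResult {
    pub action: FilterAction,
    pub policer_drop: bool,
}

/// A packet is dropped if the policer says so or the matched term is terminal.
pub fn tx_drop(result: &OutputResult) -> bool {
    result.policer_drop || result.action != FilterAction::Accept
}
```

In this model a filter whose only term is a discard with no log still reports `affects_tx() == true`, which is the case that previously skipped TX evaluation. Reject goes through the same gate as discard.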

Verified tests

build_live_forward_request_from_frame_drops_logged_output_filter_discard (tests.rs +116 LOC):

  • Builds production scenario: WAN egress (reth0.0) with output filter wan-drop containing then discard log for TCP port 443.
  • Calls build_live_forward_request_from_frame with a packet matching the filter.
  • Asserts req.is_none() — packet does NOT enqueue for TX.
  • Decodes event: kind == FilterLog, action == 0 (DENY), reason == FilterLogSource::Output.wire_reason() — log correctly emitted with output source.

resolve_cos_tx_selection_drops_terminal_output_filter_without_log (tx/cos_classify_tests.rs +52 LOC):

  • Filter with then discard (no log) — important because without the new has_terminal_action_terms flag, this filter wouldn't have passed the eligibility check.
  • Asserts selection.drop == true, queue_id == None, filter_log == None.

Doc-only resolutions (acceptable trade-offs)

Reverse demotion (plan-1379:73-78):

"If a lo0 filter is removed, already-demoted local-delivery sessions can stay on the userspace-visible helper path until they age out; that is a bounded performance cost, not a forwarding bypass."

Option (b) from my r3 synthesis: accept permanent demotion until session expiry, document operator impact. Reasonable for a typical environment where filter-add-then-remove is rare. Operators with long-lived local sessions (BGP, SSH) may want a clear flow session workaround — not in this PR.

NAT64 cross-family (plan-1379:78-79):

"NAT64 local-delivery traffic is evaluated using the observed packet family, so operators that use NAT64 and want uniform local-delivery filtering must configure both inet and inet6 lo0 filters."

Option (b) from my r3 synthesis: doc-required known limitation. Junos semantics (per-family filters) are preserved. Operators get a doc note instead of an automatic cross-family enforcement.

Hostile observations (non-blocking)

H1 — Output Reject coverage. Tests cover output then discard. No test for output then reject. Since both go through the same action != Accept gate, behavior should be identical, but symmetric coverage would close the gap.

H2 — has_terminal_action_terms field on Filter struct. Filter is presumably userspace-only (compiled at filter::compiler runtime), not cross-language wire-format. If it IS serialized to Go side, the new field needs Go-side mirror. Worth a quick verification.

H3 — Republish scope still O(N) on every forwarding rotation. Gemini r3 elevated this to MAJOR; r4 did NOT touch the trigger. PR description says "documented" — but doc covers reverse demotion, not republish scope. The republish triggers on every forwarding Arc rotation (not just lo0 filter transitions). Either accept Codex's MINOR framing OR add the predicate.

Recommendation

MERGE-READY with the understanding that:

  • The doc-only resolutions for reverse demotion + NAT64 are operator-facing limitations, not bugs. Plan-1379 + filter README + afxdp README clearly note them.
  • Gemini's republish-scope MAJOR from r3 was not addressed; Codex's MINOR framing prevails.

Strongly consider in this PR (non-blocking):

  • H1: add output reject path test for symmetric coverage.
  • H2: verify Filter struct serialization scope (purely Rust-internal? cross-language?).

Defer: republish scope optimization (only worth it if operational evidence shows it).

Awaiting Codex (task-mpd0onlk-s1dqk0) and Gemini (task-mpd0pd2g-0nt6m9, on gemini-3.1-pro-preview).

Not merging — author's decision.

@psaab
Owner Author

psaab commented May 19, 2026

Round-4 partial synthesis on d0f863b3 — Claude self-correction; Codex 1 MAJOR (Gemini empty output)

| Reviewer | Verdict |
| --- | --- |
| Claude r4 (self-corrected) | MERGE-NEEDS-MAJOR (was MERGE-READY) |
| Codex r4 | MERGE-NEEDS-MAJOR (1 MAJOR: cached DSCP bypass) |
| Gemini Pro 3.1 r4 | empty output (only "Task completed."; re-dispatching) |

Self-correction (Claude r4 → MAJOR)

My round-4 Claude review marked this MERGE-READY focused on whether the output filter terminal action enforcement was plumbed through TX selection. I verified the plumbing but did NOT probe the cached-flow invariants. Codex r4 found a sharp cached-flow bypass specific to DSCP-match terminal filters.

Verified Codex r4 MAJOR

Cached output-filter terminal decisions are stale for DSCP-changing flows.

Verified against the code at d0f863b3:

  • Filter engine matches on DSCP: filter/engine.rs:17,32,45,58,... — eval functions take dscp: u8 and use it in term_matches.
  • SessionKey / flow cache key does NOT include DSCP: session/key.rs:10-16 shows fields are only addr_family, protocol, src_ip, dst_ip, src_port, dst_port.
  • Cache install at tx/cos_classify.rs:158:
    drop: output_result.action != crate::filter::FilterAction::Accept,
    output_result was evaluated using the FIRST packet's meta.dscp. The cached drop value is frozen against that DSCP.
  • Cached fast path at poll_descriptor.rs:496:
    if cached_descriptor.tx_selection.drop || policer_action.drop {
        ...
    }
    No DSCP re-evaluation — just trusts the stored drop bit.

Concrete bypass scenario:

  1. Operator config: output filter wan-drop term drop-ef from { dscp ef; destination-port 443; } then discard log;
  2. First packet of flow has DSCP=0 → does NOT match (dscp != ef=46) → cache stamped drop=false.
  3. Later packet on same 5-tuple has DSCP=46 → SHOULD hit terminal discard → but cached path reads drop=false → packet forwards.

This is the SAME class of bug the r4 fix closed for non-DSCP-conditional output discard, except for DSCP-conditional terms. Not a CoS misclassification anymore — security terminal-action bypass on cached flows.

Same root cause applies to ANY per-packet-variable filter match input not in the cache key: TOS, IP options, fragment flag, possibly others. Need an audit of term_matches_v4/v6 for all match fields and which ones aren't in the cache key.
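
To make the bypass concrete, here is a minimal sketch of the stale-decision mechanism. It is not the project's code: FlowKey, NaiveFlowCache, and filter_drops are hypothetical stand-ins, and the only properties taken from the review are that the cache key is the 5-tuple (no DSCP) and that the drop bit is computed once from the first packet.

```rust
use std::collections::HashMap;

/// Hypothetical 5-tuple key. DSCP is deliberately absent, mirroring the
/// SessionKey fields quoted in the review.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub struct FlowKey {
    pub protocol: u8,
    pub src_ip: [u8; 4],
    pub dst_ip: [u8; 4],
    pub src_port: u16,
    pub dst_port: u16,
}

/// Stand-in for the term `from { dscp ef; destination-port 443; } then discard`.
/// EF is DSCP value 46.
pub fn filter_drops(dscp: u8, dst_port: u16) -> bool {
    dscp == 46 && dst_port == 443
}

/// Cache that freezes the first packet's drop decision per flow.
#[derive(Default)]
pub struct NaiveFlowCache {
    drop_by_flow: HashMap<FlowKey, bool>,
}

impl NaiveFlowCache {
    /// Returns true when the packet is dropped. A miss evaluates the filter
    /// and stores the result; a hit trusts the stored bit and ignores `dscp`.
    pub fn process(&mut self, key: FlowKey, dscp: u8) -> bool {
        *self
            .drop_by_flow
            .entry(key)
            .or_insert_with(|| filter_drops(dscp, key.dst_port))
    }
}
```

With this model, a first packet at DSCP 0 installs drop=false, and a later DSCP 46 packet on the same key is forwarded even though live evaluation of the filter would discard it.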

Codex recommended fixes

  1. Make filters with DSCP-conditional terminal terms non-cacheable.
  2. Include DSCP in cache validity key.
  3. Re-evaluate terminal/log-relevant output filter terms on cached hits when ANY packet-varying match input is present.

Option (3) is the most general — restores the input filter's eval-every-packet semantic for cached output filters that have variable-input terms.
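
A rough sketch of what option (3) could look like on the cached hit path. All names here (CachedTx, filter_reads_packet_varying_input, evaluate_output_filter) are assumptions for illustration, not identifiers from the PR; the idea is only that the stored drop bit is trusted when the filter reads nothing outside the cache key, and live evaluation is used otherwise.

```rust
/// Hypothetical cached descriptor: the frozen drop bit plus a flag recording
/// whether the filter that produced it reads per-packet inputs such as DSCP.
pub struct CachedTx {
    pub drop: bool,
    pub filter_reads_packet_varying_input: bool,
}

/// Stand-in for live output-filter evaluation (EF + port 443 => discard).
pub fn evaluate_output_filter(dscp: u8, dst_port: u16) -> bool {
    dscp == 46 && dst_port == 443
}

/// Drop decision on a cache hit. The stored bit is only trusted when the
/// filter cannot change its answer between packets of the same flow.
pub fn cached_hit_drops(cached: &CachedTx, dscp: u8, dst_port: u16) -> bool {
    if cached.filter_reads_packet_varying_input {
        evaluate_output_filter(dscp, dst_port)
    } else {
        cached.drop
    }
}
```

The trade-off is that flows behind such filters pay the filter evaluation on every packet while still reusing the rest of the cached descriptor.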

Codex confirmed correct

  • CoSTxSelection drop propagation: all 6 return sites at cos_classify.rs:362-415 carry the drop field correctly.
  • Cached install + replay: works for the non-DSCP-conditional case (the obvious-bug case the r4 fix closed).
  • has_terminal_action_terms gate: updated at all 4 sites (compiler.rs:108-110, 150-154, 195-199; cos_classify.rs:78-82, 291-295; forwarding_build.rs:421-436).
  • Input vs output race: input terminal action wins (runs at poll_descriptor.rs:1071-1088 before route/policy/output TX).
  • Policer OR: correct (policer_drop || action != Accept).
  • Filter struct compat: Filter is internal compiled Rust, not the serialized snapshot. FirewallTermSnapshot.action defaults via serde + parse maps unknown/empty to Accept.

Codex confirmed gaps from my r4

  • Output Reject test missing: only Discard is tested. (My Claude r4 H1.)
  • Reverse demotion doc: acceptable as documented perf caveat; manual clear flow session CLI would be operationally useful but not blocking.

Codex scope correction

Diff is 23 files, +1094/-75. The extra files are still in the event/filter/lo0/session cleanup area; I did not see unrelated scope creep.

I quoted 14 files +245/-15 (r3→r4 incremental). Codex measured PR vs parent commit. Both correct, different bases.

NAT64 doc wording

Codex notes the plan-1379 phrase "NAT64 local-delivery traffic is evaluated using the observed packet family" — actual code at session_glue/mod.rs:223-230 keys off SessionKey.addr_family (pre-translation). Codex: "The doc is acceptable only if 'observed' means pre-NAT64; I would clarify that wording."

Recommend tightening to: "evaluated using the pre-translation (wire-observed) packet family."

Gemini Pro 3.1 status

task-mpd0pd2g-0nt6m9 ran for 2m 10s with gemini-3.1-pro-preview model but produced only "Task completed." with no substantive review content. The 5 tool calls in the log suggest the model did work but the final output was minimal. Re-dispatching with a tighter prompt asking for explicit verdict + per-check quotes.

Recommendation

Block on the cached DSCP bypass. Pick one of Codex's three fixes:

  • Recommended: re-evaluate terminal-action output filter terms on cached hits when DSCP (or other per-packet-variable inputs) is part of any term's match criteria. Lowest invasiveness.
  • Alternative: make filters with DSCP-conditional terminal terms non-cacheable (mark at compile time, skip cache install).
  • Alternative: extend SessionKey/flow cache key with DSCP.

Strongly consider (Codex + Claude flagged):

  • Audit term_matches_v4/v6 for all match fields that vary per-packet but aren't in cache key (DSCP, possibly TOS, fragment, IP options).
  • Add output Reject path test.
  • Tighten NAT64 doc wording to clarify "observed" = "pre-translation".

Codex task: task-mpd0onlk-s1dqk0. Gemini re-dispatch pending.

Not merging — author's decision.

Copilot AI review requested due to automatic review settings May 19, 2026 19:37
@psaab psaab force-pushed the codex/d-1379-filter-log-fix branch from d0f863b to 4567c9f Compare May 19, 2026 19:37
@psaab
Owner Author

psaab commented May 19, 2026

Round-5 update on 4567c9f4

Closed the cached-DSCP output-filter bypass Codex found in round 4.

What changed

  • Added a compiled-filter capability bit for DSCP match terms.
  • Flow-cache insertion now refuses caching when the egress output filter has DSCP match terms.
  • This keeps DSCP-sensitive output filters on the live TX-selection path because DSCP is per-packet metadata and is not part of the 5-tuple session/flow-cache key.
  • Added a regression test proving a DSCP-matched output filter declines cache insertion.
  • Updated AF_XDP/filter module docs and the #1379 plan notes (issue: "Feature gap: dataplane events (PolicyDeny, ScreenDrop, FilterLog) not emitted by userspace-dp").
  • Tightened the NAT64 lo0 caveat wording from observed family to pre-translation packet family.
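
The capability bit and the cache refusal described above can be sketched roughly as follows. This is an illustration under assumptions, not the merged code: Term, CompiledFilter, compile, CacheEntry, and try_build_cache_entry are hypothetical names, and the bit is derived here from a non-empty dscp_values list.

```rust
/// Hypothetical filter term: a non-empty `dscp_values` means the term
/// matches on DSCP.
pub struct Term {
    pub dscp_values: Vec<u8>,
    pub discard: bool,
}

/// Hypothetical compiled filter carrying a capability bit computed once
/// at compile time.
pub struct CompiledFilter {
    pub terms: Vec<Term>,
    pub has_dscp_match_terms: bool,
}

pub fn compile(terms: Vec<Term>) -> CompiledFilter {
    let has_dscp_match_terms = terms.iter().any(|t| !t.dscp_values.is_empty());
    CompiledFilter { terms, has_dscp_match_terms }
}

pub struct CacheEntry {
    pub drop: bool,
}

/// Declines to build a cache entry when the egress filter is DSCP-sensitive,
/// so such flows stay on the live TX-selection path.
pub fn try_build_cache_entry(
    egress_filter: Option<&CompiledFilter>,
    first_packet_drop: bool,
) -> Option<CacheEntry> {
    if egress_filter.map_or(false, |f| f.has_dscp_match_terms) {
        return None;
    }
    Some(CacheEntry { drop: first_packet_drop })
}
```

Computing the bit at compile time keeps the per-flow check to a single boolean read.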

Validation

  • RUSTFLAGS='-Awarnings' cargo test from_forward_decision_skips_cache_for_dscp_matched_output_filter -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test output_filter -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test filter_log -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test lo0_filter -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test input_filter -- --nocapture
  • go test ./pkg/dataplane/userspace ./pkg/logging -run 'TestEventStreamRawDataplaneEventsFeedSyslogFanout|TestDecodeDataplaneEventPolicyDenyRTFlow|TestFormatBinaryRecord_Basic'
  • git diff --check HEAD^
  • Commit-message line-length audit: no lines over 72 chars.

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated no new comments.

@psaab
Owner Author

psaab commented May 19, 2026

Claude round-5 review on 4567c9f4

Verdict: MERGE-NEEDS-MAJOR (pending Codex + Gemini r5 hostile verification)

The r4 cached-output-DSCP bypass is correctly fixed via cache-decline. But the SAME class of bug exists for INPUT filters with DSCP-match terms — r5 only patched the output side.

Verified fixes (output side)

Cached output-filter DSCP bypass closed.

  • filter/mod.rs:117 — new Filter.has_dscp_match_terms: bool.
  • filter/compiler.rs:111 sets has_dscp_match_terms: terms.iter().any(|t| t.dscp_match_enabled).
  • filter/engine.rs:1030-1041 — new interface_output_filter_has_dscp_match(state, ifindex, is_v6) helper.
  • flow_cache.rs:281-291: from_forward_decision early-returns None when the egress output filter has DSCP-match terms.
  • New regression test from_forward_decision_skips_cache_for_dscp_matched_output_filter at flow_cache_tests.rs:643-689: builds a filter with dscp_values: vec![46] + then discard, asserts entry.is_none().

Doc updates:

  • plan-1379:58-63 documents DSCP-matched output filters bypassing the flow cache deliberately.
  • NAT64 wording tightened from "observed packet family" → "pre-translation packet family" (plan-1379:81-83).
  • afxdp/README and filter/README updated.

NEW MAJOR — INPUT filter DSCP cache bypass (same class as the r4 bug)

The r5 fix only covers OUTPUT filters. The same bypass exists on the INPUT side and is not addressed:

Verified at 4567c9f4:

  • Input filter terminal action ONLY runs on slow path (poll_descriptor.rs:1071-1088):

    let input_filter_eval = evaluate_non_pbr_input_filter(...);
    if input_filter_eval.action != crate::filter::FilterAction::Accept {
        ...
        binding.scratch.scratch_recycle.push(desc.addr);
        continue;
    }

    This runs ONLY when a packet reaches the slow path (cache miss).

  • Cached fast path at poll_descriptor.rs:460-499 does:

    1. record_filter_counter
    2. apply_cached_three_color_policers
    3. emit_cached_input_filter_log (just emits a STORED FilterLogMatch from cached_descriptor.input_filter_log — does NOT re-evaluate the input filter against the current packet)
    4. emit_cached_output_filter_log
    5. drop check: cached_descriptor.tx_selection.drop || policer_action.drop — only output-side drop, NO input-side drop
  • cached_descriptor.input_filter_log is frozen against the first-packet's DSCP at cache install time (poll_descriptor.rs:2289 calls evaluate_non_pbr_input_filter_log_only once).

Concrete bypass scenario (input):

  1. Input filter from dscp ef destination-port 443 then discard log on ingress interface.
  2. First packet of TCP flow has DSCP=0 → does NOT match (dscp=0 ≠ ef=46) → cache install with input_filter_log = None, no terminal drop captured.
  3. Second packet on same 5-tuple has DSCP=46 → cache hit → reads input_filter_log = None → NO log emit, NO drop → packet forwards.

Same class of bug as the r4 output bug. Same security implications. The r5 fix needs to mirror the output-side cache-decline for INPUT filters too.

Hostile observations

H1 — Where to apply the symmetric input fix.

  • Add Filter.has_dscp_match_terms check for INPUT filters in FlowCacheEntry::from_forward_decision (extend flow_cache.rs:281-291 to also check interface_input_filter_has_dscp_match(state, ingress_ifindex, is_v6)).
  • OR re-evaluate input filter on cache hit when DSCP-match terms are present (more invasive but preserves cache benefit for non-DSCP terms).

The cache-decline approach matches r5's pattern for output and is the lowest-risk fix.

H2 — Conservatism question. The output fix declines caching for any output filter with ANY DSCP-match term, even if the matching term has no terminal action (just log or counter). For pure-log DSCP filters, the cached input/output log identity is frozen anyway, so cache-decline is a defensible default. But could narrow to has_dscp_match_terms && (has_terminal_action_terms || has_log_terms). Not blocking; conservative is correct here.
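
For reference, the two predicates discussed in H2 side by side, as a small sketch. FilterCaps and has_log_terms are hypothetical here (only has_dscp_match_terms and has_terminal_action_terms are named elsewhere in this thread as existing fields).

```rust
/// Hypothetical capability summary for a compiled filter.
#[derive(Clone, Copy)]
pub struct FilterCaps {
    pub has_dscp_match_terms: bool,
    pub has_terminal_action_terms: bool,
    pub has_log_terms: bool,
}

/// Conservative rule: any DSCP match term disables caching.
pub fn decline_conservative(c: FilterCaps) -> bool {
    c.has_dscp_match_terms
}

/// Narrowed rule floated in H2: only decline when a DSCP-sensitive filter
/// also has terminal or log terms.
pub fn decline_narrowed(c: FilterCaps) -> bool {
    c.has_dscp_match_terms && (c.has_terminal_action_terms || c.has_log_terms)
}
```

The two rules differ only for a DSCP-sensitive filter with neither terminal nor log terms (for example, count-only): the conservative rule declines caching, the narrowed rule would cache.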

H3 — Other per-packet variable match fields. What about TOS, IP options, fragment flag, MF/DF bits, IHL — are any of these match fields in firewall filter? If so, they have the same cache-stale class. Hostile question for Codex/Gemini to probe via term_matches_v4/v6 grep.

H4 — Output Reject path test STILL missing. From r4 → r5 not addressed. Tests only cover Discard.

H5 — lo0 filter cached DSCP: lo0 enforcement uses apply_lo0_filter_action which calls evaluate_lo0_filter_counted per packet — so lo0 IS evaluated per-packet on session hit, DSCP included. lo0 is correct. (Different design than the input/output cache stamping.)

Recommendation

Block on the input filter DSCP cache bypass (H1). Same class of bug as r4 output; r5 needs the symmetric fix.

Strongly consider:

  • H3: audit term_matches_v4/v6 for other per-packet-variable match fields not in cache key (TOS, fragment, IHL, IP options).
  • H4: add output Reject path test for symmetric coverage.

Codex (task-mpd5b9r5-mo1dff) and Gemini Pro 3.1 (task-mpd5buaq-j9e1gp) running with the input-DSCP hostile question highlighted in their prompts.

Not merging — author's decision.

@psaab
Owner Author

psaab commented May 19, 2026

Round-5 triple-review consolidated synthesis on 4567c9f4

Reviewer | Verdict
Claude r5 | MERGE-NEEDS-MAJOR (input-DSCP cache bypass)
Codex r5 | MERGE-NEEDS-MAJOR (input parity + session-hit gap)
Gemini Pro 3.1 r5 | MERGE-NEEDS-MAJOR (input parity confirmed as CRITICAL)

Triple-converge: r4 output-DSCP fix is correct; r5 has input-side gap

All three reviewers verify the r5 output fix at flow_cache.rs:281-291 is structurally correct and that lo0 enforcement is correctly per-packet via apply_lo0_filter_action (passes meta.dscp directly into evaluate_lo0_filter_counted).

All three reviewers also verify the same bypass class is open for INPUT filters.

Codex sharpens my finding — session-hit path also needs input re-eval

My Claude r5 said "add symmetric cache-decline for input filters" (mirroring the output fix). Codex r5 says that's necessary BUT INCOMPLETE:

Worse, a cache-decline-only input fix would still be incomplete unless the session-hit path also re-runs DSCP-sensitive input filters, because established session hits at poll_descriptor.rs:769-895 do not call evaluate_non_pbr_input_filter.

So the full input-side fix needs THREE pieces, mirroring how M1 (PASS_TO_KERNEL) was eventually fixed in r2/r3:

  1. Cache-install gate: decline from_forward_decision when input filter has DSCP-match terms. Add interface_input_filter_has_dscp_match helper.
  2. Session-hit re-eval: at poll_descriptor.rs:829+ (the session resolution branch), re-run evaluate_non_pbr_input_filter when input has DSCP-match terms — analogous to how apply_lo0_filter_action is called for LocalDelivery on session hit.
  3. Retroactive flush: when input-DSCP filter is added at runtime (filter config rotation), iterate existing sessions to either drop them or re-evaluate (analogous to republish_local_delivery_sessions_for_lo0_filter).

Without piece (2), even if (1) is added, EXISTING cached/installed sessions (installed when filter wasn't configured) keep bypassing on session hits. Without piece (3), filter config changes don't take effect on pre-existing sessions until expiry.

Gemini Pro 3.1: back to substantive output with the new format demand

Gemini r5 returned full quoted-line evidence per check on gemini-3.1-pro-preview:

Check 4 — FAIL (CRITICAL)

INPUT filter parity: The OUTPUT fix is incomplete as the same bypass exists for INPUT filters. Input filters are not re-evaluated on cache hits, and FlowCacheEntry::from_forward_decision has no check to decline cache insertion for DSCP-dependent input filters. A flow with a first-packet DSCP of 0 bypassing an input filter's discard term will be cached, and subsequent packets with a matched DSCP of 46 will blindly follow the fast path.

Independently confirms Codex's reading.

Codex additional findings

Codex confirms (matching my Claude r5):

  • Output fix at flow_cache.rs:281-292 is correct.
  • has_dscp_match_terms is wired from dscp_values only (Codex notes: current Go snapshot generation normalizes named/numeric DSCP into DSCPValues only; no dscp_ranges / dscp_except fields exist).
  • lo0 path correct (apply_lo0_filter_action evaluates per-packet).
  • Conservatism (decline-on-any-DSCP-term) is correct — DSCP-dependent log/count/policer/forwarding-class/dscp-rewrite decisions are also stale if cached by 5-tuple only. Don't narrow to terminal+DSCP.
  • No other cache-install paths beyond from_forward_decision.
  • Output Reject regression test STILL missing.
  • NAT64 doc wording adequate.

Recommendation

Block on input filter DSCP cache bypass. Three-piece fix required (parallel to how r2/r3 fixed the lo0/PASS_TO_KERNEL story):

  1. Cache-install gate: add interface_input_filter_has_dscp_match helper + cache-decline in from_forward_decision. (Symmetric to r5's output fix.)
  2. Session-hit re-eval: invoke evaluate_non_pbr_input_filter in the session-resolution branch when input has DSCP-match terms. (Symmetric to r3's apply_lo0_filter_action for lo0.)
  3. Retroactive flush: when input filter rotation adds DSCP terms, iterate existing sessions and drop or re-evaluate. (Symmetric to r3's republish_local_delivery_sessions_for_lo0_filter.)

Strongly consider:

  • Add output Reject regression test (carrying from r3/r4/r5 — still open).
  • Audit term_matches_v4/v6 for OTHER per-packet-variable match fields not in cache key (TOS, fragment, IHL, IP options).

Codex task: task-mpd5b9r5-mo1dff. Gemini Pro 3.1 task: task-mpd5buaq-j9e1gp.

Not merging — author's decision.

@psaab psaab force-pushed the codex/d-1379-filter-log-fix branch from 4567c9f to 77d93ce Compare May 19, 2026 21:47
@psaab
Owner Author

psaab commented May 19, 2026

Round-6 update on 77d93cec

Closed the round-5 input-filter DSCP cache bypass and folded in the
carried output-reject regression test.

What changed

  • Added input-filter DSCP capability tracking alongside the existing output
    DSCP tracking.
  • FlowCacheEntry::from_forward_decision now declines cache insertion when
    either the ingress input filter or egress output filter has DSCP match terms.
  • Session hits now re-run DSCP-sensitive input filters before forwarding, so
    established sessions cannot bypass a newly matching per-packet DSCP term.
  • Forwarding rotations that add DSCP-sensitive input filters purge existing
    sessions for the affected packet family. This is intentionally conservative:
    session metadata does not retain the original logical ingress ifindex needed
    for exact per-interface re-evaluation.
  • Added regression coverage for input cache-decline, session-hit re-eval,
    rotation purge, and output reject terminal handling.
  • Updated AF_XDP/filter module docs and the #1379 plan notes (issue: "Feature gap: dataplane events (PolicyDeny, ScreenDrop, FilterLog) not emitted by userspace-dp") for the input and
    output DSCP cache behavior.

Validation

  • RUSTFLAGS='-Awarnings' cargo test from_forward_decision_skips_cache_for_dscp_matched_input_filter -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test poll_descriptor_session_hit_rechecks_dscp_input_filter -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test purge_sessions_for_input_dscp_filter_revalidation_removes_family -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test resolve_cos_tx_selection_drops_reject_output_filter_without_log -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test input_filter -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test output_filter -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test filter_log -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test lo0_filter -- --nocapture
  • go test ./pkg/dataplane/userspace ./pkg/logging -run 'TestEventStreamRawDataplaneEventsFeedSyslogFanout|TestDecodeDataplaneEventPolicyDenyRTFlow|TestFormatBinaryRecord_Basic'
  • git diff --check HEAD^
  • Commit-message line-length audit: no lines over 72 chars.

@psaab
Owner Author

psaab commented May 19, 2026

Claude round-6 review on 77d93cec

Verdict: MERGE-READY (pending Codex + Gemini r6 hostile verification)

All three required pieces of the input-DSCP fix landed, mirroring how lo0/PASS_TO_KERNEL was fixed in r2/r3. Plus output reject regression test (carrying open from r3/r4/r5). Comprehensive.

Verified fixes (three pieces)

Piece A — Cache-decline for input DSCP-match filters.

flow_cache.rs:282-300 now does symmetric checks:

let is_v6 = meta.addr_family as i32 == libc::AF_INET6;
let ingress_ifindex = resolve_ingress_logical_ifindex(
    forwarding,
    meta.ingress_ifindex as i32,
    meta.ingress_vlan_id,
).unwrap_or(meta.ingress_ifindex as i32);
if crate::filter::interface_input_filter_has_dscp_match(
    &forwarding.filter_state,
    ingress_ifindex,
    is_v6,
) {
    return None;
}
if crate::filter::interface_output_filter_has_dscp_match(...) {
    return None;
}

Uses resolve_ingress_logical_ifindex for VLAN handling (matches input filter eval path's lookup).

New interface_input_filter_has_dscp_match at filter/engine.rs:1018-1029 reads from per-family FxHashSet<i32> (iface_filter_v4_has_dscp_match / iface_filter_v6_has_dscp_match) populated by compiler at filter/compiler.rs:142-146 + :190-194.

Piece B — Session-hit re-eval for input DSCP filters.

poll_descriptor.rs:863-887 after the session resolution sets session_ingress_zone:

if let Some(input_filter_eval) =
    evaluate_dscp_sensitive_input_filter_on_session_hit(...)
{
    if let Some(cached_log) = input_filter_eval.cached_log {
        emit_input_filter_log_match(...);
    }
    if input_filter_eval.action != crate::filter::FilterAction::Accept {
        binding.scratch.scratch_recycle.push(desc.addr);
        continue;
    }
}

Helper at :191-218 short-circuits with None when interface has no DSCP-match terms — no per-packet overhead when DSCP filters aren't configured. When DSCP filter IS present, calls full evaluate_non_pbr_input_filter (per-packet eval including DSCP). On non-Accept, emits log + recycles + continues. This is the symmetric counterpart to how apply_lo0_filter_action was added in r3 for lo0 session-hit re-eval.
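
The short-circuit shape of that helper can be illustrated with a reduced model. This is a sketch under assumptions: FilterState here holds a single hypothetical set of DSCP-sensitive ifindexes, and evaluate_input_filter is a stand-in that discards EF traffic; the real helper takes more inputs and returns a richer result.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Accept,
    Discard,
}

/// Hypothetical reduced filter state: ifindexes whose input filter has
/// DSCP match terms.
pub struct FilterState {
    pub dscp_sensitive_ifindexes: HashSet<i32>,
}

/// Stand-in for the full per-packet input filter evaluation.
pub fn evaluate_input_filter(dscp: u8) -> Action {
    if dscp == 46 { Action::Discard } else { Action::Accept }
}

/// Returns None (no work done) unless the ingress interface is
/// DSCP-sensitive; otherwise runs the full evaluation for this packet.
pub fn recheck_on_session_hit(
    state: &FilterState,
    ingress_ifindex: i32,
    dscp: u8,
) -> Option<Action> {
    if !state.dscp_sensitive_ifindexes.contains(&ingress_ifindex) {
        return None;
    }
    Some(evaluate_input_filter(dscp))
}

/// A session hit forwards unless the recheck ran and returned non-Accept.
pub fn session_hit_forwards(state: &FilterState, ingress_ifindex: i32, dscp: u8) -> bool {
    match recheck_on_session_hit(state, ingress_ifindex, dscp) {
        Some(action) => action == Action::Accept,
        None => true,
    }
}
```

Returning Option keeps "filter not applicable" distinct from "filter ran and accepted", which is what lets the caller skip log emission entirely in the common case.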

Piece C — Retroactive purge on filter config rotation.

New input_dscp_filter_families_added(old, new) -> (bool, bool) at filter/engine.rs:1058-1071:

let v4 = new
    .iface_filter_v4_has_dscp_match
    .iter()
    .any(|ifindex| !old.iface_filter_v4_has_dscp_match.contains(ifindex));

Detects any ifindex in NEW that wasn't in OLD's DSCP-match set. Importantly handles the "existing ifindex GAINED DSCP terms" case: if old had ifindex 10 without DSCP (not in has_dscp_match), new has 10 WITH DSCP (now in has_dscp_match), the helper returns true. Verified by reading.

worker/mod.rs:1273-1306 detects the family-scoped transitions BEFORE Arc rotation (needs both old + new), then after forwarding = new_forwarding; calls purge_sessions_for_input_dscp_filter_revalidation. The purge is family-scoped (only v4 sessions if only v4 family added).

session_glue/mod.rs:261-313 purge_sessions_for_input_dscp_filter_revalidation walks sessions.iter_with_origin, filters by family, calls full delete_terminal_filtered_session (the same 5-step cleanup as r3's lo0 fix) + release_source_nat_allocation. Family-scoped purge with SNAT cleanup.
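
A simplified model of a family-scoped purge with SNAT release, for readers who have not opened session_glue. Everything here is hypothetical (Session, Family, purge_families, the u64 session id, the released_ports sink); the real function performs a multi-step cleanup per session that this sketch collapses into a single removal.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Family {
    V4,
    V6,
}

/// Hypothetical session record; `snat_port` models a source-NAT
/// allocation that must be released when the session is removed.
pub struct Session {
    pub family: Family,
    pub snat_port: Option<u16>,
}

/// Removes every session whose family is selected for purge, releasing
/// any SNAT port it held. Returns the number of sessions removed.
pub fn purge_families(
    sessions: &mut HashMap<u64, Session>,
    purge_v4: bool,
    purge_v6: bool,
    released_ports: &mut Vec<u16>,
) -> usize {
    let before = sessions.len();
    sessions.retain(|_, session| {
        let purge = match session.family {
            Family::V4 => purge_v4,
            Family::V6 => purge_v6,
        };
        if purge {
            if let Some(port) = session.snat_port {
                released_ports.push(port);
            }
        }
        !purge
    });
    before - sessions.len()
}
```

Scoping by family means a v4-only filter change leaves v6 sessions untouched.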

Output reject regression test added. resolve_cos_tx_selection_drops_reject_output_filter_without_log at tx/cos_classify_tests.rs:1425-1474. Uses action: "reject" (was Discard before). Closes the test gap carrying from r3/r4/r5.

Hostile observations (non-blocking)

H1 — NAT64 dual-stack cross-family question. flow_cache.rs:280-300 uses is_v6 = meta.addr_family == AF_INET6 for BOTH input AND output filter checks. For NAT64 (v6 ingress → v4 egress), the output check would use is_v6=true looking up iface_filter_out_v6_fast, but egress is v4.

Mitigated by should_cache: NAT64 flows are not cacheable (decision.nat.nat64 == true rejected before reaching from_forward_decision). So this concern doesn't manifest in practice — NAT64 flows never hit the DSCP check.

H2 — Reverse direction. When DSCP filter is REMOVED on config rotation, existing sessions stay correctly cached. Purge only fires on ADD. Acceptable — removed DSCP doesn't introduce a bypass.

H3 — Family-scoped purge edge case. If both v4 AND v6 input DSCP filters are added in the same rotation, (purge_v4, purge_v6) = (true, true) triggers a full session-table walk filtering by family. With many sessions across both families, this is a large one-time cost. Bounded operationally.

H4 — Session-hit re-eval timing. Fires after session_ingress_zone = Some(resolved.metadata.ingress_zone) and before TTL/lo0 checks. Order is: session resolve → input filter DSCP re-eval → lo0 re-eval → TTL check → ... Looks correct; input filter terminal should win over output/lo0.

H5 — evaluate_dscp_sensitive_input_filter_on_session_hit consolidates calls. When DSCP filter is set, this helper calls evaluate_non_pbr_input_filter per packet on session-hit. For very long-lived flows on a DSCP-filtered interface, each packet incurs the full filter eval cost. The short-circuit when no DSCP-match terms makes this zero-cost in the typical case. Performance trade-off is acceptable.

Codex r5 confirmed remaining gaps (not in scope for r6)

The r5 review also flagged: "audit term_matches_v4/v6 for OTHER per-packet-variable match fields not in cache key (TOS, fragment, IHL, IP options)." Not addressed in r6. Worth a follow-up issue if those match fields are actually supported by the filter compiler. Could check by grep at SHA 77d93ce — defer to reviewers.

Recommendation

MERGE-READY. All three triple-verified pieces from the r5 finding landed correctly with comprehensive tests. Output reject regression test closes the r3/r4/r5 test gap.

Strongly consider in follow-up issue:

  • Audit term_matches_v4/v6 for other per-packet-variable match fields (TOS, fragment, IHL, IP options) and apply same cache-decline pattern if needed.

Awaiting Codex (task-mpd63nsc-100xjn) and Gemini Pro 3.1 (task-mpd64cnv-hf59kw).

Not merging — author's decision.

@psaab
Owner Author

psaab commented May 19, 2026

Round-6 triple-review synthesis on 77d93cec — Codex caught a MAJOR Claude + Gemini missed

Reviewer | Verdict
Claude r6 (self-corrected) | MERGE-NEEDS-MAJOR (was MERGE-READY)
Codex r6 | MERGE-NEEDS-MAJOR
Gemini Pro 3.1 r6 | MERGE-READY (missed Codex's finding)

Self-correction (Claude r6 → MAJOR)

My r6 review verified the three-piece input-DSCP fix landed and explicitly stated input_dscp_filter_families_added correctly handles "existing ifindex GAINED DSCP terms." Codex r6 sharpened the analysis: the helper detects when an ifindex transitions from NOT-DSCP-sensitive → DSCP-sensitive (set membership change), but does NOT detect when a DSCP-sensitive ifindex stays DSCP-sensitive while its DSCP TERMS change.

I conflated "ifindex transitioning into the has_dscp_match set" with "DSCP-sensitive filter content changing on an ifindex already in the set." They are different cases. Codex's finding is real.

Verified Codex MAJOR

Purge trigger misses DSCP filter content changes on already-DSCP-sensitive ifindexes.

filter/engine.rs:1058-1071 (verified at the commit):

let v4 = new
    .iface_filter_v4_has_dscp_match
    .iter()
    .any(|ifindex| !old.iface_filter_v4_has_dscp_match.contains(ifindex));

This checks "is there an ifindex in NEW's DSCP set that's not in OLD's DSCP set?" — set membership transition only.

Concrete bypass scenario:

  • OLD config: reth1.0 has input filter A with from dscp 0 destination-port 443 then accept → ifindex 10 ∈ has_dscp_match set.
  • NEW config: reth1.0 has input filter A modified to from dscp 46 destination-port 443 then discard log → ifindex 10 still ∈ has_dscp_match set.
  • input_dscp_filter_families_added(old, new) returns (false, false). No purge runs.
  • Existing sessions installed under the OLD config (when DSCP=0 was the match criterion) keep their cached state.
  • The session-hit re-eval (evaluate_dscp_sensitive_input_filter_on_session_hit) WILL eventually re-evaluate using the NEW filter — but only for packets that ARRIVE at the worker. Already-published PASS_TO_KERNEL session-map entries bypass userspace entirely and don't trigger re-eval.

Codex's framing:

The session-hit re-eval only protects packets that reach userspace; it is not a substitute for deleting previously published session state on rotation.

This is the same class of gap r3 closed for lo0 (M1 retroactive PASS_TO_KERNEL flush). The r6 implementation handles ifindex-set transitions but not within-set content changes.
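
The blind spot is easy to reproduce in isolation. The sketch below uses std HashSet instead of FxHashSet and a hypothetical free function dscp_ifindex_added that applies the same "in NEW but not in OLD" test as the quoted snippet.

```rust
use std::collections::HashSet;

/// Add-only check: true when NEW contains a DSCP-sensitive ifindex that
/// OLD did not. Filter content is never inspected.
pub fn dscp_ifindex_added(old: &HashSet<i32>, new: &HashSet<i32>) -> bool {
    new.iter().any(|ifindex| !old.contains(ifindex))
}
```

If ifindex 10 is DSCP-sensitive in both configs, the sets are equal and the check returns false, regardless of whether the filter changed from "dscp 0 then accept" to "dscp 46 then discard".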

Codex recommended fixes

  1. Conservative purge (recommended): purge family when ANY DSCP-sensitive input filter on that family changes (not just ifindex set membership). Compare filter identity / content on each ifindex.
  2. Filter-content compare: compute a per-ifindex hash of DSCP-sensitive term content and compare old vs new.

Option (1) is simpler — could iterate old.iface_filter_v4_has_dscp_match ∪ new.iface_filter_v4_has_dscp_match and check if the filter Arc identity OR a content fingerprint changed.
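
One possible shape for a content-aware compare, as a sketch only. DscpTerm, FamilyFilters, and family_changed are hypothetical; the map is assumed to hold just the DSCP-sensitive filters of one packet family, keyed by ifindex, and "content" is modeled as structural equality of the term list rather than Arc identity or a hash.

```rust
use std::collections::HashMap;

/// Hypothetical reduced term: enough fields to tell two DSCP-sensitive
/// filters apart.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DscpTerm {
    pub dscp_values: Vec<u8>,
    pub dst_port: Option<u16>,
    pub discard: bool,
}

/// ifindex -> terms of the DSCP-sensitive input filter on that interface.
pub type FamilyFilters = HashMap<i32, Vec<DscpTerm>>;

/// True when any DSCP-sensitive filter was added, removed, or changed.
pub fn family_changed(old: &FamilyFilters, new: &FamilyFilters) -> bool {
    // Removed or changed: an old entry is missing or differs in NEW.
    old.iter().any(|(ifindex, terms)| new.get(ifindex) != Some(terms))
        // Added: an ifindex appears only in NEW.
        || new.keys().any(|ifindex| !old.contains_key(ifindex))
}
```

Structural equality is stricter than needed (reordering equivalent terms would also trigger a purge), which errs on the conservative side.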

Required regression test (Codex)

Add a regression for "same ifindex, old DSCP-sensitive filter, new DSCP-sensitive filter with stricter/new DSCP term"

Gemini r6 missed this finding

Gemini r6 returned MERGE-READY. Gemini's Check 4 stated:

Retroactive purge helper cleanly detects interfaces that gained a DSCP filter over the rotation.

True for the ifindex-set-transition case. Gemini didn't probe the orthogonal content-change case. This is a category of bug Codex tends to catch better — mechanical edge case in a comparison helper.

Both Codex and Gemini confirmed

  • Cache-decline order correct (input before output, ingress logical ifindex via resolve_ingress_logical_ifindex).
  • interface_input_filter_has_dscp_match reads from new FilterState sets directly.
  • Session-hit re-eval placement correct: after session_ingress_zone set, before lo0 check; falls through on Accept; emits log + recycles on non-Accept.
  • Helper short-circuits with None when interface has no DSCP-match terms — zero per-packet cost when filters aren't DSCP-sensitive.
  • purge_sessions_for_input_dscp_filter_revalidation is comprehensive for SELECTED sessions: snapshots matching family, calls release_source_nat_allocation + delete_terminal_filtered_session (5-step cleanup).
  • DSCP filter REMOVAL doesn't need symmetric purge (existing cache entries are correctly cached now that DSCP no longer matters).
  • NAT64 not reachable (rejected by should_cache).
  • Output reject regression test exercises FilterAction::Reject directly (no log/count).
  • Diff scope correct (14 files, +580/-15).

Codex additional minor note

The code is still fragile: output filter family selection elsewhere is based on pre-translation meta.addr_family, so NAT64 output-filter semantics need a dedicated test if this PR is claiming that surface.

Defer-acceptable since NAT64 isn't currently cacheable; flag for follow-up if cacheability is extended.

Recommendation

Block on the DSCP content-change purge gap. Either:

  • (a) Conservative purge — trigger purge when any DSCP-sensitive input filter on a family CHANGES (compare filter Arc identity OR content fingerprint across rotation), OR
  • (b) Identity-based compare — track per-ifindex DSCP-sensitive filter content hash in FilterState and compare in input_dscp_filter_families_added.

Required test: regression for "same ifindex, old DSCP filter, new DSCP filter with different DSCP terms" → purge fires.

Strongly consider in follow-up:

  • Codex r5 carry: audit term_matches_v4/v6 for OTHER per-packet-variable match fields not in cache key (TOS, fragment, IHL, IP options).
  • NAT64 output-filter-family test if NAT64 cacheability is ever extended.

Codex task: task-mpd63nsc-100xjn. Gemini Pro 3.1 task: task-mpd64cnv-hf59kw.

Not merging — author's decision.

Copilot AI review requested due to automatic review settings May 19, 2026 22:03
@psaab psaab force-pushed the codex/d-1379-filter-log-fix branch from 77d93ce to e9aa5bc Compare May 19, 2026 22:03
@psaab
Owner Author

psaab commented May 19, 2026

Round-7 fix pushed on e9aa5bc6

Closed the r6 Codex MAJOR around same-ifindex DSCP-sensitive input filter changes.

What changed:

  • Replaced the add-only helper with input_dscp_filter_families_changed.
  • The reload path now purges sessions when DSCP-sensitive input filters are added, removed, or semantically changed for a packet family.
  • The comparison inspects the actual filter content, not just ifindex set membership, so this case is now covered:
    • old ifindex 10 filter: dscp 0 then accept
    • new ifindex 10 filter: dscp 46 then discard log
    • result: purge v4 sessions so established flows return to the miss/revalidation path
  • Added regression coverage for the same-ifindex content-change case and the unchanged-filter non-purge case.
  • Updated the #1379 closeout plan text (issue: "Feature gap: dataplane events (PolicyDeny, ScreenDrop, FilterLog) not emitted by userspace-dp") to match the add/remove/change behavior.

Validation run:

  • RUSTFLAGS='-Awarnings' cargo test input_dscp_filter_families_changed -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test from_forward_decision_skips_cache_for_dscp_matched_input_filter -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test poll_descriptor_session_hit_rechecks_dscp_input_filter -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test purge_sessions_for_input_dscp_filter_revalidation_removes_family -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test input_filter -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test output_filter -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test filter_log -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test lo0_filter -- --nocapture
  • go test ./pkg/dataplane/userspace ./pkg/logging -run 'TestEventStreamRawDataplaneEventsFeedSyslogFanout|TestDecodeDataplaneEventPolicyDenyRTFlow|TestFormatBinaryRecord_Basic'
  • git diff --check

Commit-message line-length audit is also clean:

  • git show -s --format=%B HEAD | awk 'length($0)>72 { print }'

@psaab
Owner Author

psaab commented May 19, 2026

Claude round-7 review on e9aa5bc6

Verdict: MERGE-READY (pending Codex + Gemini r7 hostile verification)

The r6 same-ifindex DSCP-filter content-change bypass is closed with a bidirectional deep-semantic compare and a directly-targeted regression test.

Verified fixes

r6 bypass closure. New helper input_dscp_filter_families_changed at filter/engine.rs:1124-1138 replaces the set-membership-only check with a bidirectional deep semantic compare:

fn input_dscp_filter_family_changed(
    old_filters: &rustc_hash::FxHashMap<i32, Arc<Filter>>,
    new_filters: &rustc_hash::FxHashMap<i32, Arc<Filter>>,
) -> bool {
    old_filters
        .iter()
        .filter(|(_, filter)| filter.has_dscp_match_terms)
        .any(|(ifindex, old)| {
            new_filters
                .get(ifindex)
                .is_none_or(|new| !dscp_sensitive_filter_semantics_match(old, new))
        })
        || new_filters
            .iter()
            .filter(|(_, filter)| filter.has_dscp_match_terms)
            .any(|(ifindex, new)| {
                old_filters
                    .get(ifindex)
                    .is_none_or(|old| !dscp_sensitive_filter_semantics_match(old, new))
            })
}

Bidirectional arms cover ALL transition cases:

| Case | OLD | NEW | Caught by |
| --- | --- | --- | --- |
| Filter added | absent | DSCP-sensitive | Second arm |
| Filter removed | DSCP-sensitive | absent | First arm |
| Same ifindex, content changed (r6 case) | DSCP-sensitive A | DSCP-sensitive B | Both arms |
| Same ifindex, DSCP terms removed | DSCP-sensitive | non-DSCP | First arm |
| Same ifindex, DSCP terms gained | non-DSCP | DSCP-sensitive | Second arm |
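
A minimal sketch of the bidirectional check, useful for walking through the five rows above by hand. It is not the code in filter/engine.rs: the `Filter` struct, the `Filters` alias, and the helpers `any_unmatched`, `family_changed`, `filter`, and `table` are simplified, hypothetical stand-ins, and plain content equality substitutes for dscp_sensitive_filter_semantics_match.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical, trimmed-down stand-in for the real Filter.
#[derive(Debug, PartialEq)]
struct Filter {
    name: String,
    has_dscp_match_terms: bool,
    /// Flattened term text, e.g. "dscp 46 then discard log" (illustrative only).
    terms: Vec<String>,
}

type Filters = HashMap<i32, Arc<Filter>>;

/// One arm of the check: true when some DSCP-sensitive filter in `from`
/// has no content-equal counterpart at the same ifindex in `to`.
fn any_unmatched(from: &Filters, to: &Filters) -> bool {
    from.iter()
        .filter(|(_, f)| f.has_dscp_match_terms)
        .any(|(ifindex, f)| to.get(ifindex).map_or(true, |g| **f != **g))
}

/// Both arms together: old-to-new catches removals and lost DSCP terms,
/// new-to-old catches additions and gained DSCP terms.
fn family_changed(old: &Filters, new: &Filters) -> bool {
    any_unmatched(old, new) || any_unmatched(new, old)
}

/// Test helper: build a shared filter from term strings.
fn filter(name: &str, has_dscp: bool, terms: &[&str]) -> Arc<Filter> {
    Arc::new(Filter {
        name: name.to_string(),
        has_dscp_match_terms: has_dscp,
        terms: terms.iter().map(|t| t.to_string()).collect(),
    })
}

/// Test helper: build an ifindex-to-filter map.
fn table(entries: Vec<(i32, Arc<Filter>)>) -> Filters {
    entries.into_iter().collect()
}
```

In this model a change between two non-DSCP filters on the same ifindex is ignored by both arms, which matches the intent of purging only when DSCP-sensitive content is involved.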

Deep semantic compare:

  • filter_term_semantics_match at engine.rs:1058-1080 compares 20 FilterTerm config fields (id, name, source_v4/v6, dest_v4/v6, protocol_bitmap, protocol_match_enabled, source_ports, dest_ports, dscp_bitmap, dscp_match_enabled, action, count, has_count, log, policer_name, routing_instance, forwarding_class, dscp_rewrite).
  • dscp_sensitive_filter_semantics_match at engine.rs:1082-1100 compares Filter struct fields (id, name, family, affects_*, has_*_terms) + term-by-term via the above.

Worker rename: callsite at worker/mod.rs:1273 updated input_dscp_filter_families_added → input_dscp_filter_families_changed. No other callers.

New tests at filter/tests.rs:

input_dscp_filter_families_changed_detects_same_ifindex_content_change — exactly the r6 bypass case. ifindex 10, OLD has filter with dscp 0 accept, NEW has filter (same name) with dscp 46 discard log. Asserts (true, false) — v4 family detected as changed.

input_dscp_filter_families_changed_ignores_unchanged_filter — same filter content reused across rotations. Asserts (false, false) — no spurious purges.

Doc update at plan-1379-dataplane-events.md:58-61: "add, remove, or change DSCP-sensitive input filters" (was just "add").

Hostile observations (non-blocking)

H1 — FilterTerm fields NOT compared. filter_term_semantics_match skips two FilterTerm fields:

  • three_color_policer: Option<Arc<ThreeColorPolicerRuntime>> — runtime policer state. Comparison would be Arc-pointer-only (no semantic deep compare available). Filter changes that swap the policer runtime but keep the term's policer_name would not trigger DSCP purge — acceptable since policer changes don't affect DSCP-conditional cache decisions.
  • counter: Arc<FilterTermCounter> — atomic counter values (change per packet). Correctly excluded.

Both exclusions are intentional and correct for the DSCP-cache-bypass concern. Worth a code comment explaining the exclusions.

H2 — Filter.id stability question. filter_term_semantics_match and dscp_sensitive_filter_semantics_match both compare old.id == new.id. If Filter.id is auto-incremented per recompile, identical filters across rotations would have DIFFERENT IDs → comparison always returns false → unnecessary purges on every rotation. If Filter.id is content-derived (hash) or name-derived (stable), comparison works correctly. Need to verify how Filter.id is assigned at filter/compiler.rs.

H3 — Performance. Helper now iterates BOTH old.iface_filter_v4_fast AND new.iface_filter_v4_fast (full per-interface maps), filtering by has_dscp_match_terms. Previous helper iterated only the smaller has_dscp_match set. For configs with many interfaces but few DSCP-sensitive filters, the new helper still scans all interfaces. Forwarding rotations are rare so cost is acceptable; would benefit from O(|set|) iteration if the iface_filter_v4_has_dscp_match index were preserved alongside the deep compare.

H4 — r5 carry still open. The Codex r5 finding about auditing term_matches_v4/v6 for other per-packet-variable match fields (TOS, fragment, IHL, IP options) is NOT addressed in r7. Defer-acceptable but worth tracking. Need to grep term_matches_v4 for any non-cache-key match input.

Recommendation

MERGE-READY for the r6 closure. The fix is structurally correct, comprehensive (5/5 transition cases), and has a directly-targeted regression test.

Strongly consider:

  • H2: verify Filter.id assignment is stable across recompiles (hash-derived or name-derived, not auto-increment). If not stable, the helper falsely reports "changed" on every rotation, triggering unnecessary purges.
  • H1: add a comment explaining why three_color_policer and counter are excluded from the term-semantic compare.
  • H4: file follow-up issue to audit term_matches_v4/v6 for other per-packet-variable match fields.

Awaiting Codex (task-mpd6m8cx-tf97ba) and Gemini Pro 3.1 (task-mpd6mw6a-l8o3k8).

Not merging — author's decision.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

userspace-dp/src/afxdp/poll_descriptor.rs:1339

  • telemetry.dbg.policy_deny is incremented for lo0 filter drops on the session-miss local-delivery path as well, which makes the policy_deny debug counter cover more than policy denies. Consider tracking lo0/filter terminal drops separately so debug output remains interpretable.
                                )
                            {
                                telemetry.dbg.local += 1;
                                telemetry.dbg.policy_deny += 1;
                                binding.scratch.scratch_recycle.push(desc.addr);
                                continue;

Comment on lines +280 to +311
sessions.iter_with_origin(|key, decision, metadata, origin| {
    let family_matches =
        (purge_v4 && key.addr_family == libc::AF_INET as u8)
        || (purge_v6 && key.addr_family == libc::AF_INET6 as u8);
    if family_matches {
        stale.push((key.clone(), decision, metadata.clone(), origin));
    }
});
let purged = stale.len();
for (key, decision, metadata, origin) in stale {
    release_source_nat_allocation(
        &forwarding.source_nat_rules,
        &key,
        decision.nat,
        metadata.is_reverse,
        now_ns,
    );
    delete_terminal_filtered_session(
        sessions,
        session_map_fd,
        conntrack_v4_fd,
        conntrack_v6_fd,
        shared_sessions,
        shared_nat_sessions,
        shared_forward_wire_sessions,
        shared_owner_rg_indexes,
        peer_worker_commands,
        &key,
        decision,
        &metadata,
        origin,
    );

resolved.origin,
);
telemetry.dbg.local += 1;
telemetry.dbg.policy_deny += 1;
@psaab
Owner Author

psaab commented May 19, 2026

Round-7 triple-review synthesis on e9aa5bc6

| Reviewer | Verdict |
| --- | --- |
| Claude r7 (self-corrected) | MERGE-NEEDS-MINOR (was MERGE-READY) |
| Codex r7 | MERGE-NEEDS-MINOR |
| Gemini Pro 3.1 r7 | MERGE-NEEDS-MAJOR (Critical) |

Self-correction (Claude r7 → MINOR)

My r7 review flagged Filter.id stability as H2 ("verify, may be auto-incremented") but kept verdict at MERGE-READY pending verification. Both Codex and Gemini independently verified IDs ARE positional (assigned by filter_idx as u32 at filter/compiler.rs:101), confirming the concern. I should have escalated to MINOR after raising the question.

Triple-converge: r6 bypass is closed

Both Codex and Gemini verify the r6 same-ifindex DSCP filter content-change bypass is structurally fixed:

Codex: "Confirmed exact case in e9aa5bc6:userspace-dp/src/filter/tests.rs:1733... Bidirectional Check engine.rs:1100-1119 catches all requested transitions"

Gemini: "The new deep content comparison and regression tests correctly identify when an unchanged interface modifies its DSCP-sensitive terms... Bidirectional check catches ADDED, REMOVED, GAINED-DSCP, LOST-DSCP, and CONTENT-CHANGED scenarios"

5/5 transition cases covered.

Codex/Gemini disagree on severity of two residuals

1. Filter.id is positional, not content-stable.

Both verify at filter/compiler.rs:90-99:

for (filter_idx, snap) in filters.iter().enumerate() {
    ...
    id: filter_idx as u32,

The deep comparison starts with old.id == new.id (engine.rs:1080). If an operator prepends an unrelated filter to the snapshot, all subsequent filter IDs shift → the comparison fails → a purge triggers even though the DSCP filter content is unchanged.

Codex framing (MINOR): "Reused IDs cannot hide content changes because the deep field compare still runs. Same content with reordered filter snapshot gets a different Filter.id and causes a conservative purge."

Gemini framing (CRITICAL): "An unrelated index shift will fail the match and trick the engine into thinking the DSCP filter changed, spuriously purging all sessions."

My calibration: MINOR. Codex is right on the substance. The behavior is a conservative over-purge, not a security bypass or correctness regression. Sessions are still functionally correct after the purge; they just hit the slow path one more time. This is a performance and operator-experience concern, not a correctness blocker. Gemini's "Critical" is an overstatement.

Recommended fix (non-blocking): either remove id from the comparison (rely on name + term content), OR derive Filter.id from content-hash for stability across rotations.
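
To make the index-shift effect concrete, here is a small sketch under stated assumptions. The `Filter` struct is trimmed to three fields, `compile` only mirrors the `id: filter_idx as u32` assignment quoted above, and `find`, `match_with_id`, and `match_without_id` are hypothetical names that do not exist in the repository.

```rust
/// Hypothetical, trimmed-down compiled filter.
#[derive(Debug, Clone)]
struct Filter {
    /// Positional: the filter's index in the config snapshot.
    id: u32,
    name: String,
    term: String,
}

/// Assigns IDs by position, like the `id: filter_idx as u32` line above.
fn compile(snapshot: &[(&str, &str)]) -> Vec<Filter> {
    snapshot
        .iter()
        .enumerate()
        .map(|(filter_idx, (name, term))| Filter {
            id: filter_idx as u32,
            name: String::from(*name),
            term: String::from(*term),
        })
        .collect()
}

/// Looks a compiled filter up by its stable name.
fn find<'a>(filters: &'a [Filter], name: &str) -> &'a Filter {
    filters
        .iter()
        .find(|f| f.name == name)
        .expect("filter name present in snapshot")
}

/// Comparison that includes the positional ID (the r7 behavior).
fn match_with_id(old: &Filter, new: &Filter) -> bool {
    old.id == new.id && match_without_id(old, new)
}

/// Comparison on stable name plus content only (the first suggested fix).
fn match_without_id(old: &Filter, new: &Filter) -> bool {
    old.name == new.name && old.term == new.term
}
```

Prepending an unrelated filter to the new snapshot moves the DSCP filter from index 0 to index 1, so `match_with_id` reports a change while `match_without_id` does not. A content-hash ID would reach the same result by a different route.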

2. three_color_policer field NOT in filter_term_semantics_match.

Both reviewers flag this; same severity split.

Codex framing (MINOR): "Three_color_policer can affect packet treatment through policer_drop / DSCP rewrite in TX selection, so the helper is not exhaustive if 'semantics' means full packet behavior. Either compare the resolved runtime identity/shape or document that this is 'filter snapshot content' equality."

Gemini framing (Critical): "Cached sessions hold an Arc to the policer runtime; if the runtime is replaced (e.g. changing bandwidth limits), the cached session will hold a stale pointer and enforce the old bandwidth limits."

My calibration: MINOR. Two reasons:

  • The helper is specifically about INPUT DSCP filter changes that drive cache-decline + purge for DSCP-cache-bypass correctness. Three-color policers are TX-selection (egress), not DSCP-match-conditional caching.
  • Three-color policer stale-state is a SEPARATE bug class — rate limits applying stale bandwidth limits. Not a bypass, not a DSCP-conditional correctness issue.

Worth either (a) adding a comment explaining the intentional exclusion, OR (b) filing a separate follow-up for cached three-color policer staleness.

Test gap (Gemini Check 6, valid)

Gemini correctly notes the "unchanged filter" test uses same single-element snapshot for old + new → doesn't exercise the index-shift over-purge scenario. To prove "unchanged DSCP filter ignored on unrelated config rotation," the test should add a sentinel filter to one side and verify still (false, false) — currently impossible because the ID-shift would fire a spurious purge.

This is a real test gap. Closes naturally if the Filter.id stability fix lands.

Codex r5 carry status

Codex r7 confirms the r5 carry is STILL deferred:

"term_matches_v4/v6 still only checks protocol, src/dst prefixes, src/dst ports, and DSCP. Config has fields like ICMP type/code, TCP flags, is-fragment, and flexible match; those are not in the Rust userspace filter snapshot/evaluator path. DSCP/TOS is covered; fragment/IHL/IP-options-style matching is not fixed by r7."

Gemini independently confirms:

"The FilterTerm AST does not currently parse or support these metadata fields, meaning there are no hidden cache bypass vectors for them in this PR."

Both agree: the carry items (TOS, fragment, IHL, IP options) are not yet in the parser/evaluator surface, so there's no equivalent cached-bypass for them right now. If those fields get added to FilterTerm later, the symmetric cache-decline pattern would need to apply.

Recommendation

MERGE-NEEDS-MINOR. The r6 bypass is closed. Two non-blocking observations:

  • Filter.id positional instability: either remove id from filter_term_semantics_match + dscp_sensitive_filter_semantics_match, OR derive Filter.id from content hash. Comment the choice.
  • three_color_policer exclusion: add a code comment explaining that the helper compares filter-SNAPSHOT semantics for DSCP-cache-decline purposes, not full TX-selection runtime equality. File separate follow-up for cached three-color policer staleness if that's a real concern.

Strongly consider:

  • Improve the "unchanged" test to add a sentinel non-DSCP filter to NEW (current test can't catch the ID-shift over-purge).
  • File follow-up issue for cached three-color policer runtime staleness on rotation.

Defer:

  • r5 carry items (TOS/fragment/IHL/IP options) — not yet in the FilterTerm surface.

Codex task: task-mpd6m8cx-tf97ba. Gemini Pro 3.1 task: task-mpd6mw6a-l8o3k8.

Not merging — author's decision.

Close the userspace RT_FLOW filter-log gaps that let logs describe a
security decision that the dataplane did not actually enforce.

Existing local-delivery sessions could keep bypassing a newly-added lo0
filter because the helper had already published PASS_TO_KERNEL state in
the BPF session map. Add a forwarding-snapshot reload hook that scans
current local-delivery sessions when lo0 filters are present and
republishes matching entries as helper-visible live sessions. This keeps
established local-delivery traffic visible to the userspace lo0 gate.

Cached local-delivery session hits also need terminal cleanup when the
current lo0 filter denies the packet. Replace the bare local session
remove with a helper that deletes BPF/session-map state, removes shared
session/NAT/forward-wire indexes, replicates the peer delete, and emits
the close delta. Extend the cached-hit regression test so it proves the
local table, shared indexes, and HA delta are all cleared together.

Policy reject events now fail closed consistently with the forwarding
behavior. The userspace PolicyDenied path does not synthesize ICMP or
TCP reset packets yet, so PolicyAction::Reject is emitted as RT_FLOW
deny instead of claiming reject-on-wire behavior that did not happen.
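
A hedged sketch of the fail-closed mapping described in this paragraph. Only PolicyAction::Reject is named in the text; the other variants, the `LoggedAction` enum, the `logged_action` function, and the `can_reject_on_wire` flag are hypothetical and exist only to illustrate the contract.

```rust
/// Configured policy action (variants other than Reject are assumed).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PolicyAction {
    Permit,
    Deny,
    Reject,
}

/// Action recorded in the emitted event (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LoggedAction {
    Permit,
    Deny,
    Reject,
}

/// Maps the configured action to the action the log may claim.
/// `can_reject_on_wire` is a hypothetical capability flag: it is true only
/// if the dataplane really sends an ICMP error or TCP reset.
fn logged_action(action: PolicyAction, can_reject_on_wire: bool) -> LoggedAction {
    match action {
        PolicyAction::Permit => LoggedAction::Permit,
        PolicyAction::Deny => LoggedAction::Deny,
        PolicyAction::Reject if can_reject_on_wire => LoggedAction::Reject,
        // Fail closed: the packet is dropped silently, so log it as a deny.
        PolicyAction::Reject => LoggedAction::Deny,
    }
}
```

The point of the guard is that the log never claims a wire behavior the dataplane did not perform.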

Output filters now carry terminal actions through TX selection. The
compiler marks output filters with discard/reject terms as needing TX
path evaluation even if the term has no log, count, policer,
forwarding-class, or DSCP rewrite side effect. Runtime and cached TX
descriptors preserve the matched action and drop before enqueue when the
output term is terminal. Logged output deny/reject events therefore no
longer describe packets that still forward.

DSCP-matched output filters deliberately decline flow-cache insertion.
The flow-cache key is only the 5-tuple, while DSCP is per-packet
metadata. Caching the first packet's output-filter decision would let a
later packet on the same tuple with a different DSCP bypass terminal
terms. Keep those flows on the live TX-selection path instead.
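
The bypass and the cache-decline gate can be modeled in a few lines. This is a toy sketch, not the AF_XDP flow cache: `FiveTuple`, `OutputFilter`, `FlowCache`, and the `decline_dscp` switch are all hypothetical, and the filter has a single "discard this DSCP" term.

```rust
use std::collections::HashMap;

/// Cache key: the 5-tuple only. DSCP is deliberately not part of it.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct FiveTuple {
    src: u32,
    dst: u32,
    sport: u16,
    dport: u16,
    proto: u8,
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Action {
    Accept,
    Discard,
}

/// Hypothetical output filter: discard packets carrying `drop_dscp`.
struct OutputFilter {
    drop_dscp: Option<u8>,
}

impl OutputFilter {
    fn has_dscp_match_terms(&self) -> bool {
        self.drop_dscp.is_some()
    }

    fn evaluate(&self, dscp: u8) -> Action {
        if self.drop_dscp == Some(dscp) {
            Action::Discard
        } else {
            Action::Accept
        }
    }
}

struct FlowCache {
    entries: HashMap<FiveTuple, Action>,
    /// When true, flows behind DSCP-matching filters are never cached.
    decline_dscp: bool,
}

impl FlowCache {
    fn new(decline_dscp: bool) -> Self {
        FlowCache { entries: HashMap::new(), decline_dscp }
    }

    /// Returns the action for one packet, consulting the cache first.
    fn forward(&mut self, filter: &OutputFilter, key: FiveTuple, dscp: u8) -> Action {
        if let Some(cached) = self.entries.get(&key) {
            return *cached;
        }
        let action = filter.evaluate(dscp);
        if !(self.decline_dscp && filter.has_dscp_match_terms()) {
            self.entries.insert(key, action);
        }
        action
    }
}
```

With the gate off, a first packet at DSCP 0 caches Accept and a later DSCP 46 packet on the same tuple inherits it. With the gate on, nothing is cached and the second packet is evaluated and discarded.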

Apply the same invariant to DSCP-matched input filters. Cache insertion
now declines when the ingress input filter has DSCP match terms, session
hits re-run DSCP-sensitive input filters before forwarding, and config
rotations purge existing sessions when DSCP-sensitive input filters are
added, removed, or semantically changed. The purge is intentionally
conservative because session metadata does not retain the original
logical ingress ifindex needed for exact per-interface re-evaluation.

The rotation comparison now inspects actual DSCP-sensitive filter
content, not only ifindex set membership. It compares stable names, term
match/action semantics, and three-color policer runtime shape while
ignoring compiler-positional filter and term IDs. That closes the
same-ifindex bypass without over-purging after unrelated filter reorder,
and still purges when a DSCP-sensitive term points at a changed policer
shape.

Update the AF_XDP and filter module docs plus the #1379 closeout plan to
state the enforced input/output filter behavior and the remaining lo0
caveats. NAT64 local-delivery filtering is documented in terms of the
pre-translation packet family.

Validation:
- RUSTFLAGS='-Awarnings' cargo test input_dscp_filter_families_changed \
  -- --nocapture
- RUSTFLAGS='-Awarnings' cargo test \
  from_forward_decision_skips_cache_for_dscp_matched_input_filter \
  -- --nocapture
- RUSTFLAGS='-Awarnings' cargo test \
  poll_descriptor_session_hit_rechecks_dscp_input_filter \
  -- --nocapture
- RUSTFLAGS='-Awarnings' cargo test \
  purge_sessions_for_input_dscp_filter_revalidation_removes_family \
  -- --nocapture
- RUSTFLAGS='-Awarnings' cargo test input_filter -- --nocapture
- RUSTFLAGS='-Awarnings' cargo test output_filter -- --nocapture
- RUSTFLAGS='-Awarnings' cargo test filter_log -- --nocapture
- RUSTFLAGS='-Awarnings' cargo test lo0_filter -- --nocapture
- go test ./pkg/dataplane/userspace ./pkg/logging -run \
  TestEventStreamRawDataplaneEventsFeedSyslogFanout
- go test ./pkg/dataplane/userspace ./pkg/logging -run \
  TestDecodeDataplaneEventPolicyDenyRTFlow
- go test ./pkg/dataplane/userspace ./pkg/logging -run \
  TestFormatBinaryRecord_Basic
- git diff --check
@psaab psaab force-pushed the codex/d-1379-filter-log-fix branch from e9aa5bc to d2bb24b Compare May 19, 2026 22:25
@psaab
Owner Author

psaab commented May 19, 2026

Round-8 cleanup pushed on d2bb24b7

Closed both r7 residuals instead of carrying them as MINORs.

What changed:

  • The DSCP-sensitive input filter comparison no longer uses compiler-positional Filter.id or FilterTerm.id, so unrelated filter reordering does not trigger a conservative purge.
  • The comparison now includes three_color_policer runtime semantics, not just policer_name. It treats reused Arc runtimes as unchanged and otherwise compares stable runtime identity plus runtime shape.
  • Added regression coverage for:
    • unrelated filter reorder shifting the positional filter ID but not purging;
    • a DSCP-sensitive term pointing at a changed three-color policer shape and purging;
    • the existing same-ifindex DSCP content-change and unchanged-filter cases.
  • Updated userspace-dp/src/filter/README.md and the closeout plan for #1379 (Feature gap: dataplane events (PolicyDeny, ScreenDrop, FilterLog) not emitted by userspace-dp) to document the stable-content comparison contract.

Validation run:

  • RUSTFLAGS='-Awarnings' cargo test input_dscp_filter_families_changed -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test from_forward_decision_skips_cache_for_dscp_matched_input_filter -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test poll_descriptor_session_hit_rechecks_dscp_input_filter -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test purge_sessions_for_input_dscp_filter_revalidation_removes_family -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test input_filter -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test output_filter -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test filter_log -- --nocapture
  • RUSTFLAGS='-Awarnings' cargo test lo0_filter -- --nocapture
  • go test ./pkg/dataplane/userspace ./pkg/logging -run 'TestEventStreamRawDataplaneEventsFeedSyslogFanout|TestDecodeDataplaneEventPolicyDenyRTFlow|TestFormatBinaryRecord_Basic'
  • git diff --check

Commit-message line-length audit is clean:

  • git show -s --format=%B HEAD | awk 'length($0)>72 { print }'

@psaab
Owner Author

psaab commented May 19, 2026

Claude round-8 review on d2bb24b7

Verdict: MERGE-READY (pending Codex + Gemini r8 hostile verification)

Both r7 MINORs closed with directly-targeted regression tests.

Verified fixes

Fix 1 — Positional ID exclusion. filter_term_semantics_match (engine.rs:1069+) and dscp_sensitive_filter_semantics_match (engine.rs:1093+) no longer compare id. Both now start with name:

old.name == new.name
    && old.source_v4 == new.source_v4
    && ...

Filter.id and FilterTerm.id are positional (filter/compiler.rs:101 uses filter_idx as u32). Removing them from the comparison eliminates spurious over-purge on filter reorder.

Fix 2 — Three-color policer runtime shape. New helper three_color_policer_semantics_match (engine.rs:1058-1067):

match (old.as_ref(), new.as_ref()) {
    (None, None) => true,
    (Some(old), Some(new)) => Arc::ptr_eq(old, new) || old.same_runtime_shape_as(new),
    _ => false,
}
  • (None, None) → match (correct; no change)
  • (Some, Some) → Arc::ptr_eq fast path for unchanged Arc, else same_runtime_shape_as deep compare
  • Asymmetric (None, Some) / (Some, None) → false → triggers purge (correct: term gained/lost policer is a semantic change)
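
The three cases above can be exercised in isolation with a self-contained sketch. `PolicerShape` and `policer_semantics_match` are hypothetical names, and a derived `PartialEq` on a plain struct stands in for the mutex-guarded same_runtime_shape_as compare.

```rust
use std::sync::Arc;

/// Hypothetical policer shape; the real runtime keeps this behind a mutex.
#[derive(Debug, PartialEq)]
struct PolicerShape {
    committed_rate_bytes_per_sec: u64,
    committed_burst_bytes: u64,
}

/// Same structure as the quoted match: pointer equality is the cheap
/// fast path, a value compare is the fallback, and a policer that
/// appears or disappears always counts as a change.
fn policer_semantics_match(
    old: &Option<Arc<PolicerShape>>,
    new: &Option<Arc<PolicerShape>>,
) -> bool {
    match (old.as_ref(), new.as_ref()) {
        (None, None) => true,
        (Some(old), Some(new)) => Arc::ptr_eq(old, new) || **old == **new,
        _ => false,
    }
}
```

The pointer check is only an optimization here: two distinct allocations with equal shape still match, so a rebuilt but identical policer does not force a purge.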

New ThreeColorPolicerRuntime::same_runtime_shape_as (mod.rs:271-285):

pub(crate) fn same_runtime_shape_as(&self, other: &Self) -> bool {
    if self.id != other.id || self.name != other.name {
        return false;
    }
    let Ok(state) = self.state.lock() else {
        return false;
    };
    other
        .state
        .lock()
        .ok()
        .is_some_and(|other| state.same_runtime_shape(&other))
}

Acquires both runtime mutexes briefly, delegates to inner ThreeColorPolicerState::same_runtime_shape (policer.rs) which compares 7 fields: mode, color_blind, committed_rate_bytes_per_sec, committed_burst_bytes, peak_or_excess_rate_bytes_per_sec, peak_or_excess_burst_bytes, treatments. Comprehensive coverage of policer shape.

filter_term_semantics_match now includes three_color_policer field via the new helper.

New regression tests

input_dscp_filter_families_changed_ignores_positional_filter_id_change at filter/tests.rs:1804-1857:

  • Builds OLD with [unrelated_filter, dscp_filter] order → dscp_filter gets filter_idx=1.
  • Builds NEW with [dscp_filter, unrelated_filter] order → dscp_filter gets filter_idx=0.
  • Asserts assert_ne!(old.iface_filter_v4_fast.get(&10).unwrap().id, new.iface_filter_v4_fast.get(&10).unwrap().id, ...) — proves test setup actually forces different IDs (not vacuous).
  • Asserts (false, false) — proves the ID exclusion works.

input_dscp_filter_families_changed_detects_three_color_shape_change at filter/tests.rs:1858-1919:

  • Builds OLD with committed_burst_bytes: 100.
  • Builds NEW with committed_burst_bytes: 200, uses parse_filter_state_with_three_color_preserving(..., Some(&old)) — the runtime-preservation path.
  • Asserts the policer Arc was NOT reused (!Arc::ptr_eq).
  • Asserts helper detects change.

Doc updates

plan-1379:60-64:

"The rotation comparison ignores compiler-positional filter IDs and includes three-color policer runtime shape so unrelated filter reordering does not over-purge while policer-shape changes still force revalidation."

filter/README.md:34-37:

"Forwarding rotations compare DSCP-sensitive input filter content by stable names, terms, and three-color policer runtime shape, not by compiler-positional filter IDs, before deciding whether existing sessions need a conservative packet-family purge."

Both align with implementation.

Hostile observations (non-blocking)

H1 — Mutex lock ordering. same_runtime_shape_as always acquires self.state.lock() first, then other.state.lock(). If two threads were to call a.same_runtime_shape_as(&b) and b.same_runtime_shape_as(&a) simultaneously, AB-BA deadlock. In practice, this is called from worker's config rotation handler, single-threaded per worker, with the OLD-then-NEW order, so no deadlock. Worth a code comment documenting the ordering invariant.

H2 — Mutex poison handling. Returns false on poison → conservative purge. Acceptable trade-off but worth noting in code comment.

H3 — r5 carry status. Still deferred. The FilterTerm AST doesn't yet support TOS/fragment/IHL/IP options, so no equivalent cache bypass currently exists for those fields. If any of those become FilterTerm fields later, the symmetric cache-decline + content-comparison pattern would need to apply.
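
The lock-ordering and poison-handling notes in H1 and H2 can be illustrated with a reduced model. This is a sketch under assumptions, not the code in mod.rs: `Runtime`, `Shape`, `same_shape_as`, and the `poison` helper are hypothetical, and the shape has a single field.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Debug, PartialEq)]
struct Shape {
    committed_burst_bytes: u64,
}

/// Hypothetical stand-in for the policer runtime.
struct Runtime {
    name: String,
    state: Mutex<Shape>,
}

impl Runtime {
    fn new(name: &str, committed_burst_bytes: u64) -> Self {
        Runtime {
            name: name.to_string(),
            state: Mutex::new(Shape { committed_burst_bytes }),
        }
    }

    /// Locks `self` first, then `other`. A poisoned lock on either side
    /// yields false, which the caller treats as "changed" (conservative).
    /// Must not be called with `other` aliasing `self`: the second lock
    /// on the same mutex would never return.
    fn same_shape_as(&self, other: &Self) -> bool {
        if self.name != other.name {
            return false;
        }
        let Ok(mine) = self.state.lock() else {
            return false;
        };
        other
            .state
            .lock()
            .ok()
            .is_some_and(|theirs| *mine == *theirs)
    }
}

/// Poisons the state mutex by panicking on a helper thread that holds it.
fn poison(runtime: &Arc<Runtime>) {
    let held = Arc::clone(runtime);
    let _ = thread::spawn(move || {
        let _guard = held.state.lock().unwrap();
        panic!("intentional panic to poison the lock in this sketch");
    })
    .join();
}
```

In the match quoted in Fix 2, Arc::ptr_eq is evaluated before same_runtime_shape_as, so the aliasing case short-circuits and never reaches the double lock. That ordering looks like part of the safety argument and may be worth stating in the code comment.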

Recommendation

MERGE-READY. Both r7 MINORs surgically closed with rigorous regression tests (positional-ID test asserts both that IDs differ AND that helper returns false; policer-shape test asserts Arc identity changed AND helper detects).

Strongly consider in follow-up:

  • Code comment in same_runtime_shape_as documenting lock-ordering invariant + poison-fail-closed convention.
  • File follow-up for r5 carry (TOS/fragment/IHL/IP options) when/if those FilterTerm fields are added.

Awaiting Codex (task-mpd7grdg-urepem) and Gemini Pro 3.1 (task-mpd7hbxx-07v7b5).

Not merging — author's decision.

@psaab
Owner Author

psaab commented May 19, 2026

Round-8 triple-review final synthesis on d2bb24b7 — CONVERGED MERGE-READY

| Reviewer | Verdict |
| --- | --- |
| Claude r8 | MERGE-READY |
| Codex r8 | MERGE-READY |
| Gemini Pro 3.1 r8 | MERGE-READY (8/8 checks PASS) |

After 8 rounds of hostile review with self-corrections and finding-iteration on every round, all three reviewers independently converged on MERGE-READY with quoted-line verification.

Triple-verified

Positional ID exclusion (r7 MINOR #1 closed):

  • Codex: "No old.id == new.id remains in either helper."
  • Gemini: "The id fields ... are correctly omitted from both filter_term_semantics_match and dscp_sensitive_filter_semantics_match."
  • Test asserts assert_ne! on old/new IDs to prove non-vacuous, then assert_eq! returns (false, false).

Three-color policer runtime semantics (r7 MINOR #2 closed):

  • All four cases verified: (None, None) → true, (Some, Some) → Arc::ptr_eq || same_runtime_shape_as, asymmetric (None, Some)/(Some, None) → false.
  • same_runtime_shape compares 7 fields: mode, color_blind, committed_rate, committed_burst, peak_or_excess_rate, peak_or_excess_burst, treatments. Codex: "All fields that dictate the runtime performance envelope or treatment ... are systematically compared."

Lock-ordering safety (Gemini's sharp framing):

self is the old, currently-live policer runtime which is contended by dataplane workers. other is the newly-parsed unexposed policer runtime (which no workers can access yet). Thus, other.state is uncontended. Dataplane workers only lock a single policer at a time (self.state), so a cyclic deadlock is impossible. If the mutex is poisoned, it safely returns false to trigger a conservative purge.

Better articulation than my Claude r8 lock-ordering note — explains why deadlock is impossible (not just "callers maintain order").

Tests rigorous: both new regression tests explicitly assert the test setup forces the relevant divergence (ID-shift OR new Arc), then assert the helper makes the correct decision. Not vacuous.

Performance: Codex framing: "Common unchanged rotations preserve Arcs and hit Arc::ptr_eq, so no mutexes. Worst case is roughly 4 * workers * DSCP-ifindexes * policer-terms mutex acquisitions per rotation when runtimes are not pointer-equal." Forwarding rotations are control-plane events, not packet path. Not a merge blocker.

Carried forward (out of scope for this PR)

r5 carry — audit other per-packet-variable filter match fields. Codex r8: "deferred/not addressed by r8. The r8 delta is filter semantic comparison only; it does not touch AF_XDP parser/event paths."

The FilterTerm AST does not currently support TOS, fragment, IHL, or IP options as match fields, so no equivalent cached-bypass exists for them in the userspace dataplane today. If those fields are added to the FilterTerm surface in a future PR, the symmetric pattern (cache-decline + content-comparison + session purge) will need to apply.

Recommend filing this as a follow-up issue tracker entry.

Recommendation

MERGE-READY converged across all three reviewers.

The complete fix chain delivered in this PR closes 6 bug classes across 8 rounds:

  1. r1: Output filter terminal action enforcement (Discard/Reject carried through TX selection, drop before enqueue)
  2. r1: Reject contract divergence (filter-path Reject fail-closed as deny; policy-path semantics preserved)
  3. r2: lo0 enforcement on cached LocalDelivery (re-eval on session hit + publish-time gate + retroactive republish)
  4. r4: Cached output-filter DSCP bypass (cache-decline for output filters with DSCP-match terms)
  5. r5/r6: Cached input-filter DSCP bypass (cache-decline + session-hit re-eval + retroactive purge with content-aware detection)
  6. r8: Filter.id positional instability and three-color policer runtime shape (proper semantic comparison for purge trigger)

Strongly consider in follow-up (non-blocking):

  • Code comment in same_runtime_shape_as documenting Gemini's lock-ordering analysis (self=live, other=unexposed → no contention).
  • File issue for r5 carry (TOS/fragment/IHL/IP options) when/if those FilterTerm fields are added.

Codex task: task-mpd7grdg-urepem. Gemini Pro 3.1 task: task-mpd7hbxx-07v7b5.

Not merging — author's decision.

@psaab psaab merged commit 6b89a9b into master May 19, 2026
psaab added a commit that referenced this pull request May 27, 2026
PR #1430 closed the immediate filter-log enforcement gaps by treating
DSCP as cache-sensitive metadata living outside the 5-tuple
session/flow-cache key. #1431 codifies the contract going forward so
the next per-packet match field (TOS lower bits/ECN, TCP flags,
fragment/IHL, IP options, ICMP type/code, flex_match, etc.) cannot
silently bypass flow-cache.

This PR is documentation plus tests — no runtime change.

Three deliverables:

  1. userspace-dp/src/filter/README.md grows a 'Cache-key invariants
     for per-packet match fields (#1431)' section: the invariant
     wording verbatim from the issue body, a classification table
     covering every match criterion on FirewallTermSnapshot and
     FilterTerm (including future candidates), a path (b) runbook
     listing the seven hooks (aggregate flag, per-interface set,
     lookup helpers, flow-cache gate, session-hit re-eval, rotation
     purge, tests) with file:line refs to the DSCP reference
     implementation, a path (a) pointer for promoting a field into
     SessionKey, and a lo0 note clarifying that LocalDelivery is
     non-cacheable (is_cacheable() returns false) so lo0 filters do
     not need a per-interface has_<X>_match set.

  2. In-source CACHE-KEY INVARIANT comment blocks above FilterTerm
     in userspace-dp/src/filter/mod.rs and above FirewallTermSnapshot
     in userspace-dp/src/protocol/security.rs. These are the loud
     reviewer-facing tripwires: anyone adding a match field to either
     surface will see the contract in the diff. Both blocks point at
     the README section for the full classification table and recipe.

  3. Two new runbook-shaped tests in
     userspace-dp/src/afxdp/flow_cache_tests.rs:
     - dscp_input_gate_blocks_flow_cache_insertion_via_runbook_pattern
     - dscp_output_gate_blocks_flow_cache_insertion_via_runbook_pattern
     These re-exercise the DSCP gate already covered by the bespoke
     tests at lines 644/696, but in an explicitly runbook-shaped
     layout with step-by-step comments. They serve as the canonical
     'clone this when adding a new cache-sensitive match field'
     references that the README cites.
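
A minimal sketch of the flow-cache gate that the runbook tests exercise, assuming a simplified shape for the filter state. Only `has_dscp_match_terms` and the `iface_filter_*_fast` map names come from the commit text; `Filter`'s layout, `FilterState`, `may_cache_flow`, and the `u32` ifindex key are hypothetical stand-ins, not the project's actual types.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the real filter type.
#[derive(Debug, Default)]
struct Filter {
    /// Aggregate flag: true if any term in this filter matches on DSCP.
    has_dscp_match_terms: bool,
}

/// Hypothetical per-interface lookup tables (ifindex -> attached filter).
#[derive(Debug, Default)]
struct FilterState {
    iface_filter_in_v4_fast: HashMap<u32, Filter>,
    iface_filter_out_v4_fast: HashMap<u32, Filter>,
}

/// True if the filter attached to `ifindex` (if any) matches on DSCP.
fn has_dscp_match(map: &HashMap<u32, Filter>, ifindex: u32) -> bool {
    map.get(&ifindex)
        .map(|filter| filter.has_dscp_match_terms)
        .unwrap_or(false)
}

/// Gate: a forward decision may enter the flow cache only when neither the
/// ingress nor the egress filter depends on DSCP, because DSCP is not part
/// of the 5-tuple cache key and a cached verdict could go stale per packet.
fn may_cache_flow(state: &FilterState, in_ifindex: u32, out_ifindex: u32) -> bool {
    !has_dscp_match(&state.iface_filter_in_v4_fast, in_ifindex)
        && !has_dscp_match(&state.iface_filter_out_v4_fast, out_ifindex)
}
```

Under this sketch, a new cache-sensitive match field would add its own aggregate flag and a corresponding term in `may_cache_flow`; the real hooks are the ones the README runbook lists.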

Cited (not duplicated) existing coverage:
  - rotation purge positional-ID immunity:
    filter/tests.rs:1806 input_dscp_filter_families_changed_ignores_positional_filter_id_change
  - session-hit re-evaluation: afxdp/tests.rs:3184

Methodology: 4 rounds of Codex + AGY adversarial plan review.
  - r1: Codex PLAN-KILL / AGY PLAN-NEEDS-MAJOR — both rejected the
    v1 PER_PACKET_MATCH_FIELDS constant list + PerPacketMatchField
    trait + fake-field harness as compile-time theater (Rust has no
    struct-field reflection without proc-macros).
  - r2: Codex PLAN-NEEDS-MINOR / AGY PLAN-READY — kept the salvage
    path (README + harness as runbook reference); Codex caught
    pub(super) visibility on FlowCacheEntry::from_forward_decision
    forcing the tests into afxdp/ not filter/.
  - r3: both PLAN-NEEDS-MINOR on stale-from-v2 leftovers in v3.
  - r4: both PLAN-READY on v5.

Plan + reviewer task ids preserved under
docs/pr/1431-filter-cache-invariants/.

Validation:
  - cargo build --release clean (warnings unchanged from master).
  - 2 new tests pass 5/5 (flake check).
  - Full cargo test --release passes except a pre-existing
    snat_contract_documents_current_fail_closed_runtime doc-guard
    that also fails on origin/master (docs/userspace-dataplane-gaps.md
    has 0 occurrences of 'fail-closed' on master) — unrelated to #1431.
  - Go suite clean.

Closes #1431
psaab added a commit that referenced this pull request May 27, 2026
Codex r1 returned MERGE-NEEDS-MINOR on PR #1604 with two findings:

  1. The CACHE-KEY INVARIANT block above FirewallTermSnapshot was a
     shortened version of the block above FilterTerm and dropped the
     concrete file:line hooks. For a doc-invariant PR, the two
     blocks should either be identical or both be intentionally
     short pointers to the README. Harmonized: protocol/security.rs
     now mirrors filter/mod.rs's block, including the extend-SessionKey
     prerequisites, the cache-sensitive runbook hooks with
     afxdp/flow_cache.rs:297-309 / afxdp/poll_descriptor/mod.rs:217-244
     / afxdp/worker/loop_body/mod.rs:295-330 references, the #1430
     precedent, and the README pointer.
  2. The 'Step 4: same gate, different per-interface set
     (iface_filter_out_v4_fast.has_dscp_match_terms)' comment in the
     output runbook test was sloppy — has_dscp_match_terms is an
     aggregate bool on Filter (looked up via the per-interface
     iface_filter_out_v{4,6}_fast map), not a HashSet. Reworded to
     'fast map lookup plus aggregate flag, not a HashSet' so the
     reference comment is accurate.

Copilot r1 (COMMENTED) flagged two inline issues:

  3. The README's 'parse_flow_ports stores the ICMP echo identifier
     in src_port' was over-specifying. parse_flow_ports
     (frame/inspect.rs:212-232) unconditionally reads bytes 4-6 of
     the ICMP header for ALL ICMP types — those bytes only carry an
     'identifier' for Echo Request/Reply, and are opaque for other
     ICMP types. Reworded the README explanation and the
     classification-table row accordingly.
  4. The 'lines 644 and 696' references in the new tests' header
     comment and the README runbook are off-by-one against current
     master (the bespoke tests are at 643/695) AND will drift over
     time. Replaced both occurrences with test names:
     from_forward_decision_skips_cache_for_dscp_matched_input_filter
     / _output_filter.
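
The distinction behind finding 3 can be sketched as follows. This is an illustration only, not the code in `frame/inspect.rs`: both function names are hypothetical, the type constants are the ICMPv4 Echo values, and the second function follows the commit text's framing that only Echo Request/Reply treat these bytes as an identifier.

```rust
/// ICMPv4 type codes for Echo Reply and Echo Request.
const ICMP_ECHO_REPLY: u8 = 0;
const ICMP_ECHO_REQUEST: u8 = 8;

/// Hypothetical simplification of the unconditional read: returns header
/// bytes 4..6 as a big-endian u16 regardless of the ICMP type.
fn icmp_bytes_4_to_6(icmp: &[u8]) -> Option<u16> {
    let bytes = icmp.get(4..6)?;
    Some(u16::from_be_bytes([bytes[0], bytes[1]]))
}

/// Hypothetical stricter variant: interprets those bytes as an identifier
/// only for Echo Request/Reply and returns None for every other type.
fn icmp_echo_identifier(icmp: &[u8]) -> Option<u16> {
    match *icmp.first()? {
        ICMP_ECHO_REQUEST | ICMP_ECHO_REPLY => icmp_bytes_4_to_6(icmp),
        _ => None,
    }
}
```

For a Destination Unreachable message the first function still yields a value, which is why describing the stored value as "the ICMP echo identifier" over-specified what the parser does.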

Per /triple-review Step 8.5, the Claude SMR code-review doc is now
on the branch at docs/pr/1431-filter-cache-invariants/claude-smr-code-r1.md
with MERGE-READY verdict. The doc covers diff-coverage check,
allocation/lock/numerical/HA invariants, test-coverage shape
verification, both Codex r1 minors and both Copilot inline findings
analysed in detail, ICMP + lo0 claims independently re-verified,
pre-existing snat_contract doc-guard failure proven master-broken,
and a self-correction section listing the misses Codex / AGY caught
in earlier plan rounds.

Build + 2 new tests still pass 5/5 flake. Go suite still clean.