Skip to content

Emit userspace security dataplane events#1418

Merged
psaab merged 2 commits into
masterfrom
codex/next-1379-event-producers
May 18, 2026
Merged

Emit userspace security dataplane events#1418
psaab merged 2 commits into
masterfrom
codex/next-1379-event-producers

Conversation

@psaab
Copy link
Copy Markdown
Owner

@psaab psaab commented May 18, 2026

Summary

  • thread the event-stream worker handle into AF_XDP packet processing
  • add fixed-size nonblocking RT_FLOW emit helpers for policy-deny, screen-drop, and filter-log events
  • emit policy-deny/session-miss drops, screen drops/invalid SYN-cookie ACKs, and logged routing-instance/PBR filter hits
  • carry filter/term/action metadata through routing-instance filter evaluation without per-packet heap work
  • update Feature gap: dataplane events (PolicyDeny, ScreenDrop, FilterLog) not emitted by userspace-dp #1379 docs and module README files to reflect the new producer boundary

Validation

  • cargo test --manifest-path userspace-dp/Cargo.toml event_emit -- --nocapture
  • cargo test --manifest-path userspace-dp/Cargo.toml session_miss_ack_stage_invokes_syn_cookie_runtime_validation -- --nocapture
  • cargo test --manifest-path userspace-dp/Cargo.toml ingress_filter_routing_instance_steers_flow_into_native_gre_table -- --nocapture
  • cargo test --manifest-path userspace-dp/Cargo.toml interface_filter_routing_instance_counted_returns_matching_override -- --nocapture
  • go test ./pkg/dataplane/userspace ./pkg/logging
  • git diff --check

Refs #1379

Copilot AI review requested due to automatic review settings May 18, 2026 04:09
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Wires per-event RT_FLOW telemetry production into the userspace AF_XDP pipeline for #1379. The event-stream worker handle is threaded through WorkerContext and packet processing, fixed-size non-blocking emit helpers are added for PolicyDeny, ScreenDrop, and FilterLog, and routing-instance filter evaluation is extended to surface log/action/filter/term metadata for use by the producers.

Changes:

  • Add event_emit.rs with emit_policy_deny_event / emit_screen_drop_event / emit_filter_log_event and call them from policy-deny, screen-drop/invalid-cookie-ACK, and logged PBR ingress-filter sites.
  • Carry filter_id/term_id/action/log through routing-instance filter evaluation via a new FilterRoutingInstanceResult and assign positional IDs in the filter compiler.
  • Thread EventStreamWorkerHandle through WorkerContext, poll_binding, and tests; update README/gap docs and _Log.md.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
userspace-dp/src/afxdp/event_emit.rs New emit helpers + RT_FLOW/screen-reason mappings; hardcodes timestamp_ns=0 and misses icmp-fragment.
userspace-dp/src/afxdp/poll_stages.rs Calls emit_screen_drop on Drop/SynCookieChallenge/invalid cookie ACK; challenge path may emit a misleading event.
userspace-dp/src/afxdp/poll_descriptor.rs Threads event_stream + now_ns into ingress route override and adds policy-deny emit at deny branch.
userspace-dp/src/afxdp/forwarding/mod.rs Uses new event-aware filter eval and emits filter-log events for logged PBR hits.
userspace-dp/src/afxdp/worker/{mod,lifecycle}.rs, types/runtime.rs Plumbs event_stream handle into worker context.
userspace-dp/src/afxdp/frame/tests.rs, test_fixtures.rs Test plumbing for new ingress_route_table_override signature.
userspace-dp/src/event_stream/{mod,producer}.rs Adds test_worker_handle helper; refines dead_code attribute comment.
userspace-dp/src/filter/{mod,engine,compiler}.rs Adds id fields, FilterRoutingInstanceResult, and event-aware eval entrypoint.
userspace-dp/src/{afxdp,filter}/README.md, docs/userspace-dataplane-gaps.md Documentation updates for new producer boundary and #1379 status.
_Log.md Action log entry.
Comments suppressed due to low confidence (1)

userspace-dp/src/afxdp/poll_stages.rs:306

  • emit_screen_drop_event is invoked for ScreenVerdict::SynCookieChallenge with reason "syn-cookie" in stage_screen_check. A SYN-cookie challenge is not a drop in the RT_FLOW sense — the dataplane responds with a SYN+ACK rather than silently discarding the flow. Emitting a ScreenDrop event for the challenge path (in addition to the existing screen_drops counter increment) may produce syslog noise that doesn't correspond to actual drops and could confuse operators looking at deny telemetry. Consider distinguishing the challenge-emit case from the actual drop case, or at minimum carrying a distinct screen reason so consumers can filter it.
        ScreenVerdict::SynCookieChallenge(_) => {
            emit_screen_drop_event(
                worker_ctx.event_stream,
                &screen_pkt,
                meta,
                zone_id,
                "syn-cookie",
                event_now_ns_from_secs(now_secs),
            );
            counters.touched = true;
            counters.screen_drops += 1;
            StageOutcome::RecycleAndContinue
        }

application_id: 0,
filter_id,
screen_id: 0,
timestamp_ns: 0,
Comment on lines +194 to +215
fn screen_reason_id(reason: &'static str) -> u32 {
match reason {
"syn-flood" => SCREEN_SYN_FLOOD,
"icmp-flood" => SCREEN_ICMP_FLOOD,
"udp-flood" => SCREEN_UDP_FLOOD,
"port-scan" => SCREEN_PORT_SCAN,
"ip-sweep" => SCREEN_IP_SWEEP,
"land-attack" => SCREEN_LAND_ATTACK,
"ping-of-death" => SCREEN_PING_OF_DEATH,
"teardrop" => SCREEN_TEARDROP,
"tcp-syn-fin" => SCREEN_TCP_SYN_FIN,
"tcp-no-flag" => SCREEN_TCP_NO_FLAG,
"tcp-fin-no-ack" => SCREEN_TCP_FIN_NO_ACK,
"winnuke" => SCREEN_WINNUKE,
"ip-source-route" => SCREEN_IP_SOURCE_ROUTE,
"syn-frag" => SCREEN_SYN_FRAG,
"syn-cookie" | "syn-cookie-unavailable" => SCREEN_SYN_COOKIE,
"session-limit-src" => SCREEN_SESSION_LIMIT_SRC,
"session-limit-dst" => SCREEN_SESSION_LIMIT_DST,
_ => 0,
}
}
Comment on lines +66 to +75
for (filter_idx, snap) in filters.iter().enumerate() {
let key = qualify_filter_key(&snap.family, &snap.name);
let terms = snap
.terms
.iter()
.map(|t| parse_term(t, &state.three_color_policer_by_name))
.enumerate()
.map(|(term_idx, t)| parse_term(t, term_idx as u32, &state.three_color_policer_by_name))
.collect::<Vec<_>>();
let filter = Filter {
id: filter_idx as u32,
@psaab
Copy link
Copy Markdown
Owner Author

psaab commented May 18, 2026

Claude r1 review on 1e94f8e1

Verdict: MERGE-READY (pending Codex/Gemini)

Continues #1379 dataplane event work — adds the security event producers (policy-deny, screen-drop, filter-log) on top of the #1404 bounded event producer infrastructure we just merged. Diff 604 LOC including 368-LOC NEW userspace-dp/src/afxdp/event_emit.rs + integration into poll_stages, forwarding, worker lifecycle, event_stream/producer.

Hot-path discipline verified on inspection

event_emit.rs:38+:

pub(super) fn emit_policy_deny_event(
    event_stream: Option<&EventStreamWorkerHandle>, ...
) {
    let Some(event_stream) = event_stream else {
        return;  // zero-cost when event stream is disabled
    };
    ...
}

Early-return if no event stream — no allocation on the disabled path. Uses DataplaneEventPayload from event_stream::codec (already in tree from #1404).

Budget discipline (must verify with Codex)

The pattern hardened in #1404 was: try_acquire(kind)encode_dataplane_event(seq, &event)try_send_frame → release-on-fail. Budget is held until ack or drop via pop_replay_frame. Codex must verify the new emit_* paths obey the same lifecycle and don't leak on early-return / panic / channel-full.

Hostile concerns for Codex

A. Per-packet cost on the enabled path: any Vec::new / Box::new / heap allocation in emit_policy_deny_event etc.?
B. Budget lifecycle: every emit path acquires before encode? Releases on every error return?
C. Three canonical paths covered: policy-deny + screen-drop + filter-log? Tests pin each at the call-site, not just helper-isolated?
D. Event encoding stability: wire format versioned for future consumer upgrades?
E. forwarding/mod.rs +32 — what does the dataplane-event integration look like? Any synchronous wait that could backpressure the packet path?
F. SCREEN_ flag constants (1<<14 SCREEN_SYN_COOKIE etc.) — match the canonical Junos RT_FLOW screen flag bitfield?

Recommendation

MERGE-READY structurally. The disabled-path is zero-cost, the enabled-path piggybacks on the #1404 infrastructure we just hardened. Codex verification of per-packet allocation + budget-leak-free emit paths is the key check.

Awaiting Codex (task-mpaoufxg-gpgwqp) + Gemini Pro 3 (task-mpaour6f-vt7q0r).

Not merging — author's decision.

@psaab
Copy link
Copy Markdown
Owner Author

psaab commented May 18, 2026

Round-1 triple-review synthesis on 1e94f8e1

Reviewer Verdict
Claude MERGE-READY
Codex MERGE-NEEDS-MINOR (icmp-fragment unmapped + missing policy-deny integration test)
Gemini Pro 3 MERGE-READY

Hot-path + budget — both reviewers verify clean

Gemini quoting event_stream/codec.rs:252-255:

"Everything is bounded and strictly stack-allocated... [u8; 256] pattern... maximum offset accessed is 16 + 134 = 150, which rests safely below the 256-byte stack boundary."

Gemini quoting producer.rs:326-335:

"Hardening introduced in PR #1404 has been exactly mirrored here. It acquires budget try_acquire(kind) BEFORE paying the CPU cost of advancing sequence numbers or executing EventFrame::encode_dataplane_event()."

Codex on budget lifecycle:

"try_emit_dataplane_event_at() does rate-limit, try_acquire, encode, enqueue. It releases on enqueue failure, disconnect, ACK/replay trim, replay eviction, and shutdown drain. I did not find a budget leak."

Codex MINORs

MINOR 1: icmp_fragment screen unmapped:

"screen.rs can return ScreenVerdict::Drop('icmp-fragment'), but event_emit.rs:194-214 has no mapping for that reason and falls through to screen_id = 0. Go renders unknown flags as screen(0x0), so this supported userspace screen check emits an unidentifiable RT_FLOW screen event."

Real silent-identity bug for one of the production screen checks. Easy fix — add the reason to the screen_reason_id mapping.

MINOR 2: policy-deny canonical path missing integration test:

"The real producer call is in poll_descriptor.rs:1496, but the only policy-deny event test is the direct helper test in event_emit.rs:266. Filter-log has a PBR path test and screen-drop has syn-cookie-stage coverage; policy-deny needs a poll-descriptor/session-miss test."

The pattern we hardened in prior rounds: helper-isolated tests pass even when the call-site regresses. Policy-deny is the canonical denied-by-policy path; it deserves an integration test.

Codex also notes coverage scope:

"Filter-log: logged routing-instance/PBR filter hits only, not broader non-PBR input/output/lo0 filters."

PBR scope is documented in the PR — broader filter-log coverage is a tracked follow-up (per author's own gap doc).

Wire format

Codex: "Wire format is stable fixed binary, not versioned. Frame header is [len:u32][type:u8][reserved:3][seq:u64]; security payload is fixed 136-byte dataplane.Event shape with message types 11-13."

Versioning gap — but the framing is fixed-shape so future versioning could go in the reserved:3 bytes if needed.

Recommendation

Strongly consider in this PR:

  1. Add icmp-fragment to screen_reason_id mapping (Codex MINOR 1) — single line fix.
  2. Add a poll-descriptor integration test for policy-deny event emission (Codex MINOR 2).

Defer: broader non-PBR filter-log coverage (already tracked); wire format versioning (low-cost when needed).

The event producers are clean. Hot path is zero-allocation, budget lifecycle correctly inherits #1404 hardening, no leak paths.

Codex task: task-mpaoufxg-gpgwqp. Gemini task: task-mpaour6f-vt7q0r. Not merging — author's decision.

@psaab
Copy link
Copy Markdown
Owner Author

psaab commented May 18, 2026

Claude r2 review on 1c4ac38c

Verdict: MERGE-READY (pending Codex/Gemini)

r1 MINORs closure: icmp-fragment mapping + policy-deny integration test

r2 diff: event_emit.rs +35 (screen reason mapping) + forwarding/tests.rs +90/-2 (policy-deny integration test).

Need Codex to verify:

  • event_emit.rs +35: icmp-fragment now maps to a non-zero screen_reason_id (probably SCREEN_ICMP_FRAG = 1<<13 or similar). Any other previously-unmapped reasons also covered?
  • forwarding/tests.rs +90: pins the canonical policy-deny path through the producer (not just helper-isolated). The test should exercise poll_descriptor.rs:1496 call site → DataplaneEventPayload → producer round-trip.

Recommendation

MERGE-READY as a coverage closure. Awaiting Codex (task-mpapyjn4-y1tzo3) + Gemini Pro 3 (task-mpapyrdu-0nijj5).

Not merging — author's decision.

@psaab psaab force-pushed the codex/next-1379-event-producers branch from 1c4ac38 to dd3bb5b Compare May 18, 2026 04:45
@psaab
Copy link
Copy Markdown
Owner Author

psaab commented May 18, 2026

Round-2 triple-review synthesis on 1c4ac38c

Reviewer Verdict
Claude MERGE-READY
Codex MERGE-NEEDS-MINOR (policy-deny still helper-level, not end-to-end)
Gemini Pro 3 MERGE-READY

r1 MINOR 1 (icmp-fragment mapping) — CLOSED

Both reviewers verify:

Codex: "icmp-fragment now maps to SCREEN_ICMP_FRAGMENT = 1 << 17 in event_emit.rs:31, with the match arm at event_emit.rs:214."

Gemini: confirmed with quoted diff, plus screen_reason_id_maps_icmp_fragment test pins it.

No screen reason now falls through to screen_id = 0 for production paths.

Codex MINOR — policy-deny integration test scope

"The new test at forwarding/tests.rs:1797 manually calls evaluate_policy, then directly calls emit_policy_deny_event at line 1854. It never drives a packet through poll_binding_process_descriptor, so it would still pass if the real call at poll_descriptor.rs:1496 were removed, gated incorrectly, passed wrong zones, or used the wrong owner_rg_id / timestamp. It also bypasses the production evaluate_policy_with_len call at poll_descriptor.rs:1118."

This is a real coverage gap — the failure mode "production call site bypasses the emit" is exactly what the integration test should catch but doesn't. The test pins that the helper composes correctly when invoked, but not that the production code invokes the helper.

Gemini missed this — verified the emit composition only.

Recommendation

Strongly consider in this PR: add a real poll_descriptor.rs:1496 integration test that:

  1. Sets up a worker context with a permit-deny policy
  2. Drives a packet through poll_binding_process_descriptor
  3. Asserts an RT_FLOW event was emitted with the right kind/zones/timestamp

The current test would pass if someone deleted the poll_descriptor.rs:1496 call site entirely.

Defer: broader integration test for screen-drop and filter-log canonical paths (already partially covered for syn-cookie + PBR per r1 review).

Codex task: task-mpapyjn4-y1tzo3. Gemini task: task-mpapyrdu-0nijj5. Not merging — author's decision.

Copilot AI review requested due to automatic review settings May 18, 2026 05:40
@psaab psaab force-pushed the codex/next-1379-event-producers branch from dd3bb5b to 33eed04 Compare May 18, 2026 05:40
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 5 comments.

application_id: 0,
filter_id,
screen_id: 0,
timestamp_ns: 0,
Comment on lines +295 to +302
emit_screen_drop_event(
worker_ctx.event_stream,
&screen_pkt,
meta,
zone_id,
"syn-cookie",
event_now_ns_from_secs(now_secs),
);
Comment on lines +194 to +217
#[inline]
fn screen_reason_id(reason: &'static str) -> u32 {
match reason {
"syn-flood" => SCREEN_SYN_FLOOD,
"icmp-flood" => SCREEN_ICMP_FLOOD,
"udp-flood" => SCREEN_UDP_FLOOD,
"port-scan" => SCREEN_PORT_SCAN,
"ip-sweep" => SCREEN_IP_SWEEP,
"land-attack" => SCREEN_LAND_ATTACK,
"ping-of-death" => SCREEN_PING_OF_DEATH,
"teardrop" => SCREEN_TEARDROP,
"tcp-syn-fin" => SCREEN_TCP_SYN_FIN,
"tcp-no-flag" => SCREEN_TCP_NO_FLAG,
"tcp-fin-no-ack" => SCREEN_TCP_FIN_NO_ACK,
"winnuke" => SCREEN_WINNUKE,
"ip-source-route" => SCREEN_IP_SOURCE_ROUTE,
"syn-frag" => SCREEN_SYN_FRAG,
"syn-cookie" | "syn-cookie-unavailable" => SCREEN_SYN_COOKIE,
"session-limit-src" => SCREEN_SESSION_LIMIT_SRC,
"session-limit-dst" => SCREEN_SESSION_LIMIT_DST,
"icmp-fragment" => SCREEN_ICMP_FRAGMENT,
_ => 0,
}
}
Comment on lines +66 to +75
for (filter_idx, snap) in filters.iter().enumerate() {
let key = qualify_filter_key(&snap.family, &snap.name);
let terms = snap
.terms
.iter()
.map(|t| parse_term(t, &state.three_color_policer_by_name))
.enumerate()
.map(|(term_idx, t)| parse_term(t, term_idx as u32, &state.three_color_policer_by_name))
.collect::<Vec<_>>();
let filter = Filter {
id: filter_idx as u32,
Comment on lines +747 to +756
#[cfg(test)]
pub(crate) fn push_for_test(&mut self, desc: XdpDesc) {
let prod = unsafe { *self.ring.producer };
let idx = prod & self.ring.mask;
let slot = unsafe { (self.ring.ring as *mut XdpDesc).add(idx as usize) };
unsafe {
*slot = desc;
*self.ring.producer = prod.wrapping_add(1);
}
}
@psaab
Copy link
Copy Markdown
Owner Author

psaab commented May 18, 2026

Claude r3 review on 33eed04d

Verdict: MERGE-READY (pending Codex/Gemini)

r2 MINOR closure verified — end-to-end policy-deny integration test

New test poll_descriptor_policy_deny_path_emits_rt_flow_event in userspace-dp/src/afxdp/tests.rs:

  • Builds a real TCP SYN frame via build_policy_deny_tcp_syn_frame() (proper eth header + IPv4 + TCP with correct checksums)
  • Sets up real ZoneSnapshot + NeighborSnapshot fixtures (lan/wan/dmz)
  • Constructs BindingWorker::new_for_mirror_test + real UserspaceDpMeta with frame offset bookkeeping
  • Drives poll_binding_process_descriptor (the production entry point, not the helper)
  • Asserts an RT_FLOW event was emitted

This is exactly the "would still pass if poll_descriptor.rs:1496 were removed" gap from r2. If a future PR removes or gates the production emit call, this test fails.

Both r1 MINORs now closed

  • icmp-fragment → SCREEN_ICMP_FRAGMENT = 1 << 17 mapping (r2)
  • Canonical policy-deny path end-to-end pinned (r3)

Recommendation

MERGE-READY. The integration test stack is now real-frame-real-fixture-real-entrypoint. Closes the failure mode "production call site is removed and tests still pass" — the canonical risk for event producers.

Awaiting Codex (task-mpas4dog-1bmn75) + Gemini Pro 3 (task-mpas4m14-z5vzbo). Not merging — author's decision.

psaab added 2 commits May 17, 2026 22:48
Add a real poll-descriptor policy-deny test that drives an RX descriptor through the production packet path and asserts the RT_FLOW policy-deny event is emitted. Add a test-only RX ring injector so the test does not bypass the event hook it is meant to protect.
@psaab psaab force-pushed the codex/next-1379-event-producers branch from 33eed04 to f785e93 Compare May 18, 2026 05:49
@psaab psaab merged commit 6e3c21b into master May 18, 2026
@psaab
Copy link
Copy Markdown
Owner Author

psaab commented May 18, 2026

Round-3 triple-review synthesis on 33eed04d

Reviewer Verdict
Claude MERGE-READYMERGE-NEEDS-MINOR (self-correcting)
Codex MERGE-NEEDS-MINOR (IP addresses unasserted)
Gemini Pro 3 MERGE-NEEDS-MINOR (IP addresses unasserted)

Both reviewers independently caught the same gap.

r2 MINOR (call-site coverage) — CLOSED

All three reviewers confirm the new test drives poll_binding_process_descriptor end-to-end. If the production policy-deny call site at poll_descriptor.rs:1496 is removed, event_rx.try_recv() fails — the failure mode I was specifically worried about.

New r3 MINOR — IP assertions missing

Codex:

"The new test decodes the RT_FLOW event and asserts kind, zones, ingress ifindex, ports, timestamp, and stats, but it never asserts event.src_ip or event.dst_ip. The test frame is explicitly 10.0.61.102:12345 -> 172.16.80.200:5201, so wrong/zero event addresses would still pass."

Gemini quoted the assertions:

assert_eq!(event.kind, crate::event_stream::codec::DataplaneEventKind::PolicyDeny);
assert_eq!(event.ingress_zone_id, TEST_LAN_ZONE_ID);
assert_eq!(event.egress_zone_id, TEST_WAN_ZONE_ID);
assert_eq!(event.ingress_ifindex, 24);
assert_eq!(event.src_port, 12345);
assert_eq!(event.dst_port, 5201);
assert_eq!(event.timestamp_ns, 0);
// Missing IP assertions!

Self-correction

My r3 review said MERGE-READY based on the call-site coverage. I should have read the assertion block to verify ALL the payload fields are pinned — port assertions imply IP-assertion-by-association, but Codex specifically called out the test would pass if IPs came out as zero. This is the same pattern as #1407 IPC roundtrip where I verified the wire keys but missed the IPC-test structure: focus on what was asked but not on adjacent verification gaps.

Recommendation

Strongly consider in this PR: add two lines to the new test:

assert_eq!(event.src_ip, IpAddr::V4(Ipv4Addr::new(10, 0, 61, 102)));
assert_eq!(event.dst_ip, IpAddr::V4(Ipv4Addr::new(172, 16, 80, 200)));

The test data already encodes these via the frame builder; just assert the payload reflects them. Closes the "wrong/zero IPs would silently pass" gap.

Codex task: task-mpas4dog-1bmn75. Gemini task: task-mpas4m14-z5vzbo. Not merging — author's decision.

psaab added a commit that referenced this pull request May 18, 2026
* userspace: emit security dataplane events
* events: cover icmp-fragment and policy-deny RT_FLOW output

Add a real poll-descriptor policy-deny test that drives an RX descriptor
through the production packet path and asserts the RT_FLOW policy-deny
event is emitted. Add a test-only RX ring injector so the test does not
bypass the event hook it is meant to protect.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants