diff --git a/_Log.md b/_Log.md index 5afbd86c6..9d08f0d40 100644 --- a/_Log.md +++ b/_Log.md @@ -742,3 +742,6 @@ - **Timestamp**: 2026-05-17T09:34:59-07:00 - **Action**: #1373 Phase 1 documentation migration: mark Rust AF_XDP userspace as the primary/default dataplane development and validation target, demote eBPF wording to legacy compatibility/regression context, and preserve explicit retirement blockers for #1374-#1381 without claiming unresolved gaps closed. - **File(s)**: `README.md`, `CLAUDE.md`, `docs/testing.md`, `docs/development-workflow.md`, `docs/test_env.md`, `docs/userspace-dataplane-gaps.md`, `docs/feature-gaps.md`, `docs/userspace-dataplane-architecture.md`, `docs/afxdp-packet-processing.md`, `docs/ha-cluster-test-plan.md`, `testing-docs/README.md`, `bpf/README.md`, `userspace-dp/README.md`, `pkg/dataplane/README.md`, `userspace-dp/src/afxdp/README.md`, `_Log.md` +- **Timestamp**: 2026-05-17T18:57:46Z + - **Action**: Adversarial PR #1374 follow-up: optimize SYN-cookie validated-client cache with map+queue state so insert/take are O(1), while expiry/eviction drain stale queue tokens from the head without whole-cache scans. + - **File(s)**: `userspace-dp/src/screen.rs`, `userspace-dp/src/screen_tests.rs`, `_Log.md` diff --git a/docs/pr/1373-retire-ebpf-dataplane/plan-1374-syn-cookies.md b/docs/pr/1373-retire-ebpf-dataplane/plan-1374-syn-cookies.md index 3514e4af9..435f0c613 100644 --- a/docs/pr/1373-retire-ebpf-dataplane/plan-1374-syn-cookies.md +++ b/docs/pr/1373-retire-ebpf-dataplane/plan-1374-syn-cookies.md @@ -16,6 +16,23 @@ SYN cookie behavior in `userspace-dp`. profile the actual TX completion cost instead of assuming in-place RX-to-TX bounce is required. +## Current Slice Status (2026-05-17) + +- #1393 landed the deterministic userspace cookie codec/layout and codec tests. 
+- This runtime slice carries `syn_cookie` through Go and Rust screen snapshots, + adds a fail-closed screen challenge verdict when no HA-safe secret is + published, uses a fixed-size keyed validated-client table for + attacker-controlled tuples, and validates returning ACKs only after normal + session lookup misses. +- The AF_XDP hook currently consumes valid cookie ACKs and drops invalid cookie + ACKs while cookie mode is active. `SynCookieChallenge` is still accounted as a + screen drop instead of transmitting a SYN-ACK. +- This PR is therefore a SYN-cookie validation/admission runtime slice, not the + full SYN-ACK/RST TX implementation. +- The userspace capability gate remains in place until bounded SYN-ACK TX, ACK + RST emission, HA-safe secret publication, counters/status, and integration + validation land. + ## Design Use SipHash, not HMAC-SHA1/SHA256. Linux SYN cookies and the current kernel @@ -63,10 +80,13 @@ On returning ACK: - No heap allocation while deciding SYN cookie mint or ACK validation. - SipHash key lookup is per-zone and read-only on the published snapshot. +- Validated-client state is a fixed-size keyed table; attacker-controlled + tuples do not enter `FxHashMap` or an unbounded queue. +- Per-zone SYN-cookie active/counter state is config-bound and prepopulated on + profile updates, so packet processing does not allocate zone strings. - Cookie reply frame allocation is bounded; normal forwarding frame ownership takes priority over diagnostic/flood replies. - Random ACKs never install sessions. -- Validated-client state uses a bounded LRU or equivalent capped table. ## State and HA Behavior @@ -105,7 +125,20 @@ On returning ACK: - Cargo: `screen::syn_cookie_epoch_low_bits_wrap_rejects_32_epoch_old_cookie`. - Cargo: `screen::syn_cookie_validation_tries_current_and_previous_full_epoch`. - Cargo: `screen::syn_cookie_chosen_when_threshold_exceeded`. -- Cargo: `screen::syn_cookie_budget_drop_does_not_starve_tx`. 
+- Cargo: `screen::syn_cookie_without_published_secret_fails_closed`. +- Cargo: `screen::syn_cookie_ack_validation_marks_next_syn_bypass_without_session_creation`. +- Cargo: `screen::syn_cookie_validated_syn_still_runs_later_screen_checks`. +- Cargo: `screen::syn_cookie_invalid_ack_does_not_validate_client`. +- Cargo: `screen::syn_cookie_ack_fin_is_invalid_while_cookie_mode_is_active`. +- Cargo: `screen::syn_cookie_validated_cache_is_bounded`. +- Cargo: `screen::syn_cookie_validated_cache_index_is_keyed`. +- Cargo: `screen::syn_cookie_invalid_ack_flood_does_not_grow_validated_cache`. +- Cargo: `screen::syn_cookie_master_key_rotation_clears_validated_cache`. +- Cargo: `screen::update_profiles_prepopulates_syn_cookie_active_state`. +- Cargo: `screen::syn_cookie_validated_cache_refresh_extends_ttl`. +- Go: while the gate remains, keep the `SynFloodProtectionMode == "syn-cookie"` + capability rejection pinned and verify screen snapshots carry `syn_cookie` for + the runtime path. - Go: remove/update the `SynFloodProtectionMode == "syn-cookie"` capability rejection and the manager test that pins it. 
- Integration: hping3 SYN flood against the userspace HA cluster with diff --git a/docs/userspace-dataplane-gaps.md b/docs/userspace-dataplane-gaps.md index 2f8e0fbbd..21f84e2d2 100644 --- a/docs/userspace-dataplane-gaps.md +++ b/docs/userspace-dataplane-gaps.md @@ -51,7 +51,7 @@ These are the remaining explicit configuration gates in | Feature/config shape | Gate status | Retirement blocker | |----------------------|-------------|--------------------| | Unsupported policy shapes | Gated | Address/application expansion must succeed for userspace | -| Screen behavior requiring SYN cookies | Gated | #1374 | +| Screen behavior requiring SYN cookies | Gated; userspace screen runtime has fail-closed cookie challenge/ACK-validation/cache scaffolding, but no HA key publication or SYN-ACK/RST TX yet | #1374 | | Three-color policers | Gated | #1375 | | Port mirroring | Gated | #1376 | @@ -61,7 +61,7 @@ These are not "missing", but they are not pure userspace forwarding either: | Area | Current boundary | |------|------------------| -| SYN cookie flood protection | Legacy eBPF fallback | +| SYN cookie flood protection | Legacy eBPF fallback until #1374 wires HA-safe secrets, bounded SYN-ACK/RST TX, counters/status, and removes the userspace capability gate | | Kernel-owned traffic (ARP, local delivery, management, some non-IP) | cpumap or kernel pass-through from XDP | | GRE / ESP / explicit early filters | Tail-call back into the legacy XDP pipeline | | IPsec / XFRM handling | Userspace detects and punts to kernel/slow-path as needed | @@ -79,7 +79,7 @@ The current #1373 audit produced these tracked blockers: | #1377 | Preserve address-persistent SNAT pool selection with an approved cross-backend contract. #1385 landed userspace-v1 deterministic selection and fail-closed pool admission, but does not close cross-backend parity by itself. 
| Phase 4 BPF source removal | | #1378 | Finish the policy-scheduler retirement contract after #1396 userspace propagation: hit-counter survival across scheduler snapshot rebuilds, strict missing-scheduler commit behavior, and integration/failover validation | Phase 4 BPF source removal | | #1379 | Emit policy-deny, screen-drop, and filter-log dataplane events from userspace | Phase 4 BPF source removal | -| #1374 | Implement userspace SYN-cookie flood protection or an approved equivalent | Phase 4 BPF source removal | +| #1374 | Implement userspace SYN-cookie flood protection or an approved equivalent. #1393 and the 2026-05-17 runtime slice cover deterministic cookie codec/layout, snapshot propagation, fail-closed screen challenge selection, session-miss ACK validation, and a bounded validated-client cache. Lower-layer coverage in `userspace-dp/src/screen_tests.rs` pins 4-way validated-client cache replacement; poll-stage tests only pin the operational invalid-ACK drop/bypass semantics. Remaining: validated-client cache expiration semantics, secret-epoch rotation, bounded SYN-ACK TX, ACK RST emission, HA-safe secret publication/cache survivability, counters/status, integration/failover validation, and userspace capability gate removal. | Phase 4 BPF source removal | | #1375 | Implement userspace RFC 2697/2698 three-color policers | Phase 4 BPF source removal | | #1376 | Implement userspace port mirroring or explicitly retire the feature | Phase 4 BPF source removal | | #1380 | Retire the remaining BPF-map-oriented `show system buffers` operator surface now that #1386 provides userspace helper-status reporting. 
| Phase 5 CLI / observability cleanup | diff --git a/pkg/dataplane/userspace/manager_test.go b/pkg/dataplane/userspace/manager_test.go index 3c702371b..2b91f2283 100644 --- a/pkg/dataplane/userspace/manager_test.go +++ b/pkg/dataplane/userspace/manager_test.go @@ -3169,6 +3169,28 @@ func TestBuildScreenSnapshotsMatchesZoneToProfile(t *testing.T) { } } +func TestBuildScreenSnapshotsMarksSynCookieMode(t *testing.T) { + cfg := &config.Config{} + cfg.Security.Flow.SynFloodProtectionMode = "syn-cookie" + cfg.Security.Zones = map[string]*config.ZoneConfig{ + "trust": {Name: "trust", ScreenProfile: "flood"}, + } + cfg.Security.Screen = map[string]*config.ScreenProfile{ + "flood": { + Name: "flood", + TCP: config.TCPScreen{SynFlood: &config.SynFloodConfig{AttackThreshold: 100}}, + }, + } + + snaps := buildScreenSnapshots(cfg) + if len(snaps) != 1 { + t.Fatalf("len(snaps) = %d, want 1", len(snaps)) + } + if !snaps[0].SYNCookie { + t.Fatalf("SYNCookie = false, want true: %+v", snaps[0]) + } +} + func TestDeriveUserspaceCapabilitiesAllowsSessionTimeouts(t *testing.T) { cfg := &config.Config{} cfg.Security.Flow.TCPSession = &config.TCPSessionConfig{ diff --git a/pkg/dataplane/userspace/protocol.go b/pkg/dataplane/userspace/protocol.go index 86c517f65..d983ba7f3 100644 --- a/pkg/dataplane/userspace/protocol.go +++ b/pkg/dataplane/userspace/protocol.go @@ -304,6 +304,7 @@ type ScreenProfileSnapshot struct { ICMPFloodThreshold uint32 `json:"icmp_flood_threshold,omitempty"` UDPFloodThreshold uint32 `json:"udp_flood_threshold,omitempty"` SYNFloodThreshold uint32 `json:"syn_flood_threshold,omitempty"` + SYNCookie bool `json:"syn_cookie,omitempty"` // Advanced screen features for userspace dataplane SessionLimitSrc uint32 `json:"session_limit_src,omitempty"` SessionLimitDst uint32 `json:"session_limit_dst,omitempty"` diff --git a/pkg/dataplane/userspace/snapshot.go b/pkg/dataplane/userspace/snapshot.go index cab6c053f..e69aa7187 100644 --- a/pkg/dataplane/userspace/snapshot.go +++ 
b/pkg/dataplane/userspace/snapshot.go @@ -1614,6 +1614,7 @@ func buildScreenSnapshots(cfg *config.Config) []ScreenProfileSnapshot { } if sp.TCP.SynFlood != nil && sp.TCP.SynFlood.AttackThreshold > 0 { snap.SYNFloodThreshold = uint32(sp.TCP.SynFlood.AttackThreshold) + snap.SYNCookie = cfg.Security.Flow.SynFloodProtectionMode == "syn-cookie" } if sp.LimitSession.SourceIPBased > 0 { snap.SessionLimitSrc = uint32(sp.LimitSession.SourceIPBased) diff --git a/userspace-dp/src/FEATURES.md b/userspace-dp/src/FEATURES.md index 3d4a4f073..a6901b5e7 100644 --- a/userspace-dp/src/FEATURES.md +++ b/userspace-dp/src/FEATURES.md @@ -20,7 +20,7 @@ ordering. | File | Stage | What it does | |------|-------|--------------| -| `screen.rs` | screen | Pre-session attack-protection checks (land, TCP SYN+FIN, no-flag, FIN-without-ACK, ICMP frag, plus rate-limits). Mirrors `bpf/xdp/xdp_screen.c`. Also contains the #1374 userspace SYN-cookie mint/validate core; the Go `syn-cookie` capability gate must stay closed until SYN-ACK TX replies, returning-ACK handling, HA-secret publication, bounded validated-client state, and status counters are wired into the worker path. | +| `screen.rs` | screen | Pre-session attack-protection checks (land, TCP SYN+FIN, no-flag, FIN-without-ACK, ICMP frag, plus rate-limits). Mirrors `bpf/xdp/xdp_screen.c`. Also contains the #1374 userspace SYN-cookie mint/validate core, fixed-size validated-client admission table, and session-miss ACK validation hook. The lower screen tests cover bounded 4-way validated-client cache replacement; validated-client cache expiration, secret-epoch rotation, and HA-safe secret/cache survivability are still deferred #1374 work. The Go `syn-cookie` capability gate must stay closed until SYN-ACK TX replies, ACK RST emission, HA-secret publication, status counters, and integration validation are wired into the worker path. | | `policy.rs` | policy | Zone-pair → permit/deny + forwarding-class + DSCP-rewrite + filter chaining. 
`ZonePairKey` is a `u32` (`from_id << 16 \| to_id`); `JUNOS_GLOBAL_ZONE_ID = u16::MAX` is the sentinel for the global zone. | | `nat.rs` | NAT44 | Source / destination / static NAT decisions. `NatDecision` carries `rewrite_src` and `rewrite_dst` Options the TX path consumes. | | `nat64.rs` | NAT64 | RFC 6052 IPv4↔IPv6 translation. `Nat64Prefix` is the 96-bit + IPv4-pool config; `Nat64ReverseInfo` carries the original IPv6 tuple for reverse translation. | diff --git a/userspace-dp/src/afxdp/forwarding_build.rs b/userspace-dp/src/afxdp/forwarding_build.rs index 57578d44e..7a244308f 100644 --- a/userspace-dp/src/afxdp/forwarding_build.rs +++ b/userspace-dp/src/afxdp/forwarding_build.rs @@ -23,6 +23,7 @@ pub(super) fn build_screen_profiles(snapshot: &ConfigSnapshot) -> FxHashMap 0 { 18 } else { 14 }; + let screen_pkt = extract_screen_info( + packet_frame, + meta.addr_family, + meta.protocol, + meta.tcp_flags, + meta.pkt_len, + flow.src_ip, + flow.dst_ip, + flow.forward_key.src_port, + flow.forward_key.dst_port, + l3_off, + ); + match screen.check_packet_with_zone_id(zone_name, zone_id, &screen_pkt, now_secs) { + ScreenVerdict::Pass => StageOutcome::Continue(()), + ScreenVerdict::Drop(_) | ScreenVerdict::SynCookieChallenge(_) => { + counters.touched = true; + counters.screen_drops += 1; + StageOutcome::RecycleAndContinue + } + } +} + +/// SYN-cookie returning ACK validation on the session-miss path. +/// +/// This runs after normal session lookup has failed, so established ACK traffic +/// keeps its normal fast/session path. A valid cookie ACK is consumed without +/// creating a session; the validated-client cache lets the client's next SYN +/// traverse the ordinary policy/NAT/session path. Invalid cookie ACKs are +/// dropped while cookie mode is active. Poll-stage coverage intentionally pins +/// only the operational drop/bypass behavior here; the lower screen runtime +/// owns cache mechanics. 
`screen_tests.rs` covers bounded 4-way validated-client +/// cache replacement, while cache expiration, secret-epoch rotation, and +/// HA-safe secret/cache survivability remain explicit #1374 follow-ups. +#[inline] +pub(super) fn stage_screen_syn_cookie_ack_on_session_miss( + flow: Option<&SessionFlow>, + packet_frame: &[u8], + meta: UserspaceDpMeta, + ingress_zone_override: Option<u16>, + now_secs: u64, + screen: &mut ScreenState, + counters: &mut BatchCounters, + worker_ctx: &WorkerContext, +) -> StageOutcome<()> { + if !screen.has_profiles() { + return StageOutcome::Continue(()); + } + let Some(flow) = flow else { + return StageOutcome::Continue(()); + }; + let zone_id = ingress_zone_override + .filter(|id| worker_ctx.forwarding.zone_id_to_name.contains_key(id)) .or_else(|| { worker_ctx .forwarding .ifindex_to_zone_id .get(&(meta.ingress_ifindex as i32)) - .and_then(|id| worker_ctx.forwarding.zone_id_to_name.get(id)) - .map(|s| s.as_str()) + .copied() }); - let Some(zone_name) = zone_name else { + let Some(zone_id) = zone_id else { + return StageOutcome::Continue(()); + }; + let Some(zone_name) = worker_ctx + .forwarding + .zone_id_to_name + .get(&zone_id) + .map(|s| s.as_str()) + else { return StageOutcome::Continue(()); }; let l3_off = if meta.ingress_vlan_id > 0 { 18 } else { 14 }; @@ -274,12 +347,16 @@ pub(super) fn stage_screen_check( flow.forward_key.dst_port, l3_off, ); - if let ScreenVerdict::Drop(_reason) = screen.check_packet(zone_name, &screen_pkt, now_secs) { - counters.touched = true; - counters.screen_drops += 1; - return StageOutcome::RecycleAndContinue; + match screen.validate_syn_cookie_ack_on_session_miss(zone_name, zone_id, &screen_pkt, now_secs) + { + SynCookieAckVerdict::NotApplicable => StageOutcome::Continue(()), + SynCookieAckVerdict::Validated => StageOutcome::RecycleAndContinue, + SynCookieAckVerdict::Invalid => { + counters.touched = true; + counters.screen_drops += 1; + StageOutcome::RecycleAndContinue + } } - StageOutcome::Continue(()) }
/// Stage 11 — IPsec passthrough. @@ -333,3 +410,245 @@ pub(super) fn stage_ipsec_passthrough_check( ); StageOutcome::RecycleAndContinue } + +#[cfg(test)] +mod tests { + use super::*; + use crate::test_zone_ids::TEST_LAN_ZONE_ID; + + const TEST_NOW_SECS: u64 = 128; + const TCP_FLAG_ACK: u8 = 0x10; + + fn tcp_v4_frame( + src: Ipv4Addr, + dst: Ipv4Addr, + src_port: u16, + dst_port: u16, + flags: u8, + seq: u32, + ack: u32, + ) -> Vec<u8> { + let mut frame = Vec::new(); + write_eth_header( + &mut frame, + [0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff], + [0x00, 0x25, 0x90, 0x12, 0x34, 0x56], + 0, + 0x0800, + ); + frame.extend_from_slice(&[ + 0x45, 0x00, 0x00, 0x28, 0x00, 0x01, 0x00, 0x00, 64, PROTO_TCP, 0x00, 0x00, + ]); + frame.extend_from_slice(&src.octets()); + frame.extend_from_slice(&dst.octets()); + let ip_csum = checksum16(&frame[14..34]); + frame[24..26].copy_from_slice(&ip_csum.to_be_bytes()); + frame.extend_from_slice(&src_port.to_be_bytes()); + frame.extend_from_slice(&dst_port.to_be_bytes()); + frame.extend_from_slice(&seq.to_be_bytes()); + frame.extend_from_slice(&ack.to_be_bytes()); + frame.extend_from_slice(&[0x50, flags, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00]); + recompute_l4_checksum_ipv4(&mut frame[14..], 20, PROTO_TCP, false).expect("tcp checksum"); + frame + } + + fn tcp_v4_meta(frame: &[u8], flags: u8) -> UserspaceDpMeta { + UserspaceDpMeta { + magic: USERSPACE_META_MAGIC, + version: USERSPACE_META_VERSION, + length: std::mem::size_of::<UserspaceDpMeta>() as u16, + ingress_ifindex: 24, + l3_offset: 14, + l4_offset: 34, + payload_offset: 54, + pkt_len: (frame.len() - 14) as u16, + addr_family: libc::AF_INET as u8, + protocol: PROTO_TCP, + tcp_flags: flags, + ..UserspaceDpMeta::default() + } + } + + fn syn_cookie_screen() -> ScreenState { + let mut profiles = FxHashMap::default(); + profiles.insert( + "lan".to_string(), + ScreenProfile { + syn_flood_threshold: 1, + syn_cookie: true, + ..ScreenProfile::default() + }, + ); + let mut screen = ScreenState::new(); +
screen.update_profiles(profiles); + screen.update_syn_cookie_master_key(Some([0x42; 16])); + screen + } + + #[test] + fn session_miss_ack_stage_invokes_syn_cookie_runtime_validation() { + let mut screen = syn_cookie_screen(); + let forwarding = build_forwarding_state(&super::super::test_fixtures::nat_snapshot()); + let ident = BindingIdentity { + slot: 0, + queue_id: 0, + worker_id: 0, + interface: Arc::<str>::from("reth1.0"), + ifindex: 24, + }; + let binding_lookup = WorkerBindingLookup::default(); + let ha_state = BTreeMap::new(); + let dynamic_neighbors = Arc::new(ShardedNeighborMap::default()); + let shared_sessions = Arc::new(Mutex::new(FastMap::default())); + let shared_nat_sessions = Arc::new(Mutex::new(FastMap::default())); + let shared_forward_wire_sessions = Arc::new(Mutex::new(FastMap::default())); + let shared_owner_rg_indexes = SharedSessionOwnerRgIndexes::default(); + let local_tunnel_deliveries = Arc::new(ArcSwap::from_pointee(BTreeMap::new())); + let recent_exceptions = Arc::new(Mutex::new(VecDeque::new())); + let last_resolution = Arc::new(Mutex::new(None)); + let peer_worker_commands = Vec::new(); + let dnat_fds = DnatTableFds::default(); + let rg_epochs = std::array::from_fn(|_| AtomicU32::new(0)); + let worker_ctx = WorkerContext { + ident: &ident, + binding_lookup: &binding_lookup, + forwarding: &forwarding, + ha_state: &ha_state, + dynamic_neighbors: &dynamic_neighbors, + shared_sessions: &shared_sessions, + shared_nat_sessions: &shared_nat_sessions, + shared_forward_wire_sessions: &shared_forward_wire_sessions, + shared_owner_rg_indexes: &shared_owner_rg_indexes, + slow_path: None, + local_tunnel_deliveries: &local_tunnel_deliveries, + recent_exceptions: &recent_exceptions, + last_resolution: &last_resolution, + peer_worker_commands: &peer_worker_commands, + dnat_fds: &dnat_fds, + rg_epochs: &rg_epochs, + }; + + let client = Ipv4Addr::new(192, 0, 2, 10); + let server = Ipv4Addr::new(198, 51, 100, 20); + let syn_frame = tcp_v4_frame(client,
server, 49152, 443, TCP_FLAG_SYN, 1, 0); + let syn_meta = tcp_v4_meta(&syn_frame, TCP_FLAG_SYN); + let syn_flow = + parse_session_flow_from_bytes(&syn_frame, syn_meta).expect("session flow from SYN"); + let syn_info = extract_screen_info( + &syn_frame, + syn_meta.addr_family, + syn_meta.protocol, + syn_meta.tcp_flags, + syn_meta.pkt_len, + syn_flow.src_ip, + syn_flow.dst_ip, + syn_flow.forward_key.src_port, + syn_flow.forward_key.dst_port, + syn_meta.l3_offset as usize, + ); + + assert_eq!( + screen.check_packet_with_zone_id("lan", TEST_LAN_ZONE_ID, &syn_info, TEST_NOW_SECS), + ScreenVerdict::Pass + ); + let _challenge = match screen.check_packet_with_zone_id( + "lan", + TEST_LAN_ZONE_ID, + &syn_info, + TEST_NOW_SECS, + ) { + ScreenVerdict::SynCookieChallenge(challenge) => challenge, + other => panic!("expected SYN-cookie challenge, got {other:?}"), + }; + + let invalid_ack_frame = tcp_v4_frame( + client, + server, + 49152, + 443, + TCP_FLAG_ACK, + 2, + 0xdead_beef, + ); + let invalid_ack_meta = tcp_v4_meta(&invalid_ack_frame, TCP_FLAG_ACK); + let invalid_ack_flow = + parse_session_flow_from_bytes(&invalid_ack_frame, invalid_ack_meta) + .expect("session flow from invalid ACK"); + let mut invalid_counters = BatchCounters::default(); + + assert!(matches!( + stage_screen_syn_cookie_ack_on_session_miss( + Some(&invalid_ack_flow), + &invalid_ack_frame, + invalid_ack_meta, + None, + TEST_NOW_SECS, + &mut screen, + &mut invalid_counters, + &worker_ctx, + ), + StageOutcome::RecycleAndContinue + )); + assert!( + invalid_counters.touched, + "invalid cookie ACK must be counted as a screen drop" + ); + assert_eq!(invalid_counters.screen_drops, 1); + + let challenge = match screen.check_packet_with_zone_id( + "lan", + TEST_LAN_ZONE_ID, + &syn_info, + TEST_NOW_SECS, + ) { + ScreenVerdict::SynCookieChallenge(challenge) => challenge, + other => panic!("invalid ACK must not install SYN-cookie bypass, got {other:?}"), + }; + + let ack_frame = tcp_v4_frame( + client, + server, + 
49152, + 443, + TCP_FLAG_ACK, + 2, + challenge.cookie_isn.wrapping_add(1), + ); + let ack_meta = tcp_v4_meta(&ack_frame, TCP_FLAG_ACK); + let ack_flow = + parse_session_flow_from_bytes(&ack_frame, ack_meta).expect("session flow from ACK"); + let mut counters = BatchCounters::default(); + + assert!(matches!( + stage_screen_syn_cookie_ack_on_session_miss( + Some(&ack_flow), + &ack_frame, + ack_meta, + None, + TEST_NOW_SECS, + &mut screen, + &mut counters, + &worker_ctx, + ), + StageOutcome::RecycleAndContinue + )); + assert!( + !counters.touched, + "valid cookie ACK is consumed without counting a screen drop" + ); + assert_eq!(counters.screen_drops, 0); + + assert_eq!( + screen.check_packet_with_zone_id("lan", TEST_LAN_ZONE_ID, &syn_info, TEST_NOW_SECS), + ScreenVerdict::Pass, + "poll-stage session-miss ACK handling must invoke SYN-cookie validation" + ); + assert!( + matches!( + screen.check_packet_with_zone_id("lan", TEST_LAN_ZONE_ID, &syn_info, TEST_NOW_SECS), + ScreenVerdict::SynCookieChallenge(_) + ), + "validated SYN-cookie bypass must be single-use" + ); + } +} diff --git a/userspace-dp/src/protocol.rs b/userspace-dp/src/protocol.rs index 6b0b98222..232989b93 100644 --- a/userspace-dp/src/protocol.rs +++ b/userspace-dp/src/protocol.rs @@ -516,6 +516,8 @@ pub(crate) struct ScreenProfileSnapshot { pub udp_flood_threshold: u32, #[serde(rename = "syn_flood_threshold", default)] pub syn_flood_threshold: u32, + #[serde(rename = "syn_cookie", default)] + pub syn_cookie: bool, #[serde(rename = "session_limit_src", default)] pub session_limit_src: u32, #[serde(rename = "session_limit_dst", default)] diff --git a/userspace-dp/src/screen.rs b/userspace-dp/src/screen.rs index 48c6a7788..767f644ad 100644 --- a/userspace-dp/src/screen.rs +++ b/userspace-dp/src/screen.rs @@ -27,6 +27,7 @@ const PROTO_ICMPV6: u8 = 58; // TCP flag bits (matching BPF layout: FIN=0x01, SYN=0x02, RST=0x04, PSH=0x08, ACK=0x10, URG=0x20) const TCP_FIN: u8 = 0x01; const TCP_SYN: u8 = 0x02; +const 
TCP_RST: u8 = 0x04; const TCP_ACK: u8 = 0x10; const TCP_URG: u8 = 0x20; @@ -44,6 +45,11 @@ const SYN_COOKIE_MSS_SHIFT: u32 = SYN_COOKIE_MAC_BITS; const SYN_COOKIE_MAC_DOMAIN: u64 = u64::from_be_bytes(*b"xpf-sync"); const SYN_COOKIE_SECRET_LEFT_DOMAIN: u64 = u64::from_be_bytes(*b"xpf-sck0"); const SYN_COOKIE_SECRET_RIGHT_DOMAIN: u64 = u64::from_be_bytes(*b"xpf-sck1"); +const SYN_COOKIE_CACHE_LEFT_DOMAIN: u64 = u64::from_be_bytes(*b"xpf-scv0"); +const SYN_COOKIE_CACHE_RIGHT_DOMAIN: u64 = u64::from_be_bytes(*b"xpf-scv1"); +const SYN_COOKIE_VALIDATED_CACHE_CAPACITY: usize = 4096; +const SYN_COOKIE_VALIDATED_CACHE_WAYS: usize = 4; +const SYN_COOKIE_VALIDATED_CACHE_TTL_SECS: u64 = SynCookieCodec::EPOCH_SECS; const _: [(); SYN_COOKIE_ISN_BITS as usize] = [(); SYN_COOKIE_LAYOUT_BITS as usize]; /// Three-bit MSS table encoded in userspace SYN cookies. @@ -52,7 +58,7 @@ const _: [(); SYN_COOKIE_ISN_BITS as usize] = [(); SYN_COOKIE_LAYOUT_BITS as usi /// selection can choose the largest value not exceeding the peer-advertised MSS. 
pub(crate) const SYN_COOKIE_MSS_VALUES: [u16; 8] = [536, 1200, 1300, 1360, 1400, 1440, 1460, 8960]; -#[derive(Debug, Clone, Copy, PartialEq, Eq)] +#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)] #[cfg_attr(not(test), allow(dead_code))] pub(crate) struct SynCookieTuple { pub src_ip: IpAddr, @@ -81,6 +87,21 @@ pub(crate) struct SynCookieValidation { pub mss: u16, } +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +#[cfg_attr(not(test), allow(dead_code))] +pub(crate) struct SynCookieChallenge { + pub cookie_isn: u32, + pub peer_mss: u16, +} + +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +#[cfg_attr(not(test), allow(dead_code))] +pub(crate) enum SynCookieAckVerdict { + NotApplicable, + Validated, + Invalid, +} + #[derive(Debug, Clone, Copy)] #[cfg_attr(not(test), allow(dead_code))] pub(crate) struct SynCookieCodec { @@ -193,6 +214,19 @@ impl SynCookieCodec { [left.finish(), right.finish()] } + + fn cache_hash_keys(&self) -> [u64; 2] { + let k0 = u64::from_le_bytes(self.master_key[0..8].try_into().expect("fixed slice")); + let k1 = u64::from_le_bytes(self.master_key[8..16].try_into().expect("fixed slice")); + + let mut left = SipHash24::new(k0, k1); + left.write_u64(SYN_COOKIE_CACHE_LEFT_DOMAIN); + + let mut right = SipHash24::new(k0, k1); + right.write_u64(SYN_COOKIE_CACHE_RIGHT_DOMAIN); + + [left.finish(), right.finish()] + } } #[derive(Debug, Clone, Copy)] @@ -323,7 +357,10 @@ pub(crate) struct ScreenPacketInfo { pub dst_ip: IpAddr, pub src_port: u16, // host byte order pub dst_port: u16, // host byte order - pub pkt_len: u16, // total packet length from meta + pub tcp_seq: u32, + pub tcp_ack: u32, + pub tcp_mss: u16, + pub pkt_len: u16, // total packet length from meta pub is_fragment: bool, /// #1137: 1 = first fragment of a fragmented datagram (IPv4: MF=1 /// && offset==0; IPv6: MF=1 && offset==0). 
Mirrors the BPF @@ -354,10 +391,14 @@ pub(crate) struct ScreenProfile { pub icmp_flood_threshold: u32, // packets per second, 0 = disabled pub udp_flood_threshold: u32, // packets per second, 0 = disabled pub syn_flood_threshold: u32, // SYN packets per second per zone, 0 = disabled - pub session_limit_src: u32, // max sessions per source IP, 0 = disabled - pub session_limit_dst: u32, // max sessions per destination IP, 0 = disabled - pub port_scan_threshold: u32, // unique dst ports per src IP within window, 0 = disabled - pub ip_sweep_threshold: u32, // unique dst IPs per src IP within window, 0 = disabled + /// Enable SYN-cookie challenge/validation behavior for SYN flood threshold + /// crossings. Defaults false so rate-based SYN flood behavior remains a + /// plain drop until the control plane explicitly enables cookie mode. + pub syn_cookie: bool, + pub session_limit_src: u32, // max sessions per source IP, 0 = disabled + pub session_limit_dst: u32, // max sessions per destination IP, 0 = disabled + pub port_scan_threshold: u32, // unique dst ports per src IP within window, 0 = disabled + pub ip_sweep_threshold: u32, // unique dst IPs per src IP within window, 0 = disabled } /// Result of a screen check. @@ -365,6 +406,7 @@ pub(crate) struct ScreenProfile { pub(crate) enum ScreenVerdict { Pass, Drop(&'static str), + SynCookieChallenge(SynCookieChallenge), } /// Simple rate counter: counts events within a 1-second window. 
@@ -531,6 +573,185 @@ impl IpSweepTracker { } } +#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)] +struct SynCookieValidatedKey { + zone_id: u16, + tuple: SynCookieTuple, +} + +#[derive(Clone, Copy, Debug)] +struct SynCookieValidatedEntry { + key: SynCookieValidatedKey, + expires_secs: u64, + age: u64, +} + +#[derive(Clone, Copy, Debug)] +struct SynCookieValidatedSet { + entries: [Option<SynCookieValidatedEntry>; SYN_COOKIE_VALIDATED_CACHE_WAYS], +} + +impl Default for SynCookieValidatedSet { + fn default() -> Self { + Self { + entries: [None; SYN_COOKIE_VALIDATED_CACHE_WAYS], + } + } +} + +#[derive(Debug, Clone)] +struct SynCookieValidatedCache { + sets: Box<[SynCookieValidatedSet]>, + ttl_secs: u64, + hash_keys: [u64; 2], + len: usize, + clock: u64, +} + +impl Default for SynCookieValidatedCache { + fn default() -> Self { + Self::new( + SYN_COOKIE_VALIDATED_CACHE_CAPACITY, + SYN_COOKIE_VALIDATED_CACHE_TTL_SECS, + ) + } +} + +impl SynCookieValidatedCache { + fn new(capacity: usize, ttl_secs: u64) -> Self { + let set_count = capacity.div_ceil(SYN_COOKIE_VALIDATED_CACHE_WAYS); + Self { + sets: vec![SynCookieValidatedSet::default(); set_count].into_boxed_slice(), + ttl_secs, + hash_keys: [0x0706_0504_0302_0100, 0x0f0e_0d0c_0b0a_0908], + len: 0, + clock: 0, + } + } + + fn insert(&mut self, zone_id: u16, tuple: SynCookieTuple, now_secs: u64) { + if self.sets.is_empty() { + return; + } + let key = SynCookieValidatedKey { zone_id, tuple }; + let set_index = self.set_index(&key); + self.clock = self.clock.wrapping_add(1); + let set = &mut self.sets[set_index]; + let new_entry = SynCookieValidatedEntry { + key, + expires_secs: now_secs.saturating_add(self.ttl_secs), + age: self.clock, + }; + + let mut empty_or_expired = None; + let mut oldest_index = 0; + let mut oldest_age = u64::MAX; + + for index in 0..SYN_COOKIE_VALIDATED_CACHE_WAYS { + match set.entries[index] { + Some(entry) if entry.key == key => { + set.entries[index] = Some(new_entry); + return; + } + Some(entry) if entry.expires_secs
<= now_secs => { + empty_or_expired.get_or_insert(index); + } + Some(entry) => { + if entry.age < oldest_age { + oldest_age = entry.age; + oldest_index = index; + } + } + None => { + empty_or_expired.get_or_insert(index); + } + } + } + + let replace_index = empty_or_expired.unwrap_or(oldest_index); + if set.entries[replace_index].is_none() { + self.len += 1; + } + set.entries[replace_index] = Some(new_entry); + } + + fn take_valid(&mut self, zone_id: u16, tuple: SynCookieTuple, now_secs: u64) -> bool { + if self.sets.is_empty() { + return false; + } + let key = SynCookieValidatedKey { zone_id, tuple }; + let set_index = self.set_index(&key); + let set = &mut self.sets[set_index]; + let mut valid = false; + + for index in 0..SYN_COOKIE_VALIDATED_CACHE_WAYS { + let Some(entry) = set.entries[index] else { + continue; + }; + if entry.key == key { + valid = entry.expires_secs > now_secs; + set.entries[index] = None; + self.len = self.len.saturating_sub(1); + break; + } + if entry.expires_secs <= now_secs { + set.entries[index] = None; + self.len = self.len.saturating_sub(1); + } + } + + valid + } + + fn set_hash_keys(&mut self, hash_keys: [u64; 2]) { + if self.hash_keys != hash_keys { + self.hash_keys = hash_keys; + self.clear(); + } + } + + fn clear(&mut self) { + for set in self.sets.iter_mut() { + *set = SynCookieValidatedSet::default(); + } + self.len = 0; + self.clock = 0; + } + + fn set_index(&self, key: &SynCookieValidatedKey) -> usize { + debug_assert!(!self.sets.is_empty()); + (self.key_hash(key) as usize) % self.sets.len() + } + + fn key_hash(&self, key: &SynCookieValidatedKey) -> u64 { + let mut sip = SipHash24::new(self.hash_keys[0], self.hash_keys[1]); + sip.write_u16(key.zone_id); + sip.write_ip(key.tuple.src_ip); + sip.write_ip(key.tuple.dst_ip); + sip.write_u16(key.tuple.src_port); + sip.write_u16(key.tuple.dst_port); + sip.finish() + } + + #[cfg(test)] + fn len(&self) -> usize { + self.len + } + + #[cfg(test)] + fn capacity(&self) -> usize { + 
self.sets.len() * SYN_COOKIE_VALIDATED_CACHE_WAYS + } + + #[cfg(test)] + fn debug_set_index(&self, zone_id: u16, tuple: SynCookieTuple) -> Option<usize> { + if self.sets.is_empty() { + return None; + } + Some(self.set_index(&SynCookieValidatedKey { zone_id, tuple })) + } +} + /// Per-zone screen state with mutable rate counters and advanced trackers. pub(crate) struct ScreenState { profiles: FxHashMap<String, ScreenProfile>, // zone_name -> profile @@ -538,6 +759,9 @@ pub(crate) struct ScreenState { icmp_counters: FxHashMap, udp_counters: FxHashMap, syn_counters: FxHashMap, + syn_cookie_active_until_secs: FxHashMap<String, u64>, + syn_cookie_codec: Option<SynCookieCodec>, + syn_cookie_validated: SynCookieValidatedCache, // Advanced screen trackers (shared across all zones since they track per-IP) session_limits: SessionLimitTracker, port_scan: PortScanTracker, @@ -552,6 +776,9 @@ impl ScreenState { icmp_counters: FxHashMap::default(), udp_counters: FxHashMap::default(), syn_counters: FxHashMap::default(), + syn_cookie_active_until_secs: FxHashMap::default(), + syn_cookie_codec: None, + syn_cookie_validated: SynCookieValidatedCache::default(), session_limits: SessionLimitTracker::default(), port_scan: PortScanTracker::default(), ip_sweep: IpSweepTracker::default(), @@ -565,9 +792,36 @@ impl ScreenState { self.icmp_counters.retain(|k, _| profiles.contains_key(k)); self.udp_counters.retain(|k, _| profiles.contains_key(k)); self.syn_counters.retain(|k, _| profiles.contains_key(k)); + self.syn_cookie_active_until_secs + .retain(|k, _| profiles.contains_key(k)); + for zone in profiles.keys() { + self.icmp_counters.entry(zone.clone()).or_default(); + self.udp_counters.entry(zone.clone()).or_default(); + self.syn_counters.entry(zone.clone()).or_default(); + self.syn_cookie_active_until_secs + .entry(zone.clone()) + .or_insert(0); + } self.profiles = profiles; } + /// Publish the cluster-wide SYN-cookie master key into this worker's screen + /// state.
Until HA-safe publication is wired, production snapshots leave this + /// unset and SYN-cookie mode fails closed instead of minting local-only + /// cookies. + #[cfg_attr(not(test), allow(dead_code))] + pub fn update_syn_cookie_master_key(&mut self, master_key: Option<[u8; 16]>) { + if let Some(master_key) = master_key { + let codec = SynCookieCodec::new(master_key); + self.syn_cookie_validated + .set_hash_keys(codec.cache_hash_keys()); + self.syn_cookie_codec = Some(codec); + } else { + self.syn_cookie_codec = None; + self.syn_cookie_validated.clear(); + } + } + /// Returns true if any zone has a screen profile configured. pub fn has_profiles(&self) -> bool { !self.profiles.is_empty() @@ -576,11 +830,25 @@ impl ScreenState { /// Run all screen checks for a packet arriving on the given zone. /// Returns `ScreenVerdict::Pass` if the packet is clean, or /// `ScreenVerdict::Drop(reason)` if it should be dropped. + #[cfg_attr(not(test), allow(dead_code))] pub fn check_packet( &mut self, zone: &str, pkt: &ScreenPacketInfo, now_secs: u64, + ) -> ScreenVerdict { + self.check_packet_with_zone_id(zone, 0, pkt, now_secs) + } + + /// Run all screen checks with the stable numeric zone id available to + /// SYN-cookie MACs. `check_packet` remains for callers/tests that do not + /// need cookie mode. 
+ pub fn check_packet_with_zone_id( + &mut self, + zone: &str, + zone_id: u16, + pkt: &ScreenPacketInfo, + now_secs: u64, ) -> ScreenVerdict { let profile = match self.profiles.get(zone) { Some(p) => p.clone(), // clone to avoid borrow issues with &mut self @@ -679,17 +947,19 @@ impl ScreenState { if profile.icmp_flood_threshold > 0 && (pkt.protocol == PROTO_ICMP || pkt.protocol == PROTO_ICMPV6) { - let counter = self.icmp_counters.entry(zone.to_string()).or_default(); - if counter.increment(now_secs, profile.icmp_flood_threshold) { - return ScreenVerdict::Drop("icmp-flood"); + if let Some(counter) = self.icmp_counters.get_mut(zone) { + if counter.increment(now_secs, profile.icmp_flood_threshold) { + return ScreenVerdict::Drop("icmp-flood"); + } } } // UDP flood if profile.udp_flood_threshold > 0 && pkt.protocol == PROTO_UDP { - let counter = self.udp_counters.entry(zone.to_string()).or_default(); - if counter.increment(now_secs, profile.udp_flood_threshold) { - return ScreenVerdict::Drop("udp-flood"); + if let Some(counter) = self.udp_counters.get_mut(zone) { + if counter.increment(now_secs, profile.udp_flood_threshold) { + return ScreenVerdict::Drop("udp-flood"); + } } } @@ -697,9 +967,45 @@ impl ScreenState { if profile.syn_flood_threshold > 0 && pkt.protocol == PROTO_TCP { let tf = pkt.tcp_flags; if (tf & TCP_SYN) != 0 && (tf & TCP_ACK) == 0 { - let counter = self.syn_counters.entry(zone.to_string()).or_default(); - if counter.increment(now_secs, profile.syn_flood_threshold) { - return ScreenVerdict::Drop("syn-flood"); + let syn_cookie_validated = profile.syn_cookie + && self.syn_cookie_validated.take_valid( + zone_id, + SynCookieTuple::from_packet(pkt), + now_secs, + ); + if !syn_cookie_validated { + if let Some(counter) = self.syn_counters.get_mut(zone) + && counter.increment(now_secs, profile.syn_flood_threshold) + { + if profile.syn_cookie { + if let Some(active_until) = + self.syn_cookie_active_until_secs.get_mut(zone) + { + *active_until = 
now_secs.saturating_add(SynCookieCodec::EPOCH_SECS); + } else { + debug_assert!( + false, + "screen profile update prepopulates SYN-cookie active state" + ); + } + let Some(codec) = self.syn_cookie_codec else { + return ScreenVerdict::Drop("syn-cookie-unavailable"); + }; + let full_epoch = + SynCookieCodec::full_epoch_from_monotonic_secs(now_secs); + let cookie_isn = codec.mint_isn( + SynCookieTuple::from_packet(pkt), + zone_id, + full_epoch, + pkt.tcp_mss, + ); + return ScreenVerdict::SynCookieChallenge(SynCookieChallenge { + cookie_isn, + peer_mss: pkt.tcp_mss, + }); + } + return ScreenVerdict::Drop("syn-flood"); + } } } } @@ -762,6 +1068,65 @@ impl ScreenState { ScreenVerdict::Pass } + /// Validate a returning SYN-cookie ACK only after the caller has already + /// established that no normal session matched. This preserves established + /// ACK traffic and prevents random ACKs from installing sessions while a + /// cookie flood is active. + pub fn validate_syn_cookie_ack_on_session_miss( + &mut self, + zone: &str, + zone_id: u16, + pkt: &ScreenPacketInfo, + now_secs: u64, + ) -> SynCookieAckVerdict { + let Some(profile) = self.profiles.get(zone) else { + return SynCookieAckVerdict::NotApplicable; + }; + if !profile.syn_cookie || profile.syn_flood_threshold == 0 || pkt.protocol != PROTO_TCP { + return SynCookieAckVerdict::NotApplicable; + } + let flags = pkt.tcp_flags; + if (flags & TCP_ACK) == 0 || (flags & TCP_SYN) != 0 { + return SynCookieAckVerdict::NotApplicable; + } + if self + .syn_cookie_active_until_secs + .get(zone) + .copied() + .is_none_or(|until| until <= now_secs) + { + return SynCookieAckVerdict::NotApplicable; + } + if (flags & (TCP_FIN | TCP_RST)) != 0 { + return SynCookieAckVerdict::Invalid; + } + let Some(codec) = self.syn_cookie_codec else { + return SynCookieAckVerdict::Invalid; + }; + let cookie_isn = pkt.tcp_ack.wrapping_sub(1); + let current_epoch = SynCookieCodec::full_epoch_from_monotonic_secs(now_secs); + let tuple = 
SynCookieTuple::from_packet(pkt); + if codec + .validate_isn(tuple, zone_id, current_epoch, cookie_isn) + .is_some() + { + self.syn_cookie_validated.insert(zone_id, tuple, now_secs); + SynCookieAckVerdict::Validated + } else { + SynCookieAckVerdict::Invalid + } + } + + #[cfg(test)] + fn syn_cookie_validated_len(&self) -> usize { + self.syn_cookie_validated.len() + } + + #[cfg(test)] + fn syn_cookie_active_zone_count(&self) -> usize { + self.syn_cookie_active_until_secs.len() + } + /// Notify the screen state that a new session was created. This increments /// per-IP session counters for session limiting. #[cfg_attr(not(test), allow(dead_code))] @@ -810,6 +1175,9 @@ pub(crate) fn extract_screen_info( dst_ip, src_port, dst_port, + tcp_seq: 0, + tcp_ack: 0, + tcp_mss: 0, pkt_len, is_fragment: false, is_first_fragment: false, @@ -818,6 +1186,8 @@ pub(crate) fn extract_screen_info( ip_total_len: 0, }; + let mut tcp_offset: Option<usize> = None; + if addr_family == libc::AF_INET as u8 && l3_offset + 20 <= frame.len() { // IPv4: extract IHL, total_len, frag_off from the fixed 20-byte // base header. frag_off is bytes 6-7, big-endian. @@ -830,6 +1200,7 @@ pub(crate) fn extract_screen_info( info.is_fragment = (info.ip_frag_off & 0x3FFF) != 0; info.is_first_fragment = (info.ip_frag_off & 0x2000) != 0 && (info.ip_frag_off & 0x1FFF) == 0; + tcp_offset = Some(l3_offset + (info.ip_ihl as usize) * 4); } else if addr_family == libc::AF_INET6 as u8 && l3_offset + 40 <= frame.len() { // IPv6: walk the extension header chain looking for // NEXTHDR_FRAGMENT (44). Fixed IPv6 base header is 40 bytes.
@@ -883,6 +1254,13 @@ pub(crate) fn extract_screen_info( info.ip_frag_off = frag_off; info.is_fragment = (frag_off & 0x1) != 0 || (frag_off & 0xFFF8) != 0; info.is_first_fragment = (frag_off & 0x1) != 0 && (frag_off & 0xFFF8) == 0; + if frame[offset] == PROTO_TCP { + tcp_offset = Some(offset + 8); + } + break; + } + PROTO_TCP => { + tcp_offset = Some(offset); break; } _ => break, @@ -890,6 +1268,42 @@ pub(crate) fn extract_screen_info( } } + if protocol == PROTO_TCP + && (!info.is_fragment || info.is_first_fragment) + && let Some(tcp_start) = tcp_offset + && tcp_start + 20 <= frame.len() + { + let tcp = &frame[tcp_start..]; + info.tcp_seq = u32::from_be_bytes([tcp[4], tcp[5], tcp[6], tcp[7]]); + info.tcp_ack = u32::from_be_bytes([tcp[8], tcp[9], tcp[10], tcp[11]]); + let data_offset = ((tcp[12] >> 4) as usize) * 4; + if data_offset >= 20 && tcp.len() >= data_offset { + let mut pos = 20; + while pos < data_offset { + let kind = tcp[pos]; + if kind == 0 { + break; + } + if kind == 1 { + pos += 1; + continue; + } + if pos + 2 > data_offset { + break; + } + let opt_len = tcp[pos + 1] as usize; + if opt_len < 2 || pos + opt_len > data_offset { + break; + } + if kind == 2 && opt_len == 4 { + info.tcp_mss = u16::from_be_bytes([tcp[pos + 2], tcp[pos + 3]]); + break; + } + pos += opt_len; + } + } + } + info } diff --git a/userspace-dp/src/screen_tests.rs b/userspace-dp/src/screen_tests.rs index 74425e2fd..6cc1e61b0 100644 --- a/userspace-dp/src/screen_tests.rs +++ b/userspace-dp/src/screen_tests.rs @@ -21,6 +21,7 @@ fn default_profile() -> ScreenProfile { icmp_flood_threshold: 0, udp_flood_threshold: 0, syn_flood_threshold: 0, + syn_cookie: false, session_limit_src: 0, session_limit_dst: 0, port_scan_threshold: 0, @@ -40,6 +41,9 @@ fn tcp_pkt(src: IpAddr, dst: IpAddr, src_port: u16, dst_port: u16, flags: u8) -> dst_ip: dst, src_port, dst_port, + tcp_seq: 1, + tcp_ack: 0, + tcp_mss: 1460, pkt_len: 60, is_fragment: false, is_first_fragment: false, @@ -65,6 +69,9 @@ fn 
icmp_pkt(src: IpAddr, dst: IpAddr, pkt_len: u16) -> ScreenPacketInfo { dst_ip: dst, src_port: 0, dst_port: 0, + tcp_seq: 0, + tcp_ack: 0, + tcp_mss: 0, pkt_len, is_fragment: false, is_first_fragment: false, @@ -86,6 +93,9 @@ fn udp_pkt(src: IpAddr, dst: IpAddr) -> ScreenPacketInfo { dst_ip: dst, src_port: 5000, dst_port: 5001, + tcp_seq: 0, + tcp_ack: 0, + tcp_mss: 0, pkt_len: 100, is_fragment: false, is_first_fragment: false, @@ -349,6 +359,9 @@ fn teardrop_drops() { dst_ip: IpAddr::V4(Ipv4Addr::new(10, 0, 2, 1)), src_port: 1234, dst_port: 80, + tcp_seq: 1, + tcp_ack: 0, + tcp_mss: 1460, pkt_len: 28, is_fragment: true, is_first_fragment: false, @@ -373,6 +386,9 @@ fn teardrop_first_fragment_passes() { dst_ip: IpAddr::V4(Ipv4Addr::new(10, 0, 2, 1)), src_port: 1234, dst_port: 80, + tcp_seq: 1, + tcp_ack: 0, + tcp_mss: 1460, pkt_len: 24, is_fragment: true, // #1137 Copilot review: ip_frag_off=0x2000 means MF=1 && @@ -914,10 +930,14 @@ fn syn_flood_disabled_passes() { // ================================================================ fn syn_cookie_codec() -> SynCookieCodec { - SynCookieCodec::new([ + SynCookieCodec::new(syn_cookie_key()) +} + +fn syn_cookie_key() -> [u8; 16] { + [ 0x10, 0x21, 0x32, 0x43, 0x54, 0x65, 0x76, 0x87, 0x98, 0xa9, 0xba, 0xcb, 0xdc, 0xed, 0xfe, 0x0f, - ]) + ] } fn syn_cookie_tuple() -> SynCookieTuple { @@ -1071,9 +1091,11 @@ fn syn_cookie_epoch_low_bits_wrap_rejects_32_epoch_old_cookie() { let cookie = codec.mint_isn(tuple, 7, old_epoch, 1460); assert_eq!(old_epoch & 0x1f, current_epoch & 0x1f); - assert!(codec - .validate_isn(tuple, 7, current_epoch, cookie) - .is_none()); + assert!( + codec + .validate_isn(tuple, 7, current_epoch, cookie) + .is_none() + ); } #[test] @@ -1101,6 +1123,373 @@ fn syn_cookie_validation_tries_current_and_previous_full_epoch() { assert!(codec.validate_isn(tuple, 7, 42, older_cookie).is_none()); } +#[test] +fn syn_cookie_chosen_when_threshold_exceeded() { + let mut profile = ScreenProfile::default(); + 
profile.syn_flood_threshold = 2; + profile.syn_cookie = true; + let mut state = make_state("trust", profile); + state.update_syn_cookie_master_key(Some(syn_cookie_key())); + let pkt = tcp_pkt( + IpAddr::V4(Ipv4Addr::new(192, 0, 2, 10)), + IpAddr::V4(Ipv4Addr::new(198, 51, 100, 20)), + 49152, + 443, + TCP_SYN, + ); + + assert_eq!( + state.check_packet_with_zone_id("trust", 7, &pkt, 128), + ScreenVerdict::Pass + ); + assert_eq!( + state.check_packet_with_zone_id("trust", 7, &pkt, 128), + ScreenVerdict::Pass + ); + let expected_isn = syn_cookie_codec().mint_isn( + SynCookieTuple::from_packet(&pkt), + 7, + SynCookieCodec::full_epoch_from_monotonic_secs(128), + pkt.tcp_mss, + ); + assert_eq!( + state.check_packet_with_zone_id("trust", 7, &pkt, 128), + ScreenVerdict::SynCookieChallenge(SynCookieChallenge { + cookie_isn: expected_isn, + peer_mss: 1460, + }) + ); +} + +#[test] +fn syn_cookie_without_published_secret_fails_closed() { + let mut profile = ScreenProfile::default(); + profile.syn_flood_threshold = 1; + profile.syn_cookie = true; + let mut state = make_state("trust", profile); + let pkt = tcp_pkt( + IpAddr::V4(Ipv4Addr::new(192, 0, 2, 10)), + IpAddr::V4(Ipv4Addr::new(198, 51, 100, 20)), + 49152, + 443, + TCP_SYN, + ); + + assert_eq!( + state.check_packet_with_zone_id("trust", 7, &pkt, 128), + ScreenVerdict::Pass + ); + assert_eq!( + state.check_packet_with_zone_id("trust", 7, &pkt, 128), + ScreenVerdict::Drop("syn-cookie-unavailable") + ); +} + +#[test] +fn syn_cookie_ack_validation_marks_next_syn_bypass_without_session_creation() { + let mut profile = ScreenProfile::default(); + profile.syn_flood_threshold = 1; + profile.syn_cookie = true; + let mut state = make_state("trust", profile); + state.update_syn_cookie_master_key(Some(syn_cookie_key())); + let syn = tcp_pkt( + IpAddr::V4(Ipv4Addr::new(192, 0, 2, 10)), + IpAddr::V4(Ipv4Addr::new(198, 51, 100, 20)), + 49152, + 443, + TCP_SYN, + ); + + assert_eq!( + state.check_packet_with_zone_id("trust", 7, &syn, 128), 
+ ScreenVerdict::Pass + ); + let challenge = match state.check_packet_with_zone_id("trust", 7, &syn, 128) { + ScreenVerdict::SynCookieChallenge(challenge) => challenge, + other => panic!("expected SYN-cookie challenge, got {other:?}"), + }; + + let mut ack = syn.clone(); + ack.tcp_flags = TCP_ACK; + ack.tcp_seq = 2; + ack.tcp_ack = challenge.cookie_isn.wrapping_add(1); + assert_eq!( + state.validate_syn_cookie_ack_on_session_miss("trust", 7, &ack, 128), + SynCookieAckVerdict::Validated + ); + assert_eq!(state.syn_cookie_validated_len(), 1); + + assert_eq!( + state.check_packet_with_zone_id("trust", 7, &syn, 128), + ScreenVerdict::Pass + ); + assert_eq!( + state.syn_cookie_validated_len(), + 0, + "validated tuple is single-use" + ); +} + +#[test] +fn syn_cookie_validated_syn_still_runs_later_screen_checks() { + let mut profile = ScreenProfile::default(); + profile.syn_flood_threshold = 1; + profile.syn_cookie = true; + profile.session_limit_src = 1; + let mut state = make_state("trust", profile); + state.update_syn_cookie_master_key(Some(syn_cookie_key())); + let syn = tcp_pkt( + IpAddr::V4(Ipv4Addr::new(192, 0, 2, 10)), + IpAddr::V4(Ipv4Addr::new(198, 51, 100, 20)), + 49152, + 443, + TCP_SYN, + ); + + assert_eq!( + state.check_packet_with_zone_id("trust", 7, &syn, 128), + ScreenVerdict::Pass + ); + let challenge = match state.check_packet_with_zone_id("trust", 7, &syn, 128) { + ScreenVerdict::SynCookieChallenge(challenge) => challenge, + other => panic!("expected SYN-cookie challenge, got {other:?}"), + }; + + let mut ack = syn.clone(); + ack.tcp_flags = TCP_ACK; + ack.tcp_ack = challenge.cookie_isn.wrapping_add(1); + assert_eq!( + state.validate_syn_cookie_ack_on_session_miss("trust", 7, &ack, 128), + SynCookieAckVerdict::Validated + ); + + state.session_created(syn.src_ip, syn.dst_ip); + assert_eq!( + state.check_packet_with_zone_id("trust", 7, &syn, 128), + ScreenVerdict::Drop("session-limit-src") + ); + assert_eq!(state.syn_cookie_validated_len(), 0); +} + 
+#[test] +fn syn_cookie_invalid_ack_does_not_validate_client() { + let mut profile = ScreenProfile::default(); + profile.syn_flood_threshold = 1; + profile.syn_cookie = true; + let mut state = make_state("trust", profile); + state.update_syn_cookie_master_key(Some(syn_cookie_key())); + let syn = tcp_pkt( + IpAddr::V4(Ipv4Addr::new(192, 0, 2, 10)), + IpAddr::V4(Ipv4Addr::new(198, 51, 100, 20)), + 49152, + 443, + TCP_SYN, + ); + + assert_eq!( + state.check_packet_with_zone_id("trust", 7, &syn, 128), + ScreenVerdict::Pass + ); + assert!(matches!( + state.check_packet_with_zone_id("trust", 7, &syn, 128), + ScreenVerdict::SynCookieChallenge(_) + )); + + let mut ack = syn.clone(); + ack.tcp_flags = TCP_ACK; + ack.tcp_seq = 2; + ack.tcp_ack = 0xdead_beefu32; + assert_eq!( + state.validate_syn_cookie_ack_on_session_miss("trust", 7, &ack, 128), + SynCookieAckVerdict::Invalid + ); + assert_eq!(state.syn_cookie_validated_len(), 0); +} + +#[test] +fn syn_cookie_ack_fin_is_invalid_while_cookie_mode_is_active() { + let mut profile = ScreenProfile::default(); + profile.syn_flood_threshold = 1; + profile.syn_cookie = true; + let mut state = make_state("trust", profile); + state.update_syn_cookie_master_key(Some(syn_cookie_key())); + let syn = tcp_pkt( + IpAddr::V4(Ipv4Addr::new(192, 0, 2, 10)), + IpAddr::V4(Ipv4Addr::new(198, 51, 100, 20)), + 49152, + 443, + TCP_SYN, + ); + + assert_eq!( + state.check_packet_with_zone_id("trust", 7, &syn, 128), + ScreenVerdict::Pass + ); + let challenge = match state.check_packet_with_zone_id("trust", 7, &syn, 128) { + ScreenVerdict::SynCookieChallenge(challenge) => challenge, + other => panic!("expected SYN-cookie challenge, got {other:?}"), + }; + + let mut ack_fin = syn.clone(); + ack_fin.tcp_flags = TCP_ACK | TCP_FIN; + ack_fin.tcp_ack = challenge.cookie_isn.wrapping_add(1); + assert_eq!( + state.validate_syn_cookie_ack_on_session_miss("trust", 7, &ack_fin, 128), + SynCookieAckVerdict::Invalid + ); + 
assert_eq!(state.syn_cookie_validated_len(), 0); +} + +#[test] +fn syn_cookie_validated_cache_is_bounded() { + let mut cache = SynCookieValidatedCache::new(4, 64); + assert_eq!(cache.capacity(), 4); + + let mut tuple = syn_cookie_tuple(); + for port in 40000..40032 { + tuple.src_port = port; + cache.insert(7, tuple, 100); + } + + assert_eq!(cache.len(), 4); + let mut evicted = syn_cookie_tuple(); + evicted.src_port = 40000; + assert!(!cache.take_valid(7, evicted, 100)); + evicted.src_port = 40027; + assert!(!cache.take_valid(7, evicted, 100)); + + let mut retained = syn_cookie_tuple(); + retained.src_port = 40028; + assert!(cache.take_valid(7, retained, 100)); + retained.src_port = 40031; + assert!(cache.take_valid(7, retained, 100)); +} + +#[test] +fn syn_cookie_validated_cache_index_is_keyed() { + let mut left = SynCookieValidatedCache::new(64, 64); + left.set_hash_keys([0x1111_2222_3333_4444, 0x5555_6666_7777_8888]); + let mut right = SynCookieValidatedCache::new(64, 64); + right.set_hash_keys([0x9999_aaaa_bbbb_cccc, 0xdddd_eeee_ffff_0000]); + + let mut tuple = syn_cookie_tuple(); + let differs = (0..1024).any(|offset| { + tuple.src_port = 30000 + offset; + left.debug_set_index(7, tuple) != right.debug_set_index(7, tuple) + }); + + assert!( + differs, + "cache slot selection must be keyed rather than attacker-predictable" + ); +} + +#[test] +fn syn_cookie_invalid_ack_flood_does_not_grow_validated_cache() { + let mut profile = ScreenProfile::default(); + profile.syn_flood_threshold = 1; + profile.syn_cookie = true; + let mut state = make_state("trust", profile); + state.update_syn_cookie_master_key(Some(syn_cookie_key())); + let syn = tcp_pkt( + IpAddr::V4(Ipv4Addr::new(192, 0, 2, 10)), + IpAddr::V4(Ipv4Addr::new(198, 51, 100, 20)), + 49152, + 443, + TCP_SYN, + ); + + assert_eq!( + state.check_packet_with_zone_id("trust", 7, &syn, 128), + ScreenVerdict::Pass + ); + assert!(matches!( + state.check_packet_with_zone_id("trust", 7, &syn, 128), + 
ScreenVerdict::SynCookieChallenge(_) + )); + + for offset in 0..1024 { + let mut ack = syn.clone(); + ack.tcp_flags = TCP_ACK; + ack.src_port = 30000 + offset; + ack.tcp_ack = 0xdead_0000u32.wrapping_add(offset as u32); + assert_eq!( + state.validate_syn_cookie_ack_on_session_miss("trust", 7, &ack, 128), + SynCookieAckVerdict::Invalid + ); + } + + assert_eq!(state.syn_cookie_validated_len(), 0); +} + +#[test] +fn syn_cookie_master_key_rotation_clears_validated_cache() { + let mut profile = ScreenProfile::default(); + profile.syn_flood_threshold = 1; + profile.syn_cookie = true; + let mut state = make_state("trust", profile); + state.update_syn_cookie_master_key(Some(syn_cookie_key())); + let syn = tcp_pkt( + IpAddr::V4(Ipv4Addr::new(192, 0, 2, 10)), + IpAddr::V4(Ipv4Addr::new(198, 51, 100, 20)), + 49152, + 443, + TCP_SYN, + ); + + assert_eq!( + state.check_packet_with_zone_id("trust", 7, &syn, 128), + ScreenVerdict::Pass + ); + let challenge = match state.check_packet_with_zone_id("trust", 7, &syn, 128) { + ScreenVerdict::SynCookieChallenge(challenge) => challenge, + other => panic!("expected SYN-cookie challenge, got {other:?}"), + }; + let mut ack = syn.clone(); + ack.tcp_flags = TCP_ACK; + ack.tcp_ack = challenge.cookie_isn.wrapping_add(1); + assert_eq!( + state.validate_syn_cookie_ack_on_session_miss("trust", 7, &ack, 128), + SynCookieAckVerdict::Validated + ); + assert_eq!(state.syn_cookie_validated_len(), 1); + + state.update_syn_cookie_master_key(None); + + assert_eq!(state.syn_cookie_validated_len(), 0); +} + +#[test] +fn update_profiles_prepopulates_syn_cookie_active_state() { + let mut profile = ScreenProfile::default(); + profile.syn_flood_threshold = 1; + profile.syn_cookie = true; + let mut state = make_state("trust", profile.clone()); + assert_eq!(state.syn_cookie_active_zone_count(), 1); + + let mut profiles = FxHashMap::default(); + profiles.insert("trust".to_string(), profile.clone()); + profiles.insert("untrust".to_string(), profile); + 
state.update_profiles(profiles); + assert_eq!(state.syn_cookie_active_zone_count(), 2); + + state.update_profiles(FxHashMap::default()); + assert_eq!(state.syn_cookie_active_zone_count(), 0); +} + +#[test] +fn syn_cookie_validated_cache_refresh_extends_ttl() { + let mut cache = SynCookieValidatedCache::new(4, 10); + let tuple_refreshed = syn_cookie_tuple(); + let mut tuple_old = syn_cookie_tuple(); + tuple_old.src_port += 1; + cache.insert(7, tuple_refreshed, 100); + cache.insert(7, tuple_old, 100); + cache.insert(7, tuple_refreshed, 109); + assert!(!cache.take_valid(7, tuple_old, 110)); + assert!(cache.take_valid(7, tuple_refreshed, 110)); +} + // ================================================================ // Profile update // ================================================================