perf(nn): amortise CompileIP/CompileDNS via per-container matcher cache#42

Open
entlein wants to merge 3 commits into feat/network-wildcards from feat/network-wildcards-matcher-cache
Conversation

@entlein entlein commented May 13, 2026

Summary

Follow-up to PR #41 (network-wildcards). Caches the compiled
`networkmatch.IPMatcher` / `DNSMatcher` per `(containerID, neighborIndex)`,
invalidated by the profile's `SyncChecksumMetadataKey` annotation. Previously
every CEL function-cache miss re-compiled every `NetworkNeighbor`'s entries;
now each neighbor's matcher is built at most once per profile-checksum
lifetime and reused across subsequent misses.

This builds on top of the existing CEL `functionCache` (which already
absorbs same-`(containerID, observed)` cache hits). The matcher cache
catches what slips through: unique-observation misses within a
profile-checksum window.

Benchmark evidence

Worst-case realistic profile (5 neighbors x 3 IP entries, observation misses
every entry), arm64, `-benchtime=1s`:

| Bench | ns/op | B/op | allocs/op |
| --- | ---: | ---: | ---: |
| `EgressIP_Realistic_Cold` (no cache, baseline) | 1733 | 1920 | 76 |
| `EgressIP_Realistic_Hot` (this PR) | 177 | 32 | 2 |
| | -90% time | -98% bytes | -97% allocs |
| `EgressDNS_Realistic_Cold` | 1219 | 1800 | 41 |
| `EgressDNS_Realistic_Hot` (this PR) | 318 | 272 | 7 |
| | -74% time | -85% bytes | -83% allocs |
| `EgressIP_ChurningProfile` (checksum flips every call) | 1527 | 1936 | 77 |

The churning case (pathological worst-case where every call invalidates
the cache) matches the cold-path baseline. Confirms the cache layer has
no measurable overhead of its own; the wins come strictly from
amortising compile across stable-checksum windows.
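
As a quick cross-check of the percentage rows, the reductions can be recomputed from the raw hot and cold figures. This is only an arithmetic sketch; the `reduction` helper is a hypothetical name and is not part of the PR.

```go
package main

import (
	"fmt"
	"math"
)

// reduction returns the percentage by which hot improves on cold,
// rounded to the nearest whole percent. (Hypothetical helper.)
func reduction(cold, hot float64) int {
	return int(math.Round((cold - hot) / cold * 100))
}

func main() {
	// Inputs are the ns/op, B/op and allocs/op figures reported above.
	fmt.Println("IP: ", reduction(1733, 177), reduction(1920, 32), reduction(76, 2))
	fmt.Println("DNS:", reduction(1219, 318), reduction(1800, 272), reduction(41, 7))
}
```

The IP row works out to 90, 98 and 97 percent, and the DNS row to 74, 85 and 83 percent, in line with the reported deltas.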

Design

```
type matcherCache struct{ m sync.Map } // containerID -> *containerMatchers

type containerMatchers struct {
	checksum string
	egress   []neighborMatchers // lazy-built per-index
	ingress  []neighborMatchers
}

type neighborMatchers struct {
	ip  *networkmatch.IPMatcher // nil until first match call
	dns *networkmatch.DNSMatcher
}
```

Zero-value safe: `matcherCache{}` is a valid empty cache, so existing test
fixtures that construct `nnLibrary{}` directly continue to work without
explicit init.

Per-neighbor matchers are nil-init and lazily compiled on first use, so a
profile with 10 egress entries that only ever fires through 2 of them
pays compile cost for only those 2.
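
A minimal sketch of the zero-value and lazy-compile properties described above, under several assumptions: `matcher`, `slot`, `cache`, `slotsFor` and `matcherFor` are hypothetical stand-ins rather than the PR's types, `sync.Once` is used here only for brevity, and checksum invalidation is left out entirely.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// matcher is a hypothetical stand-in for a compiled networkmatch matcher.
type matcher struct{ entries []string }

// slot lazily holds one neighbor's matcher; m stays nil until first use.
type slot struct {
	once sync.Once
	m    *matcher
}

// cache is usable as its zero value: neither sync.Map nor atomic.Int64
// needs explicit initialisation.
type cache struct {
	containers sync.Map // containerID -> []*slot
	compiles   atomic.Int64
}

// slotsFor returns the per-neighbor slots for a container, creating them
// the first time the container is seen.
func (c *cache) slotsFor(containerID string, neighbors int) []*slot {
	if v, ok := c.containers.Load(containerID); ok {
		return v.([]*slot)
	}
	fresh := make([]*slot, neighbors)
	for i := range fresh {
		fresh[i] = &slot{}
	}
	v, _ := c.containers.LoadOrStore(containerID, fresh)
	return v.([]*slot)
}

// matcherFor compiles the matcher for neighbor idx on first use only.
func (c *cache) matcherFor(containerID string, neighbors, idx int, entries []string) *matcher {
	s := c.slotsFor(containerID, neighbors)[idx]
	s.once.Do(func() {
		c.compiles.Add(1)
		s.m = &matcher{entries: entries}
	})
	return s.m
}

func main() {
	var c cache // zero value, no constructor
	for i := 0; i < 1000; i++ {
		c.matcherFor("c1", 10, i%2, []string{"10.0.0.0/8"}) // only neighbors 0 and 1 fire
	}
	fmt.Println("compiles:", c.compiles.Load()) // prints "compiles: 2"
}
```

With 10 slots allocated and only two ever touched, the compile counter stops at 2 no matter how many lookups follow.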

Test plan

  • Race detector: `go test -count=1 -race ./pkg/rulemanager/cel/libraries/networkneighborhood/` -> ok
  • All existing tests pass (cache lazy-init means no test fixture changes needed)
  • New `matcher_cache_bench_test.go` covers small/realistic profile shapes, hot/cold paths, and churning-profile pathological case
  • Cache invalidation works (churning bench confirms semantics; cache size stays bounded)

Base branch

This PR targets `feat/network-wildcards` (PR #41) because it depends on
the new `networkmatch.CompileIP` / `CompileDNS` API introduced there.
When #41 merges to main, GitHub should retarget this PR's base branch to main automatically.

Profile-checksum-invalidated cache of compiled networkmatch.IPMatcher /
DNSMatcher per (containerID, neighborIndex). The previous code path
re-compiled every NetworkNeighbor's entries on each CEL function-cache
miss; this PR builds each matcher at most once per profile-checksum
lifetime and reuses it across subsequent misses.

Design:

  matcherCache (sync.Map) inside nnLibrary, zero-value safe so existing
  test fixtures that construct nnLibrary{} directly continue to work
  without changes.

  Per-container entry tagged with the profile's SyncChecksumMetadataKey
  annotation. On lookup: if checksum matches, reuse; else allocate a
  fresh containerMatchers and store with LoadOrStore (concurrent-safe).

  Per-neighbor matchers are nil-init and lazily compiled on first use,
  so a profile with 10 egress entries that only ever fires through 2 of
  them pays compile cost for only those 2.

Benchmarks (arm64, -benchtime=1s):

  IP, realistic profile (5 neighbors x 3 entries, observation misses all):
    Cold (per-call recompile): 1733 ns/op   1920 B/op   76 allocs/op
    Hot  (cached matchers)   :  177 ns/op     32 B/op    2 allocs/op
                             ~ -90% time, -98% bytes, -97% allocs

  DNS, realistic profile:
    Cold: 1219 ns/op   1800 B/op   41 allocs/op
    Hot :  318 ns/op    272 B/op    7 allocs/op
                             ~ -74% time, -85% bytes, -83% allocs

  Churning profile (checksum flips every iteration — pathological):
    1527 ns/op   1936 B/op   77 allocs/op
    Matches cold path: cache overhead itself is negligible; the savings
    come strictly from amortising compile across stable-checksum windows.

In production this stacks on top of the existing CEL functionCache
(which already absorbs same-(containerID,observed) cache hits). The
matcher cache catches what slips through: unique-observation cache
misses within a profile-checksum lifetime.

Touched:
  - matcher_cache.go             new file: cache impl
  - matcher_cache_bench_test.go  new file: comparison bench
  - network.go                   use cached matchers in all 6 CEL fns
  - nn.go                        matcherCache field on nnLibrary
coderabbitai Bot commented May 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 9b8f8e33-142b-4d00-a103-76b5dd1da9b5

📥 Commits

Reviewing files that changed from the base of the PR and between 419ebba and 05ce6d9.

📒 Files selected for processing (1)
  • pkg/rulemanager/cel/libraries/networkneighborhood/matcher_cache_test.go

📝 Walkthrough

This PR adds a per-container matcher cache to the network-neighborhood CEL library with lazy, atomic per-neighbor compilation of IP/DNS matchers, integrates the cache into CEL IP/DNS/port functions, and includes benchmarks and tests exercising hot/cold and churn scenarios.

Changes

Network Neighborhood Matcher Caching

| Layer / File(s) | Summary |
| --- | --- |
| **Cache Data Model and Library Field**<br>`pkg/rulemanager/cel/libraries/networkneighborhood/matcher_cache.go`, `pkg/rulemanager/cel/libraries/networkneighborhood/nn.go` | `neighborMatchers` and `containerMatchers` types store per-neighbor compiled matchers and per-container egress/ingress slices validated by checksum. `matcherCache` field added to `nnLibrary` for zero-value-safe concurrent memoization. |
| **Lazy Building, Memoization, and Invalidation**<br>`pkg/rulemanager/cel/libraries/networkneighborhood/matcher_cache.go` | `getOrBuild` uses `sync.Map` for lock-free cached lookup and concurrent checksum-race convergence; `invalidate` removes cached entries. |
| **Per-neighbor Lazy IP/DNS Compilation**<br>`pkg/rulemanager/cel/libraries/networkneighborhood/matcher_cache.go` | `ipMatcher` and `dnsMatcher` lazily compile and memoize `networkmatch` instances by merging deprecated singular fields with plural lists and publishing winners via atomic CAS. |
| **IP and Domain Matching CEL Functions**<br>`pkg/rulemanager/cel/libraries/networkneighborhood/network.go` | `wasAddressInEgress`, `wasAddressInIngress`, `isDomainInEgress`, and `isDomainInIngress` resolve profile checksums and use cached compiled matchers for neighbor slice evaluation, replacing removed back-compat helpers. |
| **Port and Protocol Matching CEL Functions**<br>`pkg/rulemanager/cel/libraries/networkneighborhood/network.go` | `wasAddressPortProtocolInEgress` and `wasAddressPortProtocolInIngress` gate port scanning on cached IP matcher results, then scan matched neighbors' `Ports` for protocol equality and exact port match, preserving int64 range validation. |
| **Performance Benchmarks for Cache Scenarios**<br>`pkg/rulemanager/cel/libraries/networkneighborhood/matcher_cache_bench_test.go` | Benchmarks measure matcher cache performance across small/realistic hot scenarios (cache retained), cold scenarios (per-iteration invalidation), and profile churn (alternating checksums); includes helpers to build synthetic profiles and wire the benchmark library. |
| **Unit Tests for Cache Correctness**<br>`pkg/rulemanager/cel/libraries/networkneighborhood/matcher_cache_test.go` | Tests cover concurrent first-build races, stale-entry replacement when neighbor shapes change, and checksum-preserved reuse across repeated `getOrBuild` calls. |

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding a matcher cache to amortise (optimize) the compilation of IP/DNS matchers in the network neighborhood library. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, providing context, benchmark evidence, design details, and testing approach. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

entlein commented May 13, 2026

@coderabbitai review

coderabbitai Bot commented May 13, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/rulemanager/cel/libraries/networkneighborhood/matcher_cache.go`:
- Around line 61-68: The current use of c.m.LoadOrStore returns an existing
containerMatchers even when it is stale (not just checksum-mismatched), causing
stale shape entries to be reused; replace the LoadOrStore logic in
matcher_cache.go with explicit Load and validation: call c.m.Load(containerID),
if present cast to *containerMatchers (containerMatchers) and check both
checksum and the staleness condition used earlier (the neighbor-count/shape
validation used around Lines 49–50); if the existing cm is stale or checksum
differs, call c.m.Store(containerID, fresh) and return fresh; if no entry was
found, store and return fresh. Keep references to c.m.Load, c.m.Store,
containerMatchers and checksum in your changes.
- Around line 79-96: The ip and dns pointer fields are being lazily populated
without synchronization causing race-detector failures; add per-field sync.Once
members (e.g., ipOnce and dnsOnce) to the neighborMatchers struct and change
containerMatchers.ipMatcher and dnsMatcher to call the appropriate Once.Do
wrapper that runs the existing compile/assignment logic so compilation still
happens at most once per field while preserving lazy-init semantics and avoiding
races (keep the existing entries construction and
networkmatch.CompileIP/CompileDNS calls inside the Once.Do closures and assign
the result to nm.ip / nm.dns).
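
The first finding above (explicit Load, validation, then Store) could look roughly like the sketch below. It is an illustration under assumptions, not the PR's code: the `getOrBuild` signature, the `egressCount` parameter and the placeholder slot type are hypothetical, and ingress handling is omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// containerMatchers mirrors the PR's type in shape only; the element type
// of egress is a placeholder for the real per-neighbor matcher slots.
type containerMatchers struct {
	checksum string
	egress   []struct{}
}

type matcherCache struct{ m sync.Map } // containerID -> *containerMatchers

// getOrBuild returns the cached entry only when both the checksum and the
// neighbor count still match; otherwise it stores and returns a fresh entry.
func (c *matcherCache) getOrBuild(containerID, checksum string, egressCount int) *containerMatchers {
	if v, ok := c.m.Load(containerID); ok {
		cm := v.(*containerMatchers)
		if cm.checksum == checksum && len(cm.egress) == egressCount {
			return cm
		}
	}
	// Missing or stale: replace unconditionally so the caller never sees
	// an entry whose shape differs from its own profile.
	fresh := &containerMatchers{checksum: checksum, egress: make([]struct{}, egressCount)}
	c.m.Store(containerID, fresh)
	return fresh
}

func main() {
	var c matcherCache
	a := c.getOrBuild("c1", "sum-1", 3)
	b := c.getOrBuild("c1", "sum-1", 3) // same checksum and shape: reused
	d := c.getOrBuild("c1", "sum-1", 5) // same checksum, new shape: replaced
	fmt.Println(a == b, a == d, len(d.egress)) // prints "true false 5"
}
```

Under contention several goroutines may each store a fresh entry and the last Store wins, but every caller still gets an entry sized for the profile it passed in.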
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 665f1acd-13bb-4898-8755-4a4c83f90081

📥 Commits

Reviewing files that changed from the base of the PR and between 4c90e22 and 93e6e1b.

📒 Files selected for processing (4)
  • pkg/rulemanager/cel/libraries/networkneighborhood/matcher_cache.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/matcher_cache_bench_test.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/network.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/nn.go

Comment thread pkg/rulemanager/cel/libraries/networkneighborhood/matcher_cache.go Outdated
Comment thread pkg/rulemanager/cel/libraries/networkneighborhood/matcher_cache.go
…s replace (CR #42)

Two findings from CodeRabbit round 1, both fixed:

1. Stale-entry shape race in getOrBuild (Major)

   Old code used LoadOrStore on the staleness path and only replaced
   on checksum mismatch — but a shape mismatch (neighbor count change)
   could leak the stale entry to a caller whose profile has a different
   shape, which then index-panics in ipMatcher/dnsMatcher.

   Fix: when staleness is detected (by checksum OR shape), always
   Store unconditionally. Worst-case contention: several goroutines
   build shape-correct fresh entries and one Store wins; all callers
   still see a shape-correct entry. Orphans get GC'd.

2. Unsynchronised lazy-init of per-neighbor matchers (Critical)

   neighborMatchers.ip / .dns were *Matcher with a non-atomic 'if nil
   then build then assign' pattern — a real data race.

   Fix: switched to atomic.Pointer[networkmatch.IPMatcher] (and DNS).
   First-build callers may race on Compile but only one pointer wins
   via CompareAndSwap; everyone returns the winning matcher. Pure
   functions (no shared state) so duplicate Compile work is wasteful
   but not incorrect.
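
The compare-and-swap publication described in fix 2 can be sketched as follows. This assumes Go 1.19 or later for `atomic.Pointer`; `matcher`, `compile`, `neighborSlot` and the `builds` counter are hypothetical stand-ins for the `networkmatch` types and compile functions.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// matcher is a hypothetical stand-in for networkmatch.IPMatcher.
type matcher struct{ id int64 }

// builds counts how many times compile ran (may exceed 1 under a race).
var builds atomic.Int64

// compile stands in for networkmatch.CompileIP.
func compile() *matcher { return &matcher{id: builds.Add(1)} }

type neighborSlot struct{ ip atomic.Pointer[matcher] }

// ipMatcher returns the published matcher, building one if the slot is
// empty. Racing builders may each compile, but only the first
// CompareAndSwap succeeds; losers discard their copy and use the winner.
func (s *neighborSlot) ipMatcher() *matcher {
	if m := s.ip.Load(); m != nil {
		return m
	}
	fresh := compile()
	if s.ip.CompareAndSwap(nil, fresh) {
		return fresh
	}
	return s.ip.Load()
}

func main() {
	var s neighborSlot
	var wg sync.WaitGroup
	results := make([]*matcher, 64)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = s.ipMatcher()
		}(i)
	}
	wg.Wait()
	same := true
	for _, m := range results {
		same = same && m == results[0]
	}
	fmt.Println("all callers share one matcher:", same)
}
```

The trade-off against `sync.Once` is that duplicate compiles are possible during the first-build window, in exchange for a lock-free read path once the pointer is published.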

New tests in matcher_cache_test.go pin the contract:
  - TestMatcherCache_ConcurrentFirstBuild: 64 goroutines racing on
    the same slot, run under -race, asserts matchers are populated
    exactly once
  - TestMatcherCache_StaleEntryReplaced: shape-mismatch path returns
    a fresh containerMatchers, not the stale one
  - TestMatcherCache_ChecksumPreservedAcrossCalls: same checksum hits
    cache (no rebuild)

Benchmarks re-run after atomic.Pointer switch — negligible impact
(177 → 186 ns/op, still 8x faster than cold path). All headline
savings preserved.
entlein commented May 13, 2026

@coderabbitai review

coderabbitai Bot commented May 13, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@github-actions
Performance Benchmark Results

Node-Agent Resource Usage

| Metric | BEFORE | AFTER | Delta |
| --- | ---: | ---: | ---: |
| Avg CPU (cores) | 0.204 | 0.000 | -100.0% |
| Peak CPU (cores) | 0.218 | 0.000 | -100.0% |
| Avg Memory (MiB) | 275.210 | 0.000 | -100.0% |
| Peak Memory (MiB) | 276.594 | 0.000 | -100.0% |

Dedup Effectiveness

No data available.

Event Counters

| Metric | BEFORE | AFTER |
| --- | ---: | ---: |
| capability_counter | 8 | 0 |
| dns_counter | 1422 | 0 |
| exec_counter | 7151 | 0 |
| network_counter | 94007 | 0 |
| open_counter | 778936 | 0 |
| syscall_counter | 3500 | 0 |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/rulemanager/cel/libraries/networkneighborhood/matcher_cache_test.go`:
- Around line 50-61: Add a start barrier so all goroutines begin the contested
calls at the same time: create a start channel (e.g., start := make(chan
struct{})) before launching the loop, have each goroutine block on <-start (or
<-start with a receive) immediately inside the goroutine before calling
lib.wasAddressInEgress and lib.isDomainInEgress, and after spawning all
goroutines close(start) to release them simultaneously; reference the test's
goroutines/WG setup and the calls to lib.wasAddressInEgress and
lib.isDomainInEgress when applying the change.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: f5a58d1c-711e-4226-a3e9-2b20ea7c79b2

📥 Commits

Reviewing files that changed from the base of the PR and between 93e6e1b and 419ebba.

📒 Files selected for processing (2)
  • pkg/rulemanager/cel/libraries/networkneighborhood/matcher_cache.go
  • pkg/rulemanager/cel/libraries/networkneighborhood/matcher_cache_test.go

…und 2)

Without the barrier, goroutine launch jitter staggers first-call
arrivals, hiding any unsynchronised-write data race during the
first-build window. With the barrier, all 64 goroutines hit the
contended path simultaneously when close(start) fires — much tighter
race-detector coverage of the atomic.Pointer.CompareAndSwap path.
entlein commented May 13, 2026

@coderabbitai review

coderabbitai Bot commented May 13, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@github-actions
Performance Benchmark Results

Node-Agent Resource Usage

| Metric | BEFORE | AFTER | Delta |
| --- | ---: | ---: | ---: |
| Avg CPU (cores) | 0.000 | 0.000 | N/A |
| Peak CPU (cores) | 0.000 | 0.000 | N/A |
| Avg Memory (MiB) | 0.000 | 0.000 | N/A |
| Peak Memory (MiB) | 0.000 | 0.000 | N/A |

Dedup Effectiveness

No data available.

@github-actions
Performance Benchmark Results

Node-Agent Resource Usage

| Metric | BEFORE | AFTER | Delta |
| --- | ---: | ---: | ---: |
| Avg CPU (cores) | 0.218 | 0.223 | +2.6% |
| Peak CPU (cores) | 0.225 | 0.237 | +5.6% |
| Avg Memory (MiB) | 314.039 | 267.155 | -14.9% |
| Peak Memory (MiB) | 318.469 | 274.074 | -13.9% |

Dedup Effectiveness (AFTER only)

| Event Type | Passed | Deduped | Ratio |
| --- | ---: | ---: | ---: |
| capabilities | 1 | 0 | 0.0% |
| hardlink | 6000 | 0 | 0.0% |
| http | 1768 | 119392 | 98.5% |
| network | 903 | 77997 | 98.9% |
| open | 31201 | 620541 | 95.2% |
| symlink | 6000 | 0 | 0.0% |
| syscall | 979 | 1876 | 65.7% |

Event Counters

| Metric | BEFORE | AFTER |
| --- | ---: | ---: |
| capability_counter | 10 | 8 |
| dns_counter | 1424 | 1439 |
| exec_counter | 7124 | 7198 |
| network_counter | 93726 | 94657 |
| open_counter | 775947 | 783462 |
| syscall_counter | 3515 | 3496 |
