btf: replace global BTF cache with user provided cache #1988
Force-pushed 7f48e14 → dfebdc7
ti-mo left a comment
Thanks! Left some comments. Looks like there's still a failure in the selftests.
Force-pushed d4513a2 → fa31975
@ti-mo looks like the CI passes now
Force-pushed fa31975 → 21aba8b
@ti-mo can you check my proposal on the module BTF?
…tives (#786)

Temporarily points two dependencies to forks that implement pending upstream PRs, allowing CI to build a testable image before the PRs land.

inspektor-gadget → matthyx/inspektor-gadget @ fd383d3
- Release collectionSpec after eBPF load (~14 MiB heap) #5461
- Release oras target after load (~27 MiB RssFile) #5462
- Release program bytes after load (~6 MiB heap) #5466
- Reduce trace_capabilities current_syscall map 1M→10K (~43 MiB kernel) #5468

Upstream: inspektor-gadget/inspektor-gadget

cilium/ebpf → matthyx/ebpf @ 8a32d06
- Weak-pointer kernel BTF global cache (~17 MiB heap auto-freed)
- Invalidate module cache on kernel spec reload (fixes kmod BTF errors)

Upstream PR: cilium/ebpf#1988

These replace directives should be removed once the upstream PRs are merged and released.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@matthyx I've given this a lot of thought, but something tells me this isn't the right approach. Turning these into weak pointers completely defeats their purpose as a long-lived cache, since there's no guarantee whatsoever that the global cache will outlive two sequential calls to LoadCollectionSpec, at which point we may as well remove it, since btf.Cache handles intra-load caching now. I understand the difficulty in finding a good point to call Flush(); I assume you do this after every collection load to keep consumption down.

Though I have one question: what does that 20 MiB of heap contain? We have lazy decoding now, so we should only (mostly) inflate types that get returned to the caller. I don't know your use case, but 20 MiB feels like a lot. Are you counting heap or RSS?

It's worth noting that a few things have changed since the global cache was introduced. Originally, it was intended to amortize the cost of parsing BTF across multiple programs within a single LoadCollectionSpec, but the fact that it persisted across multiple collection loads was a nice bonus. Now that decoding is lazy and mmap-backed, a lot of that cost is gone. Conversely, I find the lower-level logic around how the caches interact almost impossible to reason about today. With btf.Cache in the picture, it may be time to sunset the global cache altogether and allow the caller to pass one in through CollectionOptions.
Thanks, this makes sense. You're right that weak pointers turn the global cache from a deterministic long-lived cache into a best-effort memoization layer, which is probably the wrong trade-off here. Given that we already use a btf.Cache during collection load, surfacing that through CollectionOptions seems like the cleaner direction. That lets callers choose the cache lifetime explicitly instead of the library trying to guess a safe flush point.

The memory saving is heap, and closer to 17 MiB than 20; you can see some info here: kubescape/node-agent#786 (comment)

Should I close this?
Hi, are you generating these replies using an LLM? If you're up for making the CollectionOptions change, feel free to reuse this PR for that.
@ti-mo yes, I use one to turn bullet points into full sentences, as I'm not a native speaker of English. Sorry if that offended you.
Force-pushed 21aba8b → 131f528
@ti-mo something like that?
dylandreimerink left a comment
I think this updated PR is nice, having less memory overhead by default, and allowing power users to enable caching between collections.
ti-mo left a comment
Thanks! This cleans up nicely. Left a few nits, and we should probably make btf.Cache concurrency-safe.
```go
//
// Deprecated: kernel BTF is no longer cached at the package level.
// Discard your [Cache] instances to release cached BTF instead.
func FlushKernelSpec() {}
```
We should simply remove this as a call to action for anyone who depended on this behaviour. For anyone relying on package-level caching, Flush() becoming a no-op is already sort of a breaking change. We may as well make it explicit.
```go
	return loadKernelModuleSpec(module, base)
}

func loadKernelSpec() (*Spec, error) {
```
Nit: move this above LoadKernelModuleSpec and export instead?
```go
	MapReplacements map[string]*Map

	// Cache amortises the cost of decoding kernel BTF across multiple
	// collection loads. Callers loading more than one collection should
```
Nit:
```diff
-	// collection loads. Callers loading more than one collection should
+	// Collection loads. Callers loading more than one Collection should
```
```go
	//
	// Opportunistically reuses a global cache if possible.
	// Pass it to [CollectionOptions] to share kernel BTF across multiple
	// collection loads.
```
Nit:
```diff
-	// collection loads.
+	// Collection loads.
```
```go
	// allocate a Cache via [btf.NewCache] and share it across loads.
	// If nil, a fresh cache is allocated per load and discarded.
	//
	// The same Cache must not be used concurrently.
```
This looks straightforward enough to address, and it would be a rather inconvenient limitation to leave in. Could you pick this up as well?
Force-pushed 131f528 → 3ddbf85
@ti-mo updated
…ionOptions

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
Force-pushed 3ddbf85 → 6bbb73e
Problem
`globalCache` in `btf/kernel.go` holds strong `*Spec` pointers to the kernel and module BTF specs. Once loaded, these ~20 MiB of type information are never released, even after all eBPF programs have been loaded and no `CollectionSpec` retains a reference to them. `FlushKernelSpec()` exists as a manual escape hatch, but it requires callers to know when it is safe to call — which is inherently racy in a library.

Solution
Change `globalCache.kernel` and `globalCache.modules` from strong `*Spec` pointers to `weak.Pointer[Spec]` (Go 1.24+). The GC will collect the cached specs automatically once no `CollectionSpec` (or other caller) holds a strong reference to them.

Key behaviours preserved:
- `/sys/kernel/btf/vmlinux` is parsed once and shared across multiple `LoadKernelSpec` calls that overlap in time.
- If the cached spec has been collected, the next `LoadKernelSpec` call transparently re-parses it.
- `FlushKernelSpec()`: still works; zeroing a `weak.Pointer` is equivalent to the previous `= nil`. It is now deprecated since explicit flushing is no longer needed.
- `NewCache()`: handles the case where the weak pointer has already been collected, falling back to an empty `Cache` (which then loads on demand via `Kernel()`).

`Cache` is now self-contained

Previously, `Cache.Module` re-used the global module cache and then rebased the result against the Cache's own kernel decoder. That created a hidden invariant: the Cache's kernel decoder had to share its raw mmap'd buffer with whatever kernel spec the global cache currently held — otherwise `rebaseDecoder` would fail with `"raw BTF differs"`.

Under strong references this invariant held permanently. Under weak references it could break: if the globally cached kernel spec were GC'd between `Cache.Kernel()` and `Cache.Module()`, the next module load would parse a fresh kernel mmap and the rebase would blow up. `Cache.Module` now loads module BTF directly against the Cache's own kernel via `loadKernelModuleSpec(name, c.kernelTypes)`. The Cache no longer depends on the global cache's state for correctness, and no `kernelSource` anchor field is needed. Memory reclaim is unchanged: the mmap'd vmlinux buffer is freed (`munmap` cleanup on `sharedBuf`) once the last `Cache` (and any other strong holder) is dropped.

The trade-off is that a module loaded via `Cache.Module` is no longer re-cached in the global module cache, so a subsequent `LoadKernelModuleSpec` for the same module reparses sysfs. Module BTF is small (kilobytes), and this only matters if `Cache` and the package-level loaders are mixed for the same modules.

Memory impact
In a long-running process that loads a handful of eBPF programs at startup and then runs indefinitely, this frees ~20 MiB of Go heap automatically — without any caller changes.
Measured in a Kubernetes node-agent workload (inspektor-gadget) where kernel BTF was the dominant post-load heap consumer.
Go version requirement
`weak.Pointer` was stabilised in Go 1.24. The module currently requires Go 1.23; this PR bumps the minimum to Go 1.24.

Testing
Existing `TestLoadKernelSpec` and `TestCache` pass. Cache correctness is unchanged — the weak pointer is semantically transparent to callers as long as they hold a strong reference (via a returned `*Spec`) across the critical section.