bpf: add errmetrics more generously#4989

Draft
3u13r wants to merge 2 commits into cilium:main from 3u13r:pr/3u13r/add-more-errmetrics-2026-05-11

@3u13r 3u13r commented May 12, 2026

Fixes #4952

Description

Add errmetrics around function calls that don't yet handle the return code and can fail silently.

Those are mostly calls to probe_read and map_update_elem. Note that we did not add errmetrics to every such call: doing so runs into the instruction limit on 4.19, and also on 5.10.
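The pattern described above can be illustrated with a minimal userspace sketch in plain C. It is not the code from this PR: `record_err`, `CHECKED`, `err_total`, the `err_slots` table, and `fake_probe_read` are hypothetical names invented for the example, and the table only stands in for a BPF map. The idea is that a call whose return code used to be ignored is wrapped so that a negative result bumps a counter keyed by call site and error code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an error-metrics map: one counter per
 * (call site, error code) pair. All names here are illustrative. */
#define MAX_ERR_SLOTS 16

struct err_slot {
	int line;            /* source line of the failing call */
	int err;             /* negative return code that was observed */
	unsigned long count; /* number of times this pair was seen */
};

static struct err_slot err_slots[MAX_ERR_SLOTS];

/* Record one failure. The sample is dropped when the table is full. */
static void record_err(int line, int err)
{
	for (size_t i = 0; i < MAX_ERR_SLOTS; i++) {
		struct err_slot *s = &err_slots[i];

		if (s->count == 0) {
			/* First empty slot: claim it for this pair. */
			s->line = line;
			s->err = err;
		} else if (s->line != line || s->err != err) {
			continue;
		}
		s->count++;
		return;
	}
}

/* Pass the return code through unchanged, counting it if negative. */
static inline int checked(int ret, int line)
{
	if (ret < 0)
		record_err(line, ret);
	return ret;
}

/* Wrap a call whose return code would otherwise be ignored. */
#define CHECKED(call) checked((call), __LINE__)

/* Total number of recorded failures across all slots. */
static unsigned long err_total(void)
{
	unsigned long total = 0;

	for (size_t i = 0; i < MAX_ERR_SLOTS; i++)
		total += err_slots[i].count;
	return total;
}

/* Mock of a probe_read-style helper: fails with -14 (-EFAULT) on a
 * NULL source, otherwise copies and returns 0. */
static int fake_probe_read(void *dst, size_t size, const void *src)
{
	if (!src)
		return -14;
	memcpy(dst, src, size);
	return 0;
}
```

A call site would then read `CHECKED(fake_probe_read(&v, sizeof(v), src));` instead of discarding the result. In real BPF code every such wrapper adds a branch and a map update, which is consistent with the instruction-limit concern mentioned above.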

Changelog

bpf: add errmetrics to more function calls

@3u13r 3u13r requested a review from a team as a code owner May 12, 2026 02:36
@3u13r 3u13r requested a review from andrewstrohman May 12, 2026 02:36
@3u13r 3u13r marked this pull request as draft May 12, 2026 02:37

3u13r commented May 12, 2026

Converted to draft until we are aligned in the issue.

andrewstrohman commented May 12, 2026

For the failures that occur in read_arg() and the functions that it calls (directly and indirectly), I have a plan to communicate the error via arg status. See here. This is preferable to setting the metric because it will prevent bogus argument values in the events. Instead, users will see that the arg could not be read.

In this PR I laid the groundwork to do this, and handled a probe_read_str error within copy_strings. With this groundwork, we just need to return a negative value from these functions in order to plumb the error to userspace.
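As a rough illustration of the approach described in this comment, here is a minimal userspace sketch in plain C. It does not reproduce the real `read_arg()` code: `enum arg_status`, `struct arg_out`, `read_long_arg`, `fill_arg`, and `fake_probe_read` are hypothetical names chosen for the example. The reader returns a negative value on failure, and the caller converts that into a status on the argument instead of emitting a bogus value.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status values; the real project may define others. */
enum arg_status {
	ARG_OK = 0,
	ARG_READ_FAILED = 1,
};

/* Hypothetical per-argument record handed to userspace. */
struct arg_out {
	enum arg_status status;
	long value;
};

/* Mock of a probe_read-style helper: fails with -14 (-EFAULT) on a
 * NULL source, otherwise copies and returns 0. */
static int fake_probe_read(void *dst, size_t size, const void *src)
{
	if (!src)
		return -14;
	memcpy(dst, src, size);
	return 0;
}

/* Read one argument. Returns the number of bytes read, or the
 * negative error code from the underlying read. */
static long read_long_arg(const long *src, long *dst)
{
	int err = fake_probe_read(dst, sizeof(*dst), src);

	if (err < 0)
		return err;
	return (long)sizeof(*dst);
}

/* Translate a negative return into an argument status so that the
 * consumer sees "could not be read" rather than a made-up value. */
static void fill_arg(struct arg_out *out, const long *src)
{
	long tmp = 0;

	if (read_long_arg(src, &tmp) < 0) {
		out->status = ARG_READ_FAILED;
		out->value = 0;
		return;
	}
	out->status = ARG_OK;
	out->value = tmp;
}
```

With this shape, a failed read surfaces per argument in the event itself, whereas a metric counter alone would only show that some read failed somewhere.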

3u13r added 2 commits May 13, 2026 13:24
Add errmetrics around function calls that don't yet handle the return
code and can fail silently.

Those are mostly calls to `probe_read` and `map_update_elem`.
Note that we did not add errmetrics to every such call: doing so runs
into the instruction limit on 4.18 (rhel 8.10) and 4.19, and also on
5.10.

Fixes: cilium#4952

Signed-off-by: Leonard Cohnen <leonard.cohnen@gmail.com>
When a function argument cannot be read in eBPF,
we have existing infrastructure to report this to userspace.

Use this infrastructure to report any probe_read errors in read_arg().

Related-to: cilium#4923

Signed-off-by: Leonard Cohnen <leonard.cohnen@gmail.com>
@3u13r 3u13r force-pushed the pr/3u13r/add-more-errmetrics-2026-05-11 branch from 222a6ed to f84ff15 Compare May 13, 2026 11:31

netlify Bot commented May 13, 2026

Deploy Preview for tetragon ready!

Name Link
🔨 Latest commit f84ff15
🔍 Latest deploy log https://app.netlify.com/projects/tetragon/deploys/6a046119d6b18f000867bf82
😎 Deploy Preview https://deploy-preview-4989--tetragon.netlify.app

@mtardy mtardy added the release-note/misc This PR makes changes that have no direct user impact. label May 13, 2026

Development

Successfully merging this pull request may close these issues.

add errmetrics in the BPF code where they are missing.

3 participants