
new: domain-sharded tp management#4811

Open
FedeDP wants to merge 10 commits into main from new/domain_tp_management


@FedeDP
Contributor

@FedeDP FedeDP commented Mar 31, 2026

Fixes #4808

Description

Add domain sharding for tracing policies in sensors manager.
Each "domain" can only see and act on its own policies. The tetra tracingpolicy subcommands gained a new --domain flag: by default they only act on the grpc domain (enforced by the gRPC API), but they can be pointed at another domain, e.g. the k8s one, for debugging purposes.

A new tetra tracingpolicy domains command has also been added to list all available domains (the list is dynamic, i.e. it depends on the domains of the currently loaded policies).

Examples with the current implementation:

$ sudo ./tetragon --btf /sys/kernel/btf/vmlinux  --bpf-lib ./bpf/objs/ --tracing-policy /home/fdipierr/tp_passwd.yaml

We will then have:

$ ./tetra tracingpolicy domains
[static]

Then we load a new policy via tetra:

$ ./tetra tracingpolicy add ~/tp_uprobe_ret_copy.yaml 
tracing policy "/home/fdipierr/tp_uprobe_ret_copy.yaml" added

And we now have a new domain:

$ ./tetra tracingpolicy domains
[grpc static]

We then try to delete the tetra-loaded policy, but look for it in the static domain:

$ ./tetra tracingpolicy delete trace-bash-readline --domain static
Error: failed to delete tracing policy: rpc error: code = Unknown desc = tracing policy {trace-bash-readline  static} does not exist

Deleting it from the correct domain (grpc, the default for gRPC connections) succeeds:

$ ./tetra tracingpolicy delete trace-bash-readline
tracing policy "trace-bash-readline" deleted
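The failed delete above follows from the domain being part of the lookup key. The sketch below is hypothetical (deletePolicy is not Tetragon's function), but formatting such a key with %v yields the same {name namespace domain} shape seen in the error, with the empty namespace producing the double space.

```go
package main

import "fmt"

// Same key shape as collectionKey in pkg/sensors/collection.go.
type collectionKey struct {
	name, namespace, domain string
}

// deletePolicy (hypothetical) removes a policy only if it exists in the
// requested domain; a policy with the same name in another domain is untouched.
func deletePolicy(loaded map[collectionKey]struct{}, name, namespace, domain string) error {
	key := collectionKey{name: name, namespace: namespace, domain: domain}
	if _, ok := loaded[key]; !ok {
		// %v prints the struct as {name namespace domain}.
		return fmt.Errorf("tracing policy %v does not exist", key)
	}
	delete(loaded, key)
	return nil
}

func main() {
	loaded := map[collectionKey]struct{}{
		{name: "trace-bash-readline", domain: "grpc"}: {},
	}

	// Wrong domain: tracing policy {trace-bash-readline  static} does not exist
	fmt.Println(deletePolicy(loaded, "trace-bash-readline", "", "static"))

	// Right domain (grpc, the default for gRPC callers): <nil>
	fmt.Println(deletePolicy(loaded, "trace-bash-readline", "", "grpc"))
}
```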

Changelog

new: domain-sharded tp management

@netlify

netlify Bot commented Mar 31, 2026

Deploy Preview for tetragon ready!

Name Link
🔨 Latest commit 028b167
🔍 Latest deploy log https://app.netlify.com/projects/tetragon/deploys/6a043595cb7f630008716a11
😎 Deploy Preview https://deploy-preview-4811--tetragon.netlify.app

@FedeDP FedeDP force-pushed the new/domain_tp_management branch 8 times, most recently from b062ed8 to 1b98c00 April 2, 2026 09:45
@FedeDP FedeDP force-pushed the new/domain_tp_management branch from 1b98c00 to c4c82f1 April 8, 2026 07:33
@FedeDP
Contributor Author

FedeDP commented Apr 8, 2026

Rebased on top of latest main after the landing of big #4265 :)

@FedeDP FedeDP force-pushed the new/domain_tp_management branch from c4c82f1 to 2b76963 April 8, 2026 09:01
@FedeDP FedeDP added the release-note/minor This PR introduces a minor user-visible change label Apr 8, 2026
@FedeDP FedeDP force-pushed the new/domain_tp_management branch 8 times, most recently from 2f42b2b to 24d9590 April 9, 2026 15:44
@FedeDP FedeDP marked this pull request as ready for review April 9, 2026 15:44
@FedeDP FedeDP requested a review from a team as a code owner April 9, 2026 15:44
@FedeDP FedeDP requested a review from will-isovalent April 9, 2026 15:44
Contributor Author

Updated to declare tracingpolicies before using them, since we are going to need to call tp.TpDomain().

message ListTracingPoliciesRequest {}
message ListTracingPoliciesRequest {
// domain to be listed; empty to list all domains
string domain = 1;
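As the proto comment says, an empty domain lists everything. A hypothetical sketch of that filtering rule, using a stand-in policy type rather than Tetragon's real types:

```go
package main

import "fmt"

// policy is a hypothetical stand-in for a loaded tracing policy.
type policy struct {
	name, domain string
}

// filterByDomain follows the semantics of the proto comment: an empty
// domain selects every policy, otherwise only the policies of that domain.
func filterByDomain(policies []policy, domain string) []policy {
	if domain == "" {
		return policies
	}
	var out []policy
	for _, p := range policies {
		if p.domain == domain {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	all := []policy{{"tp-a", "static"}, {"tp-b", "grpc"}, {"tp-c", "k8s"}}

	// Prints 3: no domain means all domains.
	fmt.Println(len(filterByDomain(all, "")))

	// Prints [{tp-b grpc}].
	fmt.Println(filterByDomain(all, "grpc"))
}
```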
Contributor Author

We basically have 3 policy domains right now:

  • "static" domain for tracing policies loaded by configured folder/file
  • "grpc" for grpc-loaded policies (eg: tetra CLI)
  • "k8s" for policies loaded by the crd watcher

The tests may leak this knowledge, but otherwise it is all transparent in the code, i.e.:

  • "k8s" is enforced by the k8s TracingPolicy::TpDomain method on addition, and by the crd watcher on deletion
  • "static" is enforced by the normal TracingPolicy::TpDomain method on addition (there is no deletion path for static policies)
  • "grpc" is enforced by grpc server for both additions and deletions
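The "each type owns its domain" approach can be sketched with a trimmed-down interface. Only the TpDomain() method name comes from this PR; K8sTracingPolicy is a hypothetical stand-in for v1alpha1.TracingPolicy, and the method bodies are assumptions based on the list above:

```go
package main

import "fmt"

// TracingPolicy is trimmed down to the method this PR adds.
type TracingPolicy interface {
	TpDomain() string
}

// GenericTracingPolicy: policies loaded from the configured folder/file.
type GenericTracingPolicy struct{}

func (*GenericTracingPolicy) TpDomain() string { return "static" }

// K8sTracingPolicy is a hypothetical stand-in for v1alpha1.TracingPolicy,
// i.e. policies delivered by the CRD watcher.
type K8sTracingPolicy struct{}

func (*K8sTracingPolicy) TpDomain() string { return "k8s" }

func main() {
	// The domain comes from the concrete type, not from a field set by callers.
	for _, tp := range []TracingPolicy{&GenericTracingPolicy{}, &K8sTracingPolicy{}} {
		fmt.Println(tp.TpDomain())
	}
}
```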

Comment thread pkg/sensors/collection.go
// this enables policies with the same name for different namespaces
type collectionKey struct {
name, namespace string
name, namespace, domain string
Contributor Author

Each tracing policy is now referenced by its domain too.
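A small illustration of what adding domain to the key changes, using plain Go maps with keys shaped like the old and new collectionKey (a sketch, not the collection code): keys that differ only by domain no longer collide, which matches the commit message below saying the same policy pushed through two domains ends up loaded twice.

```go
package main

import "fmt"

// keyBefore and keyAfter mirror the old and new collectionKey fields.
type keyBefore struct {
	name, namespace string
}

type keyAfter struct {
	name, namespace, domain string
}

func main() {
	before := map[keyBefore]string{}
	before[keyBefore{"trace-bash-readline", ""}] = "pushed via tetra"
	before[keyBefore{"trace-bash-readline", ""}] = "pushed via CRD" // same map key

	after := map[keyAfter]string{}
	after[keyAfter{"trace-bash-readline", "", "grpc"}] = "pushed via tetra"
	after[keyAfter{"trace-bash-readline", "", "k8s"}] = "pushed via CRD" // distinct map key

	// Prints "1 2".
	fmt.Println(len(before), len(after))
}
```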

Comment thread pkg/sensors/handler.go
var BaseSensorName = "__base__"
const (
BaseSensorName = "__base__"
sensorsDomain = "sensors"
Contributor Author

As explained above, the sensors domain is the one enforced by the Sensors-related APIs.
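A hypothetical sketch of that enforcement: a Sensors-API lookup never takes a domain from the caller and always uses sensorsDomain, so an entry stored under a policy's domain is not reachable through it. lookupSensor and the entry names are illustrative only.

```go
package main

import "fmt"

// sensorsDomain matches the constant added in pkg/sensors/handler.go.
const sensorsDomain = "sensors"

type collectionKey struct {
	name, namespace, domain string
}

// lookupSensor (hypothetical) models a Sensors-API lookup: the caller picks
// the name, but the domain is always sensorsDomain.
func lookupSensor(cols map[collectionKey]string, name string) (string, bool) {
	v, ok := cols[collectionKey{name: name, domain: sensorsDomain}]
	return v, ok
}

func main() {
	cols := map[collectionKey]string{
		{name: "my-sensor", domain: sensorsDomain}: "entry tracked in the sensors domain",
		{name: "my-policy", domain: "grpc"}:        "sensors created for a gRPC-loaded policy",
	}

	_, ok := lookupSensor(cols, "my-sensor")
	fmt.Println(ok) // true

	// The policy's sensors live in the policy's collection and domain,
	// so this lookup cannot reach them.
	_, ok = lookupSensor(cols, "my-policy")
	fmt.Println(ok) // false
}
```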

Comment thread pkg/server/server.go

if err := s.observer.AddTracingPolicy(ctx, tp); err != nil {
gtp := GRPCTracingPolicy{tp, grpcDomain}
if req.GetDomain() != "" {
Contributor Author

By default it is just grpc, but for debugging purposes (e.g. via tetra) users can choose the domain they want to act on.
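The selection in the snippet above (gtp := GRPCTracingPolicy{tp, grpcDomain} followed by the req.GetDomain() check) presumably boils down to something like this sketch; requestDomain is a hypothetical name, and the "grpc" value is taken from the PR description:

```go
package main

import "fmt"

const grpcDomain = "grpc"

// requestDomain (hypothetical) picks the domain for a gRPC request:
// grpc by default, or whatever the request explicitly asks for (debugging).
func requestDomain(reqDomain string) string {
	if reqDomain != "" {
		return reqDomain
	}
	return grpcDomain
}

func main() {
	fmt.Println(requestDomain(""))    // grpc
	fmt.Println(requestDomain("k8s")) // k8s
}
```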

Comment thread pkg/server/server.go
return nil, errors.New("ListSensors is deprecated")
}

type GRPCTracingPolicy struct {
Contributor Author

Small wrapper around TracingPolicy to enforce grpc specific domain.
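One way such a wrapper can work in Go is struct embedding: the wrapped policy's methods are promoted, and a TpDomain() defined on the wrapper shadows the embedded one. The interface is trimmed down, and TpName()/genericPolicy are hypothetical; only the two-field literal GRPCTracingPolicy{tp, domain} mirrors the server snippet.

```go
package main

import "fmt"

// TracingPolicy is trimmed down; TpName is a hypothetical extra method.
type TracingPolicy interface {
	TpName() string
	TpDomain() string
}

// genericPolicy is a hypothetical statically-loaded policy.
type genericPolicy struct {
	name string
}

func (p *genericPolicy) TpName() string {
	return p.name
}

func (p *genericPolicy) TpDomain() string {
	return "static"
}

// GRPCTracingPolicy embeds the wrapped policy, so its methods are promoted,
// while the wrapper's own TpDomain shadows the embedded one.
type GRPCTracingPolicy struct {
	TracingPolicy
	domain string
}

func (g GRPCTracingPolicy) TpDomain() string {
	return g.domain
}

func main() {
	tp := &genericPolicy{name: "trace-bash-readline"}
	gtp := GRPCTracingPolicy{tp, "grpc"}

	// Prints "trace-bash-readline grpc".
	fmt.Println(gtp.TpName(), gtp.TpDomain())

	// The wrapper still satisfies the interface, so it can be passed on as-is.
	var iface TracingPolicy = gtp
	fmt.Println(iface.TpDomain()) // grpc
}
```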

@FedeDP FedeDP force-pushed the new/domain_tp_management branch 2 times, most recently from 045f89f to 06be127 April 10, 2026 06:53
@FedeDP FedeDP requested a review from kkourt April 10, 2026 07:31
@FedeDP
Contributor Author

FedeDP commented Apr 10, 2026

Q: do we want to expose the domain in the tetra tracingpolicy list command? For now, I chose not to update the TracingPolicyStatus message.

Contributor

@kkourt kkourt left a comment

Thanks!

Can you please add a commit message to the first commit (the one introducing domains) explaining the motivation for this change?

Things like 6a84712#r3052523498 are also well suited for commit messages. Making the context part of the git history is very valuable.

Also, do you think we should enforce some constraints on what is allowed (e.g., characters) in a domain name? I think that's probably a good idea.
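If such a constraint were added, one option (purely an assumption, not something this PR does) is a DNS-label-like rule: lowercase letters, digits and hyphens, starting with a letter, not ending with a hyphen, at most 63 characters. The domains used in this PR (static, grpc, k8s, sensors) all satisfy it.

```go
package main

import (
	"fmt"
	"regexp"
)

// domainRe is a hypothetical rule: a lowercase, DNS-label-like name that
// starts with a letter, does not end with '-', and is at most 63 chars long.
var domainRe = regexp.MustCompile(`^[a-z]([a-z0-9-]{0,61}[a-z0-9])?$`)

func validDomain(d string) bool {
	return domainRe.MatchString(d)
}

func main() {
	for _, d := range []string{"grpc", "k8s", "static", "K8S", "bad domain", ""} {
		fmt.Printf("%q valid=%v\n", d, validDomain(d))
	}
}
```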

Comment thread pkg/tracingpolicy/generictracingpolicy.go Outdated
Comment thread api/v1/tetragon/sensors.proto
Comment thread cmd/tetra/tracingpolicy/tracingpolicy.go Outdated
@FedeDP FedeDP force-pushed the new/domain_tp_management branch from 06be127 to 71bc066 April 17, 2026 08:42
@FedeDP FedeDP requested a review from kkourt April 17, 2026 09:13
}

func (tp *TracingPolicy) TpDomain() string {
return k8sDomain
Contributor

I find this confusing. Looking at the subsequent patches, it seems that what you want to do is have all other users of TracingPolicies use a different type that returns a different TpDomain().

Would it be a better choice to add a Domain field to the TracingPolicy? We don't have to expose it to users. It can be a private field (with getters and setters), or maybe one marked with json:"-"?

Contributor Author

Why is it confusing? I mean, obviously I don't find it confusing :) Would you mind sharing your concerns?

Actually, I find it rather simple and straightforward: each implementation of the interface has its own specific domain.

Contributor

That's a good question.

One part is that historically, we used the same type for multiple "domains" (e.g., gRPC and static). I also feel that at least for the GenericTracingPolicy we can have the same type everywhere, and it can be used to make the domains more dynamic in the future.

Contributor

But I don't feel strongly about it. I do think, however, that we would need to document the new behavior in the commit message (and maybe the types) because this is a change from what we did before.

Contributor Author

> I also feel that at least for the GenericTracingPolicy we can have the same type everywhere, and it can be used to make the domains more dynamic in the future.

What I don't like is that if we allow people to customize the domain, sooner or later we might end up with mixed-up domains over and over (i.e. "oh, I added a GenericTracingPolicy { Domain: "k8s" } because x, y, z").
I prefer "one type for each domain" in this case; but again, just like you, I don't hold a strong opinion.
Luckily, we can change this in the future without any breaking change.

> I do think, however, that we would need to document the new behavior in the commit message (and maybe the types) because this is a change from what we did before.

Fully agree! Let me improve the commit messages :)

Anyway, would love to hear more feedback on this topic; let me ping @mtardy @will-isovalent
Thanks!

Contributor Author

I tried to add more details to the first 2 commits: fc1ce37 and 653f046; PTAL :)

@FedeDP
Contributor Author

FedeDP commented Apr 20, 2026

> Also, do you think we should enforce some constraints on what is allowed (e.g., characters) in a domain name? I think that's probably a good idea.

I fully agree!

EDIT: on the other hand, since we are the ones setting them and they are just returned by interface method implementations, I am not sure we need it. After all, users cannot change them: they are all static string constants.

@FedeDP FedeDP force-pushed the new/domain_tp_management branch 2 times, most recently from ca0fa7a to c133b46 April 27, 2026 08:21
@FedeDP FedeDP requested a review from kkourt April 28, 2026 09:08
@FedeDP FedeDP force-pushed the new/domain_tp_management branch 3 times, most recently from 117dd7b to c906f31 May 6, 2026 08:44
@mtardy mtardy self-requested a review May 11, 2026 16:37
Contributor

@kkourt kkourt left a comment

LGTM, thanks!

Left some nits, but it's up to you if you want to address them.

Comment thread cmd/tetra/tracingpolicy/tracingpolicy.go Outdated
// TPKindDefinition is the kind name of Cilium Tracing Policy
TPNamespacedKindDefinition = "TracingPolicyNamespaced"

k8sDomain = "k8s"
Contributor

nit: How about we add a new package with all the domain constants? This means that we can use them in the CLI code as well.

Contributor Author

@FedeDP FedeDP May 13, 2026

Actually, I tried hard not to expose them :) I don't think anything outside should care about their values.

> This means that we can use them in the CLI code as well.

Do we need to?

Comment thread pkg/server/server.go
Comment thread cmd/tetra/tracingpolicy/tracingpolicy.go
FedeDP added 2 commits May 13, 2026 10:17
Also, ran `make protogen`.

This commit introduces a `domain` for the tracing policies.
We will later introduce a `TpDomain()` method on `TracingPolicy` interface.

The idea is that each component (tetra through grpc, k8s and the static ones
loaded by configured files) will only be able to act upon its own domain.
For example, a crd tracing policy cannot be overridden/deleted by tetra,
and the same goes the other way.
If one pushes the same policy through different domains, they will end up
with multiple copies of it loaded.

We will have 3 domains for policies:
* "static" domain for tracing policies loaded by configured folder/file
* "grpc" for grpc-loaded policies (eg: tetra CLI)
* "k8s" for policies loaded by the crd watcher

Note that the grpc server still allows acting on a specific domain,
for debug purposes. That's why we added a `domain` field to the protobuf messages.

Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
In the tests this knowledge may be leaked, but normally it is all transparent:
* "k8s" is enforced by k8s `v1alpha1.TracingPolicy::TpDomain()` and by the crd watcher
* "static" is enforced by normal `GenericTracingPolicy::TpDomain()`
* "grpc" is enforced by grpc server through `GRPCTracingPolicy::TpDomain()` (introduced in next commit)

Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
@FedeDP FedeDP force-pushed the new/domain_tp_management branch from c906f31 to 028b167 May 13, 2026 08:25
FedeDP added 8 commits May 13, 2026 10:26
Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
It is needed to list domains.

Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
It is reserved for internal use (to keep track of loaded sensors).

Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
With the `domain` based approach, we can no longer act upon a sensor via
the tracingpolicy gRPC API and vice versa, i.e. we can no longer
act upon a tracingpolicy by using the Sensors gRPC API.
This happens because sensors are all registered in the (private)
`sensors` domain, and tracing policies instead belong to their own
specified domain.

In this specific case, the sensor is created and attached during the
`h.addTracingPolicy()` call, and stored as part of the same collection
of the tracing policy:
```
sensors, err := sensorsFromPolicyHandlers(op.tp, filterID)
if err != nil {
	col.err = err
	col.state = LoadErrorState
	return err
}
col.sensors = make([]SensorIface, 0, len(sensors))
col.sensors = append(col.sensors, sensors...)
```
Thus, trying to reach the sensor via the Sensors API is not going to work,
because as aforementioned, that enforces the `sensors` domain.

Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
@FedeDP FedeDP force-pushed the new/domain_tp_management branch from 028b167 to b9c8b4d May 13, 2026 08:26
@FedeDP
Contributor Author

FedeDP commented May 13, 2026

Also, rebased on top of main.


Labels

release-note/minor This PR introduces a minor user-visible change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Manage mixage of CRD and locally pushed tracing policies (tetra)

2 participants