Skip to content

k8s: Migrate to controller-runtime reconcilers and abstract informer pattern#4977

Open
sayboras wants to merge 6 commits into
mainfrom
pr/tammach/k8s-controler
Open

k8s: Migrate to controller-runtime reconcilers and abstract informer pattern#4977
sayboras wants to merge 6 commits into
mainfrom
pr/tammach/k8s-controler

Conversation

@sayboras
Copy link
Copy Markdown
Member

@sayboras sayboras commented May 9, 2026

Description

The main goal is to leverage controller-runtime framework for reconciling resources. Additionally, this PR is also to hide client-go informer from consumer, with one benefit of avoiding init() registration.

Changelog

k8s: Migrate to controller-runtime reconcilers and abstract informer pattern

@sayboras sayboras force-pushed the pr/tammach/k8s-controler branch 2 times, most recently from 79da49d to b3cece3 Compare May 9, 2026 11:32
@sayboras sayboras changed the title Pr/tammach/k8s controler K8s: Migrate to controller-runtime reconcilers and abstract informer pattern May 11, 2026
@sayboras sayboras changed the title K8s: Migrate to controller-runtime reconcilers and abstract informer pattern k8s: Migrate to controller-runtime reconcilers and abstract informer pattern May 11, 2026
@sayboras sayboras force-pushed the pr/tammach/k8s-controler branch from b3cece3 to af9079e Compare May 11, 2026 10:42
@sayboras sayboras added the release-note/misc This PR makes changes that have no direct user impact. label May 11, 2026
sayboras added 6 commits May 11, 2026 20:50
The previous hand-rolled informer gated agent startup on WaitCRDs for
both TracingPolicy CRDs at once, so the cluster-scoped reconciler
would not run if the namespaced CRD was absent (and vice versa). Each
Reconciler is now registered independently when its own CRD becomes
available, via a new RegisterControllerWhenCRDReady helper.

Also picks up GenerationChangedPredicate so status-only updates no
longer trigger reconciliation loops.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
Replaces the global init-time podhooks callback registry with a typed
PodEventSource passed via constructor injection. This removes the
implicit "InstallHooks must run before X" ordering constraint that
only failed at runtime, and centralises the
cache.DeletedFinalStateUnknown tombstone unwrap so each consumer
doesn't repeat it.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
This commit is to drop the podhooks-callback + AddEventHandler dance in
favour of a single OnPodDelete call.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
Replaces cgidmap's init-time podhooks registration with a Register
function called explicitly from main, gated by EnableCgIDmap at the
call site instead of being silently wired in init(). This makes the
feature flag the single source of truth for whether cgidmap observes
pod events, and lets unit tests pass a fake PodEventSource instead of
relying on the global podhooks registry.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
Drops the last consumer of the global init-time podhooks registry.
policyfilter no longer reaches into a process-global pod informer at
startup; it now receives a PodEventSource explicitly from main and
registers its own typed handlers. The k8s_test suite is rewritten
around a fake PodEventSource, which also removes the dependency on
client-go/kubernetes/fake.

Also removes pkg/podhooks entirely and the now-unused podhooks call
in pkg/manager.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
No longer imported after the policyfilter test rewrite. Regenerated
via go mod vendor.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
@sayboras sayboras force-pushed the pr/tammach/k8s-controler branch from af9079e to 1fb25f1 Compare May 11, 2026 10:52
@sayboras sayboras marked this pull request as ready for review May 11, 2026 12:10
@sayboras sayboras requested review from a team and will-isovalent as code owners May 11, 2026 12:10
@kkourt kkourt self-requested a review May 11, 2026 16:40
@mtardy mtardy self-requested a review May 11, 2026 16:41
Copy link
Copy Markdown
Member

@mtardy mtardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this seems like a very nice cleanup, I have some questions, not sure I followed everything :)!!

Comment thread cmd/tetragon/main_k8s.go
}

func initK8sPolicyWatcher(ctx context.Context) error {
func initK8sPolicyWatcher(_ context.Context) error {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe remove the arg completely? the nok8s version is doing nothing


// Reconcile drives Add/Delete on sensors.Manager from the live state of the
// TracingPolicyNamespaced CR.
func (r *TracingPolicyNamespacedReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you share the code between the two reconcilers? you could also probably share the unit tests so we don't need to have everything twice?

Maybe having integrations tests for both is good on the other side like we have now.

Comment thread pkg/manager/podevents.go
handler(pod)
},
}); err != nil {
logger.GetLogger().Warn("PodEventSource: failed to register Add handler", logfields.Error, err)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we actually bubble errors from there? Is it possible to change the interface so that you bubble from here to RegisterPodDeleteHandler to the main initialization function?

I don't see the point of logging here, I think we should error if it's used during initialization.

Comment thread pkg/cgidmap/register.go
Comment on lines +19 to +27
// PodEventSource is the narrow capability cgidmap needs to keep its pod→
// container-cgroup mappings up to date. The concrete adapter lives in
// pkg/manager; tests can pass a hand-rolled fake satisfying the same
// interface.
type PodEventSource interface {
OnPodAdd(handler func(pod *corev1.Pod))
OnPodUpdate(handler func(oldPod, newPod *corev1.Pod))
OnPodDelete(handler func(pod *corev1.Pod))
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we redefine one if it's exactly the same set of needed methods as pkg/manager/podevents.go.PodEventSource? I think I understood for the other one as it's only using OnPodDelete but not sure to follow here

// registerCgidmapPodHandlers wires cgidmap into the supplied pod-event source
// when the cgidmap feature is enabled. cgidmap is unavailable on Windows, so
// this function is built only for non-Windows targets.
func registerCgidmapPodHandlers(events manager.PodEventSource) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I know the naming is all over the place but you could follow the existing EnableCgIDmap case thing

Comment thread pkg/cgidmap/register.go
Comment on lines +41 to +42
logger.GetLogger().Warn("failed to retrieve cgidmap, not registering pod handlers", logfields.Error, err)
return err
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the logging + returning the error? Maybe just return fmt.Errorf("...")?

Comment thread pkg/policyfilter/state.go
return
}
// onPodAdd handles a new pod observed by the cluster. The handler refreshes
// pod→policy mappings from the pod's current labels and containers.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I've seen in some other comments you've been using non ascii characters, I'd do simple -> if that's possible

log.Info("adding tracing policy", "info", tp.TpInfo())
if addErr := r.Sensors.AddTracingPolicy(ctx, tp); addErr != nil {
log.Warn("adding tracing policy failed", logfields.Error, addErr)
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: so we reuse the same pattern as previously, not retrying, should we now retry since we use the reconciler or not? maybe we should try to install the TP as long as the user deployed it? idk if in some situations it might be actually useful or not, I imagine the reason why it errored would not change over time in this case.

Comment on lines +19 to +26
// PodEventSource is the narrow capability policyfilter needs to consume pod
// lifecycle events. The concrete adapter lives in pkg/manager; tests can pass
// a hand-rolled fake satisfying the same interface.
type PodEventSource interface {
OnPodAdd(handler func(pod *v1.Pod))
OnPodUpdate(handler func(oldPod, newPod *v1.Pod))
OnPodDelete(handler func(pod *v1.Pod))
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another copy, maybe we want to reuse the og?

Comment thread pkg/manager/manager.go
@@ -370,10 +394,17 @@ func (cm *ControllerManager) addPodInformer() error {
if err != nil {
return nil
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre-existing bug it seems

Suggested change
return nil
return err

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note/misc This PR makes changes that have no direct user impact.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants