k8s: Migrate to controller-runtime reconcilers and abstract informer pattern#4977
k8s: Migrate to controller-runtime reconcilers and abstract informer pattern#4977sayboras wants to merge 6 commits into
Conversation
79da49d to
b3cece3
Compare
b3cece3 to
af9079e
Compare
The previous hand-rolled informer gated agent startup on WaitCRDs for both TracingPolicy CRDs at once, so the cluster-scoped reconciler would not run if the namespaced CRD was absent (and vice versa). Each Reconciler is now registered independently when its own CRD becomes available, via a new RegisterControllerWhenCRDReady helper. Also picks up GenerationChangedPredicate so status-only updates no longer trigger reconciliation loops. Signed-off-by: Tam Mach <tam.mach@cilium.io>
Replaces the global init-time podhooks callback registry with a typed PodEventSource passed via constructor injection. This removes the implicit "InstallHooks must run before X" ordering constraint that only failed at runtime, and centralises the cache.DeletedFinalStateUnknown tombstone unwrap so each consumer doesn't repeat it. Signed-off-by: Tam Mach <tam.mach@cilium.io>
This commit is to drop the podhooks-callback + AddEventHandler dance in favour of a single OnPodDelete call. Signed-off-by: Tam Mach <tam.mach@cilium.io>
Replaces cgidmap's init-time podhooks registration with a Register function called explicitly from main, gated by EnableCgIDmap at the call site instead of being silently wired in init(). This makes the feature flag the single source of truth for whether cgidmap observes pod events, and lets unit tests pass a fake PodEventSource instead of relying on the global podhooks registry. Signed-off-by: Tam Mach <tam.mach@cilium.io>
Drops the last consumer of the global init-time podhooks registry. policyfilter no longer reaches into a process-global pod informer at startup; it now receives a PodEventSource explicitly from main and registers its own typed handlers. The k8s_test suite is rewritten around a fake PodEventSource, which also removes the dependency on client-go/kubernetes/fake. Also removes pkg/podhooks entirely and the now-unused podhooks call in pkg/manager. Signed-off-by: Tam Mach <tam.mach@cilium.io>
No longer imported after the policyfilter test rewrite. Regenerated via go mod vendor. Signed-off-by: Tam Mach <tam.mach@cilium.io>
af9079e to
1fb25f1
Compare
mtardy
left a comment
There was a problem hiding this comment.
Thanks, this seems like a very nice cleanup, I have some questions, not sure I followed everything :)!!
| } | ||
|
|
||
| func initK8sPolicyWatcher(ctx context.Context) error { | ||
| func initK8sPolicyWatcher(_ context.Context) error { |
There was a problem hiding this comment.
maybe remove the arg completely? the nok8s version is doing nothing
|
|
||
| // Reconcile drives Add/Delete on sensors.Manager from the live state of the | ||
| // TracingPolicyNamespaced CR. | ||
| func (r *TracingPolicyNamespacedReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) { |
There was a problem hiding this comment.
could you share the code between the two reconcilers? you could also probably share the unit tests so we don't need to have everything twice?
Maybe having integrations tests for both is good on the other side like we have now.
| handler(pod) | ||
| }, | ||
| }); err != nil { | ||
| logger.GetLogger().Warn("PodEventSource: failed to register Add handler", logfields.Error, err) |
There was a problem hiding this comment.
should we actually bubble errors from there? Is it possible to change the interface so that you bubble from here to RegisterPodDeleteHandler to the main initialization function?
I don't see the point of logging here, I think we should error if it's used during initialization.
| // PodEventSource is the narrow capability cgidmap needs to keep its pod→ | ||
| // container-cgroup mappings up to date. The concrete adapter lives in | ||
| // pkg/manager; tests can pass a hand-rolled fake satisfying the same | ||
| // interface. | ||
| type PodEventSource interface { | ||
| OnPodAdd(handler func(pod *corev1.Pod)) | ||
| OnPodUpdate(handler func(oldPod, newPod *corev1.Pod)) | ||
| OnPodDelete(handler func(pod *corev1.Pod)) | ||
| } |
There was a problem hiding this comment.
why do we redefine one if it's exactly the same set of needed methods as pkg/manager/podevents.go.PodEventSource? I think I understood for the other one as it's only using OnPodDelete but not sure to follow here
| // registerCgidmapPodHandlers wires cgidmap into the supplied pod-event source | ||
| // when the cgidmap feature is enabled. cgidmap is unavailable on Windows, so | ||
| // this function is built only for non-Windows targets. | ||
| func registerCgidmapPodHandlers(events manager.PodEventSource) { |
There was a problem hiding this comment.
nit: I know the naming is all over the place but you could follow the existing EnableCgIDmap case thing
| logger.GetLogger().Warn("failed to retrieve cgidmap, not registering pod handlers", logfields.Error, err) | ||
| return err |
There was a problem hiding this comment.
why the logging + returning the error? Maybe just return fmt.Errorf("...")?
| return | ||
| } | ||
| // onPodAdd handles a new pod observed by the cluster. The handler refreshes | ||
| // pod→policy mappings from the pod's current labels and containers. |
There was a problem hiding this comment.
nit: I've seen in some other comments you've been using non ascii characters, I'd do simple -> if that's possible
| log.Info("adding tracing policy", "info", tp.TpInfo()) | ||
| if addErr := r.Sensors.AddTracingPolicy(ctx, tp); addErr != nil { | ||
| log.Warn("adding tracing policy failed", logfields.Error, addErr) | ||
| } |
There was a problem hiding this comment.
question: so we reuse the same pattern as previously, not retrying, should we now retry since we use the reconciler or not? maybe we should try to install the TP as long as the user deployed it? idk if in some situations it might be actually useful or not, I imagine the reason why it errored would not change over time in this case.
| // PodEventSource is the narrow capability policyfilter needs to consume pod | ||
| // lifecycle events. The concrete adapter lives in pkg/manager; tests can pass | ||
| // a hand-rolled fake satisfying the same interface. | ||
| type PodEventSource interface { | ||
| OnPodAdd(handler func(pod *v1.Pod)) | ||
| OnPodUpdate(handler func(oldPod, newPod *v1.Pod)) | ||
| OnPodDelete(handler func(pod *v1.Pod)) | ||
| } |
There was a problem hiding this comment.
another copy, maybe we want to reuse the og?
| @@ -370,10 +394,17 @@ func (cm *ControllerManager) addPodInformer() error { | |||
| if err != nil { | |||
| return nil | |||
There was a problem hiding this comment.
pre-existing bug it seems
| return nil | |
| return err |
Description
The main goal is to leverage controller-runtime framework for reconciling resources. Additionally, this PR is also to hide client-go informer from consumer, with one benefit of avoiding init() registration.
Changelog