grpc: Support mTLS for tetragon grpc server #4950
fe2e84c to
ebbb9d4
Compare
kkourt
left a comment
Thanks!
Please find some comments below.
	Debug(msg string, args ...any)
}

func run(ctx context.Context, w *fsnotify.Watcher, tracked []string, log slogger, r *Reloader) {
I can't wrap my head around this function. Can we add some comments to make its intent clearer?
Sure. The intent is to debounce reloads: when multiple events arrive in quick succession, the reload runs only once, after the timer fires with no further activity for debounceWindow.
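A minimal sketch of the debounce behavior described in this reply, assuming events arrive on a plain channel rather than from fsnotify. The names `debounce` and `burst` are illustrative and are not taken from the PR's code.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// debounce calls reload once the events channel has been quiet for window.
// Every new event restarts the timer, so a burst collapses into one reload.
// Hypothetical helper: the PR's run() consumes fsnotify events instead.
func debounce(ctx context.Context, events <-chan struct{}, window time.Duration, reload func()) {
	var timer *time.Timer
	var fire <-chan time.Time // nil (blocks forever) until an event arms the timer
	for {
		select {
		case <-ctx.Done():
			if timer != nil {
				timer.Stop()
			}
			return
		case _, ok := <-events:
			if !ok {
				return
			}
			// Drop the pending timer and start a fresh one.
			if timer != nil {
				timer.Stop()
			}
			timer = time.NewTimer(window)
			fire = timer.C
		case <-fire:
			fire = nil
			reload()
		}
	}
}

// burst sends n events gapMs apart and reports how many reloads ran.
func burst(n, gapMs, windowMs int) int {
	events := make(chan struct{})
	reloads := make(chan struct{}, n)
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	window := time.Duration(windowMs) * time.Millisecond
	go debounce(ctx, events, window, func() { reloads <- struct{}{} })
	for i := 0; i < n; i++ {
		events <- struct{}{}
		time.Sleep(time.Duration(gapMs) * time.Millisecond)
	}
	time.Sleep(4 * window) // give the final timer time to fire
	return len(reloads)
}

func main() {
	fmt.Println("reloads after a tight burst:", burst(5, 5, 50))
}
```

Allocating a new timer per event sidesteps the drain-before-Reset subtleties of reusing a single `time.Timer`, at the cost of a small allocation per event.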
// "" disables gRPC; "unix://X" runs a single plaintext listener at X;
// on Linux, a TCP address runs that listener plus an in-pod plaintext
// unix socket at the platform default path, with --server-tls-* gating
// the TCP path. Windows skips the sidecar unix listener.
If I understand correctly, the service on the Unix domain socket runs implicitly whenever a TCP listen address is specified? That seems too convoluted, and I think it means there is no way to run a TCP (TLS or otherwise) gRPC server alone, without a unix socket.
Here are a couple of alternatives for discussion:
1. Allow multiple instances of --server-address (so we can have one for unix, one for TCP).
2. Add a --tls-server-address for specifying a TCP listening address for the TLS endpoint.
3. makes more sense to me.
Yes, your understanding is correct. My main reason is to avoid breaking the in-pod tetra command, which uses trusted IPC on the local host.
Re 1) I thought about option 1, but then we would need TLS settings for each server address as well. I think that is overkill, and it could introduce a breaking change in Helm (or complicate the Helm config with a two-phase rollout).
Re 2) Option 2 can work with another Helm flag, grpc.tls.address. I hadn't thought about this approach, to be honest. My only concern is that the non-TLS address could then be a TCP address, which is something of a workaround and less secure when TLS is also enabled.
Re 3) Apart from avoiding any breaking change, this is how we did it in Hubble: the unix domain socket is configured automatically in addition to the TLS listener.
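For readers following this thread, the address handling described in the code comment under review can be sketched roughly as below. This is an illustration built only from that comment: `planListeners`, `listener`, and `describe` are hypothetical names, and the default socket path is assumed from the commit messages in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultUnixPath is assumed from the commit messages; the real agent
// uses a platform-specific default.
const defaultUnixPath = "/var/run/tetragon/tetragon.sock"

// listener is a hypothetical description of one gRPC listener.
type listener struct {
	Network string // "unix" or "tcp"
	Address string
	TLS     bool
}

// planListeners maps a --server-address value to the listeners the
// reviewed comment describes.
func planListeners(serverAddress, goos string, tlsConfigured bool) []listener {
	switch {
	case serverAddress == "":
		return nil // gRPC disabled
	case strings.HasPrefix(serverAddress, "unix://"):
		// A single plaintext unix listener at the given path.
		path := strings.TrimPrefix(serverAddress, "unix://")
		return []listener{{Network: "unix", Address: path}}
	}
	// TCP listener, gated by TLS when the --server-tls-* flags are set.
	plan := []listener{{Network: "tcp", Address: serverAddress, TLS: tlsConfigured}}
	if goos == "linux" {
		// Implicit in-pod plaintext sidecar listener.
		plan = append(plan, listener{Network: "unix", Address: defaultUnixPath})
	}
	return plan
}

// describe renders a plan as a space-separated list of URLs.
func describe(plan []listener) string {
	parts := make([]string, 0, len(plan))
	for _, l := range plan {
		scheme := l.Network
		if l.TLS {
			scheme += "+tls"
		}
		parts = append(parts, scheme+"://"+l.Address)
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Println(describe(planListeners("localhost:54321", "linux", true)))
}
```

The sketch makes the reviewer's concern concrete: under this mapping no input on Linux yields a TCP listener without the accompanying unix socket.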
Add a small package that builds a *tls.Config for the Tetragon gRPC server and supports zero-downtime cert rotation via an fsnotify-driven background watcher. The reloader keeps cert/key and (for mTLS) the client CA bundle on hot-reloadable storage so rotating cert material does not require an agent restart. This package is adapted from the cilium certloader package.
Signed-off-by: Tam Mach <tam.mach@cilium.io>
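The hot-reload idea in this commit can be illustrated with the standard library alone: the `*tls.Config` looks the certificate up on every handshake, so swapping the stored certificate takes effect without restarting the server. This is a sketch under assumptions, not the PR's package: `certStore`, `Swap`, and the self-signed helper are hypothetical, and the fsnotify watcher that would call `Swap` is omitted.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"math/big"
	"sync/atomic"
	"time"
)

// certStore is a hypothetical stand-in for the reloader: it holds the
// current keypair behind an atomic pointer.
type certStore struct {
	current atomic.Pointer[tls.Certificate]
}

// Swap installs a new keypair; a file watcher would call this after a reload.
func (s *certStore) Swap(c *tls.Certificate) { s.current.Store(c) }

// ServerConfig returns a config that resolves the certificate per handshake.
func (s *certStore) ServerConfig() *tls.Config {
	return &tls.Config{
		GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
			c := s.current.Load()
			if c == nil {
				return nil, errors.New("no certificate loaded")
			}
			return c, nil
		},
	}
}

// selfSigned builds a throwaway certificate for the demo only.
func selfSigned(cn string) *tls.Certificate {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(time.Now().UnixNano()),
		Subject:      pkix.Name{CommonName: cn},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	leaf, err := x509.ParseCertificate(der)
	if err != nil {
		panic(err)
	}
	return &tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key, Leaf: leaf}
}

// servedCN reports which certificate a new handshake would receive.
func servedCN(cfg *tls.Config) string {
	c, err := cfg.GetCertificate(&tls.ClientHelloInfo{})
	if err != nil {
		return ""
	}
	return c.Leaf.Subject.CommonName
}

// rotateDemo swaps certificates under a single, long-lived config.
func rotateDemo() (before, after string) {
	s := &certStore{}
	cfg := s.ServerConfig()
	s.Swap(selfSigned("old"))
	before = servedCN(cfg)
	s.Swap(selfSigned("new"))
	after = servedCN(cfg)
	return before, after
}

func main() {
	before, after := rotateDemo()
	fmt.Println(before, "->", after)
}
```

For mTLS, the client CA pool could be made reloadable in a similar way through the `GetConfigForClient` callback on `tls.Config`.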
Add four flags that operators set on the agent and that the Helm chart will render:
- --server-tls-cert-file
- --server-tls-key-file
- --server-tls-client-ca-files
- --server-tls-require-client-cert
Validation enforces that cert-file and key-file are set together, that require-client-cert demands at least one client CA bundle, and that client CA files cannot be supplied without enabling client cert verification.
Signed-off-by: Tam Mach <tam.mach@cilium.io>
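The three validation rules in this commit message can be sketched as follows. The struct and method names are hypothetical and the error wording is illustrative; only the flag names and the rules themselves come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// serverTLSFlags mirrors the four flags listed above (hypothetical type).
type serverTLSFlags struct {
	CertFile          string   // --server-tls-cert-file
	KeyFile           string   // --server-tls-key-file
	ClientCAFiles     []string // --server-tls-client-ca-files
	RequireClientCert bool     // --server-tls-require-client-cert
}

// validate applies the three rules stated in the commit message.
func (f serverTLSFlags) validate() error {
	// Rule 1: cert and key are set together or not at all.
	if (f.CertFile == "") != (f.KeyFile == "") {
		return errors.New("--server-tls-cert-file and --server-tls-key-file must be set together")
	}
	// Rule 2: requiring client certs needs at least one CA bundle.
	if f.RequireClientCert && len(f.ClientCAFiles) == 0 {
		return errors.New("--server-tls-require-client-cert needs at least one --server-tls-client-ca-files entry")
	}
	// Rule 3: CA bundles are meaningless unless client certs are verified.
	if !f.RequireClientCert && len(f.ClientCAFiles) > 0 {
		return errors.New("--server-tls-client-ca-files needs --server-tls-require-client-cert")
	}
	return nil
}

func main() {
	f := serverTLSFlags{CertFile: "tls.crt"}
	fmt.Println(f.validate())
}
```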
When the agent configures --server-tls-cert-file (and friends), build a certloader reloader, start its watcher, and pass it as grpc.Creds() to the gRPC server.
Previously the agent ran a single gRPC listener determined by --server-address, so enabling TCP+TLS made the unix socket disappear and in-pod tetra had to ship cert material into the pod just to talk to the local agent. Run an always-on plaintext unix listener at /var/run/tetragon/tetragon.sock for trusted in-pod IPC, plus an opt-in TCP listener via --server-address that carries TLS / mTLS when configured.
Legacy --server-address values pointing at a unix path are honored as a no-op (the unix listener is now fixed) with a warning when the path differs from the default.
Signed-off-by: Tam Mach <tam.mach@cilium.io>
In-pod tooling (`kubectl exec $POD -- tetra getevents`) reads the server address from /var/run/tetragon/tetragon-info.json and dials it plaintext. With the dual-listener layout the unix socket is always-on while the TCP listener is opt-in and may now require TLS / mTLS, so saving --server-address verbatim makes in-pod tetra dial a TLS-gated TCP endpoint without credentials and fail with "error reading server preface: EOF".
Always advertise the unix path so in-pod tetra picks the trusted IPC channel by default. The TCP address is still configurable via --server-address; users who explicitly want tetra to dial it can pass --server-address themselves.
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Add five persistent flags on the tetra root command so the CLI can dial agents that have the TCP gRPC listener. Setting any of them switches NewClient from insecure credentials to TLS 1.3 credentials built from the supplied material. Without any --tls-* flag the existing plaintext path is unchanged, so `tetra` over the unix socket keeps working as before.
Signed-off-by: Tam Mach <tam.mach@cilium.io>
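A sketch of the switch this commit describes: no TLS option set means plaintext, and any TLS option set means a TLS 1.3 config. The commit message does not name the five flags, so the option fields below are hypothetical, as are `clientTLSConfig` and `mode`. The sketch uses only the standard library; with grpc-go, a nil result would presumably map to `insecure.NewCredentials()` and a non-nil one to `credentials.NewTLS(cfg)`.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// clientTLSOptions is a hypothetical stand-in for the tetra --tls-* flags.
type clientTLSOptions struct {
	Enable     bool
	CAFile     string
	CertFile   string
	KeyFile    string
	ServerName string
}

// clientTLSConfig returns nil when no TLS option is set (plaintext path),
// and a TLS 1.3 config built from the supplied material otherwise.
func clientTLSConfig(o clientTLSOptions) (*tls.Config, error) {
	if o == (clientTLSOptions{}) {
		return nil, nil
	}
	if (o.CertFile == "") != (o.KeyFile == "") {
		return nil, errors.New("client cert and key must be set together")
	}
	cfg := &tls.Config{MinVersion: tls.VersionTLS13, ServerName: o.ServerName}
	if o.CAFile != "" {
		pem, err := os.ReadFile(o.CAFile)
		if err != nil {
			return nil, err
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(pem) {
			return nil, fmt.Errorf("no certificates found in %s", o.CAFile)
		}
		cfg.RootCAs = pool
	}
	if o.CertFile != "" {
		// Client keypair, needed only when the server requires mTLS.
		cert, err := tls.LoadX509KeyPair(o.CertFile, o.KeyFile)
		if err != nil {
			return nil, err
		}
		cfg.Certificates = []tls.Certificate{cert}
	}
	return cfg, nil
}

// mode summarizes which dial path the options select.
func mode(o clientTLSOptions) string {
	cfg, err := clientTLSConfig(o)
	switch {
	case err != nil:
		return "error: " + err.Error()
	case cfg == nil:
		return "plaintext"
	case cfg.MinVersion == tls.VersionTLS13:
		return "tls1.3"
	default:
		return "tls"
	}
}

func main() {
	fmt.Println(mode(clientTLSOptions{}))
	fmt.Println(mode(clientTLSOptions{ServerName: "tetragon.example"}))
}
```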
Wire the TLS flags through the Helm chart and offer the same provisioning modes as the cilium/cilium hubble TLS workflow:
- tetragon.grpc.tls.auto.method=helm (default)
- tetragon.grpc.tls.auto.method=cronJob (cilium-certgen)
- tetragon.grpc.tls.auto.method=certmanager (cert-manager Certificate)
- tetragon.grpc.tls.auto.enabled=false (user-supplied existingSecret)
The agent always serves an unauthenticated unix-domain listener for in-pod tooling; these settings only apply to the optional TCP listener configured via tetragon.grpc.address. The configmap renders --server-tls-cert-file / --server-tls-key-file (plus the require/CA flags when requireClientCert is true), and the daemonset mounts the provisioned (or user-supplied) Secret read-only at /var/lib/tetragon/tls.
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Add e2e tests for gRPC server with mTLS:
- valid client cert dials and calls GetVersion successfully,
- plaintext client is rejected,
- anonymous TLS client is rejected.
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Description
This PR adds TLS/mTLS support for the Tetragon gRPC server. The following points are worth highlighting:
Changelog