grpc: Support mTLS for tetragon grpc server#4950

Open: sayboras wants to merge 8 commits into main from pr/tammach/mtls


Member

@sayboras sayboras commented May 5, 2026

Description

This PR adds TLS/mTLS support for the Tetragon gRPC server. Two points are worth highlighting:

  • gRPC TLS is disabled by default, mainly to keep upgrades safe for existing installations that are already configured with TCP (instead of a unix socket).
  • Dual listeners are used when TCP + TLS is configured, i.e. the unix domain socket is enabled as a second listener. This avoids any potential issue with the in-pod tetra command.

Changelog

grpc: Support mTLS for tetragon grpc server

@sayboras sayboras changed the title from "Pr/tammach/mtls" to "grpc: Support mTLS for tetragon grpc server" May 5, 2026
@sayboras sayboras force-pushed the pr/tammach/mtls branch from 2cc447f to ac01283 Compare May 5, 2026 12:21

netlify Bot commented May 5, 2026

Deploy Preview for tetragon ready!

Name Link
🔨 Latest commit 81344a8
🔍 Latest deploy log https://app.netlify.com/projects/tetragon/deploys/6a01bc913899ad0008bfdfbd
😎 Deploy Preview https://deploy-preview-4950--tetragon.netlify.app

@sayboras sayboras added the release-note/minor label (This PR introduces a minor user-visible change) May 5, 2026
@sayboras sayboras force-pushed the pr/tammach/mtls branch 8 times, most recently from fe2e84c to ebbb9d4 Compare May 6, 2026 10:27
@sayboras sayboras marked this pull request as ready for review May 6, 2026 11:38
@sayboras sayboras requested review from a team and mtardy as code owners May 6, 2026 11:38
Comment thread cmd/tetragon/main.go
Contributor

@kkourt kkourt left a comment

Thanks!

Please find some comments below.

Comment thread pkg/certloader/watcher.go Outdated
Comment thread pkg/certloader/watcher.go
Debug(msg string, args ...any)
}

func run(ctx context.Context, w *fsnotify.Watcher, tracked []string, log slogger, r *Reloader) {
Contributor

I can't wrap my head around this function. Can we add some comments to make its intention clearer?

Member Author

Sure. My intention is to make sure the reload runs only once, when the timer fires after debounceWindow has passed without further activity, in case multiple events happen in quick succession.
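
A minimal sketch of the debounce behaviour described above, not the code in pkg/certloader/watcher.go. The names debounce, reloadsAfterBurst and demoWindow are assumptions made for illustration, and a plain channel stands in for the fsnotify event stream.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// demoWindow is an illustrative quiet period, not the PR's debounceWindow value.
const demoWindow = 50 * time.Millisecond

// debounce calls reload once events have been quiet for window.
// Every new event restarts the timer, so a burst collapses into one reload.
func debounce(ctx context.Context, events <-chan struct{}, window time.Duration, reload func()) {
	var timer *time.Timer
	var fire <-chan time.Time // nil (blocks forever) until the first event arrives
	defer func() {
		if timer != nil {
			timer.Stop()
		}
	}()
	for {
		select {
		case <-ctx.Done():
			return
		case _, ok := <-events:
			if !ok {
				return
			}
			if timer != nil {
				timer.Stop()
			}
			// Start a fresh timer; the previous one is abandoned.
			timer = time.NewTimer(window)
			fire = timer.C
		case <-fire:
			fire = nil
			reload()
		}
	}
}

// reloadsAfterBurst sends n back-to-back events and reports how many reloads ran.
func reloadsAfterBurst(n int) int32 {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	events := make(chan struct{})
	var reloads atomic.Int32
	done := make(chan struct{})
	go func() {
		defer close(done)
		debounce(ctx, events, demoWindow, func() { reloads.Add(1) })
	}()
	for i := 0; i < n; i++ {
		events <- struct{}{}
	}
	time.Sleep(5 * demoWindow) // let the quiet period elapse
	cancel()
	<-done
	return reloads.Load()
}

func main() {
	fmt.Println("reloads after a burst of 3 events:", reloadsAfterBurst(3))
}
```

A burst such as a Secret update touching the cert, key and CA files in sequence would trigger one reload instead of three under this scheme.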

Comment thread pkg/certloader/watcher.go Outdated
Comment thread cmd/tetragon/main.go
// "" disables gRPC; "unix://X" runs a single plaintext listener at X;
// on Linux, a TCP address runs that listener plus an in-pod plaintext
// unix socket at the platform default path, with --server-tls-* gating
// the TCP path. Windows skips the sidecar unix listener.
Contributor

If I understand correctly, the service on the Unix domain socket will run implicitly when the listen address is specified? This seems too convoluted and, I think, it means that there is no way to run a TCP (TLS or otherwise) gRPC server alone (without a unix socket)?

Here are a couple of alternatives for discussion:

  1. allow for multiple instances of --server-address (so we can have one for unix, one for TCP)

  2. add a --tls-server-address for specifying a TCP listening address for the TLS endpoint.

  3. makes more sense to me.

Member Author

@sayboras sayboras May 13, 2026

Yes, your understanding is correct. My main reason is to avoid any breakage with the in-pod tetra command, which uses trusted IPC on the local host.

Re 1) I thought about option 1, but then we would need TLS settings for each server address as well. I think it's a bit overkill and could introduce a breaking change in Helm (or make the Helm config more complicated with a two-phase rollout).

Re 2) Option 2 can work with another Helm value, grpc.tls.address. I didn't think about this approach, to be honest. The only concern I have is that the non-TLS address could then be TCP, which is kind of a workaround and less secure if you also have TLS enabled.

Re 3) Apart from avoiding any breaking change, it's actually how we did it with Hubble, i.e. the unix domain socket is configured automatically in addition to the TLS listener.

sayboras added 6 commits May 13, 2026 23:44
Add a small package that builds a *tls.Config for the Tetragon gRPC
server and supports zero-downtime cert rotation via an fsnotify-driven
background watcher. The reloader keeps cert/key and (for mTLS) the
client CA bundle on hot-reloadable storage so rotating cert material
does not require an agent restart. This package is adapted from the cilium
certloader package.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
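
One common way to get this kind of hot reload with the Go standard library is to resolve the TLS material per handshake through tls.Config.GetConfigForClient. The sketch below shows that pattern only; the reloader, material, placeholder and probe names are hypothetical, the use of tls.RequireAndVerifyClientCert is an assumption, and the actual package in this PR may be structured differently.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"sync/atomic"
)

// material is one consistent snapshot of the server's TLS inputs.
type material struct {
	cert      tls.Certificate
	clientCAs *x509.CertPool // nil means client certificates are not required
}

// reloader is a hypothetical stand-in for the PR's certloader type.
type reloader struct {
	current atomic.Pointer[material]
}

// Swap publishes new material; a file watcher would call this after a reload.
func (r *reloader) Swap(m *material) { r.current.Store(m) }

// ServerConfig returns a tls.Config that consults the latest snapshot on
// every handshake, so rotated material applies to new connections.
func (r *reloader) ServerConfig() *tls.Config {
	return &tls.Config{
		GetConfigForClient: func(*tls.ClientHelloInfo) (*tls.Config, error) {
			m := r.current.Load()
			if m == nil {
				return nil, errors.New("no TLS material loaded yet")
			}
			cfg := &tls.Config{Certificates: []tls.Certificate{m.cert}}
			if m.clientCAs != nil {
				cfg.ClientCAs = m.clientCAs
				cfg.ClientAuth = tls.RequireAndVerifyClientCert // assumption for mTLS
			}
			return cfg, nil
		},
	}
}

// placeholder builds fake material tagged with marker. Real code would use
// tls.LoadX509KeyPair and (*x509.CertPool).AppendCertsFromPEM on the files.
func placeholder(marker byte, mtls bool) *material {
	m := &material{cert: tls.Certificate{Certificate: [][]byte{{marker}}}}
	if mtls {
		m.clientCAs = x509.NewCertPool()
	}
	return m
}

// probe reports which snapshot a new handshake would see.
func probe(cfg *tls.Config) (marker byte, requiresClientCert bool, err error) {
	c, err := cfg.GetConfigForClient(nil)
	if err != nil {
		return 0, false, err
	}
	return c.Certificates[0].Certificate[0][0], c.ClientAuth == tls.RequireAndVerifyClientCert, nil
}

func main() {
	r := &reloader{}
	cfg := r.ServerConfig()
	r.Swap(placeholder(1, false))
	fmt.Println(probe(cfg))
	r.Swap(placeholder(2, true)) // simulated rotation, now with a client CA pool
	fmt.Println(probe(cfg))
}
```

Because the gRPC server holds the outer tls.Config for its whole lifetime, swapping the snapshot is enough for later handshakes to pick up rotated material without restarting the listener.
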
Add four flags that operators set on the agent and that the Helm chart
will render:

--server-tls-cert-file
--server-tls-key-file
--server-tls-client-ca-files
--server-tls-require-client-cert

Validation enforces that cert-file and key-file are set together, that
require-client-cert demands at least one client CA bundle, and that
client CA files cannot be supplied without enabling client cert
verification.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
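
The three validation rules in this commit message can be sketched as follows. The flag names come from the message above; the serverTLSFlags struct, the validate method and the error strings are assumptions for illustration and may not match the PR's code.

```go
package main

import (
	"errors"
	"fmt"
)

// serverTLSFlags mirrors the four agent flags listed in the commit message.
type serverTLSFlags struct {
	CertFile          string   // --server-tls-cert-file
	KeyFile           string   // --server-tls-key-file
	ClientCAFiles     []string // --server-tls-client-ca-files
	RequireClientCert bool     // --server-tls-require-client-cert
}

// validate applies the three rules described in the commit message.
func (f serverTLSFlags) validate() error {
	// Rule 1: cert and key must be set together.
	if (f.CertFile == "") != (f.KeyFile == "") {
		return errors.New("--server-tls-cert-file and --server-tls-key-file must be set together")
	}
	// Rule 2: requiring client certs needs at least one client CA bundle.
	if f.RequireClientCert && len(f.ClientCAFiles) == 0 {
		return errors.New("--server-tls-require-client-cert needs --server-tls-client-ca-files")
	}
	// Rule 3: client CA files are only meaningful with client cert verification.
	if len(f.ClientCAFiles) > 0 && !f.RequireClientCert {
		return errors.New("--server-tls-client-ca-files needs --server-tls-require-client-cert")
	}
	return nil
}

func main() {
	// File names are placeholders.
	ok := serverTLSFlags{
		CertFile:          "tls.crt",
		KeyFile:           "tls.key",
		ClientCAFiles:     []string{"ca.crt"},
		RequireClientCert: true,
	}
	fmt.Println("mTLS config:", ok.validate())
	fmt.Println("cert without key:", serverTLSFlags{CertFile: "tls.crt"}.validate())
}
```
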
When the agent configures --server-tls-cert-file (and friends), build a
certloader reloader, start its watcher, and pass it as grpc.Creds() to the
gRPC server.

Previously the agent ran a single gRPC listener determined by
--server-address, so enabling TCP+TLS made the unix socket disappear and
in-pod tetra had to ship cert material into the pod just to talk to the
local agent. Run an always-on plaintext unix listener at
/var/run/tetragon/tetragon.sock for trusted in-pod IPC, plus an
opt-in TCP listener via --server-address that carries TLS / mTLS when
configured. Legacy --server-address values pointing at a unix path are
honored as a no-op (the unix listener is now fixed) with a warning when
the path differs from the default.

Signed-off-by: Tam Mach <tam.mach@cilium.io>

In-pod tooling (`kubectl exec $POD -- tetra getevents`) reads the server
address from /var/run/tetragon/tetragon-info.json and dials it
plaintext. With the dual-listener layout the unix socket is always-on
while the TCP listener is opt-in and may now require TLS / mTLS, so
saving --server-address verbatim makes in-pod tetra dial a TLS-gated
TCP endpoint without credentials and fail with "error reading server
preface: EOF".

Always advertise the unix path so in-pod tetra picks the trusted IPC
channel by default. The TCP address is still configurable via
--server-address; users who explicitly want tetra to dial it can pass
--server-address themselves.

Signed-off-by: Tam Mach <tam.mach@cilium.io>

Add five persistent flags on the tetra root command so the CLI can dial
agents that have the TCP gRPC listener. Setting any of them switches
NewClient from insecure credentials to TLS 1.3 credentials built from
the supplied material. Without any --tls-* flag the existing plaintext
path is unchanged, so `tetra` over the unix socket keeps working as
before.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
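
A rough sketch of the client-side switch described in this commit message: no TLS option set means plaintext, any option set means a TLS 1.3 config. The five flags are not named in this conversation, so the clientTLSOptions fields and the clientTLSConfig helper below are hypothetical, and the sketch covers only four illustrative inputs.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// clientTLSOptions uses hypothetical field names for the CLI's TLS inputs.
type clientTLSOptions struct {
	CAFile     string // CA bundle used to verify the server
	CertFile   string // client certificate, for mTLS
	KeyFile    string // client private key, for mTLS
	ServerName string // expected server name in the certificate
}

// clientTLSConfig returns (nil, nil) when no option is set, meaning the
// caller should keep using insecure (plaintext) credentials.
func clientTLSConfig(o clientTLSOptions) (*tls.Config, error) {
	if o == (clientTLSOptions{}) {
		return nil, nil
	}
	if (o.CertFile == "") != (o.KeyFile == "") {
		return nil, errors.New("client cert and key must be set together")
	}
	cfg := &tls.Config{MinVersion: tls.VersionTLS13, ServerName: o.ServerName}
	if o.CAFile != "" {
		pem, err := os.ReadFile(o.CAFile)
		if err != nil {
			return nil, fmt.Errorf("read CA bundle: %w", err)
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(pem) {
			return nil, fmt.Errorf("no certificates found in %s", o.CAFile)
		}
		cfg.RootCAs = pool
	}
	if o.CertFile != "" {
		cert, err := tls.LoadX509KeyPair(o.CertFile, o.KeyFile)
		if err != nil {
			return nil, fmt.Errorf("load client keypair: %w", err)
		}
		cfg.Certificates = []tls.Certificate{cert}
	}
	return cfg, nil
}

func main() {
	cfg, err := clientTLSConfig(clientTLSOptions{})
	fmt.Println("no options, plaintext:", cfg == nil, err)
	// "tetragon.example" is a placeholder server name.
	cfg, err = clientTLSConfig(clientTLSOptions{ServerName: "tetragon.example"})
	fmt.Println("TLS enabled:", cfg != nil, err)
}
```

With gRPC-Go, a non-nil result would typically be wrapped with credentials.NewTLS and passed as a dial option, while a nil result keeps the insecure credentials used for the unix socket.
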
Wire the TLS flags through the Helm chart and offer the same
provisioning modes as the cilium/cilium hubble TLS workflow:

- tetragon.grpc.tls.auto.method=helm (default)
- tetragon.grpc.tls.auto.method=cronJob (cilium-certgen)
- tetragon.grpc.tls.auto.method=certmanager (cert-manager Certificate)
- tetragon.grpc.tls.auto.enabled=false (user-supplied existingSecret)

The agent always serves an unauthenticated unix-domain listener for
in-pod tooling; these settings only apply to the optional TCP listener
configured via tetragon.grpc.address. The configmap renders
--server-tls-cert-file / --server-tls-key-file (plus the require/CA
flags when requireClientCert is true), and the daemonset mounts the
provisioned (or user-supplied) Secret read-only at /var/lib/tetragon/tls.

Signed-off-by: Tam Mach <tam.mach@cilium.io>

sayboras added 2 commits May 13, 2026 23:44
Add e2e tests for gRPC server with mTLS

- valid client cert dials and calls GetVersion successfully,
- plaintext client is rejected,
- anonymous TLS client is rejected.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Tam Mach <tam.mach@cilium.io>