Skip to content

Feat/mtls cert source abstraction#293

Open
mahil-2040 wants to merge 27 commits intovolcano-sh:mainfrom
mahil-2040:feat/mtls-cert-source-abstraction
Open

Feat/mtls cert source abstraction#293
mahil-2040 wants to merge 27 commits intovolcano-sh:mainfrom
mahil-2040:feat/mtls-cert-source-abstraction

Conversation

@mahil-2040
Copy link
Copy Markdown
Contributor

@mahil-2040 mahil-2040 commented Apr 21, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:

Description

This PR implements the foundational mTLS certificate abstraction layer (pkg/mtls) and actively wires the internal HTTP/gRPC listeners across the AgentCube architecture. Following the secure workload identity framework outlined in auth-proposal.md, this PR fully establishes the zero-trust, mutually authenticated TLS connections for the Router, WorkloadManager, and PicoD components.

This officially completes Phase 1: Internal Workload Authentication (SPIRE) of the Auth Proposal.


Core Changes

1. Dynamic Zero-Downtime Rotation (CertWatcher)

Introduced a highly robust, fsnotify-driven CertWatcher capable of hot-reloading certificates from disk without dropping active connections.

2. Strict SPIFFE Identity Verification

Added pkg/mtls/spiffeid.go with hardcoded, secure SPIFFE identity constants.

  • The WorkloadManager and PicoD servers are hard-wired to strictly reject connections that do not present the RouterSPIFFEID.
  • The Router client cryptographically verifies it is communicating with the authentic WorkloadManagerSPIFFEID or SandboxSPIFFEID before proxying requests.

3. Component Wiring (Router & WorkloadManager)

Wired the --enable-mtls and certificate path flags directly into the Router and WorkloadManager Config structs. Their respective internal HTTP transports and listeners are now securely wrapped with mTLS using the new abstraction layer.

4. Dynamic SPIRE Sidecar Injection (PicoD)

Enhanced the dynamic Sandbox pod builder to natively support SPIRE. When mTLS is enabled, the builder automatically injects the spiffe-helper sidecar, mounts the spire-agent-socket from the host node, provisions the shared emptyDir volumes, and appends the --enable-mtls flags to the PicoD container arguments.

5. PicoD Authentication Mode Toggle

Added the --picod-auth-mode flag to the Router (mtls or jwt). This gives operators the flexibility to choose between high-security transport-layer authentication or latency-optimized application-layer authentication for sandboxes. The PicoD server now correctly bypasses the JWT middleware when mTLS is enabled.

6. Production-Pattern E2E Testing

Added in-process integration tests to mtls_e2e_test.go that simulate the actual Router→WorkloadManager and Router→PicoD handshake channels using the real SPIFFE constants, ensuring identity enforcement works flawlessly.


Verification

  • All AgentCube binaries build successfully with the new CLI flags.
  • pkg/mtls test suite passes (including rotational tests, intermediate cert validation logic, and production-pattern handshake E2E tests).
  • pkg/router, pkg/workloadmanager, and pkg/picod unit tests updated and passing.
  • Manually tested full RouterWorkloadManager communication with dynamically generated certificates and successful health checks.

Which issue(s) this PR fixes:
Fixes Part of #243

Special notes for your reviewer:
The ptr package was removed during the pod builder sidecar injection logic as it is natively supported via pointer creation. If multi-cluster or configurable trust domains are needed in the future, the hardcoded constants in spiffeid.go should be exposed via CLI flags.

Does this PR introduce a user-facing change?:

yes

Implemented full zero-trust Workload Identity via SPIRE. The Router, WorkloadManager, and PicoD sandboxes now support mutually authenticated TLS (mTLS) with strict SPIFFE ID enforcement. Dynamic sandboxes automatically receive a spiffe-helper sidecar to manage short-lived, auto-rotating certificates. A new --picod-auth-mode flag has been added to the Router to toggle between 'mtls' and 'jwt' authentication modes.

…s.Config from file or SPIRE-managed SVIDs using crypto/tls and fsnotify.

Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
…file, and --mtls-ca-file flags across all components and wired them into their respective Config structs with startup validation.

Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
…rce-agnostic

Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
…ication

Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
Copilot AI review requested due to automatic review settings April 21, 2026 13:05
@volcano-sh-bot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kevin-wangzefeng for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a certificate-source agnostic mTLS framework across the picod, router, and workload-manager components. It includes new command-line flags, a configuration validation layer, and a pkg/mtls package for dynamic certificate loading and file watching with SPIFFE ID verification. Key feedback points out that the picod server does not yet utilize the new mTLS configuration in its runtime, and the CertWatcher contains a bug where it fails to re-establish watches for the key file after an atomic rename. Additionally, the current implementation performs synchronous disk I/O on every TLS handshake, which should be optimized using the provided CertWatcher, and file watching should be shifted to directory-level monitoring for better reliability in Kubernetes environments.

Comment thread cmd/picod/main.go
Comment on lines 51 to 57
config := picod.Config{
Port: *port,
Workspace: *workspace,
Port: *port,
Workspace: *workspace,
MTLSCertFile: *mtlsCertFile,
MTLSKeyFile: *mtlsKeyFile,
MTLSCAFile: *mtlsCAFile,
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

While the mTLS flags are correctly parsed and passed to the picod.Config, the picod.Server implementation (specifically in Run()) does not yet utilize these fields to configure its network listener. This results in the mTLS feature being effectively disabled despite the flags being set. The server's Run method needs to be updated to use mtls.LoadServerConfig.

Comment thread pkg/mtls/watcher.go
Comment on lines +114 to +122
if event.Has(fsnotify.Rename) || event.Has(fsnotify.Remove) {
_ = cw.watcher.Remove(cw.certFile)
if err := cw.watcher.Add(cw.certFile); err != nil {
klog.Errorf("Failed to re-watch %s: %v", cw.certFile, err)
}
if err := cw.reload(); err != nil {
klog.Errorf("Failed to reload certificate after rename: %v", err)
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The re-watch logic for atomic renames is currently hardcoded to only handle cw.certFile. If the keyFile is updated via an atomic rename (common in SPIRE/cert-manager rotations), its watch will be lost and never re-established. Use event.Name to ensure the watch is re-added for the specific file that triggered the event.

Suggested change
if event.Has(fsnotify.Rename) || event.Has(fsnotify.Remove) {
_ = cw.watcher.Remove(cw.certFile)
if err := cw.watcher.Add(cw.certFile); err != nil {
klog.Errorf("Failed to re-watch %s: %v", cw.certFile, err)
}
if err := cw.reload(); err != nil {
klog.Errorf("Failed to reload certificate after rename: %v", err)
}
}
// Re-watch after atomic rename (spiffe-helper does atomic rename)
if event.Has(fsnotify.Rename) || event.Has(fsnotify.Remove) {
_ = cw.watcher.Remove(event.Name)
if err := cw.watcher.Add(event.Name); err != nil {
klog.Errorf("Failed to re-watch %s: %v", event.Name, err)
}
if err := cw.reload(); err != nil {
klog.Errorf("Failed to reload certificate after rename: %v", err)
}
}

Comment thread pkg/mtls/watcher.go
Comment on lines +48 to +55
if err := watcher.Add(certFile); err != nil {
watcher.Close()
return nil, fmt.Errorf("watch cert file %s: %w", certFile, err)
}
if err := watcher.Add(keyFile); err != nil {
watcher.Close()
return nil, fmt.Errorf("watch key file %s: %w", keyFile, err)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Watching individual files in Kubernetes (where Secrets/ConfigMaps are symlinked) is often unreliable because the file inodes change when the mount is updated. A more robust pattern for K8s is to watch the parent directory or the ..data symlink. Since this component is intended for use with SPIRE/K8s, consider watching the directory to ensure updates are always detected.

Comment thread pkg/mtls/loader.go Outdated
Comment on lines +19 to +30
tlsCfg := &tls.Config{
GetCertificate: func(_ *tls.ClientHelloInfo) (*tls.Certificate, error) {
cert, err := tls.LoadX509KeyPair(cfg.CertFile, cfg.KeyFile)
if err != nil {
return nil, err
}
return &cert, nil
},
ClientCAs: caPool,
ClientAuth: tls.RequireAndVerifyClientCert,
MinVersion: tls.VersionTLS12,
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The GetCertificate (and GetClientCertificate in LoadClientConfig) callbacks perform synchronous disk I/O on every TLS handshake via tls.LoadX509KeyPair. This is inefficient and can become a bottleneck. Since CertWatcher is provided in this package, it should be integrated here to provide an in-memory cache of the certificate that reloads only on file changes.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new pkg/mtls package intended to provide a certificate-source-agnostic mTLS foundation (PEM file inputs) and adds CLI/config plumbing for Router, WorkloadManager, and PicoD to accept mTLS certificate paths.

Changes:

  • Added pkg/mtls (config + TLS config loader + fsnotify-based certificate watcher) and accompanying unit tests.
  • Added --mtls-cert-file, --mtls-key-file, --mtls-ca-file flags to Router / WorkloadManager / PicoD and stored them in each component’s Config.
  • Added github.com/fsnotify/fsnotify as a direct Go module dependency.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
pkg/workloadmanager/server.go Adds mTLS cert/key/CA fields to WorkloadManager Config.
pkg/router/config.go Adds mTLS cert/key/CA fields to Router Config.
pkg/router/config_test.go Extends Router config JSON round-trip tests for new mTLS fields.
pkg/picod/server.go Adds mTLS cert/key/CA fields (with JSON tags) to PicoD Config.
pkg/mtls/config.go New shared mTLS path config + validation helpers.
pkg/mtls/config_test.go Tests for mtls.Config validation/enabled behavior.
pkg/mtls/loader.go New TLS config “loader” with SPIFFE ID verification hooks.
pkg/mtls/loader_test.go Tests for TLS config loader and SPIFFE verification helpers.
pkg/mtls/watcher.go New fsnotify-driven certificate watcher with hot reload callbacks.
pkg/mtls/watcher_test.go Tests for watcher initial load / update / idempotent stop.
cmd/workload-manager/main.go Adds mTLS flags, validates them, passes values into WorkloadManager Config.
cmd/router/main.go Adds mTLS flags, validates them, passes values into Router Config.
cmd/picod/main.go Adds mTLS flags, validates them, passes values into PicoD Config.
go.mod Promotes fsnotify to a direct dependency.

Comment thread pkg/mtls/config.go Outdated
Comment thread pkg/mtls/watcher.go Outdated
Comment thread pkg/mtls/loader.go Outdated
Comment thread pkg/mtls/config.go
Comment thread cmd/router/main.go
Comment thread pkg/mtls/loader_test.go Outdated
Comment thread pkg/mtls/loader_test.go Outdated
Comment thread pkg/mtls/watcher_test.go Outdated
Comment thread pkg/mtls/loader.go
Comment thread pkg/mtls/watcher.go
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 21, 2026

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 63.63636% with 144 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.79%. Comparing base (86ae12f) to head (4c9a422).
⚠️ Report is 30 commits behind head on main.

Files with missing lines Patch % Lines
pkg/mtls/watcher.go 64.70% 34 Missing and 8 partials ⚠️
pkg/workloadmanager/workload_builder.go 67.79% 34 Missing and 4 partials ⚠️
pkg/mtls/loader.go 72.72% 14 Missing and 7 partials ⚠️
pkg/workloadmanager/server.go 0.00% 21 Missing ⚠️
pkg/router/session_manager.go 50.00% 8 Missing and 4 partials ⚠️
pkg/picod/server.go 0.00% 10 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #293      +/-   ##
==========================================
+ Coverage   46.75%   49.79%   +3.04%     
==========================================
  Files          30       33       +3     
  Lines        2862     3227     +365     
==========================================
+ Hits         1338     1607     +269     
- Misses       1381     1449      +68     
- Partials      143      171      +28     
Flag Coverage Δ
unittests 49.79% <63.63%> (+3.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…tability

- Updated LoadServerConfig and LoadClientConfig to use CertWatcher cache, eliminating repeated disk I/O and ensuring seamless cert rotation.
- Improved fsnotify handling to track exact file events, maintaining watch consistency during atomic cert/key replacements.
- Enhanced TLS validation by including intermediate CAs in VerifyOptions for proper certificate chain verification.
- Added nil guards and enforced strict Config.Enabled() checks to prevent misconfigurations and runtime panics.

Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
Copy link
Copy Markdown
Member

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pkg/mtls package is well-structured — clean separation of Config, CertWatcher, and loader helpers, good test coverage, and the sync.Once-guarded Stop() is correct. A few issues to address before merging.

Across all three binaries (pkg/picod/server.go, pkg/workloadmanager/server.go, pkg/router/config.go): the mTLS fields are validated at startup but never consumed by the HTTP listeners. An operator who deploys with --mtls-cert-file etc. will see the flags silently have no effect — no error, no warning. This is the stated intent (follow-up PR will wire it), but please at minimum add a startup klog.Warningf or a // TODO(author): wire mTLS in follow-up PR comment at each ListenAndServe / Start call site so the intent is clear in the code.

Comment thread pkg/mtls/watcher.go Outdated
if event.Has(fsnotify.Rename) || event.Has(fsnotify.Remove) {
targetFile := event.Name
_ = cw.watcher.Remove(targetFile)
if err := cw.watcher.Add(targetFile); err != nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If watcher.Add(targetFile) fails here (for example, spiffe-helper renames cert.pem away before atomically placing the new one), the file is permanently dropped from the watch set and no further events will be received for it — cert hot-reload silently stops working.

The current code only logs the error, so a rotation failure could go unnoticed until the cert expires. Two safer options:

  1. Watch the parent directory rather than individual files (standard cert-manager/spiffe-helper approach).
  2. Implement a short retry loop (e.g. try re-Add up to 5× with 50 ms sleep) so a transient race during atomic rename is recovered.

At minimum, downgrade the log to klog.V(4) for the expected case and add a clear klog.Errorf when the retry budget is exhausted so operators know cert reload is degraded.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with your 2nd suggested approach and implemented a 5-iteration retry loop with a 50ms sleep, plus a CRITICAL log if the budget is exhausted.

Comment thread pkg/mtls/loader.go Outdated
MinVersion: tls.VersionTLS12,
}

if expectedServerID != "" {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When expectedServerID == "" the else branch (lines 78-82) returns a tls.Config with standard hostname verification enabled (InsecureSkipVerify=false, ServerName=""). SPIFFE-issued certificates carry only URI SANs — no DNS SANs — so Go's hostname check will fail with x509: certificate is valid for ... against any SPIFFE endpoint. The config returned by this branch is essentially unusable with the mTLS setup this package targets.

Options:

  1. Require expectedServerID to be non-empty when mTLS is configured, and return an error if it is empty.
  2. Document clearly in the godoc that expectedServerID="" is only valid when connecting to non-SPIFFE endpoints that have a matching DNS SAN, and consider setting tlsCfg.ServerName to prevent accidental misuse.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've strictly enforced Option 1: LoadClientConfig now returns an explicit error immediately if expectedServerID is empty, completely preventing accidental fallback to standard DNS SAN verification.

Comment thread pkg/picod/server.go Outdated
// mTLS configuration (certificate source abstraction)

// MTLSCertFile is the path to the mTLS certificate (--mtls-cert-file)
MTLSCertFile string `json:"mtlsCertFile"`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These three fields (MTLSCertFile, MTLSKeyFile, MTLSCAFile) are plumbed through and validated in cmd/picod/main.go, but Run() always calls server.ListenAndServe() (plain HTTP) regardless of whether they are set. The same pattern applies in pkg/workloadmanager/server.go (Start()) and pkg/router (server.go/Start()).

Please add either:

  • A klog.Warningf("mTLS cert paths configured but listener not yet wired — ignored until follow-up PR") guard in Run()/Start() when MTLSCertFile != "", or
  • A // TODO(<author>): wire mtls.LoadServerConfig here — follow-up PR comment at the ListenAndServe call site

Otherwise operators silently pass flags that have zero effect.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a startup klog.Warningf and a TODO guard at all three ListenAndServe call sites to explicitly warn operators until the follow-up wiring PR.

Comment thread pkg/mtls/loader_test.go
certFile, keyFile, caFile := generateTestCerts(t)
cfg := &Config{CertFile: certFile, KeyFile: keyFile, CAFile: caFile}

tlsCfg, watcher, err := LoadServerConfig(cfg, nil)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test only exercises LoadServerConfig(cfg, nil) (no expectedClientIDs). The code path where VerifyPeerCertificate is injected for SPIFFE enforcement (LoadServerConfig(cfg, []string{spiffeID})) has no test. Since this callback is the central security enforcement for server-side identity, please add a test analogous to TestVerifyPeerSPIFFEID_MatchingID/NonMatchingID but going through LoadServerConfig:

func TestLoadServerConfig_WithExpectedClientIDs(t *testing.T) {
    certFile, keyFile, caFile := generateTestCerts(t)
    cfg := &Config{CertFile: certFile, KeyFile: keyFile, CAFile: caFile}
    tlsCfg, watcher, err := LoadServerConfig(cfg, []string{"spiffe://cluster.local/test"})
    ...
    if tlsCfg.VerifyPeerCertificate == nil {
        t.Error("VerifyPeerCertificate should be set when expectedClientIDs provided")
    }
    // Also exercise the callback directly to confirm reject/accept behaviour.
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added TestLoadServerConfig_WithExpectedClientIDs which directly exercises the injected VerifyPeerCertificate callback to confirm it correctly accepts and rejects matching/non-matching identities.

…and expand integration tests

Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
Copilot AI review requested due to automatic review settings April 22, 2026 04:59
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Comment thread pkg/mtls/loader.go
Comment on lines +58 to +66
// LoadClientConfig returns a tls.Config for a client that presents its cert and verifies the server.
// Certs are loaded dynamically via GetClientCertificate to support rotation.
// If expectedServerID is non-empty, the server must present a matching SPIFFE ID.
// This sets InsecureSkipVerify=true to bypass hostname checking (SPIFFE uses URI SANs)
// and manually verifies the chain + SPIFFE ID instead.
func LoadClientConfig(cfg *Config, expectedServerID string) (*tls.Config, *CertWatcher, error) {
if expectedServerID == "" {
return nil, nil, fmt.Errorf("client TLS config: expectedServerID is required for SPIFFE verification")
}
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc comment for LoadClientConfig says "If expectedServerID is non-empty" but the implementation returns an error when expectedServerID == "". Please either update the comment to reflect that expectedServerID is required, or adjust the function to support a non-SPIFFE mode (standard verification with RootCAs / ServerName) when it is empty.

Copilot uses AI. Check for mistakes.
Comment thread pkg/picod/server.go Outdated
Comment on lines +37 to +41
MTLSCertFile string `json:"mtlsCertFile"`
// MTLSKeyFile is the path to the mTLS private key (--mtls-key-file)
MTLSKeyFile string `json:"mtlsKeyFile"`
// MTLSCAFile is the path to the mTLS CA bundle (--mtls-ca-file)
MTLSCAFile string `json:"mtlsCAFile"`
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new PicoD Config JSON tags use lowerCamelCase (mtlsCertFile/mtlsKeyFile/mtlsCAFile), but other PicoD request/response types consistently use snake_case for multi-word JSON fields (e.g., working_dir, exit_code, is_dir). To keep the API consistent, rename these tags to snake_case (for example, mtls_cert_file/mtls_key_file/mtls_ca_file) or align with whichever format PicoD exposes elsewhere.

Suggested change
MTLSCertFile string `json:"mtlsCertFile"`
// MTLSKeyFile is the path to the mTLS private key (--mtls-key-file)
MTLSKeyFile string `json:"mtlsKeyFile"`
// MTLSCAFile is the path to the mTLS CA bundle (--mtls-ca-file)
MTLSCAFile string `json:"mtlsCAFile"`
MTLSCertFile string `json:"mtls_cert_file"`
// MTLSKeyFile is the path to the mTLS private key (--mtls-key-file)
MTLSKeyFile string `json:"mtls_key_file"`
// MTLSCAFile is the path to the mTLS CA bundle (--mtls-ca-file)
MTLSCAFile string `json:"mtls_ca_file"`

Copilot uses AI. Check for mistakes.
Comment thread pkg/mtls/loader.go Outdated
Comment on lines +125 to +129
intermediates := x509.NewCertPool()
for i, rawCert := range rawCerts[1:] {
intermediateCert, err := x509.ParseCertificate(rawCert)
if err != nil {
return fmt.Errorf("parse peer intermediate certificate %d: %w", i+1, err)
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verifyPeerChainAndSPIFFEID has intermediate-certificate handling (building an Intermediates pool from rawCerts[1:]), but loader_test.go doesn’t include a case where the leaf is signed by an intermediate CA. Please add a unit test (root CA → intermediate CA → leaf) to ensure this logic actually works.

Copilot generated this review using guidance from repository custom instructions.
Comment thread pkg/mtls/watcher.go Outdated
Comment on lines +130 to +134
// Re-watch after atomic rename (spiffe-helper does atomic renames for both cert and key)
if event.Has(fsnotify.Rename) || event.Has(fsnotify.Remove) {
targetFile := event.Name
_ = cw.watcher.Remove(targetFile)
// Retry loop to handle transient races during atomic replacement
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Rename/Remove re-watch path in watchLoop (intended to support atomic rotation via rename/symlink swaps) isn’t covered by watcher_test.go (it only overwrites files). Please add a unit test that rotates cert/key using os.Rename into place so this branch is exercised.

Copilot generated this review using guidance from repository custom instructions.
Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
Copilot AI review requested due to automatic review settings April 22, 2026 05:14
@mahil-2040 mahil-2040 force-pushed the feat/mtls-cert-source-abstraction branch from def4e7a to 6edb3da Compare April 22, 2026 05:14
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.

Comment thread pkg/router/server.go Outdated

// Start HTTP or HTTPS server
if s.config.MTLSCertFile != "" || s.config.MTLSKeyFile != "" || s.config.MTLSCAFile != "" {
klog.Warningf("TODO(author): wire mTLS in follow-up PR. mTLS flags are currently parsed but not consumed by the Router listener.")
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shipping a runtime log message containing a TODO(author) placeholder is brittle and can leak internal process details into production logs. Consider replacing this with a stable message (no TODO/author), and ideally fail fast (return an error) if users provide --mtls-* flags that are not yet honored, or gate the warning behind a debug/feature flag. (Same pattern appears in WorkloadManager and PicoD.)

Suggested change
klog.Warningf("TODO(author): wire mTLS in follow-up PR. mTLS flags are currently parsed but not consumed by the Router listener.")
klog.Warning("mTLS configuration was provided, but Router does not currently apply these settings to its listener; the mTLS settings will be ignored.")

Copilot uses AI. Check for mistakes.
Comment thread cmd/router/main.go
Comment on lines +53 to +61
// Validate mTLS configuration early (fail fast on bad flags)
mTLSCfg := mtls.Config{
CertFile: *mtlsCertFile,
KeyFile: *mtlsKeyFile,
CAFile: *mtlsCAFile,
}
if err := mTLSCfg.Validate(); err != nil {
klog.Fatalf("Invalid mTLS configuration: %v", err)
}
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description/release-note states the components now support mTLS flags to bootstrap mTLS, but the listeners currently do not consume these flags (only warn). This is user-visible and potentially misleading: users can pass valid --mtls-* flags and still run without mTLS. Consider either (a) updating the release note/PR description to explicitly say the flags are accepted but not yet functional, (b) hiding the flags until mTLS is wired, or (c) erroring out when --mtls-* is set to avoid a false sense of security.

Copilot uses AI. Check for mistakes.
Comment thread pkg/mtls/watcher.go
Comment on lines +124 to +133
// Reload on write or create
if event.Has(fsnotify.Write) || event.Has(fsnotify.Create) {
if err := cw.reload(); err != nil {
klog.Errorf("Failed to reload certificate: %v", err)
}
}
// Re-watch after atomic rename (spiffe-helper does atomic renames for both cert and key)
if event.Has(fsnotify.Rename) || event.Has(fsnotify.Remove) {
cw.handleRenameEvent(event.Name)
}
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During atomic rotations where cert and key are replaced in quick succession, reloading immediately on any single file event can frequently attempt to load a mismatched cert/key pair (or a temporarily missing file), causing repeated error logs and delayed rotation. A more robust approach is to debounce/coalesce events (short timer) and/or retry tls.LoadX509KeyPair with backoff until it succeeds, so rotation converges quickly without log spam.

Copilot uses AI. Check for mistakes.
Comment thread pkg/mtls/watcher.go Outdated
Comment on lines +149 to +161
var added bool
for i := 0; i < 5; i++ {
if err := cw.watcher.Add(targetFile); err != nil {
klog.V(4).Infof("Failed to re-watch %s (attempt %d/5): %v", targetFile, i+1, err)
time.Sleep(50 * time.Millisecond)
continue
}
added = true
break
}
if !added {
klog.Errorf("CRITICAL: Exhausted retry budget attempting to re-watch %s. Certificate hot-reload is degraded!", targetFile)
}
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The retry budget (5) and sleep interval (50ms) are hard-coded magic numbers and duplicated in the log message (attempt %d/5). Consider extracting them into named constants (or making them configurable) to keep the code and logs consistent and make tuning easier. Also consider avoiding "CRITICAL" in free-form log text in favor of structured severity via log level / consistent prefixing.

Copilot uses AI. Check for mistakes.
Comment thread pkg/mtls/loader.go

opts := x509.VerifyOptions{
Roots: caPool,
Intermediates: intermediates,
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarity and to prevent accidental behavior changes if Go's defaults evolve, consider explicitly setting VerifyOptions.KeyUsages to []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth} (client-side server verification). This makes the intended usage constraints explicit in the custom verification path.

Suggested change
Intermediates: intermediates,
Intermediates: intermediates,
KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},

Copilot uses AI. Check for mistakes.
@mahil-2040 mahil-2040 requested a review from hzxuzhonghu April 22, 2026 05:24
Copy link
Copy Markdown
Member

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Foundational mTLS library looks solid overall — good test coverage, correct use of InsecureSkipVerify with manual chain verification, and clean SPIFFE ID checking. Flagging a few issues that could bite in production rotation scenarios.

Comment thread pkg/mtls/watcher.go
return
}
// Reload on write or create
if event.Has(fsnotify.Write) || event.Has(fsnotify.Create) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug – cert/key rotation race (no debounce).

Both cert and key files are watched independently, and every Write/Create event immediately calls reload()tls.LoadX509KeyPair(certFile, keyFile). During an atomic rotation (e.g. spiffe-helper renames cert, then renames key), the first event loads the new cert with the old key → "private key does not match public key" error is logged, and the watcher falls back to the stale certificate.

The second file event self-heals, so it's not an outage, but with SPIRE SVIDs (1-hour TTL) you'd get ~24 spurious klog.Errorf lines/day/component — noisy for alerting.

Suggested fix: add a short debounce (e.g. a time.AfterFunc of 200–300 ms that resets on each event) so both files settle before calling reload(). controller-runtime's certwatcher uses exactly this pattern.

Comment thread pkg/mtls/watcher.go Outdated
Comment on lines +149 to +159
var added bool
for i := 0; i < 5; i++ {
if err := cw.watcher.Add(targetFile); err != nil {
klog.V(4).Infof("Failed to re-watch %s (attempt %d/5): %v", targetFile, i+1, err)
time.Sleep(50 * time.Millisecond)
continue
}
added = true
break
}
if !added {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug – permanent watch loss after retry exhaustion; "degraded" is misleading.

If all 5 retries (total budget 250 ms) fail, the log says "Certificate hot-reload is degraded!" — but in reality the file watch for targetFile is permanently lost. No future changes to that file will ever be detected; the component silently serves a stale certificate until it expires, then connections fail with no self-healing path (restart required).

Two issues:

  1. Retry budget is very tight. 5 × 50 ms = 250 ms. On network-mounted volumes or under I/O pressure, file creation after an atomic swap can take longer. Consider backing off (e.g. 100, 200, 400, 800, 1600 ms) and/or bumping retries.
  2. No recovery path. Once the watch is lost the only fix is a process restart. Consider a background re-try goroutine, or at minimum change the error message from "degraded" to something like "permanently broken — process restart required to restore certificate rotation."

Comment thread pkg/mtls/loader.go Outdated
return nil, nil, fmt.Errorf("server TLS config: mtls.Config is nil")
}

caPool, err := loadCAPool(cfg.CAFile)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design gap – CA bundle is loaded once and never refreshed.

loadCAPool reads the CA file at startup and bakes the resulting *x509.CertPool into tls.Config.ClientCAs / RootCAs. If the CA rotates (standard in SPIFFE trust domains — trust bundle can change when a root or intermediate CA is re-keyed), the stale pool will reject certs signed by the new CA, causing a full connection outage once the old CA's certs expire.

Since this is a foundational PR, this doesn't need to be fixed here, but please document the limitation prominently (e.g. in the LoadServerConfig / LoadClientConfig godocs) so the follow-up PR that wires listeners knows to address it. A CAWatcher analogous to CertWatcher would close the gap.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please check tls config, tls.Config.GetCertificate

Comment thread pkg/mtls/loader.go
Comment on lines +135 to +138
opts := x509.VerifyOptions{
Roots: caPool,
Intermediates: intermediates,
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compatibility – VerifyOptions.KeyUsages defaults to ExtKeyUsageServerAuth only.

When KeyUsages is nil/empty, Go's x509.Certificate.Verify defaults to []ExtKeyUsage{ExtKeyUsageServerAuth}. The SPIFFE X.509-SVID spec says leaf SVIDs SHOULD (not MUST) include id-kp-serverAuth. Certs from non-SPIRE CAs or custom PKIs that omit this EKU will be silently rejected here.

Since the library aims to be "certificate-source agnostic", consider explicitly setting:

opts := x509.VerifyOptions{
    Roots:         caPool,
    Intermediates: intermediates,
    KeyUsages:     []x509.ExtKeyUsage{x509.ExtKeyUsageAny},
}

This makes chain verification purely about trust, leaving EKU policy to the caller/SPIFFE layer.

Comment thread pkg/router/server.go Outdated

// Start HTTP or HTTPS server
if s.config.MTLSCertFile != "" || s.config.MTLSKeyFile != "" || s.config.MTLSCAFile != "" {
klog.Warningf("TODO(author): wire mTLS in follow-up PR. mTLS flags are currently parsed but not consumed by the Router listener.")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit – TODO(author) violates K8s convention.

Per K8s coding conventions, TODOs should use a real username: TODO(mahil-2040):. Same applies to the identical TODOs in pkg/picod/server.go:119 and pkg/workloadmanager/server.go:177.

Comment thread cmd/picod/main.go Outdated
// mTLS flags (certificate source abstraction)
mtlsCertFile := flag.String("mtls-cert-file", "", "Path to mTLS certificate file")
mtlsKeyFile := flag.String("mtls-key-file", "", "Path to mTLS private key file")
mtlsCAFile := flag.String("mtls-ca-file", "", "Path to mTLS CA bundle file")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can change to not use pointer type

Comment thread cmd/picod/main.go Outdated

// mTLS flags (certificate source abstraction)
mtlsCertFile := flag.String("mtls-cert-file", "", "Path to mTLS certificate file")
mtlsKeyFile := flag.String("mtls-key-file", "", "Path to mTLS private key file")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW. I forget have you mounted this dir anywhere?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the control plane (Router/WorkloadManager), we already mounted the spiffe-helper volumes in the previous SPIRE PR. For PicoD, the sandbox pod is built dynamically in Go code. I haven't wired the volume mounts yet, but I added a specific TODO in workload_builder.go so we don't forget to mount the SPIRE CSI driver (or sidecar) in the follow-up PR where the listeners are wired!

hzxuzhonghu

This comment was marked as abuse.

hzxuzhonghu

This comment was marked as abuse.

hzxuzhonghu

This comment was marked as abuse.

hzxuzhonghu

This comment was marked as abuse.

hzxuzhonghu

This comment was marked as abuse.

hzxuzhonghu

This comment was marked as abuse.

hzxuzhonghu

This comment was marked as abuse.

hzxuzhonghu

This comment was marked as abuse.

hzxuzhonghu

This comment was marked as abuse.

hzxuzhonghu

This comment was marked as abuse.

hzxuzhonghu

This comment was marked as abuse.

hzxuzhonghu

This comment was marked as abuse.

hzxuzhonghu

This comment was marked as abuse.

hzxuzhonghu

This comment was marked as abuse.

hzxuzhonghu

This comment was marked as abuse.

hzxuzhonghu

This comment was marked as abuse.

Copy link
Copy Markdown
Member

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall solid implementation — the abstraction layer is clean, the CertWatcher pattern is correct, and wiring mTLS across all three components consistently is the right approach. A few correctness issues below.

Comment thread pkg/mtls/watcher.go
if debounceTimer != nil {
debounceTimer.Stop()
}
debounceTimer = time.AfterFunc(200*time.Millisecond, func() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use-after-stop race. time.AfterFunc fires in a new goroutine. If Stop() is called while the 200ms debounce is in flight, the closure will still invoke cw.reload() after the watcher is torn down. Add a done-channel check inside the closure:

debounceTimer = time.AfterFunc(200*time.Millisecond, func() {
    select {
    case <-cw.done:
        return
    default:
    }
    if err := cw.reload(); err != nil { ... }
})

Comment thread pkg/mtls/loader.go Outdated
ClientCAs: caPool,
ClientAuth: tls.RequireAndVerifyClientCert,
MinVersion: tls.VersionTLS12,
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer tls.VersionTLS13 for internal service-to-service mTLS. All three components are modern Go, and TLS 1.3 removes weak cipher suites still negotiable under TLS 1.2.

Comment thread pkg/mtls/spiffeid.go

// SandboxSPIFFEID is the SPIFFE identity for PicoD sandbox pods.
// Note: PicoD omits the namespace segment because sandboxes can run in any namespace.
SandboxSPIFFEID = "spiffe://cluster.local/sa/agentcube-sandbox"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SandboxSPIFFEID omits the ns/<namespace> segment. The SPIRE Kubernetes PSAT node attestor typically issues SVIDs as spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>. If the actual SVID differs from this constant, every PicoD mTLS handshake will fail at VerifyPeerCertificate. Please confirm this matches the SPIRE registration entry exactly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It omit /ns/<namespace> because sandbox pods can run in any namespace. The standard Istio-convention format (/ns/<ns>/sa/<sa>) is used for Router and WorkloadManager since they always run in agentcube-system. I expanded the comment to clearly explain

Comment thread pkg/workloadmanager/workload_builder.go Outdated
// spiffeHelperConfigMapName is the ConfigMap name created by the Helm chart.
spiffeHelperConfigMapName = "spiffe-helper-config"
// spiffeHelperImage is the container image for the spiffe-helper sidecar.
spiffeHelperImage = "ghcr.io/spiffe/spiffe-helper:0.8.0"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spiffeHelperImage is hardcoded to ghcr.io/spiffe/spiffe-helper:0.8.0. Operators in air-gapped environments or upgrading SPIFFE helper versions must recompile. Move this into Config.SPIFFEHelperImage with this string as the default.

Comment thread pkg/workloadmanager/workload_builder.go Outdated
ReadOnly: true,
})

// Append mTLS flags ONLY if the container is a known PicoD entrypoint
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent failure: if no container is named "picod" or "code-interpreter", the --enable-mtls args are never injected but there's no warning emitted. AgentRuntime container names are user-defined, so this will silently produce sandboxes with mTLS volumes but no mTLS flags. Add a warning log if the loop finds zero matching containers.

Comment thread pkg/workloadmanager/workload_builder.go Outdated
},
}

if enableMTLS {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mTLS not injected on the warm-pool (SandboxClaim) path. When WarmPoolSize > 0, buildSandboxByCodeInterpreter returns early at line ~413 before reaching this injectMTLSVolumes call. Claim-provisioned sandbox pods will not get the spiffe-helper sidecar or cert mounts. If the SandboxTemplate already embeds the sidecar, add a comment here. Otherwise injectMTLSVolumes needs to run on the claim-path pod spec too.

Comment thread cmd/picod/main.go Outdated
Comment thread pkg/picod/server.go Outdated
- Fixed CertWatcher shutdown race and resource leaks
- Refactored mTLS injection logic and improved operator warnings
- Added configurable SPIFFE helper image support
- Added graceful shutdown handling for PicoD
- Changed the Min TLS version from TLS1.2 to TLS1.3

Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
Copy link
Copy Markdown
Member

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will continue reading the rest parts this afternoon

Comment thread cmd/picod/main.go
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we revert picod side, this belong to users' workload. This is hacky, correct me if i am wrong

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, i have reverted it. The previous approach had WorkloadManager reaching into PicoD's container spec to inject --enable-mtls / --mtls-cert-file flags, which couples infrastructure to the workload's CLI interface.

The fix: PicoD now auto-detects mTLS on startup by checking if the well-known SPIRE cert files exist at /run/spire/certs/ (provisioned by the spiffe-helper sidecar). If all three SVID files are present, mTLS is enabled automatically, no flags needed. WorkloadManager only injects the sidecar container, volumes, and cert mounts (the standard sidecar injection pattern) and never touches the workload container's args.

Comment thread cmd/router/main.go Outdated
- Removed workload CLI arg mutation for mTLS configuration
- Added cert-volume-only injection using standard sidecar pattern
- Preserved user-defined spiffe-helper annotations
- Scoped SPIRE sidecar injection to CodeInterpreter sandboxes only
- Added automatic mTLS detection in PicoD based on SPIRE cert presence
- Restricted Router mTLS transport to CodeInterpreter workloads
- Clarified Router mTLS flag behavior in CLI help text

Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
Copy link
Copy Markdown
Member

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert all the picod related change, router -> agent or codeinterpreter, this path is special, we cannot fully control it

Comment thread cmd/router/main.go Outdated
Comment thread cmd/router/main.go Outdated
Comment thread cmd/workload-manager/main.go Outdated
Comment thread cmd/workload-manager/main.go Outdated
Comment thread cmd/workload-manager/main.go Outdated
Comment thread cmd/router/main.go Outdated
Comment thread pkg/mtls/config.go
Comment thread pkg/mtls/loader.go
// LoadServerConfig returns a tls.Config for a server that requires mTLS.
// Certs are loaded dynamically via GetCertificate to support rotation.
// If expectedClientIDs is non-empty, clients must present a matching SPIFFE ID.
func LoadServerConfig(cfg *Config, expectedClientIDs []string) (*tls.Config, *CertWatcher, error) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer implement all in this pr. so it can be tested

Comment thread pkg/mtls/loader.go Outdated
return nil, nil, fmt.Errorf("server TLS config: mtls.Config is nil")
}

caPool, err := loadCAPool(cfg.CAFile)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please check tls config, tls.Config.GetCertificate

@hzxuzhonghu
Copy link
Copy Markdown
Member

Please also update the e2e test setup, so mtls can be tested

mahil-2040 added 2 commits May 8, 2026 17:13
Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
- Separated Router upstream mTLS flags from external HTTPS listener flags
- Added certificate file existence validation during startup
- Implemented dynamic CA bundle rotation with CertWatcher reloads
- Added dynamic peer verification against rotating CA pools
- Updated tests for the new dynamic CA loading architecture

Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
Copilot AI review requested due to automatic review settings May 8, 2026 14:19
@mahil-2040
Copy link
Copy Markdown
Contributor Author

Please also update the e2e test setup, so mtls can be tested

i am working on e2e tests

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 8 comments.

Comments suppressed due to low confidence (2)

pkg/workloadmanager/handlers_test.go:382

  • The gomonkey patch for buildSandboxByAgentRuntime no longer matches the updated function signature (it now takes an extra spiffeHelperImageOverride string). As written, this test will not compile; update the patched function to accept the full argument list (namespace, name, informers, enableMTLS, spiffeHelperImageOverride).
			patches.ApplyFunc(buildSandboxByAgentRuntime, func(_, _ string, _ *Informers, _ bool) (*sandboxv1alpha1.Sandbox, *sandboxEntry, error) {
				if tc.kind != types.AgentRuntimeKind {
					return nil, nil, errors.New("unexpected kind")
				}
				if tc.buildErr != nil {

pkg/workloadmanager/handlers_test.go:392

  • The gomonkey patch for buildSandboxByCodeInterpreter does not match the new signature (missing spiffeHelperImageOverride string). This will fail to compile; adjust the patch function to accept the extra parameter.
			patches.ApplyFunc(buildSandboxByCodeInterpreter, func(_, _ string, _ *Informers, _ bool) (*sandboxv1alpha1.Sandbox, *extensionsv1alpha1.SandboxClaim, *sandboxEntry, error) {
				if tc.kind != types.CodeInterpreterKind {
					return nil, nil, nil, errors.New("unexpected kind")
				}
				if tc.buildErr != nil {

Comment on lines +74 to +85
// Validate mTLS configuration early (fail fast on bad flags)
tlsConfig := mtls.Config{
CertFile: *tlsCert,
KeyFile: *tlsKey,
CAFile: *tlsCA,
}
if err := tlsConfig.Validate(); err != nil {
klog.Fatalf("Invalid mTLS configuration: %v", err)
}

if *enableMTLS && !tlsConfig.Enabled() {
klog.Fatalf("Invalid mTLS configuration: --enable-mtls requires --tls-cert, --tls-key, and --tls-ca")
Comment thread pkg/mtls/loader.go
Comment on lines +110 to +118
caPool := watcher.GetCAPool()
opts := x509.VerifyOptions{
Roots: caPool,
Intermediates: intermediates,
KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
}
if _, err := peerCert.Verify(opts); err != nil {
return fmt.Errorf("verify client certificate chain: %w", err)
}
Comment thread pkg/mtls/loader.go
Comment on lines +158 to +166
caPool := watcher.GetCAPool()
opts := x509.VerifyOptions{
Roots: caPool,
Intermediates: intermediates,
KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
}
if _, err := peerCert.Verify(opts); err != nil {
return fmt.Errorf("verify server certificate chain: %w", err)
}
}

if mtlsCfg == nil || !mtlsCfg.Enabled() {
return nil, fmt.Errorf("--enable-mtls requires --mtls-cert-file, --mtls-key-file, and --mtls-ca-file to be set")
Comment thread cmd/router/main.go
Comment on lines 38 to +44
debug = flag.Bool("debug", false, "Enable debug mode")
maxConcurrentRequests = flag.Int("max-concurrent-requests", 1000, "Maximum number of concurrent requests that a router server can handle (0 = unlimited)")
enableMTLS = flag.Bool("enable-mtls", false, "Enable mutual TLS on the Router listener and for Router-to-WorkloadManager connections")

// mTLS client certificates used by the Router to authenticate with upstream WorkloadManager.
// These are distinct from --tls-cert/--tls-key which serve the Router's own HTTPS listener.
mtlsCert = flag.String("mtls-cert", "", "Path to mTLS client certificate for upstream WorkloadManager connections")
Comment thread cmd/picod/main.go
Comment on lines +51 to 69
// Start PicoD server in goroutine
errCh := make(chan error, 1)
go func() {
klog.Infof("Starting PicoD server on port %d", *port)
if err := server.Start(ctx); err != nil {
errCh <- err
}
close(errCh)
}()

// Wait for signal or fatal error
select {
case <-ctx.Done():
klog.Info("Received shutdown signal, shutting down gracefully...")
cancel()
<-errCh
case err := <-errCh:
klog.Fatalf("Server error: %v", err)
}
Comment thread cmd/picod/main.go
Comment on lines 31 to 42
func main() {
port := flag.Int("port", 8080, "Port for the PicoD server to listen on")
workspace := flag.String("workspace", "", "Root directory for file operations (default: current working directory)")

// Initialize klog flags
klog.InitFlags(nil)
flag.Parse()

config := picod.Config{
Port: *port,
Workspace: *workspace,
}
VolumeSource: corev1.VolumeSource{
HostPath: &corev1.HostPathVolumeSource{
Path: spireAgentSocketPath,
Type: hostPathType(corev1.HostPathDirectoryOrCreate),
Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
@mahil-2040 mahil-2040 requested a review from hzxuzhonghu May 10, 2026 03:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants