fix: honor internal endpoint resolution on Kubernetes#1097
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes Kubernetes endpoint resolution so resolve_internal=True returns internal pod/service endpoints for server-proxy backends while preserving gateway ingress endpoints as the default in gateway mode (addresses #1096).
Changes:
- Thread
resolve_internalthroughKubernetesSandboxService.get_endpoint()intoWorkloadProvider.get_endpoint_info(). - Update
BatchSandboxProviderandAgentSandboxProviderto bypass gateway ingress formatting whenresolve_internal=True. - Add/extend unit tests to cover resolve-internal behavior in Kubernetes service and both workload providers.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| server/opensandbox_server/services/k8s/workload_provider.py | Extends the provider interface to accept resolve_internal for endpoint resolution. |
| server/opensandbox_server/services/k8s/kubernetes_service.py | Passes resolve_internal through to the workload provider (and updates docstring semantics). |
| server/opensandbox_server/services/k8s/batchsandbox_provider.py | Skips gateway endpoint formatting when resolving internal endpoints. |
| server/opensandbox_server/services/k8s/agent_sandbox_provider.py | Skips gateway endpoint formatting when resolving internal endpoints, preserving internal pod/service fallback. |
| server/tests/k8s/test_kubernetes_service.py | Adds coverage ensuring resolve_internal is passed to the provider. |
| server/tests/k8s/test_batchsandbox_provider.py | Adds coverage for default gateway endpoint vs internal resolution behavior. |
| server/tests/k8s/test_agent_sandbox_provider.py | Adds coverage for default gateway endpoint vs internal resolution behavior (including avoiding pod list calls in gateway-default case). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6ba1079c70
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| endpoint = self.workload_provider.get_endpoint_info( | ||
| workload, | ||
| port, | ||
| sandbox_id, | ||
| resolve_internal=resolve_internal, | ||
| ) |
There was a problem hiding this comment.
Preserve secure-access enforcement for server proxy
When Kubernetes gateway mode is combined with secureAccess=true, forwarding resolve_internal=True here makes the server proxy connect directly to the pod/service instead of the ingress gateway. The proxy route does not validate OpenSandbox-Secure-Access itself and filters out automatically attached secure-access headers, so a caller with only the lifecycle API key can now use /sandboxes/{id}/proxy/{port} to bypass the gateway check that previously enforced the per-sandbox token. Either keep routing secure-access sandboxes through the gateway or enforce the secure-access header before proxying internally.
Useful? React with 👍 / 👎.
| workload: Any, | ||
| port: int, | ||
| sandbox_id: str, | ||
| resolve_internal: bool = False, |
There was a problem hiding this comment.
Preserve custom workload provider compatibility
Adding this new keyword does not protect custom providers implemented against the previous get_endpoint_info(self, workload, port, sandbox_id) interface: Python's ABC machinery still treats those subclasses as concrete, and register_provider can still instantiate them, but endpoint resolution now calls the method with resolve_internal=... and raises TypeError for every endpoint/proxy request in those deployments. Please add a compatibility fallback or gate the new keyword to providers that support it.
Useful? React with 👍 / 👎.
|
Updated this PR with a follow-up commit (
Validation run: Question for maintainers: should server proxy be an officially supported path when Kubernetes ingress is in gateway mode, or should docs position it as fallback-only for clients that cannot reach gateway endpoints directly? The current update keeps it supported but enforces secure-access equivalently at the lifecycle proxy boundary. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d38b88a32e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if resolve_internal and not supports_internal_resolution: | ||
| raise HTTPException( | ||
| status_code=status.HTTP_501_NOT_IMPLEMENTED, |
There was a problem hiding this comment.
Keep server proxy working for legacy providers
When a registered custom provider still implements the pre-change get_endpoint_info(workload, port, sandbox_id) signature, any /proxy/... request now hits this branch because the proxy calls get_endpoint(..., resolve_internal=True), so server proxy returns 501 instead of using the provider's existing public/gateway endpoint as it did before. Fresh evidence in the current code is that the compatibility path now explicitly raises here for legacy providers rather than merely avoiding the new keyword, which still disables server proxy after upgrade for those deployments.
Useful? React with 👍 / 👎.
| ) -> StreamingResponse: | ||
| _schedule_proxy_renew(request, sandbox_id) | ||
| endpoint = lifecycle.sandbox_service.get_endpoint(sandbox_id, port, resolve_internal=True) | ||
| _validate_secure_access_header(request.headers, endpoint.headers) |
There was a problem hiding this comment.
Strip secure-access token after validation
When a secure-access sandbox is reached through /proxy/..., this validation now requires the client to send OpenSandbox-Secure-Access, but the backend request is still built from the original request.headers below, and that header is not excluded there. The ingress gateway explicitly deletes this credential before proxying (components/ingress/pkg/proxy/proxy.go:92), so the server proxy now leaks the sandbox access token to the in-sandbox application for every authorized proxied request.
Useful? React with 👍 / 👎.
Pangjiping
left a comment
There was a problem hiding this comment.
Overall: logic correct, security properties sound, well-tested. Approve with minor notes below.
| return "resolve_internal" in signature.parameters or any( | ||
| param.kind is inspect.Parameter.VAR_KEYWORD | ||
| for param in signature.parameters.values() | ||
| ) |
There was a problem hiding this comment.
nit: inspect.signature for capability detection is fragile — decorators, wrappers, or monkeypatching can break the introspected signature.
Since the ABC already declares resolve_internal in the abstract method, any conforming subclass has it. This path only serves duck-typed providers that don't subclass WorkloadProvider.
Alternatives to consider:
- A class attribute like
_supports_internal_resolution = True - Always pass
resolve_internalas kwarg and catchTypeError
Not blocking, but worth noting the fragility for future maintainers.
Summary
Fixes #1096
Tests