Skip to content

fix: honor internal endpoint resolution on Kubernetes#1097

Open
weinong wants to merge 2 commits into
opensandbox-group:mainfrom
weinong:fix/k8s-internal-endpoint-1096
Open

fix: honor internal endpoint resolution on Kubernetes#1097
weinong wants to merge 2 commits into
opensandbox-group:mainfrom
weinong:fix/k8s-internal-endpoint-1096

Conversation

@weinong

@weinong weinong commented Jun 18, 2026

Copy link
Copy Markdown

Summary

  • thread resolve_internal through Kubernetes workload endpoint providers
  • keep gateway ingress endpoints as the default public endpoint in gateway mode
  • bypass gateway formatting for server-proxy backend resolution so BatchSandbox and agent-sandbox use internal pod/service endpoints

Fixes #1096

Tests

  • uv run pytest tests/k8s/test_kubernetes_service.py tests/k8s/test_batchsandbox_provider.py tests/k8s/test_agent_sandbox_provider.py
  • uv run ruff check opensandbox_server/services/k8s/workload_provider.py opensandbox_server/services/k8s/kubernetes_service.py opensandbox_server/services/k8s/batchsandbox_provider.py opensandbox_server/services/k8s/agent_sandbox_provider.py tests/k8s/test_kubernetes_service.py tests/k8s/test_batchsandbox_provider.py tests/k8s/test_agent_sandbox_provider.py
  • git diff --check

@weinong weinong requested a review from Pangjiping as a code owner June 18, 2026 04:42
Copilot AI review requested due to automatic review settings June 18, 2026 04:42

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes Kubernetes endpoint resolution so resolve_internal=True returns internal pod/service endpoints for server-proxy backends while preserving gateway ingress endpoints as the default in gateway mode (addresses #1096).

Changes:

  • Thread resolve_internal through KubernetesSandboxService.get_endpoint() into WorkloadProvider.get_endpoint_info().
  • Update BatchSandboxProvider and AgentSandboxProvider to bypass gateway ingress formatting when resolve_internal=True.
  • Add/extend unit tests to cover resolve-internal behavior in Kubernetes service and both workload providers.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
File Description
server/opensandbox_server/services/k8s/workload_provider.py Extends the provider interface to accept resolve_internal for endpoint resolution.
server/opensandbox_server/services/k8s/kubernetes_service.py Passes resolve_internal through to the workload provider (and updates docstring semantics).
server/opensandbox_server/services/k8s/batchsandbox_provider.py Skips gateway endpoint formatting when resolving internal endpoints.
server/opensandbox_server/services/k8s/agent_sandbox_provider.py Skips gateway endpoint formatting when resolving internal endpoints, preserving internal pod/service fallback.
server/tests/k8s/test_kubernetes_service.py Adds coverage ensuring resolve_internal is passed to the provider.
server/tests/k8s/test_batchsandbox_provider.py Adds coverage for default gateway endpoint vs internal resolution behavior.
server/tests/k8s/test_agent_sandbox_provider.py Adds coverage for default gateway endpoint vs internal resolution behavior (including avoiding pod list calls in gateway-default case).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6ba1079c70

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +885 to +890
endpoint = self.workload_provider.get_endpoint_info(
workload,
port,
sandbox_id,
resolve_internal=resolve_internal,
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve secure-access enforcement for server proxy

When Kubernetes gateway mode is combined with secureAccess=true, forwarding resolve_internal=True here makes the server proxy connect directly to the pod/service instead of the ingress gateway. The proxy route does not validate OpenSandbox-Secure-Access itself and filters out automatically attached secure-access headers, so a caller with only the lifecycle API key can now use /sandboxes/{id}/proxy/{port} to bypass the gateway check that previously enforced the per-sandbox token. Either keep routing secure-access sandboxes through the gateway or enforce the secure-access header before proxying internally.

Useful? React with 👍 / 👎.

workload: Any,
port: int,
sandbox_id: str,
resolve_internal: bool = False,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve custom workload provider compatibility

Adding this new keyword does not protect custom providers implemented against the previous get_endpoint_info(self, workload, port, sandbox_id) interface: Python's ABC machinery still treats those subclasses as concrete, and register_provider can still instantiate them, but endpoint resolution now calls the method with resolve_internal=... and raises TypeError for every endpoint/proxy request in those deployments. Please add a compatibility fallback or gate the new keyword to providers that support it.

Useful? React with 👍 / 👎.

@weinong

weinong commented Jun 18, 2026

Copy link
Copy Markdown
Author

Updated this PR with a follow-up commit (d38b88a3) addressing the review comments:

  • Server proxy now validates OpenSandbox-Secure-Access when the resolved endpoint indicates secure access is required, before forwarding HTTP/WebSocket traffic.
  • Proxy renew scheduling now happens only after secure-access validation succeeds.
  • Secure-access endpoint header lookup is case-insensitive.
  • Custom workload providers remain compatible for normal endpoint resolution; internal server-proxy resolution is passed only to providers that explicitly accept resolve_internal or **kwargs, otherwise it returns a clear 501.
  • Added focused regression tests for HTTP/WS secure-access proxy behavior and provider compatibility.

Validation run:
cd server && uv run pytest tests/test_routes_proxy.py tests/k8s/test_kubernetes_service.py tests/k8s/test_batchsandbox_provider.py tests/k8s/test_agent_sandbox_provider.py
cd server && uv run ruff check opensandbox_server/api/proxy.py opensandbox_server/services/k8s/kubernetes_service.py tests/test_routes_proxy.py tests/k8s/test_kubernetes_service.py

Question for maintainers: should server proxy be an officially supported path when Kubernetes ingress is in gateway mode, or should docs position it as fallback-only for clients that cannot reach gateway endpoints directly? The current update keeps it supported but enforces secure-access equivalently at the lifecycle proxy boundary.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d38b88a32e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +897 to +899
if resolve_internal and not supports_internal_resolution:
raise HTTPException(
status_code=status.HTTP_501_NOT_IMPLEMENTED,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep server proxy working for legacy providers

When a registered custom provider still implements the pre-change get_endpoint_info(workload, port, sandbox_id) signature, any /proxy/... request now hits this branch because the proxy calls get_endpoint(..., resolve_internal=True), so server proxy returns 501 instead of using the provider's existing public/gateway endpoint as it did before. Fresh evidence in the current code is that the compatibility path now explicitly raises here for legacy providers rather than merely avoiding the new keyword, which still disables server proxy after upgrade for those deployments.

Useful? React with 👍 / 👎.

) -> StreamingResponse:
_schedule_proxy_renew(request, sandbox_id)
endpoint = lifecycle.sandbox_service.get_endpoint(sandbox_id, port, resolve_internal=True)
_validate_secure_access_header(request.headers, endpoint.headers)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Strip secure-access token after validation

When a secure-access sandbox is reached through /proxy/..., this validation now requires the client to send OpenSandbox-Secure-Access, but the backend request is still built from the original request.headers below, and that header is not excluded there. The ingress gateway explicitly deletes this credential before proxying (components/ingress/pkg/proxy/proxy.go:92), so the server proxy now leaks the sandbox access token to the in-sandbox application for every authorized proxied request.

Useful? React with 👍 / 👎.

@Pangjiping Pangjiping left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall: logic correct, security properties sound, well-tested. Approve with minor notes below.

return "resolve_internal" in signature.parameters or any(
param.kind is inspect.Parameter.VAR_KEYWORD
for param in signature.parameters.values()
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: inspect.signature for capability detection is fragile — decorators, wrappers, or monkeypatching can break the introspected signature.

Since the ABC already declares resolve_internal in the abstract method, any conforming subclass has it. This path only serves duck-typed providers that don't subclass WorkloadProvider.

Alternatives to consider:

  • A class attribute like _supports_internal_resolution = True
  • Always pass resolve_internal as kwarg and catch TypeError

Not blocking, but worth noting the fragility for future maintainers.

@Pangjiping Pangjiping self-assigned this Jun 18, 2026
@Pangjiping Pangjiping added bug Something isn't working component/server labels Jun 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working component/server

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Kubernetes server proxy returns gateway endpoint in gateway mode despite resolve_internal=True

3 participants