Skip to content

fix(CubeAPI): propagate create-time env vars to envd-backed commands#566

Open
zyl1121 wants to merge 1 commit into
TencentCloud:masterfrom
zyl1121:fix/cubeapi-envd-init
Open

fix(CubeAPI): propagate create-time env vars to envd-backed commands#566
zyl1121 wants to merge 1 commit into
TencentCloud:masterfrom
zyl1121:fix/cubeapi-envd-init

Conversation

@zyl1121

@zyl1121 zyl1121 commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Motivation

For envd-backed command execution, create-time environment variables should be propagated through the sandbox create path and initialized inside the sandbox runtime.

On the current CubeAPI sandbox creation path, this propagation is incomplete:

  • the E2B SDK sends create-time environment variables as envs, while CubeAPI only accepted envVars
  • create-time environment variables are not forwarded end-to-end to the cubelet/envd initialization path used by envd-backed commands

As a result, envd-backed execution such as commands.run cannot see environment variables passed at sandbox creation time.

This PR does not introduce a generic sandbox startup-time environment refresh model across all runtime paths.

What this PR changes

This PR fixes the create-time env propagation chain for envd-backed command execution:

  • accepts envs as an alias of envVars in NewSandbox, so both E2B SDK and CubeSandbox SDK callers are supported
  • validates create-time env var names and rejects dangerous control-character values before sandbox creation
  • forwards create-time env vars from CubeAPI into CubeMaster as create_time_env_vars
  • serializes the same env map into an internal annotation for cubelet
  • lets cubelet trigger envd /init after sandbox startup and probe success
  • gates envd /init on the propagated envd capability annotation, cube.master.components.envd.version
  • fails sandbox creation if create-time env vars are requested for a template that is not envd-backed
  • fails sandbox creation if cubelet cannot complete envd initialization after startup
  • adds focused tests for alias deserialization, env forwarding, annotation serialization, cubelet-side envd /init, and fail-fast gating

On the main CubeMaster-managed create path, failures after sandbox allocation are still covered by CubeMaster's async destroy compensation. The additional cubelet-side cleanup keeps the runtime-local fail-fast path self-contained and also protects direct cubelet callers.

Scope

This PR restores create-time environment variables for envd-backed execution, such as commands.run.

It does not change run_code. run_code goes through the bundled lightweight code interpreter instead of envd-backed process execution, and that interpreter does not currently read envd /envs.

It also does not claim generic startup-time env refresh support for every sandbox runtime path.

Testing

Unit tests:

cargo check --manifest-path CubeAPI/Cargo.toml --bin cube-api
(cd CubeMaster && go test -vet=off ./pkg/service/sandbox -run TestSetCreateTimeEnvVarsAnnotation -count=1)

Cluster validation:

from e2b_code_interpreter import Sandbox

with Sandbox.create(
    template="tpl-xxxx",
    envs={"CUBE_TEST_CREATE": "from-e2b-create"},
) as sandbox:
    print("commands.run:")
    print(sandbox.commands.run("echo $CUBE_TEST_CREATE").stdout)

    print("run_code:")
    result = sandbox.run_code(
        "import os; print(os.environ.get('CUBE_TEST_CREATE', '<NOT SET>'))"
    )
    print(result.logs.stdout)

Before this PR, on current origin/master:

commands.run:


run_code:
['<NOT SET>\n']

After this PR:

commands.run:
from-e2b-create

run_code:
['<NOT SET>\n']

Verified behavior:

  • verified the E2B SDK envs path
  • verified the CubeSandbox SDK native envVars path
  • verified that commands.run sees create-time env vars after sandbox creation
  • verified that sandbox creation fails fast if create-time env vars are requested on a template without the envd capability annotation
  • verified that envd-backed templates still initialize envd successfully and expose create-time env vars to commands.run
  • verified that per-call command env vars override sandbox-level env vars
  • verified that after a per-call override, the next commands.run falls back to the sandbox-level env vars
  • verified that run_code remains <NOT SET> on the current runtime image
  • verified on a real cluster that fail-fast after sandbox allocation does not leave a zombie sandbox on the main CubeMaster-managed path

Compatibility

This change is backward compatible for existing create requests:

  • existing envVars requests continue to work
  • E2B SDK envs requests now work
  • sandbox creation without create-time env vars continues to work for legacy templates
  • sandbox creation with create-time env vars now fails fast when the template does not carry the envd capability annotation
  • per-call command environment variable precedence is unchanged

Migration / Compatibility Note

Create-time env injection now requires the template to carry the propagated envd capability annotation:

cube.master.components.envd.version

Templates created before envd capability reporting was introduced may not have this annotation yet. For those legacy templates:

  • sandbox creation without create-time env vars continues to work as before
  • sandbox creation with create-time env vars now fails fast with a clear error
  • tpl redo may not be sufficient, because it can resume from the existing snapshot artifact without re-running envd capability collection
  • create a new template through create-from-image so the build goes through the full BOOT -> SNAPSHOT path and records the envd capability annotation before using create-time env injection

In validation, tpl redo resumed from SNAPSHOTTING, reused the existing artifact, and finished in a few seconds without re-running envd capability collection. A fresh create-from-image rebuild went through the full PULLING -> INIT -> BOOT -> SNAPSHOT flow and correctly recorded cube.master.components.envd.version.

For envd-backed command execution, precedence remains:

sandbox-level env vars from create-time init < per-call command env vars

Documentation

This PR includes a small quickstart visibility update in examples/code-sandbox-quickstart:

  • README.md
  • README_zh.md
  • create_with_envs.py

The behavior change is intentionally narrow: the sandbox create path now propagates create-time environment variables through CubeAPI, CubeMaster, and cubelet, and treats failed envd initialization as a create failure for envd-backed command execution.

The remaining run_code behavior is tracked in the linked issue and requires runtime/image-side support in the bundled lightweight code interpreter.

Related issue

Refs #565

Comment thread CubeAPI/src/services/sandboxes.rs Outdated
return Ok(());
};
let url = build_envd_init_url(base_url, sandbox_id);
let resp = self

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing HTTP timeout on envd init call – The shared reqwest::Client in state.rs (line 46-50) is built without .timeout(), which means reqwest::Client defaults to no timeout. If the envd proxy is unresponsive, this POST will hang indefinitely, blocking the sandbox creation endpoint for that handler. Since this call is best-effort (failure is logged and swallowed), it should have a tight timeout so it fails fast.

Suggested change
let resp = self
let resp = self
.http_client
.post(url)
.json(&EnvdInitRequest { env_vars })
.timeout(std::time::Duration::from_secs(10))
.send()
.await
.map_err(|e| AppError::Internal(anyhow::anyhow!("envd init request failed: {}", e)))?;

Comment thread CubeAPI/src/services/sandboxes.rs Outdated
let url = build_envd_init_url(base_url, sandbox_id);
let resp = self
.http_client
.post(url)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing auth header on envd init request – Every other call to the sandbox proxy (e.g. run_envd_command in agenthub.rs:2162) sends Authorization: Basic cm9vdDo=. This new init_sandbox_env_vars method sends no auth header. If the downstream /init endpoint enforces the same proxy-level auth, this will silently fail on every request (logged at warn, caller sees success). If it does not enforce auth, this is an inconsistency that may widen the attack surface.

Suggested change
.post(url)
.http_client
.post(url)
.header("Authorization", "Basic cm9vdDo=")
.json(&EnvdInitRequest { env_vars })

Comment thread CubeAPI/src/services/sandboxes.rs Outdated

async fn init_sandbox_env_vars(
&self,
sandbox_id: &str,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent skip when env vars are provided but no proxy URL is configured – When sandbox_proxy_base_url is None and env_vars is non-empty, the init call is skipped silently. The caller (create_sandbox) only logs a warning when the HTTP call itself fails, not when it's skipped. This makes it hard for operators to debug why client-provided env vars are being silently ignored. Consider adding a tracing::warn! here when env_vars is non-empty but the proxy URL is unset.

Comment thread CubeAPI/src/services/sandboxes.rs Outdated
env_vars: &'a HashMap<String, String>,
}

pub(crate) fn default_sandbox_proxy_base_url() -> Option<String> {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pub(crate) function should have a doc commentdefault_sandbox_proxy_base_url is visible outside this module (called from services/mod.rs:104) but has no doc comment. It reads three environment variables with a fallback chain. A doc comment explaining the env vars (AGENTHUB_SANDBOX_PROXY_URL, CUBE_SANDBOX_NODE_IP, CUBE_PROXY_HTTP_PORT), their meanings, and the fallback priority would help future maintainers.

Comment thread CubeAPI/src/services/sandboxes.rs Outdated
Path((sandbox_id, port)): Path<(String, u16)>,
Json(body): Json<Value>,
) -> StatusCode {
assert_eq!(sandbox_id, "sb-123");

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assertions inside spawned test handler are invisible on failure – These assert_eq! calls run inside a tokio::spawn-ed axum handler. If they fail (say from a code change that alters the URL path), the panic is caught by tokio and logged to stderr, but does not cause the test to fail. The test would silently pass because the handler returning an error response triggers the best-effort fallback. Consider returning the received values in the response and asserting from the caller side, or using a tokio::sync::oneshot channel to propagate assertion failures.

@zyl1121 zyl1121 changed the title fix(CubeAPI): initialize envd for create-time env vars fix(CubeAPI): propagate create-time env vars to envd-backed commands Jun 15, 2026
@kinwin-ustc

Copy link
Copy Markdown
Collaborator

Are you doing the same thing as PR #554?

@zyl1121

zyl1121 commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

Are you doing the same thing as PR #554?

Thanks for pointing this out. It is related to #554, but the fix is applied at a different layer.

#554 calls envd /init from the CubeSandbox SDKs after Sandbox.create(env_vars=...) succeeds. This PR moves that propagation into CubeAPI itself: CubeAPI accepts both envs and envVars, and initializes envd after CubeMaster creates the sandbox.

The main reason is compatibility with clients that do not use the CubeSandbox SDK, especially E2B SDK users, since the E2B SDK sends envs instead of envVars. With this PR, any client creating sandboxes through CubeAPI gets the same create-time env propagation for envd-backed command execution, without implementing the /init call client-side.

Comment thread CubeAPI/src/services/sandboxes.rs Outdated
const RET_CODE_HTTP_OK: i32 = 200;
const RET_CODE_NOT_FOUND: i32 = 130404;
const RET_CODE_CONFLICT: i32 = 130409;
const ENVD_PORT: u16 = 49983;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security: This hardcoded credential (Basic cm9vdDo= → root with empty password) is shared across all deployments and transmitted over plain HTTP when the proxy URL is derived from CUBE_SANDBOX_NODE_IP. Consider making this configurable via an environment variable (with the current value as a fallback default), and document that production deployments should use per-deployment secrets over HTTPS.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR intentionally reuses the existing sandbox-proxy auth behavior rather than introducing a new auth contract.

The same Authorization: Basic cm9vdDo= value is already used by the existing envd proxy path in run_envd_command (CubeAPI/src/handlers/agenthub.rs). In this PR I only aligned the new create-time /init path with that existing server-side behavior.

I agree that making the proxy auth configurable would be a worthwhile follow-up, but I would prefer to keep that as a separate change because it affects the broader CubeAPI -> sandbox proxy contract rather than this narrow env propagation fix.

Comment thread CubeAPI/src/services/sandboxes.rs Outdated
return Ok(());
};
let url = build_envd_init_url(base_url, sandbox_id);
let resp = self

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code quality: Wrapping the error with format!("...{}", e) calls Display on the underlying error, which often produces a shorter message than the full chain. Consider using .context("envd init request failed") from the anyhow crate (if AppError::Internal accepts anyhow::Error) to preserve the full error chain for debugging.

SandboxLogsRequest {
sandbox_id: sandbox_id.to_string(),
cursor,
limit,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security (input validation): The env_vars map accepts arbitrary key/value pairs with no size limits. An API client could send thousands of entries or large values, potentially causing memory exhaustion. Consider capping the number of entries and maximum key/value length, either via #[validate] attributes or manual validation in create_sandbox.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I agree that request-size limits are worth considering.

For this PR, I kept the scope limited to restoring the verified CubeAPI -> envd propagation path for create-time envs. Adding entry-count / key-length / value-length limits would introduce a broader API validation contract for NewSandbox, which likely deserves separate discussion so the limits can be applied consistently across callers and documented clearly.

I would prefer to keep this PR focused on the propagation fix and handle request-size validation as a follow-up if maintainers think that should be enforced at the API layer.

Comment thread CubeAPI/src/services/sandboxes.rs Outdated
derive_sandbox_proxy_base_url(
agenthub_proxy.as_deref(),
node_ip.as_deref(),
proxy_port.as_deref(),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security / Robustness: CUBE_SANDBOX_NODE_IP is used directly in URL construction without validating it's a valid IP address or hostname. Consider validating the value (e.g., parse as IPv4) before using it in URL construction, and reject malformed values instead of silently constructing a potentially unsafe URL.

@zyl1121 zyl1121 force-pushed the fix/cubeapi-envd-init branch from 34e765f to c466139 Compare June 15, 2026 07:52
Comment thread CubeAPI/src/services/sandboxes.rs Outdated
const RET_CODE_CONFLICT: i32 = 130409;
const ENVD_PORT: u16 = 49983;
const ENVD_INIT_TIMEOUT_SECS: u64 = 10;
const SANDBOX_PROXY_BASIC_AUTH: &str = "Basic cm9vdDo=";

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded weak Basic auth credential — This decodes to root: (username root, empty password). The credential is identical across all deployments, non-rotatable without a code change, and trivially recoverable from source/binary. Anyone with network access to the CubeProxy endpoint can replay it to inject arbitrary env vars into any sandbox.

Suggestion: Promote this to a runtime configuration value (e.g., env var CUBE_PROXY_AUTH_TOKEN) loaded at service startup instead of hardcoding it. At minimum, use a non-empty password and document that operators must override it.

Ref: CWE-798

Comment thread CubeAPI/src/services/sandboxes.rs Outdated
Comment on lines +181 to +189
if let Some(env_vars) = env_vars.as_ref() {
if let Err(err) = self.init_sandbox_env_vars(&resp.sandbox_id, env_vars).await {
tracing::warn!(
sandbox_id = %resp.sandbox_id,
error = %err,
"envd init after sandbox create failed"
);
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sequential envd init delays sandbox creation response — The init_sandbox_env_vars call is awaited synchronously before returning the sandbox response to the caller. Since this is best-effort (errors are swallowed), the client still waits for envd init latency — up to the 10-second timeout on failure. This inflates tail latency for every create_sandbox even though the sandbox is already usable.

Suggestion: Spawn envd init as a detached background task so the response returns immediately after CubeMaster confirms creation. SandboxService already derives Clone and reqwest::Client is Arc-backed, so cloning is cheap.

Comment thread CubeAPI/src/services/sandboxes.rs Outdated
let proxy_port = proxy_port
.map(str::trim)
.filter(|value| !value.is_empty())
.unwrap_or("80");

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent default port 80 masks misconfiguration — When CUBE_PROXY_HTTP_PORT is unset, the code silently defaults to port 80. If the proxy listens on 8080 (a common alternative), envd init silently fails and — because it's best-effort — the sandbox still creates without env vars being applied. The failure surfaces only as a generic HTTP error.

Suggestion: Return None from derive_sandbox_proxy_base_url when CUBE_SANDBOX_NODE_IP is set but CUBE_PROXY_HTTP_PORT is absent, letting the missing configuration be logged as a warning rather than silently assuming port 80.

Comment thread CubeAPI/src/services/sandboxes.rs Outdated
Comment on lines 590 to 597
fn build_envd_init_url(base_url: &str, sandbox_id: &str) -> String {
format!(
"{}/sandbox/{}/{}/init",
base_url.trim_end_matches('/'),
sandbox_id,
ENVD_PORT
)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing URL path-segment encodingsandbox_id is interpolated directly into the URL path without validation or encoding. The codebase has a validate_path_segment helper (in cubemaster/mod.rs:567) used elsewhere that is not applied here. While the current sandbox_id is server-generated with a constrained character set, inconsistent validation creates a maintenance hazard.

Suggestion: Call validate_path_segment before constructing the URL, or add a comment documenting why it's unnecessary (e.g., the ID is guaranteed UUID-like).

Comment thread CubeAPI/src/models/mod.rs

#[serde(rename = "envVars", skip_serializing_if = "Option::is_none")]
#[serde(
alias = "envs",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

envs alias not documented on the struct — The #[serde(alias = "envs")] attribute is tested but the NewSandbox struct-level doc comment only mentions envVars. API consumers reading the source won't know that envs (used by the E2B SDK) is also accepted.

Suggestion: Update the struct doc comment to mention that envs is an accepted alias for envVars. Also consider documenting the behavior when both envs and envVars are present (serde_json last-writer-wins for aliased fields).

@cubesandboxbot

cubesandboxbot Bot commented Jun 15, 2026

Copy link
Copy Markdown

Review: PR #566 — Propagate create-time env vars to envd-backed commands

Overall: This is a well-structured PR with clear data flow (CubeAPI -> CubeMaster -> Cubelet -> envd), thorough input validation, and backward compatibility. The design is sound and the happy-path test coverage is solid. Below are findings that merit attention before merge.


1. Zombie sandbox on envd init failure (High)

Cubelet/services/cubebox/cube_container_create.go:339

When doCreateTimeEnvdInit fails, the function returns an error but containers have already been created, probed, and post-create hooks have run (lines 331-337). There is no cleanup. The sandbox becomes a zombie consuming resources.

Fix: tear down containers on envd init failure, or at minimum mark the sandbox as failed for garbage collection.


2. Synchronous envd init blocks create path (Medium)

Cubelet/services/cubebox/cube_container_create.go:339

The envd init HTTP call blocks sandbox creation for up to 10s. This directly inflates p95/p99 create latency. Consider a shorter dial timeout or asynchronous init.


3. No aggregate env var payload size limit (Medium)

CubeMaster/pkg/service/sandbox/util.go:692

Individual values are capped at 4KB, but there is no limit on total payload size. Hundreds of large env vars could exceed the approx 256KB Kubernetes annotation limit. Add a cap after marshaling.


4. Documentation: Missing files in directory listing (Medium)

examples/code-sandbox-quickstart/README.md and README_zh.md

The directory tree is missing create_with_envs.py (the new file) and env_utils.py (shared loader). Both should be added to the listing.


5. Documentation: envs vs env_vars parameter name (Low)

examples/code-sandbox-quickstart/create_with_envs.py:13

The example uses envs=... but the SDK Sandbox.create() uses env_vars=. It works via **kwargs but wont appear in IDE autocompletion. Consider using env_vars= for consistency.


Summary

Priority Issue
High Zombie sandbox on envd init failure - no resource cleanup
Medium Synchronous envd init blocks create path (10s timeout)
Medium No aggregate env var payload size limit
Medium Missing file entries in README directory listings
Low Example uses envs= instead of documented env_vars=

Inline comments cover the HTTP client redirect behavior, io.ReadAll error discard, mutable global port variable, and test race conditions in more detail.

Overall the approach is sound and core functionality works correctly. The main concerns are operational reliability (zombie cleanup, aggregate size limits).

@zyl1121 zyl1121 force-pushed the fix/cubeapi-envd-init branch from 851ec0d to c7f0071 Compare June 15, 2026 08:18
Comment thread CubeAPI/src/services/sandboxes.rs Outdated
Comment thread CubeAPI/src/services/sandboxes.rs
Comment thread CubeAPI/src/services/sandboxes.rs Outdated
@zyl1121 zyl1121 force-pushed the fix/cubeapi-envd-init branch from 629828b to bd9f786 Compare June 25, 2026 06:47
@kinwin-ustc

Copy link
Copy Markdown
Collaborator

Full E2B compatibility is our goal. This PR supports passing environment variables when creating sandboxes for E2B. Can you add some documentation or examples to show that we already support this E2B capability?

@zyl1121

zyl1121 commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

@kinwin-ustc

Full E2B compatibility is our goal. This PR supports passing environment variables when creating sandboxes for E2B. Can you add some documentation or examples to show that we already support this E2B capability?

I think the wording should be a bit broader than “for E2B”.

Before this PR, create-time environment variables were not implemented in CubeAPI’s sandbox creation path, because they were dropped during sandbox creation. So this was not only an E2B issue. It affected both E2B SDK callers (envs) and CubeSandbox SDK callers (envVars), because both go through the same backend path.

This PR fixes that missing backend propagation, so both SDKs benefit from the same functional fix.

I’m happy to add a small note or example in the quickstart docs in this PR to make that visible. Then I can follow up with a dedicated docs PR to add the broader usage documentation, since some of the current wording implies this capability already worked before this fix. What do you think?

@zyl1121 zyl1121 force-pushed the fix/cubeapi-envd-init branch from bd9f786 to 5d22dfe Compare June 26, 2026 06:07
@zyl1121

zyl1121 commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

@kinwin-ustc I added a small quickstart/example update in this PR to make the supported behavior visible:

  • examples/code-sandbox-quickstart/create_with_envs.py
  • quickstart README notes in English and Chinese

I can follow up with a dedicated docs/guide PR for fuller user-facing usage guidance and SDK-specific examples.


/// Environment variable names that may compromise sandbox isolation if injected
/// at the runtime level (loader overrides, language runtime paths).
const FORBIDDEN_ENV_NAMES: &[&str] = &[

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing NODE_OPTIONS and PERL5OPT in the forbidden list — these enable arbitrary code execution through Node.js (--require, --eval flags) and Perl (PERL5OPT), comparable to JAVA_TOOL_OPTIONS and RUBYOPT which are already blocked.

Also consider adding TLS/CA-path env vars (SSL_CERT_FILE, REQUESTS_CA_BUNDLE, NODE_EXTRA_CA_CERTS, CURL_CA_BUNDLE) to prevent MITM through custom certificate injection.

value.len()
)));
}
if value.contains('\0') {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only NUL bytes are rejected in values, but other control characters (\n, \r, \x1b, etc.) pass through. These can forge log entries, inject terminal escape sequences into error messages, or corrupt line-oriented protocols. Consider rejecting non-printable ASCII (0x00-0x1F, 0x7F) or sanitizing values before embedding in logs/errors.

StatusCode::NO_CONTENT
}

async fn spawn_server(app: Router) -> String {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spawn_server is duplicated verbatim in both async tests. Consider extracting to a shared test helper to avoid maintenance drift.

/// Validate environment variable names against the POSIX name convention
/// and a deny-list of runtime-loader / path-override names that could
/// compromise sandbox isolation.
fn validate_env_vars(env_vars: &HashMap<String, String>) -> AppResult<()> {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No maximum env var count limit. An attacker could send thousands of env var entries, consuming CPU and memory during validation. Consider enforcing a reasonable upper bound (e.g., 64 or 128 entries) in validate_env_vars.

Comment thread CubeAPI/src/services/sandboxes.rs Outdated

resp.ret.into_result().map_err(internal_error)?;

if let Some(env_vars) = env_vars.as_ref() {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty env var maps (Some({})) still trigger an HTTP POST to envd with {"envVars":{}}. If the init endpoint doesn't handle empty maps gracefully or is unreachable, this causes unnecessary sandbox creation failures. Consider if !env_vars.is_empty() guard before calling init_sandbox_env_vars.

}

#[test]
fn create_sandbox_rejects_dangerous_env_var_names() {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only 11 of the 22 FORBIDDEN_ENV_NAMES entries are tested in this test. Consider iterating over the full constant so new additions are automatically covered (uncovered: LD_AUDIT, LD_ORIGIN_PATH, DYLD_INSERT_LIBRARIES, DYLD_LIBRARY_PATH, GCONV_PATH, NODE_PATH, GEM_PATH, PERL5LIB, CLASSPATH, IFS).

@ls-ggg

ls-ggg commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

Thanks for your PR! I think this approach is feasible, but it requires a more refined implementation. Here are a few points to consider:

Initiation Point: If we need to refresh the env via envd's init interface, I believe it's more appropriate to initiate the request from cubelet. Cubelet already has an implementation for probing sandboxes, so the logic would be quite similar. Furthermore, this allows us to resolve the issue as close to the sandbox as possible.

Envd Detection: We need to confirm that envd is actually started inside the sandbox before taking this action. This means we should analyze the sandbox information during template creation to determine whether envd will be launched. @fslongjin cc

@zyl1121

zyl1121 commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

@ls-ggg Thanks, I agree that cubelet is probably the cleaner long-term place to trigger envd init, since it is closer to the sandbox/runtime.

My main point is that there are two separate issues here:

  1. CubeAPI currently drops the E2B envs field during sandbox creation.
    This part needs to be fixed anyway, regardless of where envd init is eventually triggered.
  2. Where to trigger envd /init.
    Keeping it in CubeAPI is a smaller and more targeted fix, and this path has already been validated on a real cluster.
    Moving it to cubelet would be cleaner from an ownership perspective, but it also means a larger cross-layer change: CubeAPI still needs to accept the env fields, CubeMaster / cubelet need to pass them through, and we need a clear way to know whether a sandbox is envd-backed.

So I agree cubelet may be the better design. I’m happy to follow the preferred direction here. @fslongjin @kinwin-ustc could you help confirm whether this PR should keep the smaller CubeAPI-side fix for now, or change direction to a cubelet-side runtime propagation design?

@kinwin-ustc

Copy link
Copy Markdown
Collaborator

@zyl1121 I think it's more suitable for cubelet to handle this trigger. Cubemaster and cubelet already support envs; we only need to fix the cubeapi to pass envs down. Meanwhile, the cubelet can easily use the existing probe mechanism to initiate the init request. As for detecting envd in the image, we can reuse the mechanism in PR #650 to determine whether a template has envd.

@zyl1121 zyl1121 force-pushed the fix/cubeapi-envd-init branch 2 times, most recently from 3b45415 to b68dcf4 Compare June 28, 2026 07:51
Comment thread Cubelet/services/cubebox/probe.go Outdated
Comment thread Cubelet/services/cubebox/probe.go Outdated
Comment thread Cubelet/services/cubebox/probe.go Outdated
Comment thread CubeMaster/pkg/service/sandbox/util.go
Comment thread Cubelet/services/cubebox/cube_container_create.go
Comment thread Cubelet/services/cubebox/probe.go Outdated
Comment thread Cubelet/services/cubebox/probe.go Outdated
Comment thread CubeAPI/src/cubemaster/mod.rs
@zyl1121 zyl1121 force-pushed the fix/cubeapi-envd-init branch 3 times, most recently from ab49df5 to 25437c2 Compare June 28, 2026 09:32
Create-time env vars were dropped during sandbox creation, so envd-backed
command execution (e.g. commands.run) could not see them.

Forward env vars from CubeAPI into CubeMaster, serialize them into an
internal annotation, and have cubelet initialize envd via its /init
data-plane endpoint after sandbox startup. Gate on the propagated envd
capability signal (cube.master.components.envd.version) and fail fast when
injection is requested on a template without it.

Scope is limited to the envd data-plane init path; container-level startup
env refresh is intentionally not included.

Add focused tests across CubeAPI/CubeMaster/cubelet and a quickstart example.

Signed-off-by: zhengyilei <zheng_yilei@qq.com>
Co-authored-by: jinlong <jinlong@tencent.com>
@zyl1121 zyl1121 force-pushed the fix/cubeapi-envd-init branch from 25437c2 to d3b6e17 Compare June 28, 2026 09:38
@zyl1121

zyl1121 commented Jun 28, 2026

Copy link
Copy Markdown
Contributor Author

@kinwin-ustc @ls-ggg I reworked the implementation based on your earlier guidance.

Main changes:

  • CubeAPI now only accepts, validates, and forwards create-time env vars instead of triggering envd init directly
  • CubeMaster propagates them as create_time_env_vars and serializes them into the internal annotation for cubelet
  • cubelet triggers envd /init after sandbox startup and probe success
  • sandbox creation now fails fast if create-time env vars are requested but the template does not carry the envd capability annotation, or if envd init fails after startup
  • the implementation now reuses the envd capability signal cube.master.components.envd.version introduced in feat(envd): report collected envd versions #650

I also updated the PR description to document the current scope, fail-fast behavior, legacy-template compatibility, and the remaining run_code limitation.

For legacy templates, ordinary sandbox creation without create-time env vars remains backward compatible. Create-time env injection requires rebuilding the template through create-from-image so the envd capability annotation is recorded.

Could you help check whether this direction now matches the intended CubeAPI / CubeMaster / cubelet ownership split? @fslongjin could you also take a look when convenient, especially around the fail-fast behavior and legacy-template compatibility note?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants