Feat/webCaseCenter#615
Conversation
b9cb05d to
321e017
Compare
| .env("CUBE_API_ENVD_AUTH", &self.envd_auth) | ||
| .current_dir(&work_dir); | ||
|
|
||
| if std::env::var("CUBE_API_KEY").is_err() { |
There was a problem hiding this comment.
Security: Hardcoded default CUBE_API_KEY
When CUBE_API_KEY is not set in the parent process environment, the example subprocess receives a well-known default key (cube_0000000000...). Any user who triggers an example run can extract this key from the process environment and use it to authenticate arbitrary API calls.
Either remove the fallback entirely (scripts should handle a missing key gracefully) or use an obviously non-functional sentinel (e.g., empty string) that scripts can detect and refuse to use.
| } | ||
|
|
||
| // 3. Any healthy/ready template | ||
| if let Ok(tpls) = templates.list_templates().await { |
There was a problem hiding this comment.
Performance: Template validation iterates healthy candidates twice + makes 2N gRPC calls
Line 448 chains tpls.iter() (the full set) after the filtered subset, so every template is visited at least once and healthy/ready templates are validated twice. The inner templates.get_template() call is a gRPC round-trip to CubeMaster — in the worst case this makes 2N sequential gRPC calls.
The fix: replace .chain(tpls.iter()) with .chain(tpls.iter().filter(|t| t.status != "healthy" && t.status != "ready")), or better, inline the status check from list_templates and skip get_template entirely for templates already confirmed as healthy/ready.
| } | ||
| }; | ||
|
|
||
| let fingerprint = { |
There was a problem hiding this comment.
DefaultHasher uses random seeds per-process — cache misses after every restart
DefaultHasher::new() uses RandomState with per-process seeds, so the same requirements.txt content produces a different hash after every server restart. The fingerprint stamp file written by a previous process lifetime will never match, causing pip3 install to rerun unconditionally on the first example run after each restart.
Replace with a deterministic hash (e.g., SipHasher::new_with_keys(0, 0)) or simply compare the file content directly for these trivially small requirements files.
| .map_err(anyhow::Error::from) | ||
| } | ||
|
|
||
| #[allow(dead_code)] |
There was a problem hiding this comment.
Dead code with explicit #[allow(dead_code)] — will drift from schema
This method is never called (explicitly acknowledged by the attribute). DB access methods that go unused will silently drift as columns are renamed/added/removed, then fail when someone finally calls them. Either wire it into a real caller or remove it.
| fn default_sandbox_proxy_url() -> String { | ||
| std::env::var("AGENTHUB_SANDBOX_PROXY_URL").unwrap_or_else(|_| "http://127.0.0.1".to_string()) | ||
| } | ||
| fn default_envd_auth() -> String { |
There was a problem hiding this comment.
Security: Default envd_auth is a well-known base64 value
The default Basic cm9vdDo= decodes to root: with an empty password — the envd built-in default. This shared credential is sent to every sandbox's Jupyter/envd endpoint. Anyone who learns it (including by running code in their own sandbox and observing request headers) can access the in-sandbox services of other sandboxes. Use per-sandbox credentials (the sandbox detail already includes an envd_access_token field) instead of a cluster-wide shared secret.
| // Pull all images in parallel, then inspect each. | ||
| let handles: Vec<_> = STORE_IMAGES | ||
| /// Pulls all store images from the registry then returns the updated metadata. | ||
| pub async fn refresh_store_meta(State(state): State<AppState>) -> impl IntoResponse { |
There was a problem hiding this comment.
Security & performance: Concurrent Docker pulls with no rate limiting
This spawns one docker pull --quiet per catalog image concurrently via spawn_blocking. With 6+ images (some multi-GB), this can saturate Docker daemon capacity and network bandwidth. Also, POST /store/refresh should use with_auth_and_rate_limit in the router (like the node isolation endpoint) rather than plain with_auth. Consider adding a semaphore to cap concurrent pulls (e.g., 2-3 at a time).
| .env("CUBE_API_ENVD_AUTH", &self.envd_auth) | ||
| .current_dir(&work_dir); | ||
|
|
||
| if std::env::var("CUBE_API_KEY").is_err() { |
There was a problem hiding this comment.
Security: Hardcoded default CUBE_API_KEY
When CUBE_API_KEY is not set in the parent process environment, the example subprocess receives a well-known default key (cube_0000000000...). Any user who triggers an example run can extract this key from the process environment and use it to authenticate arbitrary API calls.
Either remove the fallback entirely (scripts should handle a missing key gracefully) or use an obviously non-functional sentinel (e.g., empty string) that scripts can detect and refuse to use.
| } | ||
|
|
||
| // 3. Any healthy/ready template | ||
| if let Ok(tpls) = templates.list_templates().await { |
There was a problem hiding this comment.
Performance: Template validation iterates healthy candidates twice + makes 2N gRPC calls
Line 448 chains tpls.iter() (the full set) after the filtered subset, so every template is visited at least once and healthy/ready templates are validated twice. The inner templates.get_template() call is a gRPC round-trip to CubeMaster — in the worst case this makes 2N sequential gRPC calls.
The fix: replace .chain(tpls.iter()) with .chain(tpls.iter().filter(|t| t.status != "healthy" && t.status != "ready")), or better, inline the status check from list_templates and skip get_template entirely for templates already confirmed as healthy/ready.
| } | ||
| }; | ||
|
|
||
| let fingerprint = { |
There was a problem hiding this comment.
DefaultHasher uses random seeds per-process → cache misses after every restart
DefaultHasher::new() uses RandomState with per-process seeds, so the same requirements.txt content produces a different hash after every server restart. The fingerprint stamp file written by a previous process lifetime will never match, causing pip3 install to rerun unconditionally on the first example run after each restart.
Replace with a deterministic hash (e.g., SipHasher::new_with_keys(0, 0)) or simply compare the file content directly for these trivially small requirements files.
| /// calls return the same `&'static` slice, so the front-end gets a stable | ||
| /// ordering without the borrow checker complaining about temporaries. | ||
| pub fn scenario_registry() -> &'static [ScenarioSpec] { | ||
| Box::leak(Box::new(vec![ |
There was a problem hiding this comment.
scenario_registry() currently allocates and leaks a new Vec<ScenarioSpec> on every call, even though the comment says the registry is materialized once. Endpoints such as list, source, and run can call this repeatedly, so a long-lived API process will grow memory over time. Please store the vector behind OnceLock/LazyLock and return the initialized slice, ideally with a regression test that repeated calls reuse the same registry.
| let start = Instant::now(); | ||
|
|
||
| let output = match req.language.as_str() { | ||
| "python" => self.exec_python(sandbox_id, domain, &req.code).await?, |
There was a problem hiding this comment.
timeout_secs is only used after the Jupyter/envd request has already returned; the service still awaits exec_python/exec_bash without a real deadline. A hung or slow sandbox request can keep the handler occupied, and larger user timeouts will still be cut off by the outer 30s route timeout. Please wrap the actual execution future in tokio::time::timeout and align the route timeout with the allowed request budget.
|
|
||
| let start = Instant::now(); | ||
| let max_secs = sc.timeout_secs.unwrap_or(120); | ||
| let run_result = timeout(Duration::from_secs(max_secs), cmd.output()).await; |
There was a problem hiding this comment.
Timing out cmd.output() drops the future but does not explicitly kill the spawned process or its process tree, so long-running Python/Go/Chromium children can continue after the API has returned. This path also installs requirements.txt during a request, mutating the server's Python environment and using a source-tree stamp file without concurrency control. Please spawn and kill/wait the child on timeout, and move dependency installation to build/deploy time or an isolated, locked cache that fails closed.
| fn build_examples_run_routes(state: &AppState, auth_configured: bool) -> Router<AppState> { | ||
| let routes = Router::new().route("/examples/run", post(examples::run_example)); | ||
| if auth_configured { | ||
| routes.layer(middleware::from_fn_with_state(state.clone(), unified_auth)) |
There was a problem hiding this comment.
POST /examples/run is moved to a long-timeout router with auth only, but it no longer gets the rate limiting used by other resource-creating or execution endpoints. Because this route starts local processes and can create sandboxes or browser sessions for up to several minutes, missing rate limits increases abuse and resource-exhaustion risk. Please apply the existing auth+rate-limit layer here, or add a dedicated concurrency/rate budget for example runs.
| // `hidden: true` filter so even an attacker who knows the IDs cannot reach | ||
| // these scenarios through the HTTP API. | ||
|
|
||
| export const EXAMPLE_SCENARIOS: ExampleScenario[] = [ |
There was a problem hiding this comment.
This adds a second hand-maintained example registry in the frontend while the backend registry also defines scenario IDs, files, categories, store items, visibility, and run metadata. The two sources can drift, producing UI entries that cannot run or backend examples with stale labels/topology in the UI. Please make the backend /examples response include the UI metadata, or generate both Rust and TS registries from a shared data source.
| pub async fn exec_code( | ||
| State(state): State<AppState>, | ||
| Path(sandbox_id): Path<String>, | ||
| Json(body): Json<ExecCodeRequest>, |
There was a problem hiding this comment.
ExecCodeRequest derives Validate, but this handler only extracts Json(body) and never calls body.validate() before executing the request. As a result, the documented code length constraint is not enforced and the API can accept payloads larger than the model contract suggests. Please validate here and map failures to a 400 response, or use a shared validated-JSON extractor.
There was a problem hiding this comment.
This PR adds the compiled examples/cube-bench/cube-bench binary alongside the Go source. Committing the build artifact substantially increases PR size, is platform/architecture-specific, and makes supply-chain review noisier. Please remove the binary and add an ignore rule for this executable or for example build outputs.
| exit_code, | ||
| success, | ||
| elapsed_ms, | ||
| steps: Vec::new(), |
There was a problem hiding this comment.
The run response now exposes steps, and the frontend adds a StepTimeline, but the service always returns steps: Vec::new() and the page does not render the timeline from backend data. This leaves a partially implemented API/UI surface that users cannot observe and future maintainers may assume is complete. Please either generate and render real step logs, or remove the unused field/component/comment until the feature is wired through.
| """ | ||
|
|
||
| with Sandbox.create(template=template_id) as sandbox: | ||
|
|
There was a problem hiding this comment.
This added blank line contains trailing whitespace, so git diff --check still fails for the PR. Please remove the stray spaces (and any other diff-check formatting issues) to avoid CI or review noise.
| </div> | ||
| ); | ||
| } | ||
|
|
There was a problem hiding this comment.
git diff --check also reports this as a new blank line at EOF. Please remove the extra trailing blank line so the PR is clean for whitespace checks.
488bea9 to
2f8f47f
Compare
| .env("CUBE_TEMPLATE_ID", &template_id) | ||
| .env("SSL_CERT_FILE", ssl_cert) | ||
| .env("AGENTHUB_SANDBOX_PROXY_URL", &self.sandbox_proxy_url) | ||
| .env("CUBE_API_ENVD_AUTH", &self.envd_auth) |
There was a problem hiding this comment.
Security: envd_auth credential leaked into example subprocess environment
The full envd_auth header value (credential for the sandbox proxy) is injected as the CUBE_API_ENVD_AUTH environment variable into every example subprocess. Example scripts (including user-edited code in req.code) can read os.environ["CUBE_API_ENVD_AUTH"] and authenticate as root to the sandbox proxy, bypassing API-level authorization.
Impact: Escalation from "can run an example" to "can execute arbitrary code inside any sandbox."
Suggestion: Do not pass envd_auth to subprocesses. If example scripts need proxy access, create a scoped time-limited token instead.
Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
…SandboxCases Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
… prefix Signed-off-by: liciazhu <liciazhu@tencent.com>
…put, centralise credentials Signed-off-by: liciazhu <liciazhu@tencent.com>
…stry Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
…licy Signed-off-by: liciazhu <liciazhu@tencent.com>
POST /store/refresh previously spawned an unbounded number of concurrent blocking tasks — one docker-pull per catalog entry — which could saturate Docker daemon and network bandwidth when the catalog is large. Introduce a tokio Semaphore (MAX_CONCURRENT_PULLS = 3) so that at most 3 docker pulls run simultaneously. Each catalog image acquires a permit before entering the spawn_blocking call and releases it automatically on drop. Assisted-by: CodeBuddy:claude-sonnet-4-6 Signed-off-by: liciazhu <liciazhu@tencent.com>
…ement Improvement in scenario_registry();Consistent Auth + Rate Limiting on /examples/run;Remove Committed Binary from Repository;Whitespace Formatting Fixes Signed-off-by: liciazhu <liciazhu@tencent.com>
…xec-code Signed-off-by: liciazhu <liciazhu@tencent.com>
Signed-off-by: liciazhu <liciazhu@tencent.com>
…bprocess environment;resolve N+1 HTTP cascade in template resolution;resolve Hardcoded fallback API key;resolve SVG XSS via DOMPurify's SVG profile and topology layout;resolve async runtie Signed-off-by: liciazhu <liciazhu@tencent.com>
db07d7d to
b4457e7
Compare
| return ( | ||
| <div | ||
| className="jupyter-html-output max-h-[300px] overflow-auto rounded-md bg-white p-2 text-xs ring-1 ring-border/60 [&_table]:border-collapse [&_td]:border [&_td]:px-2 [&_td]:py-1 [&_th]:border [&_th]:px-2 [&_th]:py-1 [&_th]:bg-muted/40" | ||
| dangerouslySetInnerHTML={{ __html: DOMPurify.sanitize(result.html) }} |
There was a problem hiding this comment.
Security: DOMPurify default config allows SVG elements, enabling potential mXSS attack surface
DOMPurify.sanitize(result.html) uses the library's default configuration, which allows SVG tags (e.g. <svg>, <use>, <animate>). SVG's complex namespace model has been the source of multiple mutation XSS bypasses across many DOMPurify versions. Since this content originates from sandbox code execution output (potentially user-controlled), this is a defense-in-depth concern.
Suggestion: Restrict to safe HTML tags:
DOMPurify.sanitize(result.html, {
FORBID_TAGS: ['svg', 'use', 'animate', 'animateTransform', 'set'],
SANITIZE_NAMED_PROPS: true,
})| <div className="flex items-center gap-1.5 text-xs font-medium text-foreground/80"> | ||
| <Monitor size={12} /> | ||
| Sandbox desktop preview (noVNC) | ||
| </div> |
There was a problem hiding this comment.
i18n gap: Hardcoded English string — should use t() like the rest of this component
Every other user-facing string in this file uses t('...') from useTranslation('examples'), but this line renders "Sandbox desktop preview (noVNC)" as a raw JSX string. Also consider extracting the repeated isZh detection pattern ((i18n.language ?? 'en').toLowerCase().startsWith('zh') — used at least 3 times in this file) into a shared utility.
| let _ = tokio::fs::write(&stamp_file, &req_content).await; | ||
| true | ||
| } else { | ||
| tracing::warn!( |
There was a problem hiding this comment.
Silent failure: pip install errors are swallowed with true return
When pip3 install -r requirements.txt fails (non-zero exit or spawn error), ensure_requirements returns true and the example script runs anyway. The user will see a confusing ModuleNotFoundError that looks like their code's fault rather than a failed dependency install.
Suggestion: Log at error level instead of warn so operators have visibility. Alternatively, include a warning header in the run response when dependencies were not installed.
| pub cube_proxy_node_ip: Option<String>, | ||
|
|
||
| /// CubeProxy HTTP port (passed as CUBE_PROXY_PORT_HTTP). | ||
| /// Env var: CUBE_PROXY_PORT_HTTP (default "80") |
There was a problem hiding this comment.
Doc comment mismatch: claims default is "80", but default_cube_proxy_port_http() returns None
The doc comment on line 91 reads (default "80"), but the default function default_cube_proxy_port_http() returns Option<u16> and resolves to None when the env var is absent. There is no fallback to port 80 — when unset the variable is simply omitted from the subprocess environment.
Suggestion: Update the comment: (no default; omitted when unset)
| return ( | ||
| <div | ||
| className="jupyter-html-output max-h-[300px] overflow-auto rounded-md bg-white p-2 text-xs ring-1 ring-border/60 [&_table]:border-collapse [&_td]:border [&_td]:px-2 [&_td]:py-1 [&_th]:border [&_th]:px-2 [&_th]:py-1 [&_th]:bg-muted/40" | ||
| dangerouslySetInnerHTML={{ __html: DOMPurify.sanitize(result.html) }} |
There was a problem hiding this comment.
Security: DOMPurify default config allows SVG elements, enabling potential mXSS attack surface
DOMPurify.sanitize(result.html) uses the library's default configuration, which allows SVG tags (e.g. <svg>, <use>, <animate>). SVG's complex namespace model has been the source of multiple mutation XSS bypasses across many DOMPurify versions. Since this content originates from sandbox code execution output (potentially user-controlled), this is a defense-in-depth concern.
Suggestion: Restrict to safe HTML tags:
DOMPurify.sanitize(result.html, {
FORBID_TAGS: ['svg', 'use', 'animate', 'animateTransform', 'set'],
SANITIZE_NAMED_PROPS: true,
})| <div className="flex items-center gap-1.5 text-xs font-medium text-foreground/80"> | ||
| <Monitor size={12} /> | ||
| Sandbox desktop preview (noVNC) | ||
| </div> |
There was a problem hiding this comment.
i18n gap: Hardcoded English string — should use t() like the rest of this component
Every other user-facing string in this file uses t('...') from useTranslation('examples'), but this line renders "Sandbox desktop preview (noVNC)" as a raw JSX string. Also consider extracting the repeated isZh detection pattern ((i18n.language ?? 'en').toLowerCase().startsWith('zh') — used at least 3 times in this file) into a shared utility.
| let _ = tokio::fs::write(&stamp_file, &req_content).await; | ||
| true | ||
| } else { | ||
| tracing::warn!( |
There was a problem hiding this comment.
Silent failure: pip install errors are swallowed with true return
When pip3 install -r requirements.txt fails (non-zero exit or spawn error), ensure_requirements returns true and the example script runs anyway. The user will see a confusing ModuleNotFoundError that looks like their code's fault rather than a failed dependency install.
Suggestion: Log at error level instead of warn so operators have visibility. Alternatively, include a warning header in the run response when dependencies were not installed.
| pub cube_proxy_node_ip: Option<String>, | ||
|
|
||
| /// CubeProxy HTTP port (passed as CUBE_PROXY_PORT_HTTP). | ||
| /// Env var: CUBE_PROXY_PORT_HTTP (default "80") |
There was a problem hiding this comment.
Doc comment mismatch: claims default is "80", but default_cube_proxy_port_http() returns None
The doc comment on line 91 reads (default "80"), but the default function default_cube_proxy_port_http() returns Option<u16> and resolves to None when the env var is absent. There is no fallback to port 80 — when unset the variable is simply omitted from the subprocess environment.
Suggestion: Update the comment: (no default; omitted when unset)
Signed-off-by: liciazhu <liciazhu@tencent.com>
add cube web case center and mod web sandbox detail