fix(cloud): reserve executor env for user requests only by G4614 · Pull Request #682 · boxlite-ai/boxlite

G4614 · 2026-06-09T03:58:01Z

Treats the `BOXLITE_EXECUTOR` env setting as reserved:

- Reject it when it arrives in user requests: API create/recover, the API REST exec proxy, runner HTTP create/recover/exec, and v2 job payloads.
- Accept it when it is set from inside the runner / by the Go SDK.
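A minimal sketch of this policy, assuming a single check applied at each user-facing entry point. The PR's real checks live in the TypeScript API (see `reserved-env.util.spec.ts` below) and in the Go runner. This Python version only illustrates the idea. `RESERVED_ENV_KEYS`, `ReservedEnvError`, `check_request_env`, and the `trusted` flag are hypothetical names.

```python
from typing import Mapping, Optional

# Hypothetical set: only BOXLITE_EXECUTOR is named in the PR description.
RESERVED_ENV_KEYS = frozenset({"BOXLITE_EXECUTOR"})


class ReservedEnvError(ValueError):
    """Raised when a user-facing request tries to set a reserved env key."""


def check_request_env(env: Optional[Mapping[str, str]], *, trusted: bool) -> None:
    """Reject reserved env keys unless the caller is trusted.

    `trusted` is a hypothetical flag meaning "set from inside the runner / Go SDK",
    as opposed to API create/recover, the REST exec proxy, runner HTTP
    create/recover/exec, or a v2 job payload.
    """
    if trusted or not env:
        return
    hits = sorted(key for key in env if key in RESERVED_ENV_KEYS)
    if hits:
        raise ReservedEnvError("reserved env key(s) not allowed: " + ", ".join(hits))


if __name__ == "__main__":
    check_request_env({"PATH": "/usr/bin"}, trusted=False)      # accepted
    check_request_env({"BOXLITE_EXECUTOR": "x"}, trusted=True)  # accepted (internal)
    try:
        check_request_env({"BOXLITE_EXECUTOR": "x"}, trusted=False)
    except ReservedEnvError as exc:
        print(exc)  # reserved env key(s) not allowed: BOXLITE_EXECUTOR
```

Keeping the decision in one function with an explicit `trusted` argument makes each entry point's choice visible at the call site. The rejection list does not need to be repeated per handler.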

Tests

yarn nx test api --testFile=reserved-env.util.spec.ts --testFile=box.service.box-id.spec.ts --testFile=boxlite-proxy.controller.spec.ts --runInBand
go test ./pkg/boxlite ./pkg/api/controllers ./pkg/runner/v2/executor
cargo build -p boxlite-c
bash scripts/build/build-guest.sh --profile debug --dest-dir /tmp/boxlite-pr682-runtime-debug
SKIP_GUEST_BUILD=1 bash scripts/build/build-runtime.sh --profile debug --dest-dir /tmp/boxlite-pr682-runtime-debug
BOXLITE_RUNTIME_DIR=/tmp/boxlite-pr682-runtime-debug go test -tags boxlite_dev ./pkg/boxlite -run TestIntegrationReservedExecutorEnvPolicyWithRealVM -count=1 -v

coderabbitai · 2026-06-09T03:58:10Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b153fa49-56ae-41d0-9fd0-c87d64f29959


📝 Walkthrough

This PR centralizes shared E2E test utilities, switches the E2E runner to explicitly invoke per-SDK sdks/*/tests/e2e directories, updates pytest import behavior to avoid source-wheel shadowing, rewires SDK/CLI test entry imports to the centralized lib, and expands CI path filters to trigger on SDK e2e/source changes.

Changes

E2E Test Suite Infrastructure Reorganization

| Layer | File(s) | Summary |
|---|---|---|
| E2E Helper Module Creation and Extraction | `scripts/test/e2e/lib/e2e_helpers.py`, `scripts/test/e2e/pytest.ini`, `sdks/python/tests/e2e/conftest.py` | Creates shared utilities module with `collect_stream`, `drain`, and `stdout_line_count`; sets pytest `--import-mode=importlib`; removes local helper implementations from conftest and imports shared helpers from centralized lib. |
| Test Runner and Orchestration Refactoring | `make/test.mk`, `scripts/test/e2e/run.sh`, `scripts/test/e2e/two_sided.sh`, `.github/workflows/e2e-stack.yml`, `scripts/test/e2e/README.md` | Updates `test:e2e` to call `run.sh`; `run.sh` runs `python3 -m pytest` with explicit per-SDK `tests/e2e` paths; two-sided harness uses `run.sh`; expands GitHub Actions path filters to include C/Go/Node/Python/CLI e2e and source paths; README updated for per-SDK layout and fixture setup. |
| Documentation and SDK Test Configuration | `scripts/test/e2e/README.md`, `sdks/python/pytest.ini` | Rewrites suite docs to describe SDK-agnostic end-to-end path, updates fixture/profile instructions, and adds `norecursedirs = e2e` to SDK pytest config to avoid recursing into cross-SDK e2e tests. |
| C/Go/Node SDK E2E Test Rewiring | `sdks/c/tests/e2e/`, `sdks/go/tests/e2e/`, `sdks/node/tests/e2e/*`, `src/cli/tests/e2e/test_cli_entry.py` | SDK and CLI e2e tests now reference local sibling `e2e_basic` sources, compute `REPO` consistently, and import path-verification helpers from centralized `scripts/test/e2e/lib`; Node smoke script uses local SDK build and improves cleanup handling. |
| Python E2E Test Helper Import Migration | `sdks/python/tests/e2e/*` | All Python e2e tests updated to import `drain`, `collect_stream`, and `stdout_line_count` from `e2e_helpers` instead of local conftest; import wiring and minor formatting/whitespace changes applied without behavior changes. |
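One way a per-SDK e2e file could compute `REPO` and make the shared `e2e_helpers` module importable is sketched below. This is an assumption, not the PR's code. The walkthrough does not show how `REPO` is computed. `find_repo_root`, `add_shared_lib_to_path`, and `SHARED_LIB` are hypothetical names that walk up the tree until they find `scripts/test/e2e/lib`.

```python
import sys
from pathlib import Path

# Hypothetical marker: the shared helper directory named in the walkthrough.
SHARED_LIB = Path("scripts") / "test" / "e2e" / "lib"


def find_repo_root(start: Path) -> Path:
    """Walk up from `start` to the first ancestor that contains SHARED_LIB."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / SHARED_LIB).is_dir():
            return candidate
    raise FileNotFoundError(f"no ancestor of {start} contains {SHARED_LIB}")


def add_shared_lib_to_path(test_file: Path) -> Path:
    """Put the shared lib on sys.path so `import e2e_helpers` works; return REPO."""
    repo = find_repo_root(test_file.parent)
    lib = str(repo / SHARED_LIB)
    if lib not in sys.path:
        sys.path.insert(0, lib)
    return repo
```

A test module would call `REPO = add_shared_lib_to_path(Path(__file__))` before `from e2e_helpers import drain`. Searching for the helper directory, instead of counting a fixed number of `parents`, keeps the result the same regardless of how deeply a given SDK nests its `tests/e2e` folder.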

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

boxlite-ai/boxlite#678: Introduces the cross-SDK E2E regression suite infrastructure that this PR reorganizes and refactors.

Poem

🐰 Helpers gathered, paths realigned,
Tests hop home to SDKs designed.
Shared code snug in one small burrow,
Runner calls each suite in a row.
Hooray — the e2e carrots grow!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 55.56%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ⚠️ Warning | The PR title 'fix(cloud): reserve executor env for user requests only' does not match the actual changes. The PR reorganizes e2e tests to per-SDK folders, not cloud executor environment handling. | Update the title to accurately reflect the reorganization work, such as 'test(e2e): move test cases to per-SDK tests/ folders' as described in the PR objectives. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


21 new e2e cases under sdks/python/tests/e2e/ that pin contracts the existing local-FFI integration tests can't reach. All ship with `xfail(strict=True)` for the 10 known production bugs they pin:

- test_error_code_mapping.py: 10 cases, 4 xfails
- test_quota_enforcement.py: 5 cases, 5 xfails (module-level)
- test_runner_concurrency.py: 6 cases, 1 xfail
- Total: 21 cases, 10 xfails

Each xfail's reason= string carries a `file:line` pointer to the bug location and a short root-cause note, so a follow-up fix PR can grep for the right reason, drop the marker, and watch the test go green.

`strict=True` means an unexpected xpass becomes a CI failure, i.e. the moment any of these bugs is silently fixed we'll notice.

xfail coverage map (which PR removes which marker):

- test_exec_after_box_removed_is_typed_error → PR C (runner exec error mapping: IsNotFound → 404)
- test_invalid_argument_zero_cpu_returns_400, test_invalid_argument_negative_memory_returns_400, test_oversized_cpu_returns_400, and 5x test_quota_enforcement.py → needs API ValidationPipe + per-sandbox quota fix (separate follow-up)
- test_execution_invalid_command_returns_422 → needs Rust spawn-failed → ErrExecution mapping fix (separate follow-up; not in C or D)

Split out of boxlite-ai#681. This is **PR B of 4**:

- A (boxlite-ai#682) reorganize; merged before this so it rebases cleanly
- B (this) 21 new cases
- C runner exec error mapping fix
- D FFI exec drain race fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
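The strict-xfail behavior this relies on can be summarized as a small outcome table. In the test files the marker is pytest's `@pytest.mark.xfail(strict=True, reason=...)`. The stdlib-only model below shows why a silently fixed bug turns CI red. `xfail_outcome` and `fails_ci` are illustrative names, not pytest APIs.

```python
def xfail_outcome(test_passed: bool, strict: bool) -> str:
    """Model how pytest reports a test marked @pytest.mark.xfail(strict=...)."""
    if not test_passed:
        # Bug still present: expected failure, reported as XFAIL; CI stays green.
        return "XFAIL"
    if strict:
        # Bug silently fixed: with strict=True the unexpected pass fails the run.
        return "FAILED"
    # Non-strict: the unexpected pass is only reported as XPASS.
    return "XPASS"


def fails_ci(outcome: str) -> bool:
    """Only a real failure breaks the run; XFAIL and XPASS do not."""
    return outcome == "FAILED"


if __name__ == "__main__":
    for passed in (False, True):
        for strict in (False, True):
            outcome = xfail_outcome(passed, strict)
            print(f"passed={passed} strict={strict}: {outcome} (fails CI: {fails_ci(outcome)})")
```

Only the `passed=True, strict=True` row fails CI. That row is the "bug got fixed, now remove the marker" signal the commit message describes.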

`POST /v1/{prefix}/boxes/{id}/exec` was leaking HTTP 500 when the underlying box was already gone, stopped, or in a non-runnable state by the time the runner tried to spawn. Observable symptoms:

- exec on a removed box → 500 "spawn_failed: build failed"
- exec on a stopped box → 500 (same shape)
- exec on a box mid-rebuild → 500 (same shape)

Root cause: `BoxliteExec` in `apps/runner/pkg/api/controllers/boxlite_exec.go` called `execManager.Start(...)` and unconditionally wrapped any non-nil error as a 500. The SDK Start path already tunnels typed errors from the Rust core (sdks/go/errors.go: IsNotFound / IsStopped / IsInvalidState) when the box state changes under the request. Those errors just weren't being read.

Fix: extend `classifyExecError` to check for those SDK typed errors and return the canonical mapping from `src/shared/src/errors.rs:198-280`:

- IsNotFound → 404 Not Found
- IsStopped → 409 Conflict
- IsInvalidState → 409 Conflict
- anything else (no SDK match) → 500 (unchanged)

The `Start()` error site is also routed through `classifyExecError` instead of hard-coding 500.

This flips the xfail on `test_exec_after_box_removed_is_typed_error` (sdks/python/tests/e2e/test_runner_concurrency.py). That case now xpasses, which under strict=True trips the marker and fails CI, so this PR removes the marker and lands green.

Other related xfails in test_error_code_mapping.py are NOT addressed by this PR. They need DTO validation, quota, or Rust spawn-failed reclassification (separate follow-ups).

Split out of boxlite-ai#681. This is **PR C of 4**:

- A (boxlite-ai#682) reorganize
- B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
- C (this) runner exec error mapping fix + flip 1 xfail
- D FFI exec drain race fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
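Here is a Python sketch of the classification described above, assuming exception classes as stand-ins for the Go SDK's `IsNotFound` / `IsStopped` / `IsInvalidState` helpers. The real `classifyExecError` is Go code in the runner. The class and function names here are hypothetical. Following `__cause__` is only a loose analogue of Go's error unwrapping.

```python
from http import HTTPStatus
from typing import Optional


class BoxNotFoundError(Exception):
    """Stand-in for an SDK error where IsNotFound(err) would be true."""


class BoxStoppedError(Exception):
    """Stand-in for an SDK error where IsStopped(err) would be true."""


class BoxInvalidStateError(Exception):
    """Stand-in for an SDK error where IsInvalidState(err) would be true."""


# NotFound -> 404, Stopped / InvalidState -> 409, as in the commit message.
_TYPED_STATUS = (
    (BoxNotFoundError, HTTPStatus.NOT_FOUND),
    (BoxStoppedError, HTTPStatus.CONFLICT),
    (BoxInvalidStateError, HTTPStatus.CONFLICT),
)


def classify_exec_error(err: BaseException) -> HTTPStatus:
    """Map an exec-start error to an HTTP status; unrecognized errors stay 500."""
    seen = set()
    cur: Optional[BaseException] = err
    # Follow explicit `raise ... from ...` chains, loosely like Go's errors.As
    # unwrapping an error that was wrapped with %w.
    while cur is not None and id(cur) not in seen:
        seen.add(id(cur))
        for exc_type, status in _TYPED_STATUS:
            if isinstance(cur, exc_type):
                return status
        cur = cur.__cause__
    return HTTPStatus.INTERNAL_SERVER_ERROR
```

The key property is the fallback. Anything the classifier does not recognize still maps to 500, so the change can only turn some 500s into 404/409 and never the reverse.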

The Go SDK's `box.Exec` would occasionally return with stdout chunks still in flight. The user's callback got `OnExit` (or the wait gRPC reply) before the matching `OnStdout`/`OnStderr` chunks landed on the queue. From the caller's perspective, the exec had finished but the last few bytes of stdout were missing.

Root cause: `execution_wait` and `exit_pump` pushed their terminal events as soon as the underlying process exited, racing the still-draining stream pumps. The FFI queue is ordered, but the wait/exit events were entering it ahead of late stdout chunks.

Fix: track a `streams_pending` count on `ExecutionHandle` instead of the previous per-pump receiver list. Both `exit_pump` and the `execution_wait` task now wait for `streams_pending` to reach 0 before pushing their terminal events. The wait uses a `tokio::sync::Notify` notification plus the standard register-then-check pattern, so a notification that fires before the waiter starts waiting is not missed.

Deterministic ordering tests added inline in execution.rs cover:

- stdout-only exec: all chunks land before Exit
- mixed stdout/stderr: every chunk lands before Wait
- process exit before pump completion: the terminal event waits
- pump completion before process exit: the terminal event fires immediately (no spurious wait)

Side effect on the existing e2e suite: `test_p0_6_exec_stdout_race` in sdks/python/tests/e2e/ is the regression test for boxlite-ai#563. It goes from ~90% loss against stock 0.9.5 to 0% loss against this PR. That test is NOT in the PR-B xfail set (it was pre-existing), so no markers need flipping. Other PR-B xfails (DTO validation, quota, spawn-failed mapping) are NOT addressed here. They have separate root causes and separate follow-ups.

Split out of boxlite-ai#681. This is **PR D of 4**:

- A (boxlite-ai#682) reorganize
- B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
- C (boxlite-ai#684) runner exec error mapping fix + flip 1 xfail
- D (this) FFI exec drain race fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
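The drain barrier can be sketched in Python asyncio. Here `asyncio.Condition.wait_for` is swapped in for Rust's `tokio::sync::Notify` plus register-then-check. Both re-check the counter in a way that cannot miss a wake-up that already happened. `ExecutionHandleSketch`, `pump`, and `exit_pump` are hypothetical stand-ins, not the FFI's actual types.

```python
import asyncio
from typing import List


class ExecutionHandleSketch:
    """Hypothetical model of a handle whose terminal event waits for stream pumps."""

    def __init__(self, n_streams: int) -> None:
        self.streams_pending = n_streams
        self._cond = asyncio.Condition()
        self.queue: List[str] = []  # stands in for the ordered FFI event queue

    async def stream_done(self) -> None:
        """Called by each pump after its last chunk has been queued."""
        async with self._cond:
            self.streams_pending -= 1
            if self.streams_pending == 0:
                self._cond.notify_all()

    async def wait_streams_drained(self) -> None:
        # wait_for checks the predicate under the lock before sleeping, so a
        # notify_all() that already happened cannot be missed.
        async with self._cond:
            await self._cond.wait_for(lambda: self.streams_pending == 0)


async def pump(handle: ExecutionHandleSketch, name: str, chunks: List[str], delay: float) -> None:
    for chunk in chunks:
        await asyncio.sleep(delay)  # late, still-draining output
        handle.queue.append(f"{name}:{chunk}")
    await handle.stream_done()


async def exit_pump(handle: ExecutionHandleSketch, process_exited: asyncio.Event) -> None:
    await process_exited.wait()          # the process is gone...
    await handle.wait_streams_drained()  # ...but its output may not be queued yet
    handle.queue.append("Exit")


async def demo() -> List[str]:
    handle = ExecutionHandleSketch(n_streams=2)
    exited = asyncio.Event()
    exited.set()  # process exits before the pumps finish: the racy case
    await asyncio.gather(
        exit_pump(handle, exited),
        pump(handle, "stdout", ["a", "b"], 0.01),
        pump(handle, "stderr", ["x"], 0.02),
    )
    return handle.queue


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this demo, `exited` is set before either pump queues anything, yet `Exit` is always the last entry. An `execution_wait`-style task would call `wait_streams_drained()` the same way before pushing `Wait`.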


coderabbitai

🧹 Nitpick comments (1)

scripts/test/e2e/lib/e2e_helpers.py (1)

20-39: ⚡ Quick win

Consider adding parameter type annotations for better IDE support and type safety.

The helper functions are well-implemented and correct, but adding explicit type annotations for the stream and ex parameters would improve the developer experience when importing and using these utilities. This is especially valuable for a shared module that will be imported across multiple test files.

📝 Suggested type annotations

```diff
+from typing import AsyncIterator, Any
+
-async def collect_stream(stream) -> str:
+async def collect_stream(stream: AsyncIterator[bytes] | AsyncIterator[str] | None) -> str:
+    """Collect and decode an async stream into a single string."""
     if stream is None:
         return ""
     chunks: list[str] = []
     async for ch in stream:
         chunks.append(
             ch.decode("utf-8", "replace") if isinstance(ch, bytes) else str(ch)
         )
     return "".join(chunks)


-async def drain(ex) -> tuple[str, str]:
+async def drain(ex: Any) -> tuple[str, str]:
     """Drain stdout + stderr concurrently — required for REST exec."""
     out_t = asyncio.create_task(collect_stream(ex.stdout()))
     err_t = asyncio.create_task(collect_stream(ex.stderr()))
     return await asyncio.gather(out_t, err_t)


 def stdout_line_count(s: str) -> int:
+    """Count non-empty lines in a string."""
     return len([ln for ln in s.splitlines() if ln])
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/test/e2e/lib/e2e_helpers.py` around lines 20 - 39, Add explicit
parameter type annotations to improve IDE/type-checker support: annotate
collect_stream(stream) with an appropriate async-iterable type (e.g., stream:
Optional[AsyncIterable[Union[bytes, str]]]) and annotate drain(ex) with the
expected execution object type (e.g., ex: Any or a small Protocol with
stdout()/stderr() -> AsyncIterable[Union[bytes, str]]), and update the drain
signature to return Tuple[str, str] (or import Tuple) if not already recognized;
also add any necessary typing imports (Optional, AsyncIterable, Union, Any,
Tuple) at the top. Ensure you keep the existing return types and semantics of
collect_stream and drain while only adding these parameter type hints.
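If the Protocol route from this prompt is taken, the helpers might look like the sketch below. `ExecStreams` is a hypothetical name. Whether the SDK's `stdout()`/`stderr()` can return `None` is an assumption carried over from `collect_stream`'s existing `None` check. `asyncio.gather` returns a list, so this version of `drain` unpacks the result to make the `Tuple[str, str]` annotation literally true.

```python
import asyncio
from typing import AsyncIterable, Optional, Protocol, Tuple, Union

Chunk = Union[bytes, str]


class ExecStreams(Protocol):
    """Structural type: anything exposing stdout()/stderr() async streams."""

    def stdout(self) -> Optional[AsyncIterable[Chunk]]: ...

    def stderr(self) -> Optional[AsyncIterable[Chunk]]: ...


async def collect_stream(stream: Optional[AsyncIterable[Chunk]]) -> str:
    """Collect and decode an async stream into a single string."""
    if stream is None:
        return ""
    chunks = []
    async for ch in stream:
        chunks.append(ch.decode("utf-8", "replace") if isinstance(ch, bytes) else str(ch))
    return "".join(chunks)


async def drain(ex: ExecStreams) -> Tuple[str, str]:
    """Drain stdout + stderr concurrently."""
    out, err = await asyncio.gather(collect_stream(ex.stdout()), collect_stream(ex.stderr()))
    return out, err  # gather returns a list; unpack so callers really get a tuple
```

A Protocol keeps `e2e_helpers.py` free of any SDK import. Any execution object with matching `stdout()`/`stderr()` methods type-checks, including test fakes.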


ℹ️ Review info

Run ID: ffa6a850-42c0-4fe7-a972-75621dd42e1d

📥 Commits

Reviewing files that changed from the base of the PR and between e4bbd0e and b5e6f30.

📒 Files selected for processing (27)

.github/workflows/e2e-stack.yml
make/test.mk
scripts/test/e2e/README.md
scripts/test/e2e/lib/e2e_helpers.py
scripts/test/e2e/pytest.ini
scripts/test/e2e/run.sh
scripts/test/e2e/two_sided.sh
sdks/c/tests/e2e/e2e_basic.c
sdks/c/tests/e2e/test_c_entry.py
sdks/go/tests/e2e/e2e_basic.go
sdks/go/tests/e2e/test_go_entry.py
sdks/node/tests/e2e/e2e_basic.ts
sdks/node/tests/e2e/test_node_entry.py
sdks/python/pytest.ini
sdks/python/tests/e2e/conftest.py
sdks/python/tests/e2e/test_box_management.py
sdks/python/tests/e2e/test_errors.py
sdks/python/tests/e2e/test_exec_options.py
sdks/python/tests/e2e/test_exec_timeout.py
sdks/python/tests/e2e/test_exec_unit_shape.py
sdks/python/tests/e2e/test_execution_shutdown.py
sdks/python/tests/e2e/test_lifecycle.py
sdks/python/tests/e2e/test_p0_6_exec_stdout_race.py
sdks/python/tests/e2e/test_path_verification.py
sdks/python/tests/e2e/test_resize_tty.py
sdks/python/tests/e2e/test_shutdown.py
src/cli/tests/e2e/test_cli_entry.py

…xlite-ai#682)

Adds Python type hints to the two helper functions that were missing parameter annotations, plus docstrings where they were missing. Per the CodeRabbit nitpick on PR boxlite-ai#682; quality-of-life only, no behavior change.

The file already has `from __future__ import annotations`, so annotations are not evaluated at runtime. As a result, the `|` union syntax also works on Python versions older than 3.10.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

G4614 · 2026-06-09T10:30:25Z

Closing — abandoning the per-SDK relocation approach. New e2e cases will stay under scripts/test/e2e/cases/ matching main's layout. Stacked PRs #683/#684/#685 will be rebased to drop b5e6f30; #686-#696 are already independent from this commit.

21 new e2e cases under scripts/test/e2e/cases/ that pin contracts the existing local-FFI integration tests can't reach. The cases ship with xfail(strict=True) markers for the known production bugs they pin. A follow-up fix PR that closes one of those bugs will trip the strict marker into a CI failure, which signals that the xfail can now be dropped.

Files added:

- scripts/test/e2e/cases/test_error_code_mapping.py
- scripts/test/e2e/cases/test_quota_enforcement.py
- scripts/test/e2e/cases/test_runner_concurrency.py

Replaces an earlier split-out attempt that stacked on a reorganize PR (boxlite-ai#682). That structural reorg is now abandoned; new e2e cases land under scripts/test/e2e/cases/, matching boxlite-ai#678's existing layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

POST /v1/{prefix}/boxes/{id}/exec was leaking HTTP 500 when the underlying box was already gone, stopped, or in a non-runnable state by the time the runner tried to spawn:

- exec on a removed box -> 500 "spawn_failed: build failed"
- exec on a stopped box -> 500 (same shape)
- exec on a box mid-rebuild -> 500 (same shape)

Root cause: BoxliteExec in boxlite_exec.go wraps the SDK error in a generic 500 without consulting the SDK's typed error helpers (IsNotFound, IsStopped, IsInvalidState). Other handlers in this file (signal/resize/kill/status) already classify these; POST /exec was the one that got missed.

Fix: extend classifyExecError to recognize the SDK's typed errors and map them to 404 (NotFound) or 409 (Stopped, InvalidState), per the canonical mapping in src/shared/src/errors.rs.

Pin: test_exec_on_stopped_box_is_typed_error in boxlite-ai#678's e2e suite goes from 500 to 4xx after this change.

Re-validation: stack on top of boxlite-ai#683 (which adds test_runner_concurrency.py) for the exec-after-soft-deleted-box variant.

Replaces an earlier split-out attempt that stacked on boxlite-ai#682's reorg (now abandoned). Branch rebuilt against current main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Go SDK's box.Exec would occasionally return with stdout chunks still in flight. The user's callback got OnExit (or the wait gRPC reply) before the matching OnStdout/OnStderr chunks landed on the queue. From the caller's perspective, the exec had finished but its stdout was silently truncated.

Root cause: execution_wait spawned an independent terminal task that pushed the Wait event as soon as wait_on_clone returned. There was no drain barrier, so the wait reply could land in the event queue ahead of still-flushing stream pumps.

Fix: make exit_pump the sole owner of terminal-event dispatch. Both execution_wait and register_exit fan into it, and exit_pump awaits all stream pump receivers before pushing the terminal event. Queue order becomes Stdout/Stderr* -> Exit -> Wait*, all from the same task.

Pin: test_p0_6_exec_stdout_race in boxlite-ai#678's e2e suite goes from ~70% stdout loss to 0%.

Replaces an earlier split-out attempt that stacked on boxlite-ai#682's reorg (now abandoned). Branch rebuilt against current main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
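The single-owner design can be modelled in Python asyncio as a sketch. `ExecSketch` and its methods are hypothetical. `register_wait` plays the role of `execution_wait`: it registers interest instead of pushing its own event. `exit_pump` is the only place that emits `Exit` and `Wait`, and it does so only after every pump has finished.

```python
import asyncio
from typing import Awaitable, List, Optional


class ExecSketch:
    """Hypothetical model: one task (exit_pump) owns every terminal event."""

    def __init__(self) -> None:
        self.queue: List[str] = []  # stands in for the ordered event queue
        self._waiters: List[asyncio.Future] = []
        self._exit_code: Optional[int] = None

    def register_wait(self) -> asyncio.Future:
        """execution_wait analogue: registers interest, never races the pumps."""
        fut = asyncio.get_running_loop().create_future()
        if self._exit_code is not None:
            # Late waiter: Exit is already queued, so a Wait here keeps the order.
            self.queue.append("Wait")
            fut.set_result(self._exit_code)
        else:
            self._waiters.append(fut)
        return fut

    async def pump(self, name: str, chunks: List[str]) -> None:
        for chunk in chunks:
            await asyncio.sleep(0)  # yield so the pumps interleave
            self.queue.append(f"{name}:{chunk}")

    async def exit_pump(self, pumps: List["asyncio.Task[None]"], process_exit: Awaitable[int]) -> None:
        code = await process_exit
        await asyncio.gather(*pumps)  # drain barrier: every chunk is queued first
        self.queue.append("Exit")
        self._exit_code = code
        for fut in self._waiters:  # then one Wait per registered waiter
            self.queue.append("Wait")
            fut.set_result(code)
        self._waiters.clear()


async def demo():
    ex = ExecSketch()

    async def process() -> int:
        return 0  # the process exits immediately, before output is drained

    pumps = [
        asyncio.create_task(ex.pump("stdout", ["a", "b", "c"])),
        asyncio.create_task(ex.pump("stderr", ["e"])),
    ]
    owner = asyncio.create_task(ex.exit_pump(pumps, process()))
    waiters = [ex.register_wait(), ex.register_wait()]
    await owner
    return ex.queue, [await w for w in waiters]
```

Only one task emits terminal events. As a result, the Stdout/Stderr* -> Exit -> Wait* order follows from program order inside that task rather than from timing between independent tasks.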

… pin known bugs) (#683)

real e2e tests add-up

## What

21 new e2e cases under `sdks/python/tests/e2e/` pin contracts the existing local-FFI integration tests can't reach:

| File | Cases | xfails | Topic |
|---|---|---|---|
| `test_error_code_mapping.py` | 10 | 4 | `BoxliteError → HTTP` mapping (`src/shared/src/errors.rs:198-280`) |
| `test_quota_enforcement.py` | 5 | 5 | API create-boundary per-sandbox quota |
| `test_runner_concurrency.py` | 6 | 1 | Cross-process state-machine races |

All 10 xfails are `strict=True`: an unexpected xpass fails CI, so the moment any bug is silently fixed we notice. Each xfail's `reason=` carries a `file:line` pointer and a short root cause.

## xfail → fix PR map

| xfail | Will flip in |
|---|---|
| `test_exec_after_box_removed_is_typed_error` | PR C (runner classifies `IsNotFound → 404`) |
| `test_invalid_argument_zero_cpu_returns_400` | needs API `ValidationPipe` fix (separate follow-up, not in C/D) |
| `test_invalid_argument_negative_memory_returns_400` | same |
| `test_oversized_cpu_returns_400` | same + per-sandbox quota (separate follow-up) |
| 5× `test_quota_enforcement.py` (module-level) | per-sandbox quota fix (separate follow-up) |
| `test_execution_invalid_command_returns_422` | needs Rust `spawn_failed → ErrExecution` mapping (separate follow-up) |

## Stack

| | what | this PR |
|---|---|---|
| A (#682) | reorganize only (rename + wiring) | ← prereq |
| **B (this)** | **21 new e2e cases** | ← you are here |
| C | runner exec error mapping fix + flip 1 xfail | depends on B |
| D | FFI exec drain race fix (no PR-B xfails to flip) | depends on B |

🤖 Generated with [Claude Code](https://claude.com/claude-code)

## Summary by CodeRabbit

* **Tests**
  * Added end-to-end coverage validating API HTTP status ↔ error-code mappings, auth boundary behavior, unknown-resource and unregistered-image semantics, and state-transition error handling.
  * Added quota enforcement tests covering per-sandbox/org limits, boundary cases (including zero CPUs), and checks that quota rejections do not silently create resources.
  * Added concurrency/regression tests for parallel execs and creates, delete-during-exec behavior, repeated execs, and idempotent/typed stop semantics.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This was referenced Jun 9, 2026:

- test(e2e): add error-mapping, quota, runner-concurrency cases (xfails pin known bugs) #683 (Merged)
- fix(runner): classify SDK errors at exec boundary (404/409 not 500) #684 (Open)
- fix(ffi): drain stdout/stderr pumps before Wait terminal event #685 (Closed)

G4614 force-pushed the test/e2e-reorganize-only branch from c7cc31a to 24b3efa, June 9, 2026 04:06

G4614 force-pushed the test/e2e-reorganize-only branch from 24b3efa to b5e6f30, June 9, 2026 04:47

G4614 mentioned this pull request Jun 9, 2026:

- fix(jailer): unblock the default sandbox (drop host-side Landlock + bind box/bin) #681 (Merged)

G4614 marked this pull request as ready for review June 9, 2026 05:54

coderabbitai Bot reviewed Jun 9, 2026


G4614 marked this pull request as draft June 9, 2026 06:59

G4614 closed this Jun 9, 2026

G4614 mentioned this pull request Jun 9, 2026:

- fix(ffi): drain stdout/stderr pumps before Wait terminal event #705 (Closed)

This was referenced Jun 9, 2026:

- test(e2e): add cli-detach-recovery, exec-attach, volume-readonly cases #710 (Merged)
- fix(ffi): isolate exec stream delivery and backpressure #714 (Closed)

G4614 reopened this Jun 22, 2026

G4614 force-pushed the test/e2e-reorganize-only branch from 23ed4b8 to 6809d65, June 22, 2026 06:11

G4614 changed the title ~~test(e2e): move test cases to per-SDK tests/ folders~~ fix(cloud): reject reserved executor env Jun 22, 2026

G4614 force-pushed the test/e2e-reorganize-only branch from 6809d65 to eac9697, June 22, 2026 07:22

G4614 force-pushed the test/e2e-reorganize-only branch 2 times, most recently from 3e6c9ec to 4c76529, June 22, 2026 08:01

fix(cloud): reject reserved executor env (509617d)

G4614 force-pushed the test/e2e-reorganize-only branch from 4c76529 to 509617d, June 22, 2026 08:34

G4614 changed the title ~~fix(cloud): reject reserved executor env~~ fix(cloud): reserve executor env for user requests only Jun 22, 2026

G4614 added 2 commits June 22, 2026 17:19:

- test(runner): cover reserved executor env in v2 jobs (e1c749f)
- test(rest): pin executor env reaches cloud boundary (9d2de78)

G4614 marked this pull request as ready for review June 22, 2026 09:34

G4614 requested a review from a team as a code owner June 22, 2026 09:34

G4614 enabled auto-merge June 22, 2026 09:36

DorianZheng disabled auto-merge June 23, 2026 12:45

DorianZheng enabled auto-merge June 23, 2026 12:45

G4614 wants to merge 3 commits into boxlite-ai:main from G4614:test/e2e-reorganize-only
