fix(cloud): reserve executor env for user requests only#682
Conversation
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR centralizes shared E2E test utilities, switches the E2E runner to explicitly invoke per-SDK ChangesE2E Test Suite Infrastructure Reorganization
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
c7cc31a to
24b3efa
Compare
21 new e2e cases under sdks/python/tests/e2e/ that pin contracts the
existing local-FFI integration tests can't reach. All ship with
`xfail(strict=True)` for the 10 known production bugs they pin:
test_error_code_mapping.py 10 cases 4 xfails
test_quota_enforcement.py 5 cases 5 xfails (module-level)
test_runner_concurrency.py 6 cases 1 xfail
-- --
21 10
Each xfail's reason= string carries a `file:line` pointer to the bug
location and a short root-cause note, so a follow-up fix-PR can grep
for the right reason, drop the marker, and watch the test go green.
`strict=True` means an unexpected xpass becomes a CI failure — i.e.
the moment any of these bugs is silently fixed we'll notice.
xfail coverage map (which PR removes which marker):
test_exec_after_box_removed_is_typed_error
→ PR C (runner exec error mapping: IsNotFound → 404)
test_invalid_argument_zero_cpu_returns_400 )
test_invalid_argument_negative_memory_returns_400 ) needs API
test_oversized_cpu_returns_400 ) ValidationPipe
5x test_quota_enforcement.py ) + per-sandbox
) quota fix
) (separate
) follow-up)
test_execution_invalid_command_returns_422
→ needs Rust spawn-failed → ErrExecution mapping fix
(separate follow-up; not in C or D)
Split out of boxlite-ai#681 — this is **PR B of 4**:
A (boxlite-ai#682) reorganize — merged before this rebases cleanly
B (this) 21 new cases
C runner exec error mapping fix
D FFI exec drain race fix
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`POST /v1/{prefix}/boxes/{id}/exec` was leaking HTTP 500 when the
underlying box was already gone / stopped / in a non-runnable state
by the time the runner tried to spawn. Two related observable bugs:
exec on a removed box → 500 "spawn_failed: build failed"
exec on a stopped box → 500 (same shape)
exec on box mid-rebuild → 500 (same shape)
Root cause: `BoxliteExec` in `apps/runner/pkg/api/controllers/boxlite_exec.go`
called `execManager.Start(...)` and unconditionally wrapped any non-nil
error as 500. The SDK Start path already tunnels typed errors from the
Rust core (sdks/go/errors.go: IsNotFound / IsStopped / IsInvalidState)
when the box state changed under the request — they just weren't being
read.
Fix: extend `classifyExecError` to peek for those SDK typed errors and
return the canonical mapping from `src/shared/src/errors.rs:198-280`:
IsNotFound → 404 Not Found
IsStopped → 409 Conflict
IsInvalidState → 409 Conflict
(other / no SDK match) → 500 (unchanged)
Also routes the `Start()` error site through `classifyExecError`
instead of hard-coding 500.
Flips the xfail on `test_exec_after_box_removed_is_typed_error`
(sdks/python/tests/e2e/test_runner_concurrency.py) — that case now
xpasses, which under strict=True trips the marker and fails CI; the
fix here removes the marker so it lands green.
Other related xfails in test_error_code_mapping.py are NOT addressed
by this PR (they need DTO validation / quota / Rust spawn-failed
reclassification — separate follow-ups).
Split out of boxlite-ai#681 — this is **PR C of 4**:
A (boxlite-ai#682) reorganize
B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
C (this) runner exec error mapping fix + flip 1 xfail
D FFI exec drain race fix
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Go SDK's `box.Exec` would occasionally return with stdout chunks
still in flight — the user's callback got `OnExit` (or the wait gRPC
reply) before the matching `OnStdout`/`OnStderr` chunks landed on
the queue. From the caller's perspective, the exec had finished but
the last few bytes of stdout were missing.
Root cause: `execution_wait` and `exit_pump` pushed their terminal
events as soon as the underlying process exited, racing the still-
draining stream pumps. The FFI queue is ordered, but the wait/exit
events were entering it ahead of late stdout chunks.
Fix: track a `streams_pending` count on `ExecutionHandle` instead of
the previous per-pump receiver list. Both `exit_pump` and the
`execution_wait` task now await `streams_pending → 0` (via a
`tokio::sync::Notify` notification + standard register-then-check
pattern to avoid post-notification miss) before pushing their
terminal events.
Deterministic ordering tests added inline in execution.rs cover:
- stdout-only exec: all chunks land before Exit
- mixed stdout/stderr: every chunk lands before Wait
- process exit before pump completion: terminal event waits
- pump completion before process exit: terminal event fires
immediately (no spurious wait)
Side effect on the existing e2e suite: `test_p0_6_exec_stdout_race`
in sdks/python/tests/e2e/ (the regression test for boxlite-ai#563) flips from
~90% loss against stock 0.9.5 to 0% loss against this PR. That test
is NOT in the PR-B xfail set (it was pre-existing), so no markers
need flipping.
Other PR-B xfails (DTO validation, quota, spawn-failed mapping) are
NOT addressed here — separate root causes, separate follow-ups.
Split out of boxlite-ai#681 — this is **PR D of 4**:
A (boxlite-ai#682) reorganize
B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
C (boxlite-ai#684) runner exec error mapping fix + flip 1 xfail
D (this) FFI exec drain race fix
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
24b3efa to
b5e6f30
Compare
21 new e2e cases under sdks/python/tests/e2e/ that pin contracts the
existing local-FFI integration tests can't reach. All ship with
`xfail(strict=True)` for the 10 known production bugs they pin:
test_error_code_mapping.py 10 cases 4 xfails
test_quota_enforcement.py 5 cases 5 xfails (module-level)
test_runner_concurrency.py 6 cases 1 xfail
-- --
21 10
Each xfail's reason= string carries a `file:line` pointer to the bug
location and a short root-cause note, so a follow-up fix-PR can grep
for the right reason, drop the marker, and watch the test go green.
`strict=True` means an unexpected xpass becomes a CI failure — i.e.
the moment any of these bugs is silently fixed we'll notice.
xfail coverage map (which PR removes which marker):
test_exec_after_box_removed_is_typed_error
→ PR C (runner exec error mapping: IsNotFound → 404)
test_invalid_argument_zero_cpu_returns_400 )
test_invalid_argument_negative_memory_returns_400 ) needs API
test_oversized_cpu_returns_400 ) ValidationPipe
5x test_quota_enforcement.py ) + per-sandbox
) quota fix
) (separate
) follow-up)
test_execution_invalid_command_returns_422
→ needs Rust spawn-failed → ErrExecution mapping fix
(separate follow-up; not in C or D)
Split out of boxlite-ai#681 — this is **PR B of 4**:
A (boxlite-ai#682) reorganize — merged before this rebases cleanly
B (this) 21 new cases
C runner exec error mapping fix
D FFI exec drain race fix
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`POST /v1/{prefix}/boxes/{id}/exec` was leaking HTTP 500 when the
underlying box was already gone / stopped / in a non-runnable state
by the time the runner tried to spawn. Two related observable bugs:
exec on a removed box → 500 "spawn_failed: build failed"
exec on a stopped box → 500 (same shape)
exec on box mid-rebuild → 500 (same shape)
Root cause: `BoxliteExec` in `apps/runner/pkg/api/controllers/boxlite_exec.go`
called `execManager.Start(...)` and unconditionally wrapped any non-nil
error as 500. The SDK Start path already tunnels typed errors from the
Rust core (sdks/go/errors.go: IsNotFound / IsStopped / IsInvalidState)
when the box state changed under the request — they just weren't being
read.
Fix: extend `classifyExecError` to peek for those SDK typed errors and
return the canonical mapping from `src/shared/src/errors.rs:198-280`:
IsNotFound → 404 Not Found
IsStopped → 409 Conflict
IsInvalidState → 409 Conflict
(other / no SDK match) → 500 (unchanged)
Also routes the `Start()` error site through `classifyExecError`
instead of hard-coding 500.
Flips the xfail on `test_exec_after_box_removed_is_typed_error`
(sdks/python/tests/e2e/test_runner_concurrency.py) — that case now
xpasses, which under strict=True trips the marker and fails CI; the
fix here removes the marker so it lands green.
Other related xfails in test_error_code_mapping.py are NOT addressed
by this PR (they need DTO validation / quota / Rust spawn-failed
reclassification — separate follow-ups).
Split out of boxlite-ai#681 — this is **PR C of 4**:
A (boxlite-ai#682) reorganize
B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
C (this) runner exec error mapping fix + flip 1 xfail
D FFI exec drain race fix
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Go SDK's `box.Exec` would occasionally return with stdout chunks
still in flight — the user's callback got `OnExit` (or the wait gRPC
reply) before the matching `OnStdout`/`OnStderr` chunks landed on
the queue. From the caller's perspective, the exec had finished but
the last few bytes of stdout were missing.
Root cause: `execution_wait` and `exit_pump` pushed their terminal
events as soon as the underlying process exited, racing the still-
draining stream pumps. The FFI queue is ordered, but the wait/exit
events were entering it ahead of late stdout chunks.
Fix: track a `streams_pending` count on `ExecutionHandle` instead of
the previous per-pump receiver list. Both `exit_pump` and the
`execution_wait` task now await `streams_pending → 0` (via a
`tokio::sync::Notify` notification + standard register-then-check
pattern to avoid post-notification miss) before pushing their
terminal events.
Deterministic ordering tests added inline in execution.rs cover:
- stdout-only exec: all chunks land before Exit
- mixed stdout/stderr: every chunk lands before Wait
- process exit before pump completion: terminal event waits
- pump completion before process exit: terminal event fires
immediately (no spurious wait)
Side effect on the existing e2e suite: `test_p0_6_exec_stdout_race`
in sdks/python/tests/e2e/ (the regression test for boxlite-ai#563) flips from
~90% loss against stock 0.9.5 to 0% loss against this PR. That test
is NOT in the PR-B xfail set (it was pre-existing), so no markers
need flipping.
Other PR-B xfails (DTO validation, quota, spawn-failed mapping) are
NOT addressed here — separate root causes, separate follow-ups.
Split out of boxlite-ai#681 — this is **PR D of 4**:
A (boxlite-ai#682) reorganize
B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
C (boxlite-ai#684) runner exec error mapping fix + flip 1 xfail
D (this) FFI exec drain race fix
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
scripts/test/e2e/lib/e2e_helpers.py (1)
20-39: ⚡ Quick winConsider adding parameter type annotations for better IDE support and type safety.
The helper functions are well-implemented and correct, but adding explicit type annotations for the
streamandexparameters would improve the developer experience when importing and using these utilities. This is especially valuable for a shared module that will be imported across multiple test files.📝 Suggested type annotations
+from typing import AsyncIterator, Any + -async def collect_stream(stream) -> str: +async def collect_stream(stream: AsyncIterator[bytes] | AsyncIterator[str] | None) -> str: + """Collect and decode an async stream into a single string.""" if stream is None: return "" chunks: list[str] = [] async for ch in stream: chunks.append( ch.decode("utf-8", "replace") if isinstance(ch, bytes) else str(ch) ) return "".join(chunks) -async def drain(ex) -> tuple[str, str]: +async def drain(ex: Any) -> tuple[str, str]: """Drain stdout + stderr concurrently — required for REST exec.""" out_t = asyncio.create_task(collect_stream(ex.stdout())) err_t = asyncio.create_task(collect_stream(ex.stderr())) return await asyncio.gather(out_t, err_t) def stdout_line_count(s: str) -> int: + """Count non-empty lines in a string.""" return len([ln for ln in s.splitlines() if ln])🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/test/e2e/lib/e2e_helpers.py` around lines 20 - 39, Add explicit parameter type annotations to improve IDE/type-checker support: annotate collect_stream(stream) with an appropriate async-iterable type (e.g., stream: Optional[AsyncIterable[Union[bytes, str]]]) and annotate drain(ex) with the expected execution object type (e.g., ex: Any or a small Protocol with stdout()/stderr() -> AsyncIterable[Union[bytes, str]]), and update the drain signature to return Tuple[str, str] (or import Tuple) if not already recognized; also add any necessary typing imports (Optional, AsyncIterable, Union, Any, Tuple) at the top. Ensure you keep the existing return types and semantics of collect_stream and drain while only adding these parameter type hints.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@scripts/test/e2e/lib/e2e_helpers.py`:
- Around line 20-39: Add explicit parameter type annotations to improve
IDE/type-checker support: annotate collect_stream(stream) with an appropriate
async-iterable type (e.g., stream: Optional[AsyncIterable[Union[bytes, str]]])
and annotate drain(ex) with the expected execution object type (e.g., ex: Any or
a small Protocol with stdout()/stderr() -> AsyncIterable[Union[bytes, str]]),
and update the drain signature to return Tuple[str, str] (or import Tuple) if
not already recognized; also add any necessary typing imports (Optional,
AsyncIterable, Union, Any, Tuple) at the top. Ensure you keep the existing
return types and semantics of collect_stream and drain while only adding these
parameter type hints.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: ffa6a850-42c0-4fe7-a972-75621dd42e1d
📒 Files selected for processing (27)
.github/workflows/e2e-stack.ymlmake/test.mkscripts/test/e2e/README.mdscripts/test/e2e/lib/e2e_helpers.pyscripts/test/e2e/pytest.iniscripts/test/e2e/run.shscripts/test/e2e/two_sided.shsdks/c/tests/e2e/e2e_basic.csdks/c/tests/e2e/test_c_entry.pysdks/go/tests/e2e/e2e_basic.gosdks/go/tests/e2e/test_go_entry.pysdks/node/tests/e2e/e2e_basic.tssdks/node/tests/e2e/test_node_entry.pysdks/python/pytest.inisdks/python/tests/e2e/conftest.pysdks/python/tests/e2e/test_box_management.pysdks/python/tests/e2e/test_errors.pysdks/python/tests/e2e/test_exec_options.pysdks/python/tests/e2e/test_exec_timeout.pysdks/python/tests/e2e/test_exec_unit_shape.pysdks/python/tests/e2e/test_execution_shutdown.pysdks/python/tests/e2e/test_lifecycle.pysdks/python/tests/e2e/test_p0_6_exec_stdout_race.pysdks/python/tests/e2e/test_path_verification.pysdks/python/tests/e2e/test_resize_tty.pysdks/python/tests/e2e/test_shutdown.pysrc/cli/tests/e2e/test_cli_entry.py
…xlite-ai#682) Adds Python type hints to the two helper functions that were missing parameter annotations, and docstrings where they were missing too. Per CodeRabbit nitpick on PR boxlite-ai#682 — quality-of-life only, no behavior change. `from __future__ import annotations` already in the file means the `|` union syntax works on Python 3.10 too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
21 new e2e cases under scripts/test/e2e/cases/ that pin contracts the existing local-FFI integration tests can't reach. Cases shipped with xfail(strict=True) markers for the known production bugs they pin — a follow-up fix-PR flipping a bug closed will trip the strict marker into a CI failure, surfacing the fact that the xfail can now be dropped. Files added: scripts/test/e2e/cases/test_error_code_mapping.py scripts/test/e2e/cases/test_quota_enforcement.py scripts/test/e2e/cases/test_runner_concurrency.py Replaces an earlier split-out attempt that stacked on a reorganize PR (boxlite-ai#682). That structural reorg is now abandoned; new e2e cases land under scripts/test/e2e/cases/ matching boxlite-ai#678's existing layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
POST /v1/{prefix}/boxes/{id}/exec was leaking HTTP 500 when the
underlying box was already gone / stopped / in a non-runnable state
by the time the runner tried to spawn:
exec on a removed box -> 500 "spawn_failed: build failed"
exec on a stopped box -> 500 (same shape)
exec on box mid-rebuild -> 500 (same shape)
Root cause: BoxliteExec in boxlite_exec.go wraps the SDK error in a
generic 500 without consulting the SDK's typed error helpers
(IsNotFound, IsStopped, IsInvalidState). Other handlers in this file
(signal/resize/kill/status) already classify these; POST /exec was
the one that got missed.
Fix: extend classifyExecError to recognise the SDK's typed errors and
map them to 404 (NotFound), 409 (Stopped, InvalidState) per the
canonical mapping at src/shared/src/errors.rs.
Pin: test_exec_on_stopped_box_is_typed_error in boxlite-ai#678's e2e suite goes
from 500 -> 4xx after this change. Re-validation: stack on top of
boxlite-ai#683 (which adds test_runner_concurrency.py) for the
exec-after-soft-deleted-box variant.
Replaces an earlier split-out attempt that stacked on boxlite-ai#682's reorg
(now abandoned). Branch rebuilt against current main.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Go SDK's box.Exec would occasionally return with stdout chunks still in flight — the user's callback got OnExit (or the wait gRPC reply) before the matching OnStdout/OnStderr chunks landed on the queue. From the caller's perspective, the exec had finished but its stdout was silently truncated. Root cause: execution_wait spawned an independent terminal task that pushed the Wait event as soon as wait_on_clone returned — with no drain barrier — so the wait reply could land in the event queue ahead of still-flushing stream pumps. Fix: make exit_pump the sole owner of terminal-event dispatch. Both execution_wait and register_exit fan into it; exit_pump awaits all stream pump receivers before pushing the terminal event. Queue order becomes Stdout/Stderr* -> Exit -> Wait*, all from the same task. Pin: test_p0_6_exec_stdout_race in boxlite-ai#678's e2e suite goes from ~70% stdout-loss to 0%. Replaces an earlier split-out attempt that stacked on boxlite-ai#682's reorg (now abandoned). Branch rebuilt against current main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… pin known bugs) (#683) real e2e tests add-up ## What 21 new e2e cases under \`sdks/python/tests/e2e/\` pin contracts the existing local-FFI integration tests can't reach: | File | Cases | xfails | Topic | |---|---|---|---| | \`test_error_code_mapping.py\` | 10 | 4 | \`BoxliteError → HTTP\` mapping (\`src/shared/src/errors.rs:198-280\`) | | \`test_quota_enforcement.py\` | 5 | 5 | API create-boundary per-sandbox quota | | \`test_runner_concurrency.py\` | 6 | 1 | Cross-process state-machine races | All 10 xfails are \`strict=True\` — an unexpected xpass fails CI, so the moment any bug is silently fixed we notice. Each xfail's \`reason=\` carries a \`file:line\` pointer and short root-cause. ## xfail → fix PR map | xfail | Will flip in | |---|---| | \`test_exec_after_box_removed_is_typed_error\` | PR C (runner classifies \`IsNotFound → 404\`) | | \`test_invalid_argument_zero_cpu_returns_400\` | needs API \`ValidationPipe\` fix (separate follow-up, not in C/D) | | \`test_invalid_argument_negative_memory_returns_400\` | same | | \`test_oversized_cpu_returns_400\` | same + per-sandbox quota (separate follow-up) | | 5× \`test_quota_enforcement.py\` (module-level) | per-sandbox quota fix (separate follow-up) | | \`test_execution_invalid_command_returns_422\` | needs Rust \`spawn_failed → ErrExecution\` mapping (separate follow-up) | ## Stack | | what | this PR | |---|---|---| | A (#682) | reorganize only (rename + wiring) | ← prereq | | **B (this)** | **21 new e2e cases** | ← you are here | | C | runner exec error mapping fix + flip 1 xfail | depends on B | | D | FFI exec drain race fix (no PR-B xfails to flip) | depends on B | 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Tests** * Added end-to-end coverage validating API HTTP status ↔ error-code mappings, auth boundary behavior, unknown-resource and unregistered-image semantics, and state-transition error handling. * Added quota enforcement tests covering per-sandbox/org limits, boundary cases (including zero CPUs), and checks that quota rejections do not silently create resources. * Added concurrency/regression tests for parallel execs and creates, delete-during-exec behavior, repeated execs, and idempotent/typed stop semantics. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
23ed4b8 to
6809d65
Compare
6809d65 to
eac9697
Compare
3e6c9ec to
4c76529
Compare
4c76529 to
509617d
Compare
BOXLITE_EXECUTOR env setting
reject requests from API create/recover, API REST exec proxy, runner HTTP create/recover/exec, and v2 job payloads
accept requests from runner inside / Go SDK
Tests