Skip to content

fix(cloud): reserve executor env for user requests only#682

Open
G4614 wants to merge 3 commits into
boxlite-ai:mainfrom
G4614:test/e2e-reorganize-only
Open

fix(cloud): reserve executor env for user requests only#682
G4614 wants to merge 3 commits into
boxlite-ai:mainfrom
G4614:test/e2e-reorganize-only

Conversation

@G4614

@G4614 G4614 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

BOXLITE_EXECUTOR env setting
reject requests from API create/recover, API REST exec proxy, runner HTTP create/recover/exec, and v2 job payloads
accept requests from runner inside / Go SDK

Tests

  • yarn nx test api --testFile=reserved-env.util.spec.ts --testFile=box.service.box-id.spec.ts --testFile=boxlite-proxy.controller.spec.ts --runInBand
  • go test ./pkg/boxlite ./pkg/api/controllers ./pkg/runner/v2/executor
  • cargo build -p boxlite-c
  • bash scripts/build/build-guest.sh --profile debug --dest-dir /tmp/boxlite-pr682-runtime-debug
  • SKIP_GUEST_BUILD=1 bash scripts/build/build-runtime.sh --profile debug --dest-dir /tmp/boxlite-pr682-runtime-debug
  • BOXLITE_RUNTIME_DIR=/tmp/boxlite-pr682-runtime-debug go test -tags boxlite_dev ./pkg/boxlite -run TestIntegrationReservedExecutorEnvPolicyWithRealVM -count=1 -v

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Review Change Stack

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b153fa49-56ae-41d0-9fd0-c87d64f29959

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR centralizes shared E2E test utilities, switches the E2E runner to explicitly invoke per-SDK sdks/*/tests/e2e directories, updates pytest import behavior to avoid source-wheel shadowing, rewires SDK/CLI test entry imports to the centralized lib, and expands CI path filters to trigger on SDK e2e/source changes.

Changes

E2E Test Suite Infrastructure Reorganization

Layer / File(s) Summary
E2E Helper Module Creation and Extraction
scripts/test/e2e/lib/e2e_helpers.py, scripts/test/e2e/pytest.ini, sdks/python/tests/e2e/conftest.py
Creates shared utilities module with collect_stream, drain, and stdout_line_count; sets pytest --import-mode=importlib; removes local helper implementations from conftest and imports shared helpers from centralized lib.
Test Runner and Orchestration Refactoring
make/test.mk, scripts/test/e2e/run.sh, scripts/test/e2e/two_sided.sh, .github/workflows/e2e-stack.yml, scripts/test/e2e/README.md
Updates test:e2e to call run.sh; run.sh runs python3 -m pytest with explicit per-SDK tests/e2e paths; two-sided harness uses run.sh; expands GitHub Actions path filters to include C/Go/Node/Python/CLI e2e and source paths; README updated for per-SDK layout and fixture setup.
Documentation and SDK Test Configuration
scripts/test/e2e/README.md, sdks/python/pytest.ini
Rewrites suite docs to describe SDK-agnostic end-to-end path, updates fixture/profile instructions, and adds norecursedirs = e2e to SDK pytest config to avoid recursing into cross-SDK e2e tests.
C/Go/Node SDK E2E Test Rewiring
sdks/c/tests/e2e/*, sdks/go/tests/e2e/*, sdks/node/tests/e2e/*, src/cli/tests/e2e/test_cli_entry.py
SDK and CLI e2e tests now reference local sibling e2e_basic sources, compute REPO consistently, and import path-verification helpers from centralized scripts/test/e2e/lib; Node smoke script uses local SDK build and improves cleanup handling.
Python E2E Test Helper Import Migration
sdks/python/tests/e2e/*
All Python e2e tests updated to import drain, collect_stream, and stdout_line_count from e2e_helpers instead of local conftest; import wiring and minor formatting/whitespace changes applied without behavior changes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • boxlite-ai/boxlite#678: Introduces the cross-SDK E2E regression suite infrastructure that this PR reorganizes and refactors.

Poem

🐰 Helpers gathered, paths realigned,

Tests hop home to SDKs designed.
Shared code snug in one small burrow,
Runner calls each suite in a row.
Hooray — the e2e carrots grow!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ⚠️ Warning The PR title 'fix(cloud): reserve executor env for user requests only' does not match the actual changes. The PR reorganizes e2e tests to per-SDK folders, not cloud executor environment handling. Update the title to accurately reflect the reorganization work, such as 'test(e2e): move test cases to per-SDK tests/ folders' as described in the PR objectives.
✅ Passed checks (3 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@G4614 G4614 force-pushed the test/e2e-reorganize-only branch from c7cc31a to 24b3efa Compare June 9, 2026 04:06
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
21 new e2e cases under sdks/python/tests/e2e/ that pin contracts the
existing local-FFI integration tests can't reach. All ship with
`xfail(strict=True)` for the 10 known production bugs they pin:

  test_error_code_mapping.py  10 cases  4 xfails
  test_quota_enforcement.py    5 cases  5 xfails (module-level)
  test_runner_concurrency.py   6 cases  1 xfail
                              --       --
                              21       10

Each xfail's reason= string carries a `file:line` pointer to the bug
location and a short root-cause note, so a follow-up fix-PR can grep
for the right reason, drop the marker, and watch the test go green.
`strict=True` means an unexpected xpass becomes a CI failure — i.e.
the moment any of these bugs is silently fixed we'll notice.

xfail coverage map (which PR removes which marker):

  test_exec_after_box_removed_is_typed_error
    → PR C (runner exec error mapping: IsNotFound → 404)

  test_invalid_argument_zero_cpu_returns_400          )
  test_invalid_argument_negative_memory_returns_400   ) needs API
  test_oversized_cpu_returns_400                      ) ValidationPipe
  5x test_quota_enforcement.py                        ) + per-sandbox
                                                       ) quota fix
                                                       ) (separate
                                                       ) follow-up)

  test_execution_invalid_command_returns_422
    → needs Rust spawn-failed → ErrExecution mapping fix
      (separate follow-up; not in C or D)

Split out of boxlite-ai#681 — this is **PR B of 4**:
  A (boxlite-ai#682) reorganize     — merged before this rebases cleanly
  B (this)  21 new cases
  C         runner exec error mapping fix
  D         FFI exec drain race fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
`POST /v1/{prefix}/boxes/{id}/exec` was leaking HTTP 500 when the
underlying box was already gone / stopped / in a non-runnable state
by the time the runner tried to spawn. Two related observable bugs:

  exec on a removed box   → 500 "spawn_failed: build failed"
  exec on a stopped box   → 500 (same shape)
  exec on box mid-rebuild → 500 (same shape)

Root cause: `BoxliteExec` in `apps/runner/pkg/api/controllers/boxlite_exec.go`
called `execManager.Start(...)` and unconditionally wrapped any non-nil
error as 500. The SDK Start path already tunnels typed errors from the
Rust core (sdks/go/errors.go: IsNotFound / IsStopped / IsInvalidState)
when the box state changed under the request — they just weren't being
read.

Fix: extend `classifyExecError` to peek for those SDK typed errors and
return the canonical mapping from `src/shared/src/errors.rs:198-280`:

  IsNotFound       → 404 Not Found
  IsStopped        → 409 Conflict
  IsInvalidState   → 409 Conflict
  (other / no SDK match) → 500 (unchanged)

Also routes the `Start()` error site through `classifyExecError`
instead of hard-coding 500.

Flips the xfail on `test_exec_after_box_removed_is_typed_error`
(sdks/python/tests/e2e/test_runner_concurrency.py) — that case now
xpasses, which under strict=True trips the marker and fails CI; the
fix here removes the marker so it lands green.

Other related xfails in test_error_code_mapping.py are NOT addressed
by this PR (they need DTO validation / quota / Rust spawn-failed
reclassification — separate follow-ups).

Split out of boxlite-ai#681 — this is **PR C of 4**:
  A (boxlite-ai#682) reorganize
  B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
  C (this) runner exec error mapping fix + flip 1 xfail
  D        FFI exec drain race fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
The Go SDK's `box.Exec` would occasionally return with stdout chunks
still in flight — the user's callback got `OnExit` (or the wait gRPC
reply) before the matching `OnStdout`/`OnStderr` chunks landed on
the queue. From the caller's perspective, the exec had finished but
the last few bytes of stdout were missing.

Root cause: `execution_wait` and `exit_pump` pushed their terminal
events as soon as the underlying process exited, racing the still-
draining stream pumps. The FFI queue is ordered, but the wait/exit
events were entering it ahead of late stdout chunks.

Fix: track a `streams_pending` count on `ExecutionHandle` instead of
the previous per-pump receiver list. Both `exit_pump` and the
`execution_wait` task now await `streams_pending → 0` (via a
`tokio::sync::Notify` notification + standard register-then-check
pattern to avoid post-notification miss) before pushing their
terminal events.

Deterministic ordering tests added inline in execution.rs cover:

  - stdout-only exec: all chunks land before Exit
  - mixed stdout/stderr: every chunk lands before Wait
  - process exit before pump completion: terminal event waits
  - pump completion before process exit: terminal event fires
    immediately (no spurious wait)

Side effect on the existing e2e suite: `test_p0_6_exec_stdout_race`
in sdks/python/tests/e2e/ (the regression test for boxlite-ai#563) flips from
~90% loss against stock 0.9.5 to 0% loss against this PR. That test
is NOT in the PR-B xfail set (it was pre-existing), so no markers
need flipping.

Other PR-B xfails (DTO validation, quota, spawn-failed mapping) are
NOT addressed here — separate root causes, separate follow-ups.

Split out of boxlite-ai#681 — this is **PR D of 4**:
  A (boxlite-ai#682) reorganize
  B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
  C (boxlite-ai#684) runner exec error mapping fix + flip 1 xfail
  D (this) FFI exec drain race fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@G4614 G4614 force-pushed the test/e2e-reorganize-only branch from 24b3efa to b5e6f30 Compare June 9, 2026 04:47
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
21 new e2e cases under sdks/python/tests/e2e/ that pin contracts the
existing local-FFI integration tests can't reach. All ship with
`xfail(strict=True)` for the 10 known production bugs they pin:

  test_error_code_mapping.py  10 cases  4 xfails
  test_quota_enforcement.py    5 cases  5 xfails (module-level)
  test_runner_concurrency.py   6 cases  1 xfail
                              --       --
                              21       10

Each xfail's reason= string carries a `file:line` pointer to the bug
location and a short root-cause note, so a follow-up fix-PR can grep
for the right reason, drop the marker, and watch the test go green.
`strict=True` means an unexpected xpass becomes a CI failure — i.e.
the moment any of these bugs is silently fixed we'll notice.

xfail coverage map (which PR removes which marker):

  test_exec_after_box_removed_is_typed_error
    → PR C (runner exec error mapping: IsNotFound → 404)

  test_invalid_argument_zero_cpu_returns_400          )
  test_invalid_argument_negative_memory_returns_400   ) needs API
  test_oversized_cpu_returns_400                      ) ValidationPipe
  5x test_quota_enforcement.py                        ) + per-sandbox
                                                       ) quota fix
                                                       ) (separate
                                                       ) follow-up)

  test_execution_invalid_command_returns_422
    → needs Rust spawn-failed → ErrExecution mapping fix
      (separate follow-up; not in C or D)

Split out of boxlite-ai#681 — this is **PR B of 4**:
  A (boxlite-ai#682) reorganize     — merged before this rebases cleanly
  B (this)  21 new cases
  C         runner exec error mapping fix
  D         FFI exec drain race fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
`POST /v1/{prefix}/boxes/{id}/exec` was leaking HTTP 500 when the
underlying box was already gone / stopped / in a non-runnable state
by the time the runner tried to spawn. Two related observable bugs:

  exec on a removed box   → 500 "spawn_failed: build failed"
  exec on a stopped box   → 500 (same shape)
  exec on box mid-rebuild → 500 (same shape)

Root cause: `BoxliteExec` in `apps/runner/pkg/api/controllers/boxlite_exec.go`
called `execManager.Start(...)` and unconditionally wrapped any non-nil
error as 500. The SDK Start path already tunnels typed errors from the
Rust core (sdks/go/errors.go: IsNotFound / IsStopped / IsInvalidState)
when the box state changed under the request — they just weren't being
read.

Fix: extend `classifyExecError` to peek for those SDK typed errors and
return the canonical mapping from `src/shared/src/errors.rs:198-280`:

  IsNotFound       → 404 Not Found
  IsStopped        → 409 Conflict
  IsInvalidState   → 409 Conflict
  (other / no SDK match) → 500 (unchanged)

Also routes the `Start()` error site through `classifyExecError`
instead of hard-coding 500.

Flips the xfail on `test_exec_after_box_removed_is_typed_error`
(sdks/python/tests/e2e/test_runner_concurrency.py) — that case now
xpasses, which under strict=True trips the marker and fails CI; the
fix here removes the marker so it lands green.

Other related xfails in test_error_code_mapping.py are NOT addressed
by this PR (they need DTO validation / quota / Rust spawn-failed
reclassification — separate follow-ups).

Split out of boxlite-ai#681 — this is **PR C of 4**:
  A (boxlite-ai#682) reorganize
  B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
  C (this) runner exec error mapping fix + flip 1 xfail
  D        FFI exec drain race fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
The Go SDK's `box.Exec` would occasionally return with stdout chunks
still in flight — the user's callback got `OnExit` (or the wait gRPC
reply) before the matching `OnStdout`/`OnStderr` chunks landed on
the queue. From the caller's perspective, the exec had finished but
the last few bytes of stdout were missing.

Root cause: `execution_wait` and `exit_pump` pushed their terminal
events as soon as the underlying process exited, racing the still-
draining stream pumps. The FFI queue is ordered, but the wait/exit
events were entering it ahead of late stdout chunks.

Fix: track a `streams_pending` count on `ExecutionHandle` instead of
the previous per-pump receiver list. Both `exit_pump` and the
`execution_wait` task now await `streams_pending → 0` (via a
`tokio::sync::Notify` notification + standard register-then-check
pattern to avoid post-notification miss) before pushing their
terminal events.

Deterministic ordering tests added inline in execution.rs cover:

  - stdout-only exec: all chunks land before Exit
  - mixed stdout/stderr: every chunk lands before Wait
  - process exit before pump completion: terminal event waits
  - pump completion before process exit: terminal event fires
    immediately (no spurious wait)

Side effect on the existing e2e suite: `test_p0_6_exec_stdout_race`
in sdks/python/tests/e2e/ (the regression test for boxlite-ai#563) flips from
~90% loss against stock 0.9.5 to 0% loss against this PR. That test
is NOT in the PR-B xfail set (it was pre-existing), so no markers
need flipping.

Other PR-B xfails (DTO validation, quota, spawn-failed mapping) are
NOT addressed here — separate root causes, separate follow-ups.

Split out of boxlite-ai#681 — this is **PR D of 4**:
  A (boxlite-ai#682) reorganize
  B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
  C (boxlite-ai#684) runner exec error mapping fix + flip 1 xfail
  D (this) FFI exec drain race fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@G4614 G4614 marked this pull request as ready for review June 9, 2026 05:54

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
scripts/test/e2e/lib/e2e_helpers.py (1)

20-39: ⚡ Quick win

Consider adding parameter type annotations for better IDE support and type safety.

The helper functions are well-implemented and correct, but adding explicit type annotations for the stream and ex parameters would improve the developer experience when importing and using these utilities. This is especially valuable for a shared module that will be imported across multiple test files.

📝 Suggested type annotations
+from typing import AsyncIterator, Any
+
-async def collect_stream(stream) -> str:
+async def collect_stream(stream: AsyncIterator[bytes] | AsyncIterator[str] | None) -> str:
+    """Collect and decode an async stream into a single string."""
     if stream is None:
         return ""
     chunks: list[str] = []
     async for ch in stream:
         chunks.append(
             ch.decode("utf-8", "replace") if isinstance(ch, bytes) else str(ch)
         )
     return "".join(chunks)


-async def drain(ex) -> tuple[str, str]:
+async def drain(ex: Any) -> tuple[str, str]:
     """Drain stdout + stderr concurrently — required for REST exec."""
     out_t = asyncio.create_task(collect_stream(ex.stdout()))
     err_t = asyncio.create_task(collect_stream(ex.stderr()))
     return await asyncio.gather(out_t, err_t)


 def stdout_line_count(s: str) -> int:
+    """Count non-empty lines in a string."""
     return len([ln for ln in s.splitlines() if ln])
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/test/e2e/lib/e2e_helpers.py` around lines 20 - 39, Add explicit
parameter type annotations to improve IDE/type-checker support: annotate
collect_stream(stream) with an appropriate async-iterable type (e.g., stream:
Optional[AsyncIterable[Union[bytes, str]]]) and annotate drain(ex) with the
expected execution object type (e.g., ex: Any or a small Protocol with
stdout()/stderr() -> AsyncIterable[Union[bytes, str]]), and update the drain
signature to return Tuple[str, str] (or import Tuple) if not already recognized;
also add any necessary typing imports (Optional, AsyncIterable, Union, Any,
Tuple) at the top. Ensure you keep the existing return types and semantics of
collect_stream and drain while only adding these parameter type hints.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@scripts/test/e2e/lib/e2e_helpers.py`:
- Around line 20-39: Add explicit parameter type annotations to improve
IDE/type-checker support: annotate collect_stream(stream) with an appropriate
async-iterable type (e.g., stream: Optional[AsyncIterable[Union[bytes, str]]])
and annotate drain(ex) with the expected execution object type (e.g., ex: Any or
a small Protocol with stdout()/stderr() -> AsyncIterable[Union[bytes, str]]),
and update the drain signature to return Tuple[str, str] (or import Tuple) if
not already recognized; also add any necessary typing imports (Optional,
AsyncIterable, Union, Any, Tuple) at the top. Ensure you keep the existing
return types and semantics of collect_stream and drain while only adding these
parameter type hints.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: ffa6a850-42c0-4fe7-a972-75621dd42e1d

📥 Commits

Reviewing files that changed from the base of the PR and between e4bbd0e and b5e6f30.

📒 Files selected for processing (27)
  • .github/workflows/e2e-stack.yml
  • make/test.mk
  • scripts/test/e2e/README.md
  • scripts/test/e2e/lib/e2e_helpers.py
  • scripts/test/e2e/pytest.ini
  • scripts/test/e2e/run.sh
  • scripts/test/e2e/two_sided.sh
  • sdks/c/tests/e2e/e2e_basic.c
  • sdks/c/tests/e2e/test_c_entry.py
  • sdks/go/tests/e2e/e2e_basic.go
  • sdks/go/tests/e2e/test_go_entry.py
  • sdks/node/tests/e2e/e2e_basic.ts
  • sdks/node/tests/e2e/test_node_entry.py
  • sdks/python/pytest.ini
  • sdks/python/tests/e2e/conftest.py
  • sdks/python/tests/e2e/test_box_management.py
  • sdks/python/tests/e2e/test_errors.py
  • sdks/python/tests/e2e/test_exec_options.py
  • sdks/python/tests/e2e/test_exec_timeout.py
  • sdks/python/tests/e2e/test_exec_unit_shape.py
  • sdks/python/tests/e2e/test_execution_shutdown.py
  • sdks/python/tests/e2e/test_lifecycle.py
  • sdks/python/tests/e2e/test_p0_6_exec_stdout_race.py
  • sdks/python/tests/e2e/test_path_verification.py
  • sdks/python/tests/e2e/test_resize_tty.py
  • sdks/python/tests/e2e/test_shutdown.py
  • src/cli/tests/e2e/test_cli_entry.py

G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
…xlite-ai#682)

Adds Python type hints to the two helper functions that were missing
parameter annotations, and docstrings where they were missing too.

Per CodeRabbit nitpick on PR boxlite-ai#682 — quality-of-life only, no behavior
change. `from __future__ import annotations` already in the file means
the `|` union syntax works on Python 3.10 too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@G4614 G4614 marked this pull request as draft June 9, 2026 06:59
@G4614

G4614 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Closing — abandoning the per-SDK relocation approach. New e2e cases will stay under scripts/test/e2e/cases/ matching main's layout. Stacked PRs #683/#684/#685 will be rebased to drop b5e6f30; #686-#696 are already independent from this commit.

@G4614 G4614 closed this Jun 9, 2026
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
21 new e2e cases under scripts/test/e2e/cases/ that pin contracts the
existing local-FFI integration tests can't reach. Cases shipped with
xfail(strict=True) markers for the known production bugs they pin —
a follow-up fix-PR flipping a bug closed will trip the strict marker
into a CI failure, surfacing the fact that the xfail can now be dropped.

Files added:
  scripts/test/e2e/cases/test_error_code_mapping.py
  scripts/test/e2e/cases/test_quota_enforcement.py
  scripts/test/e2e/cases/test_runner_concurrency.py

Replaces an earlier split-out attempt that stacked on a reorganize PR
(boxlite-ai#682). That structural reorg is now abandoned; new e2e cases land
under scripts/test/e2e/cases/ matching boxlite-ai#678's existing layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
POST /v1/{prefix}/boxes/{id}/exec was leaking HTTP 500 when the
underlying box was already gone / stopped / in a non-runnable state
by the time the runner tried to spawn:

  exec on a removed box   -> 500 "spawn_failed: build failed"
  exec on a stopped box   -> 500 (same shape)
  exec on box mid-rebuild -> 500 (same shape)

Root cause: BoxliteExec in boxlite_exec.go wraps the SDK error in a
generic 500 without consulting the SDK's typed error helpers
(IsNotFound, IsStopped, IsInvalidState). Other handlers in this file
(signal/resize/kill/status) already classify these; POST /exec was
the one that got missed.

Fix: extend classifyExecError to recognise the SDK's typed errors and
map them to 404 (NotFound), 409 (Stopped, InvalidState) per the
canonical mapping at src/shared/src/errors.rs.

Pin: test_exec_on_stopped_box_is_typed_error in boxlite-ai#678's e2e suite goes
from 500 -> 4xx after this change. Re-validation: stack on top of
boxlite-ai#683 (which adds test_runner_concurrency.py) for the
exec-after-soft-deleted-box variant.

Replaces an earlier split-out attempt that stacked on boxlite-ai#682's reorg
(now abandoned). Branch rebuilt against current main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
The Go SDK's box.Exec would occasionally return with stdout chunks
still in flight — the user's callback got OnExit (or the wait gRPC
reply) before the matching OnStdout/OnStderr chunks landed on the
queue. From the caller's perspective, the exec had finished but its
stdout was silently truncated.

Root cause: execution_wait spawned an independent terminal task that
pushed the Wait event as soon as wait_on_clone returned — with no
drain barrier — so the wait reply could land in the event queue
ahead of still-flushing stream pumps.

Fix: make exit_pump the sole owner of terminal-event dispatch. Both
execution_wait and register_exit fan into it; exit_pump awaits all
stream pump receivers before pushing the terminal event. Queue order
becomes Stdout/Stderr* -> Exit -> Wait*, all from the same task.

Pin: test_p0_6_exec_stdout_race in boxlite-ai#678's e2e suite goes from ~70%
stdout-loss to 0%.

Replaces an earlier split-out attempt that stacked on boxlite-ai#682's reorg
(now abandoned). Branch rebuilt against current main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DorianZheng pushed a commit that referenced this pull request Jun 9, 2026
… pin known bugs) (#683)

real e2e tests add-up

## What

21 new e2e cases under \`sdks/python/tests/e2e/\` pin contracts the
existing local-FFI integration tests can't reach:

| File | Cases | xfails | Topic |
|---|---|---|---|
| \`test_error_code_mapping.py\` | 10 | 4 | \`BoxliteError → HTTP\`
mapping (\`src/shared/src/errors.rs:198-280\`) |
| \`test_quota_enforcement.py\` | 5 | 5 | API create-boundary
per-sandbox quota |
| \`test_runner_concurrency.py\` | 6 | 1 | Cross-process state-machine
races |

All 10 xfails are \`strict=True\` — an unexpected xpass fails CI, so the
moment any bug is silently fixed we notice. Each xfail's \`reason=\`
carries a \`file:line\` pointer and short root-cause.

## xfail → fix PR map

| xfail | Will flip in |
|---|---|
| \`test_exec_after_box_removed_is_typed_error\` | PR C (runner
classifies \`IsNotFound → 404\`) |
| \`test_invalid_argument_zero_cpu_returns_400\` | needs API
\`ValidationPipe\` fix (separate follow-up, not in C/D) |
| \`test_invalid_argument_negative_memory_returns_400\` | same |
| \`test_oversized_cpu_returns_400\` | same + per-sandbox quota
(separate follow-up) |
| 5× \`test_quota_enforcement.py\` (module-level) | per-sandbox quota
fix (separate follow-up) |
| \`test_execution_invalid_command_returns_422\` | needs Rust
\`spawn_failed → ErrExecution\` mapping (separate follow-up) |

## Stack

| | what | this PR |
|---|---|---|
| A (#682) | reorganize only (rename + wiring) | ← prereq |
| **B (this)** | **21 new e2e cases** | ← you are here |
| C | runner exec error mapping fix + flip 1 xfail | depends on B |
| D | FFI exec drain race fix (no PR-B xfails to flip) | depends on B |

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Tests**
* Added end-to-end coverage validating API HTTP status ↔ error-code
mappings, auth boundary behavior, unknown-resource and
unregistered-image semantics, and state-transition error handling.
* Added quota enforcement tests covering per-sandbox/org limits,
boundary cases (including zero CPUs), and checks that quota rejections
do not silently create resources.
* Added concurrency/regression tests for parallel execs and creates,
delete-during-exec behavior, repeated execs, and idempotent/typed stop
semantics.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@G4614 G4614 reopened this Jun 22, 2026
@G4614 G4614 force-pushed the test/e2e-reorganize-only branch from 23ed4b8 to 6809d65 Compare June 22, 2026 06:11
@G4614 G4614 changed the title test(e2e): move test cases to per-SDK tests/ folders fix(cloud): reject reserved executor env Jun 22, 2026
@G4614 G4614 force-pushed the test/e2e-reorganize-only branch from 6809d65 to eac9697 Compare June 22, 2026 07:22
@G4614 G4614 force-pushed the test/e2e-reorganize-only branch 2 times, most recently from 3e6c9ec to 4c76529 Compare June 22, 2026 08:01
@G4614 G4614 force-pushed the test/e2e-reorganize-only branch from 4c76529 to 509617d Compare June 22, 2026 08:34
@G4614 G4614 changed the title fix(cloud): reject reserved executor env fix(cloud): reserve executor env for user requests only Jun 22, 2026
@G4614 G4614 marked this pull request as ready for review June 22, 2026 09:34
@G4614 G4614 requested a review from a team as a code owner June 22, 2026 09:34
@G4614 G4614 enabled auto-merge June 22, 2026 09:36
@DorianZheng DorianZheng disabled auto-merge June 23, 2026 12:45
@DorianZheng DorianZheng enabled auto-merge June 23, 2026 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant