fix(jailer): unblock the default sandbox (drop host-side Landlock + bind box/bin)#681
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughLandlock enforcement is refactored from an in-process ChangesLandlock LD_PRELOAD migration + FD hardening + secure defaults
Sequence Diagram(s)sequenceDiagram
rect rgba(70, 130, 180, 0.5)
note over BwrapSandbox,bwrap: Host side
BwrapSandbox->>BwrapSandbox: serialize_rules_for_env() → LANDLOCK_RULES_ENV
BwrapSandbox->>BwrapSandbox: ld_library_path_for_binary() → LD_LIBRARY_PATH
BwrapSandbox->>BwrapSandbox: locate libboxlite_landlock.so
BwrapSandbox->>bwrap: launch with --setenv LD_PRELOAD, SEAL_MARKER_ENV, LANDLOCK_RULES_ENV
end
rect rgba(34, 139, 34, 0.5)
note over bwrap,shim: Inside sandbox (post-namespace)
bwrap->>SEAL_CTOR: .init_array fires (LD_PRELOAD)
SEAL_CTOR->>apply_rules_from_env: call
apply_rules_from_env->>build_ruleset: paths + network flag
build_ruleset-->>restrict_self_raw: ruleset fd
restrict_self_raw->>restrict_self_raw: PR_SET_NO_NEW_PRIVS + landlock_restrict_self
restrict_self_raw-->>SEAL_CTOR: Ok or abort()
SEAL_CTOR-->>shim: return (Landlock active)
shim->>shim: run_shim()
end
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
PR boxlite-ai#678 shipped e2e pytest cases under scripts/test/e2e/cases/, a fresh location that doesn't match the repo's existing per-SDK test layout (src/cli/tests/*.rs, sdks/python/tests/, sdks/node/tests/, sdks/c/tests/, sdks/go/*_test.go). This PR moves the cases to match. Driver scripts stay in scripts/test/e2e/ (cross-SDK infra); test files now live next to the SDK they exercise: scripts/test/e2e/cases/test_c_entry.py → sdks/c/tests/e2e/ scripts/test/e2e/cases/test_go_entry.py → sdks/go/tests/e2e/ scripts/test/e2e/cases/test_node_entry.py → sdks/node/tests/e2e/ scripts/test/e2e/cases/test_cli_entry.py → src/cli/tests/e2e/ scripts/test/e2e/cases/test_*.py (Python pytest) → sdks/python/tests/e2e/ scripts/test/e2e/cases/conftest.py → sdks/python/tests/e2e/ scripts/test/e2e/sdks/{c,go,node}/e2e_basic.{c,go,ts} → sdks/{c,go,node}/tests/e2e/ Stays in scripts/test/e2e/: bootstrap.sh, teardown.sh, run.sh, two_sided.sh, fixture_setup.py, pytest.ini, README.md. Supporting wiring (no test-logic change): - run.sh: pytest invoked with the 5 SDK e2e dirs as explicit args (testpaths in pytest.ini can't resolve outside its rootdir) - pytest.ini: comment explaining the testpaths omission - .github/workflows/e2e-stack.yml: paths filter expanded to all new e2e locations - README.md: layout map updated Pure rename + supporting wiring. Zero new test logic, zero source changes. Originally part of boxlite-ai#681 which also bundled 21 new test cases + 2 fixes; this split-out covers reorganize only — the rest ships in follow-up PRs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
21 new e2e cases under sdks/python/tests/e2e/ that pin contracts the
existing local-FFI integration tests can't reach. All ship with
`xfail(strict=True)` for the 10 known production bugs they pin:
test_error_code_mapping.py 10 cases 4 xfails
test_quota_enforcement.py 5 cases 5 xfails (module-level)
test_runner_concurrency.py 6 cases 1 xfail
-- --
21 10
Each xfail's reason= string carries a `file:line` pointer to the bug
location and a short root-cause note, so a follow-up fix-PR can grep
for the right reason, drop the marker, and watch the test go green.
`strict=True` means an unexpected xpass becomes a CI failure — i.e.
the moment any of these bugs is silently fixed we'll notice.
xfail coverage map (which PR removes which marker):
test_exec_after_box_removed_is_typed_error
→ PR C (runner exec error mapping: IsNotFound → 404)
test_invalid_argument_zero_cpu_returns_400 )
test_invalid_argument_negative_memory_returns_400 ) needs API
test_oversized_cpu_returns_400 ) ValidationPipe
5x test_quota_enforcement.py ) + per-sandbox
) quota fix
) (separate
) follow-up)
test_execution_invalid_command_returns_422
→ needs Rust spawn-failed → ErrExecution mapping fix
(separate follow-up; not in C or D)
Split out of boxlite-ai#681 — this is **PR B of 4**:
A (boxlite-ai#682) reorganize — merged before this rebases cleanly
B (this) 21 new cases
C runner exec error mapping fix
D FFI exec drain race fix
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`POST /v1/{prefix}/boxes/{id}/exec` was leaking HTTP 500 when the
underlying box was already gone / stopped / in a non-runnable state
by the time the runner tried to spawn. Two related observable bugs:
exec on a removed box → 500 "spawn_failed: build failed"
exec on a stopped box → 500 (same shape)
exec on box mid-rebuild → 500 (same shape)
Root cause: `BoxliteExec` in `apps/runner/pkg/api/controllers/boxlite_exec.go`
called `execManager.Start(...)` and unconditionally wrapped any non-nil
error as 500. The SDK Start path already tunnels typed errors from the
Rust core (sdks/go/errors.go: IsNotFound / IsStopped / IsInvalidState)
when the box state changed under the request — they just weren't being
read.
Fix: extend `classifyExecError` to peek for those SDK typed errors and
return the canonical mapping from `src/shared/src/errors.rs:198-280`:
IsNotFound → 404 Not Found
IsStopped → 409 Conflict
IsInvalidState → 409 Conflict
(other / no SDK match) → 500 (unchanged)
Also routes the `Start()` error site through `classifyExecError`
instead of hard-coding 500.
Flips the xfail on `test_exec_after_box_removed_is_typed_error`
(sdks/python/tests/e2e/test_runner_concurrency.py) — that case now
xpasses, which under strict=True trips the marker and fails CI; the
fix here removes the marker so it lands green.
Other related xfails in test_error_code_mapping.py are NOT addressed
by this PR (they need DTO validation / quota / Rust spawn-failed
reclassification — separate follow-ups).
Split out of boxlite-ai#681 — this is **PR C of 4**:
A (boxlite-ai#682) reorganize
B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
C (this) runner exec error mapping fix + flip 1 xfail
D FFI exec drain race fix
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Go SDK's `box.Exec` would occasionally return with stdout chunks
still in flight — the user's callback got `OnExit` (or the wait gRPC
reply) before the matching `OnStdout`/`OnStderr` chunks landed on
the queue. From the caller's perspective, the exec had finished but
the last few bytes of stdout were missing.
Root cause: `execution_wait` and `exit_pump` pushed their terminal
events as soon as the underlying process exited, racing the still-
draining stream pumps. The FFI queue is ordered, but the wait/exit
events were entering it ahead of late stdout chunks.
Fix: track a `streams_pending` count on `ExecutionHandle` instead of
the previous per-pump receiver list. Both `exit_pump` and the
`execution_wait` task now await `streams_pending → 0` (via a
`tokio::sync::Notify` notification + standard register-then-check
pattern to avoid post-notification miss) before pushing their
terminal events.
Deterministic ordering tests added inline in execution.rs cover:
- stdout-only exec: all chunks land before Exit
- mixed stdout/stderr: every chunk lands before Wait
- process exit before pump completion: terminal event waits
- pump completion before process exit: terminal event fires
immediately (no spurious wait)
Side effect on the existing e2e suite: `test_p0_6_exec_stdout_race`
in sdks/python/tests/e2e/ (the regression test for boxlite-ai#563) flips from
~90% loss against stock 0.9.5 to 0% loss against this PR. That test
is NOT in the PR-B xfail set (it was pre-existing), so no markers
need flipping.
Other PR-B xfails (DTO validation, quota, spawn-failed mapping) are
NOT addressed here — separate root causes, separate follow-ups.
Split out of boxlite-ai#681 — this is **PR D of 4**:
A (boxlite-ai#682) reorganize
B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
C (boxlite-ai#684) runner exec error mapping fix + flip 1 xfail
D (this) FFI exec drain race fix
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR boxlite-ai#678 shipped e2e pytest cases under scripts/test/e2e/cases/, a fresh location that doesn't match the repo's existing per-SDK test layout (src/cli/tests/*.rs, sdks/python/tests/, sdks/node/tests/, sdks/c/tests/, sdks/go/*_test.go). This PR moves the cases to match. Driver scripts stay in scripts/test/e2e/ (cross-SDK infra); test files now live next to the SDK they exercise: scripts/test/e2e/cases/test_c_entry.py → sdks/c/tests/e2e/ scripts/test/e2e/cases/test_go_entry.py → sdks/go/tests/e2e/ scripts/test/e2e/cases/test_node_entry.py → sdks/node/tests/e2e/ scripts/test/e2e/cases/test_cli_entry.py → src/cli/tests/e2e/ scripts/test/e2e/cases/test_*.py (Python pytest) → sdks/python/tests/e2e/ scripts/test/e2e/cases/conftest.py → sdks/python/tests/e2e/ scripts/test/e2e/sdks/{c,go,node}/e2e_basic.{c,go,ts} → sdks/{c,go,node}/tests/e2e/ Stays in scripts/test/e2e/: bootstrap.sh, teardown.sh, run.sh, two_sided.sh, fixture_setup.py, pytest.ini, README.md. Supporting wiring (forced by the renames): - run.sh: pytest invoked with the 5 SDK e2e dirs as explicit args (testpaths in pytest.ini can't resolve outside its rootdir) - pytest.ini: comment explaining the testpaths omission - .github/workflows/e2e-stack.yml: paths filter expanded to all new e2e locations - README.md: layout map updated Two collisions the rename exposed (both fixed in this PR): 1. SDK unit-test pytest workflow auto-discovers sdks/python/tests/ and would import the new e2e/ subdir, whose conftest needs Python 3.11+ tomllib and a live API stack the unit-test workflow doesn't provide. Fix: norecursedirs = e2e in sdks/python/pytest.ini. The e2e runner passes the dir explicitly on the CLI which bypasses norecursedirs. 2. pytest's default --import-mode=prepend puts each test file's rootpath on sys.path. For sdks/python/tests/e2e/test_*.py the rootpath becomes sdks/python/, which then shadows the installed boxlite wheel with the source-tree stub package (sdks/python/boxlite/__init__.py — missing the compiled .so so Rust-bound symbols like BoxliteRestOptions are absent). Every fixture using boxlite.BoxliteRestOptions failed at setup with AttributeError. Fix: --import-mode=importlib in scripts/test/e2e/pytest.ini (modern mode, no sys.path injection). Tests that previously did `from conftest import drain` couldn't keep doing that under importlib mode (conftest no longer on sys.path) — extracted drain/collect_stream/ stdout_line_count from conftest into scripts/test/e2e/lib/e2e_helpers.py (alongside path_verification which already used the same lib/ + sys.path.insert pattern), and updated the 8 test files' imports. Verified locally on Ubuntu 22.04 + Python 3.11 + maturin-built boxlite wheel — make test:e2e against PR-A: 21 passed, 7 failed (the 7 are real production bugs the suite is designed to catch: PR-C fixes runner exec error mapping, PR-D fixes FFI drain race). Before this PR's collision fix: 0 passed, 32 fixture-setup errors. Originally part of boxlite-ai#681 which also bundled 21 new test cases + 2 fixes; this split-out covers reorganize only — the rest ships in follow-up PRs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
21 new e2e cases under sdks/python/tests/e2e/ that pin contracts the
existing local-FFI integration tests can't reach. All ship with
`xfail(strict=True)` for the 10 known production bugs they pin:
test_error_code_mapping.py 10 cases 4 xfails
test_quota_enforcement.py 5 cases 5 xfails (module-level)
test_runner_concurrency.py 6 cases 1 xfail
-- --
21 10
Each xfail's reason= string carries a `file:line` pointer to the bug
location and a short root-cause note, so a follow-up fix-PR can grep
for the right reason, drop the marker, and watch the test go green.
`strict=True` means an unexpected xpass becomes a CI failure — i.e.
the moment any of these bugs is silently fixed we'll notice.
xfail coverage map (which PR removes which marker):
test_exec_after_box_removed_is_typed_error
→ PR C (runner exec error mapping: IsNotFound → 404)
test_invalid_argument_zero_cpu_returns_400 )
test_invalid_argument_negative_memory_returns_400 ) needs API
test_oversized_cpu_returns_400 ) ValidationPipe
5x test_quota_enforcement.py ) + per-sandbox
) quota fix
) (separate
) follow-up)
test_execution_invalid_command_returns_422
→ needs Rust spawn-failed → ErrExecution mapping fix
(separate follow-up; not in C or D)
Split out of boxlite-ai#681 — this is **PR B of 4**:
A (boxlite-ai#682) reorganize — merged before this rebases cleanly
B (this) 21 new cases
C runner exec error mapping fix
D FFI exec drain race fix
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`POST /v1/{prefix}/boxes/{id}/exec` was leaking HTTP 500 when the
underlying box was already gone / stopped / in a non-runnable state
by the time the runner tried to spawn. Two related observable bugs:
exec on a removed box → 500 "spawn_failed: build failed"
exec on a stopped box → 500 (same shape)
exec on box mid-rebuild → 500 (same shape)
Root cause: `BoxliteExec` in `apps/runner/pkg/api/controllers/boxlite_exec.go`
called `execManager.Start(...)` and unconditionally wrapped any non-nil
error as 500. The SDK Start path already tunnels typed errors from the
Rust core (sdks/go/errors.go: IsNotFound / IsStopped / IsInvalidState)
when the box state changed under the request — they just weren't being
read.
Fix: extend `classifyExecError` to peek for those SDK typed errors and
return the canonical mapping from `src/shared/src/errors.rs:198-280`:
IsNotFound → 404 Not Found
IsStopped → 409 Conflict
IsInvalidState → 409 Conflict
(other / no SDK match) → 500 (unchanged)
Also routes the `Start()` error site through `classifyExecError`
instead of hard-coding 500.
Flips the xfail on `test_exec_after_box_removed_is_typed_error`
(sdks/python/tests/e2e/test_runner_concurrency.py) — that case now
xpasses, which under strict=True trips the marker and fails CI; the
fix here removes the marker so it lands green.
Other related xfails in test_error_code_mapping.py are NOT addressed
by this PR (they need DTO validation / quota / Rust spawn-failed
reclassification — separate follow-ups).
Split out of boxlite-ai#681 — this is **PR C of 4**:
A (boxlite-ai#682) reorganize
B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
C (this) runner exec error mapping fix + flip 1 xfail
D FFI exec drain race fix
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Go SDK's `box.Exec` would occasionally return with stdout chunks
still in flight — the user's callback got `OnExit` (or the wait gRPC
reply) before the matching `OnStdout`/`OnStderr` chunks landed on
the queue. From the caller's perspective, the exec had finished but
the last few bytes of stdout were missing.
Root cause: `execution_wait` and `exit_pump` pushed their terminal
events as soon as the underlying process exited, racing the still-
draining stream pumps. The FFI queue is ordered, but the wait/exit
events were entering it ahead of late stdout chunks.
Fix: track a `streams_pending` count on `ExecutionHandle` instead of
the previous per-pump receiver list. Both `exit_pump` and the
`execution_wait` task now await `streams_pending → 0` (via a
`tokio::sync::Notify` notification + standard register-then-check
pattern to avoid post-notification miss) before pushing their
terminal events.
Deterministic ordering tests added inline in execution.rs cover:
- stdout-only exec: all chunks land before Exit
- mixed stdout/stderr: every chunk lands before Wait
- process exit before pump completion: terminal event waits
- pump completion before process exit: terminal event fires
immediately (no spurious wait)
Side effect on the existing e2e suite: `test_p0_6_exec_stdout_race`
in sdks/python/tests/e2e/ (the regression test for boxlite-ai#563) flips from
~90% loss against stock 0.9.5 to 0% loss against this PR. That test
is NOT in the PR-B xfail set (it was pre-existing), so no markers
need flipping.
Other PR-B xfails (DTO validation, quota, spawn-failed mapping) are
NOT addressed here — separate root causes, separate follow-ups.
Split out of boxlite-ai#681 — this is **PR D of 4**:
A (boxlite-ai#682) reorganize
B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
C (boxlite-ai#684) runner exec error mapping fix + flip 1 xfail
D (this) FFI exec drain race fix
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
a0a6a00 to
5dc504a
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (2)
src/boxlite/build.rs (1)
849-851: 📐 Maintainability & Code Quality | 🔵 TrivialExecutable-name list is duplicated across the build/runtime boundary.
is_runtime_executablein build.rs lists["boxlite-shim", "boxlite-guest", "bwrap"], and the same set appears asRUNTIME_EXECUTABLESinsrc/boxlite/src/runtime/embedded.rs(line 24). While they currently match, the lists are maintained separately; if one is updated and the other is not, a newly-embedded executable could be extracted without the exec bit and silently fail to launch. Consider consolidating to a single shared definition (e.g., viainclude!) to enforce consistency.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/boxlite/build.rs` around lines 849 - 851, The `is_runtime_executable` function in build.rs and the `RUNTIME_EXECUTABLES` constant in src/boxlite/src/runtime/embedded.rs maintain duplicate lists of executable names that could drift out of sync. Consolidate these two lists into a single shared definition (such as a separate module-level constant or data file) and use the include! macro or direct module reference to ensure both locations reference the same authoritative list of executables. This will prevent future inconsistencies where one list is updated without updating the other.Source: Coding guidelines
src/boxlite/src/jailer/sandbox/bwrap.rs (1)
171-179: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winShare the dir-first
LD_LIBRARY_PATHpolicy helper.This duplicates
ld_library_path_for_shiminsrc/boxlite/src/jailer/bwrap.rs. Since both encode the same library-resolution policy, move it to one shared helper to avoid ordering drift.As per coding guidelines, “Use DRY (Don't Repeat Yourself) in Rust when it's the same rule, policy, or transformation.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/boxlite/src/jailer/sandbox/bwrap.rs` around lines 171 - 179, The ld_library_path_for_binary function duplicates the same library-resolution policy as ld_library_path_for_shim located elsewhere. Extract the common logic that constructs a LD_LIBRARY_PATH with the binary directory first followed by any existing LD_LIBRARY_PATH into a single shared helper function. Update both ld_library_path_for_binary and ld_library_path_for_shim to call this shared helper instead of duplicating the logic, ensuring consistent policy enforcement across the codebase and preventing future ordering drift.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/boxlite/src/jailer/landlock.rs`:
- Around line 220-225: The error handling for Landlock ruleset build failures in
the match expression on the Err(e) branch currently logs a warning and allows
execution to continue, which bypasses security protections. Instead of warning
and continuing when ruleset building fails, change this error case to return or
propagate the error, ensuring the operation fails fast rather than silently
continuing without Landlock enforcement. This aligns with the fail-closed
security principle where a ruleset build error should halt execution rather than
proceed in an unprotected state.
- Around line 195-197: The network enablement logic currently fails open by
defaulting to enabled when BOXLITE_LANDLOCK_NETWORK_ENABLED_ENV is missing or
malformed via the unwrap_or(true) fallback. When BOXLITE_LANDLOCK_RULES is
present, validate that BOXLITE_LANDLOCK_NETWORK_ENABLED exists and contains only
valid values (e.g., "0" or "1"), then return an error if the environment
variable is missing or contains an unexpected value instead of defaulting to
true. This ensures the configuration fails closed on malformed input rather than
silently enabling network access.
In `@src/boxlite/src/jailer/sandbox/bwrap.rs`:
- Around line 126-138: The error handling in the Landlock rules serialization
block (in the Err(e) branch) currently logs a warning and continues, which
allows command construction to proceed without the security layer. Change the
function signature to return a BoxliteResult or appropriate error type, and
propagate the serialization error (the `e` value) instead of just logging it, so
that the entire sandbox application or command construction fails if Landlock
rules cannot be serialized, implementing the fail-closed principle.
In `@src/shim/src/main.rs`:
- Around line 101-104: The Landlock setup block that calls apply_rules_from_env
can fail before the ExitInfo write path is installed, causing errors to bypass
structured exit reporting. Either capture exit_file before the Landlock block so
it remains available if apply_rules_from_env fails, or refactor the Landlock
error handling to use the same error-reporting mechanism as run_shim. Ensure
that any failure during Landlock initialization is properly reported through the
structured exit path rather than returning directly without calling the
exit_file write logic.
---
Nitpick comments:
In `@src/boxlite/build.rs`:
- Around line 849-851: The `is_runtime_executable` function in build.rs and the
`RUNTIME_EXECUTABLES` constant in src/boxlite/src/runtime/embedded.rs maintain
duplicate lists of executable names that could drift out of sync. Consolidate
these two lists into a single shared definition (such as a separate module-level
constant or data file) and use the include! macro or direct module reference to
ensure both locations reference the same authoritative list of executables. This
will prevent future inconsistencies where one list is updated without updating
the other.
In `@src/boxlite/src/jailer/sandbox/bwrap.rs`:
- Around line 171-179: The ld_library_path_for_binary function duplicates the
same library-resolution policy as ld_library_path_for_shim located elsewhere.
Extract the common logic that constructs a LD_LIBRARY_PATH with the binary
directory first followed by any existing LD_LIBRARY_PATH into a single shared
helper function. Update both ld_library_path_for_binary and
ld_library_path_for_shim to call this shared helper instead of duplicating the
logic, ensuring consistent policy enforcement across the codebase and preventing
future ordering drift.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: b9743a7d-97b1-4daa-b067-c77733218a28
📒 Files selected for processing (12)
src/boxlite/build.rssrc/boxlite/src/jailer/bwrap.rssrc/boxlite/src/jailer/common/fd.rssrc/boxlite/src/jailer/landlock.rssrc/boxlite/src/jailer/mod.rssrc/boxlite/src/jailer/pre_exec.rssrc/boxlite/src/jailer/sandbox/bwrap.rssrc/boxlite/src/jailer/sandbox/composite.rssrc/boxlite/src/jailer/sandbox/mod.rssrc/boxlite/src/jailer/shim_copy.rssrc/boxlite/src/runtime/embedded.rssrc/shim/src/main.rs
Split out of boxlite-ai#681. These fixes are about embedded-runtime robustness under in-place binary rollout (the runner-rollout flow), independent of the boxlite-ai#652 default-sandbox-on change — they affect boxlite-shim/boxlite-guest, which every box needs regardless of whether the sandbox is enabled. - embedded.rs: stamp the runtime cache with the embedded manifest hash so a same-version binary swap (dev-channel rollout) invalidates stale extracted runtime binaries instead of silently reusing them; enforce +x on extracted runtime executables (boxlite-shim/boxlite-guest/bwrap). - build.rs: emit BOXLITE_MANIFEST_HASH and set executable mode on the embedded runtime binaries. - shim_copy.rs: re-mark an already-present copied shim as executable. Tests (two-side verified): cache-staleness predicate, runtime-exec perms enforcement, shim-copy perms. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
sdks/python/src/advanced_options.rs (1)
134-152: 🎯 Functional Correctness | 🟡 Minor
seccomp_enableddefault diverges from core on non-Linux.The docstring claims this constructor matches
SecurityOptions::default(), but the core default gates seccomp withcfg!(target_os = "linux")(false on macOS), whereas this signature defaults it to unconditionaltrue. The sibling presetsstandard()andmaximum()in the core also cfg-gate it. TheFromimpl carries this value straight into the coreSecurityOptions, so on macOS a defaultSecurityOptions()producesseccomp_enabled=truewhere every other surface yieldsfalse— misleading on introspection even though seccomp is Linux-only.Note the assertion in
sdks/python/tests/test_security_options.py(assert opts.seccomp_enabled is True) would need platform-gating if you align with core.Proposed alignment with core
- seccomp_enabled=true, + seccomp_enabled=cfg!(target_os = "linux"),The proposed fix using
cfg!(target_os = "linux")is valid in pyo3 0.27.1—the macro expands at compile time before pyo3 processes the signature attribute, producing a valid boolean literal.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@sdks/python/src/advanced_options.rs` around lines 134 - 152, The `seccomp_enabled` parameter in the SecurityOptions constructor defaults unconditionally to `true`, but the core SecurityOptions::default() gates this with cfg!(target_os = "linux") to default it to `false` on non-Linux platforms. Change the `seccomp_enabled=true` default in the pyo3 signature to use `cfg!(target_os = "linux")` to match the core behavior, ensuring the Python SDK produces `seccomp_enabled=false` on macOS and other non-Linux platforms. Additionally, the assertion in the test file (sdks/python/tests/test_security_options.py) that asserts opts.seccomp_enabled is True will also need to be platform-gated using cfg! to only run on Linux.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@sdks/python/src/advanced_options.rs`:
- Around line 134-152: The `seccomp_enabled` parameter in the SecurityOptions
constructor defaults unconditionally to `true`, but the core
SecurityOptions::default() gates this with cfg!(target_os = "linux") to default
it to `false` on non-Linux platforms. Change the `seccomp_enabled=true` default
in the pyo3 signature to use `cfg!(target_os = "linux")` to match the core
behavior, ensuring the Python SDK produces `seccomp_enabled=false` on macOS and
other non-Linux platforms. Additionally, the assertion in the test file
(sdks/python/tests/test_security_options.py) that asserts opts.seccomp_enabled
is True will also need to be platform-gated using cfg! to only run on Linux.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 0b41ace4-0968-45bd-816d-126470710221
📒 Files selected for processing (2)
sdks/python/src/advanced_options.rssdks/python/tests/test_security_options.py
|
Addressed all four fail-open findings in 7994709 (the Landlock handoff must fail closed since the shim is now the only place Landlock is applied):
clippy |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/boxlite/src/jailer/landlock.rs`:
- Line 199: The doc comment around line 199 describing the landlock behavior
still states that unsupported kernels "warn and continue", but this no longer
matches the implementation at line 246 where the Ok(None) branch now degrades
silently without emitting a warning. Update the documentation comment to
accurately reflect the current behavior by removing the reference to "warn" and
instead describe that unsupported kernels degrade silently without warnings.
In `@src/boxlite/src/jailer/sandbox/bwrap.rs`:
- Around line 160-171: The issue is that wrapped_args is built in bwrap.rs to
include "--seal-landlock" followed by the original args, but the
seal_landlock_and_reexec function in main.rs discards these args when performing
the re-exec. Modify the seal_landlock_and_reexec function to accept the original
arguments as a parameter and pass them through when re-execing the shim,
ensuring that any arguments provided to the wrapper are preserved and reach the
final shim process after Landlock enforcement.
In `@src/shim/src/main.rs`:
- Around line 144-155: The seal_landlock_and_reexec() function re-executes the
binary without forwarding the original command-line arguments that were passed
by the wrapper in bwrap.rs. The function currently calls
Command::new(&exe).exec() with no arguments, which silently drops all argv
entries beyond the program name. To fix this, collect the remaining command-line
arguments using std::env::args() (skipping the program name itself and the
--seal-landlock flag that was added by the wrapper), and pass them to the
Command::new(&exe) using the .args() method before calling .exec() to ensure all
arguments are properly forwarded across the exec boundary.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 89fabf3c-73f8-4999-81c0-4499608076c2
📒 Files selected for processing (8)
sdks/c/src/tests.rssdks/python/src/advanced_options.rssrc/boxlite/src/jailer/landlock.rssrc/boxlite/src/jailer/mod.rssrc/boxlite/src/jailer/sandbox/bwrap.rssrc/boxlite/src/jailer/sandbox/landlock.rssrc/boxlite/src/jailer/sandbox/mod.rssrc/shim/src/main.rs
💤 Files with no reviewable changes (1)
- src/boxlite/src/jailer/sandbox/landlock.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- sdks/c/src/tests.rs
- sdks/python/src/advanced_options.rs
- src/boxlite/src/jailer/sandbox/mod.rs
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/boxlite/src/jailer/sandbox/mod.rs (1)
13-31: 📐 Maintainability & Code Quality | 🟡 MinorUpdate module docs to describe the actual stub binary architecture, not the
--seal-landlockflag flow.Lines 28–31 claim bwrap "re-exec's the shim as
<shim> --seal-landlock" and that this applies Landlock. The actual implementation (lines 163–184) wraps the shim with a separateboxlite-landlock-sealstub binary: bwrap exec's<boxlite-landlock-seal> <shim> [args]. The stub applies Landlock from environment rules, then exec's the shim—the shim itself never applies Landlock. Reword the composition example to match the stub binary flow and clarify that Landlock is applied by the intermediate stub, not by the shim.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/boxlite/src/jailer/sandbox/mod.rs` around lines 13 - 31, The documentation comment describing the Linux sandbox composition and Landlock application flow is inaccurate. Update the doc comment for the module (currently describing BwrapSandbox and the Landlock flow around lines 28-31) to reflect the actual implementation where bwrap exec's the boxlite-landlock-seal stub binary that wraps the shim, rather than claiming bwrap re-exec's the shim with a --seal-landlock flag. Clarify that the intermediate boxlite-landlock-seal stub binary is what applies Landlock rules from the environment and then re-exec's the actual shim, so the shim itself never applies Landlock. Update the composition example to accurately represent this stub binary architecture flow.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/boxlite/build.rs`:
- Around line 902-907: The copy_prebuilt_binary call for "boxlite-landlock-seal"
silently warns and skips when the binary is absent, which violates the fail-fast
principle for critical components. Replace or wrap this call to explicitly fail
(e.g., error and exit) if the boxlite-landlock-seal binary cannot be found or
copied using Self::find_prebuilt_seal, since Landlock enforcement depends on
this stub being present in the embedded Linux runtime.
In `@src/boxlite/src/jailer/sandbox/bwrap.rs`:
- Around line 174-185: The Err branch of the find_binary match for SEAL_STUB_BIN
currently falls back to executing the binary directly without Landlock
protection, which is a security fail-open condition. Modify this branch to build
the command against the expected stub path (the path that should exist but
doesn't) rather than using the binary directly, so that bwrap's exec will fail
when attempting to run a non-existent binary. This ensures the box fails to
start rather than silently running without Landlock, making the error handling
consistent with the sentinel path at lines 139-150 and adhering to the
fail-closed principle.
---
Outside diff comments:
In `@src/boxlite/src/jailer/sandbox/mod.rs`:
- Around line 13-31: The documentation comment describing the Linux sandbox
composition and Landlock application flow is inaccurate. Update the doc comment
for the module (currently describing BwrapSandbox and the Landlock flow around
lines 28-31) to reflect the actual implementation where bwrap exec's the
boxlite-landlock-seal stub binary that wraps the shim, rather than claiming
bwrap re-exec's the shim with a --seal-landlock flag. Clarify that the
intermediate boxlite-landlock-seal stub binary is what applies Landlock rules
from the environment and then re-exec's the actual shim, so the shim itself
never applies Landlock. Update the composition example to accurately represent
this stub binary architecture flow.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: b3495853-622b-41db-b89d-de96ce95b461
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (12)
Cargo.tomlscripts/build/build-runtime.shsrc/boxlite/Cargo.tomlsrc/boxlite/build.rssrc/boxlite/src/jailer/landlock.rssrc/boxlite/src/jailer/mod.rssrc/boxlite/src/jailer/sandbox/bwrap.rssrc/boxlite/src/jailer/sandbox/mod.rssrc/landlock/Cargo.tomlsrc/landlock/src/bin/seal.rssrc/landlock/src/lib.rssrc/shim/src/main.rs
💤 Files with no reviewable changes (1)
- src/boxlite/src/jailer/mod.rs
✅ Files skipped from review due to trivial changes (2)
- src/landlock/Cargo.toml
- src/shim/src/main.rs
Head branch was pushed to by a user without write access
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/boxlite/src/jailer/sandbox/bwrap.rs`:
- Around line 189-194: The tracing::error! macro call and the bwrap_cmd.build()
method call in the bwrap.rs file have formatting drift that doesn't meet the
repository's formatting standards. Run rustfmt on the
src/boxlite/src/jailer/sandbox/bwrap.rs file to automatically reformat the code
in the affected hunk to comply with the repository's formatting guidelines.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 96a2c4c9-a22d-47f1-b32f-3ab713edd798
📒 Files selected for processing (9)
scripts/build/build-runtime.shsrc/boxlite/build.rssrc/boxlite/src/jailer/landlock.rssrc/boxlite/src/jailer/sandbox/bwrap.rssrc/boxlite/src/jailer/sandbox/mod.rssrc/boxlite/src/jailer/shim_copy.rssrc/landlock/Cargo.tomlsrc/landlock/src/lib.rssrc/shim/src/main.rs
✅ Files skipped from review due to trivial changes (1)
- src/shim/src/main.rs
🚧 Files skipped from review as they are similar to previous changes (5)
- src/boxlite/build.rs
- src/boxlite/src/jailer/landlock.rs
- scripts/build/build-runtime.sh
- src/landlock/src/lib.rs
- src/boxlite/src/jailer/sandbox/mod.rs
a345677 to
f99c4ee
Compare
Alternative Landlock mechanism to the net-only host-side approach in PR boxlite-ai#681. A full-AccessFs Landlock domain blocks every mount syscall, so it cannot be applied host-side before bwrap. This approach instead ships the rules through the environment and LD_PRELOADs libboxlite_landlock.so into the shim (via bwrap --setenv); the .so's .init_array constructor applies Landlock after bwrap's mounts, before the shim's main(). This branch carries ONLY the preload Landlock mechanism. The other boxlite-ai#652 startup fixes (pre_exec FD cleanup, shim library path, SDK secure-by-default) are kept in boxlite-ai#681, so this branch alone does not boot a box — it exists to preserve the preload design as an alternative. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
f99c4ee to
190b560
Compare
190b560 to
1706740
Compare
|
Boot-verified on Linux 6.x: with |
…ind box/bin) The default sandbox (boxlite-ai#652) is unusable on Linux — the box never starts. Two independent bugs, both fixed here: 1. The host-side Landlock domain handles AccessFs, which blocks every mount syscall (verified across ABI v1-v5: handling any AccessFs right — even a single IoctlDev bit — makes bwrap's mount() return EPERM). The pre_exec hook runs in the bwrap process before its mounts, so it EPERMs bwrap's own mount(). A host-side filesystem Landlock domain fundamentally cannot coexist with bwrap's mounts. Fix: drop LandlockSandbox from the default stack — filesystem isolation is left to bwrap. LandlockSandbox and the ruleset builder (with their tests) are kept in place, unused, for a future post-mount path. 2. context() computes the bwrap bind list before copy_shim_to_box() creates box/bin, so box/bin (holding the copied shim) is never bound into the sandbox and bwrap can't exec the shim (execvp ENOENT). Fix: add box/bin to the bind list after the shim is copied. The alternative that keeps a full-AccessFs Landlock domain via an LD_PRELOAD cdylib (applied inside the shim's post-mount view) is in boxlite-ai#842. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1706740 to
94d2495
Compare
deny landlock permanently
and add the bin path inside binding
add up to #652