Summary
A detached box has no completion signal for the container's main process. When a box is started detached and its PID 1 exits (workload finished, batch job done), there is no host-side way to observe that: list_info().state.status stays Running (it tracks the microVM, which is still up), and there is no wait()/exit-code API at the box level. The only way to detect completion today is to exec into the box and pattern-match the resulting error. Tearing the box down then requires force=true because the box still counts as active.
This makes one-shot / batch workloads (run a command to completion, observe it finished, clean up) awkward to express through the SDK.
Where (as of 07fa30f9)
- Box status has no "container process exited" state: BoxStatus::{Running, Stopped, …} tracks the VM lifecycle; start() on a Running box is a no-op (src/boxlite/src/litebox/box_impl.rs:202).
- The guest does know init exited, but only surfaces it as an exec-time error string: src/guest/src/service/container.rs:267 and src/guest/src/service/exec/mod.rs:435 ("Container init process exited …").
- remove() rejects an active box without force: src/boxlite/src/runtime/rt_impl.rs:947, "cannot remove active box {id} (status: …). Use force=true to stop first."
- No box-level wait() / exit-code accessor exists (the exit_code field is on exec results only, e.g. sdks/python/boxlite/exec.py:27).
Repro
- Start a box detached whose command runs to completion and exits (e.g. sh -c 'echo done').
- Poll list_info() → status stays Running indefinitely.
- The only signal that the workload finished is box.exec("true") failing with "Container init process exited".
- runtime.remove(name) → InvalidState: cannot remove active box … Use force=true.
Expected
Either:
- A queryable completion signal, e.g. a distinct status (Exited) or a wait() returning the init exit code for detached boxes; and/or
- remove() on a box whose workload has exited treated as a clean (non-forced) removal.
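To make the two options concrete, here is a toy model of the proposed semantics. It is only a sketch: BoxState, mark_init_exited and can_remove_without_force are hypothetical names invented for illustration, and the Exited status is the one proposed above, not something the SDK exposes today.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class BoxStatus(Enum):
    RUNNING = "running"
    EXITED = "exited"    # proposed: init (PID 1) has exited, microVM may still be up
    STOPPED = "stopped"


@dataclass(frozen=True)
class BoxState:
    """Hypothetical host-side view of a box; not the real SDK type."""
    status: BoxStatus
    exit_code: Optional[int] = None  # init exit code, known only once EXITED


def mark_init_exited(state: BoxState, exit_code: int) -> BoxState:
    """Transition a running box to EXITED when the guest reports init exit."""
    if state.status is not BoxStatus.RUNNING:
        return state  # nothing to do for boxes that are not running
    return replace(state, status=BoxStatus.EXITED, exit_code=exit_code)


def can_remove_without_force(state: BoxState) -> bool:
    """Proposed policy: only a box with a live workload needs force=True."""
    return state.status is not BoxStatus.RUNNING
```

With a state like this, a one-shot caller could poll for EXITED, read exit_code, and call remove() without force.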
Related — no typed error for lifecycle conflicts (was "A4")
A neighbouring rough edge: lifecycle-conflict conditions surface only as generic BoxliteError::InvalidState(String) (src/shared/src/errors.rs:56) with no stable code, so SDK consumers must substring-match messages (e.g. "already running" on a redundant start()). A typed/coded variant for these would remove the string-matching. Listed here because it shares the same root: lifecycle state isn't exposed in a structured way.
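A small sketch of the difference from the consumer's side. LifecycleCode and LifecycleError are hypothetical names used only for illustration; the only detail taken from the current behaviour is the "already running" message text.

```python
from enum import Enum


class LifecycleCode(Enum):
    """Hypothetical stable codes for lifecycle conflicts."""
    ALREADY_RUNNING = "already_running"
    BOX_ACTIVE = "box_active"


class LifecycleError(Exception):
    """Hypothetical typed error carrying a stable code next to the message."""

    def __init__(self, code: LifecycleCode, message: str) -> None:
        super().__init__(message)
        self.code = code


def is_already_running_by_message(exc: Exception) -> bool:
    # What consumers do today: brittle, breaks if the wording changes.
    return "already running" in str(exc)


def is_already_running_by_code(exc: Exception) -> bool:
    # What a coded variant would allow: independent of the message text.
    return isinstance(exc, LifecycleError) and exc.code is LifecycleCode.ALREADY_RUNNING
```

If the message were ever reworded (say, to "box is running already"), the substring check would silently stop matching while the coded check would keep working.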
Current workaround (downstream)
apps/infra-local exec-probes for init exit and force-removes the box:
- apps/infra-local/boxlite_local/orchestrator.py: _wait_one_shot_exit (exec-probe loop) and runtime.remove(name, force=True).
These are documented in that package's README "SDK gotchas" table. We'd happily drop them once a first-class completion signal exists.
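For readers who have not seen the downstream code, the exec-probe loop is roughly of the following shape. This is a simplified sketch, not the actual _wait_one_shot_exit: the probe callable stands in for something like box.exec("true") (the real SDK call may be async), and the timeout, interval, sleep and clock parameters are assumptions added here so the loop is testable.

```python
import time
from typing import Callable

# Error text quoted in this issue; matching on it is exactly the fragile part.
INIT_EXITED_MARKER = "Container init process exited"


def wait_one_shot_exit(
    probe: Callable[[], object],
    timeout_s: float = 60.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once `probe` fails with the init-exited message, False on timeout."""
    deadline = clock() + timeout_s
    while True:
        try:
            probe()  # stand-in for exec'ing a trivial command in the box
        except Exception as exc:
            if INIT_EXITED_MARKER in str(exc):
                return True  # init has exited: the workload is done
            raise  # unrelated failure, do not treat it as completion
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

After this returns True, the caller still has to use runtime.remove(name, force=True), since the box continues to report Running.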