Big Smooth: bump idle timeout 30 min → 24 h (was masquerading as 'crashes unprompted')#86

Open
brentrager wants to merge 2 commits into th-491e0c-pi-driver from th-1b9b3e-idle-timeout-fix

Conversation

@brentrager

Contributor

Closes pearl th-1b9b3e.

What was happening

Big Smooth wasn't crashing — it was gracefully exiting on a 30-minute idle timer (server.rs:600+). The bench's /loop 1800s wakeup intervals tripped it every iteration, killing the daemon mid-session.

Smoking gun in the safehouse exec.log:

2026-06-03T21:07:17 INFO smooth_bigsmooth::server:
  Idle timeout reached (1800s), shutting down
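
To illustrate why a wakeup interval equal to the timeout trips the timer, here is a minimal sketch of an idle check. `IdleTracker` and its methods are hypothetical names for illustration, not the actual code in server.rs, and the `>=` comparison is an assumption about how the deadline is evaluated.

```rust
use std::time::{Duration, Instant};

/// Hypothetical idle tracker (illustrative only, not the server.rs implementation).
struct IdleTracker {
    last_activity: Instant,
    /// `None` means the idle timer is disabled.
    timeout: Option<Duration>,
}

impl IdleTracker {
    fn new(now: Instant, timeout: Option<Duration>) -> Self {
        Self { last_activity: now, timeout }
    }

    /// Record activity (for example, an incoming request) and reset the idle clock.
    fn touch(&mut self, now: Instant) {
        self.last_activity = now;
    }

    /// True once the daemon has been idle for at least the configured timeout.
    /// Assumption: the deadline is inclusive (`>=`).
    fn should_shut_down(&self, now: Instant) -> bool {
        match self.timeout {
            Some(limit) => now.duration_since(self.last_activity) >= limit,
            None => false,
        }
    }
}
```

Under these assumptions, a 1800s pause with no activity in between reaches a 1800s limit but stays far below a 24h one.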

Why this is a pi/opencode-parity fix

Pi + OpenCode have no daemon → no auto-shutdown. Smooth's daemon model meant every loop pause was implicitly a kill. User direction 2026-06-03: "make smooth competitive with [pi + opencode]."

Fix

Raise the default idle timeout to 24h. The env override (SMOOTH_BIGSMOOTH_IDLE_TIMEOUT_SECS) still works: set it to 0 to disable the timer, or to a smaller value to opt back in to a tighter timeout. Caveat: the variable must be set in the daemon's process. In sandboxed mode that is the safehouse VM, not the host shell, so tightening the timeout below the default in a bench scenario needs separate env-var plumbing.
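
A minimal sketch of how the override could be resolved, assuming the variable holds a whole number of seconds. The constant and function names are hypothetical and not taken from server.rs, and falling back to the default on an unset or unparseable value is also an assumption.

```rust
use std::time::Duration;

/// Default idle timeout after this change: 24 hours, in seconds.
const DEFAULT_IDLE_TIMEOUT_SECS: u64 = 24 * 60 * 60;

/// Hypothetical helper: map the raw value of SMOOTH_BIGSMOOTH_IDLE_TIMEOUT_SECS
/// to an optional timeout. `None` means the idle timer is disabled.
fn resolve_idle_timeout(raw: Option<&str>) -> Option<Duration> {
    let secs = raw
        .and_then(|s| s.trim().parse::<u64>().ok())
        // Assumption: unset or unparseable values fall back to the default.
        .unwrap_or(DEFAULT_IDLE_TIMEOUT_SECS);

    if secs == 0 {
        None // 0 disables the timer
    } else {
        Some(Duration::from_secs(secs))
    }
}

/// Read the override from the current process environment. This only sees
/// variables set in the daemon's own process, which is why a value exported
/// in the host shell has no effect in sandboxed mode.
#[allow(dead_code)]
fn idle_timeout_from_env() -> Option<Duration> {
    let raw = std::env::var("SMOOTH_BIGSMOOTH_IDLE_TIMEOUT_SECS").ok();
    resolve_idle_timeout(raw.as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the three cases (unset, 0, explicit value) easy to check without touching process state.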

Test plan

  • Rebuilt safehouse binary via scripts/build-safehouse.sh; copied to ~/.smooth/runner-bin/safehouse
  • th down && th up && th status reports healthy
  • Run a 30-min idle period (e.g., scheduled wakeup) and verify daemon survives — will validate on the next /loop cycle

Root-caused the "Big Smooth crashes unprompted" symptom: it wasn't
crashing — it was gracefully shutting down on a 30-minute idle
timer (server.rs:600+). Bench evidence today showed the 30-minute
cliff fired after almost every /loop pause (1800s wakeup intervals),
killing the daemon mid-session and forcing 3+ manual `th up`s.

Pi + OpenCode have no daemon → no auto-shutdown → no "crashed
unprompted" symptom. Smooth's daemon model meant every loop pause
was implicitly a kill. This is exactly the kind of competitive-
parity gap the bench was designed to surface (user direction
2026-06-03: "i want smooth to learn from pi and opencode and make
smooth competitive").

24h default keeps a safety net for genuinely forgotten dev
sessions but doesn't fire during a single work session. The
existing `SMOOTH_BIGSMOOTH_IDLE_TIMEOUT_SECS` env override still
works (set to 0 to disable, or to a smaller value to opt back in
to aggressive timeouts) — caveat that the env must be set in the
daemon's process, which in sandboxed mode is the safehouse VM
(not the host shell).

Smoking-gun log line that closed the diagnosis:

  2026-06-03T21:07:17 INFO smooth_bigsmooth::server:
    Idle timeout reached (1800s), shutting down

Required rebuild path: scripts/build-safehouse.sh + cp the new
binary to ~/.smooth/runner-bin/safehouse, then th down + th up.
The shadow-bin mechanism (smooth-cli/src/main.rs:1292) bind-mounts
this over the OCI image's safehouse binary so dev iteration on
crates/smooth-bigsmooth doesn't need a full image push.

@changeset-bot

changeset-bot Bot commented Jun 3, 2026

⚠️ No Changeset found

Latest commit: 69c3fc7

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Documents the existing `th up direct` mode that we'd been overlooking:
it boots in ~0.3s on the host instead of the ~30s safehouse-microVM
startup. That meets the pi/opencode boot-time bar (both boot in ~3s).

Bench evidence from today:

  smooth-direct    : 0.850 aggregate, ~0.3s boot
  smooth-sandboxed : 0.789 aggregate, ~30s boot (with variance)
  pi               : 1.000 aggregate, ~3s boot
  opencode         : >=0.93 aggregate, ~3s boot

Direct mode trade-off is no isolation — the agent runs as a host
subprocess against the host filesystem. Fine for dev machines + CI
runners you own + bench harnesses. Sandboxed remains the default
for untrusted dispatch.

Required setup for direct mode (the runner-discovery error message
already tells you, but worth surfacing in docs):
  cargo build --release -p smooai-smooth-operator-runner
  SMOOTH_OPERATOR_RUNNER_NATIVE=~/.cargo/shared-target/release/smooth-operator-runner th up direct

Pearl follow-ups still open after this:
  th-6e361d — pycache run-to-run variance (direct still showed 0.500
              on disk-bloat in one run; smooth's nondeterminism isn't
              purely a sandbox artifact)
  th-e74aa6 — runner-discovery UX paper-cut (separate from this work)