Halt container on unrecoverable startup errors instead of exit-and-loop #985

@jessica-claude

Description

When docker_start.sh detects an unrecoverable error at startup (dependency conflict, corrupt venv, missing fd binary), it prints a clear error banner and exits non-zero. Under restart: unless-stopped (which the docs recommend), Docker restarts the container immediately — same error, same exit, forever. The user has to notice via docker logs and SSH in to fix it.

Docker has no mechanism for a container to say "don't restart me" (moby/moby#49397).

Real-world trigger

When hassette 0.40.0 shipped, a downstream project's uv.lock still pinned 0.39.1. The constraints file detected the mismatch and printed a DEPENDENCY CONFLICT banner, but the container looped until the lock file was manually updated.

Prior Art

Research notes are in design/research/2026-06-07-docker-restart-loop-prevention/research.md.

LinuxServer.io (hundreds of containers) halts on init failure instead of exiting — the container stays alive in an idle state so Docker never triggers a restart. They chose this because restart loops cause "weird resource spikes" and mask the original error under noise.

Proposed Change

Replace exit 1 with sleep infinity at all 4 unrecoverable error paths in docker_start.sh. All 4 are permanent failures that won't resolve on retry:

| Location | Error | Why unrecoverable |
| --- | --- | --- |
| Venv health check (L23) | import importlib.metadata fails | Corrupt image or venv |
| uv export/install timeout (L91) | 300s/120s timeout exceeded | Missing lock file, unreachable git deps |
| Dependency conflict (L133) | pip resolver conflict | Lock file pins incompatible version |
| Missing fd binary (L180) | INSTALL_DEPS=1 but no fd | Image configuration error |

The existing error banner pattern (the ───── box with problem description and remediation steps) stays as-is — just followed by sleep infinity instead of exit 1.
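As a rough illustration of the halt pattern, the sketch below shows one way a shared helper could replace each exit 1. The function names (print_halt_notice, halt_container) and the message wording are hypothetical and do not come from docker_start.sh; it also assumes the image's sleep accepts the infinity argument, as GNU coreutils does.

```shell
#!/usr/bin/env bash
# Hypothetical helper names; nothing here is taken from docker_start.sh.

# Print the remediation hint that would follow the existing error banner.
print_halt_notice() {
    echo "Container halted. Fix the problem above, then run: docker restart <container>" >&2
}

# Halt instead of exiting, so the restart policy is never triggered.
halt_container() {
    print_halt_notice
    # Block forever. Assumes a sleep that accepts 'infinity' (GNU coreutils).
    sleep infinity
}
```

Each unrecoverable path would then print its existing banner and call halt_container where it currently calls exit 1, which keeps the banner in docker logs exactly once.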

A separate issue will be created to investigate serving a minimal health endpoint and self-contained error page during the halt state, so the user can discover the problem in the browser instead of checking logs.

Acceptance Criteria

  • All 4 unrecoverable exit 1 paths in docker_start.sh halt (sleep infinity) instead of exiting
  • Error banner prints once, then the container stays alive and idle
  • Exit trap cleanup (rm -f of temp files) still runs before the sleep
  • docker logs shows a single clean error message, not repeated ones
  • Halt message tells the user to fix the problem and docker restart the container
  • Tests in test_docker_integration.py updated — tests asserting returncode != 0 for conflict scenarios verify halt behavior instead
  • Tests pass — existing tests updated, new tests added for changed behavior

Affected Areas

  • scripts/docker_start.sh — convert 4 exit 1 calls to halt pattern
  • tests/test_docker_integration.py — update tests that assert exit codes for conflict/failure scenarios (e.g., test_docker_constraint_conflict)
  • docs/pages/getting-started/docker/troubleshooting.md — document halt behavior and "fix then docker restart" workflow

Context

  • tini is PID 1 (Dockerfile L108) — handles zombie reaping during infinite sleep
  • Script runs as non-root hassette user (UID 1000)
  • Exit trap on L145 cleans up temp files — must fire before sleep, not be skipped by it
  • The error banner pattern in run_uv_install() (L94-129) should be reused for halt messaging
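The exit trap point deserves care: a bash EXIT trap only fires when the shell exits, so a script blocked in sleep infinity would never run it. One possible approach, sketched here with hypothetical names (TMP_CONSTRAINTS, cleanup_temp_files, halt_after_cleanup) that are not taken from the real script, is to call the cleanup function explicitly before sleeping.

```shell
#!/usr/bin/env bash
# Hypothetical temp file and function names, for illustration only.
TMP_CONSTRAINTS="$(mktemp)"

# Remove temp files; rm -f makes this safe to run more than once.
cleanup_temp_files() {
    rm -f "$TMP_CONSTRAINTS"
}

# Normal exits still clean up through the trap.
trap cleanup_temp_files EXIT

halt_after_cleanup() {
    # The EXIT trap will not fire while the script is blocked in sleep,
    # so run the cleanup explicitly before halting.
    cleanup_temp_files
    # Assumes a sleep that accepts 'infinity' (GNU coreutils).
    sleep infinity
}
```

Because the cleanup is idempotent, it does no harm if the trap runs it a second time when the container is eventually stopped.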

Related

  • #974 — WebSocket resilience budget restart loop (different trigger, same user-facing symptom)

Metadata

    Labels

    • area:core - Part of the core Hassette framework, not necessarily user-facing
    • size:medium - Moderate effort, a few hours
    • topic:errors - Error handling, retries, error display, exception design
    • topic:lifecycle - Startup/shutdown sequences, state machines, readiness, cleanup
    • type:enhancement - New feature or request

    Projects

    Status
    Refinement
