
fix(vexa-bot): browser-session entrypoint exits when node exits (#258) #260

Open
DmitriyG228 wants to merge 1 commit into main from fix/browser-session-pod-lifecycle

@DmitriyG228
Contributor

Summary

  • Fixes #258 (browser-session pod stays Running after node process exits): the bare wait call in the browser-session entrypoint blocks forever on x11vnc -forever and websockify, so the container never exits after node dist/docker.js returns. K8s sees the pod as Running indefinitely, and "stop" from the dashboard appears to do nothing.
  • New env var VEXA_BROWSER_SESSION_VNC_KEEPALIVE_SECONDS (default 0) controls a bounded post-exit debug window. Default behavior: container exits as soon as node does.
  • Preserves the original exit code so K8s sees 0 for clean exits and ≠0 for crashes.
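
The behavior summarized above can be sketched outside the container. This is a minimal illustration, not the PR's code: run_session is a hypothetical helper, `sleep 300 &` stands in for x11vnc -forever / websockify, and the command passed as arguments stands in for node dist/docker.js.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: run_session and its stand-in commands are
# hypothetical and not part of entrypoint.sh.
run_session() {
  # Stand-in for x11vnc -forever / websockify: a child that outlives
  # the main process.
  sleep 300 &
  local bg_pid=$!

  # Stand-in for `node dist/docker.js`: run the given command in the
  # foreground and remember its exit status.
  local exit_code=0
  "$@" || exit_code=$?

  # A bare `wait` at this point would block until the 300-second child
  # finished. Outside a container we clean up by hand; in the real
  # entrypoint the children go away when the container stops.
  kill "$bg_pid" 2>/dev/null
  wait "$bg_pid" 2>/dev/null || true

  # Hand the foreground command's status back to the caller unchanged.
  return "$exit_code"
}
```

Calling `run_session bash -c 'exit 3'` returns 3 without waiting on the background child, which is the property the PR relies on for K8s to see the original exit code.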

Diff

services/vexa-bot/core/entrypoint.sh (browser-session branch only — meeting-mode branch unchanged):

node dist/docker.js
EXIT_CODE=$?
KEEPALIVE="${VEXA_BROWSER_SESSION_VNC_KEEPALIVE_SECONDS:-0}"
if [[ "$KEEPALIVE" =~ ^[0-9]+$ ]] && [ "$KEEPALIVE" -gt 0 ]; then
  echo "[entrypoint] node dist/docker.js exited with code $EXIT_CODE — keeping container alive ${KEEPALIVE}s for VNC access"
  sleep "$KEEPALIVE"
else
  echo "[entrypoint] node dist/docker.js exited with code $EXIT_CODE — exiting (set VEXA_BROWSER_SESSION_VNC_KEEPALIVE_SECONDS>0 to debug)"
fi
exit "$EXIT_CODE"

Implementation note

The issue body sketched timeout $N wait. That doesn't work: wait is a shell builtin, and timeout can only exec external programs. sleep $N achieves the equivalent bounded keep-alive: the parent shell sleeps, then exits, and the kernel kills the still-running x11vnc / websockify / sshd children (assuming the entrypoint shell is the container's PID 1, so its exit tears down the container's PID namespace).
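
A quick way to see the builtin-versus-binary distinction is to ask bash how it resolves each name. This is a small sketch; command_kind is a hypothetical helper introduced only for illustration.

```shell
#!/usr/bin/env bash
# command_kind is a hypothetical helper for illustration.
# `type -t` prints how bash resolves a name: builtin, file, function,
# alias, or keyword.
command_kind() {
  type -t "$1"
}

command_kind wait    # prints "builtin": it exists only inside the shell process
command_kind sleep   # an external program on most systems, so it can be spawned
```

Because wait only knows about children of the shell that calls it, wrapping it in another program cannot make it wait on the entrypoint's background jobs.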

Non-numeric values fall through to the default branch ([[ ... =~ ^[0-9]+$ ]] guard) so a typo can't crash the script.
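
The guard's behavior on odd inputs can be checked in isolation. The sketch below reuses the same condition as the diff; effective_keepalive is a hypothetical wrapper, not a function in entrypoint.sh, and it prints the number of seconds the entrypoint would sleep for a given raw value.

```shell
#!/usr/bin/env bash
# effective_keepalive is a hypothetical wrapper around the guard from
# the diff; it is not part of entrypoint.sh.
effective_keepalive() {
  # An unset or empty value behaves like the default of 0.
  local raw="${1:-0}"
  # Accept only strings made of digits, and only when greater than zero.
  if [[ "$raw" =~ ^[0-9]+$ ]] && [ "$raw" -gt 0 ]; then
    echo "$raw"
  else
    echo 0
  fi
}

effective_keepalive 3600   # prints 3600
effective_keepalive abc    # prints 0 (typo falls through to the default)
effective_keepalive -5     # prints 0 (the sign fails the digits-only regex)
effective_keepalive ""     # prints 0
```

The regex runs first, so a non-numeric string never reaches the integer comparison in `[ ... -gt 0 ]`.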

How to opt back into the post-exit debug window

In the profile YAML for services/runtime-api/runtime_api/profiles.py:

browser-session:
  image: ...
  env:
    VEXA_BROWSER_SESSION_VNC_KEEPALIVE_SECONDS: "3600"

profiles.py already passes the env block straight into the container — no code change needed there.

Test plan

  • bash -n services/vexa-bot/core/entrypoint.sh (syntax)
  • Smoke-test the three branches (default-0, valid N, non-numeric) — verified the if/else picks the right branch and EXIT_CODE is preserved
  • Build the runtime image and exercise a real browser session: confirm pod terminates within seconds of bot stop on staging
  • Set VEXA_BROWSER_SESSION_VNC_KEEPALIVE_SECONDS=60 in a profile, exercise stop, confirm container stays for ~60s then exits with the original exit code

🤖 Generated with Claude Code

`wait` (no args) blocks on every background child including
`x11vnc -forever` and `websockify`, both of which never return. So
after `node dist/docker.js` exited cleanly, the container stayed
alive forever and K8s saw the pod as Running. A "stop" from the
dashboard had no visible effect — the bot saved state and exited,
but the pod hung around until idle_timeout or manual deletion.

Default `VEXA_BROWSER_SESSION_VNC_KEEPALIVE_SECONDS=0` → exit
immediately on node-exit. Operators who want the post-exit VNC
debug window set the env var to a positive integer (e.g. 3600)
in the runtime profile. Exit code is preserved so K8s sees a
clean 0 vs. a crash.

Real-world impact (release-005 dogfood, 2026-04-25): two
browser-session pods stayed Running for 6+ hours after node exited,
ignoring the user's Stop click.

Implementation note: `timeout N wait` doesn't work because `wait`
is a shell builtin, not a binary. `sleep N` achieves the same
bounded-keep-alive: the parent shell sleeps, then exits, and the
kernel reaps the still-running x11vnc / websockify children.

Closes #258

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@DmitriyG228
Contributor Author

Planned for merge in the current hardening release (cycle 260424); closes #258.
