CRITICAL BUGFIX (daemon,app): Session spawning via tmux always fails — PID mismatch timeout, hidden errors, orphaned windows (#272)#790
Open
ahundt wants to merge 14 commits intoslopus:mainfrom
Conversation
…s to user
Three bugs fixed:
1. Tmux sessions always timed out (critical):
- spawnInTmux returns #{pane_pid} = shell PID; spawned node reports
process.pid = different PID; webhook lookup by hostPid never matched.
- Fix: generate HAPPY_SPAWN_TOKEN (random hex) before tmux spawn, pass
it in tmuxEnv, store token→session in tokenToTrackedSession map.
createSessionMetadata.ts emits spawnToken from process.env.
onHappySessionWebhook resolves by token first, PID as fallback.
- Files: run.ts, types.ts, createSessionMetadata.ts
2. New session wizard silently swallowed all daemon errors:
- else branch at line 1065 threw a generic Error, discarding
result.errorMessage. Users saw "make sure daemon is running"
for all failures including z.ai missing Z_AI_AUTH_TOKEN.
- Fix: check result.type === 'error' and throw result.errorMessage;
catch block falls through to show actual message when not a
known network pattern.
- File: new/index.tsx
3. stopSession left orphaned tmux windows after stop:
- childProcess is undefined for tmux sessions so kill fell through
to process.kill(shellPid) without closing the tmux window.
- Fix: check session.tmuxSessionId first, call tmux.killWindow()
fire-and-forget to keep stopSession synchronous.
- File: run.ts
Also carries over prior branch fixes:
- runAcp.ts: mode.description null→undefined for formatOptionalDetail
- run.ts: strip CLAUDECODE env before spawning to prevent nested session error
TEMPORARY HACK: Always spawn new sessions into the 'main' tmux session instead of using the profile's TMUX_SESSION_NAME setting. This makes tmux integration testing straightforward. TODO: Remove this hardcoding and use profile.tmuxConfig.sessionName after testing is complete.
…s to user
Three bugs fixed:
1. Tmux sessions always timed out (critical):
- spawnInTmux returns #{pane_pid} = shell PID; spawned node reports
process.pid = different PID; webhook lookup by hostPid never matched.
- Fix: generate HAPPY_SPAWN_TOKEN (random hex) before tmux spawn, pass
it in tmuxEnv, store token→session in tokenToTrackedSession map.
createSessionMetadata.ts emits spawnToken from process.env.
onHappySessionWebhook resolves by token first, PID as fallback.
- Files: run.ts, types.ts, createSessionMetadata.ts
2. New session wizard silently swallowed all daemon errors:
- else branch at line 1065 threw a generic Error, discarding
result.errorMessage. Users saw "make sure daemon is running"
for all failures including z.ai missing Z_AI_AUTH_TOKEN.
- Fix: check result.type === 'error' and throw result.errorMessage;
catch block falls through to show actual message when not a
known network pattern.
- File: new/index.tsx
3. stopSession didn't kill tmux windows (orphaned windows):
- childProcess undefined for tmux sessions so kill fell through
to process.kill(shellPid) without closing the tmux window.
- Fix: check session.tmuxSessionId first, call tmux.killWindow()
fire-and-forget to keep stopSession synchronous.
- File: run.ts
Also fixed tmux command syntax:
- move -P -F flags before other arguments in new-window command
- File: tmux.ts
Three root causes prevented daemon-spawned tmux sessions from working: 1. Env var quoting corruption: spawn() with shell:false passes args directly to tmux, so wrapping -e values in double quotes made them literal (HOME="/Users/foo" instead of HOME=/Users/foo). This broke credential file resolution, causing the spawned process to prompt for interactive authentication instead of using existing keys. 2. Shell redirect failure: zsh noclobber prevents >> on non-existent files. Changed to > since these are always fresh temp files. 3. Missing spawnToken in Claude path: runClaude.ts built its own Metadata object without reading HAPPY_SPAWN_TOKEN from env, so the daemon's token-based webhook matching never resolved. Additional reliability improvements: - Replace fixed 200ms delay with waitForShellReady() that polls capture-pane for a prompt character (up to 5s) - Split send-keys into literal text (-l flag) + separate Enter key to prevent tmux from interpreting >> as key names - Add debug logging for send-keys results and pane verification
Remove temporary debugging code that was used during development: - Remove temp log file redirect (output now visible in tmux pane) - Remove timeout handler file reader (no temp file to read) - Remove send-keys result logging and pane verification capture - Remove unnecessary 50ms/200ms delays between send-keys calls - Revert timeout callback from async to sync (no longer reads files)
…etection
Replace capture-pane regex matching against prompt characters (%$#>±❯→)
with tmux's #{pane_current_command} format variable. Poll until it reports
a known shell name (zsh, bash, fish, etc.), which is deterministic and
works with any custom prompt theme (Prezto, Oh My Zsh, Starship, Powerlevel10k).
- Use process.execPath instead of bare `node` in tmux send-keys command to avoid PATH issues on systems using NVM, asdf, or other version managers where node isn't available until .zshrc finishes sourcing - Fix killWindow to use executeTmuxCommand's window parameter for proper `-t session:window` targeting instead of passing window name as a positional argument (which tmux ignores) - Pass parsed session name to getTmuxUtilities in stopSession to ensure the correct tmux session is targeted - Set CLAUDECODE='' in tmux env to prevent spawned Claude Code from refusing to start with "cannot be launched inside another session" (CLAUDECODE is set by Claude Code to detect nesting; the tmux server may inherit it even if we filter it from the -e flags) - Add nushell (nu), elvish, pwsh to knownShells for waitForShellReady
- Add -d flag to new-window so spawned sessions don't switch the user's active tmux window away from what they're working on - Add tmux version check (>= 3.0) before spawning, since the -e flag for per-window environment variables was added in tmux 3.0. Fails with a clear error message instead of a cryptic tmux parse error on older versions.
Quote process.execPath and cliPath so paths with spaces work. Point directly at dist/index.mjs to skip bin/happy.mjs re-exec. Re-add --no-warnings --no-deprecation flags. Fix stale comment in waitForShellReady.
…x spawn The non-tmux daemon spawn path does not set this flag either. Remote mode sessions handle permissions through the mobile app RPC flow, not terminal prompts, so forcing bypass is unnecessary and should be left to the user profile or mobile app selection.
Resolve session with priority: 1. Profile TMUX_SESSION_NAME env var (explicit user choice) 2. Daemon's own tmux session (if daemon runs inside tmux) 3. Session with the most windows (heuristic for main workspace) 4. spawnInTmux default (first existing session or create 'happy') Non-tmux fallback path is unchanged: if tmux is unavailable or spawn fails, regular detached process spawning is used.
executeTmuxCommand appended -t target AFTER all args, which broke
commands with positional arguments (e.g., display-message -p FORMAT).
tmux rejects: `display-message -p #{fmt} -t target` ("too many arguments")
but accepts: `display-message -t target -p #{fmt}`.
This silently broke waitForShellReady — every spawn wasted 5s on a
timeout because display-message with window target always returned
exit code 1. Now shell detection is instant (~0ms).
Also fix resolveTmuxSessionName priority 2 (daemon's own tmux session):
used executeTmuxCommand which forced -t happy, querying the wrong
session. Now uses execFile directly without -t so tmux reads $TMUX.
Tests added:
- 8 unit tests for tmux version string regex parsing
- 8 unit tests for session list parsing (resolveTmuxSessionName logic)
- 3 unit tests for spawnToken in createSessionMetadata
- 7 integration tests exercising real tmux: -d flag, env vars, killWindow,
pane_current_command detection, PID extraction, CLAUDECODE='', paths
Four critical improvements to ensure proper behavior across all flavors (claude, gemini, codex): A. Add missing await on Codex fs.writeFile (line 367) - Codex auth.json file was not being awaited before process spawn - This caused silent failures in Codex session spawning B. Fix execFile Promise timeout handling (lines 83-97) - tmux session resolution could hang indefinitely if tmux didn't respond - Added explicit timeout rejection with process.kill on timeout - Used pattern already in codebase for process health checks C. Unify agent command mapping with data-driven approach (lines 28-48) - Replace inconsistent ternary (tmux) and switch (non-tmux) patterns - Add AGENT_COMMAND_MAP constant and getAgentCommand() utility - Use same logic for both tmux and non-tmux spawn paths D. Complete Gemini authentication support (line 372-374) - Was falling through to Claude path (CLAUDE_CODE_OAUTH_TOKEN) - Add proper Gemini auth handling (GOOGLE_API_KEY) - Now handles all three flavors: claude, codex, gemini Additional improvements: - Smart window cleanup in stopSession (lines 745-766) - Check if process still alive via process.kill(pid, 0) before killing window - Only kill window if Claude still running, leave alone if user exited - Prevents loss of user's terminal after they quit the CLI All tests pass (450 pass, 1 pre-existing failure unrelated to these changes)
Windows CI fails with:
Cannot find module 'D:\a\happy\happy\node_modules\node_modules\typescript\bin\tsc'
Root cause: npx on Windows resolves tsc via a double-nested node_modules path in
Yarn 1.22 workspace context. The postinstall script also runs build scripts from
the wrong directory on Windows (Yarn 1.22 bug, GitHub Issue #6175).
Fix:
- packages/happy-wire/package.json: use 'tsc' directly instead of 'npx tsc'
(matches existing 'typecheck' script which already works on all platforms)
- packages/happy-cli/package.json: same change for consistency
- .github/workflows/cli-smoke-test.yml (Windows job only):
- SKIP_HAPPY_WIRE_BUILD=1 during install to skip broken postinstall on Windows
- explicit 'cd packages/happy-wire && yarn build' step after install so
happy-coder build can find happy-wire's dist files
58c01a6 to
e79aa55
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Tmux session spawning completely fails: sessions timeout during creation, error messages are hidden from users, and orphaned windows accumulate during cleanup.
Context: PR #272 added the new session wizard GUI with tmux session integration for the mobile app. PR #487 was intended to resolve #272's issues, but expanded scope significantly. When #487 scope diverged from the original discussion, development continued as a separate fork (https://github.com/happier-dev/happier). This PR provides the minimal functionality fixes needed for #272 to work reliably in the main project, focusing on root causes (PID mismatch timeout, error suppression, orphaned window cleanup, async safety, multi-flavor inconsistencies) rather than architectural re-scoping.
Root Causes & Fixes
1. PID mismatch timeout (15s always fires)
HAPPY_SPAWN_TOKEN(random token) instead — passed via env var, reported in webhook, matched by daemon2. Hidden daemon error messages
3. Orphaned tmux windows
stopSessiononly killed the shell process, never closed the tmux windowkillWindow()when tmux session exists. Check if process still alive withprocess.kill(pid, 0)— if Claude running, kill window; if user already exited, leave window alone (becomes independent terminal)4. Silent Codex auth failure
fs.writeFile()was not awaited before spawning processawait5. Timeout could hang daemon
process.kill()and rejection6. Inconsistent multi-flavor support
AGENT_COMMAND_MAP/getAgentCommand()utility; proper Gemini authSecondary Improvements
#{pane_current_command}(works with custom prompts, not just regex)-tflag insertion: moved from end (brokedisplay-message -p FORMAT -t session) to after command name$TMUXenv varprocess.execPath(fixes NVM/asdf/nix PATH issues)-d(no focus stealing)No Regressions