fix(tui): credential-isolated env-token auth (ends recurring 401) + reap defunct sessions #141

Merged: dtzp555-max merged 2 commits into main from fix/tui-auth-robust on Jun 13, 2026

@dtzp555-max (Owner)

Summary

Fixes TUI mode's recurring "Please run /login · API Error: 401" failure (the PI231 incident), caused by credential shadowing combined with refresh-token corruption, and reaps leaked defunct claude sessions.

Root cause (proven live on PI231, claude 2.1.104 Linux): interactive claude prefers ~/.claude/.credentials.json over the CLAUDE_CODE_OAUTH_TOKEN env var (unlike -p mode, where the env token wins). OCP TUI's per-request spawn + kill-session cycle races claude's single-use refresh-token rotation, which corrupts the refresh token to an empty string and makes the 401 permanent. Running claude /login only restores a token that is re-corrupted on the next spawn (the treadmill). macOS hosts read credentials from the Keychain, so only Linux/file-based hosts were affected; the Mac mini was immune.

Fixes:

  1. buildTuiCmd passes CLAUDE_CODE_OAUTH_TOKEN to the spawned claude (necessary but insufficient on its own; see item 2).
  2. Credential-isolated home (the actual fix): when an env token is set, the TUI claude runs in a credential-free scratch HOME (<realHome>/.ocp-tui/home, overridable by OCP_TUI_HOME) seeded with onboarding + cwd-trust but no .credentials.json, so the env token is the only credential. claude never runs the refresh path, so it can never corrupt the refresh token. Recurrence-proof: a future claude login can no longer break TUI.
  3. Defunct-session reaping: reapStaleTuiSessions issues tmux kill-server only when no foreign session remains, and a 15-min periodic reap runs only while the TUI path is idle.

When CLAUDE_CODE_OAUTH_TOKEN is unset, behaviour is byte-for-byte the prior real-home + credentials.json behaviour (backward compatible).
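
The home-selection rule above can be sketched as a pure function. This is a minimal illustration, not the actual resolveTuiHome() from lib/tui/session.mjs: the function names, argument shapes, and the POSIX-style path join are assumptions made for the example. The envTokenMode derivation follows the formula quoted in the second commit message (token set && ehome !== rhome).

```javascript
// Hypothetical sketch of the home-selection decision; not OCP's actual code.
function sketchResolveTuiHome(env, realHome) {
  // An explicit OCP_TUI_HOME always wins.
  if (env.OCP_TUI_HOME) return env.OCP_TUI_HOME;
  // Env token set: use the credential-free scratch home (POSIX-style join).
  if (env.CLAUDE_CODE_OAUTH_TOKEN) return `${realHome}/.ocp-tui/home`;
  // No token: legacy behaviour, real home + credentials.json.
  return realHome;
}

// envTokenMode as described in the commit: token set && ehome !== rhome.
function sketchEnvTokenMode(env, effectiveHome, realHome) {
  return Boolean(env.CLAUDE_CODE_OAUTH_TOKEN) && effectiveHome !== realHome;
}
```

With no token the function returns the real home unchanged, which is the backward-compatible path; with a token and no override it returns the scratch home, so envTokenMode evaluates to true.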

ALIGNMENT

Class B (OCP-owned TUI spawn/home strategy; cli.js has no analogue). ADR 0007 PR-D amendment. No Class A wire path / .github/workflows/ / models.json touched; CI blacklist unaffected.

Verification

  • npm test: 245 passed, 0 failed.
  • Two independent fresh-context reviewers APPROVED (Iron Rule 10): 6394ca3 (token passing + reaping); 52ddb99 (credential isolation) — the second binary-verified that the seeded hasCompletedOnboarding + projects[cwd].hasTrustDialogAccepted flags exactly match claude's load-bearing gate keys, so no trust/onboarding dialog can hang a turn.
  • Decisive live test on PI231: with a corrupt credentials.json (empty refresh token) RESTORED in the real home, a TUI turn returns a real answer via the credential-free scratch home; the real-home credentials are left untouched.
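
For illustration, the seeded flags named in the review note could be produced by a helper like the one below. Only the two key names (hasCompletedOnboarding and projects[cwd].hasTrustDialogAccepted) come from this PR; the helper name and the example cwd are hypothetical, and the real prepareTuiHome() may write additional fields.

```javascript
// Hypothetical helper: builds the minimal seed for the scratch home's .claude.json.
// Only the two key names are taken from the PR; everything else is illustrative.
function sketchSeedClaudeConfig(cwd) {
  return {
    hasCompletedOnboarding: true,
    // Trust only the cwd the TUI claude runs in, nothing else.
    projects: { [cwd]: { hasTrustDialogAccepted: true } },
  };
}

// Example cwd is made up for the demonstration.
console.log(JSON.stringify(sketchSeedClaudeConfig("/srv/ocp-tui/work"), null, 2));
```

Note that the sketch contains no credential fields at all, which mirrors the point of the fix: the scratch home carries dialog-suppressing flags and nothing that could shadow the env token.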

🤖 Generated with Claude Code

dtzp555 and others added 2 commits June 13, 2026 12:19
…ct sessions

Root cause (PI231 incident): tmux does not forward the parent's env to the
pane, so the TUI claude never saw CLAUDE_CODE_OAUTH_TOKEN and fell back to
~/.claude/.credentials.json, whose single-use refresh token got corrupted to an
empty string by the per-request spawn + kill-session teardown racing claude's
token rotation -> permanent "Please run /login" 401 (re-login re-corrupted on
the next spawn). Connected leak: the pane's claude is a child of the tmux server
(not node), so kill-session left <defunct> zombies the server never reaped (25
over 30 days; tmux kill-server dropped it 25->3).

Fix 1: buildTuiCmd now adds CLAUDE_CODE_OAUTH_TOKEN=<shq-escaped> to the pane
env prefix when the env is set, so claude authenticates via the long-lived token
and never touches the credentials.json refresh path (matching stable hosts).
Unset -> no token added (credentials.json-only hosts unaffected).
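
A minimal sketch of the env-prefix idea in Fix 1, assuming POSIX single-quote escaping. The helper names sketchShq and sketchTokenPrefix are hypothetical, and OCP's real shq and buildTuiCmd may be implemented differently:

```javascript
// Assumed implementation of POSIX single-quote escaping; OCP's real shq may differ.
function sketchShq(value) {
  // Close the quote, emit an escaped quote, reopen the quote: ' becomes '\''
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Hypothetical helper: env prefix for the pane command; empty when the token is unset.
function sketchTokenPrefix(env) {
  const token = env.CLAUDE_CODE_OAUTH_TOKEN;
  return token ? `CLAUDE_CODE_OAUTH_TOKEN=${sketchShq(token)} ` : "";
}
```

Because every embedded single quote is closed, escaped, and reopened, a token containing shell metacharacters stays a single literal word in the pane command.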

Fix 2: reapStaleTuiSessions kill-servers after clearing our own sessions ONLY
when no foreign tmux session remains (never disrupts a co-hosted olp-tui-*).
kill-server is the only node-reachable action that ACTUALLY reaps -- server exit
reparents survivors to init, which waitpids them; a per-session kill cannot,
since node is not the zombies' parent. Added a 15-min periodic reap (server.mjs)
gated on TUI_MODE and on the TUI path being idle. Residual: a request whose pane
is created in the idle-check/kill-server window fails cleanly via the existing
honesty gates (documented).
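
The reaping rule in Fix 2 can be sketched as a pure decision function. Everything here is illustrative: the "ocp-tui-" session prefix, the null-means-no-server convention, and the activeTurns counter are assumptions for the example, not the actual reapStaleTuiSessions logic.

```javascript
// Hypothetical planner. sessionNames is the tmux session list, or null when no
// tmux server is running. The "ocp-tui-" prefix is an assumed naming scheme.
function sketchPlanReap(sessionNames, ownPrefix = "ocp-tui-") {
  if (sessionNames === null) return { killSessions: [], killServer: false };
  const ours = sessionNames.filter((n) => n.startsWith(ownPrefix));
  const foreign = sessionNames.filter((n) => !n.startsWith(ownPrefix));
  // kill-server only when no foreign session would be disrupted.
  return { killSessions: ours, killServer: foreign.length === 0 };
}

// Periodic reap gate: only in TUI mode and only while no TUI turn is in flight.
function sketchShouldReapNow({ tuiMode, activeTurns }) {
  return Boolean(tuiMode) && activeTurns === 0;
}
```

A co-hosted olp-tui-* session counts as foreign in this sketch, so its presence keeps killServer false while our own sessions are still cleared individually.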

ALIGNMENT: Class B (OCP-owned TUI spawn). cli.js does NOT perform either
operation -- there is no cli.js analogue for "how the TUI pane authenticates" or
"reaping tmux-server-owned zombies"; authorized by ADR 0007 (PR-C amendment) per
ALIGNMENT.md's Class B citation requirement. No Class A wire surface, no endpoint
shape, no alignment.yml token, and no models.json entry touched.

Tests: +6 in test-features.mjs (buildTuiCmd token set/unset/shq-injection;
reaper kill-server ours-only / foreign-present / no-server). 241 passed, 0 failed.

Co-Authored-By: Claude <claude-opus> <noreply@anthropic.com>
…n shadowing)

Passing CLAUDE_CODE_OAUTH_TOKEN to the spawned interactive `claude` (commit
6394ca3) is necessary but INSUFFICIENT to fix the PI231 401: interactive `claude`
PREFERS ~/.claude/.credentials.json over the env var (unlike `-p`, where the env
token wins), so a stale/corrupt credentials.json SHADOWS the env token. Decisive
live evidence on PI231 (claude 2.1.104):

  - env token passed + a broken ~/.claude/.credentials.json present → 401
    ("Please run /login · API Error: 401").
  - env token passed + credentials.json moved aside              → real answer.

Fix: when CLAUDE_CODE_OAUTH_TOKEN is set (and OCP_TUI_HOME is unset), run the TUI
`claude` in a CREDENTIAL-FREE scratch home (<HOME>/.ocp-tui/home) that has NO
credentials.json — no symlink, no copy. The env token is then the only credential
and is authoritative because nothing shadows it. This ALSO ends the original
refresh-corruption incident (25-zombie / empty-refresh-token) at the ROOT: with no
credentials file, claude never runs the token-refresh path, so the single-use
refresh token can never be rotated/corrupted by the per-request spawn+kill cycle.

This RESOLVES — not reintroduces — the ADR 0007 scratch-home concern. The old
caveat was about a SYMLINKED credentials.json being forked on token refresh; in
env-token mode there is no credentials file to fork and no refresh ever happens.

Mechanism: scratch HOME (not CLAUDE_CONFIG_DIR). The claude binary supports
CLAUDE_CONFIG_DIR, but it relocates transcripts to <CONFIG_DIR>/projects/ rather
than <HOME>/.claude/projects/, forking the transcript-resolution rule across modes
for no benefit. Scratch-HOME reuses the existing, tested prepareTuiHome/ehome
plumbing; readTuiTranscript reads from the same home claude runs under, so
transcripts land under the scratch home and findTranscriptPath globs them there.
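
To make the layout difference concrete, here is a small sketch of the two transcript roots described above. The function name and argument shape are hypothetical, and the paths are joined POSIX-style for brevity:

```javascript
// Hypothetical illustration of the two layouts described above (POSIX paths).
function sketchTranscriptRoot({ home, configDir }) {
  // With CLAUDE_CONFIG_DIR set: <CONFIG_DIR>/projects.
  // Otherwise: <HOME>/.claude/projects.
  return configDir ? `${configDir}/projects` : `${home}/.claude/projects`;
}
```

With the scratch-HOME mechanism only the second branch is ever taken, so a single resolution rule covers both the legacy and the env-token modes.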

Backward compatible: when CLAUDE_CODE_OAUTH_TOKEN is unset, behaviour is byte-for-
byte unchanged (real home + credentials.json) so hosts that intentionally rely on
credentials.json are unaffected. Explicit OCP_TUI_HOME still wins. Onboarding +
cwd-trust are seeded in the scratch .claude.json (hasCompletedOnboarding=true +
trust ONLY the scratch cwd) so no interactive trust/onboarding dialog can hang the
turn.

Changes:
- lib/tui/session.mjs: add resolveTuiHome() (pure) + DEFAULT_TUI_SCRATCH_HOME;
  prepareTuiHome() gains { envTokenMode } — skips the credentials symlink and seeds
  a minimal .claude.json; runTuiTurn derives envTokenMode = token set && ehome!==rhome.
- server.mjs: TUI_HOME computed via resolveTuiHome(); boot log surfaces the auth mode.
- test-features.mjs: env-token credential-free prepareTuiHome test (asserts NO
  credentials.json created/symlinked, .claude.json seeded with onboarding + cwd
  trust) + 3 resolveTuiHome decision tests; existing buildTuiCmd-token + reaper +
  legacy/real-home tests stay green (245 passed, 0 failed).
- docs/adr/0007: PR-D amendment (corrects the PR-C rationale + the original
  scratch-home caveat); README Troubleshooting #401 + env-var table + TUI section.

ALIGNMENT: Class B (OCP-owned TUI spawn). cli.js has no analogue for the TUI pane's
auth/home strategy — authorized by ADR 0007 (PR-D amendment) per ALIGNMENT.md's
Class B citation requirement. No Class A wire path, no alignment.yml blacklist
token, no models.json touched. server.mjs is touched only to wire TUI_HOME via
resolveTuiHome() and surface auth mode in the boot log.

Co-Authored-By: Claude <claude-opus> <noreply@anthropic.com>
@dtzp555-max dtzp555-max merged commit 60930f0 into main Jun 13, 2026
5 checks passed
@dtzp555-max dtzp555-max deleted the fix/tui-auth-robust branch June 13, 2026 06:54
dtzp555-max added a commit that referenced this pull request Jun 13, 2026
…g 401) + defunct-session reaping (#141) (#142)

Bump 3.20.0 → 3.20.1 + CHANGELOG. Ships the already-merged, twice-reviewed #141
(credential-isolated env-token home + zombie reaping). README/docs updated in #141.

Co-authored-by: dtzp555 <dtzp555@gmail.com>
Co-authored-by: Claude <claude-opus> <noreply@anthropic.com>