fix(smithy): strip CLAUDECODE from spawned-claude env (closes #103) #104
Open
komoreka wants to merge 3 commits into
… spawns

When stoneforge runs from inside a Claude Code session (any user invoking sf from the bash tool, or a director running in Claude Code), the spawned claude subprocess inherits CLAUDECODE=1 from process.env. The explicit `CLAUDECODE: '1'` in the env spread re-asserts it.

Recent versions of the claude binary refuse to start when CLAUDECODE is set, emitting "Claude Code cannot be launched inside another Claude Code session" to stderr and exiting 1. The Anthropic SDK surfaces this as "Claude Code process exited with code 1", which the spawner swallows as the cryptic "Session exited before init".

Strip CLAUDECODE from the inherited env before passing to the SDK (headless) and node-pty (interactive). Stoneforge IS the parent of the spawned claude; the spawned claude is a fresh top-level session, not nested.

This reverses the intent of stoneforge-ai#32 for the macOS/Linux case but keeps the spawn-env construction shape that stoneforge-ai#32 fixed for Windows. If there is a Windows-specific reason CLAUDECODE must be set, gate it on platform.

Refs stoneforge-ai#32 (the original Windows fix that introduced CLAUDECODE='1').
…trip tests

Move the env construction (process.env merge, overrides, CLAUDECODE strip, optional STONEFORGE_ROOT) from headless.ts and interactive.ts into a shared helper at packages/smithy/src/providers/claude/env.ts. Both providers now call buildClaudeSpawnEnv() rather than duplicating the shape.

Adds 7 unit tests for the helper (env.bun.test.ts) covering the CLAUDECODE strip from inherited env, the strip from caller-supplied overrides, override precedence, STONEFORGE_ROOT injection, and input non-mutation. Also adds 2 integration-style tests in headless.bun.test.ts to verify the headless provider actually invokes the helper correctly (SDK options.env reflects the strip).

Refs stoneforge-ai#103.
Summary
Closes #103.
`packages/smithy/src/providers/claude/headless.ts` and `interactive.ts` set `CLAUDECODE: '1'` in the env passed to the spawned claude subprocess. Modern claude versions read `CLAUDECODE` and refuse to start with `Claude Code cannot be launched inside another Claude Code session`. The Anthropic SDK surfaces this as `Claude Code process exited with code 1`, which the spawner swallows as the cryptic `Session exited before init`.

This blocks every spawn for users running stoneforge from inside a Claude Code session: anyone invoking `sf` from Claude Code's bash tool, or directors orchestrating workers from a Claude Code session.

The fix strips `CLAUDECODE` from the inherited env in both providers. Stoneforge IS the parent process; the spawned claude is a fresh top-level session, not nested. Issue #103 has the full diagnosis with a reproducer that captures the underlying SDK error.

Implementation
- `packages/smithy/src/providers/claude/env.ts`: `buildClaudeSpawnEnv(processEnv, options)` performs the merge, strips `CLAUDECODE` regardless of source (inherited or caller-supplied via `options.environmentVariables`), and sets `STONEFORGE_ROOT` when provided.
- `headless.ts` and `interactive.ts` now call the helper instead of duplicating the env-build logic.
- `env.bun.test.ts`: 7 unit tests for the helper (CLAUDECODE strip from inherited, strip from overrides, override precedence, STONEFORGE_ROOT injection, input non-mutation).
- `headless.bun.test.ts`: 2 integration tests that verify the headless provider's SDK call actually receives an env without CLAUDECODE.

Why CLAUDECODE was set in the first place

Issue #32 (closed) added `CLAUDECODE: '1'` to fix Windows session permission/spawn issues. The intent was "signal to claude that this is a managed session". The current claude binary's check is the opposite: "if `CLAUDECODE` is set, refuse to nest." Since stoneforge IS the parent (the spawned claude is fresh top-level), `CLAUDECODE` should not be in the child env.

Windows verification welcome but not blocking. I have not retested #32's Windows scenario. If a Windows contributor can confirm whether the unconditional strip is sufficient or whether the env needs to be platform-gated, that would be valuable. The Linux/macOS bug is reproducible 100% of the time and blocks Claude-Code-resident workflows entirely, so I'd rather ship and have any Windows regression re-filed than block on verification.
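For reviewers who want the helper's behaviour at a glance, the following is a minimal sketch of what the Implementation notes describe: merge, strip, optional root injection, no input mutation. It is not the code in `env.ts`. The name `buildClaudeSpawnEnv`, its `(processEnv, options)` parameters, and `options.environmentVariables` come from this PR; the `SpawnEnvOptions` type and the `stoneforgeRoot` field name are assumptions made for illustration.

```typescript
// Sketch only: SpawnEnvOptions and stoneforgeRoot are hypothetical names.
type InheritedEnv = Record<string, string | undefined>;

interface SpawnEnvOptions {
  // Caller-supplied overrides; they win over inherited values.
  environmentVariables?: Record<string, string>;
  // Assumed field name for the value written to STONEFORGE_ROOT.
  stoneforgeRoot?: string;
}

function buildClaudeSpawnEnv(
  processEnv: InheritedEnv,
  options: SpawnEnvOptions = {},
): Record<string, string> {
  const merged: Record<string, string> = {};

  // Copy the inherited env into a fresh object so the input is never mutated.
  for (const [key, value] of Object.entries(processEnv)) {
    if (value !== undefined) merged[key] = value;
  }

  // Apply overrides after the inherited values so they take precedence.
  Object.assign(merged, options.environmentVariables ?? {});

  // Strip CLAUDECODE last, so it is removed whether it was inherited
  // or supplied by the caller.
  delete merged["CLAUDECODE"];

  if (options.stoneforgeRoot !== undefined) {
    merged["STONEFORGE_ROOT"] = options.stoneforgeRoot;
  }
  return merged;
}

// Usage: an env inherited from a Claude Code session, plus overrides.
const inherited = { PATH: "/usr/bin", CLAUDECODE: "1" };
const env = buildClaudeSpawnEnv(inherited, {
  environmentVariables: { FOO: "bar", CLAUDECODE: "1" },
  stoneforgeRoot: "/work/project",
});
// env holds PATH, FOO and STONEFORGE_ROOT, with no CLAUDECODE key;
// inherited is unchanged.
```

Doing the strip after the override merge is what makes it apply "regardless of source". If Windows turns out to need the variable, the `delete` is the single line a platform gate would wrap.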
Test plan
- `bun test packages/smithy/src/providers/claude`: 37 pass, 0 fail (was 28 before, +9 new tests)
- `NODE_ENV=development npx turbo run typecheck --filter=@stoneforge/smithy`: clean
- `NODE_ENV=development pnpm build`: clean
- `sf agent start <worker-id> --prompt "..."` from inside a Claude Code session. Pre-fix: `Failed to start agent: Session exited before init`. Post-fix: `Spawned agent <id>, Status: running`.
- … (`@stoneforge/smithy`: patch).

Related
- … `Session exited before init` surfaced the underlying claude stderr (`Claude Code cannot be launched inside another Claude Code session`), this would have been a 5-minute fix instead of a multi-hour debugging session.
- … `CLAUDECODE='1'` (closed).

🤖 Generated with Claude Code