
fix(smithy): strip CLAUDECODE from spawned-claude env (closes #103) #104

Open
komoreka wants to merge 3 commits into stoneforge-ai:master from komoreka:fix/claudecode-nesting-spawn


@komoreka (Contributor) commented May 5, 2026

Summary

Closes #103.

packages/smithy/src/providers/claude/headless.ts and interactive.ts set CLAUDECODE: '1' in the env passed to the spawned claude subprocess. Modern claude versions read CLAUDECODE and refuse to start, printing "Claude Code cannot be launched inside another Claude Code session" to stderr and exiting with code 1. The Anthropic SDK surfaces this as "Claude Code process exited with code 1", which the spawner swallows as the cryptic "Session exited before init".

This blocks every spawn for users running stoneforge from inside a Claude Code session: anyone invoking sf from Claude Code's bash tool, or directors orchestrating workers from a Claude Code session.

The fix strips CLAUDECODE from the inherited env in both providers. Stoneforge IS the parent process; the spawned claude is a fresh top-level session, not nested. Issue #103 has the full diagnosis with a reproducer that captures the underlying SDK error.
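The inheritance problem described above can be reproduced in isolation. The following is a minimal sketch, not stoneforge code: it assumes a Node-compatible runtime whose binary accepts `-e` (as Node and Bun do), and `childSeesClaudeCode` is a hypothetical helper name used only for illustration. It spawns a child with a given env and reports whether the child can see CLAUDECODE.

```typescript
import { spawnSync } from "node:child_process";

type Env = Record<string, string | undefined>;

// Hypothetical helper: spawn the current runtime as a child process and
// ask it whether CLAUDECODE is present in its own environment.
function childSeesClaudeCode(env: Env): boolean {
  const result = spawnSync(
    process.execPath,
    ["-e", "process.stdout.write(String('CLAUDECODE' in process.env))"],
    { env, encoding: "utf8" },
  );
  return result.stdout.trim() === "true";
}

// Simulate a parent that is itself running inside a Claude Code session.
const inherited: Env = { ...process.env, CLAUDECODE: "1" };

// Strip the variable without mutating the inherited object.
const { CLAUDECODE: _dropped, ...stripped } = inherited;

const seenWhenInherited = childSeesClaudeCode(inherited);
const seenWhenStripped = childSeesClaudeCode(stripped);
```

Under these assumptions, `seenWhenInherited` is true and `seenWhenStripped` is false: the child only stops seeing the variable once the parent removes it from the env it passes down.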

Implementation

  • New shared helper packages/smithy/src/providers/claude/env.ts: buildClaudeSpawnEnv(processEnv, options) performs the merge, strips CLAUDECODE regardless of source (inherited or caller-supplied via options.environmentVariables), and sets STONEFORGE_ROOT when provided.
  • Both headless.ts and interactive.ts now call the helper instead of duplicating the env-build logic.
  • Tests:
    • env.bun.test.ts: 7 unit tests for the helper (CLAUDECODE strip from inherited, strip from overrides, override precedence, STONEFORGE_ROOT injection, input non-mutation).
    • headless.bun.test.ts: 2 integration tests that verify the headless provider's SDK call actually receives an env without CLAUDECODE.
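The behaviour listed above can be sketched as follows. This is an illustration under stated assumptions, not the code in env.ts: the PR description names `buildClaudeSpawnEnv(processEnv, options)` and `options.environmentVariables`, but the `stoneforgeRoot` option name, the `Env` type, and the exact merge order are assumptions made here.

```typescript
type Env = Record<string, string | undefined>;

interface ClaudeSpawnEnvOptions {
  // Caller-supplied overrides (named in the PR description).
  environmentVariables?: Record<string, string>;
  // Hypothetical option name; the PR only says STONEFORGE_ROOT is set "when provided".
  stoneforgeRoot?: string;
}

function buildClaudeSpawnEnv(
  processEnv: Env,
  options: ClaudeSpawnEnvOptions = {},
): Env {
  // Spread into a fresh object so neither input is mutated;
  // overrides come second so they take precedence over inherited values.
  const env: Env = { ...processEnv, ...options.environmentVariables };

  // Strip after the merge so CLAUDECODE is removed regardless of source.
  delete env.CLAUDECODE;

  if (options.stoneforgeRoot !== undefined) {
    env.STONEFORGE_ROOT = options.stoneforgeRoot;
  }
  return env;
}

// Usage: both the inherited and the caller-supplied CLAUDECODE are dropped.
const parentEnv: Env = { PATH: "/usr/bin", CLAUDECODE: "1", EDITOR: "vi" };
const spawnEnv = buildClaudeSpawnEnv(parentEnv, {
  environmentVariables: { CLAUDECODE: "1", EDITOR: "nano" },
  stoneforgeRoot: "/path/to/project",
});
```

Deleting the key after the merge, rather than filtering each source separately, keeps the strip in one place, so a later change to the merge order cannot reintroduce the variable.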

Why CLAUDECODE was set in the first place

Issue #32 (closed) added CLAUDECODE: '1' to fix Windows session permission/spawn issues. The intent was "signal to claude that this is a managed session". The current claude binary's check is the opposite: "if CLAUDECODE is set, refuse to nest." Since stoneforge IS the parent (the spawned claude is fresh top-level), CLAUDECODE should not be in the child env.

Windows verification is welcome but not blocking. I have not retested #32's Windows scenario. If a Windows contributor can confirm whether the unconditional strip is sufficient or whether the env needs to be platform-gated, that would be valuable. The Linux/macOS bug is reproducible 100% of the time and blocks Claude-Code-resident workflows entirely, so I'd rather ship and have any Windows regression re-filed than block on verification.
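If Windows does turn out to need the variable, a platform gate could look roughly like the sketch below. It is not part of this PR, which strips unconditionally; `applyClaudeCodePolicy` and its `keepOn` parameter are hypothetical names, and whether any platform actually requires CLAUDECODE is exactly the open question above.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical variant: keep CLAUDECODE='1' only on the listed platforms
// and strip it everywhere else. With an empty list this is equivalent to
// the unconditional strip.
function applyClaudeCodePolicy(
  env: Env,
  platform: string,
  keepOn: readonly string[] = [],
): Env {
  const next: Env = { ...env }; // copy so the caller's object is untouched
  if (keepOn.includes(platform)) {
    next.CLAUDECODE = "1";
  } else {
    delete next.CLAUDECODE;
  }
  return next;
}

// Usage: strip on Linux, keep on Windows if that proves necessary.
const baseEnv: Env = { PATH: "/usr/bin", CLAUDECODE: "1" };
const linuxEnv = applyClaudeCodePolicy(baseEnv, "linux", ["win32"]);
const windowsEnv = applyClaudeCodePolicy(baseEnv, "win32", ["win32"]);
```

In real code the `platform` argument would presumably come from `process.platform`; taking it as a parameter keeps the function testable on any host.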

Test plan

  • bun test packages/smithy/src/providers/claude — 37 pass, 0 fail (was 28 before, +9 new tests)
  • NODE_ENV=development npx turbo run typecheck --filter=@stoneforge/smithy — clean
  • NODE_ENV=development pnpm build — clean
  • Manual: with the daemon running on this branch in a project, ran sf agent start <worker-id> --prompt "..." from inside a Claude Code session. Pre-fix: "Failed to start agent: Session exited before init". Post-fix: "Spawned agent <id>, Status: running".
  • Changeset added (@stoneforge/smithy: patch).

🤖 Generated with Claude Code

komoreka added 3 commits May 5, 2026 16:14
… spawns

When stoneforge runs from inside a Claude Code session (any user invoking
sf from the bash tool, or a director running in Claude Code), the
spawned claude subprocess inherits CLAUDECODE=1 from process.env. The
explicit `CLAUDECODE: '1'` in the env spread re-asserts it. Recent
versions of the claude binary refuse to start when CLAUDECODE is set,
emitting "Claude Code cannot be launched inside another Claude Code
session" to stderr and exiting 1. The Anthropic SDK surfaces this as
"Claude Code process exited with code 1", which the spawner swallows as
the cryptic "Session exited before init".

Strip CLAUDECODE from the inherited env before passing to the SDK
(headless) and node-pty (interactive). Stoneforge IS the parent of the
spawned claude; the spawned claude is a fresh top-level session, not
nested.

This reverses the intent of stoneforge-ai#32 for the macOS/Linux case but keeps the
spawn-env construction shape that stoneforge-ai#32 fixed for Windows. If there is a
Windows-specific reason CLAUDECODE must be set, gate it on platform.

Refs stoneforge-ai#32 (the original Windows fix that introduced CLAUDECODE='1').
…trip tests

Move the env construction (process.env merge, overrides, CLAUDECODE strip,
optional STONEFORGE_ROOT) from headless.ts and interactive.ts into a
shared helper at packages/smithy/src/providers/claude/env.ts. Both
providers now call buildClaudeSpawnEnv() rather than duplicating the
shape.

Adds 7 unit tests for the helper (env.bun.test.ts) covering the
CLAUDECODE strip from inherited env, the strip from caller-supplied
overrides, override precedence, STONEFORGE_ROOT injection, and input
non-mutation. Also adds 2 integration-style tests in headless.bun.test.ts
to verify the headless provider actually invokes the helper correctly
(SDK options.env reflects the strip).

Refs stoneforge-ai#103.
Development

Successfully merging this pull request may close these issues.

bug: sf agent start fails with 'Session exited before init' when invoked from inside a Claude Code session (CLAUDECODE inherited)