Skip to content

fix(smithy): spawn honors workspace defaultModels at session start#113

Open
komoreka wants to merge 2 commits into
stoneforge-ai:masterfrom
komoreka:fix/spawn-honors-workspace-default-model
Open

fix(smithy): spawn honors workspace defaultModels at session start#113
komoreka wants to merge 2 commits into
stoneforge-ai:masterfrom
komoreka:fix/spawn-honors-workspace-default-model

Conversation

@komoreka
Copy link
Copy Markdown
Contributor

@komoreka komoreka commented May 8, 2026

Summary

session-manager.resolveExecutablePath has a 3-tier precedence chain (agent metadata → workspace defaults → provider built-in) but model resolution skipped the middle tier — it only read agent.metadata.model, never the workspace-level defaultModels[providerName] setting.

Two structural gaps reinforced this:

  1. The server's ServerAgentDefaults type only persisted defaultExecutablePaths and fallbackChain. defaultModels and defaultProvider weren't stored at all.
  2. useAgentDefaultsSettings in smithy-web was localStorage-only. Even if the operator set "Default model = Opus" in the settings UI, the value lived in their browser only — invisible to the daemon.

So the UI and the spawn path operated on disjoint state. Operators saw the workspace setting reflected in the UI dropdown but the daemon kept spawning Claude with the SDK's built-in default. Workaround was per-agent overrides via PATCH /api/agents/<id> for every agent.

Fix

Three pieces, mirroring how the existing resolveExecutablePath already works for executable paths:

  1. Extend ServerAgentDefaults with defaultModels and defaultProvider. Persisted server-side via the existing /api/settings/agent-defaults endpoint with read + write validation. Backwards compatible with workspaces that have agent-defaults set via just defaultExecutablePaths.

  2. New resolveModel(providerName, agentModel?) helper on SessionManagerImpl. Private method analogous to resolveExecutablePath. Used in both startSession and resumeSession so fresh starts and reconnects honor the same precedence:

    • options.model (call-site override) wins
    • then agent.metadata.model
    • then defaults.defaultModels[providerName]
    • then undefined → provider/SDK built-in default
  3. Migrate useAgentDefaultsSettings to server-backed. Mirrors useExecutablePathSettings's pattern: fetch on mount, debounced PUT on change, localStorage as a hint cache only.

The CLI handler sf agent start was bypassing session-manager.startSession and calling spawner.spawn directly, so the fix is also wired into the CLI handler (second commit).

Files changed

  • packages/smithy/src/services/settings-service.ts — type extension + read/write validation
  • packages/smithy/src/services/settings-service.bun.test.ts — 7 new tests covering field persistence and validation
  • packages/smithy/src/server/routes/settings.ts — accepts and validates the new fields on PUT
  • packages/smithy/src/runtime/session-manager.tsresolveModel helper, used in both spawn paths
  • packages/smithy/src/runtime/session-manager.bun.test.ts — 6 new tests covering all resolution tiers
  • packages/smithy/src/cli/commands/agent.tssf agent start CLI also reads workspace defaults
  • apps/smithy-web/src/api/hooks/useSettings.ts — server-backed migration of the hook

Live verification

Captured during local dogfooding:

# Worker spawned via sf agent start (CLI path):
node-pty/spawn-helper ... bash -l -c 'claude' --dangerously-skip-permissions \
  --session-id 'c3ded67a-...' --model 'opus'

# Director auto-resumed by daemon after server restart (session-manager path):
/opt/homebrew/bin/claude --dangerously-skip-permissions --resume ae387c5a-... \
  --model opus Server restarted...

Both paths now pass --model from the workspace default. Inside the resulting director session, /model shows Opus (claude binary recognizes the opus alias).

Test plan

  • bun test src/services/settings-service.bun.test.ts — 33 pass
  • bun test src/runtime/session-manager.bun.test.ts -t 'model resolution' — 6 pass
  • bun test src/runtime/session-manager.bun.test.ts — 89 pass (full file, no regressions)
  • bun test packages/smithy/src — 1582 pass / 0 fail / 19 skip
  • turbo run typecheck — 16/16 pass
  • pnpm build — 13/13 pass

Migration note

Workspaces that previously set "Default model" via the localStorage-only UI will start fresh on first load after this lands (server-side store is empty). One-time disruption — users re-set in the UI and it's now persisted server-side where the daemon can read it.

Known small follow-up

The model registry endpoint (/api/providers/claude-code/models) returns an id: "default" sentinel meaning "use system default". If a user picks that sentinel in the dropdown, the literal string "default" gets passed to claude via --model default, which the binary doesn't recognize. Either filter out the "default" id at spawn time (pass undefined instead) or document the IDs the dropdown stores. Out of scope for this PR; happy to file a follow-up.

komoreka added 2 commits May 8, 2026 13:25
The session manager's resolveExecutablePath had a 3-tier fallback
(agent metadata → workspace defaults → provider built-in) but model
resolution skipped the middle tier. Agents with no per-agent model
override fell straight to the SDK default (sonnet for Claude Code)
regardless of the workspace's "default model" setting in the UI.

Concretely: every time the operator restarted sf serve smithy, the
director's claude session reverted to sonnet despite the UI showing
opus. They had to /model opus manually each time.

Three pieces:

1. Extend ServerAgentDefaults with defaultModels and defaultProvider.
   Persisted server-side via the existing /api/settings/agent-defaults
   endpoint. Validated on read and write. Backwards compatible with
   workspaces that have agent-defaults set via just defaultExecutablePaths.

2. New resolveModel(providerName, agentModel?) helper on
   SessionManagerImpl, mirroring resolveExecutablePath. Used in both
   startSession and resumeSession so fresh starts and reconnects honor
   the same precedence:
     1. options.model (call-site override)
     2. agent metadata model
     3. workspace defaults defaultModels[providerName]
     4. undefined → provider/SDK built-in default

3. Migrate useAgentDefaultsSettings in smithy-web from localStorage-only
   to server-backed (mirrors useExecutablePathSettings). The UI now
   round-trips defaultProvider + defaultModels through the daemon, so
   operator preference reaches the spawn path.

Tests cover all four resolution tiers, per-provider keying (codex
default doesn't shadow claude-code agents), the no-settingsService
fallback, and persistence through getAgentDefaults/setAgentDefaults.
The CLI handler for `sf agent start` calls spawner.spawn() directly,
bypassing session-manager.startSession where the resolveModel helper
lives. So manual `sf agent start <id>` invocations didn't pick up the
workspace defaultModels fallback even after the rest of this PR.
Daemon-driven dispatches went through session-manager and worked.

Mirror the same precedence chain in the CLI:
  1. --model flag
  2. agent metadata.model
  3. workspace defaults defaultModels[providerName]
  4. undefined → provider/SDK built-in default

Settings are read directly from the workspace's .stoneforge/stoneforge.db
via createSettingsService — no daemon round-trip needed since the CLI
runs in the workspace cwd anyway.

Verified live: ran `sf agent start <worker> --mode interactive` against
a workspace with defaultModels.claude-code='claude-opus-4-5-20251101';
ps showed the spawned claude process invoked with --model claude-opus-4-5-20251101.
The director's auto-resume after daemon restart was also captured with
the same --model flag, confirming both paths now honor the workspace
default.
@komoreka
Copy link
Copy Markdown
Contributor Author

komoreka commented May 8, 2026

Cross-reference: this PR has light overlap with issue #112 (cross-provider fallbackChain). They touch some of the same files (settings-service.ts, routes/settings.ts, useSettings.ts) but address independent gaps:

This PR works around #112's Part B (full-replace bug) on the client side: the migrated useAgentDefaultsSettings does a GET-then-PUT to preserve defaultExecutablePaths and fallbackChain it doesn't own. So my changes don't depend on Part B landing, and they don't block it either — when #112 ships its server-side merge, my client GET becomes redundant but harmless.

No conflict with PR #110 (fix/silent-rate-limit-uses-agent-provider) — that one only touches dispatch-daemon.ts, no overlap.

Happy to bundle a small server-side fix for #112's Part B into this PR if maintainers prefer fewer round-trips, or keep them separate for cleaner review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant