feat(smithy,smithy-web): dispatch stuck-queue warning (#59) #98

Open
komoreka wants to merge 8 commits into stoneforge-ai:master from komoreka:feat/dispatch-stuck-warning

komoreka commented May 5, 2026

Summary

Closes #59. Detects and surfaces dispatch stuck-queue conditions when workers are unavailable.

  • DispatchDaemon.getDispatchHealth() reports readyUnassignedTasks, availableWorkers, stuck. Stuck = ready unassigned > 0 AND no available workers (worker = agentRole === 'worker', not disabled, session not terminated).
  • Daemon logs a single [dispatch] STUCK line on the healthy→stuck transition and a single [dispatch] RESUMED on the way back. No periodic warns to keep noise out of busy logs.
  • GET /api/daemon/status includes a health field (hasStuckQueue, readyUnassignedTasks, availableWorkers).
  • smithy-web shows a dismissible amber banner on the agents and workspaces pages when the API reports a stuck queue, polling every 5s.
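The stuck condition from the first bullet can be sketched as a pure function. This is a minimal illustration, not the code in this PR: the `AgentLike` shape, its `disabled` and `sessionStatus` fields, and the `computeDispatchHealth` name are assumptions. Only the `agentRole === 'worker'` check and the three reported fields come from the description above.

```typescript
// Hypothetical agent shape; every field name except agentRole is an assumption.
interface AgentLike {
  agentRole: string;
  disabled?: boolean;
  sessionStatus?: string; // assumed to be 'terminated' once the session has ended
}

// Mirrors the three fields the summary says getDispatchHealth() reports.
interface DispatchHealthSketch {
  readyUnassignedTasks: number;
  availableWorkers: number;
  stuck: boolean;
}

function computeDispatchHealth(
  readyUnassignedTasks: number,
  agents: AgentLike[],
): DispatchHealthSketch {
  // A worker counts as available when it has the worker role, is not disabled,
  // and its session has not terminated.
  const availableWorkers = agents.filter(
    (a) =>
      a.agentRole === 'worker' &&
      !a.disabled &&
      a.sessionStatus !== 'terminated',
  ).length;

  return {
    readyUnassignedTasks,
    availableWorkers,
    // Stuck = ready unassigned tasks exist AND nobody is available to take them.
    stuck: readyUnassignedTasks > 0 && availableWorkers === 0,
  };
}
```

Under these assumptions, an empty queue is never reported as stuck, even with zero workers attached.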

Why this design

The original issue suggested three options: a startup-only line, periodic warns, or a --require-agents flag. In dogfooding, periodic warns drowned in heartbeats and a startup-only line missed the common case (a worker dies mid-session). Transition logging gives one clear line at the moment something changes, and the banner gives a visible signal in the UI without being modal.

The pool-routing observation from the issue is filed separately as #94 to keep this PR scoped.

Test plan

  • bun test packages/smithy/src/services/dispatch-daemon.bun.test.ts (4 detection + 2 transition cases)
  • pnpm --filter @stoneforge/smithy test src/server/routes/daemon.test.ts (3 vitest cases incl. throw path)
  • Workspace turbo typecheck clean
  • Manual: started sf serve smithy against a repo with ready tasks and no attached workers, observed STUCK log + amber banner; attached a worker, observed RESUMED log + banner cleared.

🤖 Generated with Claude Code

komoreka and others added 8 commits May 5, 2026 08:55
… detection

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds DispatchHealth interface to api/types.ts, extends DaemonStatusResponse
in useDaemon.ts with an optional health field (also tightens poll to 5s),
and creates the DispatchHealthBanner component that renders an amber warning
when hasStuckQueue is true.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…pages

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…pper

Banner now takes an optional className applied to its outer element so
each mount site controls its own page-specific padding. The workspaces
route previously wrapped the banner in a div with px-6 pt-4 which left
16px of empty padding when the banner self-hides (queue healthy or
dismissed). The wrapper is gone; the banner mounts directly with
className="mx-6 mt-4" passed in.

Replaces the per-tick rate-limited warn with state-transition logging:
- Healthy → Stuck: single STUCK warn line with the count and a clear hint.
- Stuck → Healthy: single RESUMED info line confirming dispatch is flowing.
- No periodic reminders. A long-running stuck state does not spam the log.

Drops the stuckWarnTickInterval config option (was 20 ticks, 100s spacing
in production). Distinctive STUCK/RESUMED prefixes make the lines findable
in a chatty log stream.

Tests:
- Verifies STUCK warn fires on first stuck tick AND does not fire again
  on subsequent stuck ticks (was the periodic-spam regression risk).
- Verifies RESUMED info fires on the next healthy tick after stuck.
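The transition logging described above can be sketched as a small state holder that remembers the previous tick's result. This is an illustration under stated assumptions, not the daemon's actual code: the `StuckTransitionLogger` class, its `onTick` method, and the message wording after the `[dispatch] STUCK` and `[dispatch] RESUMED` prefixes are all hypothetical.

```typescript
type LogFn = (message: string) => void;

// Hypothetical helper; the class name, method name, and message text are illustrative.
class StuckTransitionLogger {
  private wasStuck = false;
  private readonly warn: LogFn;
  private readonly info: LogFn;

  constructor(warn: LogFn, info: LogFn) {
    this.warn = warn;
    this.info = info;
  }

  // Call once per daemon tick with the current health snapshot.
  onTick(stuck: boolean, readyUnassignedTasks: number): void {
    if (stuck && !this.wasStuck) {
      // Healthy -> Stuck: log exactly once, with the count.
      this.warn(
        `[dispatch] STUCK: ${readyUnassignedTasks} ready task(s), no available workers`,
      );
    } else if (!stuck && this.wasStuck) {
      // Stuck -> Healthy: log exactly once.
      this.info('[dispatch] RESUMED: dispatch is flowing again');
    }
    // Consecutive ticks in the same state fall through silently.
    this.wasStuck = stuck;
  }
}
```

Because only the previous state is stored, a long-running stuck period produces a single warn line regardless of how many ticks elapse, which is the behavior the tests listed above check for.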

Development

Successfully merging this pull request may close these issues.

Dispatch daemon silently accumulates ready tasks when no agents are attached

1 participant