
fix(coordinator): make bootstrap non-blocking#63

Merged: xnoto merged 18 commits into main from fix/coordinator-bootstrap-nonblocking on Mar 7, 2026

xnoto commented Mar 3, 2026

Summary

Overhaul session discovery, coordinator startup, and daemon lifecycle to fix broken TUI detection and eliminate token-wasting message floods.

Session Discovery (SQLite)

  • Replace HTTP API and CLI session discovery with direct SQLite queries against OpenCode's shared database (~/.local/share/opencode/opencode.db)
  • Remove XDG_CONFIG_HOME isolation from hub server — it prevented TUI instances from connecting
  • Move coordinator model to project-level opencode.json instead of hub server config isolation
  • Remove AGENT_HUB_COORDINATOR_MODEL env var (breaking change)
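
A minimal sketch of what direct SQLite discovery could look like. The function name `get_sessions_from_db()` and the database path come from this PR; the table name `session` and the columns `time_created` and `time_archived` are assumptions for illustration (only `time_updated` is mentioned in the commits), so the real schema may differ.

```python
import sqlite3
from pathlib import Path

# Path taken from the PR description.
DB_PATH = Path.home() / ".local/share/opencode/opencode.db"


def get_sessions_from_db(db_path: Path = DB_PATH) -> list:
    """Return non-archived sessions, newest first.

    The table and column names here are assumed, not taken from OpenCode.
    """
    if not db_path.exists():
        return []
    # Open read-only so discovery never writes to the shared database.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    conn.row_factory = sqlite3.Row
    try:
        rows = conn.execute(
            "SELECT id, time_created, time_updated FROM session "
            "WHERE time_archived IS NULL ORDER BY time_created DESC"
        ).fetchall()
    finally:
        conn.close()
    return [dict(row) for row in rows]
```

Reading the shared database directly sidesteps the limitation described in the commits: the hub server's HTTP listing only returns sessions it manages itself, so TUI sessions created by other processes were invisible to it.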

Startup Cleanup

  • Purge all stale agent registration files on daemon startup — agents are ephemeral and re-register via MCP
  • Only orient sessions created AFTER daemon start (removed 24-hour SESSION_RECENT_WINDOW_MS fallback)
  • Scope message delivery to post-startup sessions only, preventing injection into dead TUIs
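
The three startup rules above can be sketched roughly as follows. `DAEMON_START_TIME_MS` and `created_ms` are named in the commits; the helper names, the `*.json` file layout, and the `coordinator_id` parameter are hypothetical.

```python
import time
from pathlib import Path

# Captured once when the daemon starts; the name mirrors the commit message.
DAEMON_START_TIME_MS = int(time.time() * 1000)


def purge_agent_registrations(agents_dir: Path) -> int:
    """Delete every registration file and return how many were removed.

    Assumes one *.json file per agent; the real layout may differ.
    """
    if not agents_dir.is_dir():
        return 0
    removed = 0
    for path in agents_dir.glob("*.json"):
        path.unlink(missing_ok=True)
        removed += 1
    return removed


def should_orient(created_ms: int, start_ms: int = DAEMON_START_TIME_MS) -> bool:
    """Sessions created before daemon start are skipped, with no fallback window."""
    return created_ms >= start_ms


def is_deliverable(session_id: str, created_ms: int, coordinator_id: str,
                   start_ms: int = DAEMON_START_TIME_MS) -> bool:
    """Deliver only to the coordinator or to sessions created after startup."""
    return session_id == coordinator_id or created_ms >= start_ms
```

Purging unconditionally is safe only because agents re-register via MCP. The delivery filter is a second line of defence in case a stale agent still references an old session.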

Coordinator Bootstrap

  • Make coordinator startup non-blocking with async bootstrap instruction queueing
  • Add readiness verification with configurable timeouts (AGENT_HUB_COORDINATOR_READY_TIMEOUT)
  • Refresh AGENTS.md from templates by default (opt-out via AGENT_HUB_COORDINATOR_PRESERVE_LOCAL_AGENTS_MD)
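
One way to picture non-blocking bootstrap with queued instructions and a readiness timeout is the sketch below. It uses a background thread for simplicity and is not the PR's actual implementation. The class, its methods, the 30-second default, and the assumption that `AGENT_HUB_COORDINATOR_READY_TIMEOUT` is expressed in seconds are all illustrative.

```python
import os
import queue
import threading
import time
from typing import Callable

# Assumed to be seconds; the default of 30 is a placeholder, not the project's value.
READY_TIMEOUT = float(os.environ.get("AGENT_HUB_COORDINATOR_READY_TIMEOUT", "30"))


class CoordinatorBootstrap:
    """Queue bootstrap instructions and send them once the coordinator is ready."""

    def __init__(self, is_ready: Callable[[], bool], send: Callable[[str], None],
                 ready_timeout: float = READY_TIMEOUT, poll_interval: float = 0.1):
        self._is_ready = is_ready
        self._send = send
        self._ready_timeout = ready_timeout
        self._poll_interval = poll_interval
        self._pending = queue.Queue()
        self._thread = None
        self.succeeded = None  # None while running, then True or False

    def enqueue(self, instruction: str) -> None:
        self._pending.put(instruction)

    def start(self) -> None:
        """Return immediately; readiness polling runs on a background thread."""
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def join(self, timeout=None) -> None:
        if self._thread is not None:
            self._thread.join(timeout)

    def _run(self) -> None:
        deadline = time.monotonic() + self._ready_timeout
        while time.monotonic() < deadline:
            if self._is_ready():
                self._flush()
                self.succeeded = True
                return
            time.sleep(self._poll_interval)
        self.succeeded = False  # gave up; queued instructions are left unsent

    def _flush(self) -> None:
        while True:
            try:
                self._send(self._pending.get_nowait())
            except queue.Empty:
                return
```

The caller enqueues instructions, calls `start()`, and carries on with daemon startup; `succeeded` reports afterwards whether the coordinator became ready within the timeout.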

Watch Dashboard

  • Show archived messages in Recent Messages panel (previously only showed pending messages which disappear in seconds)
  • Add markers: unread (●), read ( ), archived (▪)
  • Watch archive directory for live updates
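
As an illustration of the merged Recent Messages panel, the sketch below combines pending and archived messages and prefixes each with its marker. The marker characters are the ones listed in the commit message; the `Message` fields and function names are hypothetical.

```python
from dataclasses import dataclass

# Marker characters as listed in the commit message.
MARKERS = {"unread": "●", "read": " ", "archived": "▪"}


@dataclass
class Message:
    sender: str
    subject: str
    timestamp: float
    read: bool = False
    archived: bool = False


def marker_for(msg: Message) -> str:
    """Archived wins over read state; otherwise distinguish read from unread."""
    if msg.archived:
        return MARKERS["archived"]
    return MARKERS["read"] if msg.read else MARKERS["unread"]


def recent_messages(pending: list, archived: list, limit: int = 10) -> list:
    """Merge both sources, newest first, and render one line per message."""
    merged = sorted([*pending, *archived], key=lambda m: m.timestamp, reverse=True)
    return [f"{marker_for(m)} {m.sender}: {m.subject}" for m in merged[:limit]]
```

Including archived messages keeps the panel populated after pending messages are consumed, which per the description happens within seconds.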

Session GC

  • gc_session_agents() now cleans mappings for sessions inactive beyond AGENT_STALE_SECONDS, not just missing ones
  • Consistent staleness checks across oriented sessions and agent mappings
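
The staleness rule can be sketched as below. `gc_session_agents()`, `AGENT_STALE_SECONDS`, and `time_updated` are named in this PR; the signature, the in-memory dict shapes, and the 3600-second value are assumptions made for the example.

```python
import time

# Placeholder value; the project's real default is not stated in this PR.
AGENT_STALE_SECONDS = 3600


def gc_session_agents(session_agents: dict, sessions: dict, now_ms=None,
                      stale_seconds: int = AGENT_STALE_SECONDS) -> list:
    """Drop mappings whose session is missing or has gone stale.

    session_agents maps session_id -> agent_id and is modified in place.
    sessions maps session_id -> time_updated in milliseconds.
    Returns the session ids that were removed.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cutoff_ms = now_ms - stale_seconds * 1000
    removed = []
    for session_id in list(session_agents):
        updated_ms = sessions.get(session_id)
        # Missing from the DB, or not updated since the cutoff: treat as dead.
        if updated_ms is None or updated_ms < cutoff_ms:
            del session_agents[session_id]
            removed.append(session_id)
    return removed
```

Checking `time_updated` as well as presence covers TUI sessions that were closed but never archived in SQLite, which a missing-only check would leave behind.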

Code Cleanup

  • Remove dead _parse_session_id_from_json_output() and its 8 tests
  • Add test coverage for get_sessions_from_db() (SQLite discovery)
  • Simplify README (688 → 253 lines)
  • Add integration testing guide to CONTRIBUTING.md

Breaking Changes

  • AGENT_HUB_COORDINATOR_MODEL env var removed. Set model in ~/.agent-hub/coordinator/opencode.json.
  • Session discovery no longer uses hub server HTTP API. Queries SQLite directly.

Validation

  • uv run pytest — 150 passed
  • uv run ruff check && ruff format --check — clean
  • Live tested: daemon restart with 22 stale agents and 141 SQLite sessions → 0 spurious orientations, 0 message floods. New TUI oriented cleanly with 3 injections.

🤖 Generated with Claude Code

xnoto and others added 18 commits March 3, 2026 09:51
…efault model

- Remove COORDINATOR_MODEL setting; coordinator now uses global default
- Add _setup_hub_server_config() to isolate hub server config via XDG_CONFIG_HOME
- Update start_hub_server() to use custom config with opencode/minimax-m2.5-free default
- Add get_sessions_from_cli() and _merge_session_sources() for hybrid session discovery
- Update get_sessions_uncached() to merge HTTP API and CLI session sources

…st_watch formatting

The coordinator model is now configured via hub server's isolated opencode.json
config rather than the AGENT_HUB_COORDINATOR_MODEL env var. Updated docs to
reflect this change and fixed ruff formatting in test_watch.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… isolation

The hub server's HTTP listing API only returns sessions it manages
internally — TUI sessions created by independent processes are invisible.
This broke detection and orientation of new TUI sessions entirely.

Changes:
- Replace CLI and HTTP session discovery with direct SQLite queries
  against OpenCode's shared database (~/.local/share/opencode/opencode.db)
- Remove XDG_CONFIG_HOME isolation from hub server startup so TUI
  instances discover and connect to it for prompt_async injection
- Move coordinator model setting to project-level opencode.json
  (contrib/coordinator/opencode.json) instead of hub server config
- Remove _setup_hub_server_config(), get_sessions_from_cli(),
  _merge_session_sources() — replaced by get_sessions_from_db()

Verified: daemon now detects new TUI sessions within 5 seconds and
successfully injects orientation messages + coordinator notifications.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…discovery tests

Remove unused function and its 8 tests. Add test coverage for
get_sessions_from_db, the new primary session discovery mechanism.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ture

Document manual testing workflow (daemon + watch + TUI sessions) and
update architecture overview to reflect SQLite-based session discovery.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Cut from 688 to 253 lines. Remove verbose How It Works subsections,
ASCII diagrams, test results, metrics tables, and table of contents.
Keep practical content: prerequisites, installation, configuration,
known limitations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…e sessions

Watch dashboard now shows recent messages from both pending and archive
directories, with markers distinguishing unread (●), read ( ), and
archived (▪) messages. Also watches archive dir for live updates.

Session agent GC now considers sessions stale if their time_updated is
older than AGENT_STALE_SECONDS, not just if they're missing from the DB.
This ensures dead TUI sessions (closed but not archived in SQLite) get
cleaned up within the stale threshold.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Remove the 24-hour SESSION_RECENT_WINDOW_MS fallback that would orient
pre-existing sessions if they were "recently updated." On daemon restart,
every non-archived session in SQLite was getting oriented — triggering
coordinator messages and wasting tokens on dead sessions.

Now strictly: created_ms < DAEMON_START_TIME_MS → skip. Clean slate on
every daemon restart with zero token waste on historical sessions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Two fixes to prevent token-wasting message floods on daemon restart:

1. Purge all agent registration files on startup. Agents are ephemeral
   and re-register via MCP — stale files from previous runs caused the
   coordinator to broadcast to dead agents.

2. Filter message delivery to only sessions created after daemon start
   (plus coordinator). Prevents injecting into old dead TUI sessions
   even if a stale agent somehow references them.

Tested: daemon restart with 22 stale agents and 141 sessions in SQLite
resulted in 0 spurious orientations and 0 message floods. New TUI
session was oriented cleanly with 3 total injections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

xnoto merged commit 211b0a6 into main on Mar 7, 2026
14 checks passed
xnoto deleted the fix/coordinator-bootstrap-nonblocking branch on March 7, 2026 03:44
xnoto pushed a commit that referenced this pull request Mar 7, 2026
🤖 I have created a release *beep* *boop*
---


## [1.3.2](v1.3.1...v1.3.2) (2026-03-07)

### Bug Fixes

* **coordinator:** make bootstrap non-blocking ([#63](#63)) ([211b0a6](211b0a6))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>