feat(mcp): HTTP+SSE transport with singleton server and event bus (#258) #273

Merged: kumaakh merged 73 commits into main from feat/mcp-sse-transport on May 28, 2026

@kumaakh kumaakh commented May 19, 2026

Closes #258.

Summary

Replaces fleet's stdio-only MCP transport with a default singleton HTTP+SSE server that multiple LLM clients share, while keeping stdio as a backward-compat fallback. Both transports co-exist.

  • HTTP+SSE transport (src/services/http-transport.ts) - one McpServer per client session over StreamableHTTPServerTransport, bound to 127.0.0.1 only, default port 7523 with fallback. Multiple Claude/Gemini clients connect concurrently to one fleet service per machine.
  • Typed event bus (src/services/event-bus.ts) - internal pub/sub; events are broadcast to all connected SSE clients as notifications/message.
  • Singleton lifecycle (src/services/singleton.ts) - atomic startup claim (fs.openSync with the 'wx' flag) prevents start races; a PID check plus a /health probe detects a running instance; stale server.json and lock files are cleaned up.
  • --transport http|stdio flag - default http; stdio path unchanged, no regression.
  • credential_store_set completion event - emits credential:stored the moment the OOB secret is delivered, no polling.
  • Provider install configs - apra-fleet install writes HTTP transport config for Claude, Gemini, Copilot and Codex (stdio config when --transport stdio).
  • Docs: README "Transport" section + docs/architecture.md "Transport Layer".
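To make the "typed event bus" item concrete, here is a minimal sketch of what a typed pub/sub could look like. It is not the code in src/services/event-bus.ts: the `FleetEvents` map, the payload shapes, the `session:closed` event, and the `TypedEventBus` class are all assumptions for illustration. Only the event name `credential:stored` comes from this PR.

```typescript
// Hypothetical event map. Only the name "credential:stored" appears in the PR;
// the payload shapes and the second event are assumptions.
type FleetEvents = {
  "credential:stored": { name: string };
  "session:closed": { sessionId: string };
};

type Handler<T> = (payload: T) => void;

class TypedEventBus<Events extends Record<string, unknown>> {
  private handlers = new Map<keyof Events, Set<Handler<any>>>();

  // Subscribe to one event; the returned function removes the subscription.
  on<K extends keyof Events>(event: K, handler: Handler<Events[K]>): () => void {
    const set = this.handlers.get(event) ?? new Set<Handler<any>>();
    this.handlers.set(event, set);
    set.add(handler);
    return () => {
      set.delete(handler);
    };
  }

  // Deliver the payload to every current subscriber of the event.
  emit<K extends keyof Events>(event: K, payload: Events[K]): void {
    const set = this.handlers.get(event);
    if (!set) return;
    set.forEach((handler) => handler(payload));
  }
}

// Usage: the compiler rejects unknown event names and mismatched payloads.
const bus = new TypedEventBus<FleetEvents>();
const stored: string[] = [];
const unsubscribe = bus.on("credential:stored", (p) => {
  stored.push(p.name);
});
bus.emit("credential:stored", { name: "api-key" });
unsubscribe();
bus.emit("credential:stored", { name: "ignored-after-unsubscribe" });
```

In the transport described above, a subscriber of this kind would presumably forward each event to the connected SSE sessions as notifications/message; that wiring is not shown here.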

Transport choice

Uses StreamableHTTPServerTransport (not the deprecated SSEServerTransport). Verified both Claude Code and Gemini CLI support Streamable HTTP as of 2026-05.

Validation

  • 4 phases, doer/reviewer reviewed and APPROVED at every checkpoint.
  • Build clean; full suite 1332 passing / 6 skipped across 84 files.
  • SEA binary compatibility verified; multi-session broadcast proven end-to-end.

Follow-ups (filed as backlog)

  • LOW: em-dashes in src/services/tool-registry.ts (ASCII cleanup).
  • LOW: test indentation cleanup.
  • Deferred per plan: per-session event targeting, singleton idle-shutdown policy.

Out of scope: the Anthropic client-side change to surface notifications/message as conversation injections (external ask; server side is spec-compliant).

Generated via apra-fleet doer/reviewer sprint.

kumaakh pushed a commit that referenced this pull request May 19, 2026
Add PLAN.md with the implementation plan for making apra-fleet behave
like a normal OS service -- start/stop/restart/status verbs, per-user
service registration folded into install/uninstall, cross-platform
support for Windows (schtasks), Linux (systemd --user), and macOS
(launchd LaunchAgent), all without elevation. Extends PR #273.
Bot and others added 28 commits May 28, 2026 01:49

4-phase plan: event bus + HTTP transport, server refactor with
--transport flag, credential_store_set event wiring + install config,
and documentation. Singleton model with per-session McpServer.

CHANGES NEEDED -- 3 blocking findings:
- HIGH-1: provider mcp.json config formats underspecified in Task 7
- HIGH-2: singleton startup race condition unaddressed in Task 5
- HIGH-3: SEA binary compatibility not verified

APPROVED -- all 3 prior HIGH findings resolved:
- HIGH-1: concrete provider configs for Claude/Gemini/Copilot/Codex, port 7523
- HIGH-2: atomic startup lock via fs.openSync(path, 'wx')
- HIGH-3: SEA verification task added to Phase 1
Bot and others added 22 commits May 28, 2026 01:52

Add runStart and runStop CLI verbs. start checks for a running instance
(idempotent), uses the service manager when a unit is installed, otherwise
spawns a detached process redirected to LOG_FILE_PATH. stop posts /shutdown,
polls up to 5s, falls back to taskkill (Windows) or SIGTERM. Both wired into
src/index.ts dispatch.

Add runRestart: calls runStop then runStart. Wire into index.ts dispatch.
Also commit progress.json update for T7.

Add runStatus: reads server.json, GET /health for live metrics (version,
uptime, sessions), queries service manager for unit state. Formats output
with State/PID/Port/URL/Version/Uptime/Sessions/Service fields. Wired into
index.ts dispatch.

18 vitest tests covering start (already-running idempotent, service-managed
start, detached spawn, timeout failure), stop (not-running idempotent,
/shutdown POST, cleanup), restart (stop-then-start, idempotent when stopped),
and status (running/stopped states, service labels, health fields).
Update --help to list start/stop/restart/status verbs.

Add tests/cli-verbs.test.ts with 18 tests covering runStart (already
running idempotent, service manager path, spawn path, failure exit),
runStop (not running idempotent, /shutdown post, file cleanup), runRestart
(stop then start), and runStatus (stopped/running states, service labels,
health fields). Update --help to list start/stop/restart/status verbs.

install: in SEA+HTTP mode, register the service unit and start it as a
new numbered step after Beads. Adds Service line to the Done summary.
totalSteps updated; beads step uses baseSteps so numbering is correct.

uninstall: replace hard killApraFleet with svcMgr.stop() (graceful POST
/shutdown + poll + fallback) in the --force path; always call
svcMgr.unregister() before file removal (idempotent, tolerates not-found).

Update T10 and VERIFY P2 entries to reference 37a28b6 (spy-based rewrite
that fixed node:fs factory-mock leakage in fileParallelism:false mode).
Full suite: 86 files, 1383 passed, 13 skipped, 0 failed.

13 tests covering T11+T12: install calls register+start in SEA+HTTP mode,
skips for stdio or dev mode, warns non-fatally on register failure, shows
correct step numbering; uninstall calls stop then unregister in correct
order, skips both in dry-run, swallows unregister errors, guards server-
running check without --force.

npm run build: clean. npm test: 87 files, 1396 passed, 13 skipped, 0 failed.
@kumaakh kumaakh force-pushed the feat/mcp-sse-transport branch from 5d2203f to 5836328 May 28, 2026 05:54
Bot added 2 commits May 28, 2026 02:00
… process-utils, agy transport tests (#258)

R1: wrap each step in shutdown() in its own try/catch and always call
process.exit(0), preventing SIGTERM from triggering systemd/launchd
restart loop on graceful stop.

R2: extract isPidAlive and postShutdown into src/utils/process-utils.ts;
remove 4 duplicate isPidAlive copies (stop.ts, singleton.ts,
service-manager/index.ts, task-cleanup.ts) and 2 postShutdown copies
(stop.ts, service-manager/index.ts).

R3: add --transport http and --transport stdio test cases for agy to
tests/install-multi-provider.test.ts to match the pattern used by
claude, gemini, codex, and copilot.
@kumaakh kumaakh merged commit 8cf0da0 into main May 28, 2026
7 checks passed
@kumaakh kumaakh deleted the feat/mcp-sse-transport branch May 28, 2026 06:16
kumaakh added a commit that referenced this pull request May 29, 2026
…) (#273)

* docs(mcp): implementation plan for HTTP+SSE transport (#258)

4-phase plan: event bus + HTTP transport, server refactor with
--transport flag, credential_store_set event wiring + install config,
and documentation. Singleton model with per-session McpServer.

* review: plan review for HTTP+SSE transport (#258)

CHANGES NEEDED -- 3 blocking findings:
- HIGH-1: provider mcp.json config formats underspecified in Task 7
- HIGH-2: singleton startup race condition unaddressed in Task 5
- HIGH-3: SEA binary compatibility not verified

* docs(mcp): revise plan per review -- transport decision, race fix, SEA, provider configs (#258)

* feat(mcp): typed event bus for fleet pub/sub (#258)

* chore: update progress for T1 completion

* review: plan re-review for HTTP+SSE transport (#258)

APPROVED -- all 3 prior HIGH findings resolved:
- HIGH-1: concrete provider configs for Claude/Gemini/Copilot/Codex, port 7523
- HIGH-2: atomic startup lock via fs.openSync(path, 'wx')
- HIGH-3: SEA verification task added to Phase 1
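The HIGH-2 resolution relies on the exclusive-create behaviour of the 'wx' flag: opening fails with EEXIST if the file already exists, so only one process can win the claim. The sketch below shows that idea in isolation. The `tryClaimStartup` name, the `server.lock` file name, and writing the PID into the file are assumptions, not necessarily what src/services/singleton.ts does.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Try to claim the startup lock. Returns true if this call created the lock
// file, false if it already existed (another instance claimed it first).
function tryClaimStartup(lockPath: string): boolean {
  try {
    // "wx": open for writing, but fail with EEXIST if the path exists.
    const fd = fs.openSync(lockPath, "wx");
    fs.writeSync(fd, String(process.pid)); // assumption: record the owner PID
    fs.closeSync(fd);
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") return false;
    throw err; // permissions, missing directory, and so on
  }
}

// Two claims on the same (hypothetical) lock path: only the first succeeds.
const lockDir = fs.mkdtempSync(path.join(os.tmpdir(), "fleet-lock-"));
const lockPath = path.join(lockDir, "server.lock");
const firstClaim = tryClaimStartup(lockPath);
const secondClaim = tryClaimStartup(lockPath);
console.log(firstClaim, secondClaim); // prints: true false
```

A lock file left behind by a crashed process would block every later start, which is presumably why the PR pairs the claim with the PID and /health checks and stale-lock cleanup.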

* feat(mcp): HTTP transport with multi-session support (#258)

* test(mcp): verify HTTP transport in SEA binary (#258)

* chore: mark VERIFY Phase 1 completed in progress.json

* review: Phase 1 core abstractions (#258)

* refactor(mcp): extract tool registration into shared module (#258)

* chore: mark task 5 completed in progress.json

* feat(mcp): --transport flag and dual startup paths (#258)

* feat(mcp): singleton lifecycle detection with atomic claim (#258)

* chore: update progress.json -- task 5/6 complete, VERIFY Phase 2 done

* chore: mark VERIFY server refactor + dual transport completed (#258)

* chore: record VERIFY commit SHA in progress.json (#258)

* review: Phase 2 server refactor and dual transport (#258)

* feat(mcp): emit credential:stored event on OOB secret delivery (#258)

* chore: record T7 commit SHA in progress.json (#258)

* feat(mcp): provider-specific HTTP transport install configs (#258)

* chore: record T8 commit SHA in progress.json (#258)

* test(mcp): transport integration tests + Gemini client verification (#258)

* chore: record T9 commit SHA in progress.json (#258)

* chore: mark VERIFY event wiring + client config completed (#258)

* review: Phase 3 event wiring and client config (#258)

* docs(mcp): document HTTP+SSE transport, singleton model, event bus (#258)

* chore: record T10 completion + VERIFY checkpoint results (#258)

* review: Phase 4 docs + final sprint review (#258)

* cleanup: remove fleet control files

* docs(service): OS service lifecycle implementation plan

Add PLAN.md with the implementation plan for making apra-fleet behave
like a normal OS service -- start/stop/restart/status verbs, per-user
service registration folded into install/uninstall, cross-platform
support for Windows (schtasks), Linux (systemd --user), and macOS
(launchd LaunchAgent), all without elevation. Extends PR #273.

* review: OS service lifecycle plan review

* docs(service): revise plan per review -- dev-path, branch, macOS idempotency, stop semantics

* review: OS service lifecycle plan re-review

* feat(service): POST /shutdown endpoint and service constants (#258)

* progress: mark T1 complete (ef84f92)

* feat(service): T2 ServiceManager interface and factory

ServiceManager interface (register, unregister, start, stop, query,
isInstalled) + ServiceStatus type in types.ts. Factory getServiceManager()
selects per-platform adapter (win32/linux/darwin), falling back to
NoopServiceManager on unsupported platforms. gracefulStopByServerJson()
reads server.json and POST /shutdown with 5s pid-poll + SIGTERM fallback.
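A rough sketch of the interface-plus-factory shape described in this commit. The six method names and the `NoopServiceManager` fallback come from the commit message. The parameter and return types, the `ServiceStatus` values, and the `StubAdapter` stand-in for the real platform adapters are assumptions.

```typescript
// Assumed shape; the real ServiceStatus in types.ts may differ.
type ServiceStatus = "running" | "stopped" | "not-installed";

// Method names are from the commit message; signatures are assumptions.
interface ServiceManager {
  register(): Promise<void>;
  unregister(): Promise<void>;
  start(): Promise<void>;
  stop(): Promise<void>;
  query(): Promise<ServiceStatus>;
  isInstalled(): Promise<boolean>;
}

// Fallback for unsupported platforms: every call is a harmless no-op.
class NoopServiceManager implements ServiceManager {
  async register(): Promise<void> {}
  async unregister(): Promise<void> {}
  async start(): Promise<void> {}
  async stop(): Promise<void> {}
  async query(): Promise<ServiceStatus> {
    return "not-installed";
  }
  async isInstalled(): Promise<boolean> {
    return false;
  }
}

// Hypothetical stand-in for the Windows/Linux/macOS adapters, which would
// shell out to schtasks, systemctl --user, or launchctl respectively.
class StubAdapter extends NoopServiceManager {
  constructor(readonly backend: string) {
    super();
  }
}

// Select an adapter by platform, falling back to the no-op manager.
function getServiceManager(platform: string = process.platform): ServiceManager {
  switch (platform) {
    case "win32":
      return new StubAdapter("schtasks");
    case "linux":
      return new StubAdapter("systemd --user");
    case "darwin":
      return new StubAdapter("launchd");
    default:
      return new NoopServiceManager();
  }
}
```

Taking the platform as a parameter with a default is one way to keep the factory testable on a single OS; whether the actual factory does this is not stated in the PR.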

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore(service): add platform adapter stubs to unblock build

Minimal throw-not-implemented stubs for WindowsServiceManager,
LinuxServiceManager, MacOSServiceManager. Created by PM after token
outage interrupted fleet-dev mid-sprint. T3/T4/T5 will replace these
with real schtasks/systemd/launchd implementations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(http-transport): declare claude/channel MCP capability

Adds experimental: { 'claude/channel': {} } to the McpServer capabilities
on each session. Enables server-to-client push via notifications/claude/channel
over the existing SSE stream. POC validated: server can inject messages into
a Claude Code session unprompted.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(service): add T6.5 capability logging task, mark T1/T2 complete

Extends sprint plan with T6.5 (MCP session capability logging, beads 78g):
log clientInfo, capabilities, and channel flag on session init/close.
Marks T1 (ef84f92) and T2 (9963198) as completed in progress.json.
Notes stubs committed for T3/T4/T5 to unblock build after token outage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(service): Windows Scheduled Task adapter (#258)

* feat(service): Linux systemd user unit adapter (#258)

* feat(service): macOS launchd LaunchAgent adapter (#258)

* test(service): service manager adapter unit tests (#258)

* feat(http-transport): log MCP session init and close with client caps (#258)

* progress: mark T6/T6.5 complete, update T3-T5 notes

* progress: VERIFY blocked on build/test approval

* feat(service): service manager unit tests (#258)

* progress: VERIFY passed -- 85 files, 1365 tests green

* progress: mark VERIFY id=8 completed

* review: Phase 1 platform service foundation code review

* feat(service): start and stop CLI commands (#258)

Add runStart and runStop CLI verbs. start checks for a running instance
(idempotent), uses the service manager when a unit is installed, otherwise
spawns a detached process redirected to LOG_FILE_PATH. stop posts /shutdown,
polls up to 5s, falls back to taskkill (Windows) or SIGTERM. Both wired into
src/index.ts dispatch.
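The stop sequence described here (graceful request, bounded poll, signal fallback) could be sketched as below. `isPidAlive` and `postShutdown` are names this PR mentions, but the bodies shown are assumptions, as are `waitForExit`, `stopServer`, and the poll interval. The Windows taskkill branch is omitted, and a later commit in this PR switches the Unix fallback from SIGTERM to SIGKILL.

```typescript
import * as http from "node:http";

// Signal 0 checks for existence without delivering a signal.
function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// POST /shutdown on loopback; resolves false if the server is unreachable.
function postShutdown(port: number): Promise<boolean> {
  return new Promise((resolve) => {
    const req = http.request(
      { host: "127.0.0.1", port, path: "/shutdown", method: "POST" },
      (res) => {
        res.resume(); // drain the response body
        resolve(res.statusCode !== undefined && res.statusCode < 300);
      }
    );
    req.on("error", () => resolve(false));
    req.end();
  });
}

// Poll until the process exits or the timeout elapses.
async function waitForExit(pid: number, timeoutMs = 5000, intervalMs = 100): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (!isPidAlive(pid)) return true;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return !isPidAlive(pid);
}

// Graceful request first; a signal only if the process outlives the poll.
async function stopServer(pid: number, port: number): Promise<void> {
  await postShutdown(port);
  if (await waitForExit(pid)) return;
  try {
    process.kill(pid, "SIGTERM");
  } catch {
    // The process exited between the last poll and the signal.
  }
}
```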

* feat(service): restart CLI command (#258)

Add runRestart: calls runStop then runStart. Wire into index.ts dispatch.
Also commit progress.json update for T7.

* feat(service): status CLI command (#258)

Add runStatus: reads server.json, GET /health for live metrics (version,
uptime, sessions), queries service manager for unit state. Formats output
with State/PID/Port/URL/Version/Uptime/Sessions/Service fields. Wired into
index.ts dispatch.
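For illustration, the status report could be assembled from the contents of server.json, the /health response, and the service manager query roughly as follows. The eight field labels are from the commit message. The `StatusInfo` shape, the "-" placeholder for missing values, and the URL format are assumptions.

```typescript
// Assumed merge of server.json, /health, and the service manager query.
interface StatusInfo {
  state: "running" | "stopped";
  pid?: number;
  port?: number;
  version?: string;
  uptimeSeconds?: number;
  sessions?: number;
  service: string;
}

// Render one "Label: value" line per field, with "-" for unknown values.
function formatStatus(info: StatusInfo): string {
  const url = info.port !== undefined ? `http://127.0.0.1:${info.port}` : undefined;
  const uptime = info.uptimeSeconds !== undefined ? `${info.uptimeSeconds}s` : undefined;
  const rows: Array<[string, string | number | undefined]> = [
    ["State", info.state],
    ["PID", info.pid],
    ["Port", info.port],
    ["URL", url],
    ["Version", info.version],
    ["Uptime", uptime],
    ["Sessions", info.sessions],
    ["Service", info.service],
  ];
  return rows.map(([label, value]) => `${label}: ${value ?? "-"}`).join("\n");
}

// Example with made-up values.
const runningReport = formatStatus({
  state: "running",
  pid: 4242,
  port: 7523,
  version: "0.0.0",
  uptimeSeconds: 90,
  sessions: 2,
  service: "registered",
});
console.log(runningReport);
```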

* test(service): CLI verb tests and help update (#258)

18 vitest tests covering start (already-running idempotent, service-managed
start, detached spawn, timeout failure), stop (not-running idempotent,
/shutdown POST, cleanup), restart (stop-then-start, idempotent when stopped),
and status (running/stopped states, service labels, health fields).
Update --help to list start/stop/restart/status verbs.

* progress: VERIFY P2 complete -- 86 files, 1376 passed, 18 new CLI verb tests green

* feat(service): CLI verb tests and --help update (#258)

Add tests/cli-verbs.test.ts with 18 tests covering runStart (already
running idempotent, service manager path, spawn path, failure exit),
runStop (not running idempotent, /shutdown post, file cleanup), runRestart
(stop then start), and runStatus (stopped/running states, service labels,
health fields). Update --help to list start/stop/restart/status verbs.

* feat(service): extend install/uninstall with service lifecycle (#258)

install: in SEA+HTTP mode, register the service unit and start it as a
new numbered step after Beads. Adds Service line to the Done summary.
totalSteps updated; beads step uses baseSteps so numbering is correct.

uninstall: replace hard killApraFleet with svcMgr.stop() (graceful POST
/shutdown + poll + fallback) in the --force path; always call
svcMgr.unregister() before file removal (idempotent, tolerates not-found).

* progress: VERIFY P2 final -- 86 files, 1383 passed, 0 failed

Update T10 and VERIFY P2 entries to reference 37a28b6 (spy-based rewrite
that fixed node:fs factory-mock leakage in fileParallelism:false mode).
Full suite: 86 files, 1383 passed, 13 skipped, 0 failed.

* test(service): install/uninstall service integration tests (#258)

13 tests covering T11+T12: install calls register+start in SEA+HTTP mode,
skips for stdio or dev mode, warns non-fatally on register failure, shows
correct step numbering; uninstall calls stop then unregister in correct
order, skips both in dry-run, swallows unregister errors, guards server-
running check without --force.

* chore: VERIFY P3 -- install/uninstall integration complete

npm run build: clean. npm test: 87 files, 1396 passed, 13 skipped, 0 failed.

* docs(readme): document service model and start/stop/restart/status verbs (#258)

* docs(arch): document service manager architecture (#258)

* chore: VERIFY P4 -- documentation complete, 87 files 1396 passed

* fix(service): quote args in Windows bat wrapper to support paths with spaces (#258)

* fix(service): always run gracefulStop before systemd check in LinuxServiceManager (#258)

* fix(service): XML-escape path values in macOS plist builder (#258)

* fix(service): use SIGKILL not SIGTERM for force-kill fallback on Unix (#258)

* fix(service): delegate stop to ServiceManager when service is installed (#258)

* fix(service): rollback register() if start() fails during install (#258)

* test(service): update bat wrapper test to expect quoted args (#258)

* ci(llms-full): regen after rebase on main

* fix(service): address reviewer findings -- shutdown exit code, shared process-utils, agy transport tests (#258)

R1: wrap each step in shutdown() in its own try/catch and always call
process.exit(0), preventing SIGTERM from triggering systemd/launchd
restart loop on graceful stop.
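The R1 pattern can be sketched as follows: each cleanup step is isolated so that one failure neither skips the remaining steps nor changes the exit code. The `Step` type, `runShutdownSteps`, and the step names in the usage note are hypothetical; only the per-step try/catch and the unconditional process.exit(0) come from the commit message.

```typescript
type Step = { name: string; run: () => void | Promise<void> };

// Run every step in order. A throwing step is recorded and skipped over,
// so later steps still execute.
async function runShutdownSteps(steps: Step[]): Promise<string[]> {
  const failed: string[] = [];
  for (const step of steps) {
    try {
      await step.run();
    } catch {
      failed.push(step.name);
    }
  }
  return failed;
}

// Always exit 0 after a requested stop, so a supervisor configured to
// restart on failure does not treat the graceful stop as a crash.
async function shutdown(steps: Step[]): Promise<void> {
  const failed = await runShutdownSteps(steps);
  if (failed.length > 0) {
    console.error(`shutdown steps failed: ${failed.join(", ")}`);
  }
  process.exit(0);
}
```

A caller might pass steps such as closing sessions, stopping the HTTP listener, and removing server.json; those particular steps are guesses, not taken from the PR.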

R2: extract isPidAlive and postShutdown into src/utils/process-utils.ts;
remove 4 duplicate isPidAlive copies (stop.ts, singleton.ts,
service-manager/index.ts, task-cleanup.ts) and 2 postShutdown copies
(stop.ts, service-manager/index.ts).

R3: add --transport http and --transport stdio test cases for agy to
tests/install-multi-provider.test.ts to match the pattern used by
claude, gemini, codex, and copilot.

---------

Co-authored-by: Bot <bot@apra-fleet.dev>
Co-authored-by: Akhil Kumar <akhil@Akhils-MacBook-Pro.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
kumaakh added a commit that referenced this pull request May 29, 2026
kumaakh added a commit that referenced this pull request May 30, 2026
…) (#273)

* docs(mcp): implementation plan for HTTP+SSE transport (#258)

4-phase plan: event bus + HTTP transport, server refactor with
--transport flag, credential_store_set event wiring + install config,
and documentation. Singleton model with per-session McpServer.

* review: plan review for HTTP+SSE transport (#258)

CHANGES NEEDED -- 3 blocking findings:
- HIGH-1: provider mcp.json config formats underspecified in Task 7
- HIGH-2: singleton startup race condition unaddressed in Task 5
- HIGH-3: SEA binary compatibility not verified

* docs(mcp): revise plan per review -- transport decision, race fix, SEA, provider configs (#258)

* feat(mcp): typed event bus for fleet pub/sub (#258)

* chore: update progress for T1 completion

* review: plan re-review for HTTP+SSE transport (#258)

APPROVED -- all 3 prior HIGH findings resolved:
- HIGH-1: concrete provider configs for Claude/Gemini/Copilot/Codex, port 7523
- HIGH-2: atomic startup lock via fs.openSync(path, 'wx')
- HIGH-3: SEA verification task added to Phase 1

* feat(mcp): HTTP transport with multi-session support (#258)

* test(mcp): verify HTTP transport in SEA binary (#258)

* chore: mark VERIFY Phase 1 completed in progress.json

* review: Phase 1 core abstractions (#258)

* refactor(mcp): extract tool registration into shared module (#258)

* chore: mark task 5 completed in progress.json

* feat(mcp): --transport flag and dual startup paths (#258)

* feat(mcp): singleton lifecycle detection with atomic claim (#258)

* chore: update progress.json -- task 5/6 complete, VERIFY Phase 2 done

* chore: mark VERIFY server refactor + dual transport completed (#258)

* chore: record VERIFY commit SHA in progress.json (#258)

* review: Phase 2 server refactor and dual transport (#258)

* feat(mcp): emit credential:stored event on OOB secret delivery (#258)

* chore: record T7 commit SHA in progress.json (#258)

* feat(mcp): provider-specific HTTP transport install configs (#258)

* chore: record T8 commit SHA in progress.json (#258)

* test(mcp): transport integration tests + Gemini client verification (#258)

* chore: record T9 commit SHA in progress.json (#258)

* chore: mark VERIFY event wiring + client config completed (#258)

* review: Phase 3 event wiring and client config (#258)

* docs(mcp): document HTTP+SSE transport, singleton model, event bus (#258)

* chore: record T10 completion + VERIFY checkpoint results (#258)

* review: Phase 4 docs + final sprint review (#258)

* cleanup: remove fleet control files

* docs(service): OS service lifecycle implementation plan

Add PLAN.md with the implementation plan for making apra-fleet behave
like a normal OS service -- start/stop/restart/status verbs, per-user
service registration folded into install/uninstall, cross-platform
support for Windows (schtasks), Linux (systemd --user), and macOS
(launchd LaunchAgent), all without elevation. Extends PR #273.

* review: OS service lifecycle plan review

* docs(service): revise plan per review -- dev-path, branch, macOS idempotency, stop semantics

* review: OS service lifecycle plan re-review

* feat(service): POST /shutdown endpoint and service constants (#258)

* progress: mark T1 complete (ef84f92)

* feat(service): T2 ServiceManager interface and factory

ServiceManager interface (register, unregister, start, stop, query,
isInstalled) + ServiceStatus type in types.ts. Factory getServiceManager()
selects per-platform adapter (win32/linux/darwin), falling back to
NoopServiceManager on unsupported platforms. gracefulStopByServerJson()
reads server.json and POST /shutdown with 5s pid-poll + SIGTERM fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore(service): add platform adapter stubs to unblock build

Minimal throw-not-implemented stubs for WindowsServiceManager,
LinuxServiceManager, MacOSServiceManager. Created by PM after token
outage interrupted fleet-dev mid-sprint. T3/T4/T5 will replace these
with real schtasks/systemd/launchd implementations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(http-transport): declare claude/channel MCP capability

Adds experimental: { 'claude/channel': {} } to the McpServer capabilities
on each session. Enables server-to-client push via notifications/claude/channel
over the existing SSE stream. POC validated: server can inject messages into
a Claude Code session unprompted.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(service): add T6.5 capability logging task, mark T1/T2 complete

Extends sprint plan with T6.5 (MCP session capability logging, beads 78g):
log clientInfo, capabilities, and channel flag on session init/close.
Marks T1 (ef84f92) and T2 (9963198) as completed in progress.json.
Notes stubs committed for T3/T4/T5 to unblock build after token outage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(service): Windows Scheduled Task adapter (#258)

* feat(service): Linux systemd user unit adapter (#258)

* feat(service): macOS launchd LaunchAgent adapter (#258)

* test(service): service manager adapter unit tests (#258)

* feat(http-transport): log MCP session init and close with client caps (#258)

* progress: mark T6/T6.5 complete, update T3-T5 notes

* progress: VERIFY blocked on build/test approval

* feat(service): service manager unit tests (#258)

* progress: VERIFY passed -- 85 files, 1365 tests green

* progress: mark VERIFY id=8 completed

* review: Phase 1 platform service foundation code review

* feat(service): start and stop CLI commands (#258)

Add runStart and runStop CLI verbs. start checks for a running instance
(idempotent), uses the service manager when a unit is installed, otherwise
spawns a detached process redirected to LOG_FILE_PATH. stop posts /shutdown,
polls up to 5s, falls back to taskkill (Windows) or SIGTERM. Both wired into
src/index.ts dispatch.
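
The stop sequence described above (POST /shutdown, poll for up to 5s, then force-kill) could look roughly like the following. This is a sketch under stated assumptions: `StopDeps`, `stopWithFallback`, and the 100 ms poll interval are hypothetical names and values, and the HTTP call, PID check and kill are injected rather than implemented.

```typescript
// All four dependencies are hypothetical injection points, not the PR's API.
interface StopDeps {
  postShutdown: () => Promise<void>;    // POST /shutdown to the running instance
  isPidAlive: () => boolean;            // is the server process still there?
  forceKill: () => void;                // taskkill on Windows, a signal on Unix
  sleep: (ms: number) => Promise<void>; // delay between polls
}

type StopResult = 'graceful' | 'forced';

async function stopWithFallback(
  deps: StopDeps,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<StopResult> {
  try {
    await deps.postShutdown();
  } catch {
    // The endpoint may be unreachable; continue to polling and the fallback.
  }
  // Poll until the process disappears or the timeout budget is spent.
  for (let waited = 0; waited < timeoutMs; waited += intervalMs) {
    if (!deps.isPidAlive()) return 'graceful';
    await deps.sleep(intervalMs);
  }
  if (!deps.isPidAlive()) return 'graceful';
  deps.forceKill();
  return 'forced';
}
```

Injecting `sleep` and `isPidAlive` lets a test drive the loop without real timers or processes.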

* feat(service): restart CLI command (#258)

Add runRestart: calls runStop then runStart. Wire into index.ts dispatch.
Also commit progress.json update for T7.

* feat(service): status CLI command (#258)

Add runStatus: reads server.json, GET /health for live metrics (version,
uptime, sessions), queries service manager for unit state. Formats output
with State/PID/Port/URL/Version/Uptime/Sessions/Service fields. Wired into
index.ts dispatch.
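
As an illustration of the status output described above, here is a small formatter that prints the eight listed fields as aligned `Label: value` rows. The field labels are from the commit message; the `StatusInfo` shape, the `-` placeholder for missing values, and the seconds-based uptime are assumptions.

```typescript
// Hypothetical shape: values from server.json, GET /health and the service manager.
interface StatusInfo {
  state: 'running' | 'stopped';
  pid?: number;
  port?: number;
  url?: string;
  version?: string;
  uptimeSeconds?: number;
  sessions?: number;
  service: string; // e.g. a label describing the unit state
}

function formatStatus(info: StatusInfo): string {
  const missing = '-';
  const rows: Array<[string, string]> = [
    ['State', info.state],
    ['PID', String(info.pid ?? missing)],
    ['Port', String(info.port ?? missing)],
    ['URL', info.url ?? missing],
    ['Version', info.version ?? missing],
    ['Uptime', info.uptimeSeconds !== undefined ? `${info.uptimeSeconds}s` : missing],
    ['Sessions', String(info.sessions ?? missing)],
    ['Service', info.service],
  ];
  // Pad every label to the longest one so the values line up in a column.
  const width = Math.max(...rows.map(([label]) => label.length));
  return rows
    .map(([label, value]) => `${(label + ':').padEnd(width + 2)}${value}`)
    .join('\n');
}
```

When the server is stopped, only `state` and `service` are known, so the remaining rows fall back to the placeholder.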

* test(service): CLI verb tests and help update (#258)

18 vitest tests covering start (already-running idempotent, service-managed
start, detached spawn, timeout failure), stop (not-running idempotent,
/shutdown POST, cleanup), restart (stop-then-start, idempotent when stopped),
and status (running/stopped states, service labels, health fields).
Update --help to list start/stop/restart/status verbs.

* progress: VERIFY P2 complete -- 86 files, 1376 passed, 18 new CLI verb tests green

* feat(service): CLI verb tests and --help update (#258)

Add tests/cli-verbs.test.ts with 18 tests covering runStart (already
running idempotent, service manager path, spawn path, failure exit),
runStop (not running idempotent, /shutdown post, file cleanup), runRestart
(stop then start), and runStatus (stopped/running states, service labels,
health fields). Update --help to list start/stop/restart/status verbs.

* feat(service): extend install/uninstall with service lifecycle (#258)

install: in SEA+HTTP mode, register the service unit and start it as a
new numbered step after Beads. Adds Service line to the Done summary.
totalSteps updated; beads step uses baseSteps so numbering is correct.

uninstall: replace hard killApraFleet with svcMgr.stop() (graceful POST
/shutdown + poll + fallback) in the --force path; always call
svcMgr.unregister() before file removal (idempotent, tolerates not-found).

* progress: VERIFY P2 final -- 86 files, 1383 passed, 0 failed

Update T10 and VERIFY P2 entries to reference 37a28b6 (spy-based rewrite
that fixed node:fs factory-mock leakage in fileParallelism:false mode).
Full suite: 86 files, 1383 passed, 13 skipped, 0 failed.

* test(service): install/uninstall service integration tests (#258)

13 tests covering T11+T12: install calls register+start in SEA+HTTP mode,
skips for stdio or dev mode, warns non-fatally on register failure, shows
correct step numbering; uninstall calls stop then unregister in that order,
skips both in dry-run, swallows unregister errors, and applies the
server-running guard when --force is not given.

* chore: VERIFY P3 -- install/uninstall integration complete

npm run build: clean. npm test: 87 files, 1396 passed, 13 skipped, 0 failed.

* docs(readme): document service model and start/stop/restart/status verbs (#258)

* docs(arch): document service manager architecture (#258)

* chore: VERIFY P4 -- documentation complete, 87 files 1396 passed

* fix(service): quote args in Windows bat wrapper to support paths with spaces (#258)

* fix(service): always run gracefulStop before systemd check in LinuxServiceManager (#258)

* fix(service): XML-escape path values in macOS plist builder (#258)
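
For context on the plist fix above: a path containing `&`, `<` or `>` produces malformed XML if it is interpolated into a plist verbatim. A minimal sketch of the escaping step follows; `escapeXml` and `plistString` are hypothetical helper names, not necessarily those used in the PR.

```typescript
// Escape the five XML special characters. '&' must be replaced first so the
// ampersands introduced by the later replacements are not escaped again.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Wrap an escaped value in a plist <string> element.
function plistString(value: string): string {
  return `<string>${escapeXml(value)}</string>`;
}
```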

* fix(service): use SIGKILL not SIGTERM for force-kill fallback on Unix (#258)

* fix(service): delegate stop to ServiceManager when service is installed (#258)

* fix(service): rollback register() if start() fails during install (#258)
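
The rollback described above can be sketched as follows, assuming a manager with async `register`, `start` and `unregister` methods. `Registrar` and `registerAndStart` are hypothetical names; only the register/start/unregister ordering is taken from the commit title.

```typescript
// Hypothetical subset of the service manager needed for install.
interface Registrar {
  register(): Promise<void>;
  start(): Promise<void>;
  unregister(): Promise<void>;
}

// Register the unit, then start it. If start fails, undo the registration so
// a failed install does not leave a dead unit behind, and rethrow the cause.
async function registerAndStart(svc: Registrar): Promise<void> {
  await svc.register();
  try {
    await svc.start();
  } catch (startErr) {
    try {
      await svc.unregister();
    } catch {
      // Best-effort rollback; the original start error is the one to report.
    }
    throw startErr;
  }
}
```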

* test(service): update bat wrapper test to expect quoted args (#258)

* ci(llms-full): regen after rebase on main

* fix(service): address reviewer findings -- shutdown exit code, shared process-utils, agy transport tests (#258)

R1: wrap each step in shutdown() in its own try/catch and always call
process.exit(0), so that a graceful stop initiated by SIGTERM does not
trigger a systemd/launchd restart loop.
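
A sketch of the R1 pattern, assuming shutdown is a list of named steps: each step is isolated in its own try/catch so one failure cannot skip the rest, and the exit code is always 0. The `ShutdownStep` type and the injected `exit` and `log` parameters are hypothetical and exist here only to make the behavior testable.

```typescript
// Hypothetical step type; real steps would be things like closing the HTTP
// server or removing server.json.
interface ShutdownStep {
  name: string;
  run: () => void | Promise<void>;
}

async function shutdown(
  steps: ShutdownStep[],
  exit: (code: number) => void,
  log: (msg: string) => void = console.error,
): Promise<void> {
  for (const step of steps) {
    try {
      await step.run();
    } catch (err) {
      // Log and continue: a failing step must not prevent later cleanup.
      log(`shutdown step "${step.name}" failed: ${String(err)}`);
    }
  }
  // Always report success so a supervisor that restarts on failure does not
  // treat a deliberate stop as a crash.
  exit(0);
}
```

In production the `exit` argument would be `process.exit`.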

R2: extract isPidAlive and postShutdown into src/utils/process-utils.ts;
remove 4 duplicate isPidAlive copies (stop.ts, singleton.ts,
service-manager/index.ts, task-cleanup.ts) and 2 postShutdown copies
(stop.ts, service-manager/index.ts).
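
For reference, a common way to write a shared `isPidAlive` in Node is to send signal 0, which performs the existence and permission check without delivering a signal. This is a sketch of that general technique and not necessarily what src/utils/process-utils.ts contains.

```typescript
// Returns true if a process with this PID exists.
function isPidAlive(pid: number): boolean {
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    process.kill(pid, 0); // signal 0: check only, nothing is delivered
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as { code?: string }).code === 'EPERM';
  }
}
```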

R3: add --transport http and --transport stdio test cases for agy to
tests/install-multi-provider.test.ts to match the pattern used by
claude, gemini, codex, and copilot.

---------

Co-authored-by: Bot <bot@apra-fleet.dev>
Co-authored-by: Akhil Kumar <akhil@Akhils-MacBook-Pro.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

feat: switch MCP transport from stdio to HTTP+SSE for server-push and event-driven workflows
