feat(health): restore /health endpoint (port from v1) #2619
Open
johnmathews wants to merge 1 commit into
Restores v1's /health surface that was dropped in the v2 rewrite. Three modules:

- src/health.ts: pure-function snapshot composer producing channel/queue/task/cursor-age status. Takes injected dependencies (HealthDeps) so the pure-assembly half is trivially testable; the I/O-bound side lives in health-snapshot.ts. Exposes formatHealthText() for chat-side reuse.
- src/health-snapshot.ts: supplies the runtime I/O (channel registry, delivery loop state, DB, per-session inbound DBs) and hands it to collectHealth(). Walks active session inbound DBs once per call to derive task counts; v2 has no central task table, so kind='task' rows in messages_in are the source of truth.
- src/health-server.ts: loopback HTTP server on 127.0.0.1, 200/503 from snapshot.healthy, no caching. Port from HEALTH_PORT env var, default 3002.

Wiring in src/index.ts: startHealthServer(port, snapshotHealth) on startup, server.close() in shutdown(). delivery.ts and host-sweep.ts expose getDeliveryPollsRunning() / isHostSweepRunning(), tiny additive accessors the snapshot uses to report messageLoopRunning honestly.

The endpoint is loopback-only by design: the webhook server on 0.0.0.0:3000 is what's externally reachable; /health is for local probes and host-side status commands, never public.

Tests: 19 in src/health.test.ts + 4 in src/health-server.test.ts.
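As a rough illustration of the dependency-injection split described above, here is a minimal sketch of a pure composer. Only the names HealthDeps and collectHealth come from the PR; every field name, the health rule, and the MAX_CURSOR_AGE_MS threshold are assumptions made for the example, not the PR's actual shapes.

```typescript
// Hypothetical dependency shape: the PR names HealthDeps but does not list its fields.
interface HealthDeps {
  now: () => number; // injected clock, ms since epoch
  channels: () => { name: string; connected: boolean }[];
  queueDepth: () => number;
  messageLoopRunning: () => boolean;
  lastCursorAt: () => number | null; // time of the last cursor advance, if any
}

// Hypothetical snapshot shape; only the `healthy` flag is confirmed by the PR.
interface HealthSnapshot {
  healthy: boolean;
  messageLoopRunning: boolean;
  channels: { name: string; connected: boolean }[];
  queueDepth: number;
  cursorAgeMs: number | null;
}

// Assumed staleness threshold (5 minutes); the real rule may differ.
const MAX_CURSOR_AGE_MS = 5 * 60 * 1000;

// Pure assembly: no I/O here, everything arrives through `deps`.
function collectHealth(deps: HealthDeps): HealthSnapshot {
  const channels = deps.channels();
  const messageLoopRunning = deps.messageLoopRunning();
  const last = deps.lastCursorAt();
  const cursorAgeMs = last === null ? null : Math.max(0, deps.now() - last);
  const healthy =
    messageLoopRunning &&
    channels.every((c) => c.connected) &&
    (cursorAgeMs === null || cursorAgeMs <= MAX_CURSOR_AGE_MS);
  return {
    healthy,
    messageLoopRunning,
    channels,
    queueDepth: deps.queueDepth(),
    cursorAgeMs,
  };
}

// In-memory fakes, the way a unit test might supply them.
const fakeDeps: HealthDeps = {
  now: () => 1_000_000,
  channels: () => [{ name: "example", connected: true }],
  queueDepth: () => 2,
  messageLoopRunning: () => true,
  lastCursorAt: () => 970_000,
};

const snapshot = collectHealth(fakeDeps);
console.log(snapshot.healthy, snapshot.cursorAgeMs); // true 30000
```

Because the composer only sees plain functions, a test can flip a single dependency (for example messageLoopRunning) and assert on the resulting snapshot without touching a database or a socket.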
This was referenced May 26, 2026
Summary
Restores v1's /health endpoint that was dropped in the v2 rewrite. It is a loopback-only HTTP probe with no public surface, composing channel/queue/task/cursor-age status from existing runtime state. Production-hardened on my fork (multi-hour uptime). Closer to a regression fix than a new feature.
What's in the PR
New host modules:
- src/health.ts: pure-function snapshot composer (collectHealth) plus a text formatter (formatHealthText) that's useful for any host-side status command. Takes injected dependencies (HealthDeps) so the pure-assembly half is trivially unit-testable.
- src/health-snapshot.ts: supplies the runtime I/O (channel registry, delivery/sweep loop state, central DB, per-session inbound DBs) and hands it to collectHealth. v2 has no central task table, so it walks each active session's inbound.db once per request, counting kind='task' rows for active/paused/recent-failures and taking the minimum process_after for the next scheduled run. Trivially fast at single-digit session counts.
- src/health-server.ts: http.createServer bound to 127.0.0.1. Returns 200/503 from snapshot.healthy, no caching. Port from HEALTH_PORT env var, default 3002.
Modified host modules (additive only):
- src/index.ts: wires startHealthServer(port, snapshotHealth) after the CLI socket server in main() and tears it down at the top of shutdown(). No other lifecycle changes.
- src/host-sweep.ts: adds isHostSweepRunning(): boolean (5 LOC). Pure accessor for the existing running module flag, read by the snapshot so it can report messageLoopRunning honestly.
- src/delivery.ts: adds getDeliveryPollsRunning(): boolean (5 LOC). Pure accessor for the existing activePolling && sweepPolling flags.
Why loopback-only
The webhook server on 0.0.0.0:3000 is the only externally reachable surface; /health is for local probes (process supervisor, systemd, future host-side status commands) and intentionally never public. Binding 127.0.0.1 makes that explicit at the kernel level rather than relying on documentation.
Stats
Tests
- src/health.test.ts (snapshot composition, age formatter, text formatter, edge cases)
- src/health-server.test.ts (200 OK, 503 unhealthy, 404 non-/health, 500 on snapshot throw)
- pnpm exec tsc --noEmit: clean
Paired-but-not-included: systemd watchdog
On my fork this commit was bundled with a src/watchdog.ts module that sends sd_notify READY=1 / WATCHDOG=1 / STOPPING=1, and a setup/service.ts change adding Type=notify + WatchdogSec=30s to the systemd unit. I'm deliberately not upstreaming that half:
- Whether to add sd_notify support is a separate design decision; it should land (or not) as its own PR with the unit-file change attached.
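For reviewers who want a concrete picture of the server behavior listed under Tests (200 when healthy, 503 when unhealthy, 404 for any other path, 500 when the snapshot throws), a minimal sketch might look like the following. It is not the PR's code: the Snapshot shape, the respond helper, the JSON error bodies, and the assumption that the snapshot function is synchronous are all illustrative. Only startHealthServer(port, snapshotHealth), http.createServer, and the 127.0.0.1 binding come from the description.

```typescript
import * as http from "node:http";

// Assumed minimal snapshot shape; the real snapshot carries more fields.
interface Snapshot {
  healthy: boolean;
}

// Hypothetical pure helper: maps a request URL and a snapshot provider
// to a status code and body, so the routing can be tested without a socket.
function respond(
  url: string | undefined,
  snapshot: () => Snapshot,
): { status: number; body: string } {
  if (url !== "/health") {
    return { status: 404, body: JSON.stringify({ error: "not found" }) };
  }
  try {
    const snap = snapshot(); // computed fresh on every request, no caching
    return { status: snap.healthy ? 200 : 503, body: JSON.stringify(snap) };
  } catch {
    return { status: 500, body: JSON.stringify({ error: "snapshot failed" }) };
  }
}

// Binds to 127.0.0.1 only, so the probe is unreachable from other hosts.
function startHealthServer(port: number, snapshot: () => Snapshot): http.Server {
  const server = http.createServer((req, res) => {
    const { status, body } = respond(req.url, snapshot);
    res.writeHead(status, { "Content-Type": "application/json" });
    res.end(body);
  });
  server.listen(port, "127.0.0.1");
  return server;
}
```

Under these assumptions, startup would call startHealthServer(Number(process.env.HEALTH_PORT ?? 3002), snapshotHealth) and shutdown would call close() on the returned server, matching the wiring described for src/index.ts.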