Skip to content

fix(cli): require daemon transport for read commands#933

Merged
wesm merged 14 commits into
mainfrom
middleman/issue-929-agents-view-fails-to-load-history
Jul 1, 2026
Merged

fix(cli): require daemon transport for read commands#933
wesm merged 14 commits into
mainfrom
middleman/issue-929-agents-view-fails-to-load-history

Conversation

@mariusvniekerk

Copy link
Copy Markdown
Collaborator

Read commands still had compatibility paths that could open SQLite directly when daemon discovery did not yield HTTP. That undermined the daemon-first migration and left upgraded users exposed to old schema/read-only failures instead of letting the writable daemon own migrations.

This makes daemon-capable reads resolve through daemon transport: they use reachable daemons, start or replace the local daemon where appropriate, and fail with restart guidance when the daemon is unreachable or incompatible instead of falling back. The remaining direct DB access is limited to commands whose contract is explicitly local, offline, or writer-owned, such as raw source export and offline archive queries.

generated by a clanker

@roborev-ci

roborev-ci Bot commented Jun 30, 2026

Copy link
Copy Markdown

roborev: Combined Review (2d24df4)

Summary verdict: Changes have medium-risk functional regressions in daemon-backed read paths; no security issues were reported.

Medium

  • cmd/agentsview/health.go:82
    health [session-id] no longer resolves the short IDs printed by the health list. The old path used resolveSessionID, which handled partial and ambiguity-aware matching; resolveServiceSessionID only tries exact IDs and agent-prefixed exact IDs.
    Fix: Preserve health-specific partial ID resolution, either through a service/HTTP ID-resolution endpoint or equivalent ambiguity-aware lookup before svc.Get.

  • cmd/agentsview/health.go:59
    Health list now silently filters out one-shot and automated sessions. service.ListFilter{Limit: limit} maps zero bools to ExcludeOneShot and ExcludeAutomated, while the previous db.SessionFilter{Limit: limit} included them.
    Fix: Set IncludeOneShot: true and IncludeAutomated: true when listing health sessions unless explicit filtering flags are added.

  • internal/server/huma_routes_metadata.go:65
    agentsview stats --include-github-outcomes loses GitHub PR stats over the new daemon path. The CLI still resolves GHToken, but the HTTP backend cannot send it, and the server builds StatsFilter without a token, so computeOutcomeStats never runs PR aggregation.
    Fix: When IncludeGitHubOutcomes is true, populate GHToken server-side using the existing s.githubToken(ctx) path.

  • cmd/agentsview/transport.go:193
    Read intent can return an already-running older compatible daemon without replacement. Because /api/v1/session-stats is newly added, agentsview stats after a client upgrade can hit an old daemon that lacks the route and fail with 404 instead of upgrading/restarting or giving restart guidance.
    Fix: Apply the existing older-daemon replacement check to transportIntentRead, or map missing required read endpoints to a clear daemon-restart error.


Panel: ci_default_security | Synthesis: codex, 12s | Members: codex_default (codex/default, done, 8m20s), codex_security (codex/security, done, 3m15s) | Total: 11m47s

@roborev-ci

roborev-ci Bot commented Jun 30, 2026

Copy link
Copy Markdown

roborev: Combined Review (415d956)

Summary verdict: Two medium-severity regressions need attention; no high or critical findings were reported.

Medium

  • cmd/agentsview/transport.go:203

    • Problem: The read-intent daemon upgrade path overwrites cfg.NoSync with the old runtime's NoSync value. For --no-sync archive queries, resolveArchiveQueryTransport sets cfg.NoSync = true, but an older daemon with NoSync=false clears it here and restarts a syncing daemon.
    • Fix: Preserve an explicit no-sync request, e.g. use cfg.NoSync = cfg.NoSync || tr.Runtime.NoSync, and apply the same fix to the incompatible-runtime branch at line 223.
  • cmd/agentsview/health.go:122

    • Problem: resolveHealthSessionID resolves IDs by scanning one List page capped at 500 root sessions. Exact IDs for older sessions outside that page, or child/subagent sessions excluded from the list, are now reported as not found even though svc.Get could retrieve them.
    • Fix: Try svc.Get(ctx, partial) for exact IDs before list-based partial matching, and use a dedicated lookup or paginated path if partial matching must cover more than the first root page.

Panel: ci_default_security | Synthesis: codex, 9s | Members: codex_default (codex/default, done, 6m10s), codex_security (codex/security, done, 3m5s) | Total: 9m24s

@roborev-ci

roborev-ci Bot commented Jun 30, 2026

Copy link
Copy Markdown

roborev: Combined Review (6ccdfd7)

The PR has medium-severity regressions around daemon-backed read paths and ambiguous session ID resolution.

Medium

  • cmd/agentsview/health.go:122
    resolveHealthSessionID returns an exact svc.Get match before checking for short-ID collisions. If one session ID is exactly abcdef12 and another displays as short ID abcdef12, health abcdef12 silently picks the exact row instead of reporting ambiguity. Resolve partial matches before returning exact matches, preserving the existing exact-vs-shortID collision behavior.

  • cmd/agentsview/stats.go:143
    agentsview stats now always uses resolveService, so a discovered read-only pg serve daemon is preferred. The new /api/v1/session-stats endpoint returns db.ErrReadOnly for PG-backed NewReadOnlyBackend, causing stats to fail whenever pg serve is running. Either implement stats for the PG/read-only backend or make openStatsService avoid unsupported read-only daemon transports.

  • cmd/agentsview/archive_query_backend.go:75
    archiveQuerySkipReadOnlyDaemon rejects read-only daemons, breaking usage daily and usage statusline when a PG pg serve daemon is active even though the PG-backed server implements /api/v1/usage/summary. The error also suggests --pg, but these commands do not expose that flag. Let usage commands use supported read-only daemon endpoints, or add a real PG/direct fallback with accurate guidance.


Panel: ci_default_security | Synthesis: codex, 10s | Members: codex_default (codex/default, done, 7m29s), codex_security (codex/security, done, 5m1s) | Total: 12m40s

@roborev-ci

roborev-ci Bot commented Jun 30, 2026

Copy link
Copy Markdown

roborev: Combined Review (9891055)

agentsview stats has one medium regression around PostgreSQL daemon routing; no critical or high findings were reported.

Medium

  • cmd/agentsview/stats.go:143: stats now always resolves through the discovered daemon, but /api/v1/session-stats is only backed by local SQLite. If a read-only pg serve daemon is registered, agentsview stats now fails with read-only/not-implemented instead of using the local SQLite archive as it did before.
    • Fix: Either implement session stats for the PostgreSQL store, or have openStatsService bypass/read around read-only daemon transports until PG parity exists.

Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 10m6s), codex_security (codex/security, done, 5m32s) | Total: 15m46s

@roborev-ci

roborev-ci Bot commented Jun 30, 2026

Copy link
Copy Markdown

roborev: Combined Review (27f1840)

Summary verdict: two medium-severity regressions need attention; no critical or high findings were reported.

Medium

  • Location: cmd/agentsview/transport.go:176
    Problem: Read-intent transport now waits for daemon startup via detectTransportContext, but only archive-write intent adopts a freshly generated auth token after another background launch finishes. With require_auth enabled, a concurrent read command that loaded config before the launcher wrote auth_token will wait for the daemon, fail to authenticate the runtime probe, and report the daemon as unreachable instead of using it.
    Fix: Apply the background-launch wait/adopt path to transportIntentRead as well, or otherwise reload/adopt auth config after waiting for an external daemon startup before probing.

  • Location: internal/server/huma_routes_metadata.go:18 and internal/service/http.go:304
    Problem: stats now uses the daemon endpoint for writable daemons, but /api/v1/session-stats is registered with the default Huma write timeout and the HTTP backend calls it through the 30s client. Large stats windows or --include-github-outcomes can legitimately run longer, introducing a timeout regression versus the previous direct local stats path.
    Fix: Register session-stats as a long-running route and use the HTTP backend’s no-timeout/long-running client for Stats.


Panel: ci_default_security | Synthesis: codex, 10s | Members: codex_default (codex/default, done, 9m5s), codex_security (codex/security, done, 3m14s) | Total: 12m29s

…ly open

0.35 added a strict read-only schema check (#826) alongside the new
tool_calls.file_path column (#846). Read-only commands never run
migrations, so an archive last written by 0.34.x is missing that column.
When the upgrade was done while an older daemon still owned the archive,
every read-only open failed with a bare "schema missing
tool_calls.file_path" and no way forward (issue #929); reverting to
0.34.5, which predates the check, was the only escape.

OpenReadOnly now returns a typed SchemaUpgradeRequiredError, and the CLI
maps it to actionable guidance: the pending migration only runs on a
writable open, so the user must let the daemon restart to upgrade the
archive. The historical error text is preserved so existing diagnostics
still match.
Read-oriented CLI commands were still able to bypass daemon discovery and open the SQLite archive directly. That made the daemon-first migration incomplete: upgraded users could keep hitting unmigrated archives or stale daemon state through a compatibility path that is no longer supported.

Daemon-capable reads now use the discovered daemon, start or replace it when appropriate, or fail with explicit guidance instead of falling back to local DB access. The only remaining local read paths are the explicit offline/archive cases and raw source-file export behavior that do not go through the daemon transport contract.

Validation: go fmt ./...; CGO_ENABLED=1 go test -tags fts5 ./cmd/agentsview/... ./internal/service/... ./internal/server/... ./internal/db/... -count=1; CGO_ENABLED=1 go vet -tags fts5 ./...
Daemon-backed read commands need to keep the same user-facing behavior as the former direct SQLite path while still forcing reads through the daemon. Health output should include the same one-shot and automated sessions it reported before, health detail should accept the short IDs printed by that list, and stats with GitHub outcomes should use the daemon's server-side token instead of losing outcome aggregation.

Read intents also need to replace older compatible daemons, because newly added read endpoints can otherwise hit a stale process and fail with missing-route errors instead of launching the upgraded daemon.
The daemon replacement path must preserve an explicit no-sync request from archive reads instead of inheriting only the old daemon runtime flags. Otherwise replacing an older daemon can unexpectedly restart syncing work for a command that deliberately asked not to sync.

Health detail also needs exact session lookup before partial list matching, because exact IDs can refer to sessions outside the root health list page, including child or subagent sessions that remain individually addressable.
Background replacement can decide to stop an older daemon while another foreground startup is still publishing its replacement runtime. On slower Windows CI, that startup can finish before the background path observes the start lock, leaving the replacement code to act on the stale original decision.

Refresh the current replacement target immediately before stopping a daemon so newly published writable runtimes are reused or reprobed instead of being stopped through an outdated runtime record.
Daemon-backed reads need to preserve health ID semantics and avoid stale daemon endpoints even when autostart is disabled. Exact health lookup now still checks displayed short-ID collisions, and read intent reports restart guidance instead of using an older daemon that may lack new read endpoints.

Replacement refresh also keeps a previously selected target when the fresh probe temporarily cannot ping it but the runtime record remains stop-confirmed, avoiding false target changes in the stop/start path.
Session stats still depend on the SQLite-only v1 stats implementation. Routing the stats command through a read-only pg serve daemon therefore regressed local archive users by turning a previously local query into a remote not-implemented/read-only failure.

Keep writable daemon routing intact, but bypass read-only daemon transports for stats until PostgreSQL has session-stats parity. The regression test locks in that a pg serve runtime is ignored and the local SQLite archive remains the source of truth.
The Windows Go job failed in the daemon replacement test because the helper subprocess can report that it acquired the kit start lock before the parent-side probe observes that lock. The test then enters the replacement path and trips the stop guard, even though the behavior under test is the wait/reuse branch.

Make the helper return only after the parent can observe the external start lock. This keeps the coverage on the same replacement contract while removing the Windows-only synchronization race in the harness.
Read-intent commands can wait for a concurrently starting authenticated background daemon, but they previously kept the stale empty token loaded before that daemon persisted config. That made the freshly started daemon look unreachable and pushed read commands into the direct-read-only failure path.

Reload the background launch config before retrying detection when a read intent lands in direct-read-only with no token. Also bound the projects daemon request so a stalled endpoint cannot hang the CLI indefinitely.
Daemon-backed stats now exposes the session-stats API path to CLI users, so invalid filters need to remain user-correctable validation failures instead of being reported as internal server faults. Mark stats filter validation errors explicitly and map them to HTTP 400 while leaving database and aggregation failures on the internal-error path.
Health detail resolution needs to behave like the former direct archive path after read commands move behind daemon transports. Filtering the first health list page misses older substring matches, so partial ID resolution now uses a dedicated service/API lookup with SQLite, DuckDB, and PostgreSQL implementations.

The archive-query read-only daemon path also needs command-appropriate guidance: usage daily and statusline do not support --pg, so only session-usage reject paths keep that suggestion.
Health short-ID lookup should preserve the old strings.Contains semantics after moving behind the daemon API. Raw LIKE patterns made underscores and percent signs act as wildcards and let SQLite differ from PostgreSQL and DuckDB on case handling.

Use literal, case-sensitive substring predicates across stores, document the service contract, and move the resolver endpoint outside the /sessions/{id} namespace so it cannot shadow a real session ID.
The daemon-backed stats command depends on /api/v1/session-stats. Without an API version bump, an upgraded CLI can treat an already-running older daemon as compatible and fail against a missing route instead of replacing it.

Bump the daemon/server API version so endpoint capability changes participate in the existing compatibility and replacement path.
The partial-ID resolver now promises literal, case-sensitive matching across stores. Add PostgreSQL pgtest coverage for wildcard characters and case variants so PG cannot drift from SQLite and DuckDB semantics.
@mariusvniekerk mariusvniekerk force-pushed the middleman/issue-929-agents-view-fails-to-load-history branch from 5c1b53e to ade0b2a Compare June 30, 2026 23:14
@roborev-ci

roborev-ci Bot commented Jun 30, 2026

Copy link
Copy Markdown

roborev: Combined Review (ade0b2a)

Summary verdict: one medium issue remains; no high or critical findings were reported.

Medium

  • internal/service/http.go:317 - agentsview stats --include-github-outcomes still resolves a CLI-side GitHub token, but the HTTP stats backend never sends StatsFilter.GHToken to the daemon. When stats uses a discovered daemon, per-invocation AGENTSVIEW_GITHUB_TOKEN or gh auth token from the CLI can be silently ignored, so PR outcome fields may be missing unless the daemon itself has the token.

    Fix: Preserve existing CLI semantics by either using the direct local stats path when GHToken is set, or by passing the token to the local daemon through a protected header/body field and having /api/v1/session-stats prefer it. Add a test for daemon-backed stats with a CLI-provided GitHub token.


Panel: ci_default_security | Synthesis: codex, 10s | Members: codex_default (codex/default, done, 9m57s), codex_security (codex/security, done, 3m8s) | Total: 13m15s

@mariusvniekerk

Copy link
Copy Markdown
Collaborator Author

For the latest roborev finding on agentsview stats --include-github-outcomes: accepting this as an intentional daemon credential-boundary tradeoff.

Daemon-backed stats should use daemon-side GitHub credentials. Forwarding a per-invocation CLI token through /api/v1/session-stats would expand the credential surface for additive outcome metadata, including potential exposure through request logging, tracing, debugging proxies, or future middleware. The core stats remain correct; only GitHub PR outcome enrichment depends on the daemon having its own token configured.

So the current behavior is acceptable: CLI-local GitHub tokens are not forwarded over the daemon API. Users who need daemon-backed GitHub outcome stats should configure the daemon environment/token.

generated by a clanker

@wesm wesm merged commit 9defe23 into main Jul 1, 2026
18 checks passed
@wesm wesm deleted the middleman/issue-929-agents-view-fails-to-load-history branch July 1, 2026 00:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants