Require explicit pg-sync source URLs #4576

Open
KyleAMathews wants to merge 9 commits into main from require-pg-sync-url

KyleAMathews commented Jun 11, 2026

Summary

pg-sync observations (the observe_pg_sync Horton tool) now require an explicit Electric shape endpoint URL, and the server validates that endpoint at registration time instead of failing silently. This fixes the core bug where pg-sync observations registered "successfully" but never delivered any row changes to the agent.

Root cause

Two independent defects made pg-sync observations silently dead:

  1. Per-request metadata polluted the source identity. sourceRefForPgSync() hashed the entire canonical options object, including the metadata field (principal, tenant, wakeId, runtimeConsumerId). Since wakeId is unique per wake, every re-observation minted a brand-new sourceRef → a brand-new empty durable stream → a brand-new bridge that subscribed from offset: now. The change that triggered the wake lived on the previous wake's stream, so the agent's handle DB was always empty. Old bridges leaked, one ShapeStream per wake.

  2. Registration succeeded before anything touched Electric. The server returned 200 and the tool reported success before the bridge ever connected. The real connection happened later inside the bridge's ShapeStream subscription, and any failure (wrong URL, unreachable host, nonexistent table) landed in an infinite recoverStream() backoff loop with only a server-side warn. A bare host URL like http://localhost:30000 is especially deadly: Electric answers 200 with an empty body on its root path, so even an "ok" check passed while the shape API actually lives at /v1/shape.

Approach

Stable identity. sourceRefForPgSync() now strips metadata before hashing, so identity derives only from the observed shape (url, table, columns, where, params, replica). Re-registrations reuse the same bridge and stream. The server's register() reuses this same helper, so client and server agree on sourceRef by construction.
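
As an illustration of the stable-identity idea, here is a minimal sketch that derives a ref only from the shape fields and ignores metadata. The `PgSyncOptionsSketch` type, `canonicalize`, the FNV-1a digest, and the `pg-sync:` prefix are all assumptions made for this example; the real helper's hash function and ref format are not shown in this PR description.

```typescript
// Hypothetical option shape; field names follow the PR description, not the real types.
interface PgSyncOptionsSketch {
  url: string
  table: string
  columns?: string[]
  where?: string
  params?: string[]
  replica?: string
  metadata?: Record<string, string> // principal, tenant, wakeId, runtimeConsumerId
}

// Serialize with sorted keys so property order cannot change the hash input.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(',')}]`
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>
    const entries = Object.keys(obj)
      .filter((key) => obj[key] !== undefined)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`)
    return `{${entries.join(',')}}`
  }
  return JSON.stringify(value)
}

// FNV-1a (32-bit) as a stand-in digest; the real helper may use a different hash.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return ('00000000' + hash.toString(16)).slice(-8)
}

// Identity is built from the observed shape only; metadata is never read.
function sourceRefSketch(options: PgSyncOptionsSketch): string {
  const shape = {
    url: options.url,
    table: options.table,
    columns: options.columns,
    where: options.where,
    params: options.params,
    replica: options.replica,
  }
  return `pg-sync:${fnv1a(canonicalize(shape))}`
}
```

With this sketch, two calls that differ only in `metadata.wakeId` return the same ref, while changing `table` changes it.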

Registration-time probe. PgSyncBridgeManager.register() fetches the shape log once before persisting a registry row or starting a bridge:

const probeUrl = buildShapeProbeUrl(source.url, options) // offset=now, same param encoding as the bridge
const response = await fetchFn(probeUrl, { signal: AbortSignal.timeout(this.probeTimeoutMs) })
if (!response.ok) throw new PgSyncSourceValidationError(/* surfaces Electric's status + body */)
if (!response.headers.get('electric-handle')) throw new PgSyncSourceValidationError(/* suggests /v1/shape */)

A genuine shape response always carries the electric-handle header — that's what distinguishes a real shape endpoint from Electric's root path. PgSyncSourceValidationError maps to HTTP 400 in the router, so the failure reaches the agent as actionable feedback (it can correct the URL/table and retry) rather than dying in a retry loop. The probe runs only for not-yet-running sources, so the hot path stays a single round-trip.
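
The probe and its error mapping can be sketched in isolation roughly as follows. This is not the PR's implementation: `ProbeResponse`, `FetchLike`, `SourceValidationErrorSketch`, `probeSourceSketch`, and `statusForError` are hypothetical names, the fetch function is injected so the sketch needs no network, and the timeout handling from the snippet above is left out.

```typescript
// Minimal response and fetch shapes so the sketch does not depend on DOM or Node typings.
interface ProbeResponse {
  ok: boolean
  status: number
  headers: { get(name: string): string | null }
  text(): Promise<string>
}
type FetchLike = (url: string) => Promise<ProbeResponse>

// Stand-in for PgSyncSourceValidationError.
class SourceValidationErrorSketch extends Error {}

// Fetch the shape log once and reject anything that is not a real shape response.
async function probeSourceSketch(probeUrl: string, fetchFn: FetchLike): Promise<void> {
  let response: ProbeResponse
  try {
    response = await fetchFn(probeUrl)
  } catch (cause) {
    throw new SourceValidationErrorSketch(`Could not reach ${probeUrl}: ${String(cause)}`)
  }
  if (!response.ok) {
    // Surface Electric's own status and body so the agent can correct the request.
    const body = await response.text()
    throw new SourceValidationErrorSketch(`Electric responded ${response.status}: ${body}`)
  }
  if (!response.headers.get('electric-handle')) {
    // A 200 without the header is what a bare host URL produces.
    throw new SourceValidationErrorSketch(
      `${probeUrl} did not return a shape log; the shape API lives at /v1/shape`
    )
  }
}

// Router-side mapping: validation failures become 400, anything else stays a 500.
function statusForError(error: unknown): number {
  return error instanceof SourceValidationErrorSketch ? 400 : 500
}
```

Injecting `fetchFn` mirrors the `fetchFn` parameter in the snippet above and lets tests cover each failure class (unreachable host, non-2xx status, missing header) without a live Electric instance.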

Prompt guidance. The Horton system prompt and tool descriptions now tell the model that url is an HTTP(S) shape endpoint (with an example), not a postgres:// string; that there is no default; and that it should ask the user rather than guess.

Key invariants

  • sourceRefForPgSync(options) depends only on shape identity, never on request context. Same shape ⇒ same ref ⇒ same bridge/stream, regardless of who or which wake registers it.
  • A successful registration implies the shape endpoint was reachable and returned a real shape log (had electric-handle).
  • The probe encodes shape params identically to the bridge's ShapeStream, so "probe passed" predicts "bridge can connect" (caveat under Trade-offs).
  • No partial registration: a failed probe writes no registry row, ensures no stream, constructs no ShapeStream.
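
The ordering behind the last invariant can be sketched as below: probe first, mutate state only afterwards, and skip the probe for sources that are already running. `BridgeManagerSketch`, its `registry` map, and its `running` set are hypothetical stand-ins for the real registry row, durable stream, and ShapeStream bridge.

```typescript
type Probe = (url: string) => Promise<void>

class BridgeManagerSketch {
  // sourceRef -> url, standing in for the persisted registry row.
  readonly registry = new Map<string, string>()
  // sourceRefs that already have a live bridge.
  readonly running = new Set<string>()
  // Counts probes so the hot-path behavior is observable.
  probeCount = 0
  private readonly probe: Probe

  constructor(probe: Probe) {
    this.probe = probe
  }

  async register(sourceRef: string, url: string): Promise<string> {
    // Already running: no probe, so re-registration stays a single round-trip.
    if (this.running.has(sourceRef)) return sourceRef

    // The probe runs before any state changes; if it throws, nothing below executes.
    this.probeCount += 1
    await this.probe(url)

    this.registry.set(sourceRef, url)
    this.running.add(sourceRef)
    return sourceRef
  }
}
```

Because every write happens after the awaited probe, a rejected probe leaves both `registry` and `running` untouched, which is the "no partial registration" property in miniature.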

Non-goals

  • No backwards compatibility. The URL was previously optional with a server-side default; that default is removed. Registry rows persisted before this change (without a url) can never resume, so server startup now deletes them with one clear log line instead of warn-spamming every boot — affected agents must re-observe. This was an explicit decision (the user-facing contract was already broken).
  • The probe is a registration-time liveness/validity check, not a permanent guarantee — the bridge's existing retry loop still handles Electric going down later.
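
The startup cleanup described in the first bullet might look roughly like this. `BridgeRowSketch`, `pruneLegacyRows`, and the injected `deleteRow` and `log` callbacks are hypothetical; the PR names a `deletePgSyncBridge` method on the entity registry but does not show its signature, and the log wording here is invented.

```typescript
// Hypothetical persisted row; rows written before this change have no url.
interface BridgeRowSketch {
  sourceRef: string
  url?: string
}

// Delete rows that can never resume and report them with a single log line.
function pruneLegacyRows(
  rows: BridgeRowSketch[],
  deleteRow: (sourceRef: string) => void,
  log: (message: string) => void
): BridgeRowSketch[] {
  const resumable: BridgeRowSketch[] = []
  const legacy: BridgeRowSketch[] = []
  for (const row of rows) {
    if (typeof row.url === 'string' && row.url.length > 0) resumable.push(row)
    else legacy.push(row)
  }

  for (const row of legacy) deleteRow(row.sourceRef)

  // One summary line per boot instead of one warning per row.
  if (legacy.length > 0) {
    log(`Deleted ${legacy.length} legacy pg-sync registry row(s) without a url; affected agents must re-observe`)
  }
  return resumable
}
```

The function returns only the rows that still carry a url, so the caller can start bridges for exactly those.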

Trade-offs

  • Probe param encoding is a parallel implementation. buildShapeProbeUrl re-derives the Electric TS client's query encoding (comma-joined arrays, params[n]). Unlike the client it does not quote column identifiers, so probe and stream encoding can diverge for exotic column names. Documented in a comment; acceptable because the probe is a validity check, not the data path.
  • electric-handle heuristic. A proxy/CDN in front of Electric that strips response headers would fail the probe even if the shape works. Low risk, and the error message points at /v1/shape, so it's debuggable.
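
To make the first trade-off concrete, here is a sketch of the kind of encoding the probe URL builder performs: arrays comma-joined, positional params as `params[n]`, and `offset=now`. The function name, the `ShapeOptionsSketch` type, the parameter order, and the 1-based `params[n]` index are assumptions for illustration; the real `buildShapeProbeUrl` may differ in each of these.

```typescript
// Hypothetical subset of the shape options used to build the query string.
interface ShapeOptionsSketch {
  table: string
  columns?: string[]
  where?: string
  params?: string[]
  replica?: string
}

function buildShapeProbeUrlSketch(baseUrl: string, options: ShapeOptionsSketch): string {
  const pairs: Array<[string, string]> = [
    ['table', options.table],
    ['offset', 'now'],
  ]
  // Columns are comma-joined as-is. Identifiers are NOT quoted here, which is
  // where the probe can diverge from the client for exotic column names.
  if (options.columns && options.columns.length > 0) {
    pairs.push(['columns', options.columns.join(',')])
  }
  if (options.where) pairs.push(['where', options.where])
  // Positional params become params[1], params[2], ... (1-based index assumed).
  const params = options.params ?? []
  params.forEach((value, index) => {
    pairs.push([`params[${index + 1}]`, value])
  })
  if (options.replica) pairs.push(['replica', options.replica])

  const query = pairs
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join('&')
  return `${baseUrl}${baseUrl.includes('?') ? '&' : '?'}${query}`
}
```

A column name containing a comma or a double quote would pass through this builder unquoted, which is the divergence the trade-off accepts for a validity check that is not on the data path.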

Verification

# All pg-sync-related suites (the 2 dashboard-asset failures in oss-server-router are pre-existing and unrelated)
pnpm vitest run \
  packages/agents-server/test/pg-sync-bridge-manager.test.ts \
  packages/agents-server/test/pg-sync-router.test.ts \
  packages/agents-server/test/manifest-side-effects.test.ts \
  packages/agents-server/test/wake-registry.test.ts \
  packages/agents-runtime/test/process-wake.test.ts \
  packages/agents-runtime/test/runtime-server-client-pg-sync.test.ts \
  packages/agents/test/observe-pg-sync-tool.test.ts --reporter=basic

# Typecheck
pnpm -F @electric-ax/agents-server -F @electric-ax/agents-runtime -F @electric-ax/agents typecheck

Manually verified end-to-end against a live Electric instance: a bare URL (http://localhost:30000) now fails fast with a /v1/shape suggestion, the agent retries with the correct URL, registration succeeds, and row changes flow through to wakes.

Files changed

Core fix

  • agents-runtime/src/observation-sources.ts — strip metadata from the identity hash.
  • agents-server/src/pg-sync-bridge-manager.ts — buildShapeProbeUrl, probeSource, PgSyncSourceValidationError, electric-handle check; delete legacy url-less rows at startup; drop the dead second param of pgSyncMessageToDurableEvent; warn when a change message is dropped.
  • agents-server/src/routing/pg-sync-router.ts — require url; map PgSyncSourceValidationError → 400; log unexpected 500s.
  • agents-server/src/entity-registry.ts — deletePgSyncBridge.
  • agents-server/src/manifest-side-effects.ts — resolve pgSync source URL from config.streamUrl; drop the tenant-less fallback path.
  • agents-runtime/src/process-wake.ts — withRegisteredManifestEntry helper rebinds entities/pgSync sources to their server-assigned ref/streamUrl.
  • agents-runtime/src/setup-context.ts, types.ts — propagate streamUrl on the observation handle.
  • agents-server/src/wake-registry.ts — read pg-sync old_value into change.oldValue.

Tool & prompt

  • agents/src/tools/observe-pg-sync.ts — required url; return the server handle's streamUrl (throw if absent, no guessed fallback).
  • agents/src/agents/horton.ts — "Observing Postgres tables" prompt section.

Tests — probe failure taxonomy, sourceRef stability under differing metadata, legacy-row deletion, manifest config.streamUrl write side, wake-registry old_value mapping, router 400/500 mapping.

github-actions Bot commented Jun 11, 2026

Electric Agents Desktop Builds

Build artifacts for commit 590e831.

| Platform | Status | Artifact |
| --- | --- | --- |
| macOS Apple Silicon | Passed | DMG |
| macOS Intel | Passed | DMG |
| Windows x64 | Passed | Installer |
| Linux x64 | Passed | AppImage / deb |

github-actions Bot commented Jun 11, 2026

Electric Agents Mobile Build

Local mobile checks ran for commit 590e831.

The EAS Android preview build was skipped because the mobile-eas-build label is not present.
Add the mobile-eas-build label to this PR to produce an installable preview build.

codecov Bot commented Jun 11, 2026

Codecov Report

❌ Patch coverage is 87.93103% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.78%. Comparing base (176bec8) to head (590e831).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| packages/agents-runtime/src/process-wake.ts | 84.84% | 5 Missing ⚠️ |
| ...ckages/agents-server/src/pg-sync-bridge-manager.ts | 91.80% | 5 Missing ⚠️ |
| packages/agents-server/src/entity-registry.ts | 0.00% | 2 Missing ⚠️ |
| packages/agents-runtime/src/setup-context.ts | 75.00% | 1 Missing ⚠️ |
| packages/agents/src/tools/observe-pg-sync.ts | 80.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4576      +/-   ##
==========================================
- Coverage   58.35%   56.78%   -1.58%     
==========================================
  Files         370      327      -43     
  Lines       40674    38008    -2666     
  Branches    11553    10955     -598     
==========================================
- Hits        23735    21582    -2153     
+ Misses      16865    16391     -474     
+ Partials       74       35      -39     

| Flag | Coverage Δ |
| --- | --- |
| packages/agents | 71.31% <80.00%> (-0.07%) ⬇️ |
| packages/agents-mcp | ? |
| packages/agents-mobile | 75.49% <ø> (ø) |
| packages/agents-runtime | 82.56% <84.61%> (+0.09%) ⬆️ |
| packages/agents-server | 75.03% <90.27%> (+0.17%) ⬆️ |
| packages/agents-server-ui | 6.25% <ø> (ø) |
| packages/electric-ax | 46.42% <ø> (ø) |
| packages/experimental | ? |
| packages/react-hooks | ? |
| packages/start | ? |
| packages/typescript-client | ? |
| packages/y-electric | ? |
| typescript | 56.78% <87.93%> (-1.58%) ⬇️ |
| unit-tests | 56.78% <87.93%> (-1.58%) ⬇️ |

KyleAMathews and others added 8 commits June 11, 2026 14:05
Exclude per-request metadata (wakeId, principal, tenant) from the
sourceRef hash so re-registrations reuse the same bridge and stream,
and fetch the shape log once before registering so bad URLs fail the
registration with Electric's error instead of dying silently in the
bridge retry loop. Add Horton prompt guidance for the now-required
shape URL.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Electric returns 200 on non-shape paths like its root, so an ok status
alone can't validate a source URL. The probe now requires the
electric-handle header and suggests the /v1/shape path when missing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Extract shared manifest-rebind helper in process-wake; log unlogged
registration 500s; delete legacy url-less pg-sync registry rows at
startup instead of warn-spamming every boot; drop the tenant-less
streamUrl fallbacks (observe tool and legacy manifest entries); warn
when change messages are dropped; remove the dead second parameter of
pgSyncMessageToDurableEvent; harden tests (manifest streamUrl write
side, wake-registry old_value mapping, fetch unstub).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>