[Bugfix #1094] Fail loud on unverifiable builder id instead of silent misroute to main #1095

Merged
waleedkadous merged 6 commits into main from builder/bugfix-1094 on Jun 24, 2026

@waleedkadous

Summary

detectCurrentBuilderId() (packages/codev/src/agent-farm/commands/send.ts) silently fell back to the bare worktree directory name (e.g. bugfix-2461) whenever it could not resolve the canonical builder id from the workspace state.db, even though CWD was confirmed to be inside .builders/<id>/. That bare name matches no builders.id (canonical: builder-bugfix-2461), so Tower's affinity resolver (lookupBuilderSpawningArchitect → undefined) drops to the "non-builder sender → main first" branch and the builder's afx send architect is silently delivered to main instead of its spawning architect.

This violates "fail fast, never implement fallbacks": a fatal environmental fault was laundered into a subtle, plausible-looking misroute.
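The misroute can be illustrated with a minimal sketch. Everything here is a simplified stand-in rather than the actual Tower code: the in-memory map replaces the builders table, `routeArchitectMessage` is a hypothetical name, and `architect-2` is a made-up architect id.

```typescript
// Hypothetical stand-in for the builders table: canonical builder id -> spawning architect.
const spawningArchitectByBuilderId = new Map<string, string>([
  ["builder-bugfix-2461", "architect-2"],
]);

// Simplified model of the affinity lookup: undefined means "no such builder row".
function lookupBuilderSpawningArchitect(senderId: string): string | undefined {
  return spawningArchitectByBuilderId.get(senderId);
}

// Simplified routing (hypothetical name): senders without a builder row go to "main".
function routeArchitectMessage(senderId: string): string {
  return lookupBuilderSpawningArchitect(senderId) ?? "main";
}

// The canonical id reaches the spawning architect.
const canonicalRoute = routeArchitectMessage("builder-bugfix-2461");
// The bare worktree name matches no row, so the message quietly lands on main.
const bareNameRoute = routeArchitectMessage("bugfix-2461");

console.log({ canonicalRoute, bareNameRoute });
```

Nothing in the routing step can tell "not a builder" apart from "a builder whose id was never verified", which is why the fix has to happen where the id is detected.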

Root Cause

In a confirmed .builders/<id>/ context, three paths returned the bare worktree name:

  1. !existsSync(dbPath) — workspace state.db missing
  2. catch { return worktreeDirName } (new Database() threw; the real incident was a Node ABI mismatch: ABI 127 on PATH vs ABI 147 the native module was built for)
  3. no matching builders row

"Can't read the DB" is an error condition, not a "this isn't a builder" condition — but the function treated both identically.

Fix

  1. detectCurrentBuilderId fails loud. It now throws BuilderIdResolutionError on all three unverifiable paths instead of returning a bare name. The DB-open-failure message names the likely better-sqlite3 ABI mismatch (NODE_MODULE_VERSION) and points at reinstalling codev under the current node. Contract is now crisp: returns the canonical id, returns null (not a builder), or throws (is a builder but id unverifiable).
  2. send() aborts loudly. It wraps the detection and calls fatal() on throw, so afx send errors with a clear message rather than shipping an unverified from that Tower silently routes to main.
  3. Tower-side defense-in-depth. resolveAgentInWorkspace now emits a console.warn when a builder-shaped sender has no state.db row (and isn't a currently-registered architect) before falling through to main — so any other source of an unverified builder-like sender becomes visible instead of silent.
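The three-way contract from items 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not the code in send.ts: `StateDbProbe` and `resolveFrom` are hypothetical names introduced so the sketch runs without a real state.db, the message wording is invented, and treating a non-builder sender as `architect` is a simplification.

```typescript
// Thrown when CWD is inside .builders/<id>/ but the canonical id cannot be verified.
class BuilderIdResolutionError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "BuilderIdResolutionError";
    // Keeps instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical seam replacing direct filesystem and database access.
interface StateDbProbe {
  exists(): boolean;
  // Returns the canonical id for a worktree directory, or undefined if no row
  // matches. May throw if the database cannot be opened.
  findBuilderId(worktreeDirName: string): string | undefined;
}

// Turns an open failure into an actionable message (wording is illustrative).
function describeStateDbOpenFailure(err: unknown): string {
  const detail = err instanceof Error ? err.message : String(err);
  if (detail.includes("NODE_MODULE_VERSION")) {
    return `state.db could not be opened: likely better-sqlite3 ABI mismatch (${detail}); reinstall codev under the current node`;
  }
  return `state.db could not be opened: ${detail}`;
}

// Contract: canonical id, null (not a builder), or throw (builder but unverifiable).
function detectCurrentBuilderId(worktreeDirName: string | null, db: StateDbProbe): string | null {
  if (worktreeDirName === null) return null; // CWD is not inside .builders/<id>/
  if (!db.exists()) throw new BuilderIdResolutionError("state.db is missing");
  let id: string | undefined;
  try {
    id = db.findBuilderId(worktreeDirName);
  } catch (err) {
    throw new BuilderIdResolutionError(describeStateDbOpenFailure(err));
  }
  if (id === undefined) {
    throw new BuilderIdResolutionError(`no builders row matches ${worktreeDirName}`);
  }
  return id;
}

// Sketch of the send() side: abort through fatal() instead of sending an unverified sender.
function resolveFrom(
  worktreeDirName: string | null,
  db: StateDbProbe,
  fatal: (message: string) => never,
): string {
  try {
    return detectCurrentBuilderId(worktreeDirName, db) ?? "architect";
  } catch (err) {
    if (err instanceof BuilderIdResolutionError) return fatal(err.message);
    throw err;
  }
}
```

Because the failure is raised as a distinct error type, the caller cannot mistake "could not verify" for "not a builder", and unrelated exceptions still propagate unchanged.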

The happy path is unchanged: with a readable state.db, the canonical id resolves exactly as before (verified against the real main workspace state.db — this worktree resolves to builder-bugfix-1094).

Out of scope (noted for follow-up): proposal item #4's broader "one clear ABI message across the whole CLI (e.g. afx status)" — this PR surfaces the actionable ABI message on the afx send path where the bug lives.

Test Plan

  • bugfix-774-detect-builder-id.test.ts: the two tests that encoded the bare-name fallback now assert a BuilderIdResolutionError throw; added a DB-open-failure regression test (unopenable state.db ⇒ throws, does not return the bare name) plus describeStateDbOpenFailure message tests (ABI hint vs generic hint).
  • bugfix-1094-tower-guard.test.ts (new): looksLikeBuilderId heuristic + the warn-then-main-fallback path; asserts no spurious warning for architect senders or no-sender.
  • send.test.ts: isolated from CWD-based builder detection (runs from tmpdir()) so from resolves deterministically to architect; it was previously built on the now-removed silent fallback.
  • Full suite green: 167 files / 3347 tests pass, 0 fail. porch check (build + tests) passes.
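The warn-then-main-fallback behaviour covered by the Tower-guard test can be sketched as below. The regular expression behind `looksLikeBuilderId`, the `WorkspaceView` shape, the function name `resolveArchitectTarget`, and the warning text are all assumptions made for illustration; the PR does not show the real heuristic.

```typescript
// Assumed heuristic: "<type>-<number>", optionally prefixed with "builder-".
function looksLikeBuilderId(sender: string): boolean {
  return /^(builder-)?[a-z]+-\d+$/.test(sender);
}

// Hypothetical in-memory view of what Tower knows about a workspace.
interface WorkspaceView {
  spawningArchitectByBuilderId: Map<string, string>;
  registeredArchitects: Set<string>;
}

// Simplified resolver: known builders go to their architect, everything else to
// main, with a warning when the sender looks like a builder that has no row.
function resolveArchitectTarget(
  sender: string | undefined,
  workspace: WorkspaceView,
  warn: (message: string) => void = console.warn,
): string {
  if (sender !== undefined) {
    const architect = workspace.spawningArchitectByBuilderId.get(sender);
    if (architect !== undefined) return architect;
    if (looksLikeBuilderId(sender) && !workspace.registeredArchitects.has(sender)) {
      warn(`sender "${sender}" looks like a builder but has no state.db row; routing to main`);
    }
  }
  return "main";
}

const workspace: WorkspaceView = {
  spawningArchitectByBuilderId: new Map([["builder-bugfix-2461", "architect-2"]]),
  registeredArchitects: new Set(["architect-2"]),
};

const warnings: string[] = [];
const collect = (message: string): void => { warnings.push(message); };

const knownBuilder = resolveArchitectTarget("builder-bugfix-2461", workspace, collect);
const architectSender = resolveArchitectTarget("architect-2", workspace, collect);
const noSender = resolveArchitectTarget(undefined, workspace, collect);
const warningsBeforeBareName = warnings.length;
const bareName = resolveArchitectTarget("bugfix-2461", workspace, collect);
```

Under this assumed heuristic `architect-2` also matches the builder-shaped pattern, which shows why the registered-architect check is needed to avoid a spurious warning.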

Related: #758 (deferred end-to-end Tower-process affinity-routing test) — an E2E test exercising afx send architect from a real builder worktree would have caught this.

Fixes #1094

…ilent misroute

detectCurrentBuilderId() silently fell back to the bare worktree directory
name (e.g. bugfix-2461) whenever it could not read the workspace state.db
inside a confirmed .builders/<id>/ context. That non-canonical id matches no
builders.id, so Tower's affinity resolver (lookupBuilderSpawningArchitect →
undefined) drops to the "non-builder sender → main first" branch and a
builder's `afx send architect` is silently delivered to main. The real
trigger was a Node ABI mismatch making `new Database()` throw, but any
state.db open failure (corruption, permissions, lock) produced the same
silent misroute.

Fix:
- detectCurrentBuilderId now throws BuilderIdResolutionError on all three
  unverifiable paths (state.db missing / unopenable / no matching row).
  DB-open failures get an actionable message that names the likely
  better-sqlite3 ABI mismatch and points at reinstalling codev.
- send() wraps the detection and calls fatal() on throw, so afx send aborts
  loudly rather than shipping an unverified `from`.
- Tower-side defense-in-depth: resolveAgentInWorkspace warns when a
  builder-shaped sender has no state.db row (and isn't a registered
  architect) before falling through to main.

Tests:
- bugfix-774 fallback tests now assert throws; added DB-open-failure
  regression + describeStateDbOpenFailure message tests.
- new bugfix-1094-tower-guard test for looksLikeBuilderId + the warn path.
- send.test.ts: isolate from CWD-based builder detection (chdir to tmpdir)
  so `from` resolves deterministically; it was built on the old silent
  fallback.

Fixes #1094
@waleedkadous waleedkadous merged commit a57fedb into main Jun 24, 2026
6 checks passed
@waleedkadous waleedkadous deleted the builder/bugfix-1094 branch June 24, 2026 08:36

Development

Successfully merging this pull request may close these issues.

afx send: detectCurrentBuilderId silently falls back to bare worktree name on state.db read failure → builder messages misroute to main
