Skip to content

fix(autopilot): scope lock file to GBRAIN_HOME#1227

Closed
rafaelreis-r wants to merge 7 commits into
garrytan:masterfrom
rafaelreis-r:fix/autopilot-lock-per-gbrain-home
Closed

fix(autopilot): scope lock file to GBRAIN_HOME#1227
rafaelreis-r wants to merge 7 commits into
garrytan:masterfrom
rafaelreis-r:fix/autopilot-lock-per-gbrain-home

Conversation

@rafaelreis-r
Copy link
Copy Markdown

Fixes #1226.

Problem

gbrain autopilot hardcodes the lock file to $HOME/.gbrain/autopilot.lock, ignoring GBRAIN_HOME. When multiple brains share a host, only one autopilot ever runs — the others silently exit on every launchd/systemd respawn, producing tens of thousands of invisible failures.

See #1226 for the full story (46,388 silent failures observed in ~3 days on our dual-brain setup).

Fix

Two-line change: derive the lock path from GBRAIN_HOME when set, fall back to $HOME/.gbrain for single-brain installs.

const gbrainHome = process.env.GBRAIN_HOME || join(process.env.HOME || '', '.gbrain');
const lockPath = join(gbrainHome, 'autopilot.lock');

Same fallback updates applied to the mkdirSync call so the directory creation is also scoped correctly.

Verification

  • bun run typecheck clean.
  • Tested locally on a dual-brain mac-mini setup: both autopilots now coexist with separate autopilot.lock files under their respective GBRAIN_HOME dirs.
  • Single-brain installs unaffected (env var unset → same fallback path as before).

Compatibility

  • Backwards compatible: existing single-brain setups behave identically.
  • No schema or config changes.
  • Stale lock takeover logic (>10min mtime) unchanged.

rafaelreisR and others added 7 commits May 17, 2026 17:11
Adds src/core/fts-language.ts with getFtsLanguage(), a centralized helper
that reads the GBRAIN_FTS_LANGUAGE env var (default 'english').

Refactors postgres-engine.ts and pglite-engine.ts to use the helper in
their search queries, replacing four hardcoded 'english' literals across
searchKeyword and searchKeywordChunks.

Why: non-English brains lose stemming and stop-word removal because the
tokenizer is wired to English regardless of content language. A user
storing Portuguese pages currently gets dramatically worse keyword
search than an equivalent English brain. This PR fixes the *query side*
of the problem with zero behavior change for the default case.

The trigger functions in schema.sql/schema-embedded.ts/pglite-schema.ts
still hardcode 'english' for the write side — that's covered in a
follow-up PR (recreate triggers idempotently from a migration).

Validation:
- VALID_CONFIG_NAME regex (/^[a-z][a-z0-9_]*$/) blocks SQL injection
  since Postgres tsvector functions don't accept parameterized config
  names — the value must be interpolated into the query string.
- Invalid values fall back to 'english' with a one-time warning.
- Cached after first read; tests reset via resetFtsLanguageCache().

Tests: 14 unit tests covering defaults, cache, validation rules, and
SQL-injection guard.

Backward-compatible: 100% — default behavior identical when env unset.
(cherry picked from commit 43ffe13)
…language

Builds on PR garrytan#1 (GBRAIN_FTS_LANGUAGE env var) by extending configurability
to the *write side*: the trigger functions that populate
pages.search_vector and content_chunks.search_vector now use the language
from getFtsLanguage() instead of hardcoded 'english'.

Implementation: schema migration v33 (handler-based, not static SQL).
The handler reads getFtsLanguage() at apply time and issues
CREATE OR REPLACE FUNCTION for the two trigger functions, atomically
swapping their bodies. The triggers themselves don't need recreation
because they reference the function by name.

Backfill: when the configured language differs from 'english', v33
also re-tokenizes existing rows under the new tokenizer (UPDATE-to-self
on pages, direct UPDATE on content_chunks). Skipped for 'english' to
avoid wasted I/O when defaults are kept.

Validation strategy: the language string flows through getFtsLanguage(),
which enforces /^[a-z][a-z0-9_]*$/ before interpolation \u2014 SQL injection
is structurally impossible. Tests include a deliberate injection attempt
('english\'; DROP TABLE pages; --') that verifies the fallback to
'english' kicks in and no DROP TABLE appears in any emitted SQL.

Validated against a real Postgres brain (2782 pages, 4372 chunks):
  - apply-migrations succeeds with GBRAIN_FTS_LANGUAGE=pt_br
  - search 'opera\u00e7\u00f5es' (with diacritics) returns hits using pt_br stemmer
  - re-running migrate is idempotent (CREATE OR REPLACE)
  - re-running with same env is a no-op (version stays 33)

Tests: 7 unit tests covering registration, handler shape, default-vs-non-default
backfill behavior, and SQL injection guard. Combined with PR garrytan#1's helper
tests (14): 21/21 pass.

Limitation: changing GBRAIN_FTS_LANGUAGE *after* v33 has been applied
requires resetting config.version to 32 to re-apply (documented in README).
PR garrytan#3 in this series introduces 'gbrain reindex --search-vector' to
recreate-and-backfill on demand without the version-stamp dance.

Backward-compatible: 100% \u2014 default GBRAIN_FTS_LANGUAGE='english' produces
identical trigger output to the pre-v33 schema.

(cherry picked from commit d73b7e1)
Completes the GBRAIN_FTS_LANGUAGE story (PRs garrytan#1, garrytan#2 in this series) by
giving users an explicit way to recreate FTS trigger functions and
backfill existing rows after changing the language env var.

Why: schema migration v33 (PR garrytan#2) stamps the trigger functions with
GBRAIN_FTS_LANGUAGE on first apply and then the migrations runner
considers v33 'done'. Users who later change the env var would need
to manually reset config.version to re-trigger v33 \u2014 fragile and
undocumented. This CLI command is the documented escape hatch:
explicit, gated, idempotent.

Behavior:
  - Reads GBRAIN_FTS_LANGUAGE via the same getFtsLanguage() helper as
    the engines and v33 migration, so all three sources of truth stay
    in lockstep.
  - --dry-run shows row counts (pages + chunks affected) without
    touching the DB.
  - --yes / -y skips interactive prompt; required in non-TTY contexts.
  - --json emits a structured result envelope (status, language, counts,
    durationMs) for scripting.
  - Trigger recreate is atomic via CREATE OR REPLACE FUNCTION, so the
    two writes are individually atomic; backfill is two UPDATEs (pages
    UPDATE-to-self re-fires the trigger; content_chunks gets a direct
    vector compute).

Validated against a real Postgres brain (2782 pages, 4372 chunks):
  - --dry-run reports correct counts, exits 0 without writes
  - --yes completes in ~7-8s, search 'opera\u00e7\u00f5es' continues to work
    afterward
  - --json output parses cleanly

Tests: 6 unit tests covering --dry-run shortcuts, default vs non-default
language behavior, SQL injection guard (same as PRs garrytan#1/garrytan#2), and edge
cases (empty inventory, durationMs presence). With PRs garrytan#1+garrytan#2:
27/27 unit tests pass.

Trade-offs considered:
  - Could persist language in the config table instead of relying on
    env var. Decided against: env var is the established pattern in
    GBrain (GBRAIN_EMBED_MODEL, GBRAIN_BRAIN_ID, GBRAIN_DATABASE_URL
    etc.) and adding a config table row creates ambiguity about which
    wins (env vs DB). Single source of truth via env is simpler.
  - Could auto-detect language drift (compare configured vs trigger
    body in pg_proc) and warn at startup. Out of scope for this PR;
    file as a follow-up if there's demand.

Backward-compatible: command is additive. Default behavior of the
brain (with no language env var set) is unchanged.

(cherry picked from commit adf11ec)
… detection, evals, E2E, filing rules, USAGE docs

(cherry picked from commit 396aba4)
The lock file path was hardcoded to $HOME/.gbrain/autopilot.lock,
ignoring GBRAIN_HOME. When two brains share a host (e.g. main brain
and a side brain), only the first autopilot to acquire the lock runs;
the second sees a fresh lock (<10min) and silently exits with code 0.

Under launchd KeepAlive=true + ThrottleInterval=5s, the second
autopilot enters a respawn loop that produces no work but is invisible
because the exit is clean. We observed 46,388 silent failures in 3
days on a dual-brain setup before tracing it to this lock.

Fix: derive the lock path from GBRAIN_HOME when set, falling back to
$HOME/.gbrain. Each brain now gets its own lock and they coexist.
@garrytan
Copy link
Copy Markdown
Owner

Closing in favor of #1253 (v0.37.7.0 fix wave) which re-implements the same fix against current master via gbrainPath('autopilot.lock') — the canonical GBRAIN_HOME-aware helper from src/core/config.ts. Same fix shape, same outcome.

The implementation lives at src/commands/autopilot.ts:122-130 with a regression test at test/autopilot-lock-path.test.ts. Your contribution is acknowledged via Co-Authored-By: rafaelreis-r trailer. Thank you for the quick patch — closing #1226 honestly.

@garrytan garrytan closed this May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

autopilot lock file ignores GBRAIN_HOME — second brain silently respawns forever

4 participants