feat(auto-enrich): Phase 1 sensor + recipe scaffold by LexClaw · Pull Request #1236 · garrytan/gbrain

LexClaw · 2026-05-20T17:51:32Z

Phase 1 of the auto-enrichment recipe (sensor + scaffold).

What this PR ships

recipes/auto-enrich.md - recipe manifest (parses cleanly via gbrain integrations show)
recipes/auto-enrich/scripts/detect_sparse.py - sensor via CLI composition (gbrain list -> get -> backlinks)
recipes/auto-enrich/scripts/auto_enrich_lib.py - Heartbeat helper + subprocess wrapper
recipes/auto-enrich/scripts/run_sensor.sh - thin shell wrapper, emits heartbeat
recipes/auto-enrich/config.yaml - tunables
recipes/auto-enrich/tests/test_detect_sparse.py - 9 tests, all passing
recipes/auto-enrich/README.md - human-readable docs

What this PR does NOT ship

Phase 2: research strategy + Cal dispatch (sub-card filed)
Phase 3: quality gate + synthesize + cron + smoke (sub-card filed)

Live verification

python3 -m pytest recipes/auto-enrich/tests/ -q -> 9 passed
gbrain integrations list shows: 'auto-enrich Nightly Cal dispatch... ACTIVE'
gbrain integrations status auto-enrich -> ACTIVE, Last event: sensor_run (0h ago)
gbrain integrations show auto-enrich parses cleanly

Known follow-up (Phase 1.5, not blocking)

The sensor's default candidate_pool_per_type=50 yields ~200 candidates across the default types, and each candidate costs 2 subprocess calls, so a run makes ~400 gbrain calls. On a large brain (63k pages) that takes 60s+ of wall time. Phase 1.5 should add a per-fetch timeout plus concurrent execution, or reduce the default pool. Not blocking Phase 2.
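
One possible shape for the Phase 1.5 fix is sketched below: a bounded thread pool fans out the per-candidate fetches, and each subprocess gets its own timeout. The helper names (`run_cli`, `fetch_all`), the worker count, and the timeout value are illustrative assumptions, not code from this PR. The exact argument shapes of the gbrain get and backlinks calls are not shown in this thread, so the caller supplies the argv builder.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Sequence


def run_cli(argv: Sequence[str], timeout_s: float = 5.0) -> Optional[str]:
    """Run one command; return stdout, or None on timeout, launch error, or non-zero exit."""
    try:
        proc = subprocess.run(
            list(argv), capture_output=True, text=True, timeout=timeout_s
        )
    except (subprocess.TimeoutExpired, OSError):
        return None  # a slow or unlaunchable fetch is skipped instead of stalling the run
    return proc.stdout if proc.returncode == 0 else None


def fetch_all(
    slugs: List[str],
    build_argv: Callable[[str], Sequence[str]],  # hypothetical: slug -> full argv
    max_workers: int = 8,
    timeout_s: float = 5.0,
) -> Dict[str, Optional[str]]:
    """Fetch every slug concurrently; threads are enough because the work is subprocess I/O."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = pool.map(lambda s: run_cli(build_argv(s), timeout_s), slugs)
        return dict(zip(slugs, outputs))
```

A candidate whose fetch returns None can simply be dropped from the current run and picked up by the next one, which keeps total wall time bounded by roughly (candidates / max_workers) * timeout_s in the worst case.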

Hard rule compliance

No em dashes in human-facing docs (HR-8)
No fabricated CLI verbs (verified against gbrain --help)
Subprocess pattern follows web_lib.py (no Python client import attempted)

Card

Phase 1 card kn7dkpzjznxhq978fkx7r7c7kh8738tz on Mission Control board.

rayers · 2026-05-21T09:48:02Z

Reviewed locally against my brain (586K pages, 317 concept + 778 person, M365/iMessage-heavy). Phase 1 is solid: clean phase boundary, no DB writes, 9/9 pytest pass in 0.04s, and an architecture that matches the recipes/web-to-brain pattern. The heartbeat schema lines up with what gbrain integrations show / status reads. A live sensor run with the default --types concept,entity,person,company returned exactly the kind of true-sparse candidates I'd want surfaced: 30-50 char person stubs ranking at 0.98+.

A few smaller things worth surfacing in case any are useful:

1. never_enriched floors every untouched page at ~0.6, independent of body length.
On a --types concept run, the top result was a true sparse 1496-char page (good). Candidates 2 and 3 were 8821-char and 9938-char concept pages: well-developed, just disconnected and never enriched. They scored 0.6 from links=0 (0.3) + never_enriched (0.3), with body contributing 0.0.

Until Phase 3 starts writing last_enriched somewhere, every page with no backlinks floors at 0.6, so well-developed but disconnected pages rank alongside true stubs. Options:

Gate the candidate pool with a body_length floor before scoring (skip pages > target_body_length)
Suppress the age penalty during a bootstrap window before any last_enriched exists in the corpus
Document the bootstrap behavior so the first cron runs aren't surprising
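
To make the floor concrete, here is a minimal sketch of an additive score consistent with the numbers above (0.3 for zero backlinks, 0.3 for never enriched, the remainder from body length). The 0.4 body weight, the TARGET_BODY_LENGTH value of 2000, and the function names are assumptions for illustration, not the actual code in detect_sparse.py. It also shows the first option, a body-length gate applied before scoring.

```python
from typing import Optional

TARGET_BODY_LENGTH = 2000  # assumed value, chosen only for illustration


def sparse_score(body_len: int, backlinks: int, last_enriched: Optional[str]) -> float:
    """Additive sparseness score in [0, 1]; the weights are illustrative guesses."""
    body = 0.4 * max(0.0, 1.0 - body_len / TARGET_BODY_LENGTH)
    links = 0.3 if backlinks == 0 else 0.0
    age = 0.3 if last_enriched is None else 0.0  # never enriched: full age penalty
    return round(body + links + age, 2)


def passes_body_gate(body_len: int) -> bool:
    """Option 1: drop pages already longer than the target before scoring."""
    return body_len <= TARGET_BODY_LENGTH
```

Under these assumed weights, an 8821-char page with no backlinks and no last_enriched scores 0.6 purely from the two penalties, while the gate would remove it from the pool before it is ever ranked.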

2. recipes/auto-enrich/requirements.txt would help.
pyyaml + pytest aren't declared anywhere. Cloning fresh and running python3 -m pytest recipes/auto-enrich/tests/ -q hits ModuleNotFoundError: No module named 'yaml'. Trivial to add a requirements file scoped to the recipe.

3. GBRAIN_BIN env var would help dev iteration.
auto_enrich_lib.run_gbrain hard-codes ["gbrain", *args]. To test against a dev build I had to swap PATH. Two-line change: use os.environ.get("GBRAIN_BIN", "gbrain") as the first element of the argv list.
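
A sketch of what that override could look like follows. The signature and return type of run_gbrain here are assumptions, since the real helper in auto_enrich_lib.py is not quoted in this thread; only the os.environ.get lookup is the suggested change.

```python
import os
import subprocess
from typing import Sequence


def run_gbrain(args: Sequence[str], timeout_s: float = 30.0) -> subprocess.CompletedProcess:
    """Run the gbrain CLI, letting GBRAIN_BIN point at a dev build."""
    # Falls back to a plain PATH lookup of "gbrain" when the variable is unset.
    binary = os.environ.get("GBRAIN_BIN", "gbrain")
    return subprocess.run(
        [binary, *args], capture_output=True, text=True, timeout=timeout_s
    )
```

Reading the variable inside the function rather than at import time means a test can set GBRAIN_BIN per case without reloading the module.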

4. Discovery path is asserted but not exercised end-to-end.
The README correctly explains the flat-manifest contract (recipes/auto-enrich.md is what loadAllRecipes walks). I couldn't verify discovery from this branch because gbrain integrations show auto-enrich runs against the installed CLI's bundle, not the PR clone. Worth a smoke-test step in the deliverable doc, or a CI check that adds the recipe and asserts gbrain integrations list shows it.
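
A CI check along those lines might look like the sketch below. It assumes the listing prints one recipe per line with the recipe name as the first token, which matches the 'auto-enrich Nightly Cal dispatch... ACTIVE' line quoted in the PR description but is otherwise unverified. The function names are hypothetical, and the lister is injectable so the parsing can be tested without an installed CLI.

```python
import subprocess
from typing import Callable


def list_integrations() -> str:
    """Return stdout of `gbrain integrations list` from whichever gbrain is on PATH."""
    proc = subprocess.run(
        ["gbrain", "integrations", "list"],
        capture_output=True, text=True, timeout=60, check=True,
    )
    return proc.stdout


def recipe_is_listed(name: str, lister: Callable[[], str] = list_integrations) -> bool:
    """True if some line of the listing has the recipe name as its first token."""
    return any(
        line.split()[:1] == [name] for line in lister().splitlines()
    )
```

For the check to be meaningful, the gbrain on PATH in CI would have to be built from the PR checkout; otherwise it reports on the installed bundle, which is the gap described above.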

None of this blocks merge. #2 (requirements) feels cheap enough to fold into this PR if you're game; #1, #3, and #4 can land separately or with Phase 2.

Nice work landing this with no writes and tight test coverage.

Lex added 3 commits May 20, 2026 13:43

feat(auto-enrich): scaffold recipe directory + config defaults

9951404

feat(auto-enrich): sensor detects sparse + orphan + stale entity pages via CLI composition

a8c57c0

feat(auto-enrich): heartbeat writes + filesystem-based recipe discovery

6810cf6

LexClaw wants to merge 3 commits into garrytan:master from LexClaw:feat/auto-enrich-phase1
