Bug Description
gbrain doctor --remediate can loop indefinitely / appear hung when a remediation step completes but the condition that generated that recommendation does not clear.
In my current repo-backed brain, the planner keeps recommending sync.repo because health.stale_pages remains nonzero (21) after sync/extract. The remediation loop recomputes recommendations between steps and reintroduces the same remediation ID, so the command can keep submitting/waiting on sync.repo until an external timeout kills it.
Environment
- Repo: garrytan/gbrain
- Local package/version: gbrain 0.36.4.0
- Engine: Supabase/Postgres-backed brain
- Source: repo-backed markdown brain at /home/shan/brain/knowledge
- Current health before/after manual remediation:
  brain_score: 84
  missing_embeddings: 0
  stale_pages: 21
  orphan_pages: 104
  link_coverage: 0.909090909090909
  timeline_coverage: 0.636363636363636
Steps to Reproduce
gbrain doctor --remediation-plan --target-score 84 --max-usd 5 --json
# plan includes sync.repo because stale_pages > 0
gbrain doctor --remediate --yes --target-score 84 --max-usd 5
# command produces no useful progress output and can hang until external timeout
A safer bounded repro shows the repeat clearly:
timeout 120s gbrain doctor --remediate --yes --target-score 84 --max-usd 5 --max-jobs 2 --json
Observed output:
{
  "brain_score_initial": 84,
  "brain_score_final": 84,
  "brain_score_target": 84,
  "target_reached": true,
  "submitted": [
    {
      "step": 1,
      "id": "sync.repo",
      "job_id": 5665,
      "status": "completed"
    },
    {
      "step": 2,
      "id": "sync.repo",
      "job_id": 5665,
      "status": "completed"
    }
  ],
  "aborted_count": 0
}
Expected Behavior
After a remediation ID completes once in a doctor_run_id, one of the following should happen:
- The completed remediation ID is suppressed for the remainder of that run unless its input/content hash changes; or
- If the same check remains unhealthy after a completed remediation, the step is marked blocked / non_clearing / failed_to_clear and the run moves on or exits; or
- sync.repo should not claim to remediate stale_pages when stale_pages is not actually expected to clear via sync.
The CLI should also print progress before waiting on each job in non-JSON mode so it does not look dead.
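The second option above, combined with the progress requirement, could be sketched roughly as follows. This is a minimal illustration and not the actual runRemediate() code: the Health, Rec, StepResult and LoopDeps shapes, the metric field, and the remediateBounded name are all assumptions made for the example. A step is marked non_clearing when the metric it was supposed to fix did not improve, and that ID is then skipped for the rest of the run.

```typescript
// Hypothetical shapes for illustration; the real gbrain types may differ.
type Health = Record<string, number>;
interface Rec { id: string; metric: string } // metric: health field the rec claims to fix
interface StepResult { step: number; id: string; status: "completed" | "non_clearing" }

interface LoopDeps {
  getHealth: () => Promise<Health>;
  computeRecs: (health: Health) => Rec[];
  runJob: (rec: Rec) => Promise<void>;
  log: (line: string) => void;
}

async function remediateBounded(deps: LoopDeps, maxJobs = 20): Promise<StepResult[]> {
  const results: StepResult[] = [];
  const blocked = new Set<string>(); // IDs that ran but did not improve their metric
  let health = await deps.getHealth();

  while (results.length < maxJobs) {
    const next = deps.computeRecs(health).find((r) => !blocked.has(r.id));
    if (next === undefined) break; // nothing left that could still help

    const step = results.length + 1;
    const before = health[next.metric] ?? 0;
    // Print progress before waiting on the job so the CLI does not look dead.
    deps.log(`step ${step}: running ${next.id} (${next.metric}=${before})`);
    await deps.runJob(next);

    health = await deps.getHealth();
    const after = health[next.metric] ?? 0;
    if (after < before) {
      results.push({ step, id: next.id, status: "completed" });
    } else {
      blocked.add(next.id); // do not resubmit a remediation that did not clear
      results.push({ step, id: next.id, status: "non_clearing" });
    }
  }
  return results;
}
```

With a health source where stale_pages stays at 21, this sketch submits sync.repo once, records it as non_clearing, and exits instead of resubmitting it. The maxJobs default is only a backstop for remediations that keep making partial progress.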
Actual Behavior
- sync.repo completes successfully.
- health.stale_pages remains at 21.
- The remediation loop recomputes recommendations and reintroduces sync.repo.
- With no --max-jobs, the command can appear to hang / run until external timeout.
- In non-JSON mode, no useful progress is printed before/during the wait.
Relevant Code Pointers
src/commands/doctor.ts:
- runRemediate() recomputes recommendations from fresh health after each step:
    const freshHealth = await engine.getHealth();
    recs = computeRecommendations(freshHealth, ctx).filter((r) => r.status === 'remediable');
- There does not appear to be a same-run suppression set for completed remediation IDs.
src/core/brain-score-recommendations.ts:
- sync.repo fires when health.stale_pages > 0.
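A same-run suppression set, applied where the recommendations are recomputed, might look something like the sketch below. It is only an illustration: the Recommendation shape, the inputHash field, and the RunSuppression class are hypothetical and are not taken from the gbrain source. Keying on the ID plus an input hash lets a remediation run again within the same run only if its inputs actually changed.

```typescript
// Hypothetical recommendation shape; `inputHash` is an assumed field, not a known gbrain one.
interface Recommendation {
  id: string;
  status: string;
  inputHash: string;
}

// Tracks remediations that already completed during one doctor run.
class RunSuppression {
  private readonly seen = new Set<string>();

  private key(rec: Recommendation): string {
    return `${rec.id}@${rec.inputHash}`;
  }

  // Call once a remediation job has finished.
  markCompleted(rec: Recommendation): void {
    this.seen.add(this.key(rec));
  }

  // Keep only remediable recommendations not yet completed with the same inputs.
  filter(recs: Recommendation[]): Recommendation[] {
    return recs.filter((r) => r.status === "remediable" && !this.seen.has(this.key(r)));
  }
}
```

In the loop, the recomputed list would pass through filter() after each step, and markCompleted() would be called when a job finishes, so an unchanged sync.repo recommendation drops out after its first completion.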
Workaround
Manual maintenance works fine:
gbrain sync --source default
gbrain embed --stale
gbrain extract all --dir /home/shan/brain/knowledge
gbrain doctor
That cleared missing embeddings and ran full extraction without runaway generation, but stale_pages remained 21, so autonomous remediation still wants to run sync.repo again.