Skip to content

fix(merge-batch-graphs): accept batch-existing.json + fail loud on unrecognized filenames (#292)#389

Open
tirth8205 wants to merge 1 commit into
Egonex-AI:mainfrom
tirth8205:fix/merge-batch-existing-filename
Open

fix(merge-batch-graphs): accept batch-existing.json + fail loud on unrecognized filenames (#292)#389
tirth8205 wants to merge 1 commit into
Egonex-AI:mainfrom
tirth8205:fix/merge-batch-existing-filename

Conversation

@tirth8205

Copy link
Copy Markdown
Contributor

Summary

The /understand skill's incremental-update path (Phase 2 of SKILL.md) writes the pruned existing nodes/edges as batch-existing.json, but merge-batch-graphs.py filtered input files with re.compile(r'batch-(\d+)(?:-part-(\d+))?\.json') and silently dropped the file because the name carries no digits. Up to ~70% of the graph (every node from unchanged files) could be lost while the script exited 0.

This PR fixes the regex and turns silent drops into a hard failure so this kind of breakage can never reappear.

Defence-in-depth changes

  • Accept batch-existing.json via a new union pattern batch-(\d+|existing)(?:-part-(\d+))?\.json. The file sorts first with a synthetic index of -1, so baseline nodes are seen before any freshly-analyzed batch. Combined with Step 5's "keep last" dedup, this means a re-analyzed node always wins over its existing-graph copy — the correct incremental-update semantics.
  • Hard exit on unrecognized filenames (was: stderr warning + silent drop). A partial merged graph is never produced when a batch file with an unknown name is present. The previous "warning" path was the exact mechanism that hid the batch-existing.json bug for so long.
  • Three new test classes in tests/skill/understand/test_merge_batch_graphs.py:
    • TestBatchExistingFilename covers the happy path: batch-existing.json alongside batch-0.json, alone, and the sort-order guarantee that fresh batches override the baseline copy.
    • TestUnrecognizedBatchFilename was rewritten to assert the new hard-exit contract (no partial graph is written).
  • No SKILL.md change — the incremental update text already correctly instructs writing batch-existing.json. The bug was purely in the merge script's regex, not in the doc, and there was no workaround text to remove.

Closes #292

Test plan

  • python3 -m unittest tests.skill.understand.test_merge_batch_graphs — 71/71 pass (was 69; +3 new in TestBatchExistingFilename, -1 obsolete in TestUnrecognizedBatchFilename)
  • Existing tests/skill/understand/test_compute_batches.test.mjs, test_scan_project.test.mjs, test_extract_import_map.test.mjs, and understand-anything-plugin/src/__tests__/merge-recover-imports.test.mjs are unaffected (none of them touch batch-existing.json or rely on the dropped warning behaviour).
  • Manual diff of script behavior: incremental update path now produces a graph containing every node from batch-existing.json plus every node from batch-<N>.json files, with fresh-batch winning on collision.

🤖 Generated with Claude Code

…recognized filenames (Egonex-AI#292)

The /understand skill's incremental update path (Phase 2) writes the
pruned existing nodes/edges as batch-existing.json, but the merge script
filtered input files with `batch-(\d+)(?:-part-(\d+))?\.json` and
silently dropped the file because it has no digits. Up to ~70% of the
graph could be lost (every node from unchanged files) while the script
exited 0.

Defence-in-depth:
1. Accept `batch-existing.json` via `batch-(\d+|existing)(?:-part-(\d+))?\.json`.
   It sorts first (synthetic index -1) so baseline nodes are seen before
   freshly-analyzed batches; combined with Step 5's "keep last" dedup,
   re-analyzed nodes always win over their existing-graph copies — the
   correct incremental-update semantics.
2. Make unrecognized filenames a hard exit instead of a stderr warning
   that callers routinely missed. A partial merged graph is never
   produced when a batch file with an unknown name is present.
3. Add tests covering batch-existing.json acceptance, sort order, and
   the new hard-exit contract for unknown filenames.

SKILL.md already instructed writing batch-existing.json — no doc change
needed beyond confirming the contract now matches the implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incremental: skill text instructs writing 'batch-existing.json' which merge regex silently drops

1 participant