Incremental update: 3 edge-fidelity bugs (rename orphans, inbound-edge pruning, importMap recovery skips non-file: nodes) #366

@Glyphe58

Summary

Three independent edge-fidelity bugs in the /understand incremental update path. Each silently erodes graph correctness on incremental runs (only stderr hints, exit 0). Discovered while dogfooding understand-anything 2.7.4 on a ~1,150-file Python/FastAPI + React/TS repo; I then verified all three are still present on current main (skills/understand/SKILL.md and skills/understand/merge-batch-graphs.py, around commit 025b884).

Cross-referencing to show these are distinct from known issues: #292 is the batch-existing.json filename-regex drop; #293 (closed) is scan-result.json cleanup; #302 is the tested_by path-convention linker. The three below are separate root causes.


Bug 1 — Renamed/moved files leave orphaned old-path nodes

Root cause. Phase 0 and Phase 2 (incremental) build the changed-file list with:

git diff <lastCommitHash>..HEAD --name-only

With git's default rename detection, a rename is reported as only the new path (a single line), not both. The prune step ("Remove old nodes whose filePath matches any changed file") therefore never sees the old path, so the pre-rename node — and every edge touching it — survives as an orphan pointing at a file that no longer exists.

Repro. Baseline-scan a repo, git mv docs/a.md docs/b.md, commit, run /understand (incremental). The graph keeps a stale document:docs/a.md node (plus its edges) in addition to the new document:docs/b.md.

Observed. In one incremental run, 7 docs were moved (drafts/completed/); --name-only listed only the 7 new paths, so the 7 old-path nodes would have persisted unless pruned by hand.

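Fix. Use git diff <base>..HEAD --name-status -M, which reports a rename as a single R line carrying both the old and the new path, or --no-renames, which reports it as a delete of the old path plus an add of the new one. Either way, add both old and new paths to the changed/prune set: prune the old node, analyze the new path.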
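A minimal sketch of how the --name-status -M output could be split into a prune set and an analyze set. The function name changed_paths and the two-set return shape are hypothetical, not taken from SKILL.md; the sketch assumes the default tab-separated output (paths containing unusual characters are quoted by git unless -z is used, which this does not handle).

```python
def changed_paths(name_status: str) -> tuple[set[str], set[str]]:
    """Split `git diff --name-status -M` output into (prune, analyze) path sets.

    prune   = paths whose existing nodes should be removed from the graph
    analyze = paths that should be (re-)analyzed
    """
    prune: set[str] = set()
    analyze: set[str] = set()
    for line in name_status.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split("\t")
        kind = status[0]  # "R100" -> "R", "M" -> "M", ...
        if kind in ("R", "C"):
            # Rename/copy lines carry two paths: old, then new.
            old, new = paths
            if kind == "R":
                prune.add(old)  # the old path no longer exists
            prune.add(new)
            analyze.add(new)
        elif kind == "D":
            prune.add(paths[0])  # deleted: prune only
        else:
            # A (added), M (modified), T (type change), etc.
            prune.add(paths[0])
            analyze.add(paths[0])
    return prune, analyze


sample = "R100\tdocs/a.md\tdocs/b.md\nM\tsrc/app.py\nD\told.py\n"
prune, analyze = changed_paths(sample)
```

With the repro above, docs/a.md lands in the prune set and docs/b.md in both sets, so the stale document:docs/a.md node would be removed.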


Bug 2 — Naive edge prune deletes inbound edges from unchanged files (never regenerated)

Root cause. Phase 2 incremental, step 2 (SKILL.md ~L380):

Remove old edges whose source or target references a removed node

When file F is modified, it's re-analyzed (regenerating its outbound edges). But edges into F from an unchanged file U (e.g. U imports/calls/tests F) are deleted because their target was removed — and U is never re-analyzed, so those edges are never regenerated. Every incremental run silently erodes the inbound edges of changed files.

Observed. In one 64-file incremental, 194 inbound edges would have been lost under the source-OR-target rule.

Fix. Prune by source only: keep an existing edge iff its source node survives. Re-analysis regenerates all outbound edges of changed files, while inbound edges from unchanged sources are preserved. The merge's existing dedup ((source,target,type)) and dangling-edge drop safely absorb any overlap or now-invalid function-level targets. (I ran exactly this rule locally — preserved the 194 edges with zero dangling/duplicate fallout.)
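A rough sketch of the source-only rule followed by a dedup and dangling-edge pass. The function names and the edge shape (dicts with source, target, type keys) are assumptions for illustration; the real merge in merge-batch-graphs.py may be structured differently.

```python
def prune_existing_edges(edges, removed_ids):
    """Source-only rule: drop an existing edge only if its source node was removed."""
    return [e for e in edges if e["source"] not in removed_ids]


def merge_edges(kept, regenerated, node_ids):
    """Dedup on (source, target, type) and drop edges with a missing endpoint."""
    seen = set()
    merged = []
    for e in kept + regenerated:
        key = (e["source"], e["target"], e["type"])
        if key in seen:
            continue  # duplicate of an edge already merged
        if e["source"] not in node_ids or e["target"] not in node_ids:
            continue  # dangling edge
        seen.add(key)
        merged.append(e)
    return merged


# U is unchanged and imports F; F was modified and re-analyzed.
existing = [
    {"source": "file:U.py", "target": "file:F.py", "type": "imports"},
    {"source": "file:F.py", "target": "file:G.py", "type": "imports"},
]
regenerated = [
    {"source": "file:F.py", "target": "file:G.py", "type": "imports"},
    {"source": "file:F.py", "target": "file:H.py", "type": "imports"},
]
kept = prune_existing_edges(existing, removed_ids={"file:F.py"})
final = merge_edges(
    kept, regenerated, {"file:U.py", "file:F.py", "file:G.py", "file:H.py"}
)
```

Here the inbound U → F edge survives because its source was never removed, while F's outbound edges come entirely from re-analysis.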


Bug 3 — importMap recovery only matches file:-typed nodes, dropping edges to config/doc/table nodes

Root cause. recover_imports_from_scan() in merge-batch-graphs.py (main, ~L939-966):

file_node_ids = set()
for node in assembled["nodes"]:
    if node.get("type") == "file":            # <-- only type == "file"
        file_node_ids.add(node.get("id", ""))
...
src_id = f"file:{src_path}"                    # <-- hardcoded file: prefix
...
tgt_id = f"file:{tgt_path}"                    # <-- hardcoded file: prefix
if tgt_id not in file_node_ids:
    skipped_no_tgt_node += 1
    continue

A source file the scanner classifies as config/docs/script/etc. gets a non-file: node — e.g. a settings module literally named config.py becomes config:src/demo/config.py. The recovery never finds it as a source or a target (its synthesized file:… id isn't in file_node_ids), so every import edge into/out of it is permanently dropped — surfaced only as Skipped N importMap target paths with no file: node.

Real example. src/demo/config.py (a config: node holding pydantic Settings) is imported by ~23 files; all 23 imports edges were skipped by recovery. (In my run the LLM assemble-reviewer recovered 2 of them by hand, but the deterministic pass should not have dropped them.)

Fix. Resolve source/target to the actual node id by file path across all file-level node types, not just file:. For example:

FILE_LEVEL = {"file","config","document","service","table","schema","resource","endpoint","pipeline"}
path_to_id = {n["filePath"]: n["id"] for n in assembled["nodes"]
              if n.get("type") in FILE_LEVEL and n.get("filePath")}
src_id = path_to_id.get(src_path)   # skip only if truly absent
tgt_id = path_to_id.get(tgt_path)
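For a self-contained illustration of the same idea, the sketch below runs the path-based lookup end to end on the config.py case. The function name recover_import_edges and the importMap shape (a dict mapping a source path to a list of imported paths) are assumptions, not the actual structures in merge-batch-graphs.py.

```python
FILE_LEVEL = {"file", "config", "document", "service", "table",
              "schema", "resource", "endpoint", "pipeline"}


def recover_import_edges(nodes, import_map):
    """Build imports edges by resolving paths to real node ids of any file-level type."""
    path_to_id = {
        n["filePath"]: n["id"]
        for n in nodes
        if n.get("type") in FILE_LEVEL and n.get("filePath")
    }
    edges, skipped = [], 0
    for src_path, targets in import_map.items():
        src_id = path_to_id.get(src_path)
        if src_id is None:
            skipped += len(targets)  # no node at all for this source path
            continue
        for tgt_path in targets:
            tgt_id = path_to_id.get(tgt_path)
            if tgt_id is None:
                skipped += 1  # skip only if truly absent
                continue
            edges.append({"source": src_id, "target": tgt_id, "type": "imports"})
    return edges, skipped


nodes = [
    {"id": "file:src/demo/main.py", "type": "file", "filePath": "src/demo/main.py"},
    {"id": "config:src/demo/config.py", "type": "config",
     "filePath": "src/demo/config.py"},
]
import_map = {"src/demo/main.py": ["src/demo/config.py", "src/demo/missing.py"]}
edges, skipped = recover_import_edges(nodes, import_map)
```

The import of src/demo/config.py resolves to the config: node instead of being skipped; only the path with no node at all is counted as skipped.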

Environment

  • Plugin understand-anything 2.7.4 (installed), bugs re-verified against current main.
  • Windows, Node 24, Python 3.13.

Happy to open a PR for any/all three (the source-only prune in Bug 2 and the prefix-resolution in Bug 3 are small, self-contained changes).
