feat: persistent cross-file edge overlays #357

Open
chris-asmussen wants to merge 1 commit into safishamsi:v4 from chris-asmussen:feat/cross-file-edge-overlays

Summary

  • Adds cross_edges.json support so manually curated or LLM-generated cross-file edges survive AST-only rebuilds (_rebuild_code in watch.py)
  • Adds graphify edges CLI with list, add, remove, and prune subcommands for managing cross-file edges
  • Includes 13 tests covering rebuild injection, staleness handling, malformed JSON resilience, and all CLI subcommands

Problem

_rebuild_code() preserves semantic nodes and edges where neither endpoint is a code node, but it drops cross-file edges between two code nodes. As a result, community detection produces hundreds of isolated single-node communities for multi-file codebases in languages such as Ruby, TypeScript, and Go, where AST extraction can't resolve cross-file calls.

Approach

A persistent graphify-out/cross_edges.json overlay file stores edges that should survive rebuilds. After the existing semantic preservation logic in _rebuild_code(), the new injection block merges these edges into the result dict -- skipping edges where either node no longer exists and deduplicating against edges already present from AST extraction.
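
The injection step described above might look roughly like the following. This is a minimal sketch, not the code in this PR: the function name `inject_cross_edges`, the `id` node key, the `source`/`target`/`relation` edge keys, and the assumption that the overlay file holds a flat JSON list of edge objects are all hypothetical.

```python
import json
from pathlib import Path


def inject_cross_edges(result: dict, overlay_path: Path) -> int:
    """Merge overlay edges into `result` in place; return how many were added.

    Hypothetical shapes: result = {"nodes": [{"id": ...}], "edges": [...]},
    and every edge is a dict with "source", "target", and "relation" keys.
    """
    try:
        overlay = json.loads(overlay_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing, unreadable, or malformed overlay: leave the rebuild untouched.
        return 0
    if not isinstance(overlay, list):
        return 0

    node_ids = {node["id"] for node in result["nodes"]}
    # Edges already produced by AST extraction, used for deduplication.
    seen = {(e["source"], e["target"], e.get("relation")) for e in result["edges"]}

    added = 0
    for edge in overlay:
        if not isinstance(edge, dict):
            continue
        src, dst = edge.get("source"), edge.get("target")
        if not isinstance(src, str) or not isinstance(dst, str):
            continue
        if src not in node_ids or dst not in node_ids:
            continue  # stale edge: at least one endpoint no longer exists
        key = (src, dst, edge.get("relation"))
        if key in seen:
            continue  # already present, do not duplicate
        seen.add(key)
        result["edges"].append(edge)
        added += 1
    return added
```

Returning early on any read or parse failure keeps the rebuild behaving exactly as it does when no overlay file exists, which is the behavior the test plan expects for the missing-file and malformed-JSON cases.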

The graphify edges CLI provides CRUD operations for managing the overlay without hand-editing JSON.
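
The logic behind the add, remove, and prune subcommands can be illustrated with plain functions over an in-memory edge list. This is only a sketch under assumptions: the helper names and the `source`/`target`/`relation` edge keys are hypothetical, and the actual command-line argument syntax is not shown in this description, so the sketch does not try to reproduce it.

```python
def _key(edge: dict) -> tuple:
    """Identity of an edge for dedup and lookup (hypothetical key names)."""
    return (edge.get("source"), edge.get("target"), edge.get("relation"))


def add_edge(edges: list, source: str, target: str, relation: str) -> bool:
    """Append the edge unless an identical one exists; return True if added."""
    new = {"source": source, "target": target, "relation": relation}
    if any(_key(e) == _key(new) for e in edges):
        return False
    edges.append(new)
    return True


def remove_edge(edges: list, source: str, target: str, relation: str) -> bool:
    """Remove the matching edge in place; return False if none was found."""
    wanted = (source, target, relation)
    kept = [e for e in edges if _key(e) != wanted]
    found = len(kept) != len(edges)
    edges[:] = kept
    return found


def prune_edges(edges: list, node_ids: set) -> int:
    """Drop edges whose endpoints are not in `node_ids`; return count removed."""
    kept = [
        e for e in edges
        if e.get("source") in node_ids and e.get("target") in node_ids
    ]
    removed = len(edges) - len(kept)
    edges[:] = kept
    return removed
```

Keeping these operations as pure list manipulation would leave the CLI layer responsible only for parsing arguments and for reading and writing the overlay file.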

Complements #315 (global symbol table) -- the symbol table handles statically resolvable calls, while this overlay handles framework conventions (Rails has_many, Django ForeignKey), dynamic dispatch, and user knowledge about architectural intent.

Closes #298

Test plan

  • test_preserves_cross_edges -- cross edges with valid nodes are injected
  • test_skips_stale_cross_edges -- edges referencing deleted nodes are silently skipped
  • test_no_cross_edges_file -- rebuild works identically without cross_edges.json
  • test_malformed_cross_edges -- invalid JSON does not crash rebuild
  • test_no_duplicate_injection -- existing AST edges are not duplicated
  • test_edges_add / test_edges_add_duplicate -- CLI add with dedup
  • test_edges_remove / test_edges_remove_not_found -- CLI remove
  • test_edges_list_empty / test_edges_list_shows_edges -- CLI list
  • test_edges_prune -- removes edges where nodes no longer exist in graph
  • test_edges_help -- usage output on missing subcommand

Add cross_edges.json support so manually curated or LLM-generated
cross-file edges survive AST-only rebuilds. This addresses the problem
where _rebuild_code() drops edges between two code nodes, leaving
multi-file codebases with hundreds of isolated 1-node communities.

Changes:
- watch.py: inject cross_edges.json edges into _rebuild_code() after
  semantic preservation, skipping stale/duplicate edges
- __main__.py: add `graphify edges` CLI with list/add/remove/prune
  subcommands for managing cross-file edges
- tests/test_cross_edges.py: 13 tests covering rebuild injection,
  staleness handling, malformed JSON resilience, and all CLI commands

Closes safishamsi#298
Development

Successfully merging this pull request may close these issues.

AST extraction: cross-file call resolution is limited to Python imports only