fix(rust): resolve `use crate::...` into real cross-file edges by BTCB · Pull Request #330 · safishamsi/graphify

BTCB · 2026-04-14T05:18:16Z

The Rust extractor's use_declaration handler was emitting a single edge per use statement to a stem-only node ID like _make_id("types"), which never matched any real node and got garbage-collected by the dangling-edge filter. Net result: zero cross-file edges for Rust workspaces, so every lib.rs / types.rs / model.rs showed up as an orphan weakly-connected component with degree=1.

On a 23-crate real-world Rust workspace, this meant:

- 422 weakly-connected "orphan" nodes (mostly type-definition modules)
- same-file vs cross-file edge ratio of 52:1 (healthy Rust projects should be ~3-5:1)
- edge relation histogram: 8681 calls + 4970 contains + only 4 uses
- orphaned type modules disconnected from their in-crate consumers despite sitting in the same crate, e.g. crates/graduation/types.rs had NO PATH to crates/graduation/manager.rs even though manager.rs contains use crate::types::{ModeTransition, StrategyMode, StrategyRuntime};
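For readers who have not looked at the filter, here is a minimal sketch of why the old edges disappeared. `make_id` and `drop_dangling_edges` are hypothetical stand-ins, not graphify's actual `_make_id` or filter, and the ID scheme and the `imports_from` relation name are assumptions made for illustration.

```python
import re


def make_id(*parts: str) -> str:
    """Hypothetical ID scheme: lowercase, runs of non-alphanumerics become '_'."""
    return re.sub(r"[^a-z0-9]+", "_", "_".join(parts).lower()).strip("_")


def drop_dangling_edges(nodes: list[dict], edges: list[dict]) -> list[dict]:
    """Keep only edges whose source and target both exist as node IDs."""
    known = {n["id"] for n in nodes}
    return [e for e in edges if e["source"] in known and e["target"] in known]


manager = make_id("manager", "StrategyLifecycleManager")
runtime = make_id("types", "StrategyRuntime")
nodes = [{"id": manager}, {"id": runtime}]

edges = [
    # Old behaviour: one edge per `use`, aimed at a stem-only ID that no node has.
    {"source": manager, "target": make_id("types"), "relation": "imports_from"},
    # Resolved behaviour: the edge targets a node that actually exists.
    {"source": manager, "target": runtime, "relation": "uses"},
]

kept = drop_dangling_edges(nodes, edges)
```

Only the resolved edge survives; the stem-only edge is dropped because `"types"` is not a node ID in this toy graph.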

Fix

Two-pass resolution, mirroring the Python cross-file resolver's structure:

1. Per-file pass (extract_rust): each use_declaration is parsed and recorded on the result dict's new _rust_uses list. The parser handles:
   - use crate::foo::Bar; (single ident)
   - use crate::foo::{A, B as C}; (brace group + aliases)
   - use crate::foo::*; (glob: skipped, can't resolve statically)
   - pub use foo::{A, B}; in lib.rs / mod.rs (relative re-export of sibling module)
   - Filters out self::, super::, and external-crate paths.
   - No edge is emitted in this pass; targets live in other files.
2. Cross-file pass (_resolve_cross_file_rust_imports): builds a global label → [(node_id, source_file)] index across all parsed Rust files, then for each recorded use, resolves each imported identifier to a concrete target node. Prefers candidates whose source_file matches the use's module prefix (so use crate::types::Foo picks a Foo defined in a types.rs). Falls back to first non-self candidate if no prefix match.
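The per-file rules above can be sketched roughly as follows. This is an illustration only: the PR walks tree-sitter `use_declaration` nodes, whereas this sketch swaps in a regex over the statement text. `parse_use` and the record fields are hypothetical names, nested brace groups and `pub(crate) use` are not handled, and a `pub use` of an external crate in lib.rs would be recorded here and left for the cross-file pass to drop.

```python
import re
from pathlib import PurePosixPath

# Matches an optional `pub`, the `use` keyword, and the path up to the semicolon.
USE_RE = re.compile(r"^\s*(pub\s+)?use\s+(.+?)\s*;\s*$", re.DOTALL)


def parse_use(stmt: str, source_file: str) -> list[dict]:
    """Return one record per imported identifier, or [] if the use is skipped."""
    m = USE_RE.match(stmt)
    if not m:
        return []
    is_pub, path = bool(m.group(1)), m.group(2)
    in_module_root = PurePosixPath(source_file).name in {"lib.rs", "mod.rs"}
    reexport = is_pub and in_module_root

    if path.startswith(("self::", "super::")):
        return []
    if path.startswith("crate::"):
        path = path[len("crate::"):]
    elif not reexport:
        return []  # treated as an external-crate path

    if "{" in path:
        # Brace group: "foo::{A, B as C}"
        prefix, _, group = path.partition("{")
        prefix = prefix.rstrip(":")
        items = [i.strip() for i in group.rstrip("}").split(",") if i.strip()]
    else:
        # Single identifier or glob: "foo::Bar" / "foo::*"
        prefix, _, last = path.rpartition("::")
        items = [last]

    records = []
    for item in items:
        name, _, alias = (part.strip() for part in item.partition(" as "))
        if name == "*" or "::" in name:
            continue  # globs and nested paths are not resolved in this sketch
        records.append({
            "module": prefix.split("::") if prefix else [],
            "name": name,
            "alias": alias or None,
            "reexport": reexport,
            "source_file": source_file,
        })
    return records


stmt = "use crate::types::{ModeTransition, StrategyMode as Mode};"
records = parse_use(stmt, "crates/graduation/manager.rs")
names = [(r["name"], r["alias"]) for r in records]
```

Note that the function only records what was imported; it emits no edges, matching the rule that targets live in other files.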

Produces two new edge relations:

- uses (confidence_score 0.95) for ordinary imports
- reexports (confidence_score 1.0) for pub use in lib.rs / mod.rs

Both are INFERRED, not EXTRACTED, because tree-sitter alone cannot fully verify type identity the way rust-analyzer would — two different crates may define identically-named types and the heuristic picks a best-effort candidate.
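A rough sketch of the cross-file pass and the edges it emits, under stated assumptions: `resolve_uses`, the per-file dict layout, and the `file_node_id` field are hypothetical, the edge source is assumed to be the importing file's node, and "matches the module prefix" is approximated by comparing the candidate's file stem with the last module segment. The relation names and confidence scores are the ones listed above.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def resolve_uses(files: list[dict]) -> list[dict]:
    """Turn recorded uses into edges that point at nodes that really exist."""
    # Global index: label -> [(node_id, source_file)]
    index = defaultdict(list)
    for f in files:
        for node in f["nodes"]:
            index[node["label"]].append((node["id"], f["path"]))

    edges = []
    for f in files:
        for use in f["uses"]:
            # Never resolve an import to a definition in the importing file itself.
            candidates = [c for c in index.get(use["name"], []) if c[1] != f["path"]]
            if not candidates:
                continue  # unresolved: emit nothing rather than a dangling edge
            wanted = use["module"][-1] if use["module"] else None
            preferred = [c for c in candidates if PurePosixPath(c[1]).stem == wanted]
            target_id, _ = (preferred or candidates)[0]
            edges.append({
                "source": f["file_node_id"],
                "target": target_id,
                "relation": "reexports" if use["reexport"] else "uses",
                "confidence": "INFERRED",
                "confidence_score": 1.0 if use["reexport"] else 0.95,
            })
    return edges


files = [
    # A same-named type in another crate, listed first so the prefix rule matters.
    {"path": "crates/other/model.rs", "file_node_id": "model_rs", "uses": [],
     "nodes": [{"id": "model_strategyruntime", "label": "StrategyRuntime"}]},
    {"path": "crates/graduation/types.rs", "file_node_id": "types_rs", "uses": [],
     "nodes": [{"id": "types_strategyruntime", "label": "StrategyRuntime"}]},
    {"path": "crates/graduation/manager.rs", "file_node_id": "manager_rs", "nodes": [],
     "uses": [{"module": ["types"], "name": "StrategyRuntime", "reexport": False}]},
]
edges = resolve_uses(files)
```

The toy input carries two `StrategyRuntime` definitions to show the ambiguity described above: the prefix preference picks the one in types.rs, and without it the first candidate would win.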

Cache schema bump

cache.py now mixes a _CACHE_SCHEMA_TAG = b"v2" constant into every file hash. Pre-existing cache entries (which lack the new _rust_uses field) silently stop matching and get re-extracted on next run. No manual rm -rf graphify-out/cache needed. Bump this tag whenever the extractor output schema changes again.
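As an illustration of the invalidation mechanism, a minimal sketch: only the `_CACHE_SCHEMA_TAG = b"v2"` constant comes from this PR. The `file_hash` helper, the choice of SHA-256, the separator byte, and the `b"v1"` value standing in for the previous scheme are all assumptions, and the real cache.py may combine the tag differently.

```python
import hashlib

_CACHE_SCHEMA_TAG = b"v2"  # constant from the PR; how it is mixed in is assumed


def file_hash(content: bytes, schema_tag: bytes = _CACHE_SCHEMA_TAG) -> str:
    """Hash the schema tag together with the file content."""
    h = hashlib.sha256()
    h.update(schema_tag)
    h.update(b"\x00")  # separator so tag bytes and content bytes cannot blur together
    h.update(content)
    return h.hexdigest()


source = b"use crate::types::Foo;\n"
old_key = file_hash(source, schema_tag=b"v1")  # stand-in for a pre-bump cache key
new_key = file_hash(source)
```

Because the tag is part of the hashed input, an unchanged file gets a different key after the bump, so the stale entry is simply never looked up again.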

Verified on arbitrage-bot (23 crates, 237 .rs files)

Before: uses=4, reexports=0
After: uses=506, reexports=276 (+782 real cross-file edges)

Specifically on the graduation crate (6 files, shown in commit-msg narrative above): 0 cross-file edges → 15 uses + 17 reexports = 32 new edges. StrategyRuntime, ModeTransition, GraduationScorecard now correctly path to StrategyLifecycleManager in 1-2 hops.

Tests

Added 5 new unit tests in test_multilang.py covering a minimal multi-file Rust fixture (tests/fixtures/rust_crate/):

- test_rust_use_crate_resolves_braced_imports
- test_rust_use_crate_resolves_single_ident_import
- test_rust_pub_use_reexports_in_lib_rs
- test_rust_cross_file_edges_are_inferred
- test_rust_use_crate_never_produces_dangling_imports_from

Full test suite: 417 passed, 0 regressions. (7 pre-existing failures in tests/test_security.py are environment-specific: they reproduce identically on unpatched main when local DNS resolves example.com to a private IP range, and are unrelated to this PR.)

BTCB wants to merge 1 commit into safishamsi:v4 from BTCB:fix/rust-cross-file-use-edges
