Commit 702ac5b

feat: parser refactoring, Jedi call resolution, and performance optimizations
Major improvements to code-review-graph spanning parser architecture, call graph accuracy, and build performance.

Parser refactoring:
- Extract 16 per-language handler modules into code_review_graph/lang/ using a strategy pattern, replacing monolithic conditionals in parser.py
- Thread-safe parser caches with double-check locking

Call graph enrichment:
- Jedi-based Python method call resolution at build time (jedi_resolver.py)
- Pre-scan filtering by project function names (36s to 3s on large repos)
- Typed variable call enrichment (Python, JS/TS, Kotlin/Java)
- Star import resolution, namespace imports, CommonJS require()
- Angular template parsing, JSX handler tracking
- Module-level import tracking and module-qualified call resolution
- Function/class references passed as call arguments

PreToolUse search enrichment:
- New enrich.py module and code-review-graph enrich CLI command
- Injects graph context (callers, flows, community, tests) into agent search results passively via hook

Dead code false positive reduction:
- Framework decorators recognized as entry points
- CDK construct methods, abstract overrides excluded
- E2e test directories excluded from dead code detection

Performance:
- Community detection: 48.6s to 2.3s (21x speedup) via bulk node loading and adjacency-indexed cohesion computation
- Jedi enrichment: 36s to 3s (12x) via pre-scan filtering
- Batch file storage (50-file transactions)
- Batch risk_index (2 GROUP BY queries replace per-node loops)

Other:
- Weighted flow risk scoring by criticality
- Transitive TESTED_BY lookup for tests_for and risk scoring
- DB schema v8: composite edge index (v7 reserved by PR #127)
- --quiet and --json CLI flags
- Search query deduplication, test function deprioritization
- New [enrichment] optional dependency group for Jedi
- 829+ tests across 26 test files (up from 615)

Evaluated against Gadgetbridge (41k nodes, 280k edges): 8/10 PASS, call resolution rate improved from 28% to 39.6%.
1 parent 3677716 commit 702ac5b

73 files changed

Lines changed: 9537 additions & 1638 deletions
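
The commit message mentions thread-safe parser caches with double-check locking. The parser code itself is not visible in this excerpt, so the following is only a minimal sketch of the pattern, assuming a dictionary cache guarded by a `threading.Lock`. `ParserCache`, `get`, and the `factory` callable are hypothetical names chosen for illustration, not identifiers from `parser.py`.

```python
import threading
from typing import Callable, Dict


class ParserCache:
    """Hypothetical cache illustrating double-check locking."""

    def __init__(self, factory: Callable[[str], object]) -> None:
        self._factory = factory
        self._cache: Dict[str, object] = {}
        self._lock = threading.Lock()

    def get(self, language: str) -> object:
        # First check without the lock: the common, already-cached path.
        parser = self._cache.get(language)
        if parser is not None:
            return parser
        with self._lock:
            # Second check under the lock: another thread may have
            # populated the entry while this one was waiting.
            parser = self._cache.get(language)
            if parser is None:
                parser = self._factory(language)
                self._cache[language] = parser
            return parser
```

The unlocked first check keeps the hot path cheap, while the second check under the lock ensures the factory runs at most once per key.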

CHANGELOG.md

Lines changed: 42 additions & 0 deletions
@@ -1,5 +1,47 @@
 # Changelog
 
+## [Unreleased]
+
+### Added
+- **Parser refactoring**: Extracted 16 per-language handler modules into `code_review_graph/lang/` package using a strategy pattern, replacing monolithic conditionals in `parser.py`
+- **Jedi-based call resolution**: New `jedi_resolver.py` module resolves Python method calls at build time via Jedi static analysis, with pre-scan filtering by project function names (36s to 3s on large repos)
+- **PreToolUse search enrichment**: New `enrich.py` module and `code-review-graph enrich` CLI command inject graph context (callers, callees, flows, community, tests) into agent search results passively
+- **Typed variable call enrichment**: Track constructor-based type inference and instance method calls for Python, JS/TS, and Kotlin/Java
+- **Star import resolution**: Resolve `from module import *` by scanning target module's exported names
+- **Namespace imports**: Track `import * as X from 'module'` and CommonJS `require()` patterns
+- **Angular template parsing**: Extract call targets from Angular component templates
+- **JSX handler tracking**: Detect function/class references passed as JSX event handler props
+- **Framework decorator recognition**: Identify entry points decorated with `@app.route`, `@router.get`, `@cli.command`, etc., reducing dead code false positives
+- **Module-level import tracking**: Track module-qualified call resolution (`module.function()`)
+- **Thread safety**: Double-check locking on parser caches (`_type_sets`, `_get_parser`, `_resolve_module_to_file`, `_get_exported_names`)
+- **Batch file storage**: `store_file_batch()` groups file insertions into 50-file transactions for faster builds
+- **Bulk node loading**: `get_all_nodes()` replaces per-file SQL queries for community detection
+- **Adjacency-indexed cohesion**: Community cohesion computed in O(community-edges) instead of O(all-edges), yielding 21x speedup (48.6s to 2.3s on 41k-node repos)
+- **Phase timing instrumentation**: `time.perf_counter()` timing at INFO level for all build phases
+- **Batch risk_index**: 2 GROUP BY queries replace per-node COUNT loops in risk scoring
+- **Weighted flow risk scoring**: Risk scores weighted by flow criticality instead of flat edge counts
+- **Transitive TESTED_BY lookup**: `tests_for` and risk scoring follow transitive test relationships
+- **DB schema v8**: Composite edge index for upsert performance (v7 reserved by upstream PR #127)
+- **`--quiet` and `--json` CLI flags**: Machine-readable output for `build`, `update`, `status`
+- **829+ tests** across 26 test files (up from 615), including `test_pain_points.py` (1,587 lines TDD suite), `test_hardened.py` (467 lines), `test_enrich.py` (237 lines)
+- **14 new test fixtures**: Kotlin, Java, TypeScript, JSX, Python resolution scenarios
+
+### Changed
+- New `[enrichment]` optional dependency group for Jedi-based Python call resolution
+- Leiden community detection scales resolution parameter with graph size
+- Adaptive directory-based fallback for community detection when Leiden produces poor clusters
+- Search query deduplication and test function deprioritization
+
+### Fixed
+- **Dead code false positives**: Decorators, CDK construct methods, abstract overrides, and overriding methods with called parents no longer flagged as dead
+- **E2e test exclusion**: Playwright/Cypress e2e test directories excluded from dead code detection
+- **Unique-name plausible caller optimization**: Faster dead code analysis via pre-filtered candidate sets
+- **Store cache liveness check**: Cached SQLite connections verified as alive before reuse
+
+### Performance
+- **Community detection**: 48.6s to 2.3s (21x) on Gadgetbridge (41k nodes, 280k edges)
+- **Jedi enrichment**: 36s to 3s (12x) via pre-scan filtering by project function names
+
 ## [2.2.2] - 2026-04-08
 
 ### Added
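
The changelog entry on adjacency-indexed cohesion describes computing cohesion in O(community-edges) rather than O(all-edges). The implementation is not shown in this excerpt, so the sketch below only illustrates the general idea: index edges by source node once, then visit only the edges that start inside a given community. The cohesion definition used here (share of member-originated edges that stay inside the community) is an assumption, and `build_adjacency` and `cohesion` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Index each directed edge under its source node, once for the whole graph."""
    adjacency: Dict[str, List[str]] = defaultdict(list)
    for source, target in edges:
        adjacency[source].append(target)
    return adjacency


def cohesion(members: Set[str], adjacency: Dict[str, List[str]]) -> float:
    """Assumed metric: fraction of edges leaving member nodes that stay inside."""
    internal = 0
    total = 0
    for node in members:
        # Only edges that start at a member are visited, so the cost is
        # proportional to the community's own edges, not the whole graph.
        for target in adjacency.get(node, ()):
            total += 1
            if target in members:
                internal += 1
    return internal / total if total else 0.0
```

A per-community scan over the full edge list would cost O(communities × all-edges); with the index built once, each community pays only for its own edges.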

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ When using code-review-graph MCP tools, follow these rules:
 
 ```bash
 # Development
-uv run pytest tests/ --tb=short -q # Run tests (572 tests)
+uv run pytest tests/ --tb=short -q # Run tests (609 tests)
 uv run ruff check code_review_graph/ # Lint
 uv run mypy code_review_graph/ --ignore-missing-imports --no-strict-optional
 

README.md

Lines changed: 3 additions & 1 deletion
@@ -229,6 +229,7 @@ code-review-graph watch # Auto-update on file changes
 code-review-graph visualize # Generate interactive HTML graph
 code-review-graph wiki # Generate markdown wiki from communities
 code-review-graph detect-changes # Risk-scored change impact analysis
+code-review-graph enrich # Enrich search results with graph context
 code-review-graph register <path> # Register repo in multi-repo registry
 code-review-graph unregister <id> # Remove repo from registry
 code-review-graph repos # List registered repositories
@@ -293,6 +294,7 @@ Optional dependency groups:
 pip install code-review-graph[embeddings] # Local vector embeddings (sentence-transformers)
 pip install code-review-graph[google-embeddings] # Google Gemini embeddings
 pip install code-review-graph[communities] # Community detection (igraph)
+pip install code-review-graph[enrichment] # Jedi-based Python call resolution
 pip install code-review-graph[eval] # Evaluation benchmarks (matplotlib)
 pip install code-review-graph[wiki] # Wiki generation with LLM summaries (ollama)
 pip install code-review-graph[all] # All optional dependencies
@@ -316,7 +318,7 @@ pytest
 <summary><strong>Adding a new language</strong></summary>
 <br>
 
-Edit `code_review_graph/parser.py` and add your extension to `EXTENSION_TO_LANGUAGE` along with node type mappings in `_CLASS_TYPES`, `_FUNCTION_TYPES`, `_IMPORT_TYPES`, and `_CALL_TYPES`. Include a test fixture and open a PR.
+Edit the appropriate language handler in `code_review_graph/lang/` (e.g., `_python.py`, `_kotlin.py`) or create a new one following `_base.py`. Add your extension to `EXTENSION_TO_LANGUAGE` in `parser.py`, include a test fixture, and open a PR.
 
 </details>
 
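
The README change above points contributors at per-language handlers in `code_review_graph/lang/` plus the `EXTENSION_TO_LANGUAGE` mapping in `parser.py`. The actual interface of `_base.py` is not part of this excerpt, so the following is only a sketch of how a strategy-pattern layout of this kind might look. `LanguageHandler`, `function_types`, `HANDLERS`, and `handler_for` are hypothetical, and the node type strings are illustrative.

```python
from pathlib import Path
from typing import Dict, FrozenSet, Optional


class LanguageHandler:
    """Hypothetical base class; the real `_base.py` interface may differ."""

    language: str = ""
    function_types: FrozenSet[str] = frozenset()

    def is_function(self, node_type: str) -> bool:
        return node_type in self.function_types


class PythonHandler(LanguageHandler):
    language = "python"
    function_types = frozenset({"function_definition"})


class KotlinHandler(LanguageHandler):
    language = "kotlin"
    function_types = frozenset({"function_declaration"})


# An extension lookup plus one handler object per language stands in for
# per-language if/elif branches inside a single parser module.
EXTENSION_TO_LANGUAGE: Dict[str, str] = {".py": "python", ".kt": "kotlin"}
HANDLERS: Dict[str, LanguageHandler] = {
    handler.language: handler for handler in (PythonHandler(), KotlinHandler())
}


def handler_for(path: str) -> Optional[LanguageHandler]:
    """Pick the handler for a file path, or None for unsupported extensions."""
    language = EXTENSION_TO_LANGUAGE.get(Path(path).suffix)
    return HANDLERS.get(language) if language else None
```

Under this layout, supporting another language means adding one handler class and one mapping entry, which matches the workflow the README describes.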

code-review-graph-vscode/src/backend/cli.ts

Lines changed: 2 additions & 10 deletions
@@ -53,22 +53,14 @@ export class CliWrapper {
   /**
    * Build (or fully rebuild) the graph database for a workspace.
    */
-  async buildGraph(
-    workspaceRoot: string,
-    options?: { fullRebuild?: boolean },
-  ): Promise<CliResult> {
-    const args = ['build'];
-    if (options?.fullRebuild) {
-      args.push('--full');
-    }
-
+  async buildGraph(workspaceRoot: string): Promise<CliResult> {
     return vscode.window.withProgress(
       {
         location: vscode.ProgressLocation.Notification,
         title: 'Code Review Graph: Building graph\u2026',
         cancellable: false,
       },
-      () => this.exec(args, workspaceRoot),
+      () => this.exec(['build'], workspaceRoot),
     );
   }
 

code-review-graph-vscode/src/backend/sqlite.ts

Lines changed: 1 addition & 1 deletion
@@ -208,7 +208,7 @@ export class SqliteReader {
       if (row) {
         const version = parseInt(row.value, 10);
         // Must match LATEST_VERSION in code_review_graph/migrations.py
-        const SUPPORTED_SCHEMA_VERSION = 6;
+        const SUPPORTED_SCHEMA_VERSION = 8;
         if (!isNaN(version) && version > SUPPORTED_SCHEMA_VERSION) {
           return `Database was created with a newer version (schema v${version}). Update the extension.`;
         }

code_review_graph/changes.py

Lines changed: 12 additions & 8 deletions
@@ -152,15 +152,19 @@ def compute_risk_score(store: GraphStore, node: GraphNode) -> float:
     Scoring factors:
     - Flow participation: 0.05 per flow membership, capped at 0.25
     - Community crossing: 0.05 per caller from a different community, capped at 0.15
-    - Test coverage: 0.30 if no TESTED_BY edges, 0.05 if tested
+    - Test coverage: 0.30 (untested) scaling down to 0.05 (5+ TESTED_BY edges)
     - Security sensitivity: 0.20 if name matches security keywords
     - Caller count: callers / 20, capped at 0.10
     """
     score = 0.0
 
-    # --- Flow participation (cap 0.25) ---
-    flow_count = store.count_flow_memberships(node.id)
-    score += min(flow_count * 0.05, 0.25)
+    # --- Flow participation (cap 0.25), weighted by criticality ---
+    flow_criticalities = store.get_flow_criticalities_for_node(node.id)
+    if flow_criticalities:
+        score += min(sum(flow_criticalities), 0.25)
+    else:
+        flow_count = store.count_flow_memberships(node.id)
+        score += min(flow_count * 0.05, 0.25)
 
     # --- Community crossing (cap 0.15) ---
     callers = store.get_edges_by_target(node.qualified_name)
@@ -177,10 +181,10 @@ def compute_risk_score(store: GraphStore, node: GraphNode) -> float:
                 cross_community += 1
     score += min(cross_community * 0.05, 0.15)
 
-    # --- Test coverage ---
-    tested_edges = store.get_edges_by_target(node.qualified_name)
-    has_test = any(e.kind == "TESTED_BY" for e in tested_edges)
-    score += 0.05 if has_test else 0.30
+    # --- Test coverage (direct + transitive) ---
+    transitive_tests = store.get_transitive_tests(node.qualified_name)
+    test_count = len(transitive_tests)
+    score += 0.30 - (min(test_count / 5.0, 1.0) * 0.25)
 
     # --- Security sensitivity ---
     name_lower = node.name.lower()
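
The new test coverage term replaces a two-valued penalty with a linear ramp: the contribution starts at 0.30 for an untested node and falls by 0.05 per test until it bottoms out at 0.05 at five tests. A small sketch that isolates the expression from the diff makes the ramp easy to check. `coverage_risk` is a hypothetical helper name; only the arithmetic comes from the change itself.

```python
def coverage_risk(test_count: int) -> float:
    """Risk contribution from test coverage, using the expression in the diff."""
    return 0.30 - (min(test_count / 5.0, 1.0) * 0.25)


# Print the contribution for a few test counts, rounded for readability.
for count in (0, 1, 3, 5, 10):
    print(count, round(coverage_risk(count), 2))
```

Because `min(..., 1.0)` clamps the ratio, any count of five or more yields the same floor value, so heavily tested nodes are not rewarded beyond that point.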

code_review_graph/cli.py

Lines changed: 141 additions & 24 deletions
@@ -11,6 +11,7 @@
     code-review-graph visualize
     code-review-graph wiki
     code-review-graph detect-changes [--base BASE] [--brief]
+    code-review-graph enrich
     code-review-graph register <path> [--alias name]
     code-review-graph unregister <path_or_alias>
     code-review-graph repos
@@ -151,6 +152,67 @@ def _handle_init(args: argparse.Namespace) -> None:
     print(" 2. Restart your AI coding tool to pick up the new config")
 
 
+def _run_post_processing(store, quiet: bool = False) -> None:
+    """Run signatures, FTS, flows, and communities after build/update."""
+    import sqlite3
+
+    # Signatures
+    try:
+        nodes = store._conn.execute(
+            "SELECT id, name, kind, params, return_type FROM nodes "
+            "WHERE kind IN ('Function','Test','Class')"
+        ).fetchall()
+        for row in nodes:
+            node_id, name, kind, params, ret = row
+            if kind in ("Function", "Test"):
+                sig = f"{name}({params or ''})"
+                if ret:
+                    sig += f" -> {ret}"
+            elif kind == "Class":
+                sig = f"class {name}"
+            else:
+                sig = name
+            store.update_node_signature(node_id, sig[:512])
+        store.commit()
+    except (sqlite3.OperationalError, TypeError, KeyError) as e:
+        if not quiet:
+            print(f"Warning: signature computation failed: {e}")
+
+    # FTS index
+    try:
+        from .search import rebuild_fts_index
+        fts_count = rebuild_fts_index(store)
+        if not quiet:
+            print(f"FTS indexed: {fts_count} nodes")
+    except (sqlite3.OperationalError, ImportError) as e:
+        if not quiet:
+            print(f"Warning: FTS index rebuild failed: {e}")
+
+    # Flows
+    try:
+        from .flows import store_flows as _store_flows
+        from .flows import trace_flows as _trace_flows
+        flows = _trace_flows(store)
+        count = _store_flows(store, flows)
+        if not quiet:
+            print(f"Flows detected: {count}")
+    except (sqlite3.OperationalError, ImportError) as e:
+        if not quiet:
+            print(f"Warning: flow detection failed: {e}")
+
+    # Communities
+    try:
+        from .communities import detect_communities as _detect_communities
+        from .communities import store_communities as _store_communities
+        comms = _detect_communities(store)
+        count = _store_communities(store, comms)
+        if not quiet:
+            print(f"Communities detected: {count}")
+    except (sqlite3.OperationalError, ImportError) as e:
+        if not quiet:
+            print(f"Warning: community detection failed: {e}")
+
+
 def main() -> None:
     """Main CLI entry point."""
     ap = argparse.ArgumentParser(
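
The signature phase of `_run_post_processing` derives a display string for each node from its name, kind, parameters, and return type. Pulling those branches into a pure function makes the rules easier to read and test in isolation. `format_signature` is a hypothetical name used only for this sketch; the branch logic and the 512-character cap follow the diff.

```python
from typing import Optional


def format_signature(
    name: str, kind: str, params: Optional[str], return_type: Optional[str]
) -> str:
    """Mirror the signature branches from the diff as a pure function."""
    if kind in ("Function", "Test"):
        # A missing params value renders as an empty parameter list.
        sig = f"{name}({params or ''})"
        if return_type:
            sig += f" -> {return_type}"
    elif kind == "Class":
        sig = f"class {name}"
    else:
        sig = name
    return sig[:512]  # same 512-character cap as in the diff
```

For example, `format_signature("parse", "Function", "path: str", "Tree")` yields `parse(path: str) -> Tree`, while a class node named `GraphStore` yields `class GraphStore`.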
@@ -227,6 +289,7 @@ def main() -> None:
     # build
     build_cmd = sub.add_parser("build", help="Full graph build (re-parse all files)")
     build_cmd.add_argument("--repo", default=None, help="Repository root (auto-detected)")
+    build_cmd.add_argument("-q", "--quiet", action="store_true", help="Suppress output")
     build_cmd.add_argument(
         "--skip-flows", action="store_true",
         help="Skip flow/community detection (signatures + FTS only)",
@@ -240,6 +303,7 @@ def main() -> None:
     update_cmd = sub.add_parser("update", help="Incremental update (only changed files)")
     update_cmd.add_argument("--base", default="HEAD~1", help="Git diff base (default: HEAD~1)")
     update_cmd.add_argument("--repo", default=None, help="Repository root (auto-detected)")
+    update_cmd.add_argument("-q", "--quiet", action="store_true", help="Suppress output")
     update_cmd.add_argument(
         "--skip-flows", action="store_true",
         help="Skip flow/community detection (signatures + FTS only)",
@@ -266,6 +330,11 @@ def main() -> None:
     # status
     status_cmd = sub.add_parser("status", help="Show graph statistics")
     status_cmd.add_argument("--repo", default=None, help="Repository root (auto-detected)")
+    status_cmd.add_argument("-q", "--quiet", action="store_true", help="Suppress output")
+    status_cmd.add_argument(
+        "--json", action="store_true", dest="json_output",
+        help="Output as JSON",
+    )
 
     # visualize
     vis_cmd = sub.add_parser("visualize", help="Generate interactive HTML graph visualization")
@@ -327,6 +396,13 @@ def main() -> None:
     )
     detect_cmd.add_argument("--repo", default=None, help="Repository root (auto-detected)")
 
+    # embed
+    embed_cmd = sub.add_parser("embed", help="Compute vector embeddings for graph nodes")
+    embed_cmd.add_argument("--repo", default=None, help="Repository root (auto-detected)")
+
+    # enrich (PreToolUse hook -- reads hook JSON from stdin)
+    sub.add_parser("enrich", help="Enrich search results with graph context (hook)")
+
     # serve
     serve_cmd = sub.add_parser("serve", help="Start MCP server (stdio transport)")
     serve_cmd.add_argument("--repo", default=None, help="Repository root (auto-detected)")
@@ -346,6 +422,28 @@ def main() -> None:
         serve_main(repo_root=args.repo)
         return
 
+    if args.command == "embed":
+        from .incremental import find_repo_root
+        repo_root = Path(args.repo) if args.repo else find_repo_root()
+        if not repo_root:
+            repo_root = Path.cwd()
+        db_path = repo_root / ".code-review-graph" / "graph.db"
+        if not db_path.exists():
+            print("No graph database found. Run 'code-review-graph build' first.")
+            return
+        from .embeddings import EmbeddingStore, embed_all_nodes
+        from .graph import GraphStore
+        store = GraphStore(str(db_path))
+        emb_store = EmbeddingStore(str(db_path))
+        count = embed_all_nodes(store, emb_store)
+        print(f"Embedded {count} nodes.")
+        return
+
+    if args.command == "enrich":
+        from .enrich import run_hook
+        run_hook()
+        return
+
     if args.command == "eval":
         from .eval.reporter import generate_full_report, generate_readme_tables
         from .eval.runner import run_eval
@@ -485,13 +583,14 @@ def main() -> None:
             parsed = result.get("files_parsed", 0)
             nodes = result.get("total_nodes", 0)
             edges = result.get("total_edges", 0)
-            print(
-                f"Full build: {parsed} files, "
-                f"{nodes} nodes, {edges} edges"
-                f" (postprocess={pp})"
-            )
-            if result.get("errors"):
-                print(f"Errors: {len(result['errors'])}")
+            if not getattr(args, "quiet", False):
+                print(
+                    f"Full build: {parsed} files, "
+                    f"{nodes} nodes, {edges} edges"
+                    f" (postprocess={pp})"
+                )
+                if result.get("errors"):
+                    print(f"Errors: {len(result['errors'])}")
 
         elif args.command == "update":
             pp = "none" if getattr(args, "skip_postprocess", False) else (
@@ -505,35 +604,53 @@ def main() -> None:
             updated = result.get("files_updated", 0)
             nodes = result.get("total_nodes", 0)
             edges = result.get("total_edges", 0)
-            print(
-                f"Incremental: {updated} files updated, "
-                f"{nodes} nodes, {edges} edges"
-                f" (postprocess={pp})"
-            )
+            if not getattr(args, "quiet", False):
+                print(
+                    f"Incremental: {updated} files updated, "
+                    f"{nodes} nodes, {edges} edges"
+                    f" (postprocess={pp})"
+                )
 
         elif args.command == "status":
+            import json as json_mod
             stats = store.get_stats()
-            print(f"Nodes: {stats.total_nodes}")
-            print(f"Edges: {stats.total_edges}")
-            print(f"Files: {stats.files_count}")
-            print(f"Languages: {', '.join(stats.languages)}")
-            print(f"Last updated: {stats.last_updated or 'never'}")
-            # Show branch info and warn if stale
             stored_branch = store.get_metadata("git_branch")
             stored_sha = store.get_metadata("git_head_sha")
-            if stored_branch:
-                print(f"Built on branch: {stored_branch}")
-            if stored_sha:
-                print(f"Built at commit: {stored_sha[:12]}")
             from .incremental import _git_branch_info
             current_branch, current_sha = _git_branch_info(repo_root)
+            stale_warning = None
             if stored_branch and current_branch and stored_branch != current_branch:
-                print(
-                    f"WARNING: Graph was built on '{stored_branch}' "
+                stale_warning = (
+                    f"Graph was built on '{stored_branch}' "
                     f"but you are now on '{current_branch}'. "
                     f"Run 'code-review-graph build' to rebuild."
                 )
 
+            if getattr(args, "json_output", False):
+                data = {
+                    "nodes": stats.total_nodes,
+                    "edges": stats.total_edges,
+                    "files": stats.files_count,
+                    "languages": list(stats.languages),
+                    "last_updated": stats.last_updated,
+                    "branch": stored_branch,
+                    "commit": stored_sha[:12] if stored_sha else None,
+                    "stale": stale_warning,
+                }
+                print(json_mod.dumps(data))
+            elif not args.quiet:
+                print(f"Nodes: {stats.total_nodes}")
+                print(f"Edges: {stats.total_edges}")
+                print(f"Files: {stats.files_count}")
+                print(f"Languages: {', '.join(stats.languages)}")
+                print(f"Last updated: {stats.last_updated or 'never'}")
+                if stored_branch:
+                    print(f"Built on branch: {stored_branch}")
+                if stored_sha:
+                    print(f"Built at commit: {stored_sha[:12]}")
+                if stale_warning:
+                    print(f"WARNING: {stale_warning}")
+
         elif args.command == "watch":
             watch(repo_root, store)
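
In the `status` branch above, `--json` is checked before `--quiet`, so JSON output is still produced when both flags are passed. The following standalone sketch reproduces that precedence with `argparse`. The flag definitions match the diff, while `build_parser`, `render_status`, and the sample stats dictionary are hypothetical stand-ins for the real CLI wiring.

```python
import argparse
import json
from typing import Dict, Optional


def build_parser() -> argparse.ArgumentParser:
    """Parser with the two output flags added to `status` in the diff."""
    parser = argparse.ArgumentParser(prog="status-sketch")
    parser.add_argument("-q", "--quiet", action="store_true", help="Suppress output")
    parser.add_argument(
        "--json", action="store_true", dest="json_output", help="Output as JSON"
    )
    return parser


def render_status(args: argparse.Namespace, stats: Dict[str, object]) -> Optional[str]:
    """Return the text to print, or None when output is suppressed."""
    if args.json_output:
        # JSON takes precedence over --quiet, matching the branch order above.
        return json.dumps(stats)
    if not args.quiet:
        return "\n".join(f"{key}: {value}" for key, value in stats.items())
    return None


if __name__ == "__main__":
    text = render_status(build_parser().parse_args(), {"nodes": 3, "edges": 5})
    if text is not None:
        print(text)
```

Using `dest="json_output"` avoids an attribute named `json` on the namespace, which would otherwise be easy to confuse with the `json` module in the handler.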