Module-scope calls (notebooks, scripts, __main__) emit no CALLS edges → find_dead_code false positives #284

@michael-denyer

Bug

_extract_calls in code_review_graph/parser.py gates CALLS edge emission on enclosing_func being set:

if call_name and enclosing_func:
    caller = self._qualify(enclosing_func, file_path, enclosing_class)
    ...edges.append(EdgeInfo(kind="CALLS", source=caller, target=target, ...))

So calls made from module scope — top-level script glue, CLI entrypoints, if __name__ == "__main__" blocks, and Jupyter/Databricks notebook cells — produce zero CALLS edges. Any function invoked only from those contexts is then flagged as dead by find_dead_code (which counts incoming CALLS edges as evidence of liveness).
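To make the gating concrete, here is a minimal sketch built on the standard-library ast module, not the project's parser. Edge, extract_calls, _call_name and the attribute_module_scope flag are hypothetical names for illustration; only the `if call_name and enclosing_func` condition mirrors parser.py. With the flag off, the `main()` call under `if __name__ == "__main__"` is dropped; with it on, the call is attributed to the file.

```python
from __future__ import annotations

import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    kind: str
    source: str
    target: str


def _call_name(func: ast.expr) -> str | None:
    # Only plain names and attribute access are handled in this sketch.
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None


def extract_calls(source: str, file_path: str, attribute_module_scope: bool) -> list[Edge]:
    """Collect CALLS edges; optionally attribute module-scope calls to the file."""
    edges: list[Edge] = []

    def visit(node: ast.AST, enclosing_func: str | None) -> None:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            enclosing_func = node.name
        if isinstance(node, ast.Call):
            call_name = _call_name(node.func)
            if call_name and enclosing_func:
                # Same condition as the gate quoted above.
                edges.append(Edge("CALLS", f"{file_path}::{enclosing_func}", call_name))
            elif call_name and attribute_module_scope:
                # Assumed fallback: the file itself becomes the caller.
                edges.append(Edge("CALLS", file_path, call_name))
        for child in ast.iter_child_nodes(node):
            visit(child, enclosing_func)

    visit(ast.parse(source), None)
    return edges


SCRIPT = '''
def helper():
    return 1

def main():
    helper()

if __name__ == "__main__":
    main()
'''

gated = extract_calls(SCRIPT, "script.py", attribute_module_scope=False)
fixed = extract_calls(SCRIPT, "script.py", attribute_module_scope=True)
```

In this sketch `gated` contains only the main -> helper edge, so `main` has no incoming CALLS edge; `fixed` additionally contains a script.py -> main edge.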

Notebook impact (severe)

PR #69 added notebook parsing — node extraction and IMPORTS_FROM edges work — but every cell is module-scope by definition, so notebooks emit no CALLS edges at all. This makes the dead-code detector's notebook coverage vacuous: any function called only from notebooks looks orphaned.
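A rough illustration of why notebooks are hit hardest, assuming the usual .ipynb JSON layout (a top-level "cells" list whose code cells carry a "source" string or list of strings). notebook_code and module_scope_calls are hypothetical helpers, and the notebook content is made up for the example; cells with IPython magics would not parse with ast and are ignored here.

```python
import ast
import json


def notebook_code(nb_json: str) -> str:
    """Concatenate the source of all code cells in an .ipynb JSON document."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        chunks.append("".join(src) if isinstance(src, list) else src)
    return "\n".join(chunks)


def module_scope_calls(source: str) -> list:
    """Names of calls that are not nested inside any function or lambda."""
    names = []

    def visit(node, in_func):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)):
            in_func = True
        if isinstance(node, ast.Call) and not in_func:
            func = node.func
            if isinstance(func, ast.Attribute):
                names.append(func.attr)
            elif isinstance(func, ast.Name):
                names.append(func.id)
        for child in ast.iter_child_nodes(node):
            visit(child, in_func)

    visit(ast.parse(source), False)
    return names


# Made-up notebook: one markdown cell, two code cells, no function definitions.
NOTEBOOK = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Apply model"]},
        {"cell_type": "code",
         "source": ["from src.predict import Predict\n", "predict_obj = Predict()"]},
        {"cell_type": "code",
         "source": ["data = predict_obj.extract_data_from_sample_ids([1, 2])"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
})

calls = module_scope_calls(notebook_code(NOTEBOOK))
```

Every call in the example notebook lands in `calls`, i.e. every one of them has no enclosing function and would be skipped by the gate.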

Reproducer

Real-world: a Databricks notebook (production inference pipeline) that calls Predict.extract_data_from_sample_ids():

# Direct parser call — bypasses any CLI/MCP layering
>>> from pathlib import Path
>>> from code_review_graph.parser import CodeParser
>>> nodes, edges = CodeParser().parse_file(Path("ML_wpredict_apply_v1.0.ipynb"))
>>> [e for e in edges if e.kind == "CALLS"]
[]
>>> [e for e in edges if e.kind == "IMPORTS_FROM"]
[<IMPORTS_FROM ... -> logging>, <IMPORTS_FROM ... -> sys>, <IMPORTS_FROM ... -> src.predict>]

The notebook contains predict_obj.extract_data_from_sample_ids(...) and similar calls in cells. Imports resolve correctly; calls are silently dropped.

refactor_tool(mode="dead_code") then flags extract_data_from_sample_ids and extract_data_from_files as dead — they're the entire reason the apply notebook exists.

Same shape reproduces with a plain .py file containing only a top-level helper() invocation, so this isn't notebook-specific — notebooks just suffer worst because they're 100% module-scope.
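The downstream effect can be sketched with a toy liveness check. find_dead below is a hypothetical simplification that treats "no incoming CALLS edge" as dead; the real find_dead_code may weigh other signals. Edges are plain (kind, source, target) tuples and the notebook name is shortened to nb.ipynb for the example.

```python
from collections import Counter


def find_dead(functions, edges):
    """Return functions with zero incoming CALLS edges, sorted by name."""
    incoming = Counter(target for kind, _source, target in edges if kind == "CALLS")
    return sorted(name for name in functions if incoming[name] == 0)


functions = ["extract_data_from_sample_ids", "extract_data_from_files"]

# What the parser emits today for the notebook: imports only.
before = [("IMPORTS_FROM", "nb.ipynb", "src.predict")]

# Assumed output once module-scope calls are attributed to the file.
after = before + [
    ("CALLS", "nb.ipynb", "extract_data_from_sample_ids"),
    ("CALLS", "nb.ipynb", "extract_data_from_files"),
]

dead_before = find_dead(functions, before)
dead_after = find_dead(functions, after)
```

With the `before` edges both methods are reported dead; with the `after` edges neither is.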

Scope of the fix

5 emission sites in parser.py gate on enclosing_func:

| Line | Site | Languages affected |
| --- | --- | --- |
| 1455 | Elixir call path | Elixir |
| 2379 | Generic _extract_calls | Python, JS, TS, others (the main fix) |
| 2415 | JSX component invocation | TSX/JSX (a top-level <App /> render is module-scope) |
| 2700 | Solidity emit statement | Solidity |
| 4002 | R call path | R |

Plus a downstream consideration: detect_entry_points in flows.py treats "is a CALLS target" as "is not a root" — so attributing a script's module-scope calls to the script's own File node would make script-only callees look "called by the script" and hide them from flow analysis. The fix needs to filter File-sourced CALLS in entry-point detection.
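A small sketch of that interaction, under the assumption that entry-point detection boils down to "functions that are never a CALLS target". detect_entry_points_sketch, Edge and the ignore_file_sources flag are hypothetical and not taken from flows.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    kind: str
    source: str
    target: str


def detect_entry_points_sketch(functions, edges, file_nodes, ignore_file_sources=True):
    """Return functions that no other function calls, sorted by name.

    When ignore_file_sources is True, CALLS edges whose source is a File node
    do not count, so script-only callees remain roots.
    """
    called = {
        e.target
        for e in edges
        if e.kind == "CALLS"
        and not (ignore_file_sources and e.source in file_nodes)
    }
    return sorted(name for name in functions if name not in called)


functions = ["main", "helper"]
file_nodes = {"script.py"}
edges = [
    Edge("CALLS", "script.py", "main"),  # module-scope call, attributed to the file
    Edge("CALLS", "main", "helper"),     # ordinary function-to-function call
]

roots_filtered = detect_entry_points_sketch(functions, edges, file_nodes)
roots_unfiltered = detect_entry_points_sketch(
    functions, edges, file_nodes, ignore_file_sources=False
)
```

Without the filter the new script.py -> main edge hides `main` from the root set; with the filter `main` stays an entry point while still having an incoming CALLS edge for liveness purposes.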

Why the existing convention supports the fix

_extract_value_references already attributes references to file_path when enclosing_func is None (parser.py line 2508-ish). CONTAINS edges do the same when there's no enclosing function. The fix just brings CALLS into line with the existing pattern.

Not addressed by prior PRs

Confirmed by reviewing dead-code-related history:

Fix submitted

PR pending — links 5 module-scope CALLS sites + filters File-sourced CALLS in detect_entry_points. End-to-end verification: edge count on ML_wpredict_apply_v1.0.ipynb goes from 0 to 14 CALLS edges; find_dead_code no longer flags the notebook-only methods.

318 tests pass (parser, refactor, flows, multilang, notebook).
