perf(flows): load adjacency in-memory for trace_flows #296
Open
0bLoM wants to merge 1 commit into tirth8205:main from
…cality

On large graphs (~500k nodes, ~3M edges) trace_flows and compute_criticality did tens of millions of per-row SQLite point queries (get_edges_by_source, get_node, get_node_by_id, get_edges_by_target), causing the build's flow step to grind for many minutes at 100% CPU with no progress output.

Add FlowAdjacency dataclass and GraphStore.load_flow_adjacency() that builds the needed adjacency (CALLS out-edges, TESTED_BY incoming set, nodes-by-qn, nodes-by-id) in two streaming SELECTs. Refactor _trace_single_flow, compute_criticality, trace_flows, and incremental_trace_flows to use it instead of per-node/per-edge queries. BFS target resolution, external-call detection, and test-coverage checks become dict/set lookups; the whole pass reduces to two table scans plus in-memory traversal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
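For reviewers who want a concrete picture of the "two streaming SELECTs" idea, here is a minimal sketch. It is not the code from this PR: the class and function names (`FlowAdjacencySketch`, `load_flow_adjacency_sketch`), the table names, and every column name are assumptions made for illustration, and the real `FlowAdjacency` fields may differ.

```python
from __future__ import annotations

import sqlite3
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FlowAdjacencySketch:
    """Hypothetical in-memory view of the graph; field names are illustrative."""

    # CALLS out-edges: source qualified name -> list of target qualified names
    calls_out: dict[str, list[str]] = field(
        default_factory=lambda: defaultdict(list)
    )
    # Qualified names that have at least one incoming TESTED_BY edge
    tested: set[str] = field(default_factory=set)
    nodes_by_qn: dict[str, dict] = field(default_factory=dict)
    nodes_by_id: dict[int, dict] = field(default_factory=dict)


def load_flow_adjacency_sketch(conn: sqlite3.Connection) -> FlowAdjacencySketch:
    """Build the adjacency with one scan of nodes and one scan of edges.

    Assumed schema (hypothetical):
      nodes(id, qualified_name, kind)
      edges(kind, source_qualified, target_qualified)
    """
    adj = FlowAdjacencySketch()

    # Scan 1: iterate the cursor so rows are streamed, not fetched all at once.
    for node_id, qn, kind in conn.execute(
        "SELECT id, qualified_name, kind FROM nodes"
    ):
        node = {"id": node_id, "qualified_name": qn, "kind": kind}
        adj.nodes_by_qn[qn] = node
        adj.nodes_by_id[node_id] = node

    # Scan 2: only the two edge kinds the flow pass needs.
    for kind, src, dst in conn.execute(
        "SELECT kind, source_qualified, target_qualified FROM edges "
        "WHERE kind IN ('CALLS', 'TESTED_BY')"
    ):
        if kind == "CALLS":
            adj.calls_out[src].append(dst)
        else:
            # Assumption: the edge target is the node being tested.
            adj.tested.add(dst)

    return adj
```

After loading, a check such as `qn in adj.tested` or `adj.calls_out.get(qn, [])` replaces what would otherwise be a point query per node or edge.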
Summary
- trace_flows and compute_criticality were grinding for many minutes at 100% CPU because every BFS step and criticality factor did per-row SQLite point queries (get_edges_by_source, get_node, get_node_by_id, get_edges_by_target).
- Add FlowAdjacency dataclass + GraphStore.load_flow_adjacency() that streams nodes and CALLS/TESTED_BY edges into memory in two queries.
- Refactor _trace_single_flow, compute_criticality, trace_flows, incremental_trace_flows to use in-memory dict/set lookups instead of SQLite round-trips.

Test plan
- uv run pytest tests/: 788 passed, 1 skipped, 2 xpassed
- ruff check on modified files: clean
- mypy --ignore-missing-imports --no-strict-optional on modified files: clean

🤖 Generated with Claude Code
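As a rough illustration of the in-memory dict/set lookups mentioned in the Summary, the sketch below runs a BFS over CALLS out-edges without touching the database. It is an assumption-laden example, not the PR's `_trace_single_flow`: the function name `trace_reachable`, its parameters, and the default `max_depth` value are all hypothetical.

```python
from collections import deque


def trace_reachable(
    entry: str,
    calls_out: dict[str, list[str]],
    known: set[str],
    max_depth: int = 15,  # hypothetical default, not taken from the PR
) -> tuple[list[str], set[str]]:
    """BFS from `entry` using only dict/set lookups.

    Returns (nodes in BFS visit order, call targets not present in `known`).
    """
    visited = [entry]
    seen = {entry}
    external: set[str] = set()
    queue = deque([(entry, 0)])

    while queue:
        current, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand past the depth limit
        for target in calls_out.get(current, ()):
            if target not in known:
                # External-call detection becomes a set membership test.
                external.add(target)
                continue
            if target not in seen:
                seen.add(target)
                visited.append(target)
                queue.append((target, depth + 1))

    return visited, external
```

Each step costs a dictionary lookup rather than a SQLite round-trip, which is where the claimed speedup on large graphs would come from.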