feat(openant-core): add Rust parser support by ar7casper · Pull Request #10 · knostic/OpenAnt

ar7casper · 2026-03-03T14:45:11Z

Add 4-stage Rust parser using tree-sitter-rust
- repository_scanner.py: enumerate .rs files
- function_extractor.py: extract functions/methods from AST
- call_graph_builder.py: build bidirectional call graphs
- unit_generator.py: generate dataset.json
- test_pipeline.py: orchestrator with all 4 processing levels
Register Rust in parser_adapter.py (detection + dispatch)
Add 'rust' to CLI language whitelist (cli.py, parse.go)
Add tree-sitter-rust to dependencies (requirements.txt, pyproject.toml)
Update README with Rust in supported languages
Add adding-a-parser.md guide with CLI whitelist and venv dependency docs


NahumKorda

Generated by Claude Code:

Summary

Well-structured PR that adds a Rust parser following the established 4-stage pipeline. The adding-a-parser.md guide is a valuable addition. Overall quality is good and it will work for typical Rust codebases, but there are several correctness issues to address.

Major Issues

M1. qualified_name uses :: separator, inconsistent with all other parsers
function_extractor.py builds IDs like src/lib.rs:Config::new. The guide documents . as the separator (Config.method). The :: also makes splitting on : ambiguous: src/lib.rs:Config::new yields ["src/lib.rs", "Config", "", "new"] instead of ["src/lib.rs", "Config::new"]. Use . to match convention.

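The ambiguity in M1 can be reproduced in a few lines. This is a minimal sketch, not code from the PR: make_unit_id and split_unit_id are hypothetical helpers that only assume the "file:qualified_name" ID shape quoted above.

```python
def make_unit_id(file_path: str, qualified_name: str) -> str:
    """Join a file path and a qualified name with a single ':' (hypothetical helper)."""
    return f"{file_path}:{qualified_name}"


def split_unit_id(unit_id: str) -> tuple[str, str]:
    """Split on the first ':' only, so the name part is returned intact."""
    file_path, _, qualified_name = unit_id.partition(":")
    return file_path, qualified_name


# Naive splitting on ':' breaks once the name itself contains '::'.
naive_parts = "src/lib.rs:Config::new".split(":")

# With '.' as the separator the ID contains exactly one ':'.
dotted_id = make_unit_id("src/lib.rs", "Config.new")
dotted_parts = dotted_id.split(":")
```

Splitting on the first : only (as split_unit_id does) would also tolerate ::, but the dot form keeps Rust IDs consistent with the other parsers and with consumers that split naively.
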
M2. Trait impl names include for keyword, creating IDs with spaces
impl Display for Config produces impl_name = "Display for Config", leading to src/lib.rs:Display for Config::fmt. Spaces in function IDs will break downstream processing. Normalize to Config (the implementing type).
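
One possible normalization, sketched on the header text only. The implementing_type helper is hypothetical, and dropping generic parameters is a design choice rather than something the PR specifies. If the parser has the impl node at hand, reading the node's type field through tree-sitter is likely more robust than string handling.

```python
def implementing_type(impl_header: str) -> str:
    """Return the implementing type from the text between 'impl' and '{'.

    'Display for Config' -> 'Config'
    'Config'             -> 'Config'
    """
    # For a trait impl, keep what follows the last ' for '.
    # For an inherent impl there is no ' for ', and rpartition returns
    # the whole header as the third element.
    _, _, target = impl_header.strip().rpartition(" for ")
    # Drop generic arguments so 'Wrapper<T>' and 'Wrapper' share one ID prefix.
    return target.split("<", 1)[0].strip()


trait_impl = implementing_type("Display for Config")
inherent_impl = implementing_type("Config")
generic_impl = implementing_type("From<String> for Wrapper<T>")
```

With this, the example above would produce src/lib.rs:Config.fmt (assuming the . separator from M1), with no spaces in the ID.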

M3. rstrip('::') bug in _extract_imports
base = parts.split('{')[0].rstrip('::')
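str.rstrip('::') treats its argument as a set of characters, not as the substring ::, so it is equivalent to rstrip(':') and removes every trailing colon. For std::collections:: the result happens to be correct (std::collections), but a path ending in a single : or in ::: is silently rewritten as well. Use .removesuffix('::') (Python 3.9+) to state the intent exactly.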
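
A short comparison of the two calls, using only built-in str methods. The input strings are illustrative, and removesuffix requires Python 3.9 or later.

```python
path = "std::collections::"

# rstrip takes a set of characters, so '::' is the same set as ':'.
same_as_single_colon = path.rstrip("::") == path.rstrip(":")

stripped = path.rstrip("::")        # all trailing ':' removed
exact = path.removesuffix("::")     # exactly one '::' suffix removed

# The two calls diverge once the trailing colons are not exactly one '::'.
odd = "std::io:"
odd_stripped = odd.rstrip("::")     # the single ':' is removed as well
odd_exact = odd.removesuffix("::")  # unchanged: there is no '::' suffix
```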

M4. _is_async uses fragile string matching instead of AST
return code.strip().startswith('async ') or 'async fn' in code[:50]
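A comment containing async fn in the first 50 characters causes a false positive, and a signature such as pub async unsafe fn is missed. Check the tree-sitter node instead, e.g. any(child.type == 'async' for child in node.children). Depending on the grammar version, the async token may sit under a function_modifiers child rather than directly on the function node, so check there as well.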
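
The sketch below reproduces the string check and contrasts it with a node-based check. Node is a hypothetical stand-in for a tree-sitter node (only .type and .children are used), and the function_modifiers layout is an assumption to verify against the installed tree-sitter-rust grammar.

```python
from dataclasses import dataclass, field


def is_async_heuristic(code: str) -> bool:
    """The string check quoted above, reproduced for comparison."""
    return code.strip().startswith('async ') or 'async fn' in code[:50]


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node."""
    type: str
    children: list["Node"] = field(default_factory=list)


def is_async_ast(fn_node) -> bool:
    """True if an 'async' token is a direct child of the function node or a
    child of its 'function_modifiers' node (assumed layout)."""
    for child in fn_node.children:
        if child.type == "async":
            return True
        if child.type == "function_modifiers" and any(
            grandchild.type == "async" for grandchild in child.children
        ):
            return True
    return False


# A synchronous function whose leading comment mentions "async fn".
commented_sync = "// wraps the async fn below\nfn run() {}"
# An async function the string check misses because 'unsafe' sits in between.
async_unsafe = "pub async unsafe fn poll_raw() {}"

sync_node = Node("function_item", [Node("fn"), Node("identifier")])
async_node = Node(
    "function_item",
    [Node("function_modifiers", [Node("async"), Node("unsafe")]), Node("fn")],
)
```

The string check gets both samples wrong (one false positive, one false negative), while the node-based check depends only on tokens the parser actually produced.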

Minor Issues

m1. RUST_BUILTINS has duplicates (contains, assert, assert_eq, debug, take, flatten). Harmless but suggests copy-paste oversight.
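
Duplicates inside a set literal collapse silently at runtime, so they have to be found in the source text. A possible check using the standard ast module is sketched below. It assumes RUST_BUILTINS is assigned a set, list, or tuple literal of strings, and duplicate_literals is a hypothetical helper.

```python
import ast
from collections import Counter


def duplicate_literals(source: str, name: str) -> list[str]:
    """Return string literals that appear more than once in the literal
    collection assigned to `name` in `source`."""
    for node in ast.walk(ast.parse(source)):
        is_target = isinstance(node, ast.Assign) and any(
            isinstance(t, ast.Name) and t.id == name for t in node.targets
        )
        if is_target and isinstance(node.value, (ast.Set, ast.List, ast.Tuple)):
            counts = Counter(
                elt.value
                for elt in node.value.elts
                if isinstance(elt, ast.Constant) and isinstance(elt.value, str)
            )
            return sorted(value for value, n in counts.items() if n > 1)
    return []


# Illustrative input, not the actual contents of RUST_BUILTINS.
sample = "RUST_BUILTINS = {'contains', 'assert', 'take', 'contains', 'take'}"
duplicates = duplicate_literals(sample, "RUST_BUILTINS")
```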

m2. _has_test_attribute walks parent siblings, but in tree-sitter-rust attributes may be direct children of the function node inside impl/mod blocks. Test detection may not work reliably in all contexts. Same issue for _has_route_attribute and _has_main_attribute.

m3. _resolve_simple_call checks the same impl block first for bare function calls. In Rust, a bare foo() inside an impl block does NOT resolve to Self::foo(); an associated function or method has to be called as Self::foo() or self.foo(). This creates false call graph edges.

m4. Closures (|args| body) are not extracted. Code inside closures (common in iterator chains, async) won't appear in the call graph, potentially missing vulnerability-relevant call chains.

m5. _extract_imports ignores glob imports (use foo::*) and doesn't handle nested group uses (use std::{collections::{HashMap, BTreeMap}, io::Read}).
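
One way to flatten nested groups is a small recursive expansion over the use tree text. This is a sketch under stated assumptions: expand_use and split_top_level are hypothetical helpers, the input is the text between use and ;, and as aliases are passed through unchanged.

```python
def split_top_level(body: str) -> list[str]:
    """Split on commas that are not nested inside braces."""
    parts, current, depth = [], [], 0
    for ch in body:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:  # ignores the empty tail left by a trailing comma
        parts.append(tail)
    return parts


def expand_use(tree: str) -> list[str]:
    """Expand 'a::{b::{C, D}, e::F}' into a flat list of full paths."""
    tree = tree.strip()
    brace = tree.find("{")
    if brace == -1:
        return [tree]  # plain path or glob such as 'foo::*'
    prefix = tree[:brace]  # includes the trailing '::'
    inner = tree[brace + 1 : tree.rfind("}")]
    paths = []
    for item in split_top_level(inner):
        for sub in expand_use(item):
            if sub == "self":
                # 'std::io::{self}' imports the module itself.
                paths.append(prefix.removesuffix("::"))
            else:
                paths.append(prefix + sub)
    return paths


nested = expand_use("std::{collections::{HashMap, BTreeMap}, io::Read}")
glob = expand_use("foo::*")
with_self = expand_use("std::io::{self, Read}")
```

Glob imports come out as a single path ending in ::*, which the caller can record as-is or resolve later.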

m6. Scanner excludes examples/ by default — other parsers don't exclude equivalent directories, and example code can contain vulnerability patterns.

m7. _resolve_method_call unique-name matching is overly aggressive — if there's exactly one method named process in the entire codebase, any .process() call resolves to it regardless of type.

Nits

n1. adding-a-parser.md shows Config.new (dot) in Rust examples, but the parser produces Config::new (double colon). Should be consistent.

n2. _is_public doesn't distinguish pub vs pub(crate) vs pub(super) — these have different attack surface implications.
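
A possible shape for a finer-grained result, sketched with a regex over the signature text. The visibility function and its return labels are hypothetical, and a parser with tree-sitter access would probably read the visibility modifier node instead.

```python
import re

# 'pub' optionally followed by a parenthesised restriction, then whitespace.
_VISIBILITY_RE = re.compile(r"pub(?:\s*\(\s*([^)]*?)\s*\))?(?=\s)")


def visibility(signature: str) -> str:
    """Classify a Rust item signature as 'private', 'pub', or 'pub(<scope>)'."""
    match = _VISIBILITY_RE.match(signature.lstrip())
    if match is None:
        return "private"
    restriction = match.group(1)
    if restriction is None:
        return "pub"
    return f"pub({restriction})"


public = visibility("pub fn new() -> Self")
crate_only = visibility("pub(crate) fn helper()")
parent_only = visibility("pub(super) async fn load()")
private = visibility("fn internal()")
```

Only the plain pub case is reachable from outside the crate, so a consumer could treat the restricted forms as lower-exposure when ranking attack surface.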

n3. build is classified as constructor, but in Rust's builder pattern, build() is the finalizer, not the constructor.

n4. associated_function and entry_point unit types are returned by the classifier but not documented in adding-a-parser.md.

n5. get_dependencies/get_callers are duplicated identically in both CallGraphBuilder and UnitGenerator (consistent with other parsers, but noted).

Positives

Excellent structural consistency with existing parsers
All integration points updated (parser_adapter, cli.py, parse.go, pyproject.toml, requirements.txt, README)
Comprehensive RUST_BUILTINS filter covering macros, iterators, Option/Result, async, logging
Correctly handles impl blocks, trait implementations, self/Self:: calls, async, route attributes (actix/axum/rocket)
adding-a-parser.md is genuinely useful and will lower the barrier for future contributions
Stack-based traversal (no recursion), robust error handling with regex fallback

ar7casper requested review from NahumKorda, dgeyshis, shahar-davidson, sounil and yotamleo as code owners March 3, 2026 14:45

NahumKorda reviewed Mar 3, 2026

ar7casper wants to merge 1 commit into master from with-parser-docs
