feat(openant-core): add Rust parser support #10

Open
ar7casper wants to merge 1 commit into master from with-parser-docs

@ar7casper (Collaborator)

  • Add 4-stage Rust parser using tree-sitter-rust
    • repository_scanner.py: enumerate .rs files
    • function_extractor.py: extract functions/methods from AST
    • call_graph_builder.py: build bidirectional call graphs
    • unit_generator.py: generate dataset.json
    • test_pipeline.py: orchestrator with all 4 processing levels
  • Register Rust in parser_adapter.py (detection + dispatch)
  • Add 'rust' to CLI language whitelist (cli.py, parse.go)
  • Add tree-sitter-rust to dependencies (requirements.txt, pyproject.toml)
  • Update README with Rust in supported languages
  • Add adding-a-parser.md guide with CLI whitelist and venv dependency docs

@NahumKorda (Collaborator) left a comment

Generated by Claude Code:

Summary

Well-structured PR that adds a Rust parser following the established 4-stage pipeline. The adding-a-parser.md guide is a valuable addition. Overall quality is good and the parser should work for typical Rust codebases, but several correctness issues need to be addressed.


Major Issues

M1. qualified_name uses :: separator, inconsistent with all other parsers
function_extractor.py builds IDs like src/lib.rs:Config::new, while the guide documents . as the separator (Config.method). The :: also makes splitting on : ambiguous: src/lib.rs:Config::new yields ["src/lib.rs", "Config", "", "new"] instead of ["src/lib.rs", "Config::new"]. Use . to match the convention.
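A minimal sketch of the dot-separated form, to illustrate the point. The helper names make_unit_id and split_unit_id are hypothetical and not taken from the PR; the real function_extractor.py may structure this differently.

```python
from typing import Optional, Tuple


def make_unit_id(file_path: str, type_name: Optional[str], func_name: str) -> str:
    """Build '<file>:<Type>.<func>', or '<file>:<func>' for a free function."""
    qualified = f"{type_name}.{func_name}" if type_name else func_name
    return f"{file_path}:{qualified}"


def split_unit_id(unit_id: str) -> Tuple[str, str]:
    """Split an ID back into (file_path, qualified_name).

    rpartition splits on the last ':' only, which is unambiguous as long
    as the qualified name itself contains no ':' (the point of using '.').
    """
    file_path, _, qualified = unit_id.rpartition(":")
    return file_path, qualified


# The '::' form produces an empty element when split naively on ':'.
naive = "src/lib.rs:Config::new".split(":")

unit_id = make_unit_id("src/lib.rs", "Config", "new")
parts = split_unit_id(unit_id)
```

With the dot separator, a consumer that splits on : gets exactly the file path and the qualified name, without needing to know about Rust path syntax.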

M2. Trait impl names include for keyword, creating IDs with spaces
impl Display for Config produces impl_name = "Display for Config", leading to src/lib.rs:Display for Config::fmt. Spaces in function IDs will break downstream processing. Normalize to Config (the implementing type).
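One possible normalization, sketched on the header string. The function name implementing_type is hypothetical, and the sketch assumes impl_name holds only the text between impl and the opening brace, without a where clause. Reading the implementing type from the impl node in the AST would likely be more robust than string handling.

```python
def implementing_type(impl_header: str) -> str:
    """Return the implementing type for an impl header.

    'Display for Config'      -> 'Config'
    'Config'                  -> 'Config'   (inherent impl, no trait)
    'From<T> for Wrapper<T>'  -> 'Wrapper'  (generic arguments dropped)
    """
    # rpartition returns ('', '', header) when ' for ' is absent,
    # so inherent impls pass through unchanged.
    _, _, target = impl_header.rpartition(" for ")
    # Drop generic arguments so IDs never contain '<', ',' or spaces.
    return target.split("<", 1)[0].strip()


trait_impl = implementing_type("Display for Config")
inherent_impl = implementing_type("Config")
generic_impl = implementing_type("From<T> for Wrapper<T>")
```

Dropping the generic arguments is a design choice made here to keep IDs free of spaces; if distinct instantiations need distinct IDs, a different encoding would be required.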

M3. rstrip('::') bug in _extract_imports
base = parts.split('{')[0].rstrip('::')
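str.rstrip('::') treats its argument as a set of characters, not as the substring ::, so it strips every trailing : however many there are. For std::collections:: the result happens to be std::collections, but a stray single : or a run of three or more colons is stripped as well, which hides malformed input. Use .removesuffix('::') (Python 3.9+) to remove exactly one trailing ::.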
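To make the difference between the two methods concrete, a short sketch. group_base is a hypothetical helper rather than the PR's code, and str.removesuffix requires Python 3.9 or later.

```python
def group_base(use_body: str) -> str:
    """Return the shared prefix of a grouped use, e.g. 'std::collections'."""
    # removesuffix drops exactly one trailing '::' and nothing else.
    return use_body.split("{")[0].strip().removesuffix("::")


# rstrip takes a character set: every trailing ':' is removed.
rstrip_three = "std::collections:::".rstrip("::")
rstrip_one = "std::collections:".rstrip("::")

# removesuffix takes a substring: only an exact trailing '::' is removed.
suffix_three = "std::collections:::".removesuffix("::")
suffix_one = "std::collections:".removesuffix("::")

base = group_base("std::collections::{HashMap, BTreeMap}")
```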

M4. _is_async uses fragile string matching instead of AST
return code.strip().startswith('async ') or 'async fn' in code[:50]
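A comment containing async in the first 50 chars causes false positives. Check the tree-sitter node instead, e.g. any(child.type == 'async' for child in node.children). Depending on the grammar version, the async token may sit under a function_modifiers child rather than directly under the function node, so check there as well.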
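A sketch of an AST-based check. It relies only on the .type and .children attributes that tree-sitter nodes expose, so it is exercised here with a stand-in StubNode class instead of a real parse. The assumption that async may be nested under a function_modifiers child should be verified against the tree-sitter-rust version in use.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StubNode:
    """Stand-in for a tree-sitter node; only .type and .children are used."""
    type: str
    children: List["StubNode"] = field(default_factory=list)


def is_async(func_node) -> bool:
    """True if the function node carries an 'async' token.

    Looks at direct children and, as an assumption about the grammar,
    inside a 'function_modifiers' child as well.
    """
    for child in func_node.children:
        if child.type == "async":
            return True
        if child.type == "function_modifiers" and any(
            grandchild.type == "async" for grandchild in child.children
        ):
            return True
    return False


plain = StubNode("function_item", [StubNode("fn"), StubNode("identifier")])
direct = StubNode("function_item", [StubNode("async"), StubNode("fn")])
nested = StubNode(
    "function_item",
    [StubNode("function_modifiers", [StubNode("async")]), StubNode("fn")],
)
```

Because the check never inspects source text, a comment or string literal that mentions async cannot trigger it.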


Minor Issues

m1. RUST_BUILTINS has duplicates (contains, assert, assert_eq, debug, take, flatten). Harmless but suggests copy-paste oversight.

m2. _has_test_attribute walks parent siblings, but in tree-sitter-rust attributes may be direct children of the function node inside impl/mod blocks. Test detection may not work reliably in all contexts. The same applies to _has_route_attribute and _has_main_attribute.

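m3. _resolve_simple_call checks the same impl block first for bare function calls. In Rust, a bare foo() inside an impl block resolves to a free function in scope, never to an associated function or method of that impl; those have to be written as Self::foo() or self.foo(). Checking the impl block first therefore creates false call graph edges.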

m4. Closures (|args| body) are not extracted. Code inside closures (common in iterator chains, async) won't appear in the call graph, potentially missing vulnerability-relevant call chains.

m5. _extract_imports ignores glob imports (use foo::*) and doesn't handle nested group uses (use std::{collections::{HashMap, BTreeMap}, io::Read}).
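A sketch of how nested groups and globs could be expanded from the text of a use declaration (the part after use, without the trailing semicolon). The function names are hypothetical, aliases (as) are passed through verbatim, and str.removesuffix requires Python 3.9 or later. Walking the use tree in the AST would be an alternative to this string-based approach.

```python
def split_top_level(body: str) -> list:
    """Split on commas that are not inside nested braces."""
    parts, depth, current = [], 0, []
    for ch in body:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:  # ignore the empty piece left by a trailing comma
        parts.append(tail)
    return parts


def expand_use(tree: str, prefix: str = "") -> list:
    """Expand a use tree into a flat list of full paths."""
    tree = tree.strip()
    brace = tree.find("{")
    if brace == -1:
        # Leaf: a plain path, a glob ('*'), or 'self' inside a group.
        if tree == "self":
            return [prefix.removesuffix("::")]
        return [prefix + tree]
    # Group: the text before '{' is a prefix shared by every member.
    inner = tree[brace + 1 : tree.rfind("}")]
    new_prefix = prefix + tree[:brace]
    paths = []
    for part in split_top_level(inner):
        paths.extend(expand_use(part, new_prefix))
    return paths


nested = expand_use("std::{collections::{HashMap, BTreeMap}, io::Read}")
glob = expand_use("foo::*")
with_self = expand_use("std::io::{self, Read}")
```

Glob imports come out as a path ending in ::*, which leaves it to the caller to decide whether to resolve them against the target module or just record them.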

m6. Scanner excludes examples/ by default — other parsers don't exclude equivalent directories, and example code can contain vulnerability patterns.

m7. _resolve_method_call unique-name matching is overly aggressive — if there's exactly one method named process in the entire codebase, any .process() call resolves to it regardless of type.


Nits

n1. adding-a-parser.md shows Config.new (dot) in Rust examples, but the parser produces Config::new (double colon). Should be consistent.

n2. _is_public doesn't distinguish pub vs pub(crate) vs pub(super) — these have different attack surface implications.
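One way to make the distinction, sketched on the text of the visibility modifier (for example the source text of the modifier node, or None when the item has none). The function name and the category labels are hypothetical and would need to match whatever the unit schema expects.

```python
from typing import Optional


def classify_visibility(modifier_text: Optional[str]) -> str:
    """Map a Rust visibility modifier to a coarse exposure category.

    None / ''          -> 'private'
    'pub'              -> 'public'
    'pub(crate)'       -> 'crate'
    'pub(self)'        -> 'private'    (equivalent to no modifier)
    'pub(super)', 'pub(in some::path)' -> 'restricted'
    """
    if not modifier_text:
        return "private"
    # Remove all whitespace so 'pub ( crate )' compares equal to 'pub(crate)'.
    text = "".join(modifier_text.split())
    if text == "pub":
        return "public"
    if text == "pub(crate)":
        return "crate"
    if text == "pub(self)":
        return "private"
    return "restricted"


levels = [
    classify_visibility(None),
    classify_visibility("pub"),
    classify_visibility("pub(crate)"),
    classify_visibility("pub(super)"),
    classify_visibility("pub(in crate::parser)"),
    classify_visibility("pub(self)"),
]
```

Only the 'public' category is reachable from other crates, so it is the one most relevant when estimating external attack surface.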

n3. build is classified as constructor, but in Rust's builder pattern, build() is the finalizer, not the constructor.

n4. associated_function and entry_point unit types are returned by the classifier but not documented in adding-a-parser.md.

n5. get_dependencies/get_callers are duplicated identically in both CallGraphBuilder and UnitGenerator (consistent with other parsers, but noted).


Positives

  • Excellent structural consistency with existing parsers
  • All integration points updated (parser_adapter, cli.py, parse.go, pyproject.toml, requirements.txt, README)
  • Comprehensive RUST_BUILTINS filter covering macros, iterators, Option/Result, async, logging
  • Correctly handles impl blocks, trait implementations, self/Self:: calls, async, route attributes (actix/axum/rocket)
  • adding-a-parser.md is genuinely useful and will lower the barrier for future contributions
  • Stack-based traversal (no recursion), robust error handling with regex fallback
