Abstract syntax tree + forbidden keyword test#300
Conversation
Introduces a new pipeline step that uses ast-grep-py to parse student submissions into AST roots. These roots are stored in a new StructuralAnalysisResult dataclass and made available to subsequent grading steps, preventing redundant parsing during the grading phase. Co-authored-by: gemini-cli <218195315+gemini-cli@users.noreply.github.com>
…alysis - Implement ForbiddenKeywordTest in input_output.py with predefined rules for loop detection. - Add config_schema to TestFunction for early Pydantic validation in CriteriaTreeService. - Plumbing: Forward StructuralAnalysisResult from GradeStep through GraderService to test execution. - Fix ast-grep-py version in requirements.txt to ensure build compatibility. - Update docker-compose.yml with correct Docker socket path and ministack tag for local dev. - Add English and Portuguese translations for the new test. - Include unit tests for the test logic and validation mechanism. Co-authored-by: gemini-cli <218195315+gemini-cli@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds AST-backed structural analysis to the autograder pipeline (via ast-grep) and introduces an AST-based ForbiddenKeywordTest to detect forbidden constructs reliably.
Changes:
- Added a
StructuralAnalysisStepto parse submission code files intoast-greproots and threaded its result into grading execution. - Implemented
ForbiddenKeywordTest(with i18n strings) to detect forbidden constructs using structural queries instead of regex. - Added early Pydantic-based parameter validation for tests during criteria tree building, plus unit tests for the new behavior.
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
autograder/steps/structural_analysis_step.py |
New pipeline step that parses code files into AST roots via ast-grep. |
autograder/autograder.py |
Inserts the new structural analysis step into the pipeline execution order. |
autograder/steps/step_registry.py |
Registers the new step so it can be constructed by the registry. |
autograder/models/dataclass/step_result.py |
Adds STRUCTURAL_ANALYSIS to StepName. |
autograder/models/dataclass/structural_analysis_result.py |
New dataclass to store per-file AST parse roots. |
autograder/models/pipeline_execution.py |
Adds accessor for retrieving structural analysis results from the pipeline. |
autograder/steps/grade_step.py |
Passes structural analysis data into the grader service. |
autograder/services/grader_service.py |
Threads structural_analysis through grading and into TestFunction.execute(...). |
autograder/template_library/input_output.py |
Adds ForbiddenKeywordTest + config schema and registers it in InputOutputTemplate. |
autograder/translations/en.json |
Adds English strings for the new forbidden keyword test. |
autograder/translations/pt_br.json |
Adds Brazilian Portuguese strings for the new forbidden keyword test. |
autograder/models/abstract/test_function.py |
Introduces optional config_schema hook for test parameter validation. |
autograder/services/criteria_tree_service.py |
Validates test parameters early using config_schema during tree build. |
tests/unit/test_forbidden_keyword.py |
Unit tests covering registration/config/execution behaviors of ForbiddenKeywordTest. |
tests/unit/pipeline/test_structural_analysis_step.py |
Unit tests for structural analysis step parsing/heuristics/failure behavior. |
tests/unit/test_test_function_validation.py |
Unit tests for early parameter validation during criteria tree building. |
requirements.txt |
Adds ast-grep-py dependency. |
docker-compose.yml |
Updates docker socket configuration and Ministack image tag. |
scripts/test_structural_analysis_live.py |
Adds a manual local script to exercise structural analysis + queries. |
| from autograder.models.dataclass.step_result import StepName | ||
| from sandbox_manager.models.sandbox_models import Language | ||
|
|
||
| def main(): |
There was a problem hiding this comment.
CI runs pylint on new/modified .py files; main() currently has no docstring and will likely trigger missing-function-docstring. Add a short docstring or disable pylint for this script if it’s not meant to meet project linting standards.
| def main(): | |
| def main(): | |
| """Run a live structural analysis test against a mock submission.""" |
| from autograder.models.dataclass.structural_analysis_result import StructuralAnalysisResult | ||
| return cast( | ||
| StructuralAnalysisResult, | ||
| self._require_step_data(StepName.STRUCTURAL_ANALYSIS, "structural analysis result"), |
There was a problem hiding this comment.
If the STRUCTURAL_ANALYSIS step ran but produced data=None (e.g., StepResult.fail), _require_step_data() will raise here. Consider checking the step result’s status/data and returning None when it’s not SUCCESS so later steps can proceed safely.
| from autograder.models.dataclass.structural_analysis_result import StructuralAnalysisResult | |
| return cast( | |
| StructuralAnalysisResult, | |
| self._require_step_data(StepName.STRUCTURAL_ANALYSIS, "structural analysis result"), | |
| step_result = self.get_step_result(StepName.STRUCTURAL_ANALYSIS) | |
| if step_result.status != StepStatus.SUCCESS or step_result.data is None: | |
| return None | |
| from autograder.models.dataclass.structural_analysis_result import StructuralAnalysisResult | |
| return cast( | |
| StructuralAnalysisResult, | |
| step_result.data, |
| if test_function.config_schema: | ||
| try: | ||
| # We use model_validate to leverage Pydantic's validation logic. | ||
| test_function.config_schema(**test_params) |
There was a problem hiding this comment.
The comment says this uses Pydantic’s model_validate, but the code is instantiating the model via config_schema(**test_params). Either update the comment to match the implementation or switch to calling model_validate (Pydantic v2) for clarity/consistency.
| test_function.config_schema(**test_params) | |
| test_function.config_schema.model_validate(test_params) |
| continue | ||
|
|
||
| for rule in active_rules: | ||
| matches = root.root().find_all(**rule) |
There was a problem hiding this comment.
custom_ast_grep_rules is user-provided config, but find_all(**rule) will raise at runtime if a rule dict contains unexpected keys/values. That would interrupt grading instead of producing a clean config error. Consider validating custom rules more strictly in ForbiddenKeywordConfig (e.g., a typed model / constrained keys) and/or catching exceptions around find_all to return a 0-score TestResult with a configuration error message.
| matches = root.root().find_all(**rule) | |
| if not isinstance(rule, dict): | |
| return TestResult( | |
| test_name=self.name, | |
| score=0.0, | |
| report=( | |
| "Invalid forbidden keyword configuration: each custom AST rule " | |
| f"must be a dictionary, got {type(rule).__name__}." | |
| ) | |
| ) | |
| if any(not isinstance(key, str) for key in rule.keys()): | |
| return TestResult( | |
| test_name=self.name, | |
| score=0.0, | |
| report=( | |
| "Invalid forbidden keyword configuration: custom AST rule keys " | |
| f"must all be strings, got rule {rule!r}." | |
| ) | |
| ) | |
| try: | |
| matches = root.root().find_all(**rule) | |
| except (TypeError, ValueError) as exc: | |
| return TestResult( | |
| test_name=self.name, | |
| score=0.0, | |
| report=( | |
| "Invalid forbidden keyword configuration: failed to apply custom " | |
| f"AST rule {rule!r} for file '{sub_file.filename}': {exc}" | |
| ) | |
| ) |
ArthurCRodrigues
left a comment
There was a problem hiding this comment.
@jaoppb can you add some documentation outlining the feature in both technical and utilitary explanations?
- Explain what is the new feature and how to use it
- Add new test to input_output documentation with usage examples and configuration guide
- Add technical documentation of how we use AST and the AST step
- Add detailed documentation for StructuralAnalysisStep in pipeline docs. - Document forbidden_keyword test function in input_output template docs. - Update pipeline overview with structural analysis phase. Co-authored-by: gemini-cli <218195315+gemini-cli@users.noreply.github.com>
Context
Enables Abstract Syntax Tree parsing and querying with
ast-grepto allow better and reliable some static code analyzes with a step beforeCreate
ForbiddenKeywordTestbased on the AST to check behavior and easy of use.Solution
Create a
StructuralAnalysisStepbeforeGradingStepthat parses every submission code file (based on heuristic, can be improved) and fallback toNonewhen fails.Further clarifications
It was considered a lazy parse somehow but both
ast-grepandtree-sitterlibraries were made with performance in mind and so even with 1MB ofJavaScriptsingle file code, it takes ~100ms, therefore there is no need to create a Lazy mechanism that would not improve much of the Pipeline execution time.Related issues
Close #251
Checklist