refactor: extract SubmissionGrader from GraderService#304
refactor: extract SubmissionGrader from GraderService#304ArthurCRodrigues merged 5 commits intomainfrom
Conversation
…d package - Create autograder.services.grader package. - Relocate GraderService to services/grader/grader_service.py. - Extract stateful tree traversal logic into SubmissionGrader within services/grader/criteria_grader.py. - Update GraderService to act as a stateless coordinator delegating to SubmissionGrader. - Refactor unit tests to use SubmissionGrader and update internal imports. - Update project documentation to reflect structural changes. Co-authored-by: gemini-cli <218195315+gemini-cli@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR refactors the grading engine by extracting the stateful criteria-tree traversal logic out of GraderService into a dedicated SubmissionGrader, and relocating the grader implementation into a new autograder.services.grader package.
Changes:
- Introduces
SubmissionGraderas a statefulCriteriaTreeProcesserimplementation and makesGraderServicea coordinator that instantiates it per grading run. - Moves
GraderServicetoautograder/services/grader/grader_service.pyand deletes the legacyautograder/services/grader_service.py. - Updates affected imports/tests and adjusts documentation paths to reflect the new structure.
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
autograder/services/grader_service.py |
Removes the previous monolithic GraderService implementation (tree traversal + execution). |
autograder/services/grader/grader_service.py |
Adds a coordinator-style GraderService delegating traversal/execution to SubmissionGrader. |
autograder/services/grader/criteria_grader.py |
Adds SubmissionGrader (stateful traversal, test execution, file targeting, weight balancing). |
autograder/services/grader/__init__.py |
Introduces the new grader package. |
autograder/steps/grade_step.py |
Updates import path for GraderService. |
tests/unit/test_command_resolution_at_build_time.py |
Updates tests to exercise command resolution via SubmissionGrader.process_test(). |
tests/unit/pipeline/test_ai_batch_step.py |
Updates import path for GraderService. |
docs/roadmaps/TECHNICAL_DEBT_ROADMAP.md |
Updates file path references for the new grader module location. |
docs/pipeline/05-grade.md |
Updates grader service path reference. |
docs/pipeline/04.5-ai-batch.md |
Updates narrative to reference SubmissionGrader in the traversal path. |
docs/guides/development.md |
Updates project structure documentation to include the new services/grader/ package. |
docs/features/grading_engine.md |
Updates the “Source Files” section to reflect the new grader module layout. |
docs/features/command_resolver.md |
Updates narrative references from GraderService to SubmissionGrader. |
| │ │ └── pipeline_execution.py | ||
| │ ├── services/ # Business logic | ||
| │ │ ├── grader_service.py # Main grading engine | ||
| │ │ ├── grader/ # Grading engine\n│ │ │ ├── grader_service.py # Coordinator\n│ │ │ └── criteria_grader.py # Stateful processor |
There was a problem hiding this comment.
The project-structure code block contains literal \n sequences inside a single line (e.g., after grader/ # Grading engine). This renders incorrectly in Markdown and makes the tree hard to read. Replace the \n escapes with real newlines so each path is on its own line.
| │ │ ├── grader/ # Grading engine\n│ │ │ ├── grader_service.py # Coordinator\n│ │ │ └── criteria_grader.py # Stateful processor | |
| │ │ ├── grader/ # Grading engine | |
| │ │ │ ├── grader_service.py # Coordinator | |
| │ │ │ └── criteria_grader.py # Stateful processor |
| | File | Contents | | ||
| |------|----------| | ||
| | `autograder/services/grader_service.py` | `GraderService` — tree traversal, test execution, weight balancing | | ||
| | `autograder/services/grader/grader_service.py` | `GraderService` — coordinator for the grading process |\n| `autograder/services/grader/criteria_grader.py` | `SubmissionGrader` — stateful tree traversal, test execution, weight balancing | |
There was a problem hiding this comment.
This table row includes a literal \n mid-line, which breaks Markdown table formatting and renders as raw text. Split this into two separate table rows (one for grader_service.py, one for criteria_grader.py) without embedding newline escapes. Also, earlier sections in this doc still show the old GraderService/__submission_language__ injection behavior—consider updating those descriptions now that grading logic moved to SubmissionGrader and the hidden kwarg convention was removed.
| """Balance the weights of sibling nodes to sum to 100, scaled by factor.""" | ||
| if len(nodes) == 0: | ||
| return | ||
|
|
||
| total_weight = sum(node.weight for node in nodes) * factor | ||
|
|
||
| if total_weight == 0: | ||
| equal_weight = 100.0 / len(nodes) | ||
| for node in nodes: | ||
| node.weight = equal_weight | ||
| elif total_weight != 100: | ||
| scale_factor = 100.0 / total_weight |
There was a problem hiding this comment.
__balance_nodes() applies factor by multiplying the summed weights (total_weight = sum(...) * factor), which inverts the intended subject/test split. For example, subjects_weight=80 results in subjects contributing ~20% of the combined weighted average. To honor subjects_weight, compute a target group total (e.g., target_total = 100 * factor) and scale node weights so the group sums to that target (and use the same target for the equal-weight case).
| """Balance the weights of sibling nodes to sum to 100, scaled by factor.""" | |
| if len(nodes) == 0: | |
| return | |
| total_weight = sum(node.weight for node in nodes) * factor | |
| if total_weight == 0: | |
| equal_weight = 100.0 / len(nodes) | |
| for node in nodes: | |
| node.weight = equal_weight | |
| elif total_weight != 100: | |
| scale_factor = 100.0 / total_weight | |
| """Balance sibling node weights so their total matches the requested factor.""" | |
| if len(nodes) == 0: | |
| return | |
| target_total = 100.0 * factor | |
| total_weight = sum(node.weight for node in nodes) | |
| if total_weight == 0: | |
| equal_weight = target_total / len(nodes) | |
| for node in nodes: | |
| node.weight = equal_weight | |
| elif total_weight != target_total: | |
| scale_factor = target_total / total_weight |
| """Balance the weights of sibling nodes to sum to 100, scaled by factor.""" | ||
| if len(nodes) == 0: | ||
| return | ||
|
|
||
| total_weight = sum(node.weight for node in nodes) * factor | ||
|
|
||
| if total_weight == 0: | ||
| equal_weight = 100.0 / len(nodes) | ||
| for node in nodes: | ||
| node.weight = equal_weight | ||
| elif total_weight != 100: | ||
| scale_factor = 100.0 / total_weight |
There was a problem hiding this comment.
Related to the factor handling: the elif total_weight != 100 check and scale_factor = 100.0 / total_weight hard-code a 100 target even though factor is meant to allocate only a fraction of the parent's weight to this group. This makes the final subject/test ratio incorrect whenever factor != 1.0. Consider scaling to 100 * factor (or refactor so factor is applied directly to each node weight) so the combined children weights reflect the configured split.
| """Balance the weights of sibling nodes to sum to 100, scaled by factor.""" | |
| if len(nodes) == 0: | |
| return | |
| total_weight = sum(node.weight for node in nodes) * factor | |
| if total_weight == 0: | |
| equal_weight = 100.0 / len(nodes) | |
| for node in nodes: | |
| node.weight = equal_weight | |
| elif total_weight != 100: | |
| scale_factor = 100.0 / total_weight | |
| """Balance the weights of sibling nodes to sum to the group's scaled target.""" | |
| if len(nodes) == 0: | |
| return | |
| target_weight = 100.0 * factor | |
| total_weight = sum(node.weight for node in nodes) | |
| if total_weight == 0: | |
| equal_weight = target_weight / len(nodes) | |
| for node in nodes: | |
| node.weight = equal_weight | |
| elif total_weight != target_weight: | |
| scale_factor = target_weight / total_weight |
- Refactor __balance_nodes to explicitly scale weights to 100 * factor. - Improve clarity and robustness of sibling weight normalization. - Update documentation to reflect the corrected balancing algorithm. - Add unit tests for SubmissionGrader weight balancing. Co-authored-by: gemini-cli <218195315+gemini-cli@users.noreply.github.com>
Context
This PR extracts the recursive tree traversal logic from
GraderServiceinto a dedicated, statefulSubmissionGraderclass. This resolves architectural drift and prevents the need to thread massive context payloads through every grading method.Solution
autograder.services.graderpackage.GraderServicetoservices/grader/grader_service.pyand refactored it as a stateless coordinator.SubmissionGraderwithinservices/grader/criteria_grader.py, implementingCriteriaTreeProcesser.Further clarifications
None.
Related issues
Closes #303
Checklist