fix(#2054): synthesize review body when findings contradict summary #78
guyoron1 wants to merge 18 commits into
When the review agent produces a result whose action is request-changes with critical/high findings but whose body omits those findings (e.g. it says "No findings"), the sticky comment misleads reviewers into thinking the review is clean.

The previous approach (PR fullsend-ai#2055, closed) used regex replacement to patch "No findings" text in place. This was fragile: the regex could match inside longer phrases, ReplaceAllString could duplicate content, and inserting bullet lists mid-sentence produced malformed markdown.

This fix avoids string surgery. ensureBodyFindingsConsistency checks whether the body references any critical/high finding category (a case-insensitive substring match on hyphenated tokens such as "logic-error" or "auth-bypass"). If none is referenced, the entire body is replaced with one synthesized from the structured findings array, using the standard review format from the pr-review skill.

The pr-review skill is also updated with an explicit instruction that when the action is request-changes or reject, the body MUST list the findings. This fixes the issue closer to the source, while the CLI provides a safety net.

Note: pre-commit could not run in the sandbox because of shellcheck network restrictions (an infrastructure issue, not a code issue).

Closes fullsend-ai#2054
/fs-qf
🤖 Finished Review · ✅ Success · Started 3:03 PM UTC · Completed 3:12 PM UTC
Review · Findings: High, Medium, Low, Info
Previous run · Review · Findings: High, Medium, Low, Info
Previous run (2) · Review · Reason: stale-head. The review agent reviewed commit …
Previous run (3) · Review · Reason: stale-head. The review agent reviewed commit …
Previous run (4) · Review · Reason: stale-head. The review agent reviewed commit …
Previous run (5) · Review · Findings: Low, Info
Previous run (6) · Review · Reason: stale-head. The review agent reviewed commit …
/fs-review
🤖 Finished Review · ✅ Success · Started 3:25 PM UTC · Completed 3:34 PM UTC
// Check whether the body already references any significant finding.
// A body is considered consistent if it mentions at least one
// critical/high finding's category. Categories are hyphenated tokens
// like "logic-error", "auth-bypass", "missing-test" — specific enough
[low] edge-case
When all critical/high findings have an empty Category field, the consistency check loop never matches (because of the f.Category != "" guard), so the body is unconditionally replaced, and synthesizeReviewBody then renders empty category brackets in the synthesized body.
Suggested fix: Either treat empty-category findings as inherently consistent (skip them in the significant slice), or handle the empty-category case in synthesizeReviewBody by omitting the brackets or using a fallback label like uncategorized.
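The first suggested fix (treat empty-category findings as inherently consistent) could look roughly like this. It is a sketch under assumptions: bodyIsConsistent is a hypothetical helper name and ReviewFinding is reduced to the two fields the check needs. Critical/high findings without a category are left out of the significant slice, so they can never force a replacement on their own.

```go
package main

import "strings"

// ReviewFinding is a hypothetical, trimmed-down finding shape.
type ReviewFinding struct {
	Severity string
	Category string
}

// bodyIsConsistent (hypothetical) reports whether the body can be kept.
func bodyIsConsistent(body string, findings []ReviewFinding) bool {
	var significant []ReviewFinding
	for _, f := range findings {
		// Only categorized critical/high findings are checkable.
		if (f.Severity == "critical" || f.Severity == "high") && f.Category != "" {
			significant = append(significant, f)
		}
	}
	if len(significant) == 0 {
		return true // nothing checkable: treat the body as consistent
	}
	lower := strings.ToLower(body)
	for _, f := range significant {
		if strings.Contains(lower, strings.ToLower(f.Category)) {
			return true
		}
	}
	return false
}
```

The trade-off is that a "No findings" body survives when every blocking finding is uncategorized; the second suggested fix (a fallback label in synthesizeReviewBody) keeps the replacement but avoids the empty brackets.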
[info] false-negative
Single-word categories can naturally appear in well-written review prose, causing the consistency check to consider the body consistent even when it does not actually describe the specific finding. Practical risk is negligible.
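The false-negative risk can be seen directly with the substring check as the PR describes it. In this sketch containsCategory is a hypothetical name and "performance" and "test" are hypothetical single-word categories chosen for illustration; the point is only that strings.Contains matches incidental prose and parts of longer words.

```go
package main

import "strings"

// containsCategory mirrors the described check: a case-insensitive
// strings.Contains on the category token (hypothetical helper name).
func containsCategory(body, category string) bool {
	return strings.Contains(strings.ToLower(body), strings.ToLower(category))
}
```

With a body such as "No performance concerns; this looks good.", a "performance" finding counts as referenced even though the body describes nothing, while a hyphenated token like "logic-error" does not match.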
🤖 Finished Review · ✅ Success · Started 3:39 PM UTC · Completed 3:50 PM UTC
Generated 22 Go unit tests from STD YAML for ensureBodyFindingsConsistency() and synthesizeReviewBody(), covering body-verdict consistency checks, severity ordering, file location rendering, and edge cases.

Co-Authored-By: QualityFlow[bot] <qualityflow[bot]@users.noreply.github.com>
Replaces intermediate pipeline artifacts with organized test files.
Total: 2 test files → qf-tests/fullsend-aiGH-2054/
Jira: fullsend-aiGH-2054 [skip ci]
QualityFlow Pipeline Summary
Issue: GH-78 · Generated by QualityFlow
/fs-review
/fs-qf
🤖 Finished Review · ✅ Success · Started 6:56 AM UTC · Completed 7:07 AM UTC
/fs-review
🤖 Finished Review · ✅ Success · Started 7:11 AM UTC · Completed 7:25 AM UTC
/fs-review
🤖 Finished Review · ✅ Success · Started 7:28 AM UTC · Completed 8:02 AM UTC
- Add v2.1-enhanced fields (patterns, variables, test_structure, code_structure) to all 17 scenarios
- Remove related_prs and source_bugs from document_metadata (content policy)
- Add testify imports to all Go stub files with blank-reference guards
- Normalize test_type casing to lowercase across all scenarios
- Verdict upgraded from APPROVED_WITH_FINDINGS (5 major) to APPROVED (0 major)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Removes intermediate pipeline artifacts (STP, STD, reviews). One test file is co-located in the source tree with a qf_ prefix.
Jira: GH-78 [skip ci]
@@ -0,0 +1,330 @@
package cli
[medium] logic-error
Test files under qf-tests/fullsend-aiGH-2054/go/ declare package cli but are not located in the internal/cli directory. They reference unexported functions and types from internal/cli, so they cannot compile outside that directory.
Suggested fix: Remove these files or move them into internal/cli/ with qf_ prefixes.
Co-located tests (qf_* prefix) are now in source package directories. The qf-tests/ directory contained non-compiling tests from the old pipeline.
🤖 Finished Review · ✅ Success · Started 8:33 AM UTC · Completed 8:48 AM UTC
func synthesizeReviewBody(findings []ReviewFinding) string {
	// Group findings by severity.
	order := []string{"critical", "high", "medium", "low", "info"}
	groups := make(map[string][]ReviewFinding)
[medium] logic-error
In synthesizeReviewBody, when a finding has an empty Category field, the output produces malformed Markdown: - [] — description. The ensureBodyFindingsConsistency function correctly skips empty-category findings during the consistency check, but synthesizeReviewBody includes all findings regardless of category presence, creating an inconsistency between the two functions.
Suggested fix: Handle the empty-category case by omitting the [...] wrapper or using a fallback label like [uncategorized].
// Check whether the body already references any significant finding.
// A body is considered consistent if it mentions at least one
// critical/high finding's category. Categories are hyphenated tokens
// like "logic-error", "auth-bypass", "missing-test" — specific enough
[low] edge-case
The body-consistency check uses strings.Contains for category substring matching. The code comment argues that hyphenated tokens are specific enough to avoid false matches, but a theoretical mismatch could still occur with short category names. When the check misses a finding that the body does describe, the failure mode is safe (an unnecessary body replacement).
	"github.com/stretchr/testify/require"
)

// TestEnsureBodyFindingsConsistency_QF covers the 17 STD scenarios for
[info] pattern-inconsistency
Test comments use TS-GH-78-001-style IDs, which do not appear in any other test file in the package.
Mirror of upstream fullsend-ai#2189
Synthesizes the review body when findings contradict the summary, to keep the two consistent.