Skip to content

fix(#2054): synthesize review body when findings contradict summary#78

Open
guyoron1 wants to merge 18 commits into
mainfrom
mirror/2189-2054-review-summary-consistency
Open

fix(#2054): synthesize review body when findings contradict summary#78
guyoron1 wants to merge 18 commits into
mainfrom
mirror/2189-2054-review-summary-consistency

Conversation

@guyoron1

Copy link
Copy Markdown
Owner

Mirror of upstream fullsend-ai#2189

Synthesizes review body when findings contradict the summary to ensure consistency.

…t summary

When the review agent produces a result where the action is
request-changes with critical/high findings but the body omits
those findings (e.g. says "No findings"), the sticky comment
misleads reviewers into thinking the review is clean.

The previous approach (PR fullsend-ai#2055, closed) used regex replacement
to patch "No findings" text in-place. This was fragile: the
regex could match inside longer phrases, ReplaceAllString could
duplicate content, and inserting bullet lists mid-sentence
produced malformed markdown.

This fix takes a different approach. Instead of string surgery,
ensureBodyFindingsConsistency checks whether the body references
any critical/high finding categories (case-insensitive substring
match on hyphenated tokens like "logic-error", "auth-bypass").
If none are referenced, the entire body is replaced with one
synthesized from the structured findings array using the standard
review format from the pr-review skill.

The pr-review skill is also updated with an explicit instruction
that when action is request-changes or reject, the body MUST
list the findings — fixing the issue closer to the source while
the CLI provides a safety net.

Note: pre-commit could not run in the sandbox due to shellcheck
network restrictions (infrastructure issue, not code issue).

Closes fullsend-ai#2054
@guyoron1

Copy link
Copy Markdown
Owner Author

/fs-qf

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 21, 2026

Copy link
Copy Markdown

🤖 Finished Review · ✅ Success · Started 3:03 PM UTC · Completed 3:12 PM UTC
Commit: 0535520 · View workflow run →

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 21, 2026

Copy link
Copy Markdown

Review

Findings

High

  • [protected-path] CLAUDE.md — This PR deletes CLAUDE.md, which is a protected path requiring human approval. The PR has no linked issue and the description does not explain why the protected file is being removed. Protected-path changes always require human review regardless of context.
    Remediation: Either create a linked issue authorizing the CLAUDE.md removal with rationale, or split the CLAUDE.md deletion into a separate PR with explicit justification. The review body synthesis fix can proceed independently.

Medium

  • [scope-exceeded] CLAUDE.md — The PR title describes a bug fix for review body synthesis (fix(#2054)), but the diff also deletes CLAUDE.md. This is an unrelated change bundled into the fix PR. The deleted file is a 3-line stub pointing to AGENTS.md, so the impact is minimal, but scope creep complicates review and history tracking.
  • [logic-error] internal/cli/postreview.go:571 — In synthesizeReviewBody, when a finding has an empty Category field, the output produces malformed Markdown: - **[]** — description. The ensureBodyFindingsConsistency function correctly skips empty-category findings during the consistency check, but synthesizeReviewBody includes all findings regardless of category presence, creating an inconsistency between the two functions.
    Remediation: Handle the empty-category case by omitting the **[...]** wrapper or using a fallback label like **[uncategorized]**.

Low

  • [edge-case] internal/cli/postreview.go:549 — The body-consistency check uses strings.Contains for category substring matching. While the code comment explains that hyphenated tokens like logic-error are specific enough to avoid false matches, a theoretical false negative could occur with short category names. The failure mode is safe (unnecessary body replacement), so this is a minor robustness concern.
  • [injection-vuln] internal/cli/postreview.gosynthesizeReviewBody interpolates finding fields (Category, File, Description, Remediation) directly into Markdown without sanitization. Risk is low because findings originate from the agent's own structured output and GitHub sanitizes rendered Markdown. This follows the same pattern as the existing formatFindingComment function.
  • [tier-mismatch] — PR is labeled as fix but introduces 662 additions including new functions and a new test file. The bulk (~550 lines) is tests, and the logic change is ~100 lines, so fix is defensible, but the scope is closer to a refactor.

Info

  • [naming-convention] internal/cli/qf_body_consistency_test.go — The qf_ filename prefix is not established in the codebase. All 20 existing test files in internal/cli/ follow the <source_file>_test.go pattern. Consider merging into postreview_test.go or documenting the new convention.
  • [pattern-inconsistency] internal/cli/qf_body_consistency_test.go:11 — Test comments use TS-GH-78-001 style IDs not found in any other test file in the package. If these link to an external test specification system, a comment explaining the convention would help.
Previous run

Review

Findings

High

  • [protected-path] CLAUDE.md — This PR deletes CLAUDE.md, which is a protected governance file. The PR description does not explain why this file is being deleted, and there is no linked issue authorizing the change. Protected-path changes always require human approval regardless of context.
    Remediation: Either remove the CLAUDE.md deletion from this PR and handle it in a dedicated PR with proper justification, or add an explanation in the PR description and link an authorizing issue.

Medium

  • [logic-error] internal/cli/postreview.gosynthesizeReviewBody silently drops findings with unrecognized severity values. The order slice only contains ["critical", "high", "medium", "low", "info"] — findings with any other severity (e.g., a typo like "crtical", an empty string, or a future severity level) are grouped into the map but never iterated, so they disappear from the synthesized body without warning. Since this function is a safety net for consistency, silently swallowing findings defeats its purpose.
    Remediation: After the ordered-group loop, check whether any findings remain ungrouped. Either append them under an "Other" section or log a warning.

  • [stale-reference] hack/lint-mint-embed-sync:5 — Stale reference to removed CLAUDE.md. Line 5 contains # See CLAUDE.md for details on the mint/embed sync requirement. which will point to a nonexistent file after this PR merges.
    Remediation: Update the comment to reference AGENTS.md or remove the reference.

  • [logic-error] qf-tests/GH-2054/go/body_consistency_test.go:1 — Test files under qf-tests/GH-2054/go/ declare package cli but are not located in the internal/cli directory. Go requires test files to be co-located with the package they test for white-box access to unexported symbols. These files reference unexported functions (ensureBodyFindingsConsistency, synthesizeReviewBody) and types (ReviewResult, ReviewFinding), so they cannot compile.
    Remediation: Remove these files (the co-located internal/cli/qf_body_consistency_test.go already covers the same scenarios) or move them into internal/cli/ with appropriate qf_ prefixes.

  • [stale-doc] docs/guides/user/customizing-with-agents-md.md:14 — This user guide recommends keeping CLAUDE.md lightweight and pointing it to AGENTS.md, with example CLAUDE.md content shown. Since CLAUDE.md is deleted in this PR, the guidance is inconsistent with this repo's own practices.
    Remediation: Update the guide to note that CLAUDE.md is optional, or clarify that the guide describes downstream repo patterns.

Low

  • [test-gap] internal/cli/postreview_test.go — No test covers the case where a finding has an empty Category string. An empty category causes strings.Contains(bodyLower, "") to always return true per Go spec, effectively disabling the safety net for any finding set where at least one critical/high finding has an empty category.
  • [edge-case] internal/cli/postreview.go — The body-category consistency check uses strings.Contains for substring matching. Categories are typically hyphenated tokens which mitigates false matches, but single-word categories remain vulnerable.
  • [scope-creep] — PR bundles QualityFlow test infrastructure (qf-tests/ directory) and CLAUDE.md deletion alongside the core bug fix, which are tangentially related changes.
  • [architectural-inconsistency] CLAUDE.md — Deleted without explanation in the PR description. The content largely duplicates AGENTS.md.
  • [design-alignment] internal/cli/postreview.go — The PR actually addresses both layers (SKILL.md update to prevent at source + CLI safety net), which is defense-in-depth rather than symptom-chasing. Consider adding a comment noting this is intentional.
  • [test-organization] internal/cli/qf_body_consistency_test.go — Partially overlaps with tests already in postreview_test.go. Consider consolidating.
  • [content-injection] internal/cli/postreview.gosynthesizeReviewBody interpolates finding fields into Markdown without escaping, but this is the same trust level as the existing parsed.Body path. Defense-in-depth improvement, not a new attack surface.

Info

  • [commit-hygiene] — PR title references #2054 but the issue exists in fullsend-ai/fullsend, not guyoron1/fullsend. Ambiguous in fork context.
  • [upstream-mirror-policy] — No documented fork synchronization policy.
Previous run (2)

Review

Reason: stale-head

The review agent reviewed commit 1cf4907c382bc976ec4c525b2fe0d9895990ce18 but the PR HEAD is now 9ff7aa05d9908751f60ffa42edcd68c9f29d0ac8. This review was discarded to avoid approving unreviewed code.

Previous run (3)

Review

Reason: stale-head

The review agent reviewed commit 4a1c03f8284bb80b0f377a7d4d8084fa95eca6a7 but the PR HEAD is now 1cf4907c382bc976ec4c525b2fe0d9895990ce18. This review was discarded to avoid approving unreviewed code.

Previous run (4)

Review

Reason: stale-head

The review agent reviewed commit d38e97c2b1235920386fb631862ab6a64cd1065a but the PR HEAD is now 2235687f695434c7f3366f3e6952a4126395df16. This review was discarded to avoid approving unreviewed code.

Previous run (5)

Review

Findings

Low

  • [edge-case] internal/cli/postreview.go:549 — When all critical/high findings have an empty Category field, the consistency check loop never matches (because of the f.Category != "" guard), and the body is unconditionally replaced. The synthesized body renders empty category brackets (- **[]**) via synthesizeReviewBody because the template unconditionally writes f.Category between **[ and ]** markers.
    Remediation: Either treat empty-category findings as inherently consistent (skip them in the significant slice), or handle the empty-category case in synthesizeReviewBody by omitting the brackets or using a fallback label like uncategorized.

Info

Previous run (6)

Review

Reason: stale-head

The review agent reviewed commit 178656f95d5e9fdd12269b18c6b04eb26e263dad but the PR HEAD is now c70a3de6704dc308185b8e1774654ef2b18e17bc. This review was discarded to avoid approving unreviewed code.

@fullsend-ai-review

Copy link
Copy Markdown

/fs-review

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 21, 2026

Copy link
Copy Markdown

🤖 Finished Review · ✅ Success · Started 3:25 PM UTC · Completed 3:34 PM UTC
Commit: 0535520 · View workflow run →

// Check whether the body already references any significant finding.
// A body is considered consistent if it mentions at least one
// critical/high finding's category. Categories are hyphenated tokens
// like "logic-error", "auth-bypass", "missing-test" — specific enough

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[low] edge-case

When all critical/high findings have an empty Category field, the consistency check loop never matches (because of the f.Category != empty-string guard), and the body is unconditionally replaced. The synthesized body renders empty category brackets via synthesizeReviewBody.

Suggested fix: Either treat empty-category findings as inherently consistent (skip them in the significant slice), or handle the empty-category case in synthesizeReviewBody by omitting the brackets or using a fallback label like uncategorized.

// Check whether the body already references any significant finding.
// A body is considered consistent if it mentions at least one
// critical/high finding's category. Categories are hyphenated tokens
// like "logic-error", "auth-bypass", "missing-test" — specific enough

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[info] false-negative

Single-word categories can naturally appear in well-written review prose, causing the consistency check to consider the body consistent even when it does not actually describe the specific finding. Practical risk is negligible.

@fullsend-ai-review fullsend-ai-review Bot added the ready-for-merge All reviewers approved — ready to merge label Jun 21, 2026
@fullsend-ai-review

fullsend-ai-review Bot commented Jun 21, 2026

Copy link
Copy Markdown

🤖 Finished Review · ✅ Success · Started 3:39 PM UTC · Completed 3:50 PM UTC
Commit: 0535520 · View workflow run →

QualityFlow and others added 2 commits June 21, 2026 15:41
Generated 22 Go unit tests from STD YAML for ensureBodyFindingsConsistency()
and synthesizeReviewBody() covering body-verdict consistency checks,
severity ordering, file location rendering, and edge cases.

Co-Authored-By: QualityFlow[bot] <qualityflow[bot]@users.noreply.github.com>
Replaces intermediate pipeline artifacts with organized test files.

Total: 2 test files → qf-tests/fullsend-aiGH-2054/
Jira: fullsend-aiGH-2054
[skip ci]
@github-actions

github-actions Bot commented Jun 21, 2026

Copy link
Copy Markdown

QualityFlow Pipeline Summary

Stage Agent Status
1 STP Builder
2 STP Reviewer
3 STP Refiner
4 STD Builder
5 STD Reviewer
6 STD Refiner
7 Test Generator

Test Output

Language Count Location
Go 1 files internal/cli/qf_*_test.go

Issue: GH-78


Generated by QualityFlow

@fullsend-ai-review

Copy link
Copy Markdown

/fs-review

@guyoron1

Copy link
Copy Markdown
Owner Author

/fs-qf

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 22, 2026

Copy link
Copy Markdown

🤖 Finished Review · ✅ Success · Started 6:56 AM UTC · Completed 7:07 AM UTC
Commit: 6ead6d0 · View workflow run →

@fullsend-ai-review

Copy link
Copy Markdown

/fs-review

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 22, 2026

Copy link
Copy Markdown

🤖 Finished Review · ✅ Success · Started 7:11 AM UTC · Completed 7:25 AM UTC
Commit: 6ead6d0 · View workflow run →

@fullsend-ai-review

Copy link
Copy Markdown

/fs-review

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 22, 2026

Copy link
Copy Markdown

🤖 Finished Review · ✅ Success · Started 7:28 AM UTC · Completed 8:02 AM UTC
Commit: 6ead6d0 · View workflow run →

QualityFlow and others added 3 commits June 22, 2026 07:32
- Add v2.1-enhanced fields (patterns, variables, test_structure, code_structure) to all 17 scenarios
- Remove related_prs and source_bugs from document_metadata (content policy)
- Add testify imports to all Go stub files with blank-reference guards
- Normalize test_type casing to lowercase across all scenarios
- Verdict upgraded from APPROVED_WITH_FINDINGS (5 major) to APPROVED (0 major)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Removes intermediate pipeline artifacts (STP, STD, reviews).
Test files (1) are co-located in source tree with qf_ prefix.
Jira: GH-78
[skip ci]

@fullsend-ai-review fullsend-ai-review Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the review comment for full details.

@@ -0,0 +1,330 @@
package cli

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[medium] logic-error

Test files under qf-tests/fullsend-aiGH-2054/go/ declare package cli but are not located in internal/cli directory. These files reference unexported functions and types, so they cannot compile outside internal/cli.

Suggested fix: Remove these files or move them into internal/cli/ with qf_ prefixes.

@fullsend-ai-review fullsend-ai-review Bot removed the ready-for-merge All reviewers approved — ready to merge label Jun 22, 2026
Co-located tests (qf_* prefix) are now in source package directories.
The qf-tests/ directory contained non-compiling tests from the old pipeline.
@fullsend-ai-review

fullsend-ai-review Bot commented Jun 22, 2026

Copy link
Copy Markdown

🤖 Finished Review · ✅ Success · Started 8:33 AM UTC · Completed 8:48 AM UTC
Commit: 6ead6d0 · View workflow run →

@fullsend-ai-review fullsend-ai-review Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the review comment for full details.

func synthesizeReviewBody(findings []ReviewFinding) string {
// Group findings by severity.
order := []string{"critical", "high", "medium", "low", "info"}
groups := make(map[string][]ReviewFinding)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[medium] logic-error

In synthesizeReviewBody, when a finding has an empty Category field, the output produces malformed Markdown: - [] — description. The ensureBodyFindingsConsistency function correctly skips empty-category findings during the consistency check, but synthesizeReviewBody includes all findings regardless of category presence, creating an inconsistency between the two functions.

Suggested fix: Handle the empty-category case by omitting the [...] wrapper or using a fallback label like [uncategorized].

// Check whether the body already references any significant finding.
// A body is considered consistent if it mentions at least one
// critical/high finding's category. Categories are hyphenated tokens
// like "logic-error", "auth-bypass", "missing-test" — specific enough

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[low] edge-case

The body-consistency check uses strings.Contains for category substring matching. While the code comment explains that hyphenated tokens are specific enough to avoid false matches, a theoretical false negative could occur with short category names. The failure mode is safe (unnecessary body replacement).

"github.com/stretchr/testify/require"
)

// TestEnsureBodyFindingsConsistency_QF covers the 17 STD scenarios for

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[info] pattern-inconsistency

Test comments use TS-GH-78-001 style IDs not found in any other test file in the package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant