Skip to content

fix(#1835): require file reads before asserting contents in findings#2443

Open
fullsend-ai-coder[bot] wants to merge 1 commit into
mainfrom
agent/1835-verify-file-contents-before-asserting
Open

fix(#1835): require file reads before asserting contents in findings#2443
fullsend-ai-coder[bot] wants to merge 1 commit into
mainfrom
agent/1835-verify-file-contents-before-asserting

Conversation

@fullsend-ai-coder

Copy link
Copy Markdown
Contributor

The review agent hallucinated file contents on PR konflux-ci/konflux-test#833, claiming a Dockerfile contained
--nogpgcheck when it never did. The root cause was that the
code-review skill and correctness sub-agent had no explicit requirement to read files outside the PR diff before asserting what they contain.

Changes:

  • code-review SKILL.md step 2: added cross-file verification
    bullet requiring the agent to read any file it references
    in a finding, even if not in the diff.
  • code-review SKILL.md step 4: added cross-file finding
    self-check requiring verification that referenced files
    were read before finalizing findings.
  • correctness sub-agent: added cross-file verification
    section with the same read-before-assert requirement.

Note: make lint could not run due to sandbox network restrictions preventing shellcheck installation.


Closes #1835

Post-script verification

  • Branch is not main/master (agent/1835-verify-file-contents-before-asserting)
  • Secret scan passed (gitleaks — 59159d05e2cb01f868054a4c7143303d6049214b..HEAD)
  • Pre-commit hooks passed (authoritative run on runner)
  • Tests ran inside sandbox

The review agent hallucinated file contents on PR
konflux-ci/konflux-test#833, claiming a Dockerfile contained
--nogpgcheck when it never did. The root cause was that the
code-review skill and correctness sub-agent had no explicit
requirement to read files outside the PR diff before asserting
what they contain.

Changes:
- code-review SKILL.md step 2: added cross-file verification
  bullet requiring the agent to read any file it references
  in a finding, even if not in the diff.
- code-review SKILL.md step 4: added cross-file finding
  self-check requiring verification that referenced files
  were read before finalizing findings.
- correctness sub-agent: added cross-file verification
  section with the same read-before-assert requirement.

Note: make lint could not run due to sandbox network
restrictions preventing shellcheck installation.

Closes #1835
@github-actions

Copy link
Copy Markdown

E2E tests did not run

E2E tests run automatically for org/repo members and collaborators on pull requests.

For other contributors, a maintainer must add the ok-to-test label after the latest push.

See E2E testing guide for details.

@github-actions

Copy link
Copy Markdown

Site preview

Preview: https://01f829bf-site.fullsend-ai.workers.dev

Commit: 0f5066f722c1862e4ccc8e160b10f8125fc66a91

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 18, 2026

Copy link
Copy Markdown

🤖 Finished Review · ✅ Success · Started 7:56 PM UTC · Completed 8:05 PM UTC
Commit: 0f5066f · View workflow run →

@codecov

codecov Bot commented Jun 18, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@fullsend-ai-review

Copy link
Copy Markdown

Review

Findings

Medium

  • [scope-completeness] internal/scaffold/fullsend-repo/skills/pr-review/sub-agents/security.md — The security sub-agent makes factual assertions about files outside the PR diff (auth middleware, RBAC config, IAM policies, workflow files) but lacks the same cross-file verification checkpoint added to correctness. The security agent's exploration budget instructs it to "Read the full file for every changed auth/authorization module" and "Read related config files," but there is no self-check step before recording findings. The root cause from Review agent should verify file contents before asserting facts about them #1835 could manifest in the security dimension. See also: [consumer-completeness] finding on correctness.md.
    Remediation: Add a cross-file verification section to security.md similar to the one added in correctness.md. Place it after the exploration budget section and before the fail-open evaluation section.

Low

  • [consumer-completeness] internal/scaffold/fullsend-repo/skills/pr-review/sub-agents/correctness.md:76 — The cross-file verification instruction is added to the correctness sub-agent but not to the security or docs-currency sub-agents. The SKILL.md additions do not reach sub-agents dispatched by the pr-review orchestrator because the orchestrator's spawn prompt composes from sub-agent definition + meta-prompt + context package only — code-review/SKILL.md is not included. Security is the higher-risk gap given its exploration budget explicitly instructs reading external files. See also: [scope-completeness] finding on security.md.

  • [scope-completeness] internal/scaffold/fullsend-repo/skills/pr-review/sub-agents/docs-currency.md — The docs-currency sub-agent searches documentation files for stale references and could theoretically make assertions about documentation contents without reading files first. The risk is lower than security/correctness because the sub-agent's core task is grep-based searching which inherently requires reading. No immediate action required; monitor for hallucinations in this dimension.


Labels: PR modifies review agent skill and sub-agent definitions under the scaffold harness.

@fullsend-ai-review fullsend-ai-review Bot added requires-manual-review Review requires human judgment component/harness Agent harness, config, and skills loading labels Jun 18, 2026
@ben-alkov

Copy link
Copy Markdown
Member

/fs-fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component/harness Agent harness, config, and skills loading requires-manual-review Review requires human judgment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Review agent should verify file contents before asserting facts about them

1 participant