What happened
The retro agent ran on PR #2457 and identified a single false-positive review finding: the cross-repo-contracts sub-agent flagged upload-artifact@v7 paired with download-artifact@v8 as version skew. The retro agent filed issue #2499 proposing a fix that hardcoded specific GitHub Action pairs as exceptions. The triage agent auto-labeled the issue ready-to-code (run 27952788002), the code agent created PR #2500 (run 27953328035) adding 20 lines of hardcoded guidance, and the human maintainer closed the PR in ~4 minutes with the feedback: "not needed, too specific." In total, the pipeline consumed tokens across 3 agent runs (retro, triage, code) and also triggered a review agent run, which was skipped because the PR was already closed.
What could go better
The retro agent lacked a proportionality filter. It observed a single false-positive finding on one PR and proposed a hardcoded exception list — a fix whose maintenance cost exceeds the cost of the occasional false positive it suppresses. The agent should have asked: (1) How frequently does this false positive occur? (A single instance on one PR is low-frequency.) (2) Is the proposed fix generalizable, or does it require ongoing maintenance (hardcoded lists)? (3) Does the cost of the fix (prompt bloat, maintenance burden, review overhead) exceed the cost of the problem (one low-severity informational finding)?
Confidence: High that a proportionality check would have prevented this issue from being filed. The human's rejection was immediate and clear — this class of narrow fix is not worth the overhead. Uncertainty: I haven't verified whether other retro-filed issues exhibit the same pattern of proposing hardcoded exceptions for single-instance findings, though the existence of issues #1775 and #2047 suggests it's recurring.
Proposed change
Add proportionality guidance to the retro agent definition (likely agents/retro.md or the retro-analysis skill) that instructs the agent to evaluate the following before proposing a fix:
- Frequency check: Has the problem occurred on multiple PRs, or is it a single instance? Single-instance findings with low severity should generally be skipped unless the fix is trivial and generalizable.
- Fix generalizability: Does the proposed fix apply broadly (e.g., improving a heuristic) or narrowly (e.g., hardcoding specific exceptions)? Prefer generalizable fixes; skip or downgrade proposals that require maintaining hardcoded lists.
- Cost-benefit: Is the maintenance cost of the fix (prompt bloat, ongoing updates, review overhead) proportional to the cost of the problem (frequency × severity)?
Add language like: "Do not propose fixes for single-instance, low-severity false positives unless the fix is generalizable and low-maintenance. A hardcoded exception list for a one-time false positive is not proportional — skip the proposal and note in your summary that the finding was observed but not worth fixing."
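The three checks above can be read as one decision rule. The following is a minimal sketch of that rule, not the retro agent's actual implementation: the `Finding` fields, the 1-to-3 severity scale, and the `min_occurrences` threshold are all hypothetical names and values introduced for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Finding:
    """Hypothetical summary of one retro observation (all fields are assumptions)."""
    occurrences: int                # number of PRs on which the problem was seen
    severity: int                   # assumed scale: 1 = informational, 3 = high
    fix_is_generalizable: bool      # e.g. improving a heuristic
    fix_needs_hardcoded_list: bool  # e.g. an exception list that must be maintained


SKIP_NOTE = "observed but not proportional to fix"


def should_propose(finding: Finding, min_occurrences: int = 2) -> Tuple[bool, str]:
    """Return (propose?, note) following the proposed proportionality guidance."""
    single_instance = finding.occurrences < min_occurrences
    low_severity = finding.severity <= 1
    low_maintenance = not finding.fix_needs_hardcoded_list

    # Single-instance, low-severity findings are only worth a proposal when
    # the fix is both generalizable and low-maintenance.
    if single_instance and low_severity:
        if finding.fix_is_generalizable and low_maintenance:
            return True, "propose"
        return False, SKIP_NOTE

    # Recurring or higher-severity findings go forward for normal triage.
    return True, "propose"
```

Under these assumptions, the finding behind issue #2499 (one occurrence, informational severity, a fix that needs a hardcoded exception list) would be skipped and noted in the summary rather than filed.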
Validation criteria
Over the next 10 retro agent runs on fullsend-ai/fullsend, the retro agent should not file proposals for single-instance low-severity findings that require hardcoded exceptions. When it encounters such findings, it should note them in its summary as "observed but not proportional to fix." Measure: zero human-rejected PRs originating from retro-filed issues about single-instance false positives in the next 30 days.
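The 30-day measure could be checked with a small script along these lines. This is a sketch under stated assumptions: the `PullRequestRecord` type and its fields are hypothetical, and how each PR gets labeled as originating from a retro-filed issue about a single-instance false positive is left to whoever collects the data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class PullRequestRecord:
    """Hypothetical record for one PR; the field names are assumptions."""
    number: int
    from_retro_issue: bool                # PR originated from a retro-filed issue
    single_instance_false_positive: bool  # issue was about a one-off false positive
    merged: bool
    closed_at: Optional[datetime]         # None while the PR is still open


def rejected_in_window(prs: List[PullRequestRecord],
                       start: datetime,
                       window_days: int = 30) -> List[int]:
    """Return numbers of matching PRs closed without merging inside the window."""
    end = start + timedelta(days=window_days)
    return [
        pr.number
        for pr in prs
        if pr.from_retro_issue
        and pr.single_instance_false_positive
        and not pr.merged
        and pr.closed_at is not None
        and start <= pr.closed_at < end
    ]
```

The validation criterion is met when `rejected_in_window(...)` returns an empty list for the 30 days after the guidance lands.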
Generated by retro agent from #2500