Retro agent should assess fix proportionality before proposing changes for single-instance findings

## What happened

The retro agent ran on [PR #2457](https://github.com/fullsend-ai/fullsend/pull/2457) and identified a single false-positive review finding: the cross-repo-contracts sub-agent flagged `upload-artifact@v7` paired with `download-artifact@v8` as version skew. The retro agent then filed [issue #2499](https://github.com/fullsend-ai/fullsend/issues/2499) proposing a fix that hardcoded specific GitHub Action pairs as exceptions. The triage agent auto-labeled the issue `ready-to-code` ([run 27952788002](https://github.com/fullsend-ai/.fullsend/actions/runs/27952788002)), and the code agent created [PR #2500](https://github.com/fullsend-ai/fullsend/pull/2500) ([run 27953328035](https://github.com/fullsend-ai/.fullsend/actions/runs/27953328035)), adding 20 lines of hardcoded guidance. The human maintainer closed the PR in ~4 minutes with the feedback "not needed, too specific." In total, the pipeline consumed tokens across 3 agent runs (retro, triage, code) and also triggered a review agent run, which was skipped because the PR was already closed.

## What could go better

The retro agent lacked a proportionality filter. It observed a single false-positive finding on one PR and proposed a hardcoded exception list, a fix whose maintenance cost exceeds the cost of the occasional false positive it suppresses. The agent should have asked:

1. How frequently does this false positive occur? (A single instance on one PR is low-frequency.)
2. Is the proposed fix generalizable, or does it require ongoing maintenance (hardcoded lists)?
3. Does the cost of the fix (prompt bloat, maintenance burden, review overhead) exceed the cost of the problem (one low-severity informational finding)?

Confidence: **High** that a proportionality check would have prevented issue #2499 from being filed. The human's rejection was immediate and clear: this class of narrow fix is not worth the overhead. Uncertainty: I haven't verified whether other retro-filed issues exhibit the same pattern of proposing hardcoded exceptions for single-instance findings, though the existence of issues #1775 and #2047 suggests the pattern is recurring.

## Proposed change

Add proportionality guidance to the retro agent definition (likely `agents/retro.md` or the `retro-analysis` skill) that instructs the agent to evaluate the following before proposing a change:

1. **Frequency check:** Has the problem occurred on multiple PRs, or is it a single instance? Single-instance findings with low severity should generally be skipped unless the fix is trivial and generalizable.
2. **Fix generalizability:** Does the proposed fix apply broadly (e.g., improving a heuristic) or narrowly (e.g., hardcoding specific exceptions)? Prefer generalizable fixes; skip or downgrade proposals that require maintaining hardcoded lists.
3. **Cost-benefit:** Is the maintenance cost of the fix (prompt bloat, ongoing updates, review overhead) proportional to the cost of the problem (frequency × severity)?

Add language like: "Do not propose fixes for single-instance, low-severity false positives unless the fix is generalizable and low-maintenance. A hardcoded exception list for a one-time false positive is not proportional — skip the proposal and note in your summary that the finding was observed but not worth fixing."
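To make the three checks concrete, here is a minimal sketch of how they could combine into a single decision. It is an illustration only, not the retro agent's actual logic: the `Finding` fields, the 0.0 to 1.0 severity scale, the `low_severity_threshold` value, and the shared cost units are all assumptions introduced for this example.

```python
from dataclasses import dataclass

# Note text taken from the validation criteria in this issue.
SKIP_NOTE = "observed but not proportional to fix"


@dataclass
class Finding:
    """Hypothetical summary of a retro finding and its proposed fix."""
    occurrences: int              # number of PRs where the problem was seen
    severity: float               # assumed scale: 0.0 (informational) to 1.0 (blocking)
    fix_is_generalizable: bool    # False for hardcoded exception lists
    fix_maintenance_cost: float   # assumed to be in the same units as problem cost


def problem_cost(finding: Finding) -> float:
    """Cost of the problem, modeled as frequency x severity."""
    return finding.occurrences * finding.severity


def should_propose(finding: Finding, low_severity_threshold: float = 0.3):
    """Return (propose, note) after applying the three proportionality checks."""
    single_instance_low_severity = (
        finding.occurrences <= 1 and finding.severity <= low_severity_threshold
    )
    # Checks 1 and 2: a single-instance, low-severity finding is only worth
    # a proposal when the fix is generalizable.
    if single_instance_low_severity and not finding.fix_is_generalizable:
        return False, SKIP_NOTE
    # Check 3: the fix must not cost more than the problem it removes.
    if finding.fix_maintenance_cost > problem_cost(finding):
        return False, SKIP_NOTE
    return True, "propose"


# The artifact version-skew case: one PR, informational, hardcoded exceptions.
version_skew = Finding(occurrences=1, severity=0.1,
                       fix_is_generalizable=False, fix_maintenance_cost=1.0)
print(should_propose(version_skew))
```

In practice the agent would apply these checks through prompt guidance rather than numeric scores; the numbers here only show how the three questions interact.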

## Validation criteria

Over the next 10 retro agent runs on fullsend-ai/fullsend, the retro agent should not file proposals for single-instance low-severity findings that require hardcoded exceptions. When it encounters such findings, it should note them in its summary as "observed but not proportional to fix." Measure: zero human-rejected PRs originating from retro-filed issues about single-instance false positives in the next 30 days.
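The 30-day measure could be checked with a small script along these lines. This is a sketch under stated assumptions: the `PullRequest` record and its fields are hypothetical, and it presumes someone has already classified each closed PR (for example by hand or from labels) as originating from a retro-filed issue about a single-instance false positive.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PullRequest:
    """Hypothetical record of a closed PR; fields are assumptions for this sketch."""
    number: int
    closed_at: datetime
    merged: bool
    from_retro_issue: bool                # PR originated from a retro-filed issue
    single_instance_false_positive: bool  # issue concerned a one-off false positive


def count_rejected(prs, window_start: datetime, window_days: int = 30) -> int:
    """Count PRs closed without merging inside the window that match the pattern."""
    window_end = window_start + timedelta(days=window_days)
    return sum(
        1
        for pr in prs
        if window_start <= pr.closed_at < window_end
        and not pr.merged
        and pr.from_retro_issue
        and pr.single_instance_false_positive
    )


def validation_passes(prs, window_start: datetime) -> bool:
    """The measure passes only when no matching PR was rejected in the window."""
    return count_rejected(prs, window_start) == 0
```

Calling `validation_passes(prs, window_start)` with the date the guidance lands as `window_start` gives a yes/no answer for the measure.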

---
_Generated by retro agent from https://github.com/fullsend-ai/fullsend/pull/2500_
