Retro agent should assess fix proportionality before proposing changes for single-instance findings #2502

@fullsend-ai-retro

Description

What happened

The retro agent ran on PR #2457 and identified a single false-positive review finding: the cross-repo-contracts sub-agent flagged upload-artifact@v7 paired with download-artifact@v8 as version skew. The retro agent filed issue #2499 proposing a fix that hardcoded specific GitHub Action pairs as exceptions. The triage agent auto-labeled it ready-to-code (run 27952788002), the code agent created PR #2500 (run 27953328035) adding 20 lines of hardcoded guidance, and the human maintainer closed it in ~4 minutes with feedback: "not needed, too specific." The pipeline consumed tokens across three agent runs (retro, triage, code) and also triggered a review agent run, which was skipped because the PR was already closed.

What could go better

The retro agent lacked a proportionality filter. It observed a single false-positive finding on one PR and proposed a hardcoded exception list — a fix whose maintenance cost exceeds the cost of the occasional false positive it suppresses. The agent should have asked: (1) How frequently does this false positive occur? (A single instance on one PR is low-frequency.) (2) Is the proposed fix generalizable, or does it require ongoing maintenance (hardcoded lists)? (3) Does the cost of the fix (prompt bloat, maintenance burden, review overhead) exceed the cost of the problem (one low-severity informational finding)?

Confidence: High that a proportionality check would have prevented issue #2499 from being filed. The human's rejection was immediate and clear: this class of narrow fix is not worth the overhead. Uncertainty: I haven't verified whether other retro-filed issues exhibit the same pattern of proposing hardcoded exceptions for single-instance findings, though the existence of issues #1775 and #2047 suggests the pattern is recurring.

Proposed change

Add proportionality guidance to the retro agent definition (likely agents/retro.md or the retro-analysis skill) that instructs the agent to evaluate the following before proposing a change:

  1. Frequency check: Has the problem occurred on multiple PRs, or is it a single instance? Single-instance findings with low severity should generally be skipped unless the fix is trivial and generalizable.
  2. Fix generalizability: Does the proposed fix apply broadly (e.g., improving a heuristic) or narrowly (e.g., hardcoding specific exceptions)? Prefer generalizable fixes; skip or downgrade proposals that require maintaining hardcoded lists.
  3. Cost-benefit: Is the maintenance cost of the fix (prompt bloat, ongoing updates, review overhead) proportional to the cost of the problem (frequency × severity)?

Add language like: "Do not propose fixes for single-instance, low-severity false positives unless the fix is generalizable and low-maintenance. A hardcoded exception list for a one-time false positive is not proportional — skip the proposal and note in your summary that the finding was observed but not worth fixing."
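
The retro agent is driven by a prompt rather than code, but the decision rule in the quoted language can be sketched as a small predicate to make the intended behavior unambiguous. This is a minimal sketch only: `Finding`, `Severity`, `should_propose`, and every field name are hypothetical and do not correspond to anything in the fullsend codebase. It encodes only the quoted rule (skip single-instance, low-severity findings unless the fix is generalizable and low-maintenance), not the broader cost-benefit judgment.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Finding:
    # Hypothetical fields; the retro agent's real data model is not shown in this issue.
    occurrences: int                # number of distinct PRs where the problem was seen
    severity: Severity
    fix_is_generalizable: bool      # e.g. improves a heuristic rather than listing exceptions
    fix_needs_hardcoded_list: bool  # e.g. a list of specific action pairs to exempt


def should_propose(finding: Finding) -> tuple[bool, str]:
    """Return (propose?, summary note) under the quoted proportionality rule."""
    single_instance = finding.occurrences <= 1
    low_severity = finding.severity is Severity.LOW
    low_maintenance = not finding.fix_needs_hardcoded_list

    # Skip only when the finding is both rare and minor AND the fix is narrow or costly.
    if single_instance and low_severity and not (
        finding.fix_is_generalizable and low_maintenance
    ):
        return False, "observed but not proportional to fix"
    return True, "propose"
```

Applied to the case described above (one PR, low severity, a fix that hardcodes specific action pairs), `should_propose(Finding(1, Severity.LOW, False, True))` returns `False` together with the summary note, which is the outcome the maintainer's feedback asked for.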

Validation criteria

Over the next 10 retro agent runs on fullsend-ai/fullsend, the retro agent should not file proposals for single-instance low-severity findings that require hardcoded exceptions. When it encounters such findings, it should note them in its summary as "observed but not proportional to fix." Measure: zero human-rejected PRs originating from retro-filed issues about single-instance false positives in the next 30 days.
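
One way to compute the measure above, sketched under assumptions: the record type `RetroOutcome`, its fields, and the function `rejected_in_window` are hypothetical, and how each outcome gets classified (for example, whether a PR counts as human-rejected) is left to whoever collects the data. The target value for the 30-day window is zero.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RetroOutcome:
    # Hypothetical record of one PR that originated from a retro-filed issue.
    pr_closed_at: datetime
    human_rejected: bool                  # closed by a human without merging
    single_instance_false_positive: bool  # source issue concerned a one-off false positive


def rejected_in_window(outcomes, window_start, window_days=30):
    """Count human-rejected PRs from single-instance false-positive issues in the window."""
    window_end = window_start + timedelta(days=window_days)
    return sum(
        1
        for o in outcomes
        if window_start <= o.pr_closed_at < window_end
        and o.human_rejected
        and o.single_instance_false_positive
    )
```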


Generated by retro agent from #2500

Metadata

Assignees

No one assigned

    Labels

    agent/retro: Retro agent
    component/harness: Agent harness, config, and skills loading
    feature: Feature-category issue awaiting human prioritization
    priority/medium: Normal priority, plan for next cycle
    triaged: Triaged but awaiting human prioritization
    type/feature: New capability request

    Projects

    Status
    Todo
