Skip to content

Review orchestrator should accept challenger findings when backed by verifiable config evidence #2485

@fullsend-ai-retro

Description

@fullsend-ai-retro

What happened

On PR #501, a Renovate bot dependency update PR, the review agent's third run (workflow 27614150563) flagged 5 MEDIUM-severity stale image references in .tekton/integration/ test files. A challenger sub-agent investigated and correctly determined these files were managed by a separate Renovate update group (evidenced by the repo's renovate.json config). The challenger removed all findings. However, the orchestrator classified the challenger's result as a [sub-agent-failure] and fell back to the champion's pre-challenger finding set, escalating the PR to requires-manual-review. The stale references were resolved by a subsequent Renovate rebase 4 days later, at which point the fourth review run cleared the PR.

What could go better

The challenger sub-agent had the correct analysis backed by verifiable evidence (the renovate.json configuration file), but the orchestrator treated its disagreement with the champion as a failure rather than a valid correction. This caused an unnecessary escalation that blocked the PR for 4 days until the next rebase resolved the issue naturally. The distinction between 'challenger failed' (errored, timed out) and 'challenger disagrees with evidence' (returned valid findings that differ from champion) appears to be missing from the orchestration logic. Confidence: HIGH — the logs explicitly show the challenger provided correct evidence that was discarded. Related issues #1795 and #1774 are adjacent but neither addresses this specific gap: #1795 is about qualifying verdicts when sub-agents fail (model unavailable), not about trusting correct challenger analysis; #1774 is about verifying functional impact before flagging multi-site changes, not about the champion-challenger resolution process.

Proposed change

In the review agent's orchestration logic (likely in the review agent definition or the champion-challenger resolution code), distinguish between two challenger outcomes: (1) challenger failure (error, timeout, invalid output) — fall back to champion findings as today, and (2) challenger disagreement with evidence (valid output that reduces/removes findings, backed by verifiable references like config files) — accept the challenger's findings or at minimum downgrade the champion's severity. A simple heuristic: if the challenger's output references specific config files (e.g., renovate.json, .github/dependabot.yml) that explain why flagged files are out of scope for the PR, treat the challenger's analysis as authoritative for those files.

Validation criteria

On the next 3 Renovate/bot dependency PRs in repos with grouped update configs, the review agent should not escalate to requires-manual-review for stale references in files managed by a different update group. The [sub-agent-failure] info finding should not appear when the challenger returns valid structured output with config-backed evidence.


Generated by retro agent from konflux-ci/konflux-test-tasks#501

Metadata

Metadata

Assignees

No one assigned

    Labels

    agent/reviewReview agentcomponent/harnessAgent harness, config, and skills loadingfeatureFeature-category issue awaiting human prioritizationpriority/mediumNormal priority, plan for next cycletriagedTriaged but awaiting human prioritizationtype/featureNew capability request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status
    Todo

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions