Review orchestrator should accept challenger findings when backed by verifiable config evidence

## What happened

On [PR #501](https://github.com/konflux-ci/konflux-test-tasks/pull/501), a Renovate bot dependency update PR, the review agent's third run ([workflow 27614150563](https://github.com/konflux-ci/.fullsend/actions/runs/27614150563)) flagged 5 MEDIUM-severity stale image references in `.tekton/integration/` test files. A challenger sub-agent investigated and correctly determined these files were managed by a separate Renovate update group (evidenced by the repo's `renovate.json` config). The challenger removed all findings. However, the orchestrator classified the challenger's result as a `[sub-agent-failure]` and fell back to the champion's pre-challenger finding set, escalating the PR to `requires-manual-review`. The stale references were resolved by a subsequent Renovate rebase 4 days later, at which point the fourth review run cleared the PR.

## What could go better

The challenger sub-agent had the correct analysis backed by verifiable evidence (the `renovate.json` configuration file), but the orchestrator treated its disagreement with the champion as a failure rather than a valid correction. This caused an unnecessary escalation that blocked the PR for 4 days until the next rebase resolved the issue naturally. The distinction between 'challenger failed' (errored, timed out) and 'challenger disagrees with evidence' (returned valid findings that differ from champion) appears to be missing from the orchestration logic. Confidence: HIGH — the logs explicitly show the challenger provided correct evidence that was discarded. Related issues [#1795](https://github.com/fullsend-ai/fullsend/issues/1795) and [#1774](https://github.com/fullsend-ai/fullsend/issues/1774) are adjacent but neither addresses this specific gap: #1795 is about qualifying verdicts when sub-agents fail (model unavailable), not about trusting correct challenger analysis; #1774 is about verifying functional impact before flagging multi-site changes, not about the champion-challenger resolution process.

## Proposed change

In the review agent's orchestration logic (likely in the review agent definition or the champion-challenger resolution code), distinguish between two challenger outcomes: (1) challenger failure (error, timeout, invalid output) — fall back to champion findings as today, and (2) challenger disagreement with evidence (valid output that reduces/removes findings, backed by verifiable references like config files) — accept the challenger's findings or at minimum downgrade the champion's severity. A simple heuristic: if the challenger's output references specific config files (e.g., `renovate.json`, `.github/dependabot.yml`) that explain why flagged files are out of scope for the PR, treat the challenger's analysis as authoritative for those files.

## Validation criteria

On the next 3 Renovate/bot dependency PRs in repos with grouped update configs, the review agent should not escalate to `requires-manual-review` for stale references in files managed by a different update group. The `[sub-agent-failure]` info finding should not appear when the challenger returns valid structured output with config-backed evidence.

---
_Generated by retro agent from https://github.com/konflux-ci/konflux-test-tasks/pull/501_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Review orchestrator should accept challenger findings when backed by verifiable config evidence #2485

What happened

What could go better

Proposed change

Validation criteria

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Review orchestrator should accept challenger findings when backed by verifiable config evidence #2485

Description

What happened

What could go better

Proposed change

Validation criteria

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions