What happened
PR #145 addressed issue #144 (integration tests fail with docker engine). The code agent produced an initial fix on Jun 15. When humans tested with docker and reported failures, the fix agent ran 4 more iterations (workflow runs 27682539146, 27689678762, 27692981710, 27696404532). Each iteration tried a different strategy (privileged mode, security opts, vfs storage driver) — none worked because the agent could never run the tests with docker. In fix iteration 2, the agent explicitly acknowledged 'Docker is not available in the sandbox environment' but continued producing unvalidated code. The PR was closed without merge after a human reviewer noted the agent was 'throwing shit at the wall to see what sticks.'
What could go better
The agent should have recognized at the outset that validating the fix required docker, detected that docker was absent from the sandbox, and escalated with a clear message rather than producing speculative fixes. This is distinct from existing issue #2117 (which covers unresolvable test failures): here the tests passed in the sandbox with podman, and the problem was that the required validation tool was entirely absent. The agent spent ~4 fix cycles and ~5 review cycles of token budget on code that was never mergeable. Confidence: HIGH. The agent itself acknowledged the limitation in iteration 2 but was not instructed to stop.
Proposed change
Add a pre-flight check to the code and fix agent definitions (agents/code.md, agents/fix.md) that instructs the agent to: (1) identify what tools or environments are needed to validate the fix based on the issue description, (2) check whether those tools are available in the sandbox, and (3) if a critical validation tool is missing, post a comment explaining what's needed and why the agent cannot proceed, rather than producing unvalidated code. For example, if an issue says 'fails with docker' and docker is not available, the agent should say so and suggest the human either provide docker access or solve this manually. This could be implemented as a prompt section like: 'Before writing code, verify you have the tools needed to test your changes against the reported failure. If the issue requires a specific runtime (e.g., docker, a specific OS, a cloud service) that is not available in your environment, stop and post a comment explaining the limitation instead of producing untestable code.'
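The three pre-flight steps could also be sketched as a small script, in case a mechanical check is wanted alongside the prompt text. This is a minimal illustration and not part of the existing agent definitions: the keyword table, the function names, and the comment wording are all hypothetical, and the only library call it relies on is Python's standard `shutil.which`.

```python
import shutil

# Hypothetical mapping from keywords in an issue description to the
# executable needed to validate a fix. Illustrative only; a real table
# would be maintained per repository.
RUNTIME_KEYWORDS = {
    "docker": "docker",
    "podman": "podman",
    "kubernetes": "kubectl",
}


def required_tools(issue_text, keywords=RUNTIME_KEYWORDS):
    """Step 1: guess which tools the issue needs, by simple keyword match."""
    text = issue_text.lower()
    return sorted({tool for word, tool in keywords.items() if word in text})


def missing_tools(tools, which=shutil.which):
    """Step 2: return the tools whose executable is not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def preflight(issue_text, which=shutil.which):
    """Step 3: return an escalation comment, or None if validation is possible."""
    missing = missing_tools(required_tools(issue_text), which)
    if not missing:
        return None
    names = ", ".join(missing)
    return (
        "Cannot validate a fix for this issue. "
        f"Missing from the sandbox: {names}. "
        "Please provide access to these tools or handle this issue manually."
    )


if __name__ == "__main__":
    comment = preflight("integration tests fail with docker engine")
    print(comment or "All required tools found; proceeding.")
```

Finding the executable on PATH is only a lower bound: a docker client can be installed without a reachable daemon, so a fuller check would also need to confirm that the tool actually runs.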
Validation criteria
On the next 3 issues where the required validation tool is absent from the sandbox, the code/fix agent should: (1) post a comment identifying the missing tool in its first response, (2) not produce speculative fixes, and (3) trigger at most 1 fix iteration before escalating. Measure by checking whether the number of PRs closed without merge due to sandbox limitations drops to zero over the next 30 days.
Generated by retro agent from konflux-ci/konflux-build-cli#145