-
Notifications
You must be signed in to change notification settings - Fork 292
TRT-2487: job-run-aggregator: fail on minimum of 3 failures #4894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
TRT-2487: job-run-aggregator: fail on minimum of 3 failures #4894
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
@petr-muller: This pull request references TRT-2487 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughIntroduces a pityFactor helper that relaxes strictRequiredNumberOfPasses derived from historical data; updates percentile-with-grace and CheckFailed paths to use strict → pity-derived required passes and optionally include a pity message; expands tests to cover pity-factor scenarios. (33 words) Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes ✨ Finishing touches
Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: petr-muller The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail.go (1)
500-555: Control flow and Go version issues inpityFactor()require fixes.
When
numberOfAttempts < 3, the currentpityFactor()returnsrequiredNumberOfPasses = 0(viamax(0, numberOfAttempts - 3)), causing the function to skip (line 508) before the explicit "require at least three attempts" failure check (line 518). This is a semantic inversion.The implementation uses
min()/max()builtins, which require Go 1.21+ (released August 8, 2023). Verify your repository'sgo.modGo version is compatible; older versions will fail to compile.Confirm the threshold semantics: the code allows up to 3 failures (i.e.,
requiredPasses = numberOfAttempts - 3). Align this with PR objectives if needed.The proposed fix resolves these by returning early from
pityFactor()whennumberOfAttempts < 3, replacingmin()/max()with explicit conditionals, and only generating pity messages when actual relaxation occurs.
🤖 Fix all issues with AI agents
In @pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.go:
- Around line 207-218: The inline comment for the "Test 80th Percentile Pass"
case is inconsistent with the test data: it says “with pity factor allowing 3
failures” but the case sets failedCount: 5 and still expects a pass; update the
comment to reflect the actual rule validated by this case (e.g., change to
“allowing 5 failures” or remove the incorrect “3 failures” phrase) and ensure it
aligns with the fields in this test case (name: "Test 80th Percentile Pass",
disruptions: []int{...}, thresholdPercentile: 80, historicalDisruption: 1,
failedCount: 5, successCount: 5).
🧹 Nitpick comments (2)
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.go (1)
60-73: Update test comments (and optionally assertsummary) to avoid misleading “pity factor” attribution.These cases say “Required Passes … (with pity factor …)”, but they don’t actually assert the pity-factor behavior/message, and the “required passes” value may simply be the strict table value (especially after removing the historical “-1” adjustment). Consider either:
- adjusting comments to “strict required passes is …” (and only mention pity when it actually changes the outcome), and/or
- adding a
require.Contains(t, summary, "...pity factor...")assertion in at least one case where pity is expected to apply (strict > required).Also applies to: 75-88
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail.go (1)
623-648: Gate pity-factor messaging on “did we relax?”, not on “passes < strict”.In
CheckFailed, the currentif numberOfPasses < strictRequiredNumberOfPasses { ... }is an indirect proxy; it can hide/show the message based on outcomes rather than whether pity actually changed the requirement. Consider keying offrequiredNumberOfPasses < strictRequiredNumberOfPasses(or havepityFactorreturn an empty msg when no relaxation, as in the diff above).
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (2)
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail.gopkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.go
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.gopkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail.go
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR modifies the job run aggregation logic to be less sensitive to infrastructure noise and flaky tests by introducing a "pity factor" that requires at least 3 failures before rejecting payloads. The change removes the previous unconditional -1 adjustment to required passes and replaces it with a more consistent policy that allows a minimum of 3 failures across all tests.
Key changes:
- Introduces a
pityFactorfunction that ensures at least 3 failures are required before aggregation fails - Removes the previous
-1adjustment logic from disruption test aggregation - Updates test cases to reflect the new pass/fail thresholds
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail.go | Implements the new pityFactor function and integrates it into both CheckPercentileDisruption and CheckFailed methods, replacing the previous -1 adjustment |
| pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.go | Updates test cases with new expected pass/fail counts and comments to reflect the pity factor logic |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.go
Outdated
Show resolved
Hide resolved
bb5b1f7 to
b1662e9
Compare
b1662e9 to
95148bc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.go:
- Around line 61-88: Update the stale inline comments that state “Required
Passes … is 6/4” to reflect the new strict+pity behavior for the affected test
cases (e.g., the cases named "Test 95th Percentile Fuzzy Pass" and "Test 95th
Percentile Multi Fuzzy Pass" and the similar cases around lines referenced
208-218); change the comments so they state the correct required pass counts and
clarify that the pity factor allows the specified number of failures (match
comments to the actual values: required passes = 7 with pity allowing 3
failures, or whatever the updated calculation yields) so the commentary matches
the test data and expected success/failure counts.
In @pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail.go:
- Around line 505-507: The access
requiredPassesByPassPercentageByNumberOfAttempts[numberOfAttempts][workingPercentage]
can panic for out-of-range numberOfAttempts or workingPercentage; add explicit
bounds checks before indexing (validate numberOfAttempts is within the outer
slice/map keys and workingPercentage is within the inner slice length), log a
warning/trace with the offending values (numberOfAttempts, workingPercentage)
and skip processing or return a safe default when invalid, and only call
pityFactor(numberOfAttempts, strictRequiredNumberOfPasses) after the validated
value is retrieved; update the code around the strictRequiredNumberOfPasses
assignment where requiredPassesByPassPercentageByNumberOfAttempts is referenced
to include these guards and early-return/continue behavior.
🧹 Nitpick comments (4)
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail.go (2)
540-554: Pity-factor messaging is well-scoped to “pass due to relaxation”, but add a separator for readability.Right now the summary becomes
failed X times(strict ...)(no space). Consider prefixing the message with a leading space when it’s non-empty.Small readability tweak
- if numberOfPasses < strictRequiredNumberOfPasses { - pityFactorMsg = fmt.Sprintf("(%s)", pityFactorMsg) + if numberOfPasses < strictRequiredNumberOfPasses { + pityFactorMsg = fmt.Sprintf(" (%s)", pityFactorMsg) } else { pityFactorMsg = "" }
623-648: Same: consider a space before the appended pity-factor message inCheckFailedpass summaries.Small readability tweak
- if numberOfPasses < strictRequiredNumberOfPasses { - pityFactorMsg = fmt.Sprintf("(%s)", pityFactorMsg) + if numberOfPasses < strictRequiredNumberOfPasses { + pityFactorMsg = fmt.Sprintf(" (%s)", pityFactorMsg) } else { pityFactorMsg = "" }pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.go (2)
295-396: Good coverage added, but a couple cases don’t actually exercise pity (strict may already allow the outcome).E.g., the “10 attempts: 7 passes, 3 failures” case may pass even without pity depending on the required-pass table. Consider switching those cases to a stricter historical percent (e.g., 99/100) so
strictRequiredNumberOfPasses > attempts-3and the test truly validates the relaxation. Also consider usingcontext.Background()instead ofnilfor thectxargument.
398-479: Nice internal test, but it can be tightened to directly prove pity is applied.Same idea: pick a percentile/required-pass combination where strict would fail but pity passes (or assert on the presence of the pity message in
summarywhenpasses < strict). AlsohistoricalDisruptionStatisticis currently unused in this test and can be dropped to reduce noise.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (2)
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail.gopkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.go
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.gopkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: CodeQL analysis (go)
- GitHub Check: Agent
🔇 Additional comments (2)
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.go (1)
9-11: Import addition makes sense for new pity-factor tests.pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail.go (1)
651-659: [rewritten review comment]
[classification tag]
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.go
Outdated
Show resolved
Hide resolved
95148bc to
27e47ac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ), nil | ||
| } | ||
|
|
||
| // pityFactor relaxes required success rate to always pass on 2 and fewer failures to reduce aggregation fails caused by |
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment states "pass on 2 and fewer failures" but the implementation uses a constant failurePityFactor = 3, which allows exactly 3 failures, not 2 and fewer. The comment should say "pass on 3 and fewer failures" to match the implementation.
| // pityFactor relaxes required success rate to always pass on 2 and fewer failures to reduce aggregation fails caused by | |
| // pityFactor relaxes required success rate to always pass on 3 and fewer failures to reduce aggregation fails caused by |
27e47ac to
2a49b2d
Compare
|
/hold I have generated the tests with Claude and I saw it struggling with them enough to not trust them, so I want to review them |
|
/test images |
|
Scheduling required tests: Scheduling tests matching the |
2a49b2d to
ac57e75
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.go (1)
548-559: Remove unusedhistoricalDisruptionStatisticvariable.The
historicalDisruptionStatisticstruct is created and populated but never used. TheinnerCheckPercentileDisruptionWithGracefunction receivestest.historicalDisruptiondirectly as a float value.🧹 Suggested cleanup
t.Run(test.name, func(t *testing.T) { jobRunIDToAvailabilityResultForBackend := createJobRunIDToAvailabilityResultForBackend(test.disruptions) - historicalDisruptionStatistic := backendDisruptionStats{ - percentileByIndex: make([]float64, 100), - } - historicalDisruptionStatistic.percentileByIndex[test.thresholdPercentile] = test.historicalDisruption requiredPasses, failureJobRunIDs, successJobRunIDs, status, summary := weeklyAverage.innerCheckPercentileDisruptionWithGrace( jobRunIDToAvailabilityResultForBackend, test.historicalDisruption, test.thresholdPercentile, test.graceSeconds, )
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (2)
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail.gopkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail.go
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: CodeQL analysis (go)
- GitHub Check: Agent
🔇 Additional comments (3)
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.go (3)
61-88: LGTM on updated test expectations.The adjusted
requiredPassesvalues and corresponding disruption arrays correctly reflect the new pity factor calculation logic. The inline comments accurately describe the expected behavior.
295-486: Well-structured test coverage for pity factor logic.The test comprehensively covers:
- Various working percentages (70%, 80%, 90%, 95%, 100%)
- Different attempt counts (5, 10, 12)
- Edge cases where pity factor doesn't relax the requirement
- Boundary conditions (exactly meeting vs. failing requirement)
The inline descriptions clearly document the expected
strictandpitycalculations, making the test self-documenting.
207-219: LGTM on the 80th percentile test update.The adjusted test data correctly reflects the new required passes calculation, with the disruption array and expected counts properly aligned.
Our current aggregation logic is too sensitive, leading to the rejection of payloads for non-regressions. Analysis shows that a significant portion of rejected payloads are failing due to infrastructure noise or existing flakes rather than genuine code regressions. We have component readiness as a backstop to identify regressions with greater sample sizes.
I do not _entirely_ understand why the relaxation for disruption tests exists here but the TODO exists this may be worth tightening. The `pityFactor` will relax the threshold in some cases (high attempts, high pass rate) so the additional relaxation would only apply in the remaining cases.
ac57e75 to
f180675
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.go (1)
488-571: Test structure looks solid.The direct testing of
innerCheckPercentileDisruptionWithGraceprovides good coverage of the core pity factor logic. Consider adding a case with non-zerograceSecondsto verify the interaction between grace period and pity factor, if not already covered in the other disruption tests.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (2)
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail.gopkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail.go
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.go
🧬 Code graph analysis (1)
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.go (2)
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail.go (1)
TestKey(166-169)pkg/jobrunaggregator/jobrunaggregatorapi/types_row_aggregatedtestrun.go (1)
AggregatedTestRunRow(9-22)
🔇 Additional comments (2)
pkg/jobrunaggregator/jobrunaggregatoranalyzer/pass_fail_test.go (2)
61-88: LGTM on test data adjustments.The updated required passes values and disruption arrays align with the new pity factor semantics. Comments accurately document the expected behavior.
446-478: The test key construction is correct - no mismatch exists.The mock data (
CombinedTestSuiteName: "test-suite") matches whatCheckFailedwill look up. The implementation usestestCaseDetails.TestSuiteNamedirectly (line 607 of pass_fail.go), not thesuiteNamesparameter. Since the test setstestCaseDetails.TestSuiteName: "test-suite", the key lookup succeeds as intended. ThesuiteNamesparameter is accepted but unused in the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/retest-required |
|
Scheduling required tests: Scheduling tests matching the |
|
@petr-muller: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Our current aggregation logic is too sensitive, leading to the rejection of payloads for non-regressions. Analysis shows that a significant portion of rejected payloads are failing due to infrastructure noise or existing flakes rather than genuine code regressions.
We have component readiness as a backstop to identify regressions with greater sample sizes.
I also added a commit to remove the existing
-1adjustment in disruption test aggregtation that would now only apply in cases the threshold is not already relaxed by the pity factor. I do not entirely understand why it existed but there is a todo that suggests we may eventually remove it. If we see this causing problems we may revert the commit.