Background
PR #1194 adds a per-failure classifier for PR test results, posting a sticky comment that splits failures into NEW (likely introduced by this PR) vs. KNOWN (recurring/flaky on nightly). To do that classification, each per-matrix test container sources ci/utils/nightly_report_helper.sh in PR mode, downloads the target branch's nightly failure history from S3, and uploads its own per-matrix summary back to a PR-scoped S3 prefix. The pr-test-summary job then aggregates those summaries into the sticky comment.
That works, but it has two architectural awkwardnesses:
- Load-bearing fallback for the target branch. GHA leaves
github.base_ref empty for push events, and the PR workflow triggers on push to pull-request/[0-9]+ branches (the copy-pr-bot pattern). The rapidsai shared test workflows don't propagate a target branch into the test container, so the helper currently uses ${GITHUB_BASE_REF:-${RAPIDS_BRANCH:-main}} to recover one. See the inline comment at ci/utils/nightly_report_helper.sh:115.
- S3 write surface in every PR test container. Each test container needs
CUOPT_AWS_* credentials so it can upload its per-matrix summary.
Proposal: centralize PR classification in pr-test-summary
The pr-test-summary job already resolves the target branch via the GitHub API (it has to, in order to render the comment). If we move classification there, both awkwardnesses disappear.
Sketch:
|
Current (distributed) |
Proposed (centralized) |
| Per-matrix test job |
Sources nightly_report_helper.sh in PR mode, downloads nightly history, classifies, uploads per-matrix summary to S3 |
Just produces JUnit XML — uploaded as a workflow artifact (actions/upload-artifact) |
pr-test-summary |
Downloads classified summaries from S3, aggregates, posts |
Downloads JUnit XML artifacts, resolves target branch, runs nightly_report.py --mode pr per matrix against the resolved branch's history, aggregates, posts |
CUOPT_AWS_* write secrets in test containers |
required |
not required |
${GITHUB_BASE_REF:-${RAPIDS_BRANCH:-main}} fallback |
required |
removed |
nightly_report_helper.sh PR-mode branch |
present |
removed |
Work items
Why not in PR #1194
PR #1194 is already large (classifier + renderer + comment poster + workflow wiring + several review iterations). The architecture refactor changes a different axis (where classification runs) and is best kept isolated so its review can focus on the workflow / artifact mechanics rather than on the comment content.
Background
PR #1194 adds a per-failure classifier for PR test results, posting a sticky comment that splits failures into NEW (likely introduced by this PR) vs. KNOWN (recurring/flaky on nightly). To do that classification, each per-matrix test container sources
ci/utils/nightly_report_helper.shin PR mode, downloads the target branch's nightly failure history from S3, and uploads its own per-matrix summary back to a PR-scoped S3 prefix. Thepr-test-summaryjob then aggregates those summaries into the sticky comment.That works, but it has two architectural awkwardnesses:
github.base_refempty forpushevents, and the PR workflow triggers onpushtopull-request/[0-9]+branches (the copy-pr-bot pattern). The rapidsai shared test workflows don't propagate a target branch into the test container, so the helper currently uses${GITHUB_BASE_REF:-${RAPIDS_BRANCH:-main}}to recover one. See the inline comment atci/utils/nightly_report_helper.sh:115.CUOPT_AWS_*credentials so it can upload its per-matrix summary.Proposal: centralize PR classification in
pr-test-summaryThe
pr-test-summaryjob already resolves the target branch via the GitHub API (it has to, in order to render the comment). If we move classification there, both awkwardnesses disappear.Sketch:
nightly_report_helper.shin PR mode, downloads nightly history, classifies, uploads per-matrix summary to S3actions/upload-artifact)pr-test-summarynightly_report.py --mode prper matrix against the resolved branch's history, aggregates, postsCUOPT_AWS_*write secrets in test containers${GITHUB_BASE_REF:-${RAPIDS_BRANCH:-main}}fallbacknightly_report_helper.shPR-mode branchWork items
conda-python-tests.yaml,wheels-test.yaml, the custom one used bytest-self-hosted-server) already uploadRAPIDS_TESTS_DIR/*.xmlas workflow artifacts, and confirm the artifact-name convention sopr-test-summarycan find them. If they don't, add anactions/upload-artifactstep (probably inci/test_*.shor as a new workflow step).pr-test-summary(in.github/workflows/pr_test_summary.yamlandci/pr_summary.sh):(test_type, matrix_label)tuple, invokenightly_report.py --mode prpointing at the rights3://.../ci_test_reports/nightly/history/{branch}/{test_type}-{matrix}.json.ci/utils/nightly_report_helper.shand the now-unusedmode="pr"paths through the helper.CUOPT_AWS_*secrets from PR test job callers in.github/workflows/pr.yaml.Why not in PR #1194
PR #1194 is already large (classifier + renderer + comment poster + workflow wiring + several review iterations). The architecture refactor changes a different axis (where classification runs) and is best kept isolated so its review can focus on the workflow / artifact mechanics rather than on the comment content.