ci: centralize PR test classification in pr-test-summary (follow-up to #1194) #1223

@rgsl888prabhu

Background

PR #1194 adds a per-failure classifier for PR test results, posting a sticky comment that splits failures into NEW (likely introduced by this PR) vs. KNOWN (recurring/flaky on nightly). To do that classification, each per-matrix test container sources ci/utils/nightly_report_helper.sh in PR mode, downloads the target branch's nightly failure history from S3, and uploads its own per-matrix summary back to a PR-scoped S3 prefix. The pr-test-summary job then aggregates those summaries into the sticky comment.

That works, but it has two architectural awkwardnesses:

  1. Load-bearing fallback for the target branch. GHA leaves github.base_ref empty for push events, and the PR workflow triggers on push to pull-request/[0-9]+ branches (the copy-pr-bot pattern). The rapidsai shared test workflows don't propagate a target branch into the test container, so the helper currently uses ${GITHUB_BASE_REF:-${RAPIDS_BRANCH:-main}} to recover one. See the inline comment at ci/utils/nightly_report_helper.sh:115.
  2. S3 write surface in every PR test container. Each test container needs CUOPT_AWS_* credentials so it can upload its per-matrix summary.
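
To make the first point concrete, here is a minimal sketch of how that fallback chain resolves under each trigger. The function name resolve_target_branch and the branch values are hypothetical and used only for illustration; only the parameter expansion itself comes from the helper.

```shell
# Hypothetical wrapper around the fallback expression quoted above.
# ":-" falls through when the variable is unset OR empty, which matters
# because GHA sets GITHUB_BASE_REF to an empty string on push events.
resolve_target_branch() {
    echo "${GITHUB_BASE_REF:-${RAPIDS_BRANCH:-main}}"
}

# pull_request event: GITHUB_BASE_REF is populated and wins.
( GITHUB_BASE_REF="branch-a"; RAPIDS_BRANCH="branch-b"; resolve_target_branch )  # prints "branch-a"

# push to pull-request/<N>: GITHUB_BASE_REF is empty, so RAPIDS_BRANCH is used.
( GITHUB_BASE_REF=""; RAPIDS_BRANCH="branch-b"; resolve_target_branch )          # prints "branch-b"

# Neither is available inside the container: silently falls back to main.
( GITHUB_BASE_REF=""; RAPIDS_BRANCH=""; resolve_target_branch )                  # prints "main"
```

The last case is the load-bearing one: a PR targeting a non-main branch would be classified against main's history with no error.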

Proposal: centralize PR classification in pr-test-summary

The pr-test-summary job already resolves the target branch via the GitHub API (it has to, in order to render the comment). If we move classification there, both awkwardnesses disappear.
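
As a rough illustration of what that resolution can look like on a push to a pull-request/[0-9]+ branch, the snippet below recovers the PR number from the ref and asks the GitHub API for the PR's base branch. Both function names are hypothetical, and this is not necessarily how pr-test-summary does it today; the gh CLI call assumes an authenticated gh is available in the job.

```shell
# Hypothetical helper: extract the PR number from a copy-pr-bot style ref
# such as "refs/heads/pull-request/1223" or "pull-request/1223".
pr_number_from_ref() {
    pr_ref="${1#refs/heads/}"
    pr_num="${pr_ref#pull-request/}"
    # Reject refs that are not pull-request/<digits>.
    case "$pr_ref" in pull-request/*) ;; *) return 1 ;; esac
    case "$pr_num" in ''|*[!0-9]*) return 1 ;; esac
    echo "$pr_num"
}

# Hypothetical helper: look up the PR's base branch through the REST API.
# $1 is "owner/repo", $2 is the PR number. Requires network access and auth.
resolve_base_branch() {
    gh api "repos/$1/pulls/$2" --jq '.base.ref'
}

# Example (not run here):
#   resolve_base_branch "$GITHUB_REPOSITORY" "$(pr_number_from_ref "$GITHUB_REF")"
```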

Sketch:

| | Current (distributed) | Proposed (centralized) |
|---|---|---|
| Per-matrix test job | Sources nightly_report_helper.sh in PR mode, downloads nightly history, classifies, uploads per-matrix summary to S3 | Just produces JUnit XML, uploaded as a workflow artifact (actions/upload-artifact) |
| pr-test-summary | Downloads classified summaries from S3, aggregates, posts | Downloads JUnit XML artifacts, resolves target branch, runs nightly_report.py --mode pr per matrix against the resolved branch's history, aggregates, posts |
| CUOPT_AWS_* write secrets in test containers | required | not required |
| ${GITHUB_BASE_REF:-${RAPIDS_BRANCH:-main}} fallback | required | removed |
| nightly_report_helper.sh PR-mode branch | present | removed |

Work items

  • Verify whether the rapidsai shared test workflows (conda-python-tests.yaml, wheels-test.yaml, the custom one used by test-self-hosted-server) already upload RAPIDS_TESTS_DIR/*.xml as workflow artifacts, and confirm the artifact-name convention so pr-test-summary can find them. If they don't, add an actions/upload-artifact step (probably in ci/test_*.sh or as a new workflow step).
  • Refactor pr-test-summary (in .github/workflows/pr_test_summary.yaml and ci/pr_summary.sh):
    • Download every test job's JUnit artifacts.
    • For each (test_type, matrix_label) tuple, invoke nightly_report.py --mode pr pointing at the right s3://.../ci_test_reports/nightly/history/{branch}/{test_type}-{matrix}.json.
    • Aggregate and post (existing renderer is unchanged).
  • Remove the PR-mode branch in ci/utils/nightly_report_helper.sh and the now-unused mode="pr" paths through the helper.
  • Drop CUOPT_AWS_* secrets from PR test job callers in .github/workflows/pr.yaml.
  • Confirm nightly mode behaves identically after the change (the helper's nightly branch is untouched by the refactor itself, but the file diff should still be reviewed).
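
The per-tuple lookup in the refactor item above could be sketched roughly as follows. The function names, the REPORTS_PREFIX variable, and the test-type and matrix labels are all hypothetical; the bucket portion of the path is elided in this issue, so it is taken as a parameter rather than filled in. Only the nightly/history/{branch}/{test_type}-{matrix}.json layout and the --mode pr flag come from the text.

```shell
# Hypothetical helper: build the nightly-history URI for one tuple.
# $1 = prefix ending in ci_test_reports (bucket elided in the issue),
# $2 = resolved target branch, $3 = test_type, $4 = matrix label.
history_uri() {
    printf '%s/nightly/history/%s/%s-%s.json\n' "$1" "$2" "$3" "$4"
}

# Hypothetical helper: read "test_type matrix" pairs on stdin and emit
# one history URI per pair for the given prefix and branch.
list_history_uris() {
    uri_prefix="$1"
    uri_branch="$2"
    while read -r uri_test_type uri_matrix; do
        [ -n "$uri_test_type" ] || continue   # skip blank lines
        history_uri "$uri_prefix" "$uri_branch" "$uri_test_type" "$uri_matrix"
    done
}

# Example with made-up labels (REPORTS_PREFIX must be set by the caller):
#   printf '%s\n' "conda-python-tests cuda12-py312" "wheels-test cuda12-py311" |
#       list_history_uris "$REPORTS_PREFIX" "main"
# Each URI would then be handed to nightly_report.py --mode pr together with
# that tuple's downloaded JUnit XML; the remaining arguments are not
# specified in this issue.
```

Because the branch is passed in explicitly, the classification always runs against the branch resolved by pr-test-summary rather than one recovered inside the test container.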

Why not in PR #1194

PR #1194 is already large (classifier + renderer + comment poster + workflow wiring + several review iterations). The architecture refactor changes a different axis (where classification runs) and is best kept isolated so its review can focus on the workflow / artifact mechanics rather than on the comment content.

Metadata

Labels

awaiting response
