Document the CI failure triage process #5130
Open
btshrewsbury-viam wants to merge 6 commits into
Add .github/workflows/ci-failure-triage.md describing how scheduled-job failures flow to ci-failure issues (the report-ci-failure composite action) and how the daily Claude Code triage session turns them into fix PRs or comments, including setup constraints (must run in a repo-scoped session) and the full triage prompt kept in sync with the scheduled trigger. Link it from the workflows README's Failure notifications section and from the top-level README's CI table (updating the stale "N/A (Jira)" blocking notes to "opens ci-failure issue").
✅ Deploy Preview for viam-docs ready!
… gate
Add two lessons from the first live runs: (1) if an issue's linked PR has failing checks, fix that PR's branch in place instead of skipping it (the skip-if-open-PR rule would otherwise leave a red PR stuck); (2) make the minimal diff and run prettier (a required check). Do not follow markdownlint into formatting that prettier rejects; doing so had left a PR red.
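The first lesson amounts to a small decision rule per `ci-failure` issue. A minimal sketch of that rule, assuming the session already knows whether a linked PR is open and whether its checks pass; the names `Action` and `choose_action` are hypothetical, and treating pending checks the same as passing ones is an assumption of this sketch, not something the triage prompt states:

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    OPEN_NEW_FIX = "open a new fix PR or leave a comment"
    FIX_EXISTING_BRANCH = "fix the linked PR's branch in place"
    SKIP = "skip: an open PR already covers this issue"


def choose_action(linked_pr_open: bool, checks_passing: Optional[bool]) -> Action:
    """Pick what to do with one ci-failure issue.

    checks_passing is None when there is no linked PR or its checks have
    not finished yet (a convention made up for this sketch).
    """
    if not linked_pr_open:
        # Nothing covers the issue yet, so triage it from scratch.
        return Action.OPEN_NEW_FIX
    if checks_passing is False:
        # A red PR would otherwise stay stuck under the skip-if-open-PR rule.
        return Action.FIX_EXISTING_BRANCH
    # Open PR that is green (or still pending): leave it alone.
    return Action.SKIP
```

Keeping the rule as a pure function makes the red-PR case explicit: an open PR only counts as "covered" when its checks are not failing.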
The scheduled session is long-lived and reused, so its clone can be stale. Add an explicit sync step (git fetch + reset to origin/main; re-fetch before touching a PR branch) and stop assuming a fresh clone. Also make existing-PR discovery robust: a 'Refs #n' mention is not a formal linked-PR, so check the issue's cross-references, comments, and claude/ci-fix-* branches.
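As a rough illustration of the sync and discovery steps described above, the sketch below builds the git commands as argument lists and filters candidate branches and mentions. The helper names are hypothetical, and the `--hard` flag is an assumption (the commit message only says "reset to origin/main"):

```python
import re
from fnmatch import fnmatch
from typing import List


def sync_commands() -> List[List[str]]:
    """Commands to bring a possibly stale clone up to date with origin/main."""
    return [
        ["git", "fetch", "origin"],
        # Assumption: a hard reset; the source does not name the reset mode.
        ["git", "reset", "--hard", "origin/main"],
    ]


def candidate_fix_branches(branches: List[str]) -> List[str]:
    """Keep only branches that follow the claude/ci-fix-* naming pattern."""
    return [b for b in branches if fnmatch(b, "claude/ci-fix-*")]


def mentions_issue(text: str, issue_number: int) -> bool:
    """True if text contains '#<n>' as a whole number, e.g. 'Refs #12'.

    This catches informal mentions that are not formal linked PRs.
    """
    return re.search(rf"#{issue_number}\b", text) is not None
```

For example, `mentions_issue("Refs #5130", 5130)` is true while `mentions_issue("Refs #51301", 5130)` is not, so a search over comments and PR bodies does not confuse issue numbers that share a prefix.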
Make the automated PRs easy to spot and filter; keep the prefix when updating an existing PR title too.
PR #5133 passed a local `npx prettier --check` but failed CI's prettier check: npx resolved an unpinned `prettier` to a newer local version (3.8.1) that formats blank-lines-before-nested-lists differently than the 3.2.5 that .github/workflows/prettier-lint.yml pins. Pin the pre-PR check commands to prettier@3.2.5 in both CLAUDE.md and the triage prompt, and add a step to the triage prompt to verify the pushed branch's actual CI status rather than trusting a local check alone.
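To make the pinning concrete, here is a small sketch of how the pre-PR check command could be assembled so the version always matches the one CI pins. The helper names are hypothetical, and using `gh pr checks` to read the pushed branch's status is an assumption; the commit message does not say which tool the triage prompt uses:

```python
from typing import List

# Version pinned by .github/workflows/prettier-lint.yml, per the commit message.
PINNED_PRETTIER = "3.2.5"


def prettier_check_command(files: List[str], version: str = PINNED_PRETTIER) -> List[str]:
    """Build an npx command that runs a specific prettier version.

    'npx prettier@<version>' asks npx for that exact package version
    instead of whatever unpinned 'prettier' resolves to locally.
    """
    return ["npx", f"prettier@{version}", "--check", *files]


def ci_status_command(pr_number: int) -> List[str]:
    """Build a command to read the PR's real CI status (assumes the gh CLI)."""
    return ["gh", "pr", "checks", str(pr_number)]
```

A local pass from the first command is still only a hint; the second command reflects what CI actually ran on the pushed branch.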

What
Documents how scheduled-CI failure reporting and triage work, so others can follow, run, or change the process.
- `.github/workflows/ci-failure-triage.md`: the end-to-end flow (scheduled job → `report-ci-failure` composite action → `ci-failure` issue → daily Claude Code triage session → fix PR or comment), the setup constraints (must run in a repo-scoped session; fresh/unscoped sessions can't reach the repo), the conventions the session follows (verify-before-acting, exhaustive class fixes, verify retargets, `Fixes` vs `Refs`, CLA-safe commit identity), and the full triage prompt kept in sync with the scheduled trigger.
- `.github/workflows/README.md`: the Failure notifications section now links to the triage doc and describes the mechanism accurately.
- `README.md`: the CI table's stale `N/A (Jira)` blocking notes changed to `N/A (opens ci-failure issue)`, plus a note pointing to the triage doc.

Follow-up to #5120 (which removed Jira and added the `ci-failure` reporting).

Checks
prettier, markdownlint, and vale (error level) pass on all three files.
🤖 Generated with Claude Code