Description
🏥 CI Failure Investigation - Run #34708
Summary
Integration: Workflow Cache finished running its integration suite but actions/upload-artifact failed with a 403 while finalizing the generated artifact, so the downstream canary_go coverage job could not download the cache results and marked the cache tests as missing.
Failure Details
- Run: 21872492469
- Commit: 41c500a
- Trigger: push
Root Cause Analysis
Integration: Workflow Cache writes test-result-integration-Workflow Cache.json and immediately uploads it, but Azure returned "Failed to FinalizeArtifact: Received non-retryable error: Failed request: (403) Forbidden" while actions/upload-artifact@... was finalizing the blob (log around 2026-02-10T16:06:33.6820072Z). Because the artifact never finished uploading, the canary_go job had no JSON file to read for that test group, so compare-test-coverage.sh treated the four cache tests as never executed.
Failed Jobs and Errors
- Integration: Workflow Cache – the actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 step (finalizing test-result-integration-Workflow Cache) failed with 403 Forbidden.
- canary_go – ./scripts/compare-test-coverage.sh all-tests.txt executed-tests.txt errored with "❌ FAILURE: Found 4 tests that are NOT being executed in CI", listing TestCacheMemoryMultipleIntegration, TestCacheMemoryRestoreOnly, TestCacheMemoryWithThreatDetection, and TestCacheSupport. Those names belong to the Workflow Cache group whose artifact never reached the coverage job.
Investigation Findings
- The cache tests themselves succeeded (go test produced JSON output), but artifact finalization aborted with 403, so the JSON file was never stored.
- canary_go depends on each integration job's artifact to know which tests ran; missing artifacts are interpreted as missing tests, so the coverage comparison script fails fast with the four cache tests as soon as it cannot find the Workflow Cache artifact.
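The failure mode described above can be illustrated with a minimal sketch. The real compare-test-coverage.sh is not shown in this report, so the function below is a hypothetical reconstruction of the idea only: it assumes the comparison amounts to a set difference between all known tests and the tests reported by whichever artifacts were downloaded. The names missing_tests, OTHER_TESTS, and "Other Group" are illustrative assumptions; the four cache test names come from the error output.

```python
# Hypothetical model of the coverage comparison; the actual script may differ.
def missing_tests(all_tests, executed_by_group):
    """Return tests that no downloaded artifact reports as executed."""
    executed = set()
    for tests in executed_by_group.values():
        executed.update(tests)
    return sorted(set(all_tests) - executed)


CACHE_TESTS = [
    "TestCacheMemoryMultipleIntegration",
    "TestCacheMemoryRestoreOnly",
    "TestCacheMemoryWithThreatDetection",
    "TestCacheSupport",
]
OTHER_TESTS = ["TestSomethingElse"]  # hypothetical placeholder test

all_tests = CACHE_TESTS + OTHER_TESTS

# The Workflow Cache artifact never arrived, so its group is simply absent
# from the input, even though the tests themselves ran and passed.
executed_by_group = {"Other Group": OTHER_TESTS}

report = missing_tests(all_tests, executed_by_group)
print(report)
```

Under this model, a test that ran but whose results were never uploaded is indistinguishable from a test that never ran, which matches the misleading error seen in canary_go.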
Recommended Actions
- Re-run the workflow so that the artifact upload can finish successfully and the coverage job can consume test-result-integration-Workflow Cache. If the 403 was transient, the re-run should pass.
- If 403 errors continue to recur, capture the failing upload logs and consider retrying the upload or using a more resilient artifact ingestion strategy (e.g., retry wrapper or custom upload that can troubleshoot 403s).
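The retry wrapper mentioned above could look roughly like the following sketch. It is not how actions/upload-artifact behaves internally (the action reported this 403 as non-retryable); it only shows re-running a whole upload attempt with exponential backoff. UploadError, retry, and flaky_upload are hypothetical names introduced for illustration, and the sleep function is injectable so the example runs without waiting.

```python
import time


class UploadError(Exception):
    """Hypothetical error type standing in for a failed artifact upload."""


def retry(action, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return action()
        except UploadError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last failure
            sleep(base_delay * 2 ** attempt)


calls = []
delays = []


def flaky_upload():
    """Simulated upload that fails twice with a 403-style error, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise UploadError("(403) Forbidden")
    return "uploaded"


# Record the delays instead of actually sleeping.
result = retry(flaky_upload, sleep=delays.append)
print(result, len(calls), delays)
```

If the 403 turns out to be a persistent permission problem rather than a transient one, a wrapper like this only delays the failure, so the final error should still be surfaced unchanged.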
Prevention Strategies
- Detect missing artifacts before running compare-test-coverage.sh and either skip the comparison or surface a clear error so the root cause (missing artifact) is obvious instead of just missing-test names.
- Add retries around upload-artifact steps that feed downstream collectors so transient storage/auth errors don't cascade into coverage failures.
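As a rough sketch of the first strategy, a pre-check could confirm that every expected result file exists before the comparison runs. This assumes the downloaded artifacts land in one flat directory as test-result-integration-<group>.json, following the file name quoted in the root cause analysis; the actual layout in canary_go may differ. find_missing_artifacts and "Other Group" are hypothetical names.

```python
import tempfile
from pathlib import Path


def find_missing_artifacts(artifact_dir, expected_groups):
    """Return expected test groups whose JSON result file is absent or empty."""
    missing = []
    for group in expected_groups:
        # Assumed naming scheme, based on the file name cited in this report.
        path = Path(artifact_dir) / f"test-result-integration-{group}.json"
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(group)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    # Simulate this run: one group uploaded its results, Workflow Cache did not.
    (Path(tmp) / "test-result-integration-Other Group.json").write_text("{}")
    missing = find_missing_artifacts(tmp, ["Workflow Cache", "Other Group"])

if missing:
    print("Missing artifacts for groups:", ", ".join(missing))
```

Failing at this point with the group name would point directly at the upload step instead of listing individual test names.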
AI Team Self-Improvement
When you wire up coverage/metric jobs, explicitly verify that every actions/upload-artifact step succeeded before letting downstream jobs consume its data; treat 403 FinalizeArtifact responses as fatal and re-run the upload or the workflow.
Historical Context
No open [CI Failure Doctor] issue currently matches run #34708. The only open issue with that label (#14809) describes a different timeout during the frontmatter hash test, so this 403 artifact upload + coverage failure is a new occurrence.
AI generated by CI Failure Doctor
To add this workflow in your repository, run
gh aw add githubnext/agentics/workflows/ci-doctor.md@ea350161ad5dcc9624cf510f134c6a9e39a6f94d. See usage guide.
- expires on Feb 11, 2026, 4:19 PM UTC