Skip to content

Harden live-test download flakiness handling#243

Closed
kyle-messier wants to merge 2 commits into
mainfrom
fix/live-flakiness-hardening
Closed

Harden live-test download flakiness handling#243
kyle-messier wants to merge 2 commits into
mainfrom
fix/live-flakiness-hardening

Conversation

@kyle-messier
Copy link
Copy Markdown
Member

Summary

  • harden download_drought(source='spei') to retry across SPEI fallback endpoints when one endpoint preflight/download fails
  • change download_gridmet() and download_terraclimate() first-URL preflight from hard stop to warning, then proceed with direct download attempt
  • add shared live-test transient flakiness helpers and apply them to drought/gridMET/TerraClimate live tests
  • add regression tests for SPEI retry behavior and preflight-warning fallback paths

Validation

  • Rscript -e "devtools::test(filter = 'drought|gridmet|terraclimate')"
  • AMADEUS_LIVE_TESTS=true Rscript -e "devtools::test(filter = 'drought-live|gridmet-live|terraclimate-live')"
  • Rscript -e "lintr::lint_package()"

Retry SPEI download across fallback endpoints when preflight or single-endpoint attempts fail. Relax gridMET/TerraClimate first-URL preflight from hard stop to warning and proceed with direct download attempts. Add shared live-test transient skip helpers and apply them to drought/gridMET/TerraClimate live tests with new regression coverage.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 26, 2026 22:54
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Hardens flaky behavior of network-dependent download functions and their live tests. download_drought(source = "spei") now iterates through SPEI endpoint candidates and retries on any preflight failure or failed > 0 result, while download_gridmet()/download_terraclimate() downgrade their first-URL preflight failure from a hard stop to a warning before attempting the actual download. Shared live-test helpers detect transient/upstream-side conditions and skip such cases instead of failing.

Changes:

  • Convert gridMET/TerraClimate preflight 404 stop() into a warning + direct-download fallback, with mocked regression tests.
  • Rework SPEI download to try all candidate URLs (in preferred order) on both preflight failure and failed > 0 results, and update the terminal error message; add a mocked retry test.
  • Add is_transient_live_issue / skip_if_transient_live_issue helpers and apply per-downloader wrappers to drought/gridMET/TerraClimate live tests.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
R/download.R gridMET/TerraClimate preflight downgraded to warning; SPEI now iterates candidate URLs and inspects success/failed; updated error message.
tests/testthat/helper-skips.R New transient-issue detector and skip helper for live tests.
tests/testthat/test-drought.R New mocked test verifying SPEI retries the next endpoint after a failed download result.
tests/testthat/test-drought-live.R Adds expect_live_download_files wrapper that skips on transient errors/warnings or zero-success results.
tests/testthat/test-gridmet.R New mocked test for warning-and-continue path on preflight failure.
tests/testthat/test-gridmet-live.R Wraps live downloads in expect_gridmet_live_download to skip on transient errors.
tests/testthat/test-terraclimate.R New mocked test for warning-and-continue path on preflight failure.
tests/testthat/test-terraclimate-live.R Wraps live downloads in expect_terraclimate_live_download to skip on transient errors.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +5 to +16
expect_gridmet_live_download <- function(expr, dir) {
tryCatch(
expr,
error = function(e) {
skip_if_transient_live_issue(e)
stop(e)
}
)
files <- list.files(dir, recursive = TRUE, full.names = TRUE)
testthat::expect_gt(length(files), 0)
testthat::expect_gt(sum(file.info(files)$size > 0), 0)
}
Comment on lines +5 to +16
expect_terraclimate_live_download <- function(expr, dir) {
tryCatch(
expr,
error = function(e) {
skip_if_transient_live_issue(e)
stop(e)
}
)
files <- list.files(dir, recursive = TRUE, full.names = TRUE)
testthat::expect_gt(length(files), 0)
testthat::expect_gt(sum(file.info(files)$size > 0), 0)
}
Avoid interactive NASA token prompt during covr package_coverage runs by skipping the setup_nasa_token interactive branch test when AMADEUS_COVERAGE_CI=true.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
kyle-messier added a commit that referenced this pull request May 27, 2026
Switching covr to type='tests' surfaced a pre-existing test failure
that does not occur under type='all' (likely a different covr harness).
Run 742 on main with type='all' completed in 5min, so the 6h hang in
PR #243 was sporadic, not deterministic. The 45-minute job timeout
added in this PR is sufficient to bound any future hang.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants