Skip to content

Migrate#211

Merged
kyle-messier merged 299 commits into
mainfrom
migrate
May 20, 2026
Merged

Migrate#211
kyle-messier merged 299 commits into
mainfrom
migrate

Conversation

@kyle-messier
Copy link
Copy Markdown
Member

@kyle-messier kyle-messier commented Feb 27, 2026

download_* functions are being migrated fully to httr2 package

kyle-messier and others added 14 commits March 7, 2026 11:49
- Document narr variable reference addition (closes #194)
- Document narr variable download test coverage
- Document download_data edgar dispatch fix

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add two @note entries to download_narr():
- Points to the PSL NARR Dailies server as the data source
- Explains that the 88 supported variables are the complete set
  available as individual NetCDF files on PSL; additional variables
  in the NARR archive (cloud water, ice mixing ratio, friction
  velocity, momentum fluxes, static land/soil fields) exist only
  in raw merged GRIB files and cannot be downloaded with this
  function

Remove the nolint-wrapped PDF href from @param (PDF uses wgrib
uppercase abbreviations that differ from PSL NetCDF filenames and
caused user confusion per issue #194).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…oad functions

- Add rate_limit=2 parameter to download_ecoregion, download_nlcd,
  download_groads, download_population, and download_koppen_geiger,
  replacing the previously hardcoded literal value passed to
  download_run_method()
- Add rate_limit=2 and nasa_earth_data_token=NULL as explicit parameters
  to the download_data() wrapper with Roxygen @param documentation
- Refactor download_data() body to use do.call() so rate_limit is always
  forwarded; nasa_earth_data_token is conditionally forwarded only to
  functions whose formals() include it (geos, merra2, modis), preventing
  unexpected-argument errors on non-NASA datasets
- Add 56 new unit tests covering: formals checks on all 20 download
  functions for rate_limit; formals check on wrapper for both new params;
  token routing logic; rate_limit pass-through via wrapper

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…arams

Run devtools::document() to sync Rd files with updated Roxygen2 source.
Six man pages were stale after the previous parameter additions:
- download_data.Rd: add rate_limit and nasa_earth_data_token
- download_ecoregion.Rd: add rate_limit
- download_nlcd.Rd: add rate_limit
- download_groads.Rd: add rate_limit
- download_population.Rd: add rate_limit
- download_koppen_geiger.Rd: add rate_limit

Fixes R CMD check warning: 'Argument names in code not in docs: rate_limit'

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The CMR URL filter only matched '.hdf' extensions, causing VNP46A2
(VIIRS Black Marble) granules to be silently dropped and triggering
the 'No granules found' error. VNP46A2 files use HDF5 (.h5) format,
so extend the regex to match both .hdf and .h5.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…calar year URLs

- download_run_method: add retry_on_failure=TRUE for transport-level errors
  (SSL drops, connection resets) and http_version param for server
  compatibility; exposed via new http_version= argument
- download_nlcd: pass http_version=2L to force HTTP/1.1, avoiding nghttp2
  PROTOCOL_ERROR from www.mrlc.gov
- download_nei: fix URL construction for scalar year input; previously when
  called with year=2017 alone (as dispatched by targets pattern=map()),
  both 2017 and 2020 URL suffixes were appended to the 2017 base URL giving
  a 404; replace with named-vector lookup via vapply so each year maps to
  its correct URL independently; also fix sprintf format from %d to %s
- tests: add unit tests for download_run_method http_version/retry_on_failure,
  download_nlcd HTTP/1.1 passthrough, and download_nei scalar/vector year
  URL construction and unknown-year error

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
gROADS data is hosted on data.earthdata.nasa.gov which requires Bearer
token auth. download_groads previously hardcoded token=NULL, causing
HTTP 401 on every download attempt (silent failure: targets marked job
as complete with empty output directory).

- Add nasa_earth_data_token param with get_token() retrieval from
  NASA_EARTHDATA_TOKEN env var (matching pattern used by download_geos,
  download_merra2, download_modis)
- Pass token to download_run_method
- Update @note and @param in Roxygen docs
- Add 4 unit tests: param exists, token passed through, env-var lookup,
  and existing file skip path

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
check_url_status made a single httr2 HEAD request with no retry logic.
Any transient SSL drop or connection reset caused it to return FALSE,
which downstream callers (download_aqs, download_merra2, etc.) treat
as an invalid URL and raise an error -- even when the URL is perfectly
valid (confirmed HTTP 200 via curl).

- Add max_tries=3L parameter and req_retry(retry_on_failure=TRUE) to
  the HEAD request so transient transport errors are retried before
  concluding the URL is bad
- Add 2 unit tests verifying the new parameter and retry_on_failure flag

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… mismatch

New max_tries parameter added in prior commit was missing from Roxygen
docs, causing 'Codoc mismatches' WARNING in R CMD check. Add @param
and @importFrom httr2 req_retry.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
download_groads now requires nasa_earth_data_token for NASA EarthData
authentication. Update the test that checks which functions accept this
parameter to reflect the new groads requirement.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ownload_run_method

- download_modis CMR granule query: add req_retry(retry_on_failure=TRUE,
  max_tries=3L) + req_timeout(60) so transient DNS timeouts do not
  permanently fail the entire MODIS download

- download_run_method: cap exponential backoff at 30 s (was httr2 default
  60 s) and add req_options(connecttimeout=30L) so each connection
  attempt times out in 30 s rather than hanging; prevents cascading
  slowdowns when DNS resolution is intermittently unreliable

- tests: add connecttimeout assertion and CMR body inspection test

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add nasa_earth_data_token param (default NULL)
- Call get_token() to fetch from env if not provided
- Pass token to download_run_method instead of hardcoded NULL
- Move download_population from non_nasa_fns to nasa_fns in tests
- Regenerated man/download_population.Rd

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Wrap CMR query in tryCatch with actionable error pointing to
  NASA EarthData status page
- Add req_options(connecttimeout=30L) for fast DNS failure detection
- Increase max_tries 3->5 and req_timeout 60->120 for slow networks
- Add live CMR tests (skip_if_offline) for MOD11A1 and MCD19A2 that
  verify granule URLs are returned for a known date/bbox
- Update body test: check connecttimeout, tryCatch, and that
  timeout is not the old 60s value

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pipe continuations inside tryCatch must be indented 6 spaces
per lintr indentation_linter rules.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR completes the migration of download_* functionality to httr2, updates documentation/vignettes to match the new authentication + download flow, and substantially refactors/expands the test suite to align with the new “URL discovery + httr2 download runner” pattern.

Changes:

  • Update vignettes and Rd docs to remove legacy wget/curl command-file workflow and document NASA token setup via setup_nasa_token() / get_token().
  • Refactor many testthat tests to assert returned URL lists, add deprecation-warning coverage, and add mock-download coverage for hash branches.
  • Add agent prompt files + update repo tooling/config (pkgdown nav, lintr config, CI env vars, build ignores).

Reviewed changes

Copilot reviewed 86 out of 89 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
vignettes/terraclimate_workflow.Rmd Removes legacy args from examples; keeps workflow aligned with new download behavior.
vignettes/protected_datasets.Rmd Rewrites NASA auth guidance around tokens + setup_nasa_token().
vignettes/narr_workflow.Rmd Removes legacy download/remove_command args from examples.
vignettes/modis_workflow.Rmd Updates env var name + removes legacy args from examples/output.
vignettes/gridmet_workflow.Rmd Removes legacy download/remove_command args from examples.
vignettes/epa_download.Rmd Reworks vignette into download_data()-first workflow and modernizes chunk options.
tests/testthat/test-tri.R Updates TRI tests to assert URL discovery + adds mock download/hash + deprecation coverage.
tests/testthat/test-terraclimate.R Updates TerraClimate tests to assert URL discovery + adds mock download/hash + deprecation coverage.
tests/testthat/test-prism.R Refactors PRISM tests for URL discovery + adds mock download branches.
tests/testthat/test-koppen-geiger.R Simplifies download tests + adds deprecation warning + mock download branches.
tests/testthat/test-huc.R Refactors HUC tests for URL discovery + adds deprecation + mock download branches.
tests/testthat/test-hms.R Refactors HMS tests, improves live-test skipping, and adds additional branch coverage.
tests/testthat/test-groads.R Refactors gROADS tests and adds NASA token plumbing coverage.
tests/testthat/test-gridmet.R Refactors gridMET tests and adds deprecation + mock download branches.
tests/testthat/test-gmted.R Heavily refactors GMTED tests; adds deprecation + mock download branches and new URL parsing path.
tests/testthat/test-geos.R Refactors GEOS tests to use NASA token env var + adds mock download/hash coverage.
tests/testthat/test-ecoregion.R Expands ecoregion tests (URL discovery, structure, live, integration, branches).
tests/testthat/test-cropscape.R Refactors CropScape tests and adds deprecation + mock download/hash coverage.
tests/testthat/NA Empty file added (likely accidental placeholder).
man/test.Rd Removes internal helper doc.
man/setup_nasa_token.Rd Adds docs for token setup helper.
man/interactive.Rd Removes internal helper doc.
man/get_token.Rd Adds docs for token resolution helper.
man/download_tri.Rd Updates TRI download docs for new download defaults/arguments.
man/download_terraclimate.Rd Updates TerraClimate download docs for new defaults/arguments.
man/download_run_method.Rd Adds docs for the httr2-based download runner.
man/download_run.Rd Marks legacy download runner as deprecated in docs.
man/download_prism.Rd Updates PRISM docs (new unzip/remove_zip args + httr2 settings).
man/download_population.Rd Updates population docs to include token handling + new args.
man/download_nlcd.Rd Updates NLCD docs for new defaults and argument text.
man/download_nei.Rd Updates NEI docs for new defaults and argument text.
man/download_narr.Rd Expands NARR documentation (including full variable reference section).
man/download_modis.Rd Updates MODIS docs to document token sources + httr2-related args.
man/download_merra2.Rd Updates MERRA-2 docs to include token arg + httr2 args.
man/download_koppen_geiger.Rd Updates Köppen-Geiger docs for new defaults and argument text.
man/download_huc.Rd Updates HUC docs for new defaults and argument text.
man/download_hms.Rd Updates HMS docs for new defaults and argument text.
man/download_groads.Rd Updates gROADS docs to include token arg + httr2 args.
man/download_gridmet.Rd Updates gridMET docs for new defaults and argument text.
man/download_gmted.Rd Updates GMTED docs for new defaults and argument text.
man/download_geos.Rd Updates GEOS docs for token arg + httr2 args.
man/download_edgar.Rd Updates EDGAR docs for new defaults and argument text.
man/download_ecoregion.Rd Updates ecoregion docs for new defaults and argument text.
man/download_data.Rd Updates wrapper docs to include NASA token + rate limiting.
man/download_cropscape.Rd Updates CropScape docs for new defaults and argument text.
man/download_aqs.Rd Updates AQS docs for new defaults and argument text.
man/cov.Rd Removes internal helper doc.
man/check_urls.Rd Minor formatting update.
man/check_url_status.Rd Documents retry parameter.
man/calculate_modis_daily.Rd Doc line wrapping changes.
man/calculate_modis.Rd Doc line wrapping changes.
agents/test-agent.yaml Adds testing specialist agent metadata.
agents/test-agent.md Adds testing specialist agent prompt.
agents/process-agent.yaml Adds process-tier specialist agent metadata.
agents/process-agent.md Adds process-tier specialist agent prompt.
agents/download-agent.yaml Adds download-tier specialist agent metadata.
agents/download-agent.md Adds download-tier specialist agent prompt.
agents/calculate-agent.yaml Adds calculate-tier specialist agent metadata.
agents/calculate-agent.md Adds calculate-tier specialist agent prompt.
agents/README.md Documents agent files and intended usage.
agent.md Adds repository-wide agent guide.
_pkgdown.yml Updates navbar label and includes setup_nasa_token in reference index.
R/process.R Improves NLCD file discovery and aux.xml handling; adjusts HMS missing-data logic.
R/ignore.R Refactors long URL string assembly to satisfy line length rules.
R/helpers.R Removes internal container helper functions.
R/calculate_covariates_auxiliary.R Minor doc formatting updates.
R/calculate_covariates.R Adjusts calculate_nei geometry-return logic; minor doc formatting updates.
NEWS.md Adds 1.3.3 release notes for httr2 migration.
NAMESPACE Exports new helpers and updates httr2 imports.
DESCRIPTION Bumps version and updates Imports list.
AGENTS.md Adds Copilot-oriented repository instructions.
.lintr Tightens line length limit and changes exclusions.
.gitignore Adds ignores for tutorials, migration guide, and testthat problem artifacts.
.github/workflows/test-coverage.R Stops treating warnings as errors during coverage run.
.github/workflows/test-coverage-local.yaml Adds NASA_EARTHDATA_TOKEN env var mapping in CI.
.github/workflows/check-standard.yaml Adds NASA_EARTHDATA_TOKEN env var mapping in CI.
.Rbuildignore Excludes agent prompt directories/files from build.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread .lintr
Comment thread R/calculate_covariates_auxiliary.R Outdated
Comment thread R/calculate_covariates.R Outdated
Comment thread NEWS.md Outdated
Comment thread vignettes/epa_download.Rmd Outdated
Comment on lines +39 to +40
* `hash = TRUE`: generate unique SHA-1 hash for the downloaded files.

Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vignette claims hash is a SHA-1 hash, but download_hash() currently computes a combined md5sum (and uses an external md5sum binary). Please align the wording with the implementation (or update the implementation).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot open a new pull request to apply changes based on this feedback

Comment thread R/process.R
Comment on lines +873 to 876
full.names = TRUE,
recursive = TRUE, # ADD THIS - files may be in subdirectories
ignore.case = TRUE # ADD THIS - for robustness
)
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove development-time annotations like “ADD THIS” / “FIXED” from comments before merging; they add noise and make it harder to understand the intent of the logic long-term.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot open a new pull request to apply changes based on this feedback

Comment thread vignettes/terraclimate_workflow.Rmd Outdated
Comment on lines 36 to 37
* `hash = TRUE`: generate unique SHA-1 hash for the downloaded files.

Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bullet says hash generates a “SHA-1” hash, but download_hash() currently computes a combined md5sum (and uses an external md5sum binary). Please update the vignette text (or update the hashing implementation) so users aren’t misled.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot open a new pull request to apply changes based on this feedback

Comment thread vignettes/narr_workflow.Rmd Outdated
Comment on lines 35 to 36
* `hash = TRUE`: generate unique SHA-1 hash for the downloaded files.

Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bullet says hash generates a “SHA-1” hash, but download_hash() currently computes a combined md5sum (and uses an external md5sum binary). Please update the vignette text (or update the hashing implementation) so users aren’t misled.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot open a new pull request to apply changes based on this feedback

Comment thread vignettes/gridmet_workflow.Rmd Outdated
Comment on lines 36 to 37
* `hash = TRUE`: generate unique SHA-1 hash for the downloaded files.

Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bullet says hash generates a “SHA-1” hash, but download_hash() currently computes a combined md5sum (and uses an external md5sum binary). Please update the vignette text (or update the hashing implementation) so users aren’t misled.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot open a new pull request to apply changes based on this feedback

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot open a new pull request to apply changes based on this feedback

Comment thread R/process.R
kyle-messier and others added 5 commits March 11, 2026 16:09
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 11, 2026

@kyle-messier I've opened a new pull request, #212, to work on those changes. Once the pull request is ready, I'll request review from you.

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 11, 2026

@kyle-messier I've opened a new pull request, #213, to work on those changes. Once the pull request is ready, I'll request review from you.

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 11, 2026

@kyle-messier I've opened a new pull request, #214, to work on those changes. Once the pull request is ready, I'll request review from you.

kyle-messier and others added 29 commits May 14, 2026 13:41
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The new daily_agg tests for GOES, MERRA2, and GEOS build NetCDF
fixtures via terra::writeCDF, which requires the ncdf4 package.
Without it, testthat skipped 10 tests covering the daily_agg
branches, dropping patch coverage from ~99.3% to 98.69% and
failing the coverage gate.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Bump runner to ubuntu-24.04
- Add workflow_dispatch filter input for targeted reruns
- Add ncdf4, withr, furrr, mirai, targets to extra-packages
- Add issues: write permission
- Open or update a tracking issue (label: live-test-failure) on failure
  using JasonEtco/create-an-issue@v2 + issue template
- Repo watchers receive an email when the issue is created

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…polygon example

- Fix stray processed_data refs by using processed_mean
- Add 'Choosing a daily aggregation' subsection (mean vs sum)
- Add polygon (areal) extraction example using a synthetic sf polygon
- Cross-link the Computational Considerations vignette

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ions vignette, pkgdown dev site

- README: add missing USGS HUC row to dataset table
- README: add 'NASA Earthdata authentication with setup_nasa_token()' section
- README: add 'Computational considerations' section linking to new vignette
- New vignettes/computational_considerations.Rmd with purrr/furrr/mirai/targets examples
- _pkgdown.yml: register new vignette in articles + navbar; enable development: mode: auto
- .github/workflows/pkgdown.yaml: trigger on migrate/dev branches
- DESCRIPTION: add furrr, mirai, targets to Suggests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
~3-5 inputs per dataset chosen to exercise each major branch of the
corresponding download_* function. Each parameter combination gets its
own testthat::test_that block with a descriptive name so failures
identify the exact input under test.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…se layer time metadata safely, and no longer crash in calc_message()/calc_time().
@kyle-messier kyle-messier merged commit eb186c7 into main May 20, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants