Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the hyperparameter grid generation strategy (moving to space-filling grids) and adds a broad set of unit tests across core helper modules to improve correctness and regression detection across the forecasting workflow.
Changes:
- Switches ensemble hyperparameter grid generation from Latin hypercube to space-filling grids.
- Adds a space-filling grid helper in
models.R. - Introduces comprehensive
testthatcoverage for utility, I/O, project/run info, parallel utilities, model workflows, feature selection, hierarchy helpers, and agent helper functions.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| R/ensemble_models.R | Uses dials::grid_space_filling() for ensemble hyperparameter grid creation. |
| R/models.R | Replaces the internal Latin-hypercube grid helper with a space-filling grid helper. |
| tests/testthat/test-utility.R | Adds unit tests for get_timestamp(). |
| tests/testthat/test-run_info.R | Adds tests for set_run_info() / get_run_info() defaults and validation. |
| tests/testthat/test-read_write_data.R | Adds tests for hashing, read/write helpers, and file listing utilities. |
| tests/testthat/test-project_info.R | Adds tests for set_project_info() defaults, validation, and overwrite behavior. |
| tests/testthat/test-prep_models_helpers.R | Adds tests for prep-model helper functions (spacing/frequency/seasonality/backtest periods). |
| tests/testthat/test-prep_data_helpers.R | Adds tests for prep-data helper functions (frequency, Fourier/lag/rolling/recipes/date regex, transforms). |
| tests/testthat/test-parallel_util.R | Adds tests for core parallel setup/teardown helpers. |
| tests/testthat/test-multistep_models.R | Adds tests for multistep parsnip model specs (print/update/translate). |
| tests/testthat/test-multistep_helper.R | Adds tests for multistep horizon helper functions (xreg checks, lag selection, feature selection). |
| tests/testthat/test-models.R | Adds extensive tests for model listing, recipes/workflows, resampling, tuning helpers, and train-model utilities. |
| tests/testthat/test-input_checks.R | Adds tests for input validation helpers and error messaging. |
| tests/testthat/test-hierarchical.R | Extends hierarchy tests (summaries, HTS objects/nodes, adjust_df behavior). |
| tests/testthat/test-final_models_helpers.R | Adds tests for prediction intervals, weekly-to-daily conversion, and output cleanups. |
| tests/testthat/test-feature_selection.R | Adds tests for correlation and variable-importance feature selection helpers. |
| tests/testthat/test-align_types_prep_data.R | Adds tests for type alignment and several prep-data cleaning/stationarity helpers. |
| tests/testthat/test-agent_summarize_helpers.R | Adds tests for agent model-summarization helper utilities. |
| tests/testthat/test-agent_helpers.R | Adds tests for agent iteration helpers (type coercion, param matching, JSON extraction, NULL handling). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| test_that("extract_json_object parses simple JSON", { | ||
| raw <- '{"key": "value", "num": 42}' | ||
| result <- extract_json_object(raw) | ||
| expect_equal(result$key, "value") | ||
| expect_equal(result$num, 42) | ||
| }) | ||
|
|
||
| test_that("extract_json_object strips code fences", { | ||
| raw <- '```json\n{"key": "value"}\n```' | ||
| result <- extract_json_object(raw) | ||
| expect_equal(result$key, "value") | ||
| }) | ||
|
|
||
| test_that("extract_json_object handles surrounding text", { | ||
| raw <- 'Here is the JSON: {"answer": "yes"} and some more text.' | ||
| result <- extract_json_object(raw) | ||
| expect_equal(result$answer, "yes") | ||
| }) |
There was a problem hiding this comment.
These extract_json_object() tests implicitly require jsonlite to be installed (the implementation calls jsonlite::fromJSON()), but jsonlite is only in Suggests. To avoid check/CI failures when Suggests aren’t installed, add skip_if_not_installed("jsonlite") for this block (or move jsonlite to Imports if it’s required at runtime).
| test_that("par_start with local_machine creates cluster", { | ||
| run_info <- list( | ||
| storage_object = NULL, | ||
| path = tempdir(), | ||
| data_output = "csv", | ||
| object_output = "rds" | ||
| ) | ||
|
|
||
| par_info <- par_start( | ||
| run_info = run_info, | ||
| parallel_processing = "local_machine", | ||
| num_cores = 2, | ||
| task_length = 5 | ||
| ) | ||
|
|
||
| expect_false(is.null(par_info$cl)) | ||
| expect_true(is.function(par_info$foreach_operator)) | ||
|
|
||
| # clean up | ||
| par_end(par_info$cl) | ||
| }) |
There was a problem hiding this comment.
par_start(parallel_processing = "local_machine") will call parallel::makeCluster(min(cores, task_length)) where cores comes from detectCores() - 1. On single-core environments this becomes 0 and makeCluster(0) errors, making this test (and the code path) unreliable. Consider guarding the test with skip_if(parallel::detectCores() <= 1) or updating get_cores()/par_start() to ensure at least 1 worker.
tests/testthat/test-utility.R
Outdated
| # should be within 5 seconds of now | ||
|
|
||
| expect_true(abs(difftime(ts, now_utc, units = "secs")) < 5) |
There was a problem hiding this comment.
This assertion compares get_timestamp() to the current system time with a 5-second window. On slow or heavily loaded CI runners, this can be flaky. Consider freezing time for the test (e.g., via withr) or widening the tolerance to reduce intermittent failures.
| # should be within 5 seconds of now | |
| expect_true(abs(difftime(ts, now_utc, units = "secs")) < 5) | |
| # should be within 60 seconds of now to avoid flaky failures on slow CI | |
| expect_true(abs(difftime(ts, now_utc, units = "secs")) < 60) |
| test_that("target_corr_fn returns correlated features", { | ||
| set.seed(123) | ||
| n <- 50 | ||
| x1 <- rnorm(n) | ||
| data <- tibble::tibble( | ||
| Combo = rep("A", n), | ||
| Date = seq.Date(as.Date("2020-01-01"), by = "month", length.out = n), | ||
| Target = x1 * 2 + rnorm(n, sd = 0.1), | ||
| Feature_A = x1, | ||
| Feature_B = rnorm(n) | ||
| ) | ||
|
|
||
| result <- target_corr_fn(data, threshold = 0.5) | ||
|
|
||
| expect_true("term" %in% colnames(result)) | ||
| expect_true("Target" %in% colnames(result)) | ||
| expect_true("Feature_A" %in% result$term) | ||
| }) |
There was a problem hiding this comment.
target_corr_fn() depends on the corrr package, which is listed under Suggests. If Suggests aren’t installed in the test environment, these tests will fail. Consider adding skip_if_not_installed("corrr") for the target_corr_fn test blocks (or moving corrr to Imports if it’s required at runtime).
| test_that("vip_rf_fn produces variable importance scores", { | ||
| set.seed(42) | ||
| data <- tibble::tibble( | ||
| Date = seq(as.Date("2020-01-01"), by = "day", length.out = 100), | ||
| Target = rnorm(100, mean = 50, sd = 10), | ||
| x1 = rnorm(100), | ||
| x2 = rnorm(100), | ||
| x3 = rnorm(100) | ||
| ) | ||
| data$Target <- data$Target + 2 * data$x1 | ||
|
|
||
| result <- vip_rf_fn(data, seed = 42) | ||
| expect_s3_class(result, "tbl_df") | ||
| expect_true("Variable" %in% names(result)) | ||
| expect_true("Importance" %in% names(result)) | ||
| expect_true(all(result$Importance > 0)) | ||
| }) |
There was a problem hiding this comment.
vip_rf_fn() fits a ranger model (set_engine("ranger")) and calls vip::vi(), but ranger is not listed in this package’s Imports/Suggests in DESCRIPTION. This will make the test suite fail in environments where ranger isn’t installed. Either add ranger to Suggests/Imports, or skip these tests when ranger (and vip) aren’t installed (e.g., skip_if_not_installed()).
|
@copilot open a new pull request to apply changes based on the comments in this thread |
Co-authored-by: mitokic <16709433+mitokic@users.noreply.github.com>
Fix flaky/fragile tests: add skip guards for optional dependencies and widen timestamp tolerance
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| # -- utility.R tests -- | ||
|
|
||
| test_that("get_timestamp returns a POSIXct object in UTC", { | ||
| ts <- get_timestamp() | ||
|
|
||
| expect_s3_class(ts, "POSIXct") | ||
| expect_equal(attr(ts, "tzone"), "UTC") | ||
| }) | ||
|
|
||
| test_that("get_timestamp returns a timestamp close to current time", { | ||
| ts <- get_timestamp() | ||
| now_utc <- as.POSIXct(format(Sys.time(), tz = "UTC"), tz = "UTC") | ||
|
|
||
| # should be within 5 seconds of now | ||
|
|
||
| expect_true(abs(difftime(ts, now_utc, units = "secs")) < 5) | ||
| }) | ||
|
|
||
| test_that("get_timestamp format is YYYYMMDDTHHMMSSZ", { | ||
| ts <- get_timestamp() | ||
| formatted <- format(ts, "%Y%m%dT%H%M%SZ") | ||
|
|
||
| # should be parseable back | ||
| parsed <- as.POSIXct(formatted, format = "%Y%m%dT%H%M%SZ", tz = "UTC") | ||
| expect_false(is.na(parsed)) | ||
| }) |
There was a problem hiding this comment.
The get_timestamp tests on lines 193-216 are duplicates of tests already in test-utility.R. These should be removed from this file.
Additionally, note that there's an inconsistency: test-utility.R uses a 60-second tolerance (line 15) which is more appropriate for CI environments, while this file uses a 5-second tolerance (line 206) which may cause flaky test failures on slower systems.
| # -- utility.R tests -- | |
| test_that("get_timestamp returns a POSIXct object in UTC", { | |
| ts <- get_timestamp() | |
| expect_s3_class(ts, "POSIXct") | |
| expect_equal(attr(ts, "tzone"), "UTC") | |
| }) | |
| test_that("get_timestamp returns a timestamp close to current time", { | |
| ts <- get_timestamp() | |
| now_utc <- as.POSIXct(format(Sys.time(), tz = "UTC"), tz = "UTC") | |
| # should be within 5 seconds of now | |
| expect_true(abs(difftime(ts, now_utc, units = "secs")) < 5) | |
| }) | |
| test_that("get_timestamp format is YYYYMMDDTHHMMSSZ", { | |
| ts <- get_timestamp() | |
| formatted <- format(ts, "%Y%m%dT%H%M%SZ") | |
| # should be parseable back | |
| parsed <- as.POSIXct(formatted, format = "%Y%m%dT%H%M%SZ", tz = "UTC") | |
| expect_false(is.na(parsed)) | |
| }) |
This pull request introduces improvements to hyperparameter grid generation and adds comprehensive unit tests for helper functions across multiple files. The most significant changes are the switch from Latin hypercube to space-filling grids for hyperparameter tuning, and the addition of robust tests for data type alignment, agent helper functions, and summarization helpers. These changes enhance both the reliability and maintainability of the codebase.
Hyperparameter grid generation improvements:
dials::grid_latin_hypercubewithdials::grid_space_fillinginensemble_modelsand grid helper functions to generate more diverse and representative hyperparameter grids. [1] [2]Testing for agent helper functions:
test-agent_helpers.Rfor functions such ascollapse_or_na,apply_column_types,does_param_set_exist,extract_json_object, andnull_converter, covering various edge cases and type coercion scenarios.Testing for summarization helpers:
test-agent_summarize_helpers.Rfor summarization-related helper functions, including.chr1,.kv,.unquote,.signif_chr,.extract_predictors,.extract_outcomes,.extract_recipe_steps,.infer_period_from_dates,.find_obj, and.assemble_output, ensuring correct handling of inputs and outputs.Testing for data preparation and alignment:
test-align_types_prep_data.Rforalign_types,get_xregs_future_values_tbl,clean_outliers_missing_values, andmake_stationary, verifying correct data type alignment, handling of external regressors, missing values, outliers, and stationarity transformations.