Expose n_resamples on EvalBase to make bootstrap CI cost configurable #556
Merged
Add n_resamples: int = Field(default=9999) to EvalBase and thread it into both bootstrap() calls in stat_and_bootstrap. The default preserves existing production behaviour (the scipy default is 9999).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ClassificationMetrics and RegressionMetrics are instantiated with n_resamples=100 in tests that only assert metric values, not CI precision. This cuts test_classification_metrics from ~18s to ~0.2s and test_regression_metrics from ~4s to ~0.04s.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
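Why a small n_resamples is safe in value-only tests can be sketched with a standard-library percentile bootstrap. This is a simplified stand-in, not scipy.stats.bootstrap and not the project's code; the helper name percentile_bootstrap and the toy values are hypothetical. The point estimate is computed once from the original data, so it does not depend on the resample count; only the number of statistic evaluations (and the CI precision) does.

```python
import random
from statistics import mean


def percentile_bootstrap(data, statistic, n_resamples, alpha=0.05, seed=0):
    """Return (point_estimate, (ci_low, ci_high), n_statistic_calls)."""
    rng = random.Random(seed)
    point = statistic(data)  # computed once, independent of n_resamples
    calls = 1
    resampled = []
    for _ in range(n_resamples):
        sample = rng.choices(data, k=len(data))  # draw with replacement
        resampled.append(statistic(sample))
        calls += 1
    resampled.sort()
    # Simple index-based percentiles; real libraries interpolate more carefully.
    lo_idx = int((alpha / 2) * (n_resamples - 1))
    hi_idx = int((1 - alpha / 2) * (n_resamples - 1))
    return point, (resampled[lo_idx], resampled[hi_idx]), calls


toy = [0.0, 1.0, 1.0, 0.0]  # hypothetical 4-element toy sample
point_small, ci_small, calls_small = percentile_bootstrap(toy, mean, 100)
point_large, ci_large, calls_large = percentile_bootstrap(toy, mean, 9999)
```

Both calls return the same point estimate, while the second evaluates the statistic roughly 100 times more often. That is the cost the tests avoid by passing n_resamples=100.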
Codecov Report: ✅ All modified and coverable lines are covered by tests.
hmacdope
approved these changes
May 21, 2026
Description
EvalBase.stat_and_bootstrap called scipy.stats.bootstrap with a hardcoded default of n_resamples=9999, causing ~60,000 bootstrap iterations on toy 4-element test data across 6 classification metrics. This results in test_classification_metrics taking ~18s, with no way to reduce it from outside the class.
Closes #555.
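A minimal sketch of the pattern this PR describes, assuming pydantic and scipy are installed. EvalSketch and its mean_ci method are hypothetical stand-ins, not the actual EvalBase or stat_and_bootstrap code; only the n_resamples field definition is taken from this PR, and the percentile method is an illustrative choice.

```python
import numpy as np
from pydantic import BaseModel, Field
from scipy.stats import bootstrap


class EvalSketch(BaseModel):
    """Hypothetical stand-in for EvalBase; only n_resamples mirrors the PR."""

    # Same value as the scipy.stats.bootstrap default, so callers that do not
    # set it see no change in behavior.
    n_resamples: int = Field(default=9999, ge=1)

    def mean_ci(self, values, confidence_level=0.95):
        """Return (mean, ci_low, ci_high) for a 1-D sample."""
        data = np.asarray(values, dtype=float)
        res = bootstrap(
            (data,),  # scipy expects a sequence of samples
            np.mean,
            n_resamples=self.n_resamples,  # configurable instead of fixed
            confidence_level=confidence_level,
            method="percentile",  # illustrative choice, not taken from the PR
        )
        ci = res.confidence_interval
        return float(data.mean()), float(ci.low), float(ci.high)


values = [0.1, 0.4, 0.35, 0.8, 0.6, 0.2, 0.7, 0.5]  # hypothetical toy data
fast = EvalSketch(n_resamples=100)  # cheap CI for tests that check values only
mean_value, ci_low, ci_high = fast.mean_ci(values)
```

With ge=1 on the field, constructing the model with n_resamples=0 fails validation up front instead of surfacing as an error inside the bootstrap call.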
Changes
openadmet/models/eval/eval_base.py
- Add Field to pydantic imports
- Add n_resamples: int = Field(default=9999, ge=1) to EvalBase; default preserves existing production behavior
- Pass n_resamples=self.n_resamples to both bootstrap() calls in stat_and_bootstrap
openadmet/models/tests/unit/eval/test_eval.py
- Use ClassificationMetrics(n_resamples=100) and RegressionMetrics(n_resamples=100) in metric tests (CI precision is not under test there)
Performance
- test_classification_metrics: ~18s to ~0.2s
- test_regression_metrics: ~4s to ~0.04s
Quality Assurance & AI Policy
To maintain project quality and respect maintainer bandwidth, please confirm the following:
Status
Developers Certificate of Origin
Note to Contributors: We reserve the right to close PRs without review if they appear to lack human validation or do not meet the quality standards described in our CONTRIBUTING.md.