Context
PR #753 added bootstrap-based confidence intervals to the 1-in-X return value analysis in `climakitae/new_core/processors/metric_calc.py`. As currently implemented, the CI path invokes `xr.apply_ufunc` twice per simulation batch (and per spatial chunk in the batched path): once for the point-estimate distribution fit, and a second time for the bootstrap CIs.
This works correctly but does redundant orchestration work: coord alignment, dtype promotion, and the full `vectorize=True` traversal of the grid all happen twice.
Proposal
Collapse the two passes into one by introducing a combined inner helper that returns `(return_data, ci_lower, ci_upper, p_value)` per 1-D block-maxima series, and call `apply_ufunc` once with four outputs, one `output_core_dims` entry per output.
Sketch:
```python
def _fit_and_conf_int_1d(
    self,
    block_maxima_1d,
    *,
    return_periods=UNSET,
    return_values=UNSET,
    distr="gev",
    block_size=1,
    extremes_type="max",
    get_p_value=False,
    compute_conf_int=False,
    bootstrap_runs=100,
    conf_int_lower_bound=2.5,
    conf_int_upper_bound=97.5,
):
    # Point estimate (and optional goodness-of-fit p-value) for one series.
    return_data, p_value = self._fit_return_variable_1d(...)
    if not compute_conf_int:
        # Keep the four-output shape stable: fill the CI slots with NaN.
        nan_like = np.full_like(return_data, np.nan, dtype=float)
        return return_data, nan_like, nan_like, p_value
    # Bootstrap CIs computed in the same pass as the point estimate.
    ci_lower, ci_upper = self._conf_int(...)
    return return_data, ci_lower, ci_upper, p_value
```
Then in `_fit_distributions_vectorized` and `_fit_with_early_spatial_batching`, replace the two `apply_ufunc` calls with a single call using:
```python
output_core_dims=[["one_in_x"], ["one_in_x"], ["one_in_x"], []]
```
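To illustrate the single-call pattern in isolation, here is a minimal, self-contained sketch. It is not the climakitae implementation: `fit_and_conf_int_1d` is a hypothetical stand-in that uses empirical quantiles instead of a GEV fit, returns a NaN placeholder for the p-value, and assumes input dims named `time`, `y`, and `x`. Only the `apply_ufunc` wiring (one input core dim, four outputs) mirrors the proposal.

```python
import numpy as np
import xarray as xr


def fit_and_conf_int_1d(block_maxima_1d, return_periods, bootstrap_runs=50, seed=0):
    """Hypothetical stand-in for the combined helper (no GEV fit here)."""
    # "Fit": empirical quantile at non-exceedance probability 1 - 1/T.
    probs = 1.0 - 1.0 / np.asarray(return_periods, dtype=float)
    return_data = np.quantile(block_maxima_1d, probs)

    # Bootstrap: resample with replacement, re-estimate, take percentiles.
    rng = np.random.default_rng(seed)
    n = block_maxima_1d.size
    samples = rng.choice(block_maxima_1d, size=(bootstrap_runs, n), replace=True)
    boot = np.quantile(samples, probs, axis=1)  # (n_periods, bootstrap_runs)
    ci_lower = np.percentile(boot, 2.5, axis=1)
    ci_upper = np.percentile(boot, 97.5, axis=1)

    p_value = np.float64(np.nan)  # placeholder: no goodness-of-fit test
    return return_data, ci_lower, ci_upper, p_value


return_periods = [10, 20]
block_maxima = xr.DataArray(
    np.random.default_rng(42).gumbel(size=(30, 2, 3)),
    dims=("time", "y", "x"),  # assumed dim names, for illustration only
)

# One traversal of the grid yields all four outputs.
return_data, ci_lower, ci_upper, p_value = xr.apply_ufunc(
    fit_and_conf_int_1d,
    block_maxima,
    kwargs={"return_periods": return_periods},
    input_core_dims=[["time"]],
    output_core_dims=[["one_in_x"], ["one_in_x"], ["one_in_x"], []],
    vectorize=True,
)
return_data = return_data.assign_coords(one_in_x=return_periods)
```

The first three outputs carry a trailing `one_in_x` dimension and the p-value is a scalar per grid cell, which is why its `output_core_dims` entry is the empty list.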
Why
- One pass over the grid instead of two — removes duplicated orchestration cost, which is meaningful at d03 resolution × many sims × 100 bootstrap runs.
- Point estimates and bootstrap draws share the same fit path, so they can't drift apart as the code changes.
- `_bootstrap` and `_conf_int` remain intact as tested units; only the dispatch changes.
Scope
Out of scope for PR #753 — tracked here as follow-up performance/refactor work.
Related