For the percentile calculations currently in earthkit.climate.utils.percentile, a solution using earthkit-transforms would be preferred to avoid duplication and possible inconsistencies across the earthkit stack.
- earthkit.transforms.climatology.percentiles() currently
- doesn't support seasonal or day-of-year aggregation (the latter is covered by daily_reduce which uses a 366 day calendar instead of 365),
- interprets the "year" frequency differently than
get_percentile (the equivalent aggregation lives in earthkit.transforms.temporal). Maybe None as a value for frequency should cover the case of full temporal aggregation also in climatology.
- The expansion of the day-of-year coordinate could stay here as a utility, separate from the actual aggregation. Only caveat: when doing an all-data aggregation, the time dimension is lost, so the expanded day-of-year coordinate can't easily be placed at the same index without a reference available.
- Windowed computations as in
calculate_percentile_doy could be covered with earthkit.transforms.rolling_reduce, but it would be nice to have a shortcut for the percentile climatology.
- Climatological percentiles should ideally come all from just one function (frequencies could be None, year, season, month, week, day, dayofyear).
A potential starting point to integrate earthkit-transforms:
def percentile_as_doy(da, percentile, freq=None):
q = percentile / 100.0
if freq is None:
quantile = ekt.temporal.reduce(da, how="quantile", q=q)
# Insert day of year dimension where time dimension was before aggregation
expand_axis = sum([x == y for x, y in zip(da.dims, quantile.dims)])
return quantile.expand_dims({"dayofyear": np.arange(1, 366)}, axis=expand_axis)
if freq == "month":
month_lengths = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
month_of_doy = [m for m, n in enumerate(month_lengths, start=1) for _ in range(n)]
return (
ekt.climatology.quantiles(da, q=q, frequency="month")
.sel(month=month_of_doy)
.assign_coords({"dayofyear": ("month", np.arange(1, 366))})
.swap_dims({"month": "dayofyear"})
.drop_vars("month")
.squeeze()
)
if freq == "dayofyear":
return ekt.climatology.daily_reduce(da, how="quantile", q=q)
raise ValueError(f"'{freq}' is not a valid value for freq; supported values are None, 'month', 'dayofyear'")
This dispatches to the appropriate ek-transforms functions and combines the percentile computation with the day-of-year coordinate expansion so the dayofyear dimension can always be inserted in the right place. Not sure if this is a good approach.
For the percentile calculations currently in earthkit.climate.utils.percentile, a solution using earthkit-transforms would be preferred to avoid duplication and possible inconsistencies across the earthkit stack.
get_percentile(the equivalent aggregation lives in earthkit.transforms.temporal). MaybeNoneas a value for frequency should cover the case of full temporal aggregation also in climatology.calculate_percentile_doycould be covered with earthkit.transforms.rolling_reduce, but it would be nice to have a shortcut for the percentile climatology.A potential starting point to integrate earthkit-transforms:
This dispatches to the appropriate ek-transforms functions and combines the percentile computation with the day-of-year coordinate expansion so the
dayofyeardimension can always be inserted in the right place. Not sure if this is a good approach.