Skip to content

Day-of-year percentiles and earthkit-transforms #34

@chpolste

Description

@chpolste

For the percentile calculations currently in earthkit.climate.utils.percentile, a solution using earthkit-transforms would be preferred to avoid duplication and possible inconsistencies across the earthkit stack.

  • earthkit.transforms.climatology.percentiles() currently
    • doesn't support seasonal or day-of-year aggregation (the latter is covered by daily_reduce which uses a 366 day calendar instead of 365),
    • interprets the "year" frequency differently than get_percentile (the equivalent aggregation lives in earthkit.transforms.temporal). Maybe None as a value for frequency should cover the case of full temporal aggregation also in climatology.
  • The expansion of the day-of-year coordinate could stay here as a utility, separate from the actual aggregation. Only caveat: when doing an all-data aggregation, the time dimension is lost, so the expanded day-of-year coordinate can't easily be placed at the same index without a reference available.
  • Windowed computations as in calculate_percentile_doy could be covered with earthkit.transforms.rolling_reduce, but it would be nice to have a shortcut for the percentile climatology.
  • Climatological percentiles should ideally come all from just one function (frequencies could be None, year, season, month, week, day, dayofyear).

A potential starting point to integrate earthkit-transforms:

def percentile_as_doy(da, percentile, freq=None):
    q = percentile / 100.0
    if freq is None:
        quantile = ekt.temporal.reduce(da, how="quantile", q=q)
        # Insert day of year dimension where time dimension was before aggregation
        expand_axis = sum([x == y for x, y in zip(da.dims, quantile.dims)])
        return quantile.expand_dims({"dayofyear": np.arange(1, 366)}, axis=expand_axis)
    if freq == "month":
        month_lengths = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
        month_of_doy = [m for m, n in enumerate(month_lengths, start=1) for _ in range(n)]
        return (
            ekt.climatology.quantiles(da, q=q, frequency="month")
            .sel(month=month_of_doy)
            .assign_coords({"dayofyear": ("month", np.arange(1, 366))})
            .swap_dims({"month": "dayofyear"})
            .drop_vars("month")
            .squeeze()
        )
    if freq == "dayofyear":
        return ekt.climatology.daily_reduce(da, how="quantile", q=q)
    raise ValueError(f"'{freq}' is not a valid value for freq; supported values are None, 'month', 'dayofyear'")

This dispatches to the appropriate ek-transforms functions and combines the percentile computation with the day-of-year coordinate expansion so the dayofyear dimension can always be inserted in the right place. Not sure if this is a good approach.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions