Logic to determine frequency of verification dataset fails when monthly data times are on different days of the month #858

@gmacgilchrist

Description of bug
For monthly average data, it is not uncommon for time indices to fall on the middle day of the month, which varies from month to month. This breaks the logic in return_time_series_freq, which only picks out a monthly frequency if the day of the time index is the same for every month.

I encountered the issue while attempting to generate an uninitialized forecast. I think it was likewise causing silent issues in generating a persistence forecast, which was previously producing NaNs but works fine after implementing a hack (changing the time index of the verification dataset to match what's expected).

Code sample (reproducing the core logic of return_time_series_freq)

import cftime
import xarray as xr

# monthly separated time array (mid-month stamps, so the day varies)
times = [cftime.DatetimeNoLeap(1,1,15),cftime.DatetimeNoLeap(1,2,14),cftime.DatetimeNoLeap(1,3,15)]
ds = xr.Dataset(coords={'time':times})

for freq in ['day','month','year']:
    # first dim values not equal all others
    if not (
        getattr(ds.isel({'time': 0})['time'].dt, freq) == getattr(ds['time'].dt, freq)
    ).all():
        print(freq)
        break

This returns a frequency of "day", which results in subsequent errors. To work around this, a user has to manipulate at least the verification dataset to have the same "day" for each month in the time index.

Would it be undesirable for the frequency of the verification dataset to be user specified in the same way as the units of the initialized dataset lead time need to be specified?

Output of climpred.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-553.5.1.el8_10.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

climpred: 2.4.0
xarray: 2023.12.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.1
cftime: 1.6.3
netcdf4: None
nc_time_axis: 1.4.1
matplotlib: 3.8.2
cf_xarray: 0.9.2
xclim: 0.50.0
dask: 2024.5.0
distributed: 2024.5.0
setuptools: 69.5.1
pip: 24.0
conda: None
IPython: 8.25.0
sphinx: None
