[FEAT] New Sanchez-Franks 2021 Reader#137
Conversation
The new feature includes a reader for the Sanchez-Franks satellite reconstruction from 2021. New files were added for that which enable reading the data and mapping metadata: `sf2021.py` and `sf2021.yml` and the new data file called `altimetry_moc_transport_1993_2020_18mos_smoothed.nc`. Some files were adjusted to account for the new reader like: - `__init__.py` - `read.py` - `reader_utils.py` - `readers.py` - `standardise.py` Also tests were added to adjust for it, one is located in `test_readers.py` and one new file was created called `test_sf2021.py`.
There was a problem hiding this comment.
Pull request overview
This PR adds a new data reader for the Sanchez-Franks et al. 2021 satellite-based reconstruction of the AMOC at 26°N. A new reader module and its YAML metadata are added, the dataset is wired into read/readers/reader_utils/data_sources registries, the generic standardize_time_coordinate helper is broadened to handle SF2021's numeric time (no units attribute), tests are added (test_sf2021.py plus one in test_readers.py), and the demo notebook gets an SF2021 section.
Changes:
- New
sf2021reader module + YAML metadata, with reader registered indata_sources/__init__.py,read.py,readers.py, andreader_utils.py. standardise.standardize_time_coordinategains a value-based heuristic to interpret missing-unitsnumeric times as days since0000-01-01, and a new deprecatedstandardise_sf2021wrapper is added.- Tests and a demo notebook section for the new reader.
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| amocatlas/data_sources/sf2021.py | New reader module for the SF2021 dataset. |
| amocatlas/data_sources/init.py | Exposes read_sf2021. |
| amocatlas/metadata/sf2021.yml | YAML metadata, variable mapping, and file source URL. |
| amocatlas/read.py | Adds sf2021 to public read API and known arrays. |
| amocatlas/readers.py | Adds sf2021 to reader dispatch and load_dataset docs. |
| amocatlas/reader_utils.py | Adds friendly name mapping for sf2021 (and nac). |
| amocatlas/standardise.py | Broadens time-coordinate decoding with a magnitude-based heuristic; adds deprecated standardise_sf2021. |
| tests/test_sf2021.py | New unit tests for the SF2021 reader. |
| tests/test_readers.py | Integration test for SF2021 time decoding via read.sf2021(). |
| notebooks/demo.ipynb | Demo cells loading and plotting the SF2021 timeseries. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Update small things in `sf2021.py` regarding formatting and inside `sf2021.yml` to provide the correct date coverage and changed the function for time standardisation inside `standardise.py` to have a more robust fallback and just handle sf2021 as an exception. Also added tests for the new logic inside `test_standardise.py`.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (2)
amocatlas/standardise.py:705
- The except clause no longer catches
ValueError, which was previously handled.pd.to_datetime(..., errors="coerce")can still raiseValueErrorfor malformed unit strings (e.g. theunitsparsing path), andnp.datetime64("0000-01-01", "D")may raiseValueErrorin some numpy versions. DroppingValueErrorfrom the caught exceptions is a behavior change that can turn previously-handled cases into unhandled crashes. Consider keepingValueErrorin the tuple.
except (TypeError, OverflowError, pd.errors.OutOfBoundsDatetime) as e:
amocatlas/standardise.py:694
- The SF2021 day-count branch uses
np.datetime64("0000-01-01", "D")as the base. Numpy'sdatetime64[ns]representable range is approximately 1678–2262, and intermediate values near year 0 (e.g.[1, 2, 3]days as in the new test) cannot be safely cast todatetime64[ns]— the cast either overflows or produces incorrect dates. The testtest_standardize_time_coordinate_allows_sf2021_day_countsonly asserts the resulting value does not start with"1970-01-01", which would also be true for an overflowed/garbage datetime, so it does not actually validate correctness. Please either document/validate the expected epoch (MATLAB datenum vs. days since 0000-01-01 vs. days since 0001-01-01 — these differ by 1–2 days) and add an assertion on a known concrete date (e.g. that day 727930 maps to 1993-01-17 per the YAML'stime_coverage_start), or use a base epoch within thedatetime64[ns]range plus an explicit offset.
elif source_file == "altimetry_moc_transport_1993_2020_18mos_smoothed.nc":
# SF2021 uses unlabeled day counts in the raw file.
base_dt = np.datetime64("0000-01-01", "D")
sf2021_time = base_dt + time_coord.values.astype("timedelta64[D]")
time_datetime = sf2021_time.astype("datetime64[ns]")
Updated the time handling to not be part of the `standardise.py` module as it is very specific but rather be handled inside the reader `sf2021.py` itself.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (2)
amocatlas/data_sources/sf2021.py:86
- The TIME normalization is likely incorrect for fractional-day values: converting via
timedelta64[D]will truncate fractions (e.g., 727974.5 becomes 727974), andnp.datetime64('0000-01-01','ns')is outside the representable range fordatetime64[ns]on many NumPy builds. Consider convertingdays since 0000-01-01by first shifting to a safe origin (e.g., compute an integer day offset between 0000-01-01 and 1970-01-01 usingdatetime64[D]), then use pandas/xarray to add aTimedeltain days (preserving fractions) to a 1970 origin.
time_values = ds[time_var].values
time_delta = np.array(time_values, dtype="timedelta64[D]").astype("timedelta64[ns]")
base_dt = np.datetime64("0000-01-01", "ns")
time_datetime = (base_dt + time_delta).astype("datetime64[ns]")
# Use assign_coords to properly set dimension coordinate
ds = ds.assign_coords({time_var: time_datetime})
ds[time_var].attrs = _TIME_METADATA
amocatlas/data_sources/sf2021.py:102
read_sf2021is annotated/documented as returning onlylist[xr.Dataset], but whentrack_added_attrs=Trueit returns(datasets, added_attrs_per_dataset). Please update the return type annotation and docstring to reflect the dual return shape (matching the pattern used in other readers likeread_nac).
) -> list[xr.Dataset]:
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (2)
amocatlas/data_sources/sf2021.py:82
- This time conversion is very likely broken:
np.datetime64('0000-01-01', 'ns')is out of the representable range fordatetime64[ns](it will raise / produce NaT), and casting float days totimedelta64[D]drops fractional days (e.g. 0.5). Consider converting relative to a representable epoch (e.g. compute days since 1970-01-01 by subtracting the 0000→1970 day offset) and preserve sub-day fractions, or usecftime.num2dateand then convert todatetime64[ns]for in-range dates. Also ensure the attrs you set match the fact the coordinate is datetime-like (avoid 'seconds since 1970...' when values are datetime64).
time_values = ds[time_var].values
time_delta = np.array(time_values, dtype="timedelta64[D]").astype("timedelta64[ns]")
base_dt = np.datetime64("0000-01-01", "ns")
time_datetime = (base_dt + time_delta).astype("datetime64[ns]")
amocatlas/data_sources/sf2021.py:184
- SF2021 time decoding depends on having the raw numeric "days since 0000-01-01" values available.
ReaderUtils.safe_load_dataset()usesxr.open_dataset()with defaultdecode_times=True, which can coerce this into cftime/object or otherwise make downstream normalization skip. Consider loading this dataset withdecode_times=False(as MOVE does) or explicitly handling cftime/object TIME coordinates in_normalize_sf2021_time_coordinate.
# Use ReaderUtils for consistent dataset loading
ds = ReaderUtils.safe_load_dataset(file_path)
update test to capture the change in months in the first two timesteps
changed the time handling function to something more robust
Description:
The feature includes a reader for the Sanchez-Franks satellite reconstruction from 2021.
New files were added for that which enable reading the data and mapping metadata:
sf2021.pyandsf2021.ymland the new data file calledaltimetry_moc_transport_1993_2020_18mos_smoothed.ncwas also added.The reader
sf2021.pyalso handles a special time format case which was present in the dataset. As this was the only data file which had this issue yet, this was handled as a local issue within the reader, with the possibility to generalize in the future when this problim arises again.Some files were adjusted to account for the new reader like:
__init__.py(added sf2021 link)read.py(added sf2021 link)reader_utils.py(added sf2021 link)readers.py(added sf2021 link)standardise.py(added snippet for sf2021)Also tests were added. A new file was created called
test_sf2021.py, which tests the basic reader function much liketest_noac47n.pyortest_fbc.pyand also especially looks at the time decoding done insf2021.py.Related Issues / Pull Requests: