add ifs-forecast loader #12
Thanks @leifdenby! Linting is fixed and tests have passed. I think we can merge this.
leifdenby
left a comment
Looking good :) A few suggestions for changes
paths = [paths] if isinstance(paths, str) else paths

if chunks is None:
    # open_mfdataset requires dask; open individually and concat when chunks=None
does this mean we should check if dask is installed here?
Or we could make dask a dependency of the package. In a way it makes sense, as one of the main reasons for relying on xarray is its dask functionality.
Yes, I'm not sure I see a situation where having dask installed would harm us. I guess we should just be quite lenient about the minimum version we require so that we don't mess things up for people.
I still don't quite understand the if-statement comment though. Currently the code doesn't check whether dask is installed. So do we expect an exception if chunks != None and dask isn't installed? Or shouldn't the xr.open_mfdataset(...) call just fail to use dask but continue anyway? I also don't quite understand why one of these calls uses cfgrib as the engine explicitly, but the second one doesn't. Are we hoping that cfgrib will be detected as the right engine and used implicitly?
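To make the question concrete, here is a minimal sketch of how the two branches could look if the dask check were explicit and the cfgrib engine were passed in both calls. It is an illustration only, not the code in this PR: the names `normalize_paths`, `dask_available` and `open_grib_files` and the `concat_dim` default are hypothetical, and it assumes (as the xarray docs state) that `xr.open_mfdataset` needs dask to be installed.

```python
import importlib.util


def normalize_paths(paths):
    """Accept a single path string or an iterable of paths; always return a list."""
    return [paths] if isinstance(paths, str) else list(paths)


def dask_available():
    """Return True if the dask package can be found in this environment."""
    return importlib.util.find_spec("dask") is not None


def open_grib_files(paths, chunks=None, concat_dim="time"):
    """Hypothetical loader sketch; concat_dim="time" is an assumption."""
    import xarray as xr  # imported lazily so the helpers above need only the stdlib

    paths = normalize_paths(paths)
    if chunks is None:
        # Eager path: no dask needed, open each file and concatenate.
        datasets = [xr.open_dataset(p, engine="cfgrib") for p in paths]
        return xr.concat(datasets, dim=concat_dim)
    if not dask_available():
        raise ImportError(
            "chunks was given, which needs dask; install dask or pass chunks=None"
        )
    # Lazy path: the engine is passed explicitly, so nothing relies on auto-detection.
    return xr.open_mfdataset(paths, engine="cfgrib", chunks=chunks)
```

If dask becomes a hard dependency, as suggested above, the `dask_available` check can be dropped.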
pytest.importorskip("cfgrib")

DET_FILES = [
    "/scratch/cu0k/ifs-example/ifs_det_fcst_20260101.grib",
Shouldn't these point to paths in the EWC S3 now?
Since these are single files that must be downloaded, what do you think about using pooch instead for fetching the files (like I did for the HARP obstables: https://github.com/mlwp-tools/mlwp-data-loaders/blob/main/tests/test_harp_obstable_integration.py)? That ensures that we get the right content (pooch uses a checksum for this), and it would be easier to implement caching of the test datasets in GitHub CI down the line (since pooch has a single cache dir that we can put in the CI action cache).
Didn't know about pooch. Yes, your solution for the obstable is much more elegant. Let's go for that one!
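For reference, a rough sketch of what a pooch-based fetch could look like for these GRIB files. This is an assumption-laden illustration, not the code from the linked HARP test: the helper names `sha256_known_hash` and `fetch_test_file` and the cache directory name are hypothetical, and the actual S3 URL and checksum would have to be supplied by the caller.

```python
import hashlib


def sha256_known_hash(path):
    """Return the file's checksum in the "sha256:<hexdigest>" form accepted as pooch's known_hash."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB blocks so large GRIB files are not loaded into memory at once.
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return "sha256:" + digest.hexdigest()


def fetch_test_file(url, known_hash):
    """Download url once into a shared cache dir and verify it against known_hash."""
    import pooch  # imported lazily so the checksum helper needs only the stdlib

    return pooch.retrieve(
        url=url,
        known_hash=known_hash,
        # Hypothetical cache name; a single directory is what the CI cache would store.
        path=pooch.os_cache("mlwp-data-loaders-tests"),
    )
```

The checksum helper would be run once on a trusted local copy of each file to produce the `known_hash` value that is then committed alongside the test; `pooch.retrieve` returns the local path, which can be handed to the loader as before.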
Adds