feat: add Intake driver and intake catalog for test datasets #13

Open

leifdenby wants to merge 1 commit into mlwp-tools:main from leifdenby:feat/intake-driver

@leifdenby (Member) commented Jun 5, 2026

I thought it could be cool to be able to create intake catalogs of the datasets we might use for verification, in a way that lets us specify that mlwp-data-loaders should be used to actually load them. This PR is a proof of concept in that direction: it adds an Intake driver (the `MLWPLoaderDriver` class, referred to as `mlwp_loader` in intake catalogs) that wraps any mlwp loader module.

Sample intake catalog entry:

```yaml
sources:
  cerra_sample:
    description: >
      Small CERRA sample dataset stored on EWC (European Weather Cloud)
      S3-compatible object store, used for testing anemoi-datasets loading.
    driver: mlwp_loader
    args:
      dataset_path: s3://mlwp-sample-datasets/anemoi-datasets/cerra-rr-an-oper-0001-mars-5p5km-2017-2017-6h-v3-testing.zarr/
      loader: mlwp_data_loaders.loaders.anemoi.anemoi_datasets
      storage_options:
        endpoint_url: https://object-store.os-api.cci2.ecmwf.int
        anon: true
      chunks: null
    metadata:
      url: https://object-store.os-api.cci2.ecmwf.int
```
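
For readers unfamiliar with Intake drivers, here is a minimal, intake-free sketch of the wrapping idea; it is not the code in this PR. Intake roughly hands an entry's `args` to the driver as keyword arguments, so the sketch accepts `dataset_path`, `loader`, `storage_options` and `chunks`, resolves the dotted `loader` path with `importlib`, and defers the actual loading to that module. The class name `MLWPLoaderDriverSketch` and the loader-module function `load_dataset` are hypothetical; the real mlwp loader modules may expose a different entry point.

```python
import importlib
from typing import Any, Optional


class MLWPLoaderDriverSketch:
    """Hypothetical, intake-free stand-in for the PR's MLWPLoaderDriver.

    Mirrors the catalog `args` (dataset_path, loader, storage_options,
    chunks) and defers to the named loader module when read.
    """

    name = "mlwp_loader"  # the driver name used in the catalog entry

    def __init__(
        self,
        dataset_path: str,
        loader: str,
        storage_options: Optional[dict] = None,
        chunks: Any = None,
        metadata: Optional[dict] = None,
    ):
        self.dataset_path = dataset_path
        self.loader = loader
        self.storage_options = storage_options or {}
        self.chunks = chunks
        self.metadata = metadata or {}
        self._dataset = None  # cached result of the first load

    def _loader_module(self):
        # Resolve a dotted path such as
        # "mlwp_data_loaders.loaders.anemoi.anemoi_datasets" to a module object.
        return importlib.import_module(self.loader)

    def read(self):
        # ASSUMPTION: every loader module exposes a `load_dataset` function
        # taking these arguments; the real entry point may differ.
        if self._dataset is None:
            module = self._loader_module()
            self._dataset = module.load_dataset(
                self.dataset_path,
                storage_options=self.storage_options,
                chunks=self.chunks,
            )
        return self._dataset

    # Intake sources expose to_dask(); here it simply returns the same object as read().
    to_dask = read
```

In a real Intake (v0.x) driver the class would instead subclass `intake.source.base.DataSource` and be made available under the name `mlwp_loader`, for example through the `intake.drivers` entry-point group in the package metadata.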

Sample use:

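```python
# Start IPython with the `intake` extra installed from this branch:
# uvx --with "mlwp-data-loaders[intake] @ git+https://github.com/leifdenby/mlwp-data-loaders@feat/intake-driver" ipython

import intake

# Open the test catalog directly from the feature branch on GitHub
cat = intake.open_catalog(
    "https://raw.githubusercontent.com/leifdenby/mlwp-data-loaders"
    "/feat/intake-driver/tests/catalog/test_datasets.yaml"
)
# `anemoi_datasets` is a nested catalog; `cerra_sample` is the entry shown above
cat.anemoi_datasets.cerra_sample.to_dask()
```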

This PR also adds a test catalog structure and updates the harp obstable loader to handle HTTP URLs via fsspec's `simplecache`.
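
A minimal sketch of the HTTP handling, assuming the loader only needs to add fsspec's `simplecache::` URL-chaining prefix to remote URLs before opening them. The helper name `with_simplecache` is hypothetical and this is not the PR's code.

```python
from urllib.parse import urlparse


def with_simplecache(path: str) -> str:
    """Prefix remote HTTP(S) URLs with fsspec's `simplecache::` protocol.

    Local paths (and URLs with any other scheme) are returned unchanged, so
    callers can pass either kind of path through the same code.
    """
    if urlparse(path).scheme in ("http", "https"):
        # fsspec URL chaining: fetch via simplecache, then read from the local copy.
        return f"simplecache::{path}"
    return path
```

Assuming the obstables are SQLite files, something like `sqlite3.connect(fsspec.open_local(with_simplecache(url)))` would download a remote file to a local cache and give `sqlite3` a real file path, while local paths pass through untouched.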

I'm just creating this as an example for now; I'm not saying we should necessarily merge it :)

Adds an Intake driver (MLWPLoaderDriver, called `mlwp_loader` in intake)
that wraps any mlwp loader module, a test catalog structure, and updates
the harp obstable loader to handle HTTP URLs via fsspec simplecache.