Draft
33 commits
5c8bfa9
move to single accessor
mpvginde May 11, 2026
02de5fa
add time alignment tests
mpvginde May 11, 2026
7d1e4ce
update pyproject
mpvginde May 11, 2026
5ee9674
remove loaders
mpvginde May 12, 2026
0584f93
remove properties
mpvginde May 12, 2026
f1d86aa
use mlwp-data-specs traits
mpvginde May 12, 2026
f48a93c
use traits accross package
mpvginde May 12, 2026
14283cd
update readme
mpvginde May 12, 2026
b48a266
Add IFS-forecast loader (#20)
mpvginde May 18, 2026
3d7a5fb
Merge branch 'refactor/decouple_loading_validation' into refactor/ali…
mpvginde May 18, 2026
199a6a0
remove files reintroduces by merge
mpvginde May 18, 2026
03259a8
port to use mlwp-data-specs
mpvginde May 18, 2026
099be41
refactor space alignment
mpvginde May 18, 2026
40ff525
interpolator bugfixes
mpvginde May 18, 2026
c8489d6
add interpolation and space-alignment test
mpvginde May 18, 2026
76bebba
remove leftover loader files
mpvginde Jun 23, 2026
542b4e3
update pyproject
mpvginde Jun 23, 2026
89e55a4
add global space and time alignment
mpvginde Jun 23, 2026
a94cc83
switch runner to mlwp-data-loaders
mpvginde Jun 23, 2026
faab341
fix chunked non-spatial dims
mpvginde Jun 23, 2026
2c7d387
add tests
mpvginde Jun 23, 2026
eafa041
remove leftover loader functionality
mpvginde Jun 23, 2026
2c7c905
add custom loader
mpvginde Jun 23, 2026
576b090
update introduction
mpvginde Jun 23, 2026
7c035d4
update pyproject
mpvginde Jun 23, 2026
0eabcfc
pre-commit fixes
mpvginde Jun 23, 2026
fb289a4
fix docstrings for utils
mpvginde Jun 23, 2026
ae1b0dc
add docstrings to accessors
mpvginde Jun 23, 2026
22a432e
fix docstring and typehinting
mpvginde Jun 29, 2026
8197689
fix docstring and typehinting
mpvginde Jun 29, 2026
d4b2183
fix docstring and typehinting
mpvginde Jun 29, 2026
03d5079
add custom loader
mpvginde Jun 29, 2026
81482db
pre-commit fixes
mpvginde Jun 29, 2026
39 changes: 18 additions & 21 deletions README.md
@@ -4,45 +4,42 @@

## What is this?

`mxalign` is an `xarray`-based package designed for the alignment and verification of meteorological datasets. It standardizes operations across datasets by attaching properties along three main axes:
- **Space:** Grid or point-based data
- **Time:** Forecasts, observations, or climatology
- **Uncertainty:** Deterministic, ensemble, or quantile forecasts

Currently, `mxalign` also acts as a full execution engine. It can load datasets (e.g., Anemoi inference outputs, observation datasets), apply transformations, align datasets in both space and time to match a reference, safely broadcast NaNs, and execute verification metrics on scaled Dask clusters (Local or Slurm).
`mxalign` is an `xarray`-based package for aligning meteorological datasets. It operates on datasets that carry **traits** — metadata attributes that describe the nature of a dataset along three axes:
- **Space:** `grid` or `point`
- **Time:** `forecast`, `observation`, or `climatology`
- **Uncertainty:** `deterministic`, `ensemble`, or `quantile`

> ⚠️ **Roadmap & Future Architecture Changes (planned for v0.2.0):**
> Currently, `mxalign` handles both alignment and the execution of the verification tooling pipeline, including loading and validation. In the upcoming `v0.2.0` release, this architecture will be refactored:
> - **Loading** will be split out into [`mlwp-data-loaders`](https://github.com/mlwp-tools/mlwp-data-loaders).
> - **Validation** of loaded `xr.Dataset`s will be moved to [`mlwp-data-specs`](https://github.com/mlwp-tools/mlwp-data-specs) (which will contain the requirements for each of the dataset traits and the validation logic).
> - **Execution** of the full verification pipeline (loading, transformations, alignment, and verification) from configuration files may be moved to a separate package in future releases.
> - **Tests** will be added to `mxalign` (building on test datasets already integrated into `mlwp-data-loaders`) that ensure that all alignment operations work correctly (Testing notebook execution inside `mxalign` is explicitly excluded from the current roadmap).
These traits are defined and validated by [`mlwp-data-specs`](https://github.com/mlwp-tools/mlwp-data-specs) and attached to datasets by [`mlwp-data-loaders`](https://github.com/mlwp-tools/mlwp-data-loaders). `mxalign` reads them to infer how datasets should be aligned, without needing to know how they were loaded.

`mxalign` currently supports alignment in **space** and **time**. Alignment along the **uncertainty** axis (e.g. ensemble to deterministic) is planned for a future release.

## Python API

`mxalign` provides building blocks for manual alignment, transformations, and interpolations of `xarray` datasets. This is ideal for interactive use in Jupyter notebooks or custom Python scripts.
`mxalign` provides building blocks for spatial and temporal alignment of `xarray` datasets. This is ideal for interactive use in Jupyter notebooks or custom Python scripts.

```python
import xarray as xr
from mxalign import load, align_space, align_time, transform
import mlwp_data_loaders as dl
import mxalign as mx

# Load datasets (using registered loaders)
ds_obs = load(name="observations_loader", files=["obs.nc"])
ds_fcst = load(name="anemoi_inference", files=["forecast.nc"])
# Load datasets — traits are attached by the loader
ds_obs = dl.load("observations_loader", files=["obs.nc"])
ds_fcst = dl.load("anemoi_inference", files=["forecast.nc"])

# Align the forecast spatially to match the observation reference
ds_fcst_aligned_space = align_space(ds_fcst, reference=ds_obs, method="interpolation")
ds_fcst_aligned = mx.align_space(ds_fcst, reference=ds_obs, method="interpolation")

# Align datasets temporally
datasets = {"obs": ds_obs, "fcst": ds_fcst_aligned_space}
aligned_datasets = align_time(datasets, method="intersection")
datasets = {"obs": ds_obs, "fcst": ds_fcst_aligned}
aligned_datasets = mx.align_time(datasets, method="intersection")
```
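
As a rough illustration of what `method="intersection"` is taken to mean here, the sketch below keeps only the timestamps shared by every dataset. It uses plain Python sets rather than `xarray`, and `intersect_times` is a hypothetical helper, not part of `mxalign`; the actual behaviour of `align_time` may differ.

```python
from datetime import datetime, timedelta


def intersect_times(times_by_name):
    """Return the sorted timestamps present in every entry of the mapping.

    Hypothetical helper for illustration only; not part of mxalign.
    """
    if not times_by_name:
        return []
    common = set.intersection(*(set(times) for times in times_by_name.values()))
    return sorted(common)


start = datetime(2020, 1, 1)
obs_times = [start + timedelta(hours=h) for h in range(0, 12, 3)]   # 00, 03, 06, 09
fcst_times = [start + timedelta(hours=h) for h in range(6, 18, 3)]  # 06, 09, 12, 15

common_times = intersect_times({"obs": obs_times, "fcst": fcst_times})
print([t.strftime("%H:%M") for t in common_times])  # → ['06:00', '09:00']
```

Under this reading, any valid time missing from at least one dataset is dropped from all of them, so the aligned datasets can be compared element by element.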

For a more comprehensive interactive example, check out the [introductory notebook](./examples/introduction.ipynb).

## Executing via a Configuration

For full verification pipeline execution, `mxalign` uses a YAML configuration file. This allows you to declaratively define how datasets are loaded, transformed, aligned, and verified.
`mxalign` can drive a full verification pipeline from a YAML configuration file, orchestrating dataset loading (via `mlwp-data-loaders`), transformations, alignment, and verification.

### Configuration Contents

6 changes: 3 additions & 3 deletions examples/config.yaml
@@ -7,17 +7,17 @@ dates:

datasets:
  la-1024-ea-00:
    loader: anemoi-inference
    loader: mlwp_data_loaders.loaders.anemoi.anemoi_inference
    files: /scratch/project_465000527/vandenbl/sg-la-comparison/inference-out/la-1024-01-ea-00/{reference_time:strftime(%Y%m%d%H)}.nc
    #grid_mapping: CERRA
    variables: ["2t","10u","10v","msl"]
  cerra:
    loader: anemoi-datasets
    loader: mlwp_data_loaders.loaders.anemoi.anemoi_datasets
    files: /scratch/project_465002133/datasets/cerra-rr-an-oper-0001-mars-5p5km-1984-2020-3h-v2-rmi.zarr
    #grid_mapping: CERRA
    variables: ["2t_2", "10u_10", "10v_10", "msl_0"]
# synop:
# loader: mxalign
# loader: ./myloader.py
# files: /scratch/project_465002133/datasets/observations/cerra_synops_2020.nc
# variables: ["2t","10si","msl"]
transformations:
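
The `loader` values in this config take two shapes: a dotted import path and a path to a `.py` file. How `mxalign` resolves them is not shown in this diff. The sketch below is one way such a spec could be turned into a callable using only the standard library. `resolve_loader` is a hypothetical name, and the default function name `load_dataset` is an assumption borrowed from `examples/myloader.py`.

```python
import importlib
import importlib.util
from pathlib import Path


def resolve_loader(spec, func_name="load_dataset"):
    """Resolve a loader spec to a callable.

    Hypothetical sketch: a spec ending in ".py" is treated as a file path,
    anything else as a dotted module path. The looked-up function name is an
    assumption, not a documented mxalign convention.
    """
    if spec.endswith(".py"):
        path = Path(spec).resolve()
        module_spec = importlib.util.spec_from_file_location(path.stem, path)
        if module_spec is None or module_spec.loader is None:
            raise ImportError(f"cannot load module from {path}")
        module = importlib.util.module_from_spec(module_spec)
        module_spec.loader.exec_module(module)
    else:
        module = importlib.import_module(spec)
    return getattr(module, func_name)


# Demonstration with a standard-library module instead of a real loader.
parse = resolve_loader("json", func_name="loads")
print(parse('{"a": 1}'))  # → {'a': 1}
```

With a file such as `examples/myloader.py` on disk, `resolve_loader("./myloader.py")` would return its `load_dataset` function under the same assumptions.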
87,102 changes: 7,764 additions & 79,338 deletions examples/introduction.ipynb

Large diffs are not rendered by default.

21 changes: 21 additions & 0 deletions examples/myloader.py
@@ -0,0 +1,21 @@
import xarray as xr


def load_dataset(file):
    ds = xr.open_mfdataset([file], engine="h5netcdf")

    ds = ds.rename_dims(code="point_index")

    ds.attrs["mlwp_time_trait"] = "observation"
    ds.attrs["mlwp_space_trait"] = "point"
    ds.attrs["mlwp_uncertainty_trait"] = "deterministic"

    ds.coords["valid_time"].attrs["standard_name"] = "time"
    ds.coords["latitude"].attrs.update(
        {"standard_name": "latitude", "units": "degrees_north"}
    )
    ds.coords["longitude"].attrs.update(
        {"standard_name": "longitude", "units": "degrees_east"}
    )

    return ds
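
The custom loader above attaches the three trait attributes by hand, so a quick sanity check on them can catch typos before alignment. The sketch below is a minimal, standard-library check. The attribute names come from `myloader.py` and the allowed values from the README, while `find_trait_problems` is a hypothetical helper and not the `mlwp-data-specs` validation API.

```python
# Allowed values are taken from the README; attribute names from myloader.py.
ALLOWED_TRAITS = {
    "mlwp_space_trait": {"grid", "point"},
    "mlwp_time_trait": {"forecast", "observation", "climatology"},
    "mlwp_uncertainty_trait": {"deterministic", "ensemble", "quantile"},
}


def find_trait_problems(attrs):
    """Return a list of human-readable problems with the trait attributes.

    Hypothetical helper for illustration; real validation lives in
    mlwp-data-specs and may check considerably more than this.
    """
    problems = []
    for name, allowed in ALLOWED_TRAITS.items():
        value = attrs.get(name)
        if value is None:
            problems.append(f"missing attribute {name!r}")
        elif value not in allowed:
            problems.append(f"{name}={value!r} not in {sorted(allowed)}")
    return problems


attrs = {
    "mlwp_time_trait": "observation",
    "mlwp_space_trait": "point",
    "mlwp_uncertainty_trait": "deterministic",
}
print(find_trait_problems(attrs))  # → []
```

For a dataset returned by `load_dataset`, the same check would be called as `find_trait_problems(ds.attrs)`.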
10 changes: 9 additions & 1 deletion pyproject.toml
@@ -12,7 +12,8 @@ dependencies = [
    "dask>=2026.1.2",
    "earthkit-data>=0.19.0",
    "h5netcdf>=1.8.1",
    "h5py>=3.15.1",
    "h5py<3.15",
    "mlwp-data-loaders",
    "netcdf4>=1.7.4",
    "pyyaml>=6.0.3",
    "scipy>=1.17.0",
@@ -40,7 +41,14 @@ jobqueue = [
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.pytest.ini_options]
testpaths = ["tests"]

[dependency-groups]
dev = [
    "ipykernel>=7.2.0",
    "pytest>=8.0",
]

[tool.uv.sources]
mlwp-data-loaders = { git = "https://github.com/mlwp-tools/mlwp-data-loaders", branch = "main" }
12 changes: 0 additions & 12 deletions src/mxalign/__init__.py
@@ -1,6 +1,3 @@
from .properties.properties import Properties, Time, Space, Uncertainty
from .loaders.loader import load
from .loaders.registry import available_loaders, register_loader
from .transformations.transform import transform
from .transformations.registry import available_transformations, register_transformation
from .interpolations.interpolate import interpolate
@@ -9,18 +6,10 @@
from .align.space import align_space

from . import accessors
from . import loaders
from . import transformations
from . import interpolations

__all__ = [
    "Properties",
    "Time",
    "Space",
    "Uncertainty",
    "load",
    "available_loaders",
    "register_loader",
    "transform",
    "available_transformations",
    "register_transformation",
@@ -30,7 +19,6 @@
    "align_time",
    "align_space",
    "accessors",
    "loaders",
    "transformations",
    "interpolations",
]
8 changes: 2 additions & 6 deletions src/mxalign/accessors/__init__.py
@@ -1,7 +1,3 @@
from . import space
from . import time
from . import mx

__all__ = [
    "space",
    "time",
]
__all__ = ["mx"]