This repository contains the code for the article "Distributional Regression U-Nets for the Postprocessing of Precipitation Ensemble Forecasts" by R. Pic, C. Dombry, P. Naveau, and M. Taillardat available on arXiv and HAL.
```bibtex
@article{Pic2024DRU,
  author    = {Pic, Romain and Dombry, Clément and Naveau, Philippe and Taillardat, Maxime},
  title     = {Distributional Regression U-Nets for the Postprocessing of Precipitation Ensemble Forecasts},
  journal   = {Artificial Intelligence for the Earth Systems},
  month     = jul,
  year      = {2025},
  doi       = {10.1175/AIES-D-24-0067.1},
  publisher = {American Meteorological Society},
  address   = {Boston MA, USA},
  volume    = {4},
  number    = {4},
  pages     = {240067},
  url       = {https://journals.ametsoc.org/view/journals/aies/aop/AIES-D-24-0067.1/AIES-D-24-0067.1.xml},
}
```

This repository does not provide the data required to run the code. The dataset used is available on Zenodo. Guidance on the expected data shapes and the contents of the files is provided below, in case you want to use the code on your own data.
The code in this repository relies on data in the form of NumPy arrays stored in the `data` folder. The `data` folder should contain the following files:
- `X_trainval.npy`, `Y_trainval.npy`: predictors and observations for the training/validation set.
- `X_test.npy`, `Y_test.npy`: predictors and observations for the test set.
- `X_trainval_log.npy`, `X_test_log.npy`: alternative predictors for the training/validation and test sets, where the raw precipitation is transformed using a log transformation before applying mean/min/max/sd. Used for the DRU models.
- `X_constant.npy`: constant fields (e.g., orography) used for both the training/validation and test sets.
- `X_raw_trainval.npy`, `X_raw_test.npy`: raw ensemble forecasts for the training/validation and test sets.
- `trainval_dow.npy`: day of the week for each sample in the training/validation set.
- `dates.txt`: comma-separated list of the dates corresponding to the data of the training/validation and test sets.
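The log transformation mentioned for `X_trainval_log.npy` / `X_test_log.npy` can be sketched as follows. This is an illustrative sketch only: the array names and the use of `log1p` are assumptions, and the exact transform used in the repository may differ.

```python
import numpy as np

# Illustrative sketch: derive log-transformed ensemble summary statistics.
# `raw` stands in for a raw ensemble of shape (n_samples, W, H, n_member);
# the exact transform used to build X_trainval_log.npy may differ.
rng = np.random.default_rng(0)
raw = rng.gamma(shape=0.5, scale=2.0, size=(4, 8, 8, 17))

log_raw = np.log1p(raw)  # log(1 + x) keeps zero-precipitation values finite

# Mean/min/max/sd over the member axis -> 4 predictors per grid point
stats = np.stack(
    [log_raw.mean(axis=-1), log_raw.min(axis=-1),
     log_raw.max(axis=-1), log_raw.std(axis=-1)],
    axis=-1,
)
print(stats.shape)  # (4, 8, 8, 4)
```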
| File name | Expected shape |
|---|---|
| `X_trainval.npy` / `X_trainval_log.npy` | $(n_{trainval}, W, H, n_{pred})$ |
| `X_test.npy` / `X_test_log.npy` | $(n_{test}, W, H, n_{pred})$ |
| `Y_trainval.npy` | $(n_{trainval}, W, H)$ |
| `Y_test.npy` | $(n_{test}, W, H)$ |
| `trainval_dow.npy` | $(n_{trainval},)$ |
| `X_constant.npy` | $(W, H, n_{constant})$ |
| `X_raw_trainval.npy` | $(n_{trainval}, W, H, n_{member})$ |
| `X_raw_test.npy` | $(n_{test}, W, H, n_{member})$ |
where:

- $n_{trainval}$ is the number of samples in the training/validation set,
- $n_{test}$ is the number of samples in the test set,
- $W$ and $H$ are the width and height of the grid considered,
- $n_{pred}$ is the number of predictors (without the constant fields),
- $n_{constant}$ is the number of constant fields,
- $n_{member}$ is the number of members in the raw ensemble.
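Before running anything, it can be worth checking that your arrays are mutually consistent. The helper below is an illustrative sketch (not part of the repository); it only verifies the sample and grid axes implied by the table above.

```python
import numpy as np

# Illustrative sanity-check helper (not part of the repository): verifies
# that predictor/observation arrays share the sample and grid axes.
def check_shapes(X, Y, X_constant=None, X_raw=None):
    n, W, H, n_pred = X.shape
    assert Y.shape == (n, W, H), "observations must have shape (n, W, H)"
    if X_constant is not None:
        assert X_constant.shape[:2] == (W, H), "constant fields must match grid"
    if X_raw is not None:
        assert X_raw.shape[:3] == (n, W, H), "raw ensemble must match (n, W, H)"
    return n, W, H, n_pred

# Example with synthetic arrays shaped like a small dataset
X = np.zeros((10, 8, 8, 5))
Y = np.zeros((10, 8, 8))
print(check_shapes(X, Y))  # (10, 8, 8, 5)
```

In practice you would load the real files with `np.load("data/X_trainval.npy")` and pass them in.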
The following sections describe the files of the repository in the order of a standard workflow. The `utils` folder contains utility functions used by the scripts. The `output` folder contains the results of the scripts, both raw results and figures.
The models are trained using the predictors (`X_trainval.npy`) and the observations (`Y_trainval.npy`). The models are saved in the `output/reference_models/models` folder. Hyperparameters can be provided as arguments.
Quantile Regression Forests (QRF)
- `qrf_pred.R`: R script to train a QRF at each grid point using the predictors.
QRF with tail extension (TQRF)
- `qrf+gtcnd_pred.R`: R script to train a TQRF for a generalized truncated/censored normal distribution (GTCND) at each grid point using the predictors.
- `qrf+csgd_pred.R`: R script to train a TQRF for a censored shifted gamma distribution (CSGD) at each grid point using the predictors.
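To fix ideas on the censored shifted gamma distribution, here is a hedged sketch of its CDF in the spirit of Scheuerer and Hamill: precipitation is modeled as $Y = \max(X - \delta, 0)$ with $X \sim \mathrm{Gamma}(k, \theta)$, which puts a point mass at zero. The function name and parameterization below are illustrative assumptions; the repository's TQRF code may parameterize the CSGD differently.

```python
import numpy as np
from scipy.stats import gamma

# Illustrative CSGD CDF sketch: Y = max(X - shift, 0), X ~ Gamma(k, theta).
# Parameter names are assumptions, not the repository's parameterization.
def csgd_cdf(y, k, theta, shift):
    y = np.asarray(y, dtype=float)
    # For y >= 0, P(Y <= y) = P(X <= y + shift); zero below the censoring point
    return np.where(y < 0.0, 0.0, gamma.cdf(y + shift, a=k, scale=theta))

# Probability of zero precipitation: P(Y = 0) = P(X <= shift)
p_dry = csgd_cdf(0.0, k=1.2, theta=2.0, shift=0.5)
print(float(p_dry))
```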
The U-Net-based methods are trained on the predictors (`X_trainval.npy`) and the observations (`Y_trainval.npy`). The models are saved in the `output/unet_models/parameters` folder. Hyperparameters can be provided as arguments.
- `unet_pred.py`: Python script to train a U-Net model over the whole grid.
- `group_seq.py`: Python script to group the parameters predicted across the different repetitions and folds into a single file.
The metrics are computed on the test set using the models trained on the training/validation set. The metrics are saved in subfolders of the `output` folder. All the scripts have parameters that can be provided as arguments.
Continuous Ranked Probability Score (CRPS)
- `compute_crps.py`: Python script to compute the CRPS of the reference models and the U-Net-based methods. Outputs are saved in the `output/{model}/CRPS` folder, where `{model}` is `reference_models` or `unet_models`.
- `plot_crps.py`: Python script to plot the CRPS of the reference models and the U-Net-based methods. Outputs are saved in the `output/plots/CRPS` folder.
- `plot_crpss_raw.py`: Python script to plot the Continuous Ranked Probability Skill Score (CRPSS) of the reference models and the U-Net-based methods with respect to the raw ensemble. Outputs are saved in the `output/plots/CRPSS_raw` folder.
- `plot_crpss_qrf.py`: Python script to plot the CRPSS of the TQRF models and the U-Net-based methods with respect to the best QRF. Outputs are saved in the `output/plots/CRPSS_qrf` folder.
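For reference, the empirical CRPS of an ensemble forecast can be sketched as below. This is an illustrative implementation of the standard formula, not the repository's code (which relies on scoringRules): $\mathrm{CRPS} = \frac{1}{m}\sum_i |x_i - y| - \frac{1}{2m^2}\sum_{i,j} |x_i - x_j|$.

```python
import numpy as np

# Illustrative sketch of the empirical CRPS of an ensemble forecast
# (not the repository's implementation, which relies on scoringRules).
def crps_ensemble(members, obs):
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

# An ensemble concentrated on the observation scores a perfect 0
print(crps_ensemble([2.0, 2.0, 2.0], 2.0))  # 0.0
```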
Rank Histograms
- `compute_rank_histograms.py`: Python script to compute the rank histograms of the reference models and the U-Net-based methods. Outputs are saved in the `output/{model}/RankHistograms` folder, where `{model}` is `reference_models` or `unet_models`.
- `plot_rank_histograms.py`: Python script to plot the rank histograms of the reference models and the U-Net-based methods. Outputs are saved in the `output/plots/RankHistograms` folder.
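A rank histogram records, for each forecast case, the rank of the observation among the ensemble members; a calibrated ensemble yields a flat histogram. The sketch below is illustrative only (ties are ignored for simplicity) and is not the repository's implementation.

```python
import numpy as np

# Illustrative rank-histogram sketch: count how often the observation
# falls in each rank position among the ensemble members (ties ignored).
def rank_histogram(ensembles, observations):
    # ensembles: (n_cases, n_member); observations: (n_cases,)
    ranks = np.sum(ensembles < observations[:, None], axis=1)
    return np.bincount(ranks, minlength=ensembles.shape[1] + 1)

rng = np.random.default_rng(1)
ens = rng.normal(size=(5000, 9))
obs = rng.normal(size=5000)  # same distribution as the members -> ~flat
hist = rank_histogram(ens, obs)
print(hist.sum())  # 5000
```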
Receiver Operating Characteristic (ROC) curve
- `plot_roc.py`: Python script to plot the ROC curve of the reference models and the U-Net-based methods. Outputs are saved in the `output/plots/ROC` folder.
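For a probabilistic forecast of a binary event (e.g., precipitation exceeding a threshold), ROC points are obtained by sweeping a probability threshold and computing the hit rate against the false-alarm rate. The sketch below is illustrative and assumes arrays of forecast probabilities and binary observations; it is not the repository's implementation.

```python
import numpy as np

# Illustrative ROC sketch: for each probability threshold, compute the
# hit rate (hits / observed events) and false-alarm rate (false alarms /
# non-events) of the binary warning "forecast probability >= threshold".
def roc_points(prob, event, thresholds):
    prob, event = np.asarray(prob), np.asarray(event, dtype=bool)
    hr, far = [], []
    for t in thresholds:
        warn = prob >= t
        hr.append(np.mean(warn[event]))
        far.append(np.mean(warn[~event]))
    return np.array(far), np.array(hr)

prob = np.array([0.1, 0.4, 0.35, 0.8])
event = np.array([0, 0, 1, 1])
far, hr = roc_points(prob, event, thresholds=[0.5])
print(far[0], hr[0])  # 0.0 0.5
```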
Here is a non-exhaustive list of the libraries and references used in this repository:
- scoringRules : R package to compute scoring rules.
- ranger : R package providing a fast implementation of random forests.
- cartopy : Python package for cartographic data visualization.
- reticulate : R package providing interoperability between Python and R.
- TensorFlow and Keras : Python libraries for deep learning.
If you have any questions or feedback, please do not hesitate to open an issue on this repository. We will do our best to answer your questions and improve the code if necessary.