This repository contains the code for the article "Distributional Regression U-Nets for the Postprocessing of Precipitation Ensemble Forecasts" by R. Pic, C. Dombry, P. Naveau, and M. Taillardat available on arXiv and HAL.
```bibtex
@article{Pic2024DRU,
  author    = {Pic, Romain and Dombry, Clément and Naveau, Philippe and Taillardat, Maxime},
  title     = {Distributional Regression U-Nets for the Postprocessing of Precipitation Ensemble Forecasts},
  journal   = {Artificial Intelligence for the Earth Systems},
  month     = jul,
  year      = {2025},
  doi       = {10.1175/AIES-D-24-0067.1},
  publisher = {American Meteorological Society},
  address   = {Boston MA, USA},
  volume    = {4},
  number    = {4},
  pages     = {240067},
  url       = {https://journals.ametsoc.org/view/journals/aies/aop/AIES-D-24-0067.1/AIES-D-24-0067.1.xml},
}
```

This repository does not provide the data required to run the code. The dataset used is available on Zenodo. Guidance on the expected data shapes and the contents of the files is provided below, in case you want to use the code on your own data.
The code in this repository relies on data in the form of NumPy arrays stored in the `data` folder. The `data` folder should contain the following files:
- `X_trainval.npy`, `Y_trainval.npy`: predictors and observations for the training/validation set.
- `X_test.npy`, `Y_test.npy`: predictors and observations for the test set.
- `X_trainval_log.npy`, `X_test_log.npy`: alternative predictors for the training/validation and test sets, where the raw precipitation is transformed using a log transformation before applying mean/min/max/sd. Used for the DRU models.
- `X_constant.npy`: constant fields (e.g., orography) used for both the training/validation and test sets.
- `X_raw_trainval.npy`, `X_raw_test.npy`: raw ensemble forecasts for the training/validation and test sets.
- `trainval_dow.npy`: day of the week for each sample in the training/validation set.
- `dates.txt`: comma-separated list of the dates corresponding to the data of the training/validation and test sets.
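The log transformation mentioned for `X_trainval_log.npy` / `X_test_log.npy` can be sketched as follows. This is an illustrative sketch only: the array names and the use of `log1p` are assumptions, and the exact transform used in the repository may differ.

```python
import numpy as np

# Illustrative sketch: derive log-transformed ensemble summary statistics.
# `raw` stands in for a raw ensemble of shape (n_samples, W, H, n_member);
# the exact transform used to build X_trainval_log.npy may differ.
rng = np.random.default_rng(0)
raw = rng.gamma(shape=0.5, scale=2.0, size=(4, 8, 8, 17))

log_raw = np.log1p(raw)  # log(1 + x) keeps zero-precipitation values finite

# Mean/min/max/sd over the member axis -> 4 predictors per grid point
stats = np.stack(
    [log_raw.mean(axis=-1), log_raw.min(axis=-1),
     log_raw.max(axis=-1), log_raw.std(axis=-1)],
    axis=-1,
)
print(stats.shape)  # (4, 8, 8, 4)
```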
| File name | Expected shape |
|---|---|
| `X_trainval.npy` / `X_trainval_log.npy` | $(n_{trainval}, W, H, n_{pred})$ |
| `X_test.npy` / `X_test_log.npy` | $(n_{test}, W, H, n_{pred})$ |
| `Y_trainval.npy` | $(n_{trainval}, W, H)$ |
| `Y_test.npy` | $(n_{test}, W, H)$ |
| `trainval_dow.npy` | $(n_{trainval},)$ |
| `X_constant.npy` | $(W, H, n_{constant})$ |
| `X_raw_trainval.npy` | $(n_{trainval}, W, H, n_{member})$ |
| `X_raw_test.npy` | $(n_{test}, W, H, n_{member})$ |
where:

- $n_{trainval}$ is the number of samples in the training/validation set,
- $n_{test}$ is the number of samples in the test set,
- $W$ and $H$ are the width and height of the grid considered,
- $n_{pred}$ is the number of predictors (without the constant fields),
- $n_{constant}$ is the number of constant fields,
- $n_{member}$ is the number of members in the raw ensemble.
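Before running anything, it can be worth checking that your arrays are mutually consistent. The helper below is an illustrative sketch (not part of the repository); it only verifies the sample and grid axes implied by the table above.

```python
import numpy as np

# Illustrative sanity-check helper (not part of the repository): verifies
# that predictor/observation arrays share the sample and grid axes.
def check_shapes(X, Y, X_constant=None, X_raw=None):
    n, W, H, n_pred = X.shape
    assert Y.shape == (n, W, H), "observations must have shape (n, W, H)"
    if X_constant is not None:
        assert X_constant.shape[:2] == (W, H), "constant fields must match grid"
    if X_raw is not None:
        assert X_raw.shape[:3] == (n, W, H), "raw ensemble must match (n, W, H)"
    return n, W, H, n_pred

# Example with synthetic arrays shaped like a small dataset
X = np.zeros((10, 8, 8, 5))
Y = np.zeros((10, 8, 8))
print(check_shapes(X, Y))  # (10, 8, 8, 5)
```

In practice you would load the real files with `np.load("data/X_trainval.npy")` and pass them in.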
The following sections describe the files of the repository in the order of a standard workflow. The `utils` folder contains utility functions used by the scripts. The `output` folder contains the results of the scripts, both raw results and figures.
The models are trained using the predictors (`X_trainval.npy`) and the observations (`Y_trainval.npy`). The models are saved in the `output/reference_models/models` folder. Hyperparameters can be provided as arguments.
Quantile Regression Forests (QRF)
- `qrf_pred.R`: R script to train a QRF at each grid point using the predictors.
QRF with tail extension (TQRF)
- `qrf+gtcnd_pred.R`: R script to train a TQRF for a generalized truncated/censored normal distribution (GTCND) at each grid point using the predictors.
- `qrf+csgd_pred.R`: R script to train a TQRF for a censored shifted gamma distribution (CSGD) at each grid point using the predictors.
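To fix ideas on the censored shifted gamma distribution, here is a hedged sketch of its CDF in the spirit of Scheuerer and Hamill: precipitation is modeled as $Y = \max(X - \delta, 0)$ with $X \sim \mathrm{Gamma}(k, \theta)$, which puts a point mass at zero. The function name and parameterization below are illustrative assumptions; the repository's TQRF code may parameterize the CSGD differently.

```python
import numpy as np
from scipy.stats import gamma

# Illustrative CSGD CDF sketch: Y = max(X - shift, 0), X ~ Gamma(k, theta).
# Parameter names are assumptions, not the repository's parameterization.
def csgd_cdf(y, k, theta, shift):
    y = np.asarray(y, dtype=float)
    # For y >= 0, P(Y <= y) = P(X <= y + shift); zero below the censoring point
    return np.where(y < 0.0, 0.0, gamma.cdf(y + shift, a=k, scale=theta))

# Probability of zero precipitation: P(Y = 0) = P(X <= shift)
p_dry = csgd_cdf(0.0, k=1.2, theta=2.0, shift=0.5)
print(float(p_dry))
```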
The U-Net-based methods are trained on the predictors (`X_trainval.npy`) and the observations (`Y_trainval.npy`). The models are saved in the `output/unet_models/parameters` folder. Hyperparameters can be provided as arguments.
- `unet_pred.py`: Python script to train a U-Net model over the whole grid.
- `group_seq.py`: Python script to group the parameters predicted across the different repetitions and folds into a single file.
The metrics are computed on the test set using the models trained on the training/validation set. The metrics are saved in subfolders of the `output` folder. All the scripts have parameters that can be provided as arguments.
Continuous Ranked Probability Score (CRPS)
- `compute_crps.py`: Python script to compute the CRPS of the reference models and the U-Net-based methods. Outputs are saved in the `output/{model}/CRPS` folder, where `{model}` is `reference_models` or `unet_models`.
- `plot_crps.py`: Python script to plot the CRPS of the reference models and the U-Net-based methods. Outputs are saved in the `output/plots/CRPS` folder.
- `plot_crpss_raw.py`: Python script to plot the Continuous Ranked Probability Skill Score (CRPSS) of the reference models and the U-Net-based methods with respect to the raw ensemble. Outputs are saved in the `output/plots/CRPSS_raw` folder.
- `plot_crpss_qrf.py`: Python script to plot the CRPSS of the TQRF models and the U-Net-based methods with respect to the best QRF. Outputs are saved in the `output/plots/CRPSS_qrf` folder.
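For reference, the empirical CRPS of an ensemble forecast can be sketched as below. This is an illustrative implementation of the standard formula, not the repository's code (which relies on scoringRules): $\mathrm{CRPS} = \frac{1}{m}\sum_i |x_i - y| - \frac{1}{2m^2}\sum_{i,j} |x_i - x_j|$.

```python
import numpy as np

# Illustrative sketch of the empirical CRPS of an ensemble forecast
# (not the repository's implementation, which relies on scoringRules).
def crps_ensemble(members, obs):
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

# An ensemble concentrated on the observation scores a perfect 0
print(crps_ensemble([2.0, 2.0, 2.0], 2.0))  # 0.0
```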
Rank Histograms
- `compute_rank_histograms.py`: Python script to compute the rank histograms of the reference models and the U-Net-based methods. Outputs are saved in the `output/{model}/RankHistograms` folder, where `{model}` is `reference_models` or `unet_models`.
- `plot_rank_histograms.py`: Python script to plot the rank histograms of the reference models and the U-Net-based methods. Outputs are saved in the `output/plots/RankHistograms` folder.
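A rank histogram records, for each forecast case, the rank of the observation among the ensemble members; a calibrated ensemble yields a flat histogram. The sketch below is illustrative only (ties are ignored for simplicity) and is not the repository's implementation.

```python
import numpy as np

# Illustrative rank-histogram sketch: count how often the observation
# falls in each rank position among the ensemble members (ties ignored).
def rank_histogram(ensembles, observations):
    # ensembles: (n_cases, n_member); observations: (n_cases,)
    ranks = np.sum(ensembles < observations[:, None], axis=1)
    return np.bincount(ranks, minlength=ensembles.shape[1] + 1)

rng = np.random.default_rng(1)
ens = rng.normal(size=(5000, 9))
obs = rng.normal(size=5000)  # same distribution as the members -> ~flat
hist = rank_histogram(ens, obs)
print(hist.sum())  # 5000
```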
Receiver Operating Characteristic (ROC) curve
- `plot_roc.py`: Python script to plot the ROC curve of the reference models and the U-Net-based methods. Outputs are saved in the `output/plots/ROC` folder.
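For a probabilistic forecast of a binary event (e.g., precipitation exceeding a threshold), ROC points are obtained by sweeping a probability threshold and computing the hit rate against the false-alarm rate. The sketch below is illustrative and assumes arrays of forecast probabilities and binary observations; it is not the repository's implementation.

```python
import numpy as np

# Illustrative ROC sketch: for each probability threshold, compute the
# hit rate (hits / observed events) and false-alarm rate (false alarms /
# non-events) of the binary warning "forecast probability >= threshold".
def roc_points(prob, event, thresholds):
    prob, event = np.asarray(prob), np.asarray(event, dtype=bool)
    hr, far = [], []
    for t in thresholds:
        warn = prob >= t
        hr.append(np.mean(warn[event]))
        far.append(np.mean(warn[~event]))
    return np.array(far), np.array(hr)

prob = np.array([0.1, 0.4, 0.35, 0.8])
event = np.array([0, 0, 1, 1])
far, hr = roc_points(prob, event, thresholds=[0.5])
print(far[0], hr[0])  # 0.0 0.5
```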
Here is a non-exhaustive list of the libraries and references used in this repository:
- scoringRules : R package to compute scoring rules.
- ranger : R package providing a fast implementation of random forests.
- cartopy : Python package for cartographic data visualization.
- reticulate : R package providing interoperability between Python and R.
- TensorFlow and Keras : Python libraries for deep learning.
If you have any questions or feedback, please do not hesitate to open an issue on this repository. We will do our best to answer your questions and improve the code if necessary.