docs/joss/paper.md
bibliography: paper.bib
---
# Summary
The heatwave diagnostics package (`HDP`) is a Python package that provides the climate research community with tools to compute heatwave metrics for the large volumes of data produced by earth system model large ensembles, across multiple measures of heat, extreme heat thresholds, and heatwave definitions. The `HDP` leverages performance-oriented design using xarray, Dask, and Numba to maximize the use of available hardware resources while maintaining accessibility through an intuitive interface and well-documented user guide. This approach empowers the user to generate metrics for a wide and diverse range of heatwave types across the parameter space.
# Statement of Need
Metrics such as heatwave frequency and duration are commonly used in hazard assessments, but there are few centralized tools and no universal heatwave criteria for computing them. This has resulted in parameter heterogeneity across the literature and has prompted some studies to adopt multiple definitions to build robustness [@perkins_review_2015]. However, many studies rely on only a handful of metrics and definitions due to the excessive data management and computational burden of sampling a greater number of parameters [@perkins_measurement_2013]. The introduction of large ensembles has further complicated the development of software tools, which have remained mostly specific to individual studies. Some generalized tools have been developed to address this problem, but do not contain explicit methods for evaluating the potential sensitivities of heatwave hazard to the choices of heat measure, extreme heat threshold, and heatwave definition.
Development of the `HDP` was started in 2023 primarily to address the computational obstacles around handling terabyte-scale large ensembles, but quickly evolved to investigate new scientific questions around how the selection of characteristic heatwave parameters may impact hazard analysis. The `HDP` can provide insight into how the spatial-temporal response of heatwaves to climate perturbations depends on the choice of heatwave parameters. Although software does exist for calculating heatwave metrics (e.g. [heatwave3](https://robwschlegel.github.io/heatwave3/index.html), [xclim](https://xclim.readthedocs.io/en/stable/indices.html), [ehfheatwaves](https://tammasloughran.github.io/ehfheatwaves/)), these tools are not optimized to analyze more than a few definitions and thresholds at a time nor do they offer diagnostic plots.
# Key Features
## Extension of XArray with Implementations of Dask and Numba
`xarray` is a popular Python package used for geospatial analysis and for working with the netCDF files produced by climate models. The `HDP` workflow is based around `xarray` and seamlessly integrates with the `xarray.DataArray` data structure. Parallelization of `HDP` functions is achieved through the integration of `dask` with automated chunking and task graph construction features built into the `xarray` library.
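As a minimal illustration of this pattern (not code from the `HDP` itself), chunking an `xarray.DataArray` with Dask turns subsequent operations into a lazy, parallel task graph:

```python
import numpy as np
import xarray as xr

# Synthetic daily-maximum temperature field (time, lat, lon), in Kelvin.
tasmax = xr.DataArray(
    290.0 + 10.0 * np.random.rand(365, 10, 20),
    dims=("time", "lat", "lon"),
    name="tasmax",
)

# Chunking swaps the in-memory array for a lazy Dask array; each chunk
# becomes an independent task once a computation is triggered.
tasmax = tasmax.chunk({"time": 100})

# Operations stay lazy until .compute() (or .values) is called.
annual_max = tasmax.max("time").compute()
print(annual_max.shape)  # (10, 20)
```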
## Heatwave Metrics for Multiple Measures, Thresholds, and Definitions
: Description of the heatwave metrics produced by the HDP. \label{table:metrics}
## Diagnostic Notebooks and Figures
The automatic workflow compiles a "figure deck" containing diagnostic plots for multiple heatwave parameters and input variables. To simplify this process, figure decks are serialized and stored in a single Jupyter Notebook separated into descriptive sections, with a brief description included in a markdown cell above each figure. The `HDPNotebook` class in `hdp.graphics.notebook` generates these notebooks internally, but it can also be called through the API to build custom notebooks. An example notebook of the standard figure deck is shown in Figure \ref{fig:notebook}.
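As a rough sketch of this mechanism (using the independent `nbformat` library directly rather than the `HDPNotebook` API), a sectioned figure-deck notebook can be serialized as follows:

```python
import nbformat
from nbformat.v4 import new_code_cell, new_markdown_cell, new_notebook

# One markdown description cell above each figure-producing code cell,
# mirroring the sectioned layout of an HDP figure deck.
nb = new_notebook()
nb.cells.append(new_markdown_cell("## Heatwave frequency\nEnsemble-mean heatwave days per year."))
nb.cells.append(new_code_cell("fig_frequency  # placeholder for a stored figure"))

nbformat.write(nb, "figure_deck_example.ipynb")
```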
docs/user.rst
The HDP can be installed using PyPI.
Quick Start
-----------
Below is example code that computes heatwave metrics for multiple measures, thresholds, and definitions from sample data generated by the HDP. Heatwave metrics are obtained for the "warming" data by comparing against the thresholds generated from the "control" data.
The sample data, metric data, and summary figures are all saved to the specified `hdp/docs/sample_data/` directory, but this path can be changed as needed. The sample input data is the same data used in unit testing, where temperature is generated using a sine wave over time with a period of one year and a gradient is applied to decrease the temperature uniformly over latitude. This process is encapsulated in the function `hdp.utils.generate_test_control_dataarray`. For the warming dataset, a slight warming trend is applied uniformly over time to simulate global warming. By generating these input datasets instead of supplying them directly, we reduce the disk space needed to install and use the package with sample data included.
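Conceptually, the generated control and warming data can be reproduced with a short sketch like the one below (the exact amplitudes, coordinates, and calendar used by `hdp.utils.generate_test_control_dataarray` may differ):

```python
import numpy as np
import xarray as xr

time = np.arange(3 * 365)  # three years of daily data
lat = np.linspace(-90.0, 90.0, 16)
lon = np.linspace(0.0, 357.5, 32)

# Sine wave over time with a one-year period (Kelvin).
seasonal = 285.0 + 15.0 * np.sin(2.0 * np.pi * time / 365.0)

# Gradient decreasing temperature uniformly toward the poles.
lat_gradient = -20.0 * np.abs(lat) / 90.0

temp = seasonal[:, None, None] + lat_gradient[None, :, None] + np.zeros((1, 1, lon.size))
control = xr.DataArray(
    temp,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="tasmax",
)

# A slight warming trend applied uniformly over time gives the "warming" data.
warming = control + 0.002 * xr.DataArray(time, dims="time", coords={"time": time})
```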
Example: Generating Heatwave Diagnostics
------------------------------------------
In this first example, we will produce heatwave metrics for one IPCC AR6 emission scenario, SSP3-7.0, run by the CESM2 climate model to produce a large ensemble called the "CESM2 Large Ensemble Community Project" or `LENS2 <https://www.cesm.ucar.edu/community-projects/lens2>`_. We will explore the following set of heatwave parameters:
To fully utilize the performance enhancements offered by the HDP, we must first set up a Dask cluster.
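Starting a local Dask cluster might look like the following sketch (cluster setup varies by system; on HPC machines a scheduler-backed cluster such as one from `dask-jobqueue` would typically be used instead):

```python
from dask.distributed import Client, LocalCluster

# Start workers on the local machine; on HPC systems a dask-jobqueue
# cluster would typically take the place of LocalCluster.
cluster = LocalCluster(n_workers=2, threads_per_worker=2)
client = Client(cluster)

n_workers = len(client.scheduler_info()["workers"])  # should be 2

client.close()
cluster.close()
```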
Once a Dask cluster is initialized, we then need to organize our data into `xarray.DataArray <https://docs.xarray.dev/en/stable/generated/xarray.DataArray.html>`_ objects. The entire HDP is built around xarray data structures to ensure ease of use and remain agnostic to input file types. Since we are working with a large ensemble, we need to make sure to concatenate the ensemble members along a "member" dimension. If we weren't using a large ensemble (a single long-running simulation, for example), we would simply omit this step. To read data from disk, we can use the `xarray.open_mfdataset <https://docs.xarray.dev/en/stable/generated/xarray.open_mfdataset.html>`_ function. Reading and post-processing data will look different from system to system, but the final format should be the same. Below is a list of `xarray.DataArray` objects, with the data structure of the `baseline_tasmax` dataset visualized underneath:
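The concatenation step can be sketched with synthetic arrays as follows (with real data, each member would instead come from `xarray.open_mfdataset`):

```python
import numpy as np
import xarray as xr

# Three synthetic members of daily-maximum temperature (Kelvin).
members = [
    xr.DataArray(
        290.0 + np.random.rand(365, 4, 8),
        dims=("time", "lat", "lon"),
        name="tasmax",
    )
    for _ in range(3)
]

# Stack the members along a new "member" dimension.
ensemble = xr.concat(members, dim="member")
ensemble = ensemble.assign_coords(member=np.arange(3))
print(ensemble.dims)  # ('member', 'time', 'lat', 'lon')
```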
.. code-block:: python
.. image:: assets/tasmax_dataarray_example.png
   :width: 600
The spatial coordinates for latitude and longitude should be named "lat" and "lon" respectively. The "time" coordinates should be decoded into `CFTime` objects, and a "member" dimension should be created if an ensemble is being used.
First, we need to format these measures so that they are in the correct units. This process will also compute heat index values using the relative humidity (rh) datasets.
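As a simple sketch of the unit-formatting step (the HDP's own helpers handle this internally, and the heat index calculation itself is not reproduced here), a Kelvin measure can be converted to degrees Celsius like so:

```python
import numpy as np
import xarray as xr

# A daily-maximum temperature measure stored in Kelvin.
tasmax_k = xr.DataArray(
    np.array([300.15, 310.15]),
    dims="time",
    name="tasmax",
    attrs={"units": "K"},
)

# Convert to degrees Celsius and record the new units (xarray drops
# attributes during arithmetic, so they are reattached explicitly).
tasmax_c = tasmax_k - 273.15
tasmax_c.attrs["units"] = "degC"
print(tasmax_c.values)  # [27. 37.]
```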
Since we are connected to a Dask cluster, we can write the output to a zarr store.
The Regional Aerosol Model Intercomparison Project (RAMIP) is a multi-model large ensemble of earth system model experiments conducted to quantify the role of regional aerosol emissions changes in near-term climate change projections (`Wilcox et al., 2023 <https://gmd.copernicus.org/articles/16/4451/2023/>`_). For the sake of simplicity, we will only investigate CESM2 (one of the 8 models available in this MIP) for this example. For CESM2, there are 10 ensemble members for each of the six model experiments. Each experiment is essentially a different emission scenario where regional aerosol emissions are held constant over different parts of the globe. We will use a historical simulation from 1960 to 1970 produced by CESM2 from the same ensemble as the baseline for calculating the extreme heat threshold.