Commit e2c68d5

Merge pull request #21 from AgentOxygen/docs-update
Updated documentation
2 parents 924dced + 5477937

8 files changed: 142 additions & 61 deletions


.readthedocs.yaml

Lines changed: 3 additions & 1 deletion

```diff
@@ -19,4 +19,6 @@ sphinx:
 # See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
 python:
   install:
-    - requirements: docs/requirements.txt
+    - requirements: requirements.txt
+    - method: pip
+      path: .
```

Dockerfile

Lines changed: 1 addition & 0 deletions

```diff
@@ -5,6 +5,7 @@ WORKDIR /project
 COPY . .
 
 RUN pip install --upgrade pip
+RUN pip install -r requirements.txt
 RUN pip install pytest sphinx sphinx-autobuild
 
 RUN pip install -e .
```

README.md

Lines changed: 22 additions & 17 deletions

````diff
@@ -22,38 +22,43 @@ To learn more about the HDP and how to use it, check out the full ReadTheDocs do
 The code block below showcases an example HDP workflow for a 400 GB high performance computer:
 
 ```
-from dask.distributed import Client, LocalCluster
-import numpy as np
-import xarray
 import hdp
+import numpy as np
+
+output_dir = "."
 
-cluster = LocalCluster(n_workers=10, memory_limit="40GB", threads_per_worker=1, processes=True)
-client = Client(cluster)
-
-input_dir = "/local1/climate_model_output/"
-
-baseline_tasmax = xarray.open_zarr(f"{input_dir}CESM2_historical_day_tasmax.zarr")["tasmax"]
-test_tasmax = xarray.open_zarr(f"{input_dir}CESM2_ssp370_day_tasmax.zarr")["tasmax"]
+sample_control_temp = hdp.utils.generate_test_control_dataarray()
+sample_warming_temp = hdp.utils.generate_test_warming_dataarray()
 
-baseline_measures = hdp.measure.format_standard_measures(temp_datasets=[baseline_tasmax])
-test_measures = hdp.measure.format_standard_measures(temp_datasets=[test_tasmax])
+baseline_measures = hdp.measure.format_standard_measures(
+    temp_datasets=[sample_control_temp]
+)
+test_measures = hdp.measure.format_standard_measures(
+    temp_datasets=[sample_warming_temp]
+)
 
 percentiles = np.arange(0.9, 1.0, 0.01)
 
-
 thresholds_dataset = hdp.threshold.compute_thresholds(
     baseline_measures,
     percentiles
 )
 
 definitions = [[3,0,0], [3,1,1], [4,0,0], [4,1,1], [5,0,0], [5,1,1]]
 
-metrics_dataset = hdp.metric.compute_group_metrics(test_measures, thresholds_dataset, definitions)
-metrics_dataset = metrics_dataset.to_zarr("/local1/test_metrics.zarr", mode='w')
+metrics_dataset = hdp.metric.compute_group_metrics(test_measures, thresholds_dataset, definitions, include_threshold=True)
+metrics_dataset.to_netcdf(f"{output_dir}/sample_hw_metrics.nc", mode='w')
+
+figure_notebook = create_notebook(metrics_dataset)
+figure_notebook.save_notebook(f"{output_dir}/sample_hw_summary_figures.ipynb")
+
+sample_control_temp = sample_control_temp.to_dataset()
+sample_control_temp.attrs["description"] = "Mock control temperature dataset generated by HDP for unit testing."
+sample_control_temp.to_netcdf(f"{output_dir}/sample_control_temp.nc", mode='w')
 
-figure_notebook = hdp.hdp.create_notebook(metrics_dataset)
-figure_notebook.save_notebook("/local1/heatwave_summary_figures.ipynb")
+sample_warming_temp = sample_warming_temp.to_dataset()
+sample_warming_temp.attrs["description"] = "Mock temperature dataset with warming trend generated by HDP for unit testing."
+sample_warming_temp.to_netcdf(f"{output_dir}/sample_warming_temp.nc", mode='w')
 ```
 
 # Contributing
````
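The `compute_thresholds` step in the README workflow above takes baseline measures and an array of percentiles. As a generic illustration of what percentile thresholding means conceptually (a plain NumPy sketch, not the HDP implementation, which operates on chunked xarray/Dask structures):

```python
import numpy as np

# Generic sketch of percentile-threshold computation (not the HDP implementation):
# for each grid cell, the q-th quantile of the baseline series defines the
# extreme-heat threshold that a test series is later compared against.
rng = np.random.default_rng(0)
baseline = 300 + rng.normal(0, 5, size=(3650, 4, 4))     # synthetic (time, lat, lon) temperatures
percentiles = np.arange(0.9, 1.0, 0.01)                  # same range as the examples above
thresholds = np.quantile(baseline, percentiles, axis=0)  # (percentile, lat, lon)

# A "hot day" mask: days exceeding the per-cell threshold at each percentile level.
exceed = baseline[None, :, :, :] > thresholds[:, None, :, :]
```

Higher percentiles necessarily give higher (or equal) thresholds per cell, which is why a sweep like `np.arange(0.9, 1.0, 0.01)` yields a monotone family of hot-day masks.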

docs/joss/paper.md

Lines changed: 4 additions & 0 deletions

```diff
@@ -22,6 +22,7 @@ bibliography: paper.bib
 ---
 
 # Summary
+
 The heatwave diagnostics package (`HDP`) is a Python package that provides the climate research community with tools to compute heatwave metrics for the large volumes of data produced by earth system model large ensembles, across multiple measures of heat, extreme heat thresholds, and heatwave definitions. The `HDP` leverages performance-oriented design using xarray, Dask, and Numba to maximize the use of available hardware resources while maintaining accessibility through an intuitive interface and well-documented user guide. This approach empowers the user to generate metrics for a wide and diverse range of heatwave types across the parameter space.
 
 # Statement of Need
@@ -30,9 +31,11 @@ Accurate quantification of the evolution of heatwave trends in climate model out
 Metrics such as heatwave frequency and duration are commonly used in hazard assessments, but there are few centralized tools and no universal heatwave criteria for computing them. This has resulted in parameter heterogeneity across the literature and has prompted some studies to adopt multiple definitions to build robustness (@perkins_review_2015). However, many studies rely on only a handful of metrics and definitions due to the excessive data management and computational burden of sampling a greater number of parameters (@perkins_measurement_2013). The introduction of large ensembles has further complicated the development of software tools, which have remained mostly specific to individual studies. Some generalized tools have been developed to address this problem, but do not contain explicit methods for evaluating the potential sensitivities of heatwave hazard to the choices of heat measure, extreme heat threshold, and heatwave definition.
 
 Development of the `HDP` was started in 2023 primarily to address the computational obstacles around handling terabyte-scale large ensembles, but quickly evolved to investigate new scientific questions around how the selection of characteristic heatwave parameters may impact hazard analysis. The `HDP` can provide insight into how the spatial-temporal response of heatwaves to climate perturbations depends on the choice of heatwave parameters. Although software does exist for calculating heatwave metrics (e.g. [heatwave3](https://robwschlegel.github.io/heatwave3/index.html), [xclim](https://xclim.readthedocs.io/en/stable/indices.html), [ehfheatwaves](https://tammasloughran.github.io/ehfheatwaves/)), these tools are not optimized to analyze more than a few definitions and thresholds at a time nor do they offer diagnostic plots.
+
 # Key Features
 
 ## Extension of XArray with Implementations of Dask and Numba
+
 `xarray` is a popular Python package used for geospatial analysis and for working with the netCDF files produced by climate models. The `HDP` workflow is based around `xarray` and seamlessly integrates with the `xarray.DataArray` data structure. Parallelization of `HDP` functions is achieved through the integration of `dask` with automated chunking and task graph construction features built into the `xarray` library.
 
 ## Heatwave Metrics for Multiple Measures, Thresholds, and Definitions
@@ -59,6 +62,7 @@ The `HDP` allows the user to test a range of parameter values: for example, heat
 : Description of the heatwave metrics produced by the HDP. \label{table:metrics}
 
 ## Diagnostic Notebooks and Figures
+
 The automatic workflow compiles a "figure deck" containing diagnostic plots for multiple heatwave parameters and input variables. To simplify this process, figure decks are serialized and stored in a single Jupyter Notebook separated into descriptive sections. Basic descriptions are included in markdown cells at the top of each figure. The `HDPNotebook` class in `hdp.graphics.notebook` is utilized to facilitate the generation of these Notebooks internally, but can be called through the API as well to build custom notebooks. An example of a Notebook of the standard figure deck is shown in Figure \ref{fig:notebook}.
 
 ![Example of an HDP standard figure deck \label{fig:notebook}](HDP_Notebook_Example.png)
```
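The paper above describes metrics such as heatwave frequency computed across definitions like `[3,0,0]`. As a generic, self-contained illustration of run-length-based event counting (a sketch only: the assumption that a definition's first element is a minimum duration in days is illustrative, and this is not HDP code):

```python
import numpy as np

# Generic run-length sketch (not HDP code): count heatwave "events", defined
# here as runs of at least `min_days` consecutive hot days in a boolean series.
def count_heatwaves(hot, min_days=3):
    # Pad with False so every run has both a start and an end transition.
    padded = np.concatenate(([False], np.asarray(hot, dtype=bool), [False]))
    starts = np.flatnonzero(~padded[:-1] & padded[1:])
    ends = np.flatnonzero(padded[:-1] & ~padded[1:])
    return int(np.sum((ends - starts) >= min_days))

hot = np.array([0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0], dtype=bool)
print(count_heatwaves(hot, min_days=3))  # 2 events (runs of length 3 and 4)
```

Applying such a kernel per grid cell and per (measure, threshold, definition) combination is exactly the kind of parameter sweep the paper argues becomes expensive without a performance-oriented design.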

docs/joss/paper.pdf

419 Bytes
Binary file not shown.

docs/sample_data/sample.py

Lines changed: 63 additions & 0 deletions

```python
#!/usr/bin/env python
"""
hdp.py

Heatwave Diagnostics Package (HDP)

Entry point for package.

Developer: Cameron Cummins
Contact: cameron.cummins@utexas.edu
"""
from hdp.graphics.notebook import create_notebook
from os.path import isdir
import hdp.measure
import hdp.threshold
import hdp.metric
import numpy as np
import sys


def generate_sample_data(output_dir):
    if not isdir(output_dir):
        raise RuntimeError(f"Output directory '{output_dir}' does not exist!")

    sample_control_temp = hdp.utils.generate_test_control_dataarray()
    sample_warming_temp = hdp.utils.generate_test_warming_dataarray()

    baseline_measures = hdp.measure.format_standard_measures(
        temp_datasets=[sample_control_temp]
    )
    test_measures = hdp.measure.format_standard_measures(
        temp_datasets=[sample_warming_temp]
    )

    percentiles = np.arange(0.9, 1.0, 0.01)

    thresholds_dataset = hdp.threshold.compute_thresholds(
        baseline_measures,
        percentiles
    )

    definitions = [[3,0,0], [3,1,1], [4,0,0], [4,1,1], [5,0,0], [5,1,1]]

    metrics_dataset = hdp.metric.compute_group_metrics(test_measures, thresholds_dataset, definitions, include_threshold=True)
    metrics_dataset.to_netcdf(f"{output_dir}/sample_hw_metrics.nc", mode='w')

    figure_notebook = create_notebook(metrics_dataset)
    figure_notebook.save_notebook(f"{output_dir}/sample_hw_summary_figures.ipynb")

    sample_control_temp = sample_control_temp.to_dataset()
    sample_control_temp.attrs["description"] = "Mock control temperature dataset generated by HDP for unit testing."
    sample_control_temp.to_netcdf(f"{output_dir}/sample_control_temp.nc", mode='w')

    sample_warming_temp = sample_warming_temp.to_dataset()
    sample_warming_temp.attrs["description"] = "Mock temperature dataset with warming trend generated by HDP for unit testing."
    sample_warming_temp.to_netcdf(f"{output_dir}/sample_warming_temp.nc", mode='w')


if __name__ == "__main__":
    print("Generating testing data and simulating a full data-to-figure workflow: ")
    if len(sys.argv) != 2:
        assert Exception("Please specifiy the path to output sample data and results to.")
    generate_sample_data(sys.argv[1])
    print("Done!")
```
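The script above relies on `hdp.utils.generate_test_control_dataarray`, which the user guide describes as building temperature from a sine wave with a one-year period plus a gradient that decreases temperature over latitude. A hypothetical sketch of such a generator (the real HDP function's signature, values, and coordinates will differ):

```python
import numpy as np
import xarray as xr

# Hypothetical sketch of a sine-wave test-temperature generator, in the spirit
# of hdp.utils.generate_test_control_dataarray (not the real implementation):
# an annual sine cycle in time plus a gradient that cools toward the poles.
def generate_control_dataarray(n_years=2, n_lat=8, n_lon=8):
    days = np.arange(n_years * 365)
    lat = np.linspace(-90, 90, n_lat)
    lon = np.linspace(0, 360, n_lon, endpoint=False)
    seasonal = 10 * np.sin(2 * np.pi * days / 365)   # annual cycle, amplitude 10 K
    lat_gradient = -0.3 * np.abs(lat)                # uniformly colder toward the poles
    data = 300 + seasonal[:, None, None] + lat_gradient[None, :, None]
    data = np.broadcast_to(data, (days.size, n_lat, n_lon)).copy()
    return xr.DataArray(data, coords={"time": days, "lat": lat, "lon": lon},
                        dims=("time", "lat", "lon"), name="tasmax")

control = generate_control_dataarray()
```

Generating deterministic input like this, rather than shipping netCDF files, is what lets the package include sample data without extra disk space, as the user guide notes.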

docs/user.rst

Lines changed: 47 additions & 43 deletions

```diff
@@ -30,45 +30,60 @@ The HDP can be installed using PyPI. You can view the webpage `here <https://pyp
 Quick Start
 -----------
-Below is example code that computes heatwave metrics for multiple measures, thresholds, and definitions. Heatwave metrics are obtained for the test dataset by comparing against the thresholds generated from the baseline dataset.
+Below is example code that computes heatwave metrics for multiple measures, thresholds, and definitions from sample data generated by the HDP. Heatwave metrics are obtained for the "warming" data by comparing against the thresholds generated from the "control" data.
 
 .. code-block:: python
 
-    from dask.distributed import Client, LocalCluster
-    import numpy as np
-    import xarray
     import hdp
-
-
-    cluster = LocalCluster(n_workers=10, memory_limit="40GB", threads_per_worker=1, processes=True)
-    client = Client(cluster)
-
-    input_dir = "/local1/climate_model_output/"
-
-    baseline_tasmax = xarray.open_zarr(f"{input_dir}CESM2_historical_day_tasmax.zarr")["tasmax"]
-    test_tasmax = xarray.open_zarr(f"{input_dir}CESM2_ssp370_day_tasmax.zarr")["tasmax"]
-
-    baseline_measures = hdp.measure.format_standard_measures(temp_datasets=[baseline_tasmax])
-    test_measures = hdp.measure.format_standard_measures(temp_datasets=[test_tasmax])
-
+    import numpy as np
+
+    output_dir = "."
+
+    sample_control_temp = hdp.utils.generate_test_control_dataarray()
+    sample_warming_temp = hdp.utils.generate_test_warming_dataarray()
+
+    baseline_measures = hdp.measure.format_standard_measures(
+        temp_datasets=[sample_control_temp]
+    )
+    test_measures = hdp.measure.format_standard_measures(
+        temp_datasets=[sample_warming_temp]
+    )
+
     percentiles = np.arange(0.9, 1.0, 0.01)
-
-
+
     thresholds_dataset = hdp.threshold.compute_thresholds(
         baseline_measures,
         percentiles
     )
-
+
     definitions = [[3,0,0], [3,1,1], [4,0,0], [4,1,1], [5,0,0], [5,1,1]]
-
-    metrics_dataset = hdp.metric.compute_group_metrics(test_measures, thresholds_dataset, definitions)
-    metrics_dataset = metrics_dataset.to_zarr("/local1/test_metrics.zarr", mode='w')
-
-    figure_notebook = hdp.hdp.create_notebook(metrics_dataset)
-    figure_notebook.save_notebook("/local1/heatwave_summary_figures.ipynb")
-
-Example 1: Generating Heatwave Diagnostics
+
+    metrics_dataset = hdp.metric.compute_group_metrics(test_measures, thresholds_dataset, definitions, include_threshold=True)
+    metrics_dataset.to_netcdf(f"{output_dir}/sample_hw_metrics.nc", mode='w')
+
+    figure_notebook = create_notebook(metrics_dataset)
+    figure_notebook.save_notebook(f"{output_dir}/sample_hw_summary_figures.ipynb")
+
+    sample_control_temp = sample_control_temp.to_dataset()
+    sample_control_temp.attrs["description"] = "Mock control temperature dataset generated by HDP for unit testing."
+    sample_control_temp.to_netcdf(f"{output_dir}/sample_control_temp.nc", mode='w')
+
+    sample_warming_temp = sample_warming_temp.to_dataset()
+    sample_warming_temp.attrs["description"] = "Mock temperature dataset with warming trend generated by HDP for unit testing."
+    sample_warming_temp.to_netcdf(f"{output_dir}/sample_warming_temp.nc", mode='w')
+
+This code snippet is included in the HDP source code and can be executed via:
+
+.. code-block:: console
+
+    $ git clone https://github.com/AgentOxygen/HDP.git
+    $ cd HDP
+    $ python hdp/docs/sample_data/sample.py hdp/docs/sample_data/
+
+The sample data, metric data, and summary figures are all saved to the specified `hdp/docs/sample_data/` but this path can be changed as needed. The sample input data is the same data used in unit testing, where temperature is generated using a sine wave over time with a period of one year and a gradient is applied to decrease the temperature uniformly over latitude. This processes is encapsulated in the function `hdp.utils.generate_test_control_dataarray`. For the warming dataset, a slight warming trend is applied uniformly over time to simulate global warming. By generating these input datasets instead of supplying them directly, we reduce disk space needed to install/use the package with sample data included.
+
+Example: Generating Heatwave Diagnostics
 ------------------------------------------
 In this first example, we will produce heatwave metrics for one IPCC AR6 emission scenario, SSP3-7.0, run by the CESM2 climate model to produce a large ensemble called the "CESM2 Large Ensemble Community Project" or `LENS2 <https://www.cesm.ucar.edu/community-projects/lens2>`_. We will explore the following set of heatwave parameters:
 
@@ -98,11 +113,11 @@ To fully utilize the performance enhancments offered by the HDP, we must first s
 .. code-block:: python
 
     from dask.distributed import Client, LocalCluster
-    cluster = LocalCluster(n_workers=20, memory_limit="10GB", threads_per_worker=1, processes=True, dashboard_address=":8004")
+    cluster = LocalCluster(n_workers=20, memory_limit="10GB", threads_per_worker=1, processes=True)
     client = Client(cluster)
 
-Once a Dask cluster is initialized, we then need to organize our data into `xarray.DataArray <https://docs.xarray.dev/en/stable/generated/xarray.DataArray.html>`_ objects. The entire HDP is built around xarray data structures to ensure ease of use and remain agnostic to input file types. Since we are working with a large ensemble, we need to make sure to concatenate the ensemble members along a "member" dimension. If we weren't using a large ensemble (a single long-running simulation for example), we would just omit this step. To read data from disk, we can use the `xarray.open_mfdataset <https://docs.xarray.dev/en/stable/generated/xarray.open_mfdataset.html>`_ function. Reading and post-processing data will look different from system to system, but the final format should be the same. Below is a list of xarray.DataArrays with the data structure for baseline_tasmax dataset visualized below:
+Once a Dask cluster is initialized, we then need to organize our data into `xarray.DataArray <https://docs.xarray.dev/en/stable/generated/xarray.DataArray.html>`_ objects. The entire HDP is built around xarray data structures to ensure ease of use and remain agnostic to input file types. Since we are working with a large ensemble, we need to make sure to concatenate the ensemble members along a "member" dimension. If we weren't using a large ensemble (a single long-running simulation for example), we would just omit this step. To read data from disk, we can use the `xarray.open_mfdataset <https://docs.xarray.dev/en/stable/generated/xarray.open_mfdataset.html>`_ function. Reading and post-processing data will look different from system to system, but the final format should be the same. Below is a list of `xarray.DataArrays` with the data structure for `baseline_tasmax` dataset visualized below:
 
 .. code-block:: python
 
@@ -116,7 +131,7 @@ Once a Dask cluster is initialized, we then need to organize our data into `xarr
 .. image:: assets/tasmax_dataarray_example.png
    :width: 600
 
-The spatial coordinates for latitude and longitude should be named "lat" and "lon" respectively. The "time" coordinates should be decoded into CFTime objects and a "member" dimension should be created if an ensemble is being used.
+The spatial coordinates for latitude and longitude should be named "lat" and "lon" respectively. The "time" coordinates should be decoded into `CFTime`` objects and a "member" dimension should be created if an ensemble is being used.
 
 To begin, we first need to format these measures so that they are in the correct units. This process will also compute heat index values using the relative humidity (rh) datasets.
 
@@ -156,18 +171,7 @@ Since we are connected to a Dask cluster, we can write the output to a zarr stor
 .. code-block:: python
 
-    metrics_dataset.to_zarr("/local1/lens2_ssp370_hw_metrics.zarr", mode='w', compute=True)
-
-
-:ref:`example_2`
-
-Example 2: RAMIP Analysis
--------------------------
-The Regional Aerosol Model Intercomparison Project (RAMIP) is a multi-model large ensemble of earth system model experiments conducted to quantify the role of regional aerosol emissions changes in near-term climate change projections (`Wilcox et al., 2023 <https://gmd.copernicus.org/articles/16/4451/2023/>`_). For the sake of simplicity, we will only investigate CESM2 (one of the 8 models available in this MIP) for this example. For CESM2, there are 10 ensemble members for each of the six model experiments. Each experiment is essentially a different emission scenario where regional aerosol emissions are held constant over different parts of the globe. We will use a historical simulation from 1960 to 1970 run produced by CESM2 from the same ensemble as the baseline for calculating the extreme heat threshold.
-
+    metrics_dataset.to_zarr("lens2_ssp370_hw_metrics.zarr", mode='w', compute=True)
 
-:ref:`threshold_calc`
-Threshold Calculation
----------------------
```
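The user guide above notes that ensemble members must be concatenated along a "member" dimension before being handed to the HDP. A minimal sketch of that step with synthetic data (not HDP code; real data would come from `xarray.open_mfdataset` and carry decoded CFTime coordinates):

```python
import numpy as np
import xarray as xr

# Sketch (synthetic data, not HDP code): each ensemble member is a
# (time, lat, lon) DataArray; concatenating along a new "member" dimension
# yields the (member, time, lat, lon) layout described in the user guide.
lat = np.linspace(-90, 90, 4)
lon = np.linspace(0, 270, 4)
time = np.arange(10)  # placeholder; real data should use decoded CFTime objects

def make_member(seed):
    rng = np.random.default_rng(seed)
    data = 290 + rng.normal(0, 5, size=(time.size, lat.size, lon.size))
    return xr.DataArray(data, coords={"time": time, "lat": lat, "lon": lon},
                        dims=("time", "lat", "lon"), name="tasmax")

# Concatenate members along a new "member" dimension, then label it.
ensemble = xr.concat([make_member(i) for i in range(3)], dim="member")
ensemble = ensemble.assign_coords(member=["r1", "r2", "r3"])
print(ensemble.dims)  # ('member', 'time', 'lat', 'lon')
```

For a single long-running simulation rather than an ensemble, this concatenation step is simply omitted, as the guide states.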
Lines changed: 2 additions & 0 deletions

```diff
@@ -10,4 +10,6 @@ netCDF4
 tqdm
 ipywidgets
 nbformat
+sphinx
+sphinx-autobuild
 sphinx-rtd-theme
```
