Updating drought metrics notebook to using new core#245

Open
claalmve wants to merge 53 commits into main from feature/drought-metrics-new-core

Conversation

@claalmve (Contributor) commented Apr 20, 2026

Summary of changes

Updates the drought_metrics notebook to use the new core functionality, and reorganizes the notebook to make it more concise and clear.

Link to corresponding Jira ticket(s)

Jira ticket

Content & Documentation

  • Introduction clearly explains the purpose of the analysis
  • Intended application of the notebook listed
    • User-focused question/example provided showing how a user may want to use the notebook
  • Incorporates references to appropriate Guidance materials
  • Error messages included for areas with common user errors

Runtime Information

  • Overall runtime on AE JupyterHub documented

Output Management

  • Output from cells removed before committing

Code Quality

  • Notebook cleaned up with:
    • black
    • isort

claalmve and others added 30 commits March 4, 2026 15:24
Co-authored-by: Calvin Chen <33679281+claalmve@users.noreply.github.com>
Co-authored-by: Nicole Keeney <nicolejkeeney@gmail.com>
@claalmve changed the title from "Feature/drought metrics new core" to "Updating drought metrics notebook to using new core" on May 12, 2026
@claalmve marked this pull request as ready for review on May 12, 2026 at 22:58
@acordonez (Contributor) commented:

I've had the notebook running on a medium Hub instance for ~35 minutes and it seems to be hanging in this cell in Section 2.1 - notebook issue or Hub issue?

# Load `combined_ds` into memory
with ProgressBar():
    combined_ds = combined_ds.compute()

@claalmve (Contributor, Author) replied:

> I've had the notebook running on a medium Hub instance for ~35 minutes and it seems to be hanging in this cell in Section 2.1 - notebook issue or Hub issue?
>
> # Load `combined_ds` into memory
> with ProgressBar():
>     combined_ds = combined_ds.compute()

Hmm, I think that cell only takes 5-10 min for me. Could you try restarting your kernel and running it again?
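
If it's still slow after a restart, a quick sanity check before calling `.compute()` is to look at the chunk layout and total size, since an oversized task graph or an unexpectedly large dataset is a common cause of an apparent hang. A minimal diagnostic sketch, assuming `combined_ds` is the dask-backed dataset from the cell above:

# Hedged diagnostic: inspect the lazy dataset before computing.
print(combined_ds.chunks)                              # per-dimension chunk layout
print(f"{combined_ds.nbytes / 1e9:.2f} GB when fully loaded")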

@acordonez (Contributor) replied:

> > I've had the notebook running on a medium Hub instance for ~35 minutes and it seems to be hanging in this cell in Section 2.1 - notebook issue or Hub issue?
>
> Hmm, I think that cell only takes 5-10 min for me. Could you try restarting your kernel and running it again?

Still hanging; when I hit keyboard interrupt, I get this:

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
Cell In[9], line 3
      1 # Load `combined_ds` into memory
      2 with ProgressBar():
----> 3     combined_ds = combined_ds.compute()

File /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/core/dataset.py:792, in Dataset.compute(self, **kwargs)
    762 """Trigger loading data into memory and return a new dataset.
    763 
    764 Data will be computed and/or loaded from disk or a remote source.
   (...)
    789 Variable.compute
    790 """
    791 new = self.copy(deep=False)
--> 792 return new.load(**kwargs)

File /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/core/dataset.py:558, in Dataset.load(self, **kwargs)
    555 chunkmanager = get_chunked_array_type(*chunked_data.values())
    557 # evaluate all the chunked arrays simultaneously
--> 558 evaluated_data: tuple[np.ndarray[Any, Any], ...] = chunkmanager.compute(
    559     *chunked_data.values(), **kwargs
    560 )
    562 for k, data in zip(chunked_data, evaluated_data, strict=False):
    563     self.variables[k].data = data

File /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/namedarray/daskmanager.py:85, in DaskManager.compute(self, *data, **kwargs)
     80 def compute(
     81     self, *data: Any, **kwargs: Any
     82 ) -> tuple[np.ndarray[Any, _DType_co], ...]:
     83     from dask.array import compute
---> 85     return compute(*data, **kwargs)

File ~/.local/lib/python3.12/site-packages/dask/base.py:685, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    682     expr = expr.optimize()
    683     keys = list(flatten(expr.__dask_keys__()))
--> 685     results = schedule(expr, keys, **kwargs)
    687 return repack(results)

File ~/.local/lib/python3.12/site-packages/dask/_expr.py:1152, in _HLGExprSequence.__dask_graph__(self)
   1149 def __dask_graph__(self):
   1150     # This class has to override this and not just _layer to ensure the HLGs
   1151     # are not optimized individually
-> 1152     return ensure_dict(self._optimized_dsk)

File /srv/conda/envs/notebook/lib/python3.12/functools.py:998, in cached_property.__get__(self, instance, owner)
    996 val = cache.get(self.attrname, _NOT_FOUND)
    997 if val is _NOT_FOUND:
--> 998     val = self.func(instance)
    999     try:
   1000         cache[self.attrname] = val

File ~/.local/lib/python3.12/site-packages/dask/_expr.py:1144, in _HLGExprSequence._optimized_dsk(self)
   1142     dsk = hlgexpr.hlg
   1143     if (optimizer := hlgexpr.low_level_optimizer) is not None:
-> 1144         dsk = optimizer(dsk, keys)
   1145     graphs.append(dsk)
   1147 return HighLevelGraph.merge(*graphs)

File ~/.local/lib/python3.12/site-packages/dask/array/optimization.py:52, in optimize(dsk, keys, **kwargs)
     50 dsk = optimize_blockwise(dsk, keys=keys)
     51 dsk = fuse_roots(dsk, keys=keys)
---> 52 dsk = dsk.cull(set(keys))
     54 # Perform low-level fusion unless the user has
     55 # specified False explicitly.
     56 if config.get("optimization.fuse.active") is False:

File ~/.local/lib/python3.12/site-packages/dask/highlevelgraph.py:769, in HighLevelGraph.cull(self, keys)
    767 layer = self.layers[layer_name]
    768 if keys_set:
--> 769     culled_layer, culled_deps = layer.cull(keys_set, all_ext_keys)
    770     if not culled_deps:
    771         continue

File ~/.local/lib/python3.12/site-packages/dask/blockwise.py:736, in Blockwise.cull(self, keys, all_hlg_keys)
    734 output_blocks: set[tuple[int, ...]] = set()
    735 for key in keys:
--> 736     if key[0] == self.output:
    737         output_blocks.add(key[1:])
    738 culled_deps = self._cull_dependencies(output_blocks)

KeyboardInterrupt:

@acordonez (Contributor) left a comment

This review doesn't include looking at the output files yet; once I can get the notebook running end-to-end, I'll look at those too. Mostly documentation comments here.

"# Define the lat/lon for the location of interest and the variables needed for the Penman-Monteith PET method\n",
"LAT = 37.787964\n",
"LON = -122.065063\n",
"WARMING_LEVELS = [0.8, 2.0]"

Since this is currently hard-coded to run on only two WLs, it either needs a warning and check to make sure only 2 WLs are set or needs flexibility to work with any number of WLs.
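
A minimal sketch of such a check, using the notebook's own `WARMING_LEVELS` constant (the exact message is illustrative):

# Hedged sketch: fail fast if the notebook's two-WL assumption is violated.
if len(WARMING_LEVELS) != 2:
    raise ValueError(
        f"This notebook currently assumes exactly 2 warming levels; got {WARMING_LEVELS}."
    )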

"outputs": [],
"source": [
"# Define the lat/lon for the location of interest and the variables needed for the Penman-Monteith PET method\n",
"LAT = 37.787964\n",

I'm not as familiar with these drought indices, but if it's appropriate to use them over areas like a watershed, it'd be nice to give users the option to use areas other than a single lat/lon.

"metadata": {},
"source": [
"**Variables needed:**\n",
"- `t2` (hourly) → daily `tasmin` and `tasmax` derived via resample\n",

Curious why the notebook resamples hourly data rather than pulling t2min and t2max with the new core?
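
For context, a minimal sketch of the resample approach the variable table describes, assuming `t2` is the hourly temperature DataArray retrieved from the catalog:

# Derive daily min/max from hourly t2 via resample, as the notebook currently does.
tasmin = t2.resample(time="1D").min()
tasmax = t2.resample(time="1D").max()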

"metadata": {},
"outputs": [],
"source": [
"# Call xclim.indicies.potential_evapotranspiration on the entire dataset as it is\n",

"on the entire dataset as it is" - unclear on what the takeaway is for users from this part of the comment. Is there something about the data formatting we should be aware of?

"id": "50d973f5-8d03-48a7-9ac6-4ac36ef4b120",
"metadata": {},
"source": [
"#### Helper Function"

Add a little more information beyond the "Helper Function" heading.


Include the motivation - it looks like this would help speed up the PDSI calculation later in the notebook?

"id": "f0645679",
"metadata": {},
"source": [
"EDDI is computed as a standardized anomaly using `xclim`'s `standardized_index`, calibrated over the 0.8°C warming level period (2000–2029), with values clipped to [−2.5, 2.5]."

Is it required to use 0.8 as the baseline? Also, the (2000–2029) range gives me the impression that the warming level is linked to those years in the models, as opposed to being a dummy time range.
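
For reference, a minimal sketch of the clipping step the notebook text describes, assuming `eddi` is the standardized-index output already computed with xclim:

# Clip EDDI to the documented [-2.5, 2.5] range.
eddi = eddi.clip(min=-2.5, max=2.5)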

"id": "3dc2e324",
"metadata": {},
"source": [
"### 2.2. Calculate EDDI"

It'd be great to have an intro about what the EDDI is, along with a reference.

"source": [
"This notebook demonstrates how you can use `climakitae` variables to compute and export two drought indices for a given location, broken down in the table below:\n",
"\n",
"| Index | Method | Output file |\n",

Nice table; wondering if there's somewhere in the intro where it would fit.

"id": "64641b82",
"metadata": {},
"source": [
"This notebook calculates two drought metrics, the Palmer Drought Severity Index (PDSI) and the Evaporative Demand Drought Index (EDDI), using WRF data in the AE catalog. PDSI relies on Potential Evapotranspiration (PET), which is computed using the Penman-Monteith method. At the end of the notebook, the user will be able to export monthly PDSI and EDDI under different global warming levels for a specific lat/lon as netcdf files for further analysis."

Is there any drought metrics guidance on the AE website to link here?
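
For reference, a minimal sketch of the export step the intro describes; the variable and file names here are illustrative, not the notebook's actual ones:

# Write the monthly indices out as netcdf files for further analysis.
pdsi_monthly.to_netcdf("pdsi_monthly.nc")
eddi_monthly.to_netcdf("eddi_monthly.nc")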

"metadata": {},
"source": [
"### Penman-Monteith PET Method (most physically consistent)\n",
"We will be using the PET method for calculating PDSI, since it is the most physically consistent method. Below, we list out the variables we will need."

Would be helpful to see a little more explanation and maybe a link to a good source about these methods. I'm a little thrown off just seeing "most physically consistent method" without any other background.

@acordonez (Contributor) left a comment

Got the EDDI section to run, just not PDSI. I'm finding the warming level dimension on the output dataset confusing. If it only contains the WL 2.0 data, it'd be great to change the warming level dimension name to 2.0 and change "wl" to "warming_level" or whatever the exact equivalent is in the data returned by ClimateData.

[screenshot attached]
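
A minimal sketch of the renaming suggested above; the current "wl" name and its single 2.0 value are assumptions about the output dataset's layout:

# Rename the dimension and label its single entry with the warming level it holds.
out_ds = out_ds.rename({"wl": "warming_level"})
out_ds = out_ds.assign_coords(warming_level=[2.0])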

A figure showing the EDDI timeseries for one simulation, with a little explanation of how to read it, would be nice to have but isn't required.

@acordonez (Contributor) commented:


The slowdown seems to ultimately be coming from calculating pet_pm; maybe I'm having an issue with xclim?
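
One way to confirm that is to time a small slice of `pet_pm` on its own before computing the full dataset. A hedged sketch (the slice size is arbitrary):

import time

# Compute only the first 30 time steps of PET to isolate the suspected bottleneck.
t0 = time.time()
pet_sample = pet_pm.isel(time=slice(0, 30)).compute()
print(f"30 time steps took {time.time() - t0:.1f}s")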

@nicolejkeeney (Collaborator) left a comment

Getting this error from the following cell:

# Call xclim.indices.potential_evapotranspiration on the entire dataset as it is
pet_pm = xclim.indices.potential_evapotranspiration(
    tasmin=tasmin.t2min,
    tasmax=tasmax.t2max,
    hurs=hurs_frac.rh,
    rsds=sw_dwn.sw_dwn,
    rsus=rsus_daily.swupb,
    rlds=lw_dwn.lw_dwn,
    rlus=rlus_daily.lwupb,
    sfcWind=wspd10mean.wspd10mean,
    method="FAO_PM98",
)

Error:

ValueError: potential_evapotranspiration could not find latitude coordinate in DataArray. Try passing it explicitly (`lat=ds.lat`).

@acordonez (Contributor) replied:


I got this error today too, on a small instance on the Hub. If I set `lat=tasmin.lat`, I then get the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[8], line 2
      1 # Call xclim.indices.potential_evapotranspiration on the entire dataset as it is
----> 2 pet_pm = xclim.indices.potential_evapotranspiration(
      3     tasmin=tasmin.t2min,
      4     tasmax=tasmax.t2max,
      5     hurs=hurs_frac.rh,
      6     rsds=sw_dwn.sw_dwn,
      7     rsus=rsus_daily.swupb,
      8     rlds=lw_dwn.lw_dwn,
      9     rlus=rlus_daily.lwupb,
     10     sfcWind=wspd10mean.wspd10mean,
     11     method="FAO_PM98",
     12     lat=tasmin.lat).compute()

File <boltons.funcutils.FunctionBuilder-182>:2, in potential_evapotranspiration(tasmin, tasmax, tas, lat, hurs, rsds, rsus, rlds, rlus, sfcWind, pr, method, peta, petb)

File ~/.local/lib/python3.12/site-packages/xclim/core/units.py:1485, in declare_units.<locals>.dec.<locals>.wrapper(*args, **kwargs)
   1483 bound_args = sig.bind(*args, **kwargs)
   1484 for name, dim in units_by_name.items():
-> 1485     check_units(bound_args.arguments.get(name), dim)
   1487 out = func(*args, **kwargs)
   1489 _check_output_has_units(out)

File <boltons.funcutils.FunctionBuilder-35>:2, in check_units(val, dim)

File ~/.local/lib/python3.12/site-packages/xclim/core/options.py:161, in datacheck.<locals>._run_check(*args, **kwargs)
    159 @wraps(func)
    160 def _run_check(*args, **kwargs):
--> 161     return run_check(func, DATA_VALIDATION, *args, **kwargs)

File ~/.local/lib/python3.12/site-packages/xclim/core/options.py:139, in run_check(func, option, *args, **kwargs)
    119 r"""
    120 Run function and customize exception handling based on option.
    121 
   (...)
    136     If the function raises a ValidationError and the option is set to "raise".
    137 """
    138 try:
--> 139     func(*args, **kwargs)
    140 except ValidationError as err:
    141     raise_warn_or_log(err, OPTIONS[option], stacklevel=4)

File ~/.local/lib/python3.12/site-packages/xclim/core/units.py:1268, in check_units(val, dim)
   1266     val_units = str2pint(val)
   1267 else:
-> 1268     val_units = units2pint(val)
   1269 val_dim = val_units.dimensionality
   1271 if val_dim == expected:

File ~/.local/lib/python3.12/site-packages/xclim/core/units.py:176, in units2pint(value)
    174     metadata = None
    175 elif isinstance(value, dict):
--> 176     unit = value["units"]
    177     metadata = value.get("units_metadata", None)
    178 else:

KeyError: 'units'
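
One possible workaround, as a hedged sketch: the traceback suggests xclim's unit check is looking up a `units` attribute that the latitude coordinate doesn't carry, so attaching CF units metadata before the call may avoid the KeyError. The attribute value follows CF conventions; whether this addresses the underlying version mismatch is untested.

# Attach CF units metadata to the latitude coordinate before calling xclim.
tasmin["lat"].attrs.setdefault("units", "degrees_north")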

@claalmve (Contributor, Author) replied:

OK, I think my xclim version may differ from the one you're both running. Let me debug this today.
