diff --git a/doc/developer_guide/code_reviews.md b/doc/developer_guide/code_reviews.md new file mode 100644 index 00000000..f0e363a4 --- /dev/null +++ b/doc/developer_guide/code_reviews.md @@ -0,0 +1,25 @@ +# Code reviews + +Before anything is merged into the release branch (`RC_*`), we require that one reviewer accepts the code changes of a pull request. + +## How to do a code review + +* Check out the pull request locally ([how to checkout a pull request locally](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally)) + +* Run tests locally + +* Go through the code and see if it is readable and easy to understand + +* Not required, but often useful: test new features with your own data + +## Tips and expectations + +Doing a code review can be very challenging if you are unfamiliar with the process. Here is a set of documents which might provide a good resource on how to get started: + +https://github.com/google/eng-practices + +## Conventional comments + +The comments in a code review should be clear and constructive. + +A useful way of highlighting the intention of specific comments is to label them according to [conventional comments](https://conventionalcomments.org/). diff --git a/doc/developer_guide/code_reviews.rst b/doc/developer_guide/code_reviews.rst deleted file mode 100644 index 5e94507b..00000000 --- a/doc/developer_guide/code_reviews.rst +++ /dev/null @@ -1,38 +0,0 @@ -.. _code-reviews: - -Code reviews ------------------- - -Before anything is merged into the release branch (:code:`RC_*`), we require that one reviewer accepts the code changes of a pull request. 
- -============================ -How to do a code review -============================ - -* Checkout out pull request locally (`how to checkout a pull request locally `_) - -* Run tests locally - -* Go through code and see if it is readable and easy to understand - -* Not required, but often useful: test new features with your own data - - -============================ -Tips and expectations -============================ - - -Doing a code review can be very challenging if you are unfamiliar with the process. Here is a set of documents which might provide a good resource on how to get started: - -https://github.com/google/eng-practices - - -========================= -Conventional comments -========================= - -The comments in a code review should be clear and constructive. - -A useful way of highlighting the intention of specific comments is to label them according to `conventional comments `_. - diff --git a/doc/developer_guide/code_structure.rst b/doc/developer_guide/code_structure.md similarity index 63% rename from doc/developer_guide/code_structure.rst rename to doc/developer_guide/code_structure.md index 9ef5741a..3904c004 100644 --- a/doc/developer_guide/code_structure.rst +++ b/doc/developer_guide/code_structure.md @@ -1,74 +1,51 @@ -.. _code_structure: +(code_structure)= +# Code structure and key design concepts -Code structure and key design concepts --------------------------------------- - -================================== -Modules -================================== +## Modules **tobac** aims to provide a flexible and modular framework which can be seen as a toolbox to create tracking algorithms according to the user's specific research needs. The **tobac** package currently consists of three **main modules**: -1. The :py:mod:`tobac.feature_detection` contains methods to identify objects (*features*) in 2D or 3D (3D or 4D when including the time dimensions) gridded data. 
This is done by identifying contiguous regions above or below one or multiple user-defined thresholds. The module makes use of :py:mod:`scipy.ndimage.label`, a generic image processing method that labels features in an array. The methods in :py:mod:`tobac.feature_detection` are high-level functions that enable a fast and effective feature detection and create easy-to-use output in form of a :py:mod:`pandas.DataFrame` that contains the coordinates and some basic information on each detected feature. The most high-level methods that is commonly used by users is :py:func:`tobac.feature_detection_multithreshold`. +1. The {py:mod}`tobac.feature_detection` module contains methods to identify objects (*features*) in 2D or 3D (3D or 4D when including the time dimension) gridded data. This is done by identifying contiguous regions above or below one or multiple user-defined thresholds. The module makes use of {py:mod}`scipy.ndimage.label`, a generic image processing method that labels features in an array. The methods in {py:mod}`tobac.feature_detection` are high-level functions that enable a fast and effective feature detection and create easy-to-use output in the form of a {py:mod}`pandas.DataFrame` that contains the coordinates and some basic information on each detected feature. The most high-level method that is commonly used by users is {py:func}`tobac.feature_detection_multithreshold`. -2. The :py:mod:`tobac.segmentation` module contains methods to define the extent of the identified feature areas or volumes. This step is needed to create a mask of the identified features because the feature detection currently only saves the center points of the features. The segmentation procedure is performed by using the watershedding method, but more methods are to be implemented in the future. Just as the feature detection, this module can handle both 2D and 3D data. +2. 
The {py:mod}`tobac.segmentation` module contains methods to define the extent of the identified feature areas or volumes. This step is needed to create a mask of the identified features because the feature detection currently only saves the center points of the features. The segmentation procedure is performed by using the watershedding method, but more methods are to be implemented in the future. Like the feature detection, this module can handle both 2D and 3D data. -3. The :py:mod:`tobac.tracking` module is responsible for linking identified features over time. This module makes primarily use of the python package :py:mod:`trackpy`. Note that the linking using :py:mod:`trackpy` is based on particle tracking principles which means that only the feature center positions (not the entire area or volume associated with each feature) are needed to link features over time. Other methods such as tracking based on overlapping areas from the segmented features are to be implemented. +3. The {py:mod}`tobac.tracking` module is responsible for linking identified features over time. This module primarily makes use of the Python package {py:mod}`trackpy`. Note that the linking using {py:mod}`trackpy` is based on particle tracking principles, which means that only the feature center positions (not the entire area or volume associated with each feature) are needed to link features over time. Other methods such as tracking based on overlapping areas from the segmented features are to be implemented. In addition to the main modules, there are three **postprocessing modules**: -4. The :py:mod:`tobac.merge_split` module provides functionality to identify mergers and splitters in the tracking output and to add labels such that one can reconstruct the parent and child tracks of each cell. +4. 
The {py:mod}`tobac.merge_split` module provides functionality to identify mergers and splitters in the tracking output and to add labels such that one can reconstruct the parent and child tracks of each cell. -5. The :py:mod:`tobac.analysis` module contains methods to analyze the tracking output and derive statistics about individual tracks as well as summary statistics of the entire populations of tracks or subsets of the latter. +5. The {py:mod}`tobac.analysis` module contains methods to analyze the tracking output and derive statistics about individual tracks as well as summary statistics of the entire population of tracks or subsets of the latter. -6. The :py:mod:`tobac.plotting` module provides methods to visualize the tracking output, for example for creating maps and animations of identified features, segmented areas and tracks. +6. The {py:mod}`tobac.plotting` module provides methods to visualize the tracking output, for example for creating maps and animations of identified features, segmented areas and tracks. Finally, there are two modules that are primarily **important for developers**: -7. The :py:mod:`tobac.utils` module is a collection of smaller, not necessarily tracking-specific methods that facilitate and support the methods of the main modules. This module has multiple submodules. We separate methods that are rather generic and could also be practical for tobac users who build their own tracking algorithms (:py:mod:`tobac.utils.general`) and methods that mainly facilitate the development of **tobac** (:py:mod:`tobac.utils.internal`). Sometimes, new features come with the need of a whole set of new methods, so it could make sense to save these in their own submodule (see e.g. :py:mod:`tobac.periodic_boundaries`) +7. The {py:mod}`tobac.utils` module is a collection of smaller, not necessarily tracking-specific methods that facilitate and support the methods of the main modules. This module has multiple submodules. 
We separate methods that are rather generic and could also be practical for tobac users who build their own tracking algorithms ({py:mod}`tobac.utils.general`) and methods that mainly facilitate the development of **tobac** ({py:mod}`tobac.utils.internal`). Sometimes, new features come with the need for a whole set of new methods, so it could make sense to save these in their own submodule (see e.g. {py:mod}`tobac.periodic_boundaries`). -8. The :py:mod:`tobac.testing` module provides support for writing of unit tests. This module contains several methods to create simplified test data sets on which the various methods and parameters for feature detection, segmentation, and tracking can be tested. +8. The {py:mod}`tobac.testing` module provides support for writing unit tests. This module contains several methods to create simplified test data sets on which the various methods and parameters for feature detection, segmentation, and tracking can be tested. For more information on each submodule, refer to the respective source code documentation. One thing to note is that **tobac** as of now is purely functional. The plan is, however, to move towards a more object-oriented design with base classes for the main operations such as feature detection and tracking. +## Examples -======== -Examples -======== - -To help users get started with **tobac** and to demonstrate the various functionalities, **tobac** hosts several detailed and **illustrated examples** in the form of Jupyter notebooks. They are hosted under the directory `examples/` and be executed by the user. Our readthedocs page also hosts a rendered version of our examples as `gallery `_ +To help users get started with **tobac** and to demonstrate the various functionalities, **tobac** hosts several detailed and **illustrated examples** in the form of Jupyter notebooks. They are hosted under the directory `examples/` and can be executed by the user. 
Our readthedocs page also hosts a rendered version of our examples as a [gallery](https://tobac.readthedocs.io/en/latest/examples.html). +## Migrating to xarray and dask -Currently, **tobac** uses `iris cubes `_ as the +Currently, **tobac** uses [iris cubes](https://scitools-iris.readthedocs.io/en/latest/userguide/iris_cubes.html) as the primary data container. However, we are currently working on migrating the source code to -`xarray `_ such that all internal functions are based on `xr.DataArray -objects `_. +[xarray](https://docs.xarray.dev/en/stable/) such that all internal functions are based on [xr.DataArray +objects](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.html). To ensure a robust transition from **iris** to **xarray**, we make use of various decorators that convert input and -output data for the main functions without changing their actual code. These decorators are located in the `decorator -submodule `_. - -In addition, one of our main goals for the future is to fully support `dask `_, in order to scale -to large datasets and enable parallelization. - - - - - - - - - - - - +output data for the main functions without changing their actual code. These decorators are located in the [decorator +submodule](https://github.com/tobac-project/tobac/blob/main/tobac/utils/decorators.py). +In addition, one of our main goals for the future is to fully support [dask](https://www.dask.org/), in order to scale +to large datasets and enable parallelization. \ No newline at end of file diff --git a/doc/developer_guide/testing_sphinx-based_rendering.md b/doc/developer_guide/testing_sphinx-based_rendering.md new file mode 100644 index 00000000..df50024b --- /dev/null +++ b/doc/developer_guide/testing_sphinx-based_rendering.md @@ -0,0 +1,139 @@ +(testing-sphinx-rendering)= +# How to check the Sphinx-based rendering + +The workflow has been tested on a Linux system. 
We aim to build a static +website out of the documentation material present in `tobac`. + +## 1. Preparing the Local Environment + +- **choose a separate place for your testing** + + I will use the temporary directory `/tmp/website-testing` which I + need to create. You can use a dedicated place of your choice … + + ```bash + > mkdir /tmp/website-testing + > cd /tmp/website-testing + ``` + + I will indicate my position now with the `/tmp/website-testing>` + prompt. + +- **get the official repository** + + ```bash + /tmp/website-testing> git clone https://github.com/tobac-project/tobac + ``` + + You might like to test a certain remote branch `<branch>`, then do: + + ```bash + /tmp/website-testing> cd tobac + /tmp/website-testing/tobac> git fetch --all + /tmp/website-testing/tobac> git checkout -t origin/<branch> + /tmp/website-testing/tobac> cd .. + ``` + +- **Python environment** + + - create a Python virtual env + + ```bash + /tmp/website-testing> python -m venv .python3-venv + ``` + + - and install requirements + + ```bash + # deactivating conda is only necessary if you loaded conda before … + /tmp/website-testing> conda deactivate + + # activate the new env and upgrade `pip` + /tmp/website-testing> source .python3-venv/bin/activate + /tmp/website-testing> pip install --upgrade pip + + # now everything is installed into the local python env! + /tmp/website-testing> pip install -r tobac/doc/requirements.txt + ``` + + `pip`-based installation takes a bit of time, but is much faster than `conda`. + + +If the installation runs without problems, you are ready to build the website. + +## 2. Building the Website + +Actually, only a few steps are needed to build the website, i.e. 
+ +- **running sphinx for rendering** + + ```bash + /tmp/website-testing> cd tobac + + /tmp/website-testing/tobac> sphinx-build -b html doc doc/_build/html + ``` + + If no severe errors appeared: + +- **view the HTML content** + + ```bash + /tmp/website-testing/tobac> firefox doc/_build/html/index.html + ``` + +## 3. Parsing Your Local Changes + +Now, we connect to your locally hosted `tobac` repository and your +development branch. + +- **connect to your local repo**: Assume your repo is located at + `/tmp/tobac-testing/tobac`, then add a new remote alias and fetch + all content with + + ```bash + /tmp/website-testing/tobac> git remote add local-repo /tmp/tobac-testing/tobac + /tmp/website-testing/tobac> git fetch --all + ``` + +- **check your development branch out**: Now, assume your + development branch is called `my-devel`, then do + + ```bash + # to get a first overview on available branches + /tmp/website-testing/tobac> git branch --all + + # and then actually get your development branch + /tmp/website-testing/tobac> git checkout -b my-devel local-repo/my-devel + ``` + + You should see your developments now … + +- **build and view website again** + + ```bash + /tmp/website-testing/tobac> sphinx-build -M clean doc doc/_build + /tmp/website-testing/tobac> sphinx-build -b html doc doc/_build/html + /tmp/website-testing/tobac> firefox doc/_build/html/index.html + ``` + +## Option: Check Rendering of a Pull Request + +- **check the pull request out**: Now, assume the PR has the ID `ID` and you define the branch name `BRANCH_NAME` as you like + + ```bash + # to get PR shown as dedicated branch + /tmp/website-testing/tobac> git fetch upstream pull/ID/head:BRANCH_NAME + + # and then actually get this PR as branch + /tmp/website-testing/tobac> git checkout BRANCH_NAME + ``` + + You should see the PR now ... 
+ +- **build and view website again** + + ```bash + /tmp/website-testing/tobac> sphinx-build -M clean doc doc/_build + /tmp/website-testing/tobac> sphinx-build -b html doc doc/_build/html + /tmp/website-testing/tobac> firefox doc/_build/html/index.html + ``` diff --git a/doc/developer_guide/testing_sphinx-based_rendering.rst b/doc/developer_guide/testing_sphinx-based_rendering.rst deleted file mode 100644 index 379e42ac..00000000 --- a/doc/developer_guide/testing_sphinx-based_rendering.rst +++ /dev/null @@ -1,155 +0,0 @@ -.. _testing-sphinx-rendering: - -How to check the Sphinx-based rendering ---------------------------------------- - - -The workflow has been tested in a linux system. We aim to build a static -website out of the documentation material present in ``tobac``. - -================================== -1. Preparing the Local Environment -================================== - -- **choose a separate place for your testing** - - I will use the temporary directory ``/tmp/website-testing`` which I - need to create. You can use a dedicated place of your choice … - - .. code:: bash - - > mkdir /tmp/website-testing - > cd /tmp/website-testing - - I will indicate my position now with the ``/tmp/website-testing>`` - prompt. - -- **get the official repository** - - .. code:: bash - - /tmp/website-testing> git clone https://github.com/tobac-project/tobac - - You might like to test a certain remote branch ```` then do: - - .. code:: bash - - /tmp/website-testing> cd tobac - /tmp/website-testing/tobac> git fetch --all - /tmp/website-testing/tobac> git checkout -t origin/ - /tmp/website-testing/tobac> cd .. - -- **Python environment** - - - create a python virtual env - - .. code:: bash - - /tmp/website-testing> python -m venv .python3-venv - - - - and install requirements - - .. 
code:: bash - - # deactivation conda is only necessary if your loaded conda before … - /tmp/website-testing> conda deactivate - - # activate the new env and upgrade ``pip`` - /tmp/website-testing> source .python3-venv/bin/activate - /tmp/website-testing> pip install --upgrade pip - - # now everything is installed into the local python env! - /tmp/website-testing> pip install -r tobac/doc/requirements.txt - - `pip`-based installation takes a bit of time, but is much faster than `conda`. - - -If the installation runs without problems, you are ready to build the website. - - -================================== -1. Building the Website -================================== - -Actually, only few steps are needed to build the website, i.e. - -- **running sphinx for rendering** - - .. code:: bash - - /tmp/website-testing> cd tobac - - /tmp/website-testing/tobac> sphinx-build -b html doc doc/_build/html - - If no severe error appeared - -- **view the HTML content** - - .. code:: bash - - /tmp/website-testing/tobac> firefox doc/_build/html/index.html - -================================== -3. Parsing Your Local Changes -================================== - -Now, we connect to your locally hosted ``tobac`` repository and your -development branch. - -- **connect to your local repo**: Assume your repo is located at - ``/tmp/tobac-testing/tobac``, then add a new remote alias and fetch - all content with - - .. code:: bash - - /tmp/website-testing/tobac> git remote add local-repo /tmp/tobac-testing/tobac - /tmp/website-testing/tobac> git fetch --all - -- **check your development branch out**: Now, assume the your - development branch is called ``my-devel``, then do - - .. 
code:: bash - - # to get a first overview on available branches - /tmp/website-testing/tobac> git branch --all - - # and then actually get your development branch - /tmp/website-testing/tobac> git checkout -b my-devel local-repo/my-devel - - You should see your developments, now … - -- **build and view website again** - - .. code:: bash - - /tmp/website-testing/tobac> sphinx-build -M clean doc doc/_build - /tmp/website-testing/tobac> sphinx-build -b html doc doc/_build/html - /tmp/website-testing/tobac> firefox _build/html/index.html - - -========================================== -Option: Check Rendering of a Pull requests -========================================== - -- **check the pull request out**: Now, assume the PR has the ID ```` and you define the branch name ``BRANCH_NAME`` as you like - - .. code:: bash - - # to get PR shown as dedicated branch - /tmp/website-testing/tobac> git fetch upstream pull/ID/head:BRANCH_NAME - - # and then actually get this PR as branch - /tmp/website-testing/tobac> git checkout BRANCH_NAME - - You should see the PR now ... - -- **build and view website again** - - .. code:: bash - - /tmp/website-testing/tobac> sphinx-build -M clean doc doc/_build - /tmp/website-testing/tobac> sphinx-build -b html doc doc/_build/html - /tmp/website-testing/tobac> firefox _build/html/index.html - - diff --git a/doc/getting_started/data_input.rst b/doc/getting_started/data_input.md similarity index 62% rename from doc/getting_started/data_input.rst rename to doc/getting_started/data_input.md index 675fc159..2eae539d 100644 --- a/doc/getting_started/data_input.rst +++ b/doc/getting_started/data_input.md @@ -1,15 +1,10 @@ -.. _data_input: - -Data Input -========== +(data_input)= +# Data Input Input data for tobac should consist of one or more fields on a common, regular grid with a time dimension and two or three spatial dimensions. 
The input data can also include latitude and longitude coordinates, either as 1-d or 2-d variables depending on the grid used. As of version 1.6 of tobac, xarray DataArrays are the default format for input fields, with all internal operations performed using DataArrays. Backward compatibility with Iris Cube input is maintained using a conversion wrapper. Workflows using Iris should produce identical results to previous versions, but moving forward xarray is the recommended data format. -======= -3D Data -======= - -As of *tobac* version 1.5.0, 3D data are now fully supported for feature detection, tracking, and segmentation. Similar to how *tobac* requires some information on the horizontal grid spacing of the data (e.g., through the :code:`dxy` parameter), some information on the vertical grid spacing is also required. This is documented in detail in the API docs, but briefly, users must specify either :code:`dz`, where the grid has uniform grid spacing, or users must specify :code:`vertical_coord`, where :code:`vertical_coord` is the name of the coordinate representing the vertical, with the same units as :code:`dxy`. +## 3D Data +As of *tobac* version 1.5.0, 3D data are now fully supported for feature detection, tracking, and segmentation. Similar to how *tobac* requires some information on the horizontal grid spacing of the data (e.g., through the `dxy` parameter), some information on the vertical grid spacing is also required. This is documented in detail in the API docs, but briefly, users must specify either `dz`, where the grid has uniform grid spacing, or users must specify `vertical_coord`, where `vertical_coord` is the name of the coordinate representing the vertical, with the same units as `dxy`. 
\ No newline at end of file diff --git a/doc/getting_started/feature_detection_overview.md b/doc/getting_started/feature_detection_overview.md new file mode 100644 index 00000000..283031ca --- /dev/null +++ b/doc/getting_started/feature_detection_overview.md @@ -0,0 +1,18 @@ +(feature_detection_overview)= +# Feature Detection Basics + +Feature detection is the first step in using *tobac*. + +## Currently Implemented Feature Detection Methods + +### Multiple thresholds + +Features are identified as regions above or below a sequence of subsequent thresholds (if searching for either maxima or minima in the data). Subsequently more restrictive threshold values are used to further refine the resulting features and allow for separation of features that are connected through a continuous region of less restrictive threshold values. + +```{image} ./images/detection_multiplethresholds.png +:width: 500px +``` + +### Current development + +We are currently working on additional methods for the identification of cloud features in different types of datasets. Some of these methods are specific to the input data, such as a combination of different channels from specific satellite imagers. Some of these methods will combine the feature detection and segmentation steps in a single algorithm. diff --git a/doc/getting_started/feature_detection_overview.rst b/doc/getting_started/feature_detection_overview.rst deleted file mode 100644 index 6e18b618..00000000 --- a/doc/getting_started/feature_detection_overview.rst +++ /dev/null @@ -1,18 +0,0 @@ -.. _feature_detection_overview: - -Feature Detection Basics ------------------------- - -The feature detection is the first step in using **tobac**. - -**Currently implemented methods:** - - **Multiple thresholds:** - - Features are identified as regions above or below a sequence of subsequent thresholds (if searching for eather maxima or minima in the data). 
Subsequently more restrictive threshold values are used to further refine the resulting features and allow for separation of features that are connected through a continuous region of less restrictive threshold values. - - .. image:: ./images/detection_multiplethresholds.png - :width: 500 px - -**Current development:** -We are currently working on additional methods for the identification of cloud features in different types of datasets. Some of these methods are specific to the input data such a combination of different channels from specific satellite imagers. Some of these methods will combine the feature detection and segmentations step in one single algorithm. diff --git a/doc/getting_started/get_help.md b/doc/getting_started/get_help.md index 98a40e6d..2c73b19a 100644 --- a/doc/getting_started/get_help.md +++ b/doc/getting_started/get_help.md @@ -4,6 +4,6 @@ *tobac* can be challenging to use, especially for those not familiar with the Python/SciPy ecosystem. While we strive for robust documentation, this is sometimes not enough or not helpful. We've created this set of pages to help guide you to the best ways to get help. -We encourage you to look through the [Example Gallery](../examples/index) and the [Frequently Asked Questions](./faqs) pages to see if either of those can help resolve your issue. +We encourage you to look through the [Example Gallery](../examples/index) page to see if it can help resolve your issue. The developers of *tobac*, from the lead developers to individual contributors, want you to have success in your research and in using this package. We **strongly encourage** you to reach out if you encounter any difficulty. You can post questions to the *tobac* GitHub, or you can e-mail any of the core *tobac* developers for assistance. 
\ No newline at end of file diff --git a/doc/getting_started/merge_split.md b/doc/getting_started/merge_split.md new file mode 100644 index 00000000..a19a4bf6 --- /dev/null +++ b/doc/getting_started/merge_split.md @@ -0,0 +1,41 @@ +# Merge and Split + +This submodule is a post-processing step to address tracked cells which merge/split. +The first iteration of this module combines cells that are merging but have received a new cell id (and are considered a new cell) once merged. +This module uses a minimum Euclidean spanning tree to combine merging cells, thus the postfix for the function is MEST. +This submodule will label merged/split cells with a TRACK number in addition to their CELL numbers. + +Features, cells, and tracks are combined using parent/child nomenclature. +(quick note on terms: "feature" is a detected object at a single time step (see {doc}`feature_detection_overview`). "cell" is a series of features linked together over multiple timesteps (see {doc}`linking`). "track" may be an individual cell or series of cells which have merged and/or split.) + +## Overview of the output dataframe from merge_split + +`d: xarray.core.dataset.Dataset` + +xarray dataset of tobac merge/split cells with parent and child designations. + +Parent/child variables include: + +* `cell_parent_track_id`: The associated track id for each cell. All cells that have merged or split will have the same parent track id. If a cell never merges/splits, only one cell will have a particular track id. + +* `feature_parent_cell_id`: The associated parent cell id for each feature. All features in a given cell will have the same cell id. + +* `feature_parent_track_id`: The associated parent track id for each feature. This is not the same as the cell id number. + +* `track_child_cell_count`: The total number of features belonging to all child cells of a given track id. + +* `cell_child_feature_count`: The total number of features for each cell. 
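The parent/child variables listed above amount to two lookups composed together (feature to cell, cell to track). The following is a minimal plain-Python sketch using hypothetical toy ids, not actual merge_split output, to illustrate how the id columns relate:

```python
# Toy sketch (hypothetical ids, not real merge_split output):
# features belong to cells, cells belong to tracks.
# Here track 0 arises from cells 1 and 2 merging; cell 3 never merges.
feature_parent_cell_id = {0: 1, 1: 1, 2: 2, 3: 3}  # feature id -> cell id
cell_parent_track_id = {1: 0, 2: 0, 3: 1}          # cell id -> track id

# feature_parent_track_id follows by composing the two lookups
feature_parent_track_id = {
    feat: cell_parent_track_id[cell]
    for feat, cell in feature_parent_cell_id.items()
}

# cell_child_feature_count: number of features in each cell
cell_child_feature_count = {}
for cell in feature_parent_cell_id.values():
    cell_child_feature_count[cell] = cell_child_feature_count.get(cell, 0) + 1

# track_child_cell_count: features across all child cells of each track
track_child_cell_count = {}
for track in feature_parent_track_id.values():
    track_child_cell_count[track] = track_child_cell_count.get(track, 0) + 1

print(feature_parent_track_id)   # {0: 0, 1: 0, 2: 0, 3: 1}
print(cell_child_feature_count)  # {1: 2, 2: 1, 3: 1}
```

In the real output these relations are stored as variables of the returned xarray dataset, indexed by the Feature, Cell, and Track coordinates.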
+ + +Example usage: + +`d = merge_split_MEST(Track)` + +merge_split outputs an `xarray` dataset with several variables. The variables (with column names listed in the `Variable Name` column) are described below with units. Coordinates and dataset dimensions are Feature, Cell, and Track. + +Variables that are unique to merge/split output files: +```{csv-table} tobac Merge_Split Track Output Variables +:file: ./merge_split_out_vars.csv +:widths: 3, 35, 3, 3 +:header-rows: 1 +``` \ No newline at end of file diff --git a/doc/getting_started/merge_split.rst b/doc/getting_started/merge_split.rst deleted file mode 100644 index b6b56cbd..00000000 --- a/doc/getting_started/merge_split.rst +++ /dev/null @@ -1,43 +0,0 @@ -Merge and Split -====================== - -This submodule is a post processing step to address tracked cells which merge/split. -The first iteration of this module is to combine the cells which are merging but have received a new cell id (and are considered a new cell) once merged. -This module uses a minimum euclidian spanning tree to combine merging cells, thus the postfix for the function is MEST. -This submodule will label merged/split cells with a TRACK number in addition to its CELL number. - -Features, cells, and tracks are combined using parent/child nomenclature. -(quick note on terms; “feature” is a detected object at a single time step (see :doc:`feature_detection_overview`). “cell” is a series of features linked together over multiple timesteps (see :doc:`linking`). "track" may be an individual cell or series of cells which have merged and/or split.) - -Overview of the output dataframe from merge_split - -d : `xarray.core.dataset.Dataset` - -xarray dataset of tobac merge/split cells with parent and child designations. - -Parent/child variables include: - -* cell_parent_track_id: The associated track id for each cell. All cells that have merged or split will have the same parent track id. 
If a cell never merges/splits, only one cell will have a particular track id. - -* feature_parent_cell_id: The associated parent cell id for each feature. All feature in a given cell will have the same cell id. - -* feature_parent_track_id: The associated parent track id for each feature. This is not the same as the cell id number. - -* track_child_cell_count: The total number of features belonging to all child cells of a given track id. - -* cell_child_feature_count: The total number of features for each cell. - - -Example usage: - -``d = merge_split_MEST(Track)`` - -merge_split outputs an `xarray` dataset with several variables. The variables, (with column names listed in the `Variable Name` column), are described below with units. Coordinates and dataset dimensions are Feature, Cell, and Track. - -Variables that are common to all feature detection files: - -.. csv-table:: tobac Merge_Split Track Output Variables - :file: ./merge_split_out_vars.csv - :widths: 3, 35, 3, 3 - :header-rows: 1 - diff --git a/doc/getting_started/plotting.md b/doc/getting_started/plotting.md new file mode 100644 index 00000000..d97906e8 --- /dev/null +++ b/doc/getting_started/plotting.md @@ -0,0 +1,3 @@ +# Plotting + +*tobac* provides functions to conveniently visualise the tracking results and analyses. \ No newline at end of file diff --git a/doc/getting_started/plotting.rst b/doc/getting_started/plotting.rst deleted file mode 100644 index c23d4458..00000000 --- a/doc/getting_started/plotting.rst +++ /dev/null @@ -1,6 +0,0 @@ -.. _plotting: - -Plotting -======== - -tobac provides functions to conveniently visualise the tracking results and analyses. 
\ No newline at end of file diff --git a/doc/getting_started/segmentation.md b/doc/getting_started/segmentation.md new file mode 100644 index 00000000..030167bb --- /dev/null +++ b/doc/getting_started/segmentation.md @@ -0,0 +1,12 @@ +# Segmentation + +The segmentation step aims at associating cloud areas (2D data) or cloud volumes (3D data) with the identified and tracked features. + +## Currently implemented methods + +### Watershedding in 2D +Markers are set at the position of the individual feature positions identified in the detection step. Then watershedding with a fixed threshold is used to determine the area around each feature above/below that threshold value. This results in a mask with the feature id at all pixels identified as part of the clouds and zeros in all cloud free areas. + +### Watershedding in 3D +Markers are set in the entire column above the individual feature positions identified in the detection step. Then watershedding with a fixed threshold is used to determine the volume around each feature above/below that threshold value. This results in a mask with the feature id at all voxels identified as part of the clouds and zeros in all cloud free areas. + diff --git a/doc/getting_started/segmentation.rst b/doc/getting_started/segmentation.rst deleted file mode 100644 index 4260cc69..00000000 --- a/doc/getting_started/segmentation.rst +++ /dev/null @@ -1,12 +0,0 @@ -Segmentation ----------------- -The segmentation step aims at associating cloud areas (2D data) or cloud volumes (3D data) with the identified and tracked features. - -**Currently implemented methods:** - - **Watershedding in 2D:** - Markers are set at the position of the individual feature positions identified in the detection step. Then watershedding with a fixed threshold is used to determine the area around each feature above/below that threshold value. This results in a mask with the feature id at all pixels identified as part of the clouds and zeros in all cloud free areas. 
- - **Watershedding in 3D:** - Markers are set in the entire column above the individual feature positions identified in the detection step. Then watershedding with a fixed threshold is used to determine the volume around each feature above/below that threshold value. This results in a mask with the feature id at all voxels identified as part of the clouds and zeros in all cloud free areas. - diff --git a/doc/userguide/analysis.rst b/doc/userguide/analysis.md similarity index 76% rename from doc/userguide/analysis.rst rename to doc/userguide/analysis.md index 34074650..d8ecfd8a 100644 --- a/doc/userguide/analysis.rst +++ b/doc/userguide/analysis.md @@ -1,5 +1,4 @@ -.. _analysis-functions: +(analysis-functions)= +# Analysis -Analysis -======== -tobac provides several analysis functions that allow for the calculation of important quantities based on the tracking results. This includes the calculation of properties such as feature lifetimes and feature areas/volumes, but also allows for a convenient calculation of statistics for arbitrary fields of the same shape as as the input data used for the tracking analysis. +tobac provides several analysis functions that allow for the calculation of important quantities based on the tracking results. This includes the calculation of properties such as feature lifetimes and feature areas/volumes, but also allows for a convenient calculation of statistics for arbitrary fields of the same shape as the input data used for the tracking analysis. \ No newline at end of file diff --git a/doc/userguide/big_datasets.md b/doc/userguide/big_datasets.md new file mode 100644 index 00000000..1585f0ca --- /dev/null +++ b/doc/userguide/big_datasets.md @@ -0,0 +1,44 @@ +(handling-big-datasets)= +# Handling Large Datasets + +Often, one desires to use *tobac* to identify and track features in large datasets ("big data"). This documentation strives to suggest various methods for doing so efficiently. 
Current versions of *tobac* do not support out-of-core (e.g., `dask`) computation, meaning that these strategies may need to be employed for both computational and memory reasons. + +(Split Feature Detection)= +## Split Feature Detection and Run in Parallel + +Current versions of threshold feature detection (see {doc}`./feature_detection/index`) are time independent, meaning that one can easily parallelize feature detection across all times (although not across space). *tobac* provides the {meth}`tobac.utils.general.combine_feature_dataframes` function to combine a list of dataframes produced by a parallelization method (such as `jug`, `multiprocessing.pool`, or `dask.bag`) into a single combined dataframe suitable to perform tracking with. + +Below is a snippet from a larger notebook demonstrating how to run feature detection in parallel ({doc}`../examples/big_data_processing/parallel_processing_tobac`): +```python +import dask.bag as db +import tobac + +# combined_ds (the input dataset) and parameters_features (the feature +# detection keyword arguments) are defined earlier in the notebook + +# build list of tracked variables using Dask.Bag +b = db.from_sequence( + [ + combined_ds["data"][x : x + 1] + for x in range(len(combined_ds["time"])) + ], + npartitions=1, +) + +# the second positional argument (4000) is dxy, the grid spacing in metres +out_feature_dfs = db.map( + lambda x: tobac.feature_detection_multithreshold( + x, 4000, **parameters_features + ), + b, +).compute() + +combined_dataframes = tobac.utils.general.combine_feature_dataframes(out_feature_dfs) +``` + +(Split Segmentation)= +## Split Segmentation and Run in Parallel + +Recall that the segmentation mask (see {doc}`segmentation_output`) is the same size as the input grid, which results in large files when handling large input datasets. The following strategies can help reduce the output size and make segmentation masks more useful for the analysis. + +The first strategy is to only segment on features *after tracking and quality control*. While this will not directly impact performance, waiting to run segmentation on the final set of features (after discarding, e.g., non-tracked cells) can make analysis of the output segmentation dataset easier. 
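This first strategy amounts to a simple dataframe filter. The example below is a minimal sketch rather than real *tobac* output: it uses a small toy table in place of an actual tracked-feature dataframe and assumes the usual *tobac* convention that features never linked into a cell carry `cell == -1` in the tracking output.

```python
import pandas as pd

# Toy stand-in for a tobac tracked-feature dataframe; features that were
# never linked into a cell are assumed to be marked with cell == -1.
features = pd.DataFrame(
    {
        "feature": [1, 2, 3, 4, 5],
        "frame": [0, 0, 1, 1, 2],
        "cell": [1, -1, 1, 2, -1],
    }
)

# Keep only tracked (quality-controlled) features before running
# segmentation, so the grid-sized segmentation mask is only produced
# for features that will actually be analysed.
tracked_features = features[features["cell"] != -1]

print(tracked_features["feature"].tolist())  # [1, 3, 4]
```

The filtered dataframe would then be passed to the segmentation step (e.g. `tobac.segmentation_2D`) in place of the full feature set; any other quality-control criterion, such as a minimum cell lifetime, can be applied in the same way.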
+ +To enhance the speed at which segmentation runs, one can process multiple segmentation times in parallel independently, similar to feature detection. Unlike feature detection, however, there is currently no built-in *tobac* method to combine multiple segmentation times into a single file. While one can do this using typical NetCDF tools such as `ncrcat` or with xarray utilities such as `xr.concat`, you can also leave the segmentation mask output as separate files, opening them later with multi-file readers such as `xr.open_mfdataset`. + +(Tracking Hanging)= +## Tracking Hangs with Too Many Features + +When tracking on a large dataset, {meth}`tobac.tracking.linking_trackpy` can hang when using the default parameters. This is due to the tracking library `trackpy` searching too large an area for the next timestep's feature. This can be solved *without impact to scientific output* by lowering the `subnetwork_size` parameter in {meth}`tobac.tracking.linking_trackpy`. \ No newline at end of file diff --git a/doc/userguide/big_datasets.rst b/doc/userguide/big_datasets.rst deleted file mode 100644 index 6621b414..00000000 --- a/doc/userguide/big_datasets.rst +++ /dev/null @@ -1,57 +0,0 @@ -.. _handling-big-datasets: - -Handling Large Datasets ------------------------------------- - -Often, one desires to use *tobac* to identify and track features in large datasets ("big data"). This documentation strives to suggest various methods for doing so efficiently. Current versions of *tobac* do not support out-of-core (e.g., :code:`dask`) computation, meaning that these strategies may need to be employed for both computational and memory reasons. - -.. 
_Split Feature Detection: - -======================= -Split Feature Detection and Run in Parallel -======================= -Current versions of threshold feature detection (see :doc:`feature_detection_overview`) are time independent, meaning that one can easily parallelize feature detection across all times (although not across space). *tobac* provides the :py:meth:`tobac.utils.combine_feature_dataframes` function to combine a list of dataframes produced by a parallelization method (such as :code:`jug`, :code:`multiprocessing.pool`, or :code:`dask.bag`) into a single combined dataframe suitable to perform tracking with. - -Below is a snippet from a larger notebook demonstrating how to run feature detection in parallel ( :doc:`big_datasets_examples/notebooks/parallel_processing_tobac`): - -:: - - # build list of tracked variables using Dask.Bag - - b = db.from_sequence( - [ - combined_ds["data"][x : x + 1] - for x in range(len(combined_ds["time"])) - ], - npartitions=1, - ) - out_feature_dfs = db.map( - lambda x: tobac.feature_detection_multithreshold( - x, 4000, **parameters_features - ), - b, - ).compute() - - combined_dataframes = tobac.utils.general.combine_feature_dataframes(out_feature_dfs) - - -.. _Split Segmentation: - -====================================== -Split Segmentation and Run in Parallel -====================================== -Recall that the segmentation mask (see :doc:`segmentation_output`) is the same size as the input grid, which results in large files when handling large input datasets. The following strategies can help reduce the output size and make segmentation masks more useful for the analysis. - -The first strategy is to only segment on features *after tracking and quality control*. While this will not directly impact performance, waiting to run segmentation on the final set of features (after discarding, e.g., non-tracked cells) can make analysis of the output segmentation dataset easier. 
- -To enhance the speed at which segmentation runs, one can process multiple segmentation times in parallel independently, similar to feature detection. Unlike feature detection, however, there is currently no built-in *tobac* method to combine multiple segmentation times into a single file. While one can do this using typical NetCDF tools such as :code:`nccat` or with xarray utilities such as :code:`xr.concat`, you can also leave the segmentation mask output as separate files, opening them later with multiple file retrievals such as :code:`xr.open_mfdataset`. - - -.. _Tracking Hanging: - -===================================== -Tracking Hangs with too many Features -===================================== - -When tracking on a large dataset, :code:`tobac.tracking.linking_trackpy` can hang using the default parameters. This is due to the tracking library :code:`trackpy` searching for the next timestep's feature in too large of an area. This can be solved *without impact to scientific output* by lowering the :code:`subnetwork_size` parameter in :code:`tobac.tracking.linking_trackpy`. - diff --git a/doc/userguide/bulk_statistics/index.rst b/doc/userguide/bulk_statistics/index.md similarity index 58% rename from doc/userguide/bulk_statistics/index.rst rename to doc/userguide/bulk_statistics/index.md index 461cd45b..abfc5d4e 100644 --- a/doc/userguide/bulk_statistics/index.rst +++ b/doc/userguide/bulk_statistics/index.md @@ -1,17 +1,16 @@ -.. _bulk-statistics: - -########################## - Compute bulk statistics -########################## +(bulk-statistics)= +# Compute bulk statistics Bulk statistics allow for a wide range of properties of detected objects to be calculated during feature detection and segmentation or as a postprocessing step. -The :py:meth:`tobac.utils.bulk_statistics.get_statistics_from_mask` function applies one or more functions over one or more data fields for each detected object. 
+The {py:meth}`tobac.utils.bulk_statistics.get_statistics_from_mask` function applies one or more functions over one or more data fields for each detected object. For example, one could calculate the convective mass flux for each detected feature by providing fields of vertical velocity, cloud water content and area. Numpy-like broadcasting is supported, allowing 2D and 3D data to be combined. -.. toctree:: - :maxdepth: 1 - notebooks/compute_statistics_during_feature_detection - notebooks/compute_statistics_during_segmentation - notebooks/compute_statistics_postprocessing_example +```{toctree} +:maxdepth: 1 + +notebooks/compute_statistics_during_feature_detection +notebooks/compute_statistics_during_segmentation +notebooks/compute_statistics_postprocessing_example +``` \ No newline at end of file diff --git a/doc/userguide/publications.md b/doc/userguide/publications.md new file mode 100644 index 00000000..41e3d65a --- /dev/null +++ b/doc/userguide/publications.md @@ -0,0 +1,44 @@ +(Refereed-Publications)= +# Refereed Publications Using *tobac* + +**List of peer-reviewed publications in which tobac has been used:** + +--- +```{list-table} +:widths: 30 +:class: wy-table-responsive + +* - Sokolowsky, G. A., Freeman, S. W., Jones, W. K., Kukulies, J., Senf, F., Marinescu, P. J., Heikenfeld, M., Brunner, K. N., Bruning, E. C., Collis, S. M., Jackson, R. C., Leung, G. R., Pfeifer, N., Raut, B. A., Saleeby, S. M., Stier, P., and van den Heever, S. C.: tobac v1.5 (2024). Introducing Fast 3D Tracking, Splits and Mergers, and Other Enhancements for Identifying and Analysing Meteorological Phenomena. Geoscientific Model Development, 17(13), 5309-5330. https://doi.org/10.5194/gmd-17-5309-2024. + +* - Heikenfeld, M., Marinescu, P. J., Christensen, M., Watson-Parris, D., Senf, F., van den Heever, S. C., and Stier, P. (2019). tobac 1.2: towards a flexible framework for tracking and analysis of clouds in diverse datasets, Geosci. 
Model Dev., 12, 4551–4570, https://doi.org/10.5194/gmd-12-4551-2019 + +* - Bukowski, J., & van den Heever, S. C. (2021). Direct radiative effects in haboobs. *Journal of Geophysical Research: Atmospheres*, 126(21), e2021JD034814, doi:10.1029/2021JD034814. + +* - Bukowski, J. (2021). Mineral Dust Lofting and Interactions with Cold Pools (Doctoral dissertation, Colorado State University). + +* - Heikenfeld, M. (2019). Aerosol effects on microphysical processes and deep convective clouds (Doctoral dissertation, University of Oxford). + +* - Kukulies, J., Chen, D., & Curio, J. (2021). The role of mesoscale convective systems in precipitation in the Tibetan Plateau region. *Journal of Geophysical Research: Atmospheres*, 126(23), e2021JD035279. doi:10.1029/2021JD035279. + +* - Kukulies, J., Lai, H. W., Curio, J., Feng, Z., Lin, C., Li, P., Ou, T., Sugimoto, S. & Chen, D. (2023). Mesoscale convective systems in the Third pole region: Characteristics, mechanisms and impact on precipitation. *Frontiers in Earth Science*, 11, 1143380. + +* - Li, Y., Liu, Y., Chen, Y., Chen, B., Zhang, X., Wang, W. & Huo, Z. (2021). Characteristics of Deep Convective Systems and Initiation during Warm Seasons over China and Its Vicinity. *Remote Sensing*, 13(21), 4289. doi:10.3390/rs13214289. + +* - Leung, G. R., Saleeby, S. M., Sokolowsky, G. A., Freeman, S. W., & van den Heever, S. C. (2023). Aerosol–cloud impacts on aerosol detrainment and rainout in shallow maritime tropical clouds. *Atmospheric Chemistry and Physics*, 23(9), 5263-5278. + +* - Marinescu, P. J., Van Den Heever, S. C., Heikenfeld, M., Barrett, A. I., Barthlott, C., Hoose, C., Fan, J., Fridlind, A. M., Matsui, T., Miltenberger, A. K., Stier, P., Vie, B., White, B. A., & Zhang, Y. (2021). Impacts of varying concentrations of cloud condensation nuclei on deep convective cloud updrafts—a multimodel assessment. *Journal of the Atmospheric Sciences*, 78(4), 1147-1172, doi: 10.1175/JAS-D-20-0200.1. + +* - Marinescu, P. J. (2020). 
Observations of Aerosol Particles and Deep Convective Updrafts and the Modeling of Their Interactions (Doctoral dissertation, Colorado State University). + +* - Oue, M., Saleeby, S. M., Marinescu, P. J., Kollias, P., & van den Heever, S. C. (2022). Optimizing radar scan strategies for tracking isolated deep convection using observing system simulation experiments. *Atmospheric Measurement Techniques*, 15(16), 4931-4950. + +* - Raut, B. A., Jackson, R., Picel, M., Collis, S. M., Bergemann, M., & Jakob, C. (2021). An Adaptive Tracking Algorithm for Convection in Simulated and Remote Sensing Data. *Journal of Applied Meteorology and Climatology*, 60(4), 513-526, doi:10.1175/JAMC-D-20-0119.1. + +* - Whitaker, J. W. (2021). An Investigation of an East Pacific Easterly Wave Genesis Pathway and the Impact of the Papagayo and Tehuantepec Wind Jets on the East Pacific Mean State and Easterly Waves (Doctoral dissertation, Colorado State University). + +* - Zhang, X., Yin, Y., Kukulies, J., Li, Y., Kuang, X., He, C., ... & Chen, J. (2021). Revisiting Lightning Activity and Parameterization Using Geostationary Satellite Observations. *Remote Sensing*, 13(19), 3866, doi:10.3390/rs13193866. +``` + +**Have you used tobac in your research?** + +Please contact us (e.g. by joining our [tobac google group](https://groups.google.com/g/tobac/about)) or submit a pull request containing your reference in our [main repo on GitHub](https://github.com/tobac-project/tobac)! \ No newline at end of file diff --git a/doc/userguide/publications.rst b/doc/userguide/publications.rst deleted file mode 100644 index 717452af..00000000 --- a/doc/userguide/publications.rst +++ /dev/null @@ -1,49 +0,0 @@ -.. _Refereed-Publications: - -Refereed Publications Using *tobac* -=================================== - -**List of peer-reviewed publications in which tobac has been used:** - ------------- - -.. list-table:: - :widths: 30 - :class: wy-table-responsive - - * - Sokolowsky, G. A., Freeman, S. 
W., Jones, W. K., Kukulies, J., Senf, F., Marinescu, P. J., Heikenfeld, M., Brunner, K. N., Bruning, E. C., Collis, S. M., Jackson, R. C., Leung, G. R., Pfeifer, N., Raut, B. A., Saleeby, S. M., Stier, P., and van den Heever, S. C.: tobac v1.5 (2024). Introducing Fast 3D Tracking, Splits and Mergers, and Other Enhancements for Identifying and Analysing Meteorological Phenomena. Geoscientific Model Development, 17(13), 5309-5330. https://doi.org/10.5194/gmd-17-5309-2024. - - * - Heikenfeld, M., Marinescu, P. J., Christensen, M., Watson-Parris, D., Senf, F., van den Heever, S. C., and Stier, P. (2019). tobac 1.2: towards a flexible framework for tracking and analysis of clouds in diverse datasets, Geosci. Model Dev., 12, 4551–4570, https://doi.org/10.5194/gmd-12-4551-2019 - - * - Bukowski, J., & van den Heever, S. C. (2021). Direct radiative effects in haboobs. *Journal of Geophysical Research: Atmospheres*, 126(21), e2021JD034814, doi:10.1029/2021JD034814. - - * - Bukowski, J. (2021). Mineral Dust Lofting and Interactions with Cold Pools (Doctoral dissertation, Colorado State University). - - * - Heikenfeld, M. (2019). Aerosol effects on microphysical processes and deep convective clouds (Doctoral dissertation, University of Oxford). - * - Kukulies, J., Chen, D., & Curio, J. (2021). The role of mesoscale convective systems in precipitation in the Tibetan Plateau region. *Journal of Geophysical Research: Atmospheres*, 126(23), e2021JD035279. doi:10.1029/2021JD035279. - - * - Kukulies, J., Lai, H. W., Curio, J., Feng, Z., Lin, C., Li, P., Ou, T., Sugimoto, S. & Chen, D. (2023). Mesoscale convective systems in the Third pole region: Characteristics, mechanisms and impact on precipitation. *Frontiers in Earth Science*, 11, 1143380. - - * - Li, Y., Liu, Y., Chen, Y., Chen, B., Zhang, X., Wang, W. & Huo, Z. (2021). Characteristics of Deep Convective Systems and Initiation during Warm Seasons over China and Its Vicinity. *Remote Sensing*, 13(21), 4289. 
doi:10.3390/rs13214289. - - * - Leung, G. R., Saleeby, S. M., Sokolowsky, G. A., Freeman, S. W., & van den Heever, S. C. (2023). Aerosol–cloud impacts on aerosol detrainment and rainout in shallow maritime tropical clouds. *Atmospheric Chemistry and Physics*, 23(9), 5263-5278. - - * - Marinescu, P. J., Van Den Heever, S. C., Heikenfeld, M., Barrett, A. I., Barthlott, C., Hoose, C., Fan, J., Fridlind, A. M., Matsui, T., Miltenberger, A. K., Stier, P., Vie, B., White, B. A., & Zhang, Y. (2021). Impacts of varying concentrations of cloud condensation nuclei on deep convective cloud updrafts—a multimodel assessment. *Journal of the Atmospheric Sciences*, 78(4), 1147-1172, doi: 10.1175/JAS-D-20-0200.1. - - * - Marinescu, P. J. (2020). Observations of Aerosol Particles and Deep Convective Updrafts and the Modeling of Their Interactions (Doctoral dissertation, Colorado State University). - - * - Oue, M., Saleeby, S. M., Marinescu, P. J., Kollias, P., & van den Heever, S. C. (2022). Optimizing radar scan strategies for tracking isolated deep convection using observing system simulation experiments. *Atmospheric Measurement Techniques*, 15(16), 4931-4950. - - * - Raut, B. A., Jackson, R., Picel, M., Collis, S. M., Bergemann, M., & Jakob, C. (2021). An Adaptive Tracking Algorithm for Convection in Simulated and Remote Sensing Data. *Journal of Applied Meteorology and Climatology*, 60(4), 513-526, doi:10.1175/JAMC-D-20-0119.1. - - * - Whitaker, J. W. (2021). An Investigation of an East Pacific Easterly Wave Genesis Pathway and the Impact of the Papagayo and Tehuantepec Wind Jets on the East Pacific Mean State and Easterly Waves (Doctoral dissertation, Colorado State University). - - * - Zhang, X., Yin, Y., Kukulies, J., Li, Y., Kuang, X., He, C., .. & Chen, J. (2021). Revisiting Lightning Activity and Parameterization Using Geostationary Satellite Observations. *Remote Sensing*, 13(19), 3866, doi: 10.3390/rs13193866. 
- - -**Have you used tobac in your research?** - -Please contact us (e.g. by joining our `tobac google group `_) or submit a pull request containing your reference in our `main repo on GitHub `_! - - -