- [ ] Benchmark against other tools (PyReshaper, NCO)
- [x] Build well-documented API
- [x] Test on CESM1/2/3 model components, compare against existing time series
- [x] Couple with CMOR process
- [x] Test portability on other machines

## Contributor/Bug Reporting Guidelines
Please report all issues to the [GitHub issue tracker](https://github.com/AgentOxygen/GenTS/issues). When submitting a bug, run `gents.utils.enable_logging(verbose=True)` at the top of your script to include all log output. This will aid in reproducing the bug and quickly developing a solution.
For development, it is recommended to use the [Docker method for testing](https://gents.readthedocs.io/en/latest/). These tests are automatically run in the GitHub workflow, but should be run before committing changes.

``docs/index.rst``

GenTS consolidates the conversion of history files to time series files into the following steps:

#. Detect and read the metadata for all history files into an ``HFCollection``.
#. Apply filters to include/exclude certain history files and then group them by model component (subdirectory) and namelist (file name).
#. Derive a ``TSCollection`` from the ``HFCollection`` and apply configurations/filters to obtain the desired time series files.
#. Generate an embarrassingly parallel workload that can be executed using a Dask cluster to generate each of the time series files.
Each of these steps can be accelerated by a Dask cluster, either created locally (using a ``LocalCluster``) or running on a distributed system (such as PBS or SLURM via Dask-Jobqueue). This process is visualized below.
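
Spinning up a local cluster takes only a few lines with ``dask.distributed``. This is a generic Dask sketch rather than GenTS-specific code; the worker counts and the in-process (``processes=False``) mode are arbitrary choices for a quick demo:

.. code-block:: python

   from dask.distributed import Client, LocalCluster

   # Start a small in-process cluster and attach a client to it.
   cluster = LocalCluster(n_workers=2, threads_per_worker=1, processes=False)
   client = Client(cluster)

   # Work submitted through this client now runs on the cluster.
   result = client.submit(sum, [1, 2, 3]).result()
   print(result)

   client.close()
   cluster.close()

On HPC schedulers, a ``dask_jobqueue.PBSCluster`` or ``SLURMCluster`` can be substituted for the ``LocalCluster``.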
Note that after groups are created, the user can specify additional group settings, such as slicing the time series into chunks of a specified length (10 years by default). All filtering and grouping functions are described in the User Guide (with more to come in the future).
Tests are written using PyTest and are located in ``gents/tests/``. A Dockerfile is provided for running these tests in a containerized environment. Alternatively, tests can be run in a locally constructed environment.
Some unit tests are stand-alone, but many rely on ``gents/tests/test_cases.py`` to generate sample history files to initialize the GenTS workflow.
Docker (recommended)
--------------------

Make sure you have `Docker <https://www.docker.com/>`_ installed on your system. Then clone the GitHub repository:
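
The clone command itself is not shown in this excerpt; the URL below is inferred from the issue tracker linked earlier, and the ``cd`` step is simply a suggestion:

.. code-block:: console

   git clone https://github.com/AgentOxygen/GenTS.git
   cd GenTS
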
Build the Docker container; you should only need to do this once (unless the environment needs to be updated or changed):
.. code-block:: console

   docker build -t gents .
Now run the container. Make sure to bind the repo directory to the ``/project`` mount:
.. code-block:: console

   docker run --rm -v .:/project -t gents:latest
To run individual tests, specify the ``pytest`` command:
.. code-block:: console

   docker run --rm -v .:/project -t gents:latest pytest gents/tests/test_workflow.py
If making contributions to the documentation, you may want to build the webpages locally before committing. ``sphinx`` and ``sphinx-autobuild`` are included in the Docker image and can be run using the following command:
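
The exact invocation is not included in this excerpt; a plausible form, assuming the Sphinx sources live under ``docs/`` and using ``sphinx-autobuild``'s default port 8000 published out of the container, would be:

.. code-block:: console

   docker run --rm -v .:/project -p 8000:8000 -t gents:latest \
       sphinx-autobuild docs docs/_build/html --host 0.0.0.0
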
The webpages should then be accessible via `http://localhost:8000 <http://localhost:8000>`_.
Local environment
-----------------

Make sure you have Python installed. Ideally, create a virtual environment using ``python -m venv`` or `miniconda <https://www.anaconda.com/docs/getting-started/miniconda/main>`_ before installing GenTS and its dependencies:
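
A minimal sketch with ``venv``; the editable install from a cloned checkout (``pip install -e .``) is an assumption, so adapt it to however you obtained the source:

.. code-block:: console

   python -m venv .venv
   source .venv/bin/activate
   pip install -e .
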
``hf_collection`` now contains an internal dictionary that maps history files to metadata stored in the ``gents.meta.netCDFMeta`` class. For example, to print all history files by path and obtain the first entry's metadata:
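
The accessors themselves are not shown in this excerpt, so the snippet below is a stand-in that mimics the path-to-metadata mapping with a plain dictionary; the paths and metadata fields are invented, and GenTS's real attribute and class names may differ:

.. code-block:: python

   # Stand-in only: a plain dict mimicking the mapping from history file
   # paths to metadata (not real gents.meta.netCDFMeta objects).
   hf_collection = {
       "/data/hist/model.cam.h0.0001-01.nc": {"variables": ["TS", "PRECT"]},
       "/data/hist/model.cam.h0.0001-02.nc": {"variables": ["TS", "PRECT"]},
   }

   # Print all history files by path.
   for path in hf_collection:
       print(path)

   # Obtain the first entry's metadata.
   first_path = next(iter(hf_collection))
   print(hf_collection[first_path])
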
Similarly, we can exclude patterns using ``HFCollection.exclude_patterns``.
Note that pulling metadata for ``hf_atm_only`` in this case does not pull metadata for the other two collections. However, if metadata was pulled for ``hf_collection``, all three sub-collections would inherit those metadata objects (and thus would not need to pull again).
A common step may be to filter by a date-time string in the file name.
This may work in most cases, but file names are not always reliable and may be difficult to apply across multiple model components. A more robust approach is to filter on the time bounds provided in the metadata. This requires a metadata pull before running, so there is a performance cost for large datasets, but for smaller datasets the overhead is negligible.
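
The GenTS filter calls are not shown in this excerpt; as a standard-library stand-in, filename-based filtering amounts to glob matching on the date-time string (paths and pattern invented for illustration):

.. code-block:: python

   import fnmatch

   # Invented example paths with a YYYY-MM date string in each file name.
   paths = [
       "/data/hist/model.cam.h0.0001-01.nc",
       "/data/hist/model.cam.h0.0002-01.nc",
   ]

   # Keep only files from the first model year.
   year_one = [p for p in paths if fnmatch.fnmatch(p, "*.0001-*.nc")]
   print(year_one)
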
Metadata for ``hf_collection`` will automatically be pulled if not done so already.
Note that the last inclusive filter only includes history files whose path contains ".h1." and only derives time series for variables that start with "PREC". You can also exclude time series in the same manner.
Once filtered, custom arguments can be applied to all time series or just a subset. Currently supported arguments include whether to overwrite existing time series, the compression level, and the compression algorithm. These arguments are passed through to the `netCDF4 Python API <https://unidata.github.io/netcdf4-python/>`_. The arguments can be applied using glob patterns for both paths and variable names:
Note that adding arguments modifies the existing ``ts_collection`` in place and does not return a copy. The first line sets all time series output to overwrite existing files. The second line applies level 5 compression using the "zlib" algorithm only to time series output derived from history files that contain "/atm/" in their path. The third line applies level 2 compression to all time series output with primary variables that contain the characters "HD". Note that line 3 overrides any possible overlap with line 2.
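
The three-line example referenced above is not included in this excerpt. The stand-in below reconstructs the described behavior (apply-to-all, path-glob, and variable-glob rules, with later rules overriding earlier ones); the class and method names are invented, not the GenTS API:

.. code-block:: python

   import fnmatch

   class ArgRules:
       """Stand-in (not the GenTS API): glob-keyed argument rules."""

       def __init__(self):
           self.rules = []  # (path_pattern, var_pattern, kwargs)

       def add_args(self, path_pattern="*", var_pattern="*", **kwargs):
           # Modifies this object in place; does not return a copy.
           self.rules.append((path_pattern, var_pattern, kwargs))

       def resolve(self, path, var):
           # Later matching rules override earlier ones key-by-key.
           out = {}
           for ppat, vpat, kwargs in self.rules:
               if fnmatch.fnmatch(path, ppat) and fnmatch.fnmatch(var, vpat):
                   out.update(kwargs)
           return out

   rules = ArgRules()
   rules.add_args(overwrite=True)                                           # line 1
   rules.add_args(path_pattern="*/atm/*", complevel=5, compression="zlib")  # line 2
   rules.add_args(var_pattern="*HD*", complevel=2)                          # line 3

   # Line 3 overrides line 2's compression level where both match.
   print(rules.resolve("/run/atm/hist.nc", "SHDFLX"))
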
By default, the output path templates ("templates" are incomplete path strings …)
Note that swaps are made using the built-in ``replace`` string function, so matches can be made to any part of the path string and should not use glob or re patterns.
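
As a concrete illustration of a ``replace``-based swap (paths invented for this example):

.. code-block:: python

   # Swap a literal substring anywhere in the template path; no glob or
   # regular-expression patterns are involved.
   template = "/scratch/user/timeseries/atm/TS.nc"
   swapped = template.replace("/scratch/user", "/glade/work/user")
   print(swapped)  # /glade/work/user/timeseries/atm/TS.nc
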