
Commit c3fc953

Merge pull request #11 from AgentOxygen/docs

Cleaned up the docs

2 parents: 0e5c8bc + ab7a599

9 files changed: 106 additions & 4851 deletions


.readthedocs.yaml

Lines changed: 6 additions & 3 deletions
```diff
@@ -17,8 +17,11 @@ sphinx:
 # Optionally, but recommended,
 # declare the Python requirements required to build your documentation
 # See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
-# python:
-#    install:
-#    - requirements: docs/requirements.txt
+python:
+  install:
+    - requirements: requirements.txt
+    - method: pip
+      path: .
```

Dockerfile

Lines changed: 3 additions & 1 deletion
```diff
@@ -6,10 +6,12 @@ COPY pyproject.toml requirements.txt ./
 
 RUN pip install --upgrade pip
 RUN pip install --no-cache-dir -r requirements.txt
-RUN pip install pytest
+RUN pip install pytest sphinx sphinx-autobuild
 
 COPY . .
 
 RUN pip install -e .
 
+EXPOSE 8000
+
 CMD ["pytest", "-v", "gents/tests/"]
```

README.md

Lines changed: 14 additions & 30 deletions
````diff
@@ -1,4 +1,4 @@
-# **Gen**erate **T**ime **S**eries Tool (GenTS)
+# **Gen**erate **T**ime **S**eries (GenTS)
 
 [![Available on pypi](https://img.shields.io/pypi/v/GenTS.svg)](https://pypi.org/project/GenTS/)
 [![Docs](https://readthedocs.org/projects/GenTS/badge/?version=latest)](https://gents.readthedocs.io/en/latest/)
@@ -22,9 +22,8 @@ Barebones starting example:
 
 ```
 from gents.hfcollection import HFCollection
-from gents.gents import generate_ts_from_hfcollection
-from dask.distributed import LocalCluster
-from dask.distributed import Client
+from gents.timeseries import TSCollection
+from dask.distributed import LocalCluster, Client
 
 cluster = LocalCluster(n_workers=30, threads_per_worker=1, memory_limit="2GB")
 client = cluster.get_client()
@@ -33,32 +32,17 @@ input_head_dir = "... case directory with model output ..."
 output_head_dir = "... scratch directory to output time series to ..."
 
 hf_collection = HFCollection(input_head_dir)
-hf_collection.include_patterns(["*/lnd/*"])
-hf_collection.include_years(0, 20)
+hf_collection = hf_collection.include_patterns(["*/atm/*", "*/ocn/*", "*.h4.*"])
+hf_collection.pull_metadata()
 
-paths = generate_ts_from_hfcollection(hf_collection, output_head_dir, overwrite=True, dask_client=client)
+ts_collection = TSCollection(hf_collection.include_years(0, 5), output_head_dir)
+ts_collection = ts_collection.apply_overwrite("*")
+ts_collection.execute()
 ```
 
-## Future Planning
-Features:
-
-- [x] Automatic directory structure and file name parsing
-- [x] Automatic hsitory file grouping (h0, h1, h2, etc.)
-- [ ] Custom time slicing
-- [x] Custom compression
-- [x] Custom output directory structure
-- [x] Customizeable per history file group
-- [x] Customizeable per variable
-- [x] Resumeable process, can handle interrupts
-- [ ] Output validation
-- [ ] Automated unit testing
-- [ ] Command line interface
-- [ ] Automatic Dask cluster configuration
-
-Tasks
-- [x] Build barebones functional version
-- [ ] Benchmark against other tools (PyReshaper, NCO)
-- [x] Build well-documented API
-- [x] Test on CESM1/2/3 model components, compare against existing time series
-- [x] Couple with CMOR process
-- [x] Test portability on other machines
+## Contributor/Bug Reporting Guidelines
+
+Please report all issues to the [GitHub issue tracker](https://github.com/AgentOxygen/GenTS/issues). When submitting a bug, run `gents.utils.enable_logging(verbose=True)` at the top of your script to include all log output. This will aid in reproducing the bug and quickly developing a solution.
+
+For development, it is recommended to use the [Docker method for testing](https://gents.readthedocs.io/en/latest/). These tests are automatically run in the GitHub workflow, but should be run before committing changes.
````
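A notable API change in this example: the result of `include_patterns` is now reassigned (`hf_collection = hf_collection.include_patterns(...)`), reflecting that filters return filtered copies rather than mutating in place (see the docs/user.rst changes below). A minimal stdlib sketch of that copy-on-filter pattern — the `PathCollection` class here is hypothetical, not the GenTS implementation:

```python
import fnmatch

class PathCollection:
    """Sketch of a copy-on-filter collection: each filter returns a new object."""

    def __init__(self, paths):
        self._paths = list(paths)

    def include_patterns(self, glob_patterns):
        # Keep only paths matching at least one glob; return a copy.
        kept = [p for p in self._paths
                if any(fnmatch.fnmatch(p, g) for g in glob_patterns)]
        return PathCollection(kept)

    def __iter__(self):
        return iter(self._paths)

paths = ["run/atm/hist/case.h0.0001-01.nc", "run/lnd/hist/case.h0.0001-01.nc"]
coll = PathCollection(paths)
atm = coll.include_patterns(["*/atm/*"])

assert list(atm) == ["run/atm/hist/case.h0.0001-01.nc"]
assert len(list(coll)) == 2  # original collection is untouched
```

Because each filter returns a copy, several differently filtered views can be derived from one collection without re-scanning the file system.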

docs/api.rst

Lines changed: 0 additions & 3 deletions
```diff
@@ -1,9 +1,6 @@
 API
 ===
 
-.. automodule:: gents.gents
-   :members:
-
 .. automodule:: gents.hfcollection
    :members:
```

docs/index.rst

Lines changed: 4 additions & 2 deletions
```diff
@@ -15,9 +15,10 @@ GenTS consolidates the conversion of history files to time series files into thr
 
 #. Detect and read the metadata for all history files into a ``HFCollection``
 #. Apply filters to include/exclude certain history files and then group them by model component (sub directory) and namelist (file name).
-#. Generate an embarrasingly parallel workload that reads each variable across all of the history files within each of the formed groups, concatenate, and write them out as time series files.
+#. Derive a ``TSCollection`` from the ``HFCollection`` and apply configurations/filters to obtain the desired time series files
+#. Generate an embarrassingly parallel workload that can be executed using a Dask cluster to generate each of the time series files
 
-Each of these steps requires the use of a Dask cluster, either created locally (using a LocalCluster) or connected to over a distributed system (such as PBS or SLURM using Dask-Jobqueue). This process is visualized below.
+Each of these steps can be accelerated by a Dask cluster, either created locally (using a LocalCluster) or connected to over a distributed system (such as PBS or SLURM using Dask-Jobqueue). This process is visualized below.
 
 Note that after groups are created, the user can specify additional group settings such as slicing the timeseries into chunks of a specified length (10 years by default). All filtering and grouping functions are described in the User Guide (with more to come in the future).
 
@@ -28,3 +29,4 @@ Note that after groups are created, the user can specify additional group settin
    install
    user
    api
+   tests
```
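The "embarrassingly parallel workload" in the revised last step means each time series file can be produced independently of the others. A rough stdlib stand-in — in GenTS the Dask client plays the executor's role, and the group layout and function name below are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

def build_timeseries(group):
    # Stand-in for reading one variable across a group of history files,
    # concatenating along time, and writing a single time series file.
    component, variable, files = group
    return f"{component}.{variable}.nc ({len(files)} history files)"

# One task per (component, variable) group; no task depends on another.
groups = [
    ("atm", "TMAX", ["h0.0001.nc", "h0.0002.nc"]),
    ("lnd", "PREC", ["h1.0001.nc", "h1.0002.nc"]),
]

with ThreadPoolExecutor() as pool:  # a Dask cluster would replace this
    outputs = list(pool.map(build_timeseries, groups))
```

Because the tasks share no state, throughput scales with worker count, which is why the docs change relaxes the Dask cluster from a requirement to an accelerator.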

docs/tests.rst

Lines changed: 61 additions & 0 deletions
```diff
@@ -0,0 +1,61 @@
+Testing
+=======
+
+Tests are written using PyTest and are located in ``gents/tests/``. A Dockerfile is provided for running these tests in a containerized environment. Alternatively, tests can be run in a locally constructed environment.
+
+Some unit tests are stand-alone, but many rely on ``gents/tests/test_cases.py`` to generate sample history files to initialize the GenTS workflow.
+
+Docker (recommended)
+--------------------
+
+Make sure you have `Docker <https://www.docker.com/>`_ installed on your system. Then clone the GitHub repository:
+
+.. code-block:: console
+
+   git clone https://github.com/AgentOxygen/GenTS.git
+   cd GenTS
+
+Build the Docker image; you should only need to do this once (unless the environment needs to be updated or changed):
+
+.. code-block:: console
+
+   docker build -t gents .
+
+Now run the container. Make sure to bind the repo directory to the ``/project`` mount:
+
+.. code-block:: console
+
+   docker run --rm -v .:/project -t gents:latest
+
+To run individual tests, specify the ``pytest`` command:
+
+.. code-block:: console
+
+   docker run --rm -v .:/project -t gents:latest pytest gents/tests/test_workflow.py
+
+If making contributions to the documentation, you may want to build the webpages locally before committing. ``sphinx`` and ``sphinx-autobuild`` are included in the Docker image and can be run using the following command:
+
+.. code-block:: console
+
+   docker run --rm -v .:/project -p 8000:8000 -it gents:latest sphinx-autobuild docs docs/_build/html --host 0.0.0.0
+
+The webpages should then be accessible via `http://localhost:8000 <http://localhost:8000>`_.
+
+Local environment
+-----------------
+
+Make sure you have a Python instance installed. Ideally, create a virtual environment using ``python -m venv`` or `miniconda <https://www.anaconda.com/docs/getting-started/miniconda/main>`_ before installing GenTS and its dependencies:
+
+.. code-block:: console
+
+   git clone https://github.com/AgentOxygen/GenTS.git
+   cd GenTS
+   pip install --no-cache-dir -r requirements.txt
+   pip install pytest
+   pip install -e .
+
+Then execute the tests using ``pytest``:
+
+.. code-block:: console
+
+   pytest gents/tests/
```
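The stand-alone tests the new page mentions follow ordinary PyTest conventions: module-level `test_*` functions with bare `assert` statements, discovered automatically. A hypothetical sketch of the style (the helper and its behavior are illustrative, not taken from `gents/tests/`):

```python
import re

def namelist_group(path):
    """Extract the history file group (h0, h1, ...) from a file name."""
    match = re.search(r"\.(h\d+)\.", path)
    return match.group(1) if match else None

def test_namelist_group():
    # PyTest collects any `test_*` function; a failed assert fails the test.
    assert namelist_group("case.cam.h0.0001-01.nc") == "h0"
    assert namelist_group("case.clm2.h1.0001-01-01.nc") == "h1"
    assert namelist_group("case.cam.once.nc") is None
```

Saved as `test_something.py`, this runs under either invocation shown above (`pytest gents/tests/` locally, or via the Docker container).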

docs/user.rst

Lines changed: 16 additions & 17 deletions
```diff
@@ -6,9 +6,8 @@ The GenTS (Generate Time Series) is an open-source Python Package designed to si
 .. code-block:: python
 
    from gents.hfcollection import HFCollection
-   from gents.gents import generate_ts_from_hfcollection
-   from dask.distributed import LocalCluster
-   from dask.distributed import Client
+   from gents.timeseries import TSCollection
+   from dask.distributed import LocalCluster, Client
 
    cluster = LocalCluster(n_workers=30, threads_per_worker=1, memory_limit="2GB")
    client = cluster.get_client()
@@ -50,7 +49,7 @@ The ``HFCollection`` class provides an intuitive interface for the user to inter
 .. code-block:: python
 
    from gents.hfcollection import HFCollection
-   hf_collection = HFCollection("my/file/system/scratch/GCM_run/output/history_files/")
+   hf_collection = HFCollection(hf_dir="my/file/system/scratch/GCM_run/output/history_files/")
 
 ``hf_collection`` now contains an internal dictionary that maps history files to metadata stored in the ``gents.meta.netCDFMeta`` class. For example, to print all history files by path and obtain the first entry's metadata:
 
@@ -76,7 +75,7 @@ Similarly, we can exclude patterns using ``HFCollection.exclude_patterns`` too:
 
 .. code-block:: python
 
-   hf_collection = hf_collection.exclude_patterns(["*.once.*", "*/rof/*"])
+   hf_collection = hf_collection.exclude_patterns(glob_patterns=["*.once.*", "*/rof/*"])
    first_entry_path = list(hf_collection)[0]
    hf_collection.pull_metadata()
    first_entry_meta = hf_collection[first_entry_path]
@@ -85,17 +84,17 @@ Note that the user can specify multiple entries as glob patterns which can filte
 
 .. code-block:: python
 
-   hf_atm_only = hf_collection.include_patterns(["*/atm/*"])
-   hf_ocn_only = hf_collection.include_patterns(["*/ocn/*"])
-   hf_lnd_only = hf_collection.include_patterns(["*/lnd/*"])
+   hf_atm_only = hf_collection.include_patterns(glob_patterns=["*/atm/*"])
+   hf_ocn_only = hf_collection.include_patterns(glob_patterns=["*/ocn/*"])
+   hf_lnd_only = hf_collection.include_patterns(glob_patterns=["*/lnd/*"])
 
 Note that pulling metadata for ``hf_atm_only`` in this case does not pull metadata for the other two collections. However, if metadata was pulled for ``hf_collection``, all three sub-collections would inherit those metadata objects (and thus would not need to pull again).
 
 A common step may be to filter by a date-time string in the file name:
 
 .. code-block:: python
 
-   hf_2010_2019 = hf_collection.include_patterns(["*20100101-20191231.nc"])
+   hf_2010_2019 = hf_collection.include_patterns(glob_patterns=["*20100101-20191231.nc"])
 
 This may work in most cases, but file names are not always reliable and may be difficult to apply across multiple model components. A more robust way of filtering is to operate over the time bounds provided in the metadata. This requires a metadata pull before running, so there is a performance hit for large datasets, but for smaller datasets the decrease is negligible:
 
@@ -124,29 +123,29 @@ Metadata for ``hf_collection`` will automatically be pulled if not done so alre
 
 .. code-block:: python
 
-   ts_tmax_only = ts_collection.include("*", "TMAX")
-   ts_prec_only = ts_collection.include("*", "PREC*")
-   ts_h1_prec_only = ts_collection.include("*.h1.*", "PREC*")
+   ts_tmax_only = ts_collection.include(path_glob="*", var_glob="TMAX")
+   ts_prec_only = ts_collection.include(path_glob="*", var_glob="PREC*")
+   ts_h1_prec_only = ts_collection.include(path_glob="*.h1.*", var_glob="PREC*")
 
 Note that the last inclusive filter only includes history files with a path that contains ".h1." and only derives time series for variables that start with "PREC". You can also exclude time series in the same manner:
 
 .. code-block:: python
 
-   ts_without_h4_hurs = ts_collection.exclude("*.h4.*", "HURS")
+   ts_without_h4_hurs = ts_collection.exclude(path_glob="*.h4.*", var_glob="HURS")
 
 Just like with ``HFCollection``, both ``TSCollection.include`` and ``TSCollection.exclude`` operations return copies, allowing for advanced filtering:
 
 .. code-block:: python
 
-   ts_h2_temps_only = ts_collection.include("*.h2.*", "T*")
-   ts_h2_temps_no_pop = ts_h2_only.exclude("*.pop.*", "*")
+   ts_h2_temps_only = ts_collection.include(path_glob="*.h2.*", var_glob="T*")
+   ts_h2_temps_no_pop = ts_h2_only.exclude(path_glob="*.pop.*", var_glob="*")
 
 Once filtered, custom arguments can be applied to all time series or just a subset. Currently supported arguments include whether to overwrite existing time series, compression level, and compression algorithm. These arguments are passed to the `netCDF4 Python API <https://unidata.github.io/netcdf4-python/>`_. The arguments can be applied using glob patterns for both paths and variable names:
 
 .. code-block:: python
 
    ts_collection.add_args("*", "*", overwrite=True)
-   ts_collection.add_args("*/atm/*", "*", alg="zlib", level=5)
+   ts_collection.apply_compression(alg="zlib", level=5, path_glob="*/atm/*", var_glob="*")
    ts_collection.add_args("*", "*HD*", alg="zlib", level=2)
 
 Note that ``add_args`` modifies the existing ts_collection and does not return a copy. The first line sets all time series output to overwrite existing files. The second line applies level 5 compression using the "zlib" algorithm only to time series output derived from history files that contain "/atm/" in their path. The third line applies level 2 compression to all time series output with primary variables that contain the characters "HD". Note that line 3 overrides any possible overlap with line 2.
@@ -155,7 +154,7 @@ By default, the output path templates ("templates" are incomplete path strings wh
 
 .. code-block:: python
 
-   ts_collection.apply_path_swap("/hist", "/tseries/")
+   ts_collection.apply_path_swap(string_match="/hist", string_swap="/tseries/")
 
 Note that swaps are made using the built-in ``replace`` string function, so matches can be made to any part of the path string and should not use glob or re patterns.
```
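The ordering rule the user guide describes — later argument calls override earlier ones wherever their globs overlap — can be illustrated with a small stdlib sketch (the rule list and `resolve_args` helper are hypothetical, not GenTS internals):

```python
import fnmatch

def resolve_args(path, var, rules):
    """Apply (path_glob, var_glob, kwargs) rules in order; later matches win."""
    args = {}
    for path_glob, var_glob, kwargs in rules:
        if fnmatch.fnmatch(path, path_glob) and fnmatch.fnmatch(var, var_glob):
            args.update(kwargs)
    return args

# Mirrors the three calls in the user guide example above.
rules = [
    ("*", "*", {"overwrite": True}),
    ("*/atm/*", "*", {"alg": "zlib", "level": 5}),
    ("*", "*HD*", {"alg": "zlib", "level": 2}),
]

# An atm-derived variable containing "HD" matches all three rules;
# the last rule's level=2 overrides the earlier level=5.
args = resolve_args("case/atm/hist/file.h1.nc", "SHDMAX", rules)
assert args == {"overwrite": True, "alg": "zlib", "level": 2}
```

The last-match-wins behavior is why the guide warns that the third `add_args` call overrides any overlap with the second.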

pyproject.toml

Lines changed: 2 additions & 5 deletions
```diff
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "GenTS"
-version = "0.8.0"
+version = "0.9.0"
 authors = [
   { name="Cameron Cummins", email="cameron.cummins@utexas.edu" },
 ]
@@ -28,7 +28,4 @@ Homepage = "https://github.com/AgentOxygen/GenTS"
 Issues = "https://github.com/AgentOxygen/GenTS"
 
 [tool.setuptools]
-packages = ["gents"]
-
-[project.scripts]
-gents = "gents.cli:main"
+packages = ["gents"]
```
