
Commit c3fc953

Merge pull request #11 from AgentOxygen/docs

Cleaned up the docs

2 parents: 0e5c8bc + ab7a599

9 files changed: 106 additions & 4851 deletions


.readthedocs.yaml

Lines changed: 6 additions & 3 deletions
```diff
@@ -17,8 +17,11 @@ sphinx:
 # Optionally, but recommended,
 # declare the Python requirements required to build your documentation
 # See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
-# python:
-#    install:
-#    - requirements: docs/requirements.txt
+python:
+  install:
+    - requirements: requirements.txt
+    - method: pip
+      path: .
```

Dockerfile

Lines changed: 3 additions & 1 deletion
```diff
@@ -6,10 +6,12 @@ COPY pyproject.toml requirements.txt ./
 
 RUN pip install --upgrade pip
 RUN pip install --no-cache-dir -r requirements.txt
-RUN pip install pytest
+RUN pip install pytest sphinx sphinx-autobuild
 
 COPY . .
 
 RUN pip install -e .
 
+EXPOSE 8000
+
 CMD ["pytest", "-v", "gents/tests/"]
```

README.md

Lines changed: 14 additions & 30 deletions
````diff
@@ -1,4 +1,4 @@
-# **Gen**erate **T**ime **S**eries Tool (GenTS)
+# **Gen**erate **T**ime **S**eries (GenTS)
 
 [![Available on pypi](https://img.shields.io/pypi/v/GenTS.svg)](https://pypi.org/project/GenTS/)
 [![Docs](https://readthedocs.org/projects/GenTS/badge/?version=latest)](https://gents.readthedocs.io/en/latest/)
@@ -22,9 +22,8 @@ Barebones starting example:
 
 ```
 from gents.hfcollection import HFCollection
-from gents.gents import generate_ts_from_hfcollection
-from dask.distributed import LocalCluster
-from dask.distributed import Client
+from gents.timeseries import TSCollection
+from dask.distributed import LocalCluster, Client
 
 cluster = LocalCluster(n_workers=30, threads_per_worker=1, memory_limit="2GB")
 client = cluster.get_client()
@@ -33,32 +32,17 @@ input_head_dir = "... case directory with model output ..."
 output_head_dir = "... scratch directory to output time series to ..."
 
 hf_collection = HFCollection(input_head_dir)
-hf_collection.include_patterns(["*/lnd/*"])
-hf_collection.include_years(0, 20)
+hf_collection = hf_collection.include_patterns(["*/atm/*", "*/ocn/*", "*.h4.*"])
+hf_collection.pull_metadata()
 
-paths = generate_ts_from_hfcollection(hf_collection, output_head_dir, overwrite=True, dask_client=client)
+ts_collection = TSCollection(hf_collection.include_years(0, 5), output_head_dir)
+ts_collection = ts_collection.apply_overwrite("*")
+ts_collection.execute()
 ```
 
-## Future Planning
-Features:
-
-- [x] Automatic directory structure and file name parsing
-- [x] Automatic hsitory file grouping (h0, h1, h2, etc.)
-- [ ] Custom time slicing
-- [x] Custom compression
-- [x] Custom output directory structure
-- [x] Customizeable per history file group
-- [x] Customizeable per variable
-- [x] Resumeable process, can handle interrupts
-- [ ] Output validation
-- [ ] Automated unit testing
-- [ ] Command line interface
-- [ ] Automatic Dask cluster configuration
-
-Tasks
-- [x] Build barebones functional version
-- [ ] Benchmark against other tools (PyReshaper, NCO)
-- [x] Build well-documented API
-- [x] Test on CESM1/2/3 model components, compare against existing time series
-- [x] Couple with CMOR process
-- [x] Test portability on other machines
+## Contributor/Bug Reporting Guidelines
+
+Please report all issues to the [GitHub issue tracker](https://github.com/AgentOxygen/GenTS/issues). When submitting a bug, run `gents.utils.enable_logging(verbose=True)` at the top of your script to include all log output. This will aid in reproducing the bug and quickly developing a solution.
+
+For development, it is recommended to use the [Docker method for testing](https://gents.readthedocs.io/en/latest/). These tests are automatically run in the GitHub workflow, but should be run before committing changes.
````
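A notable API change in this example: the result of `include_patterns` is now reassigned (`hf_collection = hf_collection.include_patterns(...)`), reflecting that filters return filtered copies rather than mutating in place (see the docs/user.rst changes below). A minimal stdlib sketch of that copy-on-filter pattern — the `PathCollection` class here is hypothetical, not the GenTS implementation:

```python
import fnmatch

class PathCollection:
    """Sketch of a copy-on-filter collection: each filter returns a new object."""

    def __init__(self, paths):
        self._paths = list(paths)

    def include_patterns(self, glob_patterns):
        # Keep only paths matching at least one glob; return a copy.
        kept = [p for p in self._paths
                if any(fnmatch.fnmatch(p, g) for g in glob_patterns)]
        return PathCollection(kept)

    def __iter__(self):
        return iter(self._paths)

paths = ["run/atm/hist/case.h0.0001-01.nc", "run/lnd/hist/case.h0.0001-01.nc"]
coll = PathCollection(paths)
atm = coll.include_patterns(["*/atm/*"])

assert list(atm) == ["run/atm/hist/case.h0.0001-01.nc"]
assert len(list(coll)) == 2  # original collection is untouched
```

Because each filter returns a copy, several differently filtered views can be derived from one collection without re-scanning the file system.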

docs/api.rst

Lines changed: 0 additions & 3 deletions
```diff
@@ -1,9 +1,6 @@
 API
 ===
 
-.. automodule:: gents.gents
-   :members:
-
 .. automodule:: gents.hfcollection
    :members:
```

docs/index.rst

Lines changed: 4 additions & 2 deletions
```diff
@@ -15,9 +15,10 @@ GenTS consolidates the conversion of history files to time series files into thr
 
 #. Detect and read the metadata for all history files into a ``HFCollection``
 #. Apply filters to include/exclude certain history files and then group them by model component (sub directory) and namelist (file name).
-#. Generate an embarrasingly parallel workload that reads each variable across all of the history files within each of the formed groups, concatenate, and write them out as time series files.
+#. Derive a ``TSCollection`` from the ``HFCollection`` and apply configurations/filters to obtain the desired time series files
+#. Generate an embarrassingly parallel workload that can be executed using a Dask cluster to generate each of the time series files
 
-Each of these steps requires the use of a Dask cluster, either created locally (using a LocalCluster) or connected to over a distributed system (such as PBS or SLURM using Dask-Jobqueue). This process is visualized below.
+Each of these steps can be accelerated by a Dask cluster, either created locally (using a LocalCluster) or connected to over a distributed system (such as PBS or SLURM using Dask-Jobqueue). This process is visualized below.
 
 Note that after groups are created, the user can specify additional group settings such as slicing the timeseries into chunks of a specified length (10 years by default). All filtering and grouping functions are described in the User Guide (with more to come in the future).
 
@@ -28,3 +29,4 @@ Note that after groups are created, the user can specify additional group settin
    install
    user
    api
+   tests
```
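The "embarrassingly parallel workload" in the revised last step means each time series file can be produced independently of the others. A rough stdlib stand-in — in GenTS the Dask client plays the executor's role, and the group layout and function name below are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

def build_timeseries(group):
    # Stand-in for reading one variable across a group of history files,
    # concatenating along time, and writing a single time series file.
    component, variable, files = group
    return f"{component}.{variable}.nc ({len(files)} history files)"

# One task per (component, variable) group; no task depends on another.
groups = [
    ("atm", "TMAX", ["h0.0001.nc", "h0.0002.nc"]),
    ("lnd", "PREC", ["h1.0001.nc", "h1.0002.nc"]),
]

with ThreadPoolExecutor() as pool:  # a Dask cluster would replace this
    outputs = list(pool.map(build_timeseries, groups))
```

Because the tasks share no state, throughput scales with worker count, which is why the docs change relaxes the Dask cluster from a requirement to an accelerator.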

docs/tests.rst

Lines changed: 61 additions & 0 deletions
```diff
@@ -0,0 +1,61 @@
+Testing
+=======
+
+Tests are written using PyTest and are located in ``gents/tests/``. A Dockerfile is provided for running these tests in a containerized environment. Alternatively, tests can be run in a locally constructed environment.
+
+Some unit tests are stand-alone, but many rely on ``gents/tests/test_cases.py`` to generate sample history files to initialize the GenTS workflow.
+
+Docker (recommended)
+--------------------
+
+Make sure you have `Docker <https://www.docker.com/>`_ installed on your system. Then clone the GitHub repository:
+
+.. code-block:: console
+
+   git clone https://github.com/AgentOxygen/GenTS.git
+   cd GenTS
+
+Build the Docker image; you should only need to do this once (unless the environment needs to be updated or changed):
+
+.. code-block:: console
+
+   docker build -t gents .
+
+Now run the container. Make sure to bind the repo directory to the ``/project`` mount:
+
+.. code-block:: console
+
+   docker run --rm -v .:/project -t gents:latest
+
+To run individual tests, specify the ``pytest`` command:
+
+.. code-block:: console
+
+   docker run --rm -v .:/project -t gents:latest pytest gents/tests/test_workflow.py
+
+If making contributions to the documentation, you may want to build the webpages locally before committing. ``sphinx`` and ``sphinx-autobuild`` are included in the Docker image and can be run using the following command:
+
+.. code-block:: console
+
+   docker run --rm -v .:/project -p 8000:8000 -it gents:latest sphinx-autobuild docs docs/_build/html --host 0.0.0.0
+
+The webpages should then be accessible via `http://localhost:8000 <http://localhost:8000>`_.
+
+Local environment
+-----------------
+
+Make sure you have a Python instance installed. Ideally, create a virtual environment using ``python -m venv`` or `miniconda <https://www.anaconda.com/docs/getting-started/miniconda/main>`_ before installing GenTS and its dependencies:
+
+.. code-block:: console
+
+   git clone https://github.com/AgentOxygen/GenTS.git
+   cd GenTS
+   pip install --no-cache-dir -r requirements.txt
+   pip install pytest
+   pip install -e .
+
+Then execute the tests using ``pytest``:
+
+.. code-block:: console
+
+   pytest gents/tests/
```
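The stand-alone tests the new page mentions follow ordinary PyTest conventions: module-level `test_*` functions with bare `assert` statements, discovered automatically. A hypothetical sketch of the style (the helper and its behavior are illustrative, not taken from `gents/tests/`):

```python
import re

def namelist_group(path):
    """Extract the history file group (h0, h1, ...) from a file name."""
    match = re.search(r"\.(h\d+)\.", path)
    return match.group(1) if match else None

def test_namelist_group():
    # PyTest collects any `test_*` function; a failed assert fails the test.
    assert namelist_group("case.cam.h0.0001-01.nc") == "h0"
    assert namelist_group("case.clm2.h1.0001-01-01.nc") == "h1"
    assert namelist_group("case.cam.once.nc") is None
```

Saved as `test_something.py`, this runs under either invocation shown above (`pytest gents/tests/` locally, or via the Docker container).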

docs/user.rst

Lines changed: 16 additions & 17 deletions
```diff
@@ -6,9 +6,8 @@ The GenTS (Generate Time Series) is an open-source Python Package designed to si
 .. code-block:: python
 
    from gents.hfcollection import HFCollection
-   from gents.gents import generate_ts_from_hfcollection
-   from dask.distributed import LocalCluster
-   from dask.distributed import Client
+   from gents.timeseries import TSCollection
+   from dask.distributed import LocalCluster, Client
 
    cluster = LocalCluster(n_workers=30, threads_per_worker=1, memory_limit="2GB")
    client = cluster.get_client()
@@ -50,7 +49,7 @@ The ``HFCollection`` class provides an intuitive interface for the user to inter
 .. code-block:: python
 
    from gents.hfcollection import HFCollection
-   hf_collection = HFCollection("my/file/system/scratch/GCM_run/output/history_files/")
+   hf_collection = HFCollection(hf_dir="my/file/system/scratch/GCM_run/output/history_files/")
 
 ``hf_collection`` now contains an internal dictionary that maps history files to metadata stored in the ``gents.meta.netCDFMeta`` class. For example, to print all history files by path and obtain the first entry's metadata:
 
@@ -76,7 +75,7 @@ Similarly, we can exclude patterns using ``HFCollection.exclude_patterns`` too:
 
 .. code-block:: python
 
-   hf_collection = hf_collection.exclude_patterns(["*.once.*", "*/rof/*"])
+   hf_collection = hf_collection.exclude_patterns(glob_patterns=["*.once.*", "*/rof/*"])
    first_entry_path = list(hf_collection)[0]
    hf_collection.pull_metadata()
    first_entry_meta = hf_collection[first_entry_path]
@@ -85,17 +84,17 @@ Note that the user can specify multiple entries as glob patterns which can filte
 
 .. code-block:: python
 
-   hf_atm_only = hf_collection.include_patterns(["*/atm/*"])
-   hf_ocn_only = hf_collection.include_patterns(["*/ocn/*"])
-   hf_lnd_only = hf_collection.include_patterns(["*/lnd/*"])
+   hf_atm_only = hf_collection.include_patterns(glob_patterns=["*/atm/*"])
+   hf_ocn_only = hf_collection.include_patterns(glob_patterns=["*/ocn/*"])
+   hf_lnd_only = hf_collection.include_patterns(glob_patterns=["*/lnd/*"])
 
 Note that pulling metadata for ``hf_atm_only`` in this case does not pull metadata for the other two collections. However, if metadata was pulled for ``hf_collection``, all three sub-collections would inherit those metadata objects (and thus would not need to pull again).
 
 A common step may be to filter by a date-time string in the file name:
 
 .. code-block:: python
 
-   hf_2010_2019 = hf_collection.include_patterns(["*20100101-20191231.nc"])
+   hf_2010_2019 = hf_collection.include_patterns(glob_patterns=["*20100101-20191231.nc"])
 
 This may work in most cases, but file names are not always reliable and may be difficult to apply across multiple model components. A more robust way of filtering is to operate over the time bounds provided in the metadata. This requires a metadata pull before running, so there is a performance hit for large datasets, but for smaller datasets the decrease is negligible:
 
@@ -124,29 +123,29 @@ Metadata for ``hf_collection`` will automatically be pulled if not done so alre
 
 .. code-block:: python
 
-   ts_tmax_only = ts_collection.include("*", "TMAX")
-   ts_prec_only = ts_collection.include("*", "PREC*")
-   ts_h1_prec_only = ts_collection.include("*.h1.*", "PREC*")
+   ts_tmax_only = ts_collection.include(path_glob="*", var_glob="TMAX")
+   ts_prec_only = ts_collection.include(path_glob="*", var_glob="PREC*")
+   ts_h1_prec_only = ts_collection.include(path_glob="*.h1.*", var_glob="PREC*")
 
 Note that the last inclusive filter only includes history files with a path that contains ".h1." and only derives time series for variables that start with "PREC". You can also exclude time series in the same manner:
 
 .. code-block:: python
 
-   ts_without_h4_hurs = ts_collection.exclude("*.h4.*", "HURS")
+   ts_without_h4_hurs = ts_collection.exclude(path_glob="*.h4.*", var_glob="HURS")
 
 Just like with ``HFCollection``, both ``TSCollection.include`` and ``TSCollection.exclude`` operations return copies, allowing for advanced filtering:
 
 .. code-block:: python
 
-   ts_h2_temps_only = ts_collection.include("*.h2.*", "T*")
-   ts_h2_temps_no_pop = ts_h2_only.exclude("*.pop.*", "*")
+   ts_h2_temps_only = ts_collection.include(path_glob="*.h2.*", var_glob="T*")
+   ts_h2_temps_no_pop = ts_h2_only.exclude(path_glob="*.pop.*", var_glob="*")
 
 Once filtered, custom arguments can be applied to all time series or just a subset. Currently supported arguments include whether to overwrite existing time series, compression level, and compression algorithm. These arguments are passed to the `netCDF4 Python API <https://unidata.github.io/netcdf4-python/>`_. The arguments can be applied using glob patterns for both paths and variable names:
 
 .. code-block:: python
 
    ts_collection.add_args("*", "*", overwrite=True)
-   ts_collection.add_args("*/atm/*", "*", alg="zlib", level=5)
+   ts_collection.apply_compression(alg="zlib", level=5, path_glob="*/atm/*", var_glob="*")
    ts_collection.add_args("*", "*HD*", alg="zlib", level=2)
 
 Note that ``add_args`` modifies the existing ts_collection and does not return a copy. The first line sets all time series output to overwrite existing files. The second line applies level 5 compression using the "zlib" algorithm only to time series output derived from history files that contain "/atm/" in their path. The third line applies level 2 compression to all time series output with primary variables that contain the characters "HD". Note that line 3 overrides any possible overlap with line 2.
@@ -155,7 +154,7 @@ By default, the output path templates ("templates" are incomplete path strings wh
 
 .. code-block:: python
 
-   ts_collection.apply_path_swap("/hist", "/tseries/")
+   ts_collection.apply_path_swap(string_match="/hist", string_swap="/tseries/")
 
 Note that swaps are made using the built-in ``replace`` string function, so matches can be made to any part of the path string and should not use glob or re patterns.
```
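The ordering rule the user guide describes — later argument calls override earlier ones wherever their globs overlap — can be illustrated with a small stdlib sketch (the rule list and `resolve_args` helper are hypothetical, not GenTS internals):

```python
import fnmatch

def resolve_args(path, var, rules):
    """Apply (path_glob, var_glob, kwargs) rules in order; later matches win."""
    args = {}
    for path_glob, var_glob, kwargs in rules:
        if fnmatch.fnmatch(path, path_glob) and fnmatch.fnmatch(var, var_glob):
            args.update(kwargs)
    return args

# Mirrors the three calls in the user guide example above.
rules = [
    ("*", "*", {"overwrite": True}),
    ("*/atm/*", "*", {"alg": "zlib", "level": 5}),
    ("*", "*HD*", {"alg": "zlib", "level": 2}),
]

# An atm-derived variable containing "HD" matches all three rules;
# the last rule's level=2 overrides the earlier level=5.
args = resolve_args("case/atm/hist/file.h1.nc", "SHDMAX", rules)
assert args == {"overwrite": True, "alg": "zlib", "level": 2}
```

The last-match-wins behavior is why the guide warns that the third `add_args` call overrides any overlap with the second.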

pyproject.toml

Lines changed: 2 additions & 5 deletions
```diff
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "GenTS"
-version = "0.8.0"
+version = "0.9.0"
 authors = [
   { name="Cameron Cummins", email="cameron.cummins@utexas.edu" },
 ]
@@ -28,7 +28,4 @@ Homepage = "https://github.com/AgentOxygen/GenTS"
 Issues = "https://github.com/AgentOxygen/GenTS"
 
 [tool.setuptools]
-packages = ["gents"]
-
-[project.scripts]
-gents = "gents.cli:main"
+packages = ["gents"]
```
