96 changes: 96 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,96 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Common Development Commands

### Testing
```bash
pytest # Run all tests
pytest tests/test_process_rodb.py # Run specific test file
pytest tests/test_mooring_rodb.py # Run mooring RODB test file
pytest -v # Verbose output
pytest --cov=oceanarray # With coverage report
```

### Code Quality and Linting
```bash
black . # Format code with black
ruff check . # Run ruff linter
ruff check . --fix # Auto-fix issues where possible
pre-commit run --all-files # Run all pre-commit hooks
codespell # Check for spelling errors
```

### Documentation
```bash
cd docs
make html # Build documentation locally
make clean html # Clean build and rebuild
```

### Environment Setup
```bash
pip install -r requirements-dev.txt # Install development dependencies
pip install -e . # Install package in development mode
```

### Jupyter Notebooks
```bash
jupyter nbconvert --clear-output --inplace notebooks/*.ipynb # Clear notebook outputs
```

## High-Level Architecture

### Core Processing Stages
The codebase implements a multi-stage processing pipeline for oceanographic mooring data:

1. **Stage 1** (`stage1.py`): Raw data conversion and initial processing using ctd_tools readers
2. **Stage 2** (`stage2.py`): Advanced processing, calibration, and quality control
3. **Time Gridding** (`time_gridding.py`): Multi-instrument coordination, filtering, and interpolation onto common time grids (supersedes `mooring_rodb.py`)
4. **Array Level** (`transports.py`): Cross-mooring calculations and transport computations (work in progress)

### Key Components

- **Data Readers** (`readers.py`, `rodb.py`): Handle various oceanographic data formats
- **Data Writers** (`writers.py`): Output processed data in standardized formats
- **Processing Tools** (`tools.py`, `utilities.py`): Core algorithms for data manipulation
- **Time Operations** (`time_gridding.py`, `clock_offset.py`, `find_deployment.py`): Temporal processing
- **Visualization** (`plotters.py`): Data visualization and quality assessment
- **Logging** (`logger.py`): Configurable logging system
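
For a flavour of how these modules fit together, the batch-instrument demo notebook loads per-instrument files, stacks them at the mooring level, and derives salinity. The sketch below is adapted from `notebooks/demo_batch_instrument.ipynb`; the glob pattern names one example deployment and is illustrative:

```python
from pathlib import Path

from oceanarray import mooring_rodb, readers, tools

# Load per-instrument NetCDF files for one deployment
# (pattern borrowed from notebooks/demo_batch_instrument.ipynb).
data_dir = Path("data")
files = list(data_dir.glob("OS_wb2_9_201114_*_P.nc"))

ds_list = readers.load_dataset(files)                # one xarray.Dataset per file
ds_stack = mooring_rodb.combine_mooring_OS(ds_list)  # stack instruments by depth
ds_stack = tools.calc_psal(ds_stack)                 # derive practical salinity
```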

### Data Flow Architecture
1. Raw instrument files → Stage1 → CF-NetCDF standardized format
2. Stage1 outputs → Stage2 → Advanced processing and quality control
3. Multiple instruments → Time Gridding → Common time grid with optional filtering
4. Multiple moorings → Array-level transport calculations (in development)
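
A sketch of this flow in code, using the entry points referenced in the demo notebooks (`stage1.MooringProcessor` and `stage2.process_multiple_moorings_stage2`); the constructor and call signatures here are assumptions, not the verified API:

```python
from pathlib import Path

from oceanarray import stage1, stage2

base_dir = Path("data")
mooring_name = "wb1_12_2015"  # hypothetical mooring/deployment name

# 1. Raw instrument files -> CF-NetCDF (*_raw.nc); signature assumed
processor = stage1.MooringProcessor(base_dir)
processor.process_mooring(mooring_name)

# 2. Clock corrections and deployment trimming -> *_use.nc; signature assumed
stage2.process_multiple_moorings_stage2([mooring_name], base_dir)

# 3. Time gridding onto a common grid is handled by time_gridding.py;
# 4. array-level transports (transports.py) remain in development.
```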

### File Type Support
Supports multiple instrument formats via ctd_tools:
- SeaBird CNV/ASC files (`sbe-cnv`, `sbe-asc`)
- RBR RSK/DAT files (`rbr-rsk`, `rbr-dat`)
- Nortek AQD files (`nortek-aqd`)
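
Internally this implies a dispatch from file type to reader. The mapping below is a hypothetical illustration — the keys are the identifiers listed above, the extensions are guesses — not the actual table in `stage1.py`/`readers.py`:

```python
# Hypothetical extension-to-format dispatch; the real lookup lives in the
# stage1/readers code and may differ.
FILE_TYPE_BY_EXTENSION = {
    ".cnv": "sbe-cnv",
    ".asc": "sbe-asc",
    ".rsk": "rbr-rsk",
    ".dat": "rbr-dat",
    ".aqd": "nortek-aqd",
}
```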

### Key Design Patterns
- Uses xarray.Dataset as primary data structure throughout pipeline
- Implements CF conventions for metadata and naming
- Modular processing stages that can be run independently
- Configurable logging with different verbosity levels
- YAML-based configuration for processing parameters
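
To make the first two patterns concrete, here is a minimal, purely illustrative `xarray.Dataset` with CF-style metadata of the kind the stages exchange (variable names and attribute values are invented for the example):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Illustrative only: a small CF-flavoured Dataset; the real variables and
# attributes come from the readers and the CF/OceanSites conventions.
time = pd.date_range("2015-01-01", periods=24, freq="h")
ds = xr.Dataset(
    {
        "TEMP": (
            "TIME",
            20.0 + 0.1 * np.random.randn(24),
            {"standard_name": "sea_water_temperature", "units": "degree_Celsius"},
        ),
    },
    coords={"TIME": ("TIME", time, {"standard_name": "time"})},
    attrs={"Conventions": "CF-1.8"},
)
```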

### Legacy Modules
- `process_rodb.py`: Legacy RODB-format processing functions (for RAPID-style workflows)
- `mooring_rodb.py`: Legacy RODB mooring-level processing (superseded by time_gridding.py)

### Testing Structure
- Comprehensive test coverage with pytest
- Tests organized by module (`test_*.py` files)
- Uses sample data for integration testing
- Pre-commit hooks ensure code quality
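
A representative test shape is sketched below; the helper and test are invented for illustration, while the real suite in `tests/test_*.py` exercises the actual modules against sample data:

```python
import numpy as np
import xarray as xr

def _toy_despike(da: xr.DataArray, threshold: float = 1.5) -> xr.DataArray:
    """Hypothetical helper: mask samples more than `threshold` std from the mean."""
    z = (da - da.mean()) / da.std()
    return da.where(abs(z) < threshold)

def test_toy_despike_masks_outliers():
    da = xr.DataArray([1.0, 1.1, 0.9, 50.0, 1.0])
    cleaned = _toy_despike(da)
    assert bool(np.isnan(cleaned[3]))        # the spike is masked
    assert int(cleaned.notnull().sum()) == 4  # all other samples survive
```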

### Dependencies
- **Core**: numpy, pandas, xarray, netcdf4, scipy
- **Oceanographic**: gsw (seawater calculations), ioos_qc (quality control), ctd_tools
- **Development**: pytest, black, ruff, pre-commit, sphinx
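
For instance, `gsw` provides the TEOS-10 routines used for derived quantities such as practical salinity from conductivity:

```python
import gsw

# Practical salinity from conductivity (mS/cm), in-situ temperature (deg C),
# and sea pressure (dbar); 42.914 mS/cm at 15 degC and 0 dbar is the PSS-78
# reference point, so this returns SP ~= 35.
SP = gsw.SP_from_C(42.914, 15.0, 0.0)
```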

The codebase emphasizes reproducible scientific data processing with clear documentation and methodological transparency.
269 changes: 152 additions & 117 deletions docs/source/project_structure.md

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions docs/source/roadmap.rst
@@ -26,7 +26,7 @@ The OceanArray framework currently provides a solid foundation for oceanographic

🟡 **Partially Implemented**
- Stage 3: Auto QC - basic QARTOD functions exist (``tools.py``)
- Stage 4: Calibration - microcat calibration exists (``instrument.py``)
- Stage 4: Calibration - microcat calibration exists (``process_rodb.py``)
- Step 2: Vertical Gridding - physics-based interpolation exists (``rapid_interp.py``)

❌ **Documented but Not Implemented**
@@ -235,7 +235,7 @@ Priority 3: Enhanced Calibration System

**Documentation**: ``docs/source/methods/calibration.rst``

**Current State**: Basic microcat calibration exists in ``instrument.py``.
**Current State**: Basic microcat calibration exists in ``process_rodb.py``.

**Missing Implementation**:
- Multi-instrument calibration support (not just microcat)
@@ -247,7 +247,7 @@ Priority 3: Enhanced Calibration System
**Estimated Effort**: 2-3 weeks

**Implementation Plan**:
1. Expand ``instrument.py`` calibration functions
1. Expand ``process_rodb.py`` calibration functions
2. Create calibration configuration system
3. Add uncertainty propagation
4. Design calibration workflow automation
56 changes: 3 additions & 53 deletions notebooks/demo_batch_instrument.ipynb
@@ -16,18 +16,7 @@
"id": "6a1920f3",
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"import numpy as np\n",
"import xarray as xr\n",
"import numpy as np\n",
"from oceanarray import readers, mooring, plotters, tools\n",
"from oceanarray import writers, convertOS\n",
"from ioos_qc import qartod\n",
"from ioos_qc.config import Config\n",
"import numpy as np\n",
"import gsw\n"
]
"source": "from pathlib import Path\nimport numpy as np\nimport xarray as xr\nimport numpy as np\nfrom oceanarray import readers, mooring_rodb, plotters, tools, process_rodb\nimport pandas as pd"
},
{
"cell_type": "markdown",
@@ -84,20 +73,7 @@
"id": "c782730e",
"metadata": {},
"outputs": [],
"source": [
"import importlib\n",
"importlib.reload(mooring)\n",
"# Flag bad data to convert from P to D\n",
"data_dir = Path(\"..\", \"data\")\n",
"files = list(data_dir.glob(\"OS_wb2_9_201114_*_P.nc\"))\n",
"\n",
"ds_list_OS1 = readers.load_dataset(files)\n",
"\n",
"ds_stack = mooring.combine_mooring_OS(ds_list_OS)\n",
"ds_stack = tools.calc_psal(ds_stack)\n",
"\n",
"ds_stack\n"
]
"source": "import importlib\nimportlib.reload(mooring_rodb)\n# Flag bad data to convert from P to D\ndata_dir = Path(\"..\", \"data\")\nfiles = list(data_dir.glob(\"OS_wb2_9_201114_*_P.nc\"))\n\nds_list_OS1 = readers.load_dataset(files)\n\nds_stack = mooring_rodb.combine_mooring_OS(ds_list_OS)\nds_stack = tools.calc_psal(ds_stack)\n\nds_stack"
},
{
"cell_type": "code",
@@ -529,33 +505,7 @@
"id": "b16cbb1d",
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"for i, (idx_leq, *rest) in enumerate(depth_indices):\n",
" plt.figure(figsize=(10, 4))\n",
" # Main index (black)\n",
" main_data = ds_stack.CNDC[:, i].values\n",
" plt.plot(ds_stack.TIME, tools.normalize_by_middle_percent(main_data, percent=95), color='k', label=f'DEPTH={depths[i]}m (main)')\n",
"\n",
" # Next shallower (red)\n",
" if idx_leq is not None:\n",
" shallower_data = ds_stack.CNDC[:, idx_leq].values\n",
" plt.plot(ds_stack.TIME, tools.normalize_by_middle_percent(shallower_data, percent=95), color='r', label=f'DEPTH={depths[idx_leq]}m (shallower)')\n",
"\n",
" # Next deeper (blue)\n",
" if rest and rest[0] is not None:\n",
" idx_gt = rest[0]\n",
" deeper_data = ds_stack.CNDC[:, idx_gt].values\n",
" plt.plot(ds_stack.TIME, tools.normalize_by_middle_percent(deeper_data, percent=95), color='b', label=f'DEPTH={depths[idx_gt]}m (deeper)')\n",
"\n",
" plt.title(f'CNDC at DEPTH={depths[i]}m and neighbors (normalized)')\n",
" plt.xlabel('Time')\n",
" plt.ylabel('Normalized CNDC')\n",
" plt.legend()\n",
" plt.tight_layout()\n",
" plt.show()\n"
]
"source": "import matplotlib.pyplot as plt\n\nfor i, (idx_leq, *rest) in enumerate(depth_indices):\n plt.figure(figsize=(10, 4))\n # Main index (black)\n main_data = ds_stack.CNDC[:, i].values\n plt.plot(ds_stack.TIME, process_rodb.normalize_by_middle_percent(main_data, percent=95), color='k', label=f'DEPTH={depths[i]}m (main)')\n\n # Next shallower (red)\n if idx_leq is not None:\n shallower_data = ds_stack.CNDC[:, idx_leq].values\n plt.plot(ds_stack.TIME, process_rodb.normalize_by_middle_percent(shallower_data, percent=95), color='r', label=f'DEPTH={depths[idx_leq]}m (shallower)')\n\n # Next deeper (blue)\n if rest and rest[0] is not None:\n idx_gt = rest[0]\n deeper_data = ds_stack.CNDC[:, idx_gt].values\n plt.plot(ds_stack.TIME, process_rodb.normalize_by_middle_percent(deeper_data, percent=95), color='b', label=f'DEPTH={depths[idx_gt]}m (deeper)')\n\n plt.title(f'CNDC at DEPTH={depths[i]}m and neighbors (normalized)')\n plt.xlabel('Time')\n plt.ylabel('Normalized CNDC')\n plt.legend()\n plt.tight_layout()\n plt.show()"
},
{
"cell_type": "code",
30 changes: 1 addition & 29 deletions notebooks/demo_check_clock.ipynb
@@ -4,35 +4,7 @@
"cell_type": "markdown",
"id": "71edb016",
"metadata": {},
"source": [
"## Demo: Clock check - for offsets in instrument clocks\n",
"\n",
"This is an intermediate step between stage1 and stage 2. We are trying to determine whether the timestamps for any of the instruments on the same mooring are incorrect. This is slightly faulty because they could *all* be wrong, unless we are comparing against UTC or have more exact timing knowledge. For more exact timing knowledge, the deployment time and recovery time (anchor release, either dropping from the ship or release from the seabed) have been added to the yaml file in UTC. This can be compared against the times estimated through lag correlations.\n",
"\n",
"### This notebook \n",
"\n",
"**It does not change anything in the data files.** You run this notebook in order to update the field `clock_offset` (in seconds) in the YAML file for each instrument on a mooring. This is normally due to the instruments being set up incorrectly (i.e., with a clock time that did not match UTC).\n",
"\n",
"After determining the appropriate clock offset, then run the stage2 processing to apply the clock offset to the netCDF files for each instrument.\n",
"\n",
"Then, running this notebook again using the stage2 files (`*_use.nc`) should predict no additional clock offsets.\n",
"\n",
"Clock offset is in integer seconds ADDED to the original instrument time. I.e., shifts the record later.\n",
"\n",
"### Main check\n",
"\n",
"- We look at when--according to the instrument clocks--the `temperature` values are cold. This assumes that in the middle of the record, the temperatures are colder than the near-surface temperatures (may fail for polar deployments). Cold is within the mean +- 3 * std of the deep values.\n",
"\n",
"- Then check when the instrument first reads a temperature within those bounds: `start_time`\n",
"- And check when the instrument last reads a temperature within those bounds: `end_time`\n",
"\n",
"Check whether the first timestamp within the cold water for that instrument is similar in time to the first timestamp for another instrument. This should be reasonably good at getting large offsets in clocks.\n",
"\n",
"### Secondary check\n",
"\n",
"- We interpolate data onto a common time grid (rough and ready)\n",
"- Check for lag correlation between instruments, and use this to estimate an offset"
]
"source": "## Demo: Clock check - for offsets in instrument clocks\n\n**Note: This is the original file by the user. See `demo_clock_offset.ipynb` for a refactored version by Claude.**\n\nThis is an intermediate step between stage1 and stage 2. We are trying to determine whether the timestamps for any of the instruments on the same mooring are incorrect. This is slightly faulty because they could *all* be wrong, unless we are comparing against UTC or have more exact timing knowledge. For more exact timing knowledge, the deployment time and recovery time (anchor release, either dropping from the ship or release from the seabed) have been added to the yaml file in UTC. This can be compared against the times estimated through lag correlations.\n\n### This notebook \n\n**It does not change anything in the data files.** You run this notebook in order to update the field `clock_offset` (in seconds) in the YAML file for each instrument on a mooring. This is normally due to the instruments being set up incorrectly (i.e., with a clock time that did not match UTC).\n\nAfter determining the appropriate clock offset, then run the stage2 processing to apply the clock offset to the netCDF files for each instrument.\n\nThen, running this notebook again using the stage2 files (`*_use.nc`) should predict no additional clock offsets.\n\nClock offset is in integer seconds ADDED to the original instrument time. I.e., shifts the record later.\n\n### Main check\n\n- We look at when--according to the instrument clocks--the `temperature` values are cold. This assumes that in the middle of the record, the temperatures are colder than the near-surface temperatures (may fail for polar deployments). Cold is within the mean +- 3 * std of the deep values.\n\n- Then check when the instrument first reads a temperature within those bounds: `start_time`\n- And check when the instrument last reads a temperature within those bounds: `end_time`\n\nCheck whether the first timestamp within the cold water for that instrument is similar in time to the first timestamp for another instrument. This should be reasonably good at getting large offsets in clocks.\n\n### Secondary check\n\n- We interpolate data onto a common time grid (rough and ready)\n- Check for lag correlation between instruments, and use this to estimate an offset"
},
{
"cell_type": "code",
9 changes: 1 addition & 8 deletions notebooks/demo_climatology.ipynb
@@ -16,14 +16,7 @@
"id": "6a1920f3",
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"import numpy as np\n",
"import xarray as xr\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from oceanarray import readers, plotters, tools, convertOS, writers, mooring, rapid_interp\n"
]
"source": "from pathlib import Path\nimport numpy as np\nimport xarray as xr\nimport numpy as np\nfrom oceanarray import readers, plotters, tools, convertOS, writers, mooring_rodb, rapid_interp\nimport pandas as pd"
},
{
"cell_type": "markdown",
16 changes: 1 addition & 15 deletions notebooks/demo_clock_offset.ipynb
@@ -4,21 +4,7 @@
"cell_type": "markdown",
"id": "streamlined-demo",
"metadata": {},
"source": [
"# Demo: Streamlined Clock Offset Analysis\n",
"\n",
"This notebook provides a streamlined version of clock offset analysis for oceanographic instruments.\n",
"It uses the new `oceanarray.clock_offset` module for cleaner, more maintainable code.\n",
"\n",
"## Purpose\n",
"\n",
"This notebook helps determine whether instrument timestamps are incorrect by:\n",
"1. Analyzing deployment timing based on temperature profiles\n",
"2. Performing lag correlation analysis between instruments\n",
"3. Calculating recommended clock offset corrections\n",
"\n",
"**Note:** This notebook does not modify data files. It only analyzes and suggests clock_offset values for the YAML configuration.\n"
]
"source": "# Demo: Streamlined Clock Offset Analysis\n\n**Note: This is a refactored version by Claude. See `demo_check_clock.ipynb` for the original user file.**\n\nThis notebook provides a streamlined version of clock offset analysis for oceanographic instruments.\nIt uses the new `oceanarray.clock_offset` module for cleaner, more maintainable code.\n\n## Purpose\n\nThis notebook helps determine whether instrument timestamps are incorrect by:\n1. Analyzing deployment timing based on temperature profiles\n2. Performing lag correlation analysis between instruments\n3. Calculating recommended clock offset corrections\n\n**Note:** This notebook does not modify data files. It only analyzes and suggests clock_offset values for the YAML configuration."
},
{
"cell_type": "code",
11 changes: 2 additions & 9 deletions notebooks/demo_instrument.ipynb
@@ -4,11 +4,7 @@
"cell_type": "markdown",
"id": "c6a29764-f39c-431c-8e77-fbc6bfe20f01",
"metadata": {},
"source": [
"# Demo: instrument-level processing (Stage 1 and Stage 2)\n",
"\n",
"This notebook walks through the instrument-level processing in the oceanarray code.\n"
]
"source": "# Demo: Instrument-Level Processing (Compact Workflow)\n\nThis notebook walks through the complete instrument-level processing pipeline in the oceanarray codebase, from raw files to science-ready datasets. It demonstrates the same processing steps as `demo_stage1.ipynb` and `demo_stage2.ipynb` but in a more compact, streamlined format.\n\n## Processing Overview\n\n### Stage 1: Format Conversion (`*_raw.nc`)\n- **Purpose**: Convert raw instrument files to standardized NetCDF format\n- **Input**: Raw instrument files (`.cnv`, `.rsk`, `.dat`, `.mat`) \n- **Output**: Standardized NetCDF files (`*_raw.nc`)\n- **Processing**: Uses `oceanarray.stage1.MooringProcessor` - same as `demo_stage1.ipynb`\n\n### Stage 2: Temporal Corrections & Trimming (`*_use.nc`)\n- **Purpose**: Apply clock corrections and trim to deployment periods\n- **Input**: Stage1 files (`*_raw.nc`) + updated YAML with clock offsets\n- **Output**: Time-corrected files (`*_use.nc`)\n- **Processing**: Uses `oceanarray.stage2.process_multiple_moorings_stage2` - same as `demo_stage2.ipynb`\n\n### Stage 3: Calibrations & Corrections (Optional)\n- **Purpose**: Apply sensor-specific calibrations and corrections\n- **Status**: Commented out sections showing how to apply additional calibrations\n\n### Stage 4: Format Conversion (Optional)\n- **Purpose**: Convert to OceanSites or other standardized formats\n- **Status**: Commented out sections for format conversion\n\n## Key Features\n\n- **Compact Format**: Covers the same ground as separate stage notebooks in one place\n- **Instrument-Level Processing**: Each instrument processed independently before mooring-level coordination\n- **Multiple Instrument Types**: Handles various instrument types with analysis functions\n- **Visualization**: Includes plotting and analysis of processed results\n- **Metadata Management**: YAML configuration files drive processing parameters\n\n## Comparison with Other Notebooks\n\n- **vs demo_stage1.ipynb**: Same Stage1 processing but more concise\n- **vs demo_stage2.ipynb**: Same Stage2 processing but integrated workflow \n- **vs demo_step1.ipynb**: Focuses on individual instruments rather than mooring-level time gridding\n\nChoose this notebook if you want a complete instrument processing workflow in one place, or use the separate stage notebooks for more detailed exploration of each processing step.\n\nVersion: 1.0 \nDate: 2025-01-15"
},
{
"cell_type": "code",
@@ -227,10 +223,7 @@
"id": "60fab40c",
"metadata": {},
"outputs": [],
"source": [
"#ds_cal = instrument.apply_microcat_calibration_from_txt(data_dir / 'wb1_12_2015_005.microcat.txt', data_dir / 'wb1_12_2015_6123.use')\n",
"#ds_cal\n"
]
"source": "#ds_cal = process_rodb.apply_microcat_calibration_from_txt(data_dir / 'wb1_12_2015_005.microcat.txt', data_dir / 'wb1_12_2015_6123.use')\n#ds_cal"
},
{
"cell_type": "code",