This document outlines the development roadmap for the OceanArray processing framework, focusing on features documented in the processing workflow that need implementation, technical improvements, and future functionality priorities.
- Status Overview
- Priority 1: Core Missing Features
  - 1. Stage 3: Automatic Quality Control using QARTOD Standards
  - 2. Stage 4: Calibration Information Integration (Microcat Focus)
  - 3. Stage 5: OceanSites Format Conversion
  - 4. Step 3: Deployment Concatenation
  - 5. Enhanced Visualization System
  - 6. Intelligent Metadata Fallback System
  - 7. Comprehensive Mooring Processing Reports
- Priority 2: Advanced Processing Features
- Priority 3: Enhanced Calibration System
- Priority 4: System Architecture Improvements
- Priority 5: Advanced Analysis Features
- Development Milestones
- Technical Debt and Maintenance
- Dependencies and External Integration
- Community and Collaboration
## Status Overview

The OceanArray framework currently provides a solid foundation for oceanographic data processing, but several key components documented in the processing framework still require implementation or completion.

Current Implementation Status:
- ✅ Implemented & Working
  - Stage 1: Standardisation (`stage1.py`)
  - Stage 2: Trimming & Clock Corrections (`stage2.py`)
  - Step 1: Time Gridding (`time_gridding.py`)
  - Clock Offset Analysis (`clock_offset.py`)
  - Data Readers (`readers.py`)
  - Basic QC visualization (`plotters.py`)
  - Configurable Logging System
- 🟡 Partially Implemented
  - Step 2: Vertical Gridding - physics-based interpolation exists (`rapid_interp.py`)
- ❌ Documented but Not Implemented
  - Stage 3: Automatic Quality Control using QARTOD standards
  - Stage 4: Calibration Information Integration (microcat focus)
  - Stage 5: Conversion to OceanSites format
  - Step 3: Concatenation of deployments
  - Multi-site merging for boundary profiles
## Priority 1: Core Missing Features

### 1. Stage 3: Automatic Quality Control using QARTOD Standards

Documentation: `docs/source/methods/auto_qc.rst`

Purpose: Apply systematic quality control checks following QARTOD (Quality Assurance/Quality Control of Real-Time Oceanographic Data) standards to identify and flag suspect data.

Current State: Basic QC functions exist in `tools.py:run_qc()`, covering salinity outlier detection and temporal spike detection, with visualization in `plotters.py`.
Missing Implementation:
- Complete stage3.py module implementing full QARTOD test suite
- Integration with ioos_qc package for standardized tests
- QARTOD-compliant flag value handling (0,1,2,3,4,7,8,9)
- Configurable QC test parameters via YAML
- Automated QC report generation with summary statistics
- QC metadata preservation in NetCDF output
QARTOD Tests to Implement:

- Gross range test (min/max bounds)
- Climatological test (seasonal expectations)
- Spike test (temporal derivatives)
- Rate of change test
- Flat line test (stuck values)
- Multi-variate tests (T-S relationships)
- Neighbor test (spatial consistency)
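To make the flag conventions concrete, here is a minimal numpy sketch of the gross range test; the function name and spans are illustrative, and the production implementation is expected to defer to `ioos_qc` rather than hand-rolled tests.

```python
import numpy as np

# QARTOD flag conventions: 1 = good, 3 = suspect, 4 = fail, 9 = missing.
GOOD, SUSPECT, FAIL, MISSING = 1, 3, 4, 9

def gross_range_test(values, fail_span, suspect_span):
    """Flag values outside sensor (fail) and operator (suspect) bounds."""
    values = np.asarray(values, dtype=float)
    flags = np.full(values.shape, GOOD, dtype=np.int8)
    flags[(values < suspect_span[0]) | (values > suspect_span[1])] = SUSPECT
    flags[(values < fail_span[0]) | (values > fail_span[1])] = FAIL
    flags[np.isnan(values)] = MISSING
    return flags

# Example: practical salinity, instrument range 0-42, expected regional range 30-38.
sal = [34.9, 35.1, 50.0, 36.2, float("nan"), 29.0]
print(gross_range_test(sal, fail_span=(0, 42), suspect_span=(30, 38)))
# → [1 1 4 1 9 3]
```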
Estimated Effort: 2-3 weeks
Implementation Plan:

- Create `oceanarray/stage3.py` module with a `QCProcessor` class
- Design YAML-based QC configuration system for test parameters
- Integrate `ioos_qc` package for standardized QARTOD implementations
- Implement comprehensive QARTOD flag handling and metadata
- Add QC validation and reporting with summary statistics
- Integrate with the Stage 2 → Stage 3 → Stage 4 pipeline
### 2. Stage 4: Calibration Information Integration (Microcat Focus)

Documentation: `docs/source/methods/calibration.rst`

Purpose: Apply instrument calibration corrections, with an initial focus on Sea-Bird MicroCAT conductivity-temperature sensors, incorporating pre- and post-deployment calibration information.

Current State: Basic MicroCAT calibration functions exist in `process_rodb.py` for legacy RODB workflows.
Missing Implementation:
- Complete stage4.py module for modern CF-compliant calibration workflow
- Integration with Sea-Bird calibration certificate parsing
- Pre/post-deployment calibration comparison and drift analysis
- Conductivity cell thermal mass corrections
- Calibration uncertainty propagation through processing chain
- Calibration metadata preservation in NetCDF output
- Support for multiple calibration coefficient sets
Calibration Features to Implement:

- Sea-Bird calibration certificate parsing (`.xmlcon`, `.cal` files)
- Conductivity calibration equation application (frequency-based)
- Temperature calibration with ITS-90 conversion
- Pressure sensor calibration and atmospheric correction
- Thermal mass correction for conductivity measurements
- Calibration drift analysis between pre/post deployments
- Uncertainty quantification and propagation
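The thermal mass item above can be illustrated with the recursive filter of Lueck and Picklo (1990) as used in Sea-Bird's processing; this is a sketch, and the `alpha`/`tau` defaults below are illustrative rather than instrument-specific coefficients.

```python
import numpy as np

def thermal_mass_correction(cond, temp, interval, alpha=0.03, tau=7.0):
    """Recursive conductivity cell thermal mass correction
    (Lueck & Picklo, 1990), as applied in Sea-Bird processing.
    cond in S/m, temp in deg C, interval in seconds.
    alpha (amplitude) and tau (time constant, s) defaults are illustrative."""
    beta = 1.0 / tau
    a = 2.0 * alpha / (interval * beta + 2.0)
    b = 1.0 - 2.0 * a / alpha
    cond = np.asarray(cond, dtype=float)
    ctm = np.zeros_like(cond)
    for i in range(1, len(ctm)):
        # Sensitivity of conductivity to temperature near the sample
        dCdT = 0.1 * (1.0 + 0.006 * (temp[i] - 20.0))
        ctm[i] = -b * ctm[i - 1] + a * dCdT * (temp[i] - temp[i - 1])
    return cond + ctm
```

With constant temperature the correction is identically zero, which is a convenient sanity check for the eventual unit tests.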
Estimated Effort: 2-3 weeks
Implementation Plan:

- Create `oceanarray/stage4.py` module with a `CalibrationProcessor` class
- Design calibration configuration system for coefficient management
- Implement Sea-Bird calibration certificate parsing
- Add thermal mass correction algorithms
- Create pre/post calibration comparison tools
- Add uncertainty propagation and metadata preservation
- Integrate with the Stage 3 → Stage 4 → Stage 5 pipeline
### 3. Stage 5: OceanSites Format Conversion

Documentation: `docs/source/methods/conversion.rst`

Purpose: Convert processed and calibrated data to the OceanSites format specification for community data sharing and archival.

Current State: Some format conversion exists in `convertOS.py`, but without full OceanSites specification compliance.
Missing Implementation:
- Complete stage5.py module for OceanSites format conversion
- Global attribute validation and enforcement per OceanSites standards
- CF-convention compliance checking and validation
- Variable attribute standardization according to OceanSites vocabulary
- Comprehensive metadata template system
- Quality flag conversion to OceanSites standards
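As a sketch of what global attribute validation might look like, the helper below checks a dict of attributes (e.g. an xarray `Dataset.attrs`) against an illustrative subset of required attributes; the authoritative list must come from the OceanSITES Data Format Reference Manual, not from this example.

```python
# Illustrative subset of OceanSites-required global attributes; the full
# specification defines many more and constrains their values.
REQUIRED_GLOBAL_ATTRS = [
    "site_code", "platform_code", "data_mode", "title", "naming_authority",
    "id", "source", "institution", "geospatial_lat_min", "geospatial_lat_max",
    "date_created", "format_version",
]

def validate_global_attrs(attrs):
    """Return the list of required attributes that are missing or empty.
    `attrs` is a plain dict, e.g. xarray's Dataset.attrs."""
    return [k for k in REQUIRED_GLOBAL_ATTRS
            if k not in attrs or attrs[k] in ("", None)]
```

A validator like this would run before writing each OceanSites file and feed its findings into the processing report.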
Estimated Effort: 2-3 weeks
Implementation Plan:

- Create `oceanarray/stage5.py` module with an `OceanSitesProcessor` class
- Implement complete OceanSites format validation
- Add CF-compliance checking and enforcement
- Design metadata template system for OceanSites requirements
- Add quality flag conversion from QARTOD to OceanSites standards
- Integrate with the Stage 4 → Stage 5 pipeline
### 4. Step 3: Deployment Concatenation

Documentation: `docs/source/methods/concatenation.rst`

Current State: No implementation found.
Missing Implementation:

- Multi-deployment time series merging
- Gap handling and interpolation
- Consistent time-pressure grid creation
- Metadata preservation across deployments
- Quality flag propagation
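The core merge rule for successive deployments can be sketched with plain numpy; in the actual pipeline this would operate on xarray Datasets and also carry metadata and QC flags, but the overlap-dropping logic is the same.

```python
import numpy as np

def concatenate_deployments(deployments):
    """Merge (time, value) arrays from successive deployments into one
    monotonic series, dropping samples that overlap the previous deployment.
    `deployments` is a list of (time, value) tuples sorted by start time."""
    times, values = [], []
    last_time = -np.inf
    for t, v in deployments:
        keep = t > last_time          # drop overlap with prior deployment
        times.append(t[keep])
        values.append(v[keep])
        if times[-1].size:
            last_time = times[-1][-1]
    return np.concatenate(times), np.concatenate(values)
```

Whether to prefer the earlier or later deployment in an overlap is a design decision; this sketch keeps the earlier record.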
Estimated Effort: 1-2 weeks
Implementation Plan:

- Create `oceanarray/concatenation.py` module
- Design deployment merging algorithm
- Implement gap filling strategies
- Add time-pressure grid standardization
- Create validation and QC checks
### 5. Enhanced Visualization System

Current State: Basic plotting functions exist in `plotters.py`.
Missing Implementation:

- Interactive plotting capabilities
- Multi-instrument comparison plots
- Time series overview with zoom functionality
- QC flag visualization overlays
- Deployment boundary and gap visualization
- Statistical summary plots
- Customizable plot templates
Estimated Effort: 2-3 weeks
Implementation Plan:

- Expand `plotters.py` with interactive features
- Add multi-instrument comparison tools
- Implement QC flag overlay visualization
- Create statistical summary plots
- Add customizable plotting templates
- Integrate with processing pipeline for automatic reporting
### 6. Intelligent Metadata Fallback System

Current State: Metadata extraction relies on explicit YAML configuration.
Missing Implementation:

- Filename pattern parsing for instrument type and serial number
- Fallback metadata extraction when YAML is incomplete
- Intelligent instrument identification from file patterns
- Automatic serial number detection from filenames
- Validation and warning system for inferred metadata
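A possible shape for the filename parser, with hypothetical patterns; the real pattern table would be built from the conventions actually found in mooring archives, and any inferred metadata would be logged with a warning.

```python
import re

# Illustrative mapping of instrument-type tokens commonly seen in mooring
# filenames to canonical names; extend as real conventions are catalogued.
INSTRUMENT_PATTERNS = {
    r"(?:microcat|mcat|sbe37)": "microcat",
    r"(?:sbe56|tlogger)": "thermistor",
    r"(?:rbr|solo|concerto)": "rbr",
}
SERIAL_PATTERN = re.compile(r"(?:sn|serial)?[_-]?(\d{3,5})", re.IGNORECASE)

def infer_metadata(filename):
    """Best-effort instrument type and serial number from a filename.
    Returns (instrument_type, serial) with None where nothing matched."""
    name = filename.lower()
    instrument = next((canon for pat, canon in INSTRUMENT_PATTERNS.items()
                       if re.search(pat, name)), None)
    m = SERIAL_PATTERN.search(name)
    serial = int(m.group(1)) if m else None
    return instrument, serial

print(infer_metadata("mcat_7518_recovered.cnv"))  # → ('microcat', 7518)
```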
Estimated Effort: 1 week
Implementation Plan:

- Create filename parsing utilities in `utilities.py`
- Design instrument type detection patterns
- Add serial number extraction from common filename formats
- Implement metadata validation and fallback logic
- Add logging and warnings for inferred metadata
- Integrate with Stage 1 processing pipeline
### 7. Comprehensive Mooring Processing Reports

Current State: No automated reporting system exists.
Missing Implementation:

- HTML report generation for each mooring
- Processing completeness analysis (YAML vs actual files)
- Missing file detection and reporting
- Data coverage visualization and statistics
- Automated figure generation for all available variables
- Processing timeline and status summaries
- Integration with existing processing pipeline
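The completeness check might reduce to a set comparison between filenames promised by the mooring YAML and files found on disk; the `*_use.nc` glob and the function name here are assumptions for illustration.

```python
from pathlib import Path

def check_completeness(expected_files, proc_dir):
    """Compare filenames promised by the mooring YAML against what is on
    disk. `expected_files` is an iterable of filenames; `proc_dir` is a
    directory path. Returns (missing, unexpected) sets for the report."""
    on_disk = {p.name for p in Path(proc_dir).glob("*_use.nc")}
    expected = set(expected_files)
    return expected - on_disk, on_disk - expected
```

Both sets would feed directly into the HTML report: missing files as errors, unexpected files as warnings about stale YAML.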
Estimated Effort: 2-3 weeks
Implementation Plan:

- Create `oceanarray/reporting.py` module with a `ReportGenerator` class
- Design HTML template system for mooring reports
- Implement file completeness checking (YAML vs `*_use.nc` vs raw files)
- Add automated visualization generation for all data variables
- Create processing status and timeline summaries
- Integrate with processing pipeline for automatic report generation
- Design directory structure: `moor/proc/{mooring}/processing/{report,logs,figures}/`
## Priority 2: Advanced Processing Features

### Multi-Site Merging for Boundary Profiles

Documentation: `docs/source/methods/multisite_merging.rst`

Current State: No implementation found.
Missing Implementation:

- Cross-site data integration
- Boundary profile construction
- Static stability checking
- Site-specific weighting strategies
- Spatial interpolation methods
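Static stability checking could start as simply as flagging density inversions in a merged profile; the tolerance value below is illustrative, and a production version would work from TEOS-10 potential density via `gsw`.

```python
import numpy as np

def stability_violations(density, tolerance=0.005):
    """Indices where density decreases with depth by more than `tolerance`
    (kg/m^3) in a profile ordered shallow to deep, i.e. static instabilities
    introduced (for example) by merging profiles from different sites."""
    d = np.diff(np.asarray(density, dtype=float))
    return np.where(d < -tolerance)[0]
```

Each returned index marks the upper point of an inversion, which the merging step could then smooth, reweight, or flag.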
Estimated Effort: 3-4 weeks
Implementation Plan:

- Create `oceanarray/multisite_merging.py` module
- Implement spatial merging algorithms
- Add static stability validation
- Design site weighting strategies
- Create boundary profile outputs
### Vertical Gridding Integration

Documentation: `docs/source/methods/vertical_gridding.rst`

Current State: Physics-based interpolation exists in `rapid_interp.py` but needs integration.
Missing Implementation:

- Integration with main processing pipeline
- Climatology data management
- Configuration for different interpolation strategies
- Gap filling and extrapolation options
- Validation against known profiles
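One possible gap-filling policy, sketched in numpy: interpolate across short gaps but refuse to invent data across gaps longer than a configurable limit. The `max_gap` semantics here are an assumption about how the eventual configuration might work.

```python
import numpy as np

def fill_gaps(time, values, max_gap):
    """Linearly interpolate NaN gaps no longer than `max_gap` time units;
    longer gaps (and unbounded edge gaps) are left as NaN."""
    time = np.asarray(time, dtype=float)
    values = np.asarray(values, dtype=float)
    bad = np.isnan(values)
    if not bad.any():
        return values
    filled = values.copy()
    filled[bad] = np.interp(time[bad], time[~bad], values[~bad])
    # Re-mask points whose bracketing valid samples are farther apart than max_gap
    good_t = time[~bad]
    for i in np.where(bad)[0]:
        left = good_t[good_t < time[i]]
        right = good_t[good_t > time[i]]
        span = (right.min() - left.max()) if left.size and right.size else np.inf
        if span > max_gap:
            filled[i] = np.nan
    return filled
```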
Estimated Effort: 1-2 weeks
Implementation Plan:

- Refactor `rapid_interp.py` for general use
- Create configuration system for interpolation parameters
- Add climatology data handling
- Integrate with mooring processing workflow
- Add validation and diagnostic tools
## Priority 3: Enhanced Calibration System

Documentation: `docs/source/methods/calibration.rst`

Current State: Basic MicroCAT calibration exists in `process_rodb.py`.
Missing Implementation:

- Multi-instrument calibration support (not just MicroCAT)
- Structured calibration metadata handling
- Pre/post-cruise comparison workflows
- Calibration uncertainty propagation
- Automated calibration log parsing
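Pre/post-cruise comparison often reduces to an offset that drifts linearly over the deployment; a minimal sketch, assuming the two offsets are already known from the calibration baths (the function name and sign convention are illustrative).

```python
import numpy as np

def apply_linear_drift(values, time, pre_offset, post_offset):
    """Remove a calibration offset that drifts linearly from the
    pre-deployment value to the post-deployment value over the record.
    `time` is any monotonic unit; offsets are (instrument - reference)
    at each calibration."""
    time = np.asarray(time, dtype=float)
    frac = (time - time[0]) / (time[-1] - time[0])
    correction = pre_offset + frac * (post_offset - pre_offset)
    return np.asarray(values, dtype=float) - correction
```

Whether drift is applied linearly in time, stepped at a known event, or not at all is exactly the kind of decision the structured calibration metadata should record.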
Estimated Effort: 2-3 weeks
Implementation Plan:

- Expand `process_rodb.py` calibration functions
- Create calibration configuration system
- Add uncertainty propagation
- Design calibration workflow automation
- Add comprehensive logging and provenance
## Priority 4: System Architecture Improvements

### Methods Module Organization

Current State: Processing functions are scattered across multiple modules.

Improvement: Create an organized `methods/` directory structure:

```
oceanarray/methods/
├── __init__.py
├── auto_qc.py
├── calibration.py
├── concatenation.py
├── conversion.py
├── multisite_merging.py
└── vertical_gridding.py
```
Estimated Effort: 1 week
### Configuration System Enhancement

Current State: Basic logging configuration exists.

Missing Features:

- Global processing configuration
- Site-specific parameter management
- Processing pipeline configuration
- Validation and schema checking
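Schema checking could begin with a hand-rolled type map before adopting a library such as `jsonschema` or `pydantic`; the keys below are hypothetical and would be replaced by the real mooring configuration fields.

```python
# Hypothetical top-level schema for a mooring processing config.
SCHEMA = {
    "mooring_name": str,
    "longitude": float,
    "latitude": float,
    "instruments": list,
}

def validate_config(cfg):
    """Return a list of human-readable problems with a config dict."""
    problems = [f"missing key: {k}" for k in SCHEMA if k not in cfg]
    problems += [f"wrong type for {k}: expected {t.__name__}"
                 for k, t in SCHEMA.items()
                 if k in cfg and not isinstance(cfg[k], t)]
    return problems
```

Returning a list of problems (rather than raising on the first) suits the reporting system: every issue can be surfaced at once.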
Estimated Effort: 1-2 weeks
### Testing Infrastructure

Current State: Basic tests exist in the `tests/` directory.

Missing Features:

- End-to-end pipeline testing
- Method-specific unit tests
- Configuration validation tests
- Performance benchmarking
Estimated Effort: 2-3 weeks (ongoing)
Technical Debt Note: This represents accumulated testing debt where functionality exists but lacks comprehensive test coverage, making maintenance and refactoring more risky.
### Data Storage Efficiency

Current State: Standard NetCDF output with basic compression.

Missing Implementation:

- Optimized chunking strategies
- Advanced compression algorithms
- Memory-efficient processing for large datasets
- Streaming processing capabilities
- Storage format optimization
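Chunking and compression choices can be centralized in an encoding builder passed to xarray's `to_netcdf`; the chunk-size heuristic here is illustrative, and real values should come from profiling typical access patterns.

```python
def netcdf_encoding(var_names, n_time, target_chunk=2 ** 16):
    """Build a per-variable encoding dict for xarray's to_netcdf, using
    zlib compression and chunking along the time dimension. The
    target_chunk heuristic is illustrative, not tuned."""
    chunk = min(n_time, target_chunk)
    return {name: {"zlib": True, "complevel": 4, "chunksizes": (chunk,)}
            for name in var_names}
```

Usage would look like `ds.to_netcdf(path, encoding=netcdf_encoding(list(ds.data_vars), ds.sizes["time"]))`, keeping the storage policy in one place instead of scattered across writers.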
Estimated Effort: 2-3 weeks
Implementation Plan:
- Profile current storage bottlenecks
- Implement optimized chunking strategies
- Add advanced compression options
- Create memory-efficient processing pipelines
- Add storage format benchmarking
## Development Milestones

- Improve test coverage (address technical debt)
- Implement intelligent metadata fallback system
- Enhance visualization system
- Implement comprehensive mooring processing reports
- Complete auto QC framework
- Implement OceanSites format conversion
- Add deployment concatenation
- Organize methods module structure
- Enhance configuration system
- Implement multi-site merging
- Complete vertical gridding integration
- Enhance calibration framework
- Improve data storage efficiency
- Performance optimization and profiling
- Create comprehensive documentation
- User experience improvements
## Technical Debt and Maintenance

- Code Quality
  - Add type hints throughout codebase
  - Improve error handling and validation
  - Standardize documentation strings
  - Enhance logging throughout pipeline
- Performance
  - Profile processing bottlenecks
  - Optimize memory usage for large datasets
  - Add parallel processing capabilities
  - Implement caching strategies
- User Experience
  - Create command-line interface
  - Add progress indicators for long operations
  - Improve error messages and debugging
  - Create tutorial notebooks
- Documentation
  - Complete API documentation
  - Add processing examples
  - Create troubleshooting guides
  - Document best practices
## Dependencies and External Integration

Key dependencies:

- `ioos_qc`: For comprehensive QC implementation
- `gsw` (TEOS-10): For seawater property calculations
- `verticalnn`: For physics-based vertical interpolation
- `xarray` & `netCDF4`: Core data handling
- `dask`: For large dataset processing (future)
Planned external integrations:

- Pangaea: Data publication workflows
- OceanSites: Enhanced format compliance
- ERDDAP: Direct data ingestion capabilities
## Community and Collaboration

- Method validation with known datasets
- Cross-array compatibility testing
- Performance benchmarking
- User interface development
- Processing workflow documentation
This roadmap provides a structured path toward completing the OceanArray processing framework while maintaining focus on documented requirements and practical implementation priorities.