Refactored etl module to improve error handling, and added testing. #25

Open
kjlippold wants to merge 9 commits into main from 359-etl-error-handling
kjlippold (Contributor) commented Mar 3, 2026

This PR is a fairly large refactor of the ETL package focused on improving error handling, programming interfaces, validation, and testing. Several features previously handled internally by HydroServer have also been moved into the package.

  1. Improved error handling using the ETLError exception class. Errors are raised with human-readable messages at the source, removing the need to parse exceptions or tracebacks to generate user-facing messages.
  2. Added the ETLPipeline model as the primary entrypoint into the package. It combines an extractor, transformer, and loader into a single interface for running the pipeline and retrieving results.
  3. Configuration and behavior for most components have been consolidated into unified Pydantic models.
  4. The DataFrame passed from transformers to loaders was changed from wide to long format. Wide format works for CSV files and some JSON payloads, but breaks down when grouped datastreams aren't guaranteed to share the same timestamps, as is the case when sourcing data from HydroServer itself.
  5. Temporal aggregation was moved out of the HydroServer-specific code and into the ETL operations layer alongside arithmetic expression and rating curve operations. Operations now receive a (timestamp, value) DataFrame rather than a bare value series to support this.
  6. Added unit tests for the module. Many of the initial tests were generated with LLM assistance and reviewed by me; we should review them more closely and expand coverage over time.
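
To illustrate item 1, here is a minimal sketch of raising a readable error at the source. The `ETLError` class below is a stand-in defined only for this example; the package's real class, its constructor, and the `parse_timestamp` and `run_step` helpers are assumptions, not code from this PR.

```python
from datetime import datetime


class ETLError(Exception):
    """Stand-in for the package's exception; the real signature may differ."""


def parse_timestamp(raw: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> datetime:
    """Hypothetical helper: parse a source timestamp or fail with a readable message."""
    try:
        return datetime.strptime(raw, fmt)
    except ValueError as exc:
        # Raise where the failure happens, with a message that can be shown as-is.
        raise ETLError(
            f"Could not parse timestamp '{raw}'; expected format '{fmt}'."
        ) from exc


def run_step(raw: str) -> str:
    """Caller side: forward the message, no exception or traceback parsing."""
    try:
        return parse_timestamp(raw).isoformat()
    except ETLError as err:
        return f"ETL failed: {err}"
```

The original `ValueError` stays attached through `from exc`, so the low-level detail remains available for logging while the caller only handles the readable message.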
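
For item 4, the difference between the two layouts can be sketched with plain pandas. The column names (`timestamp`, `stage`, `temp`, `datastream`, `value`) are made up for this example and may not match what the package uses.

```python
import pandas as pd

# Hypothetical wide frame: one column per datastream, one shared timestamp column.
wide = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2026-01-01 00:00", "2026-01-01 00:15"]),
        "stage": [1.2, None],  # no stage reading at 00:15
        "temp": [4.0, 4.1],
    }
)

# Long format: one row per (timestamp, datastream) observation.
long_df = (
    wide.melt(id_vars="timestamp", var_name="datastream", value_name="value")
    .dropna(subset=["value"])  # drop the padding that wide format forced in
    .reset_index(drop=True)
)
```

In wide format the missing stage reading has to be stored as a null cell. In long format that row is simply absent, so datastreams with unrelated timestamps can share one frame.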
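
For item 5, a rough sketch of why operations need the timestamps: temporal aggregation has to bin values by time, which a bare value series cannot support. The `daily_mean` and `scale` functions and the column names are hypothetical, not the package's operation interface.

```python
import pandas as pd


def scale(frame: pd.DataFrame, factor: float) -> pd.DataFrame:
    """Hypothetical arithmetic operation: frame in, frame out, timestamps untouched."""
    return frame.assign(value=frame["value"] * factor)


def daily_mean(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical aggregation: reduce a (timestamp, value) frame to daily means."""
    return (
        frame.set_index("timestamp")["value"]
        .resample("D")  # bin observations by calendar day
        .mean()
        .reset_index()
    )


obs = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2026-01-01 06:00", "2026-01-01 18:00", "2026-01-02 06:00"]
        ),
        "value": [1.0, 3.0, 5.0],
    }
)

# Operations chain because each one takes and returns the same frame shape.
result = daily_mean(scale(obs, 2.0))
```

Because every operation shares the same input and output shape, aggregation can sit in the same chain as arithmetic or rating curve steps instead of being handled separately.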
