Best practice for handling unevenly sampled (jagged) observations in Black-Box models? (SimpleBbAsciiFile length constraint & Documentation inquiry) #473

@hurys20

Hello OpenDA developers/community,

I am currently integrating a 2D hydrodynamic and water quality model (CE-QUAL-W2) into OpenDA using the BlackBoxWrapper. Since the model is not natively supported, I am using Python scripts to bridge the model I/O with OpenDA.

Currently, I am using org.openda.blackbox.io.SimpleBbAsciiFile for model results and noosObserver for observations. However, I have hit a severe architectural limitation regarding real-world, unevenly sampled data.

The Physical Scenario (The "Jagged Data" Problem):
In our real-world reservoir monitoring (multiple sites and multiple depths), the sampling frequency is naturally uneven.
For example, within a 60-day simulation window:

Site_A_Depth_0.5m might have 15 observations.

Site_B_Depth_10.0m might have 8 observations.

Site_C_Depth_90.0m might only have 2 observations.

The Technical Bottleneck:
My Python bridge successfully extracts a full, continuous 60-day time series for every simulation point and writes it to model_results.output. However, SimpleBbAsciiFile appears to enforce a strict length contract derived from the .noo files: each model series is expected to have exactly as many entries as the matching observation series.

If the .noo file for Site_B has 8 records, OpenDA throws an error when parsing the 60-day model output:

“Error preparing algorithm.
Error message: expecting vector of length 8 for time, but length was 60
...
at org.openda.blackbox.io.SimpleBbAsciiFile.initialize”
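
To make the mismatch concrete, here is a minimal Python sketch of the situation as we understand it. The sample days are hypothetical (chosen only to match the counts above), and `length_mismatches` is an illustrative helper of ours, not OpenDA's actual check.

```python
# Hypothetical sampling days (1..60) per series; only the counts mirror our case.
obs_days = {
    "Site_A_Depth_0.5m": list(range(1, 61, 4)),   # 15 observations
    "Site_B_Depth_10.0m": list(range(1, 61, 8)),  # 8 observations
    "Site_C_Depth_90.0m": [9, 49],                # 2 observations
}

# The Python bridge writes one model value per day for every series.
model_days = list(range(1, 61))  # length 60


def length_mismatches(obs_days, model_days):
    """Return {series: (expected_length, actual_length)} where the lengths differ.

    This only mimics the symptom we see; it is an assumption about the
    contract, not a copy of what SimpleBbAsciiFile does internally.
    """
    return {
        name: (len(days), len(model_days))
        for name, days in obs_days.items()
        if len(days) != len(model_days)
    }


mismatches = length_mismatches(obs_days, model_days)
for name, (expected, actual) in mismatches.items():
    print(f"{name}: expecting vector of length {expected} for time, "
          f"but length was {actual}")
```

Every series fails the check here, because none of the observation series covers all 60 days.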

Workarounds we considered (but are not ideal):

Intersection Method: Only keep the exact dates where all sites were simultaneously monitored. (This forces all .noo files to be the exact same length, but we lose a massive amount of valuable field data).

Scalarization Method: Abandon timeSeries entirely and treat every single observation point at every specific timestamp as an independent, length-1 scalar variable (e.g., SiteA_Day1, SiteA_Day2). This explodes the XML configuration and loses the semantic meaning of a time series.
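
The cost of both workarounds can be sketched in a few lines of Python. The sample days below are hypothetical, and the naming pattern `Site_Day` simply follows the example above; neither helper is part of OpenDA.

```python
# Hypothetical sampling days (1..60) per series, matching the counts above.
obs_days = {
    "Site_A_Depth_0.5m": list(range(1, 61, 4)),   # 15 observations
    "Site_B_Depth_10.0m": list(range(1, 61, 8)),  # 8 observations
    "Site_C_Depth_90.0m": [9, 49],                # 2 observations
}


def intersection_days(obs_days):
    """Intersection Method: keep only days on which every series was sampled."""
    common = set.intersection(*(set(days) for days in obs_days.values()))
    return sorted(common)


def scalarized_names(obs_days):
    """Scalarization Method: one length-1 variable per (series, day) pair."""
    return [
        f"{site}_Day{day}"
        for site, days in obs_days.items()
        for day in days
    ]


total_obs = sum(len(days) for days in obs_days.values())
common = intersection_days(obs_days)
kept_obs = len(common) * len(obs_days)
names = scalarized_names(obs_days)

print(f"Intersection keeps {kept_obs} of {total_obs} observations")
print(f"Scalarization needs {len(names)} separate variables in the XML")
```

With these made-up days, the intersection keeps only the days shared with the sparsest series, while scalarization needs one configured variable per individual observation.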

My Questions:

Advanced I/O Handling: Is there a more advanced generic IO class (e.g., a generic NetCDF wrapper) or a specific Observation Operator configuration in OpenDA that allows us to feed a full, continuous model time series (e.g., length 60) and let OpenDA automatically interpolate or pick the matching timestamps based on the varying lengths of individual .noo files?

Best Practices: How do advanced models integrated via the Black-Box approach usually handle varying observation frequencies across different measurement vectors without triggering the length mismatch error?

Documentation & Manuals: Is there a continuously updated reference manual or comprehensive documentation for OpenDA's built-in wrapper classes and algorithms? We are often unsure which newer classes exist, what their constraints and limitations are, how their XML configuration should be written, and how they compare to one another. A detailed guide would help us make full use of the framework.
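
To clarify the first question, the behaviour we are hoping OpenDA can provide is roughly the following, shown as a plain Python sketch. `sample_at` is a hypothetical helper with made-up values, not an existing OpenDA class or option: it takes the full model series and returns values only at each series' own observation times, interpolating linearly where a time falls between model steps.

```python
from bisect import bisect_left


def sample_at(model_times, model_values, obs_times):
    """Return model values at obs_times.

    Exact matches are picked directly; times between two model steps are
    linearly interpolated. model_times must be sorted in ascending order.
    """
    if len(model_times) != len(model_values):
        raise ValueError("model_times and model_values differ in length")
    sampled = []
    for t in obs_times:
        if t < model_times[0] or t > model_times[-1]:
            raise ValueError(f"observation time {t} is outside the model window")
        i = bisect_left(model_times, t)
        if model_times[i] == t:
            sampled.append(model_values[i])
        else:
            t0, t1 = model_times[i - 1], model_times[i]
            v0, v1 = model_values[i - 1], model_values[i]
            weight = (t - t0) / (t1 - t0)
            sampled.append(v0 + weight * (v1 - v0))
    return sampled


# Hypothetical 60-day daily model output and one unevenly sampled series.
model_times = [float(day) for day in range(1, 61)]
model_values = [20.0 - 0.1 * day for day in range(1, 61)]
site_b_times = [float(day) for day in range(1, 61, 8)]  # 8 observation days

site_b_model = sample_at(model_times, model_values, site_b_times)
print(len(site_b_model))  # one model value per Site_B observation
```

Applied per series, this would give each measurement vector a model counterpart of matching length (15, 8, or 2 in our example) without discarding observations or splitting the time series into scalars.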

Any guidance, documentation links, or examples pointing to a more advanced I/O handler for Black-Box models would be greatly appreciated. Thank you!
