Add timestep_hours aggregator config option#883

Open
mcgibbon wants to merge 5 commits into main from feature/configure_5_day
Conversation

@mcgibbon
Contributor

@mcgibbon mcgibbon commented Feb 27, 2026

This PR allows setting timestep_hours on the InferenceEvaluatorAggregatorConfig, which, if set to a value other than 6, modifies the step selected for "step_20" metrics.

This does mean "step 20" is a misnomer in these cases. While not ideal, renaming the metric is also problematic because in coupled modelling we rely on reporting step-20 ocean metrics. Breaking that would be worse than having to remember and communicate, for our daily-step runs (which are still preliminary), that "step 20" really means "day 5" (which we generally understand, internally). We should rectify this in a later PR that adds more configurability for selecting the desired lead time.

Changes:

  • Added timestep_hours to InferenceEvaluatorAggregatorConfig, which, if set to a value other than 6, modifies the step selected for "step_20" metrics.

  • Tests added
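The step arithmetic at play can be sketched as follows. This is a minimal illustration, not the actual fme implementation; the property name `weather_eval_step` is hypothetical, but the "5 days of evolution" target follows from the PR description (5 days × 24 h ÷ 6 h per step = step 20):

```python
import dataclasses


@dataclasses.dataclass
class InferenceEvaluatorAggregatorConfig:
    """Illustrative sketch of the proposed option, not the real class."""

    timestep_hours: int = 6

    @property
    def weather_eval_step(self) -> int:
        # "step_20" metrics target 5 days of evolution:
        # 5 days * 24 h / timestep_hours hours per step.
        return (5 * 24) // self.timestep_hours


default = InferenceEvaluatorAggregatorConfig()  # 6-hourly data
assert default.weather_eval_step == 20

daily = InferenceEvaluatorAggregatorConfig(timestep_hours=24)
assert daily.weather_eval_step == 5  # "step 20" now really means step 5
```

This makes concrete why "step 20" becomes a misnomer for daily-step runs: the selected step is 5, while the metric name stays the same.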

@mcgibbon mcgibbon marked this pull request as ready for review February 27, 2026 20:28
Collaborator

@oliverwm1 oliverwm1 left a comment

Instead of having the user configure the dataset timestep—which is a property of the dataset—how about having the user configure the number of steps at which the RMSE is computed (which would default to 20, maintaining backwards compatibility)?

Comment on lines +147 to +148
timestep_hours: Timestep of the data in hours, used for determining
which timestep corresponds to 5 days of evolution.
Collaborator

Should we be requiring the user to configure this, when it is really just a property of the dataset? I don't think there is anywhere else in the codebase where this is a user configurable option.

@mcgibbon
Contributor Author

mcgibbon commented Mar 2, 2026

Instead of having the user configure the dataset timestep—which is a property of the dataset—how about having the user configure the number of steps at which the RMSE is computed (which would default to 20, maintaining backwards compatibility)?

Hmmm yeah that's a good idea. I didn't like it originally because configuring integer step counts like this is a lot more finicky/error-prone than configuring timestep lengths. But it would resolve the issue of aligning this change with ocean inference without lying. The only thing I'm still not sure about is the name, as I do want to compare 5-step inference against 20-step inference on their respective runs, and we have places in our code where we explicitly look for the "mean_step_20" name.

@mcgibbon
Contributor Author

mcgibbon commented Mar 2, 2026

How about adding a weather_eval_step: int = None configuration option which, if not given, keeps the mean_step_20 metrics, or, if given, replaces them with a weather_step set of metrics using the indicated forward step? This keeps the current default behavior, but lets me put new daily-step and 6h-step runs under the same metric.
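The naming behavior proposed above can be sketched like this. All names here (`AggregatorConfig`, `metric_name`, the metric key format) are illustrative assumptions, not the actual fme API:

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass
class AggregatorConfig:
    """Hypothetical sketch of the weather_eval_step proposal."""

    weather_eval_step: Optional[int] = None

    def metric_name(self, var: str) -> str:
        if self.weather_eval_step is None:
            # Default: keep the existing mean_step_20 metric names,
            # preserving backwards compatibility.
            return f"mean_step_20/{var}"
        # Explicit step: report under a step-agnostic "weather_step" key,
        # so daily-step and 6h-step runs land on the same metric name.
        return f"weather_step/{var}"


assert AggregatorConfig().metric_name("rmse") == "mean_step_20/rmse"
assert AggregatorConfig(weather_eval_step=5).metric_name("rmse") == "weather_step/rmse"
```

The step-agnostic key is what lets runs with different timesteps share one wandb panel without the metric name lying about the step count.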

@jpdunc23
Member

jpdunc23 commented Mar 2, 2026

How about adding a weather_eval_step: int = None configuration option which, if not given, keeps the mean_step_20 metrics, or, if given, replaces them with a weather_step set of metrics using the indicated forward step? This keeps the current default behavior, but lets me put new daily-step and 6h-step runs under the same metric.

If not too expensive, I think it would be nice to keep the mean_step_20 metrics even when weather_eval_step is provided.

@oliverwm1
Collaborator

as I do want to compare 5-step inference against 20-step inference on their respective runs

It is easy enough to make wandb panels comparing two metrics with different names, so I don't think this should be a blocker

@mcgibbon
Contributor Author

mcgibbon commented Mar 2, 2026

as I do want to compare 5-step inference against 20-step inference on their respective runs

It is easy enough to make wandb panels comparing two metrics with different names, so I don't think this should be a blocker

Creating ~50 panels, one for each output variable, is onerous enough that in practice I won't look at this comparison, except by flipping back and forth for a few key variables.

@mcgibbon
Contributor Author

mcgibbon commented Mar 2, 2026

If not too expensive, I think it would be nice to keep the mean_step_20 metrics even when weather_eval_step is provided.

What's the use case for this? Aside from the expense, I'd like to avoid doubling the number of wandb keys I need to flip through related to these metrics. Depending on the use case, it also may make more sense to allow weather_eval_steps: list[int] instead.
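The weather_eval_steps: list[int] alternative mentioned above might look like the following sketch. The class and method names are hypothetical, and the metric key format is assumed to follow the existing "mean_step_20" convention:

```python
import dataclasses


@dataclasses.dataclass
class MultiStepAggregatorConfig:
    """Illustrative sketch of the weather_eval_steps: list[int] alternative."""

    # Defaulting to [20] keeps the existing mean_step_20 behavior.
    weather_eval_steps: list[int] = dataclasses.field(default_factory=lambda: [20])

    def metric_names(self, var: str) -> list[str]:
        # One metric key per requested forward step.
        return [f"mean_step_{step}/{var}" for step in self.weather_eval_steps]


cfg = MultiStepAggregatorConfig(weather_eval_steps=[5, 20])
assert cfg.metric_names("rmse") == ["mean_step_5/rmse", "mean_step_20/rmse"]
assert MultiStepAggregatorConfig().metric_names("rmse") == ["mean_step_20/rmse"]
```

This variant would cover jpdunc23's request (keep mean_step_20 alongside a new step) by passing both steps explicitly, at the cost of computing metrics at every listed step.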
