Document saved statistics variable names #107
Open
pragnyanramtha wants to merge 3 commits into
Pull request overview
Documentation-only update clarifying how computed split statistics are stored in generated output datasets.
Changes:
- Adds naming pattern documentation for saved statistics variables.
- Provides concrete statistic variable examples.
- Documents retained dimensions and `diff_` statistic behavior.
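As a rough illustration of the naming pattern this PR documents (`{output_variable}__{split_name}__{operation}`, as stated in the PR description), the sketch below builds and parses such names in plain Python. The helper names `statistic_variable_name` and `parse_statistic_variable_name` are hypothetical and are not part of the project; the sketch also assumes that split names and operation names never contain a double underscore.

```python
def statistic_variable_name(output_variable, split_name, operation):
    """Join the three parts with double underscores (hypothetical helper)."""
    return f"{output_variable}__{split_name}__{operation}"


def parse_statistic_variable_name(name):
    """Split a statistics variable name back into its three parts.

    Splitting from the right keeps an output variable name that itself
    contains "__" intact, under the assumption that the split name and
    the operation do not contain "__".
    """
    output_variable, split_name, operation = name.rsplit("__", 2)
    return output_variable, split_name, operation


if __name__ == "__main__":
    print(statistic_variable_name("state", "train", "mean"))
    print(parse_statistic_variable_name("state__train__diff_mean"))
```

The single underscore in `diff_mean` does not interfere with parsing, because only the double underscore is treated as a separator.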
Comments suppressed due to low confidence (2)
README.md:286
- Typo: “normalisating” should be “normalising”.
4. Splitting and calculation of statistics of the output variables, using the `splitting` section. The `output.splitting.splits` attribute defines the individual splits to create (for example `train`, `val` and `test`) and `output.splitting.dim` defines the dimension to split along. The `compute_statistics` can be optionally set for a given split to calculate the statistical properties requested (for example `mean`, `std`) any method available on `xarray.Dataset.{op}` can be used. In addition methods prefixed by `diff_` (so the operational would be listed as `diff_{op}`) compute a statistic based on difference of consecutive time-steps, e.g. `diff_mean` to compute the `mean` of the difference between consecutive timesteps (these are used for normalisating increments). The `dims` attribute defines the dimensions to calculate the statistics over (for example `grid_index` and `time`).
README.md:286
- This is a run-on sentence; the `compute_statistics` explanation and the `xarray.Dataset.{op}` note should be separated or joined with punctuation so the documentation reads clearly.
4. Splitting and calculation of statistics of the output variables, using the `splitting` section. The `output.splitting.splits` attribute defines the individual splits to create (for example `train`, `val` and `test`) and `output.splitting.dim` defines the dimension to split along. The `compute_statistics` can be optionally set for a given split to calculate the statistical properties requested (for example `mean`, `std`) any method available on `xarray.Dataset.{op}` can be used. In addition methods prefixed by `diff_` (so the operational would be listed as `diff_{op}`) compute a statistic based on difference of consecutive time-steps, e.g. `diff_mean` to compute the `mean` of the difference between consecutive timesteps (these are used for normalisating increments). The `dims` attribute defines the dimensions to calculate the statistics over (for example `grid_index` and `time`).
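The `diff_{op}` behaviour described in the quoted README text can be sketched in plain Python for a single one-dimensional series. This is only a stand-in for illustration, not the project's xarray-based implementation: the `compute_statistic` function, the `OPERATIONS` table, and the choice of population standard deviation for `std` are all assumptions.

```python
from statistics import fmean, pstdev

# Hypothetical lookup of supported operations; the real tool resolves
# operation names to methods on xarray.Dataset instead.
OPERATIONS = {"mean": fmean, "std": pstdev}


def compute_statistic(values, operation):
    """Compute a statistic over one series, honouring the diff_ prefix."""
    if operation.startswith("diff_"):
        # Replace the series by differences of consecutive time steps.
        values = [later - earlier for earlier, later in zip(values, values[1:])]
        operation = operation[len("diff_"):]
    return OPERATIONS[operation](values)


if __name__ == "__main__":
    series = [1.0, 2.0, 4.0, 7.0]
    print(compute_statistic(series, "mean"))       # mean of the raw values
    print(compute_statistic(series, "diff_mean"))  # mean of the increments
```

For the series above the increments are 1.0, 2.0 and 3.0, so `diff_mean` describes the typical step size rather than the typical value, which is why these statistics are suited to normalising increments.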
Describe your changes
This updates the README section for `output.splitting.compute_statistics` to describe how computed statistics are stored in the output dataset.
The added documentation explains the `{output_variable}__{split_name}__{operation}` naming pattern, gives concrete examples such as `state__train__mean`, and notes which dimensions remain after reduction. It also clarifies that `diff_` statistics are calculated only for variables that span the splitting dimension.
Issue Link
Closes #98
Type of change
Checklist before requesting a review
Validation:
- `git diff --check`
- `python -m pre_commit run trailing-whitespace --files README.md`
- `python -m pre_commit run end-of-file-fixer --files README.md`