
Document saved statistics variable names #107

Open
pragnyanramtha wants to merge 3 commits into mllam:main from pragnyanramtha:docs-98-statistics-variable-names

@pragnyanramtha

Describe your changes

This updates the README section for `output.splitting.compute_statistics` to describe how computed statistics are stored in the output dataset.

The added documentation explains the `{output_variable}__{split_name}__{operation}` naming pattern, gives concrete examples such as `state__train__mean`, and notes which dimensions remain after reduction. It also clarifies that `diff_` statistics are calculated only for variables that span the splitting dimension.
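The naming pattern can be illustrated with a small sketch. `stats_var_name` and `parse_stats_var_name` are hypothetical helpers, not functions of the project. They only show how the three `__`-separated components fit together. The name `state__train__diff_mean` used below is an extrapolation from the documented pattern, not a name quoted from the README.

```python
# Hypothetical helpers (not part of the project) illustrating the
# "{output_variable}__{split_name}__{operation}" naming pattern.

SEP = "__"  # double underscore separating the three name components


def stats_var_name(output_variable: str, split_name: str, operation: str) -> str:
    """Build the name under which a statistic would be stored."""
    return SEP.join([output_variable, split_name, operation])


def parse_stats_var_name(name: str) -> tuple[str, str, str]:
    """Split a stored statistics name back into its three components.

    rsplit keeps an output variable name that itself contains "__" intact,
    assuming split and operation names never contain "__" (a single
    underscore, as in "diff_mean", is fine).
    """
    parts = name.rsplit(SEP, 2)
    if len(parts) != 3:
        raise ValueError(f"not a statistics variable name: {name!r}")
    output_variable, split_name, operation = parts
    return output_variable, split_name, operation


if __name__ == "__main__":
    for op in ["mean", "std", "diff_mean", "diff_std"]:
        print(stats_var_name("state", "train", op))
    print(parse_stats_var_name("state__train__diff_mean"))
```

Splitting from the right is a deliberate choice. Operation names such as `diff_mean` contain only single underscores, so the last two `__` separators reliably isolate the split name and the operation.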

Issue Link

Closes #98

Type of change

  • Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch.
  • I have performed a self-review of my code.
  • I have updated the documentation to cover introduced code changes.
  • I have given the PR a name that clearly describes the change, written in imperative form.
  • For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values. Not applicable, no functions or classes changed.
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code. Not applicable, no code changed.
  • I have added tests that prove my fix is effective or that my feature works. Not applicable, documentation-only change.
  • I have requested a reviewer and an assignee. I will leave reviewer and assignee selection to maintainers.

Validation:

  • `git diff --check`
  • `python -m pre_commit run trailing-whitespace --files README.md`
  • `python -m pre_commit run end-of-file-fixer --files README.md`

Copilot AI review requested due to automatic review settings May 13, 2026 20:24

Copilot AI left a comment

Pull request overview

Documentation-only update clarifying how computed split statistics are stored in generated output datasets.

Changes:

  • Adds naming pattern documentation for saved statistics variables.
  • Provides concrete statistic variable examples.
  • Documents retained dimensions and diff_ statistic behavior.
Comments suppressed due to low confidence (2)

README.md:286

  • Typo: “normalisating” should be “normalising”.
  • This is a run-on sentence; the compute_statistics explanation and the xarray.Dataset.{op} note should be separated or joined with punctuation so the documentation reads clearly.
4. Splitting and calculation of statistics of the output variables, using the `splitting` section. The `output.splitting.splits` attribute defines the individual splits to create (for example `train`, `val` and `test`) and `output.splitting.dim` defines the dimension to split along. The `compute_statistics` can be optionally set for a given split to calculate the statistical properties requested (for example `mean`, `std`) any method available on `xarray.Dataset.{op}` can be used. In addition methods prefixed by `diff_` (so the operational would be listed as `diff_{op}`) compute a statistic based on difference of consecutive time-steps, e.g. `diff_mean` to compute the `mean` of the difference between consecutive timesteps (these are used for normalisating increments). The `dims` attribute defines the dimensions to calculate the statistics over (for example `grid_index` and `time`).
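The `diff_` operations in the quoted README passage can be illustrated with a dependency-free sketch. So can the rule from the PR description that they apply only to variables spanning the splitting dimension. This is not the project's implementation, which per the README text uses `xarray.Dataset` methods. Several details are assumptions for illustration: the function names, the nested-list layout with the splitting dimension `time` as the leading axis, and the reduction over every dimension (as if `dims` listed both `time` and `grid_index`).

```python
from statistics import fmean, pstdev

# Plain-Python stand-ins for xarray reductions; pstdev uses ddof=0,
# matching the default of xarray's std().
REDUCERS = {"mean": fmean, "std": pstdev}


def flatten(data):
    """Flatten a 1-D or 2-D nested list into a flat list of numbers."""
    if data and isinstance(data[0], list):
        return [v for row in data for v in row]
    return list(data)


def consecutive_diffs(data):
    """Return x[t+1] - x[t] along the first axis of a [time][grid_index] list."""
    return [
        [b - a for a, b in zip(prev, curr)]
        for prev, curr in zip(data, data[1:])
    ]


def compute_statistic(dims, data, op, split_dim="time"):
    """Reduce `data` over all of its dimensions with `op` (hypothetical helper).

    Returns None for a diff_ op on a variable that lacks `split_dim`.
    Assumes `split_dim`, when present, is the first axis of `data`.
    """
    if op.startswith("diff_"):
        if split_dim not in dims:
            # No consecutive steps to difference: the statistic is skipped.
            return None
        data = consecutive_diffs(data)
        op = op[len("diff_"):]
    return REDUCERS[op](flatten(data))


if __name__ == "__main__":
    state = [[1.0, 2.0], [2.0, 4.0], [4.0, 7.0]]  # dims: (time, grid_index)
    static = [0.5, 1.5]  # dims: (grid_index,)
    print(compute_statistic(("time", "grid_index"), state, "diff_mean"))  # 2.0
    print(compute_statistic(("grid_index",), static, "diff_mean"))  # None
```

Here the per-step increments of `state` are `[[1.0, 2.0], [2.0, 3.0]]`, so `diff_mean` is their mean. A reduction over fewer dimensions would leave the remaining ones in the result, which is the behaviour the PR description documents.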

Development

Successfully merging this pull request may close these issues.

Feature Request: Calculate and export global dataset statistics for ML normalization
