(0.3.0) Refactor bounding box to Metadata.region and add Column #142

Merged
glwagner merged 139 commits into main from glw-eq/regional-metadata
Apr 27, 2026

@glwagner glwagner commented Mar 24, 2026

Summary

  • Introduces a Column(longitude, latitude) type as a new spatial region for Metadata, separate from BoundingBox
  • When region=Column(lon, lat), native_grid returns a column RectilinearGrid with (Flat, Flat, Bounded) topology, and location automatically reduces the horizontal dimensions to Nothing
  • Renames bounding_box field → region on Metadata struct (accepts Nothing, BoundingBox, or Column)
  • Renames per-dataset location() → dataset_location(), with a generic location(::Metadata) wrapper that applies region-based dimension reduction
  • Supports Linear() (default) and Nearest() interpolation for column data extraction
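
For illustration, the difference between the two interpolation modes can be sketched in plain Julia. The `Linear` and `Nearest` structs, `nearest_index`, and `sample` below are hypothetical stand-ins written for this sketch; they are not the package's `extract_column!` implementation, and the coordinates are assumed to be sorted in ascending order.

```julia
# Hypothetical stand-ins for the interpolation tags named in the summary.
struct Linear end
struct Nearest end

# Index of the grid point closest to x.
nearest_index(coords, x) = argmin(abs.(coords .- x))

# Nearest(): take the value at the closest grid point.
function sample(::Nearest, lons, lats, data, lon, lat)
    return data[nearest_index(lons, lon), nearest_index(lats, lat)]
end

# Linear(): bilinear interpolation between the four surrounding grid points.
function sample(::Linear, lons, lats, data, lon, lat)
    i = clamp(searchsortedlast(lons, lon), 1, length(lons) - 1)
    j = clamp(searchsortedlast(lats, lat), 1, length(lats) - 1)
    wx = (lon - lons[i]) / (lons[i+1] - lons[i])
    wy = (lat - lats[j]) / (lats[j+1] - lats[j])
    return (1 - wx) * (1 - wy) * data[i, j] +
           wx * (1 - wy) * data[i+1, j] +
           (1 - wx) * wy * data[i, j+1] +
           wx * wy * data[i+1, j+1]
end

lons = [35.0, 35.25, 35.5]
lats = [50.0, 50.25, 50.5]
data = [lon + 2lat for lon in lons, lat in lats]  # a plane, so bilinear is exact

v_linear  = sample(Linear(),  lons, lats, data, 35.1, 50.1)
v_nearest = sample(Nearest(), lons, lats, data, 35.1, 50.1)
```

With this synthetic field, the linear sample reproduces the plane at the requested point, while the nearest sample returns the value stored at (35.0, 50.0).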

Example usage

col = Column(35.1, 50.1)
T_meta = Metadatum(:temperature, dataset=GLORYSMonthly(), region=col)

native_grid(T_meta)  # → RectilinearGrid(Flat, Flat, Bounded)
location(T_meta)     # → (Nothing, Nothing, Center)
T_field = Field(T_meta)  # → column Field{Nothing, Nothing, Center}
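
The region-based reduction of `location` shown above can be sketched with multiple dispatch. This is a minimal sketch under assumptions: the structs are local stand-ins (the real `Center` comes from Oceananigans), and the `restrict_location(loc, region)` signature is a guess for illustration, since the PR names the function but does not show its arguments.

```julia
# Local stand-ins so the sketch runs without Oceananigans.
struct Center end
struct Face end

struct BoundingBox{X, Y}
    longitude :: X
    latitude :: Y
end

struct Column{FT}
    longitude :: FT
    latitude :: FT
end

# Regions that keep a horizontal extent leave the location untouched.
restrict_location(loc, ::Nothing) = loc
restrict_location(loc, ::BoundingBox) = loc

# A Column has no horizontal extent, so both horizontal entries become Nothing.
restrict_location(loc, ::Column) = (Nothing, Nothing, loc[3])

native_location = (Center, Center, Center)
column_location = restrict_location(native_location, Column(35.1, 50.1))
```

Dispatching on the region type keeps the per-dataset location definitions unaware of regions, which matches the split between `dataset_location` and the generic `location(::Metadata)` wrapper described in the summary.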

Breaking changes

  • Metadata.bounding_box renamed to Metadata.region
  • bounding_box= keyword argument renamed to region= in Metadata() and Metadatum() constructors
  • Per-dataset location() methods renamed to dataset_location()

Closes #138

Test plan

  • Verify existing tests pass after rename
  • Test Column(lon, lat) construction
  • Test native_grid returns RectilinearGrid for Column region
  • Test location returns (Nothing, Nothing, Center) for Column region
  • End-to-end test: ocean station papa with GLORYS column metadata
  • End-to-end test: RICO site with ERA5 column metadata

🤖 Generated with Claude Code

glwagner and others added 2 commits March 24, 2026 13:38
Introduce a `Column` type as a new spatial region for `Metadata`,
separate from `BoundingBox`. When `region=Column(lon, lat)`,
`native_grid` returns a column `RectilinearGrid` and `location`
automatically reduces horizontal dimensions to `Nothing`.

Key changes:
- Add `Column`, `Linear`, `Nearest` types in metadata.jl
- Rename `bounding_box` field to `region` on `Metadata` struct
- Rename per-dataset `location()` methods to `dataset_location()`
- Add generic `location(::Metadata)` with `restrict_location` dispatch
- Refactor `native_grid` to dispatch on region type
- Add `_column_field` path in `Field(metadata)` that loads data via
  intermediate grid and interpolates to column
- Handle `Column` in Copernicus Marine and CDS API download extensions

Closes #138

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change dataset_location signature from (metadata) to (dataset, name).
Add (Center, Center, Center) fallback so only staggered datasets
(ECCO) need to extend it. Remove redundant definitions from all
other dataset modules.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
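
A minimal sketch of the fallback pattern described in the second commit, assuming local stand-in types. `GriddedDataset`, `StaggeredDataset`, and the variable-name mapping are hypothetical and only illustrate why cell-centered datasets need no method of their own.

```julia
# Local stand-ins so the sketch runs without Oceananigans.
struct Center end
struct Face end

abstract type AbstractDataset end
struct GriddedDataset <: AbstractDataset end    # hypothetical cell-centered dataset
struct StaggeredDataset <: AbstractDataset end  # hypothetical stand-in for an ECCO-like dataset

# Generic fallback: every variable sits at cell centers.
dataset_location(dataset, name) = (Center, Center, Center)

# Only staggered datasets extend the function. The mapping below is illustrative.
function dataset_location(::StaggeredDataset, name)
    name === :u_velocity && return (Face, Center, Center)
    name === :v_velocity && return (Center, Face, Center)
    return (Center, Center, Center)
end

centered_T  = dataset_location(GriddedDataset(), :temperature)
staggered_u = dataset_location(StaggeredDataset(), :u_velocity)
```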
@glwagner glwagner changed the title from "Add Column region type for single-column metadata" to "Refactor bounding box to Metadata.region and add Column" Mar 24, 2026
@glwagner
Member Author

cc @ewquon

@glwagner glwagner changed the title from "Refactor bounding box to Metadata.region and add Column" to "(0.3.0) Refactor bounding box to Metadata.region and add Column" Mar 24, 2026
@ewquon
Collaborator

ewquon commented Mar 24, 2026

Thanks for adding this @glwagner!

glwagner and others added 5 commits March 24, 2026 14:09
Add unit tests for Column type, restrict_location, dataset_location,
location(metadata), native_grid dispatch, region keyword, and
iteration propagation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The JRA55 field_time_series code calls location(fts) on
FieldTimeSeries objects. This binding was lost when the
dataset_location import replaced the old location import.
Add explicit `using Oceananigans: location` to restore it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add PrescribedOcean in src/Oceans/ following the SlabOcean pattern:
  holds prescribed FieldTimeSeries for T, S, u, v; provides full
  EarthSystemModels interface; use with AtmosphereOceanModel (not
  OceanOnlyModel, which is now guarded with an error).

- Add docs/src/Metadata/metadata_tutorial.md covering BoundingBox,
  Column, location(), FieldTimeSeries, and ERA5 variables.

- Add examples/ERA5_single_column_fluxes.jl demonstrating
  PrescribedAtmosphere + PrescribedOcean + AtmosphereOceanModel
  for computing bulk surface fluxes from ERA5 reanalysis data.

- Add test/test_prescribed_ocean.jl with unit tests for construction,
  interface methods, coupling, time stepping, and OceanOnlyModel guard.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace single_column_os_papa_simulation.jl: now uses ERA5 atmosphere
  + GLORYS ocean with PrescribedAtmosphere + PrescribedOcean +
  AtmosphereOceanModel to compute bulk fluxes at Ocean Station Papa.
  Marked build_always=true.

- Remove separate ERA5_single_column_fluxes.jl (merged into above).

- Fix Column download expansion in extensions:
  - CopernicusMarine: use 1/6° for Linear (GLORYS is 1/12°), nearest
    selection for Nearest interpolation.
  - CDSAPI: use 0.5° for ERA5 (0.25° grid).

- Add coordinates_selection_method dispatch for Column interpolation
  types in CopernicusMarine extension.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
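
The download expansion for a `Column` can be sketched as follows. The function names `expand_longitude` and `expand_latitude` appear in this PR, but their signatures here are assumptions, as is `download_window`. The sketch also assumes the quoted sizes (1/6° for GLORYS, 0.5° for ERA5) are half-widths applied on each side of the column, which the commit message does not state.

```julia
struct Column{FT}
    longitude :: FT
    latitude :: FT
end

struct Linear end
struct Nearest end

# Assumption: δ is a half-width, so the window spans 2δ in each direction.
expand_longitude(col::Column, δ) = (col.longitude - δ, col.longitude + δ)
expand_latitude(col::Column, δ)  = (col.latitude - δ, col.latitude + δ)

# Linear interpolation needs neighbors on both sides, so request a small window.
download_window(col::Column, ::Linear, δ) =
    (longitude = expand_longitude(col, δ), latitude = expand_latitude(col, δ))

# Nearest needs one point: request a degenerate window and let the
# download service pick the closest grid cell (hypothetical behavior).
download_window(col::Column, ::Nearest, δ) =
    (longitude = (col.longitude, col.longitude), latitude = (col.latitude, col.latitude))

col = Column(35.1, 50.1)
glorys_window  = download_window(col, Linear(), 1/6)   # GLORYS grid spacing is 1/12°
era5_window    = download_window(col, Linear(), 0.5)   # ERA5 grid spacing is 0.25°
nearest_window = download_window(col, Nearest(), 1/6)
```

Choosing a window of about two grid cells per side guarantees that at least one grid point lies on each side of the column, which is what bilinear interpolation requires.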
…cean

- Remove _ prefix from internal functions: construct_native_grid,
  column_field, extract_column!, intermediate_grid_from_file,
  read_longitude, read_latitude, region_suffix, expand_longitude,
  expand_latitude.

- Fix ERA5 dataset_location: return (Center, Center, Nothing) since
  ERA5 is a 2D surface dataset.

- Simplify PrescribedOcean: use Flat-compatible CenterFields,
  store flux fields separately (like SlabOcean), surface accessors
  return full fields. Works with (Flat, Flat, Flat) topology.

- Update example and tests to use 0D grids.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@glwagner glwagner requested a review from ewquon March 24, 2026 21:31
glwagner and others added 2 commits March 24, 2026 15:35
…+GLORYS

- New ERA5PrescribedAtmosphere: analogous to JRA55PrescribedAtmosphere,
  builds a PrescribedAtmosphere from ERA5 reanalysis data using
  generic FieldTimeSeries(metadata). Supports region keyword for
  spatial subsetting.

- Rewrite single_column_os_papa_simulation.jl:
  - Use ERA5PrescribedAtmosphere instead of JRA55PrescribedAtmosphere
  - Initialize ocean with GLORYS (via Column region) instead of ECCO
  - Same single-column ocean_simulation structure, flux outputs, and
    visualization as before.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ECCO4Monthly artifacts only have January 1993. The ECCO server is
unreliable, so the fallback to NumericalEarthArtifacts is needed.
Limit ECCO4 downloads to 1 date and skip multi-date tests (cycling,
restoring, FTS utilities) for ECCO4 — these are already covered by
ECCO2Monthly, ECCO2Daily, and EN4 which have 3 dates in artifacts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
glwagner and others added 4 commits March 24, 2026 16:37
Adds validate_netcdf() check to test init. Corrupt .nc files (e.g.,
from interrupted downloads) are deleted so download_from_artifacts
can replace them. Fixes GPU CI failures caused by a corrupt cached
RYF.rsds.1990_1991.nc on the self-hosted runner.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
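
A cheap way to detect some corrupt files is to check the file signature. The sketch below is not the `validate_netcdf()` added in this commit, whose implementation is not shown here; `looks_like_netcdf` and `remove_if_corrupt!` are hypothetical helpers. It only inspects the first bytes, so it catches empty files or HTML error pages saved as `.nc`, but not a download that was interrupted after a valid header.

```julia
# NetCDF-4 files are HDF5 containers; classic NetCDF files start with "CDF".
# Assumption: the HDF5 signature sits at byte offset 0, as is usual for NetCDF-4.
const HDF5_MAGIC = UInt8[0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a]
const CDF_MAGIC  = UInt8[0x43, 0x44, 0x46]  # "CDF"

function looks_like_netcdf(path)
    isfile(path) || return false
    header = open(io -> read(io, 8), path)  # reads at most 8 bytes
    length(header) == 8 && header == HDF5_MAGIC && return true
    length(header) >= 4 && header[1:3] == CDF_MAGIC && return true
    return false
end

# Delete a file that fails the check so a later download step can replace it.
function remove_if_corrupt!(path)
    if isfile(path) && !looks_like_netcdf(path)
        rm(path)
        return true
    end
    return false
end

# Demonstration on a temporary file that is clearly not NetCDF.
dir = mktempdir()
bad_file = joinpath(dir, "truncated.nc")
write(bad_file, "not a netcdf file")
removed = remove_if_corrupt!(bad_file)
```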
The example now uses GLORYS via CopernicusMarine which requires
credentials not available in the docs CI environment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ocean station papa example now uses GLORYS data which requires
the CopernicusMarine extension. Credentials are already available
in the docs CI workflow secrets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The example uses GLORYS data which requires the CopernicusMarine
extension. The subprocess that executes the example needs to
explicitly load CopernicusMarine to trigger the extension.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@glwagner
Member Author

Docs build failure: Copernicus Marine credentials

The ocean station papa example now uses GLORYS data via CopernicusMarine, but the COPERNICUS_USERNAME_PASSWORD secret appears to be empty/unset in this repo. This affects both the docs and CI workflows (both reference the same secret), but CI never tested GLORYS downloads so it was never noticed.

Evidence: COPERNICUS_PASSWORD: shows empty in both docs and historical CI runs.

Fix needed: Add the COPERNICUS_USERNAME_PASSWORD secret to the repo settings (Settings → Secrets and variables → Actions).

Use the new COPERNICUSMARINE_SERVICE_USERNAME and
COPERNICUSMARINE_SERVICE_PASSWORD secrets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@glwagner glwagner force-pushed the glw-eq/regional-metadata branch from 112aa20 to a0f2307 Compare March 24, 2026 23:40
The copernicusmarine Python tool reads COPERNICUSMARINE_SERVICE_USERNAME
and COPERNICUSMARINE_SERVICE_PASSWORD. Update workflow env vars and
the Julia extension to use these names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
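
Since the earlier failure came from an empty secret, a fail-fast check on the two variables can make the problem visible before any download starts. This is a sketch only: `copernicus_credentials` is a hypothetical helper, not a function of the Julia extension, and only the two environment variable names are taken from the commit above.

```julia
# Read the credentials from an environment-like dictionary (ENV by default)
# and fail early with a clear message if either one is missing or empty.
function copernicus_credentials(env = ENV)
    username = get(env, "COPERNICUSMARINE_SERVICE_USERNAME", "")
    password = get(env, "COPERNICUSMARINE_SERVICE_PASSWORD", "")

    if isempty(username) || isempty(password)
        error("COPERNICUSMARINE_SERVICE_USERNAME and " *
              "COPERNICUSMARINE_SERVICE_PASSWORD must both be set and non-empty.")
    end

    return (; username, password)
end

# Demonstration with an explicit dictionary instead of the real environment.
demo_env = Dict("COPERNICUSMARINE_SERVICE_USERNAME" => "user",
                "COPERNICUSMARINE_SERVICE_PASSWORD" => "secret")
creds = copernicus_credentials(demo_env)
```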
Comment thread src/Oceans/prescribed_ocean.jl Outdated
Co-authored-by: Simone Silvestri <silvestri.simone0@gmail.com>
glwagner and others added 2 commits March 27, 2026 14:48
Add test_era5.jl (48 tests) covering dataset types, date ranges,
variable name mappings, filename construction, and metadata with
Column/BoundingBox regions.

Add test_column_field.jl (16 tests) covering extract_column! with
Nearest interpolation, dispatch routing, and Column native_grid
construction for ECCO4 and ERA5.

Limit ECCO4DarwinMonthly to 1 date in runtests.jl and
test_ecco4_en4.jl to match available NumericalEarthArtifacts,
preventing CI failures when ecco.jpl.nasa.gov is unreliable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove ECCO4Monthly/ECCO4DarwinMonthly date-limiting and conditional
test skipping that was introduced when the ECCO server was down.
CI confirms all ECCO4 downloads succeed again.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@glwagner
Member Author

CI Failure Analysis

1. Documentation Build — initialization_update_state! import error

The docs runner (Julia 1.12) fails at EarthSystemModels.jl:50 because Models.initialization_update_state! doesn't exist in the Oceananigans version used:

ERROR: exported function Models.initialization_update_state! does not exist

This is the only code-related failure — likely a Julia 1.12 stricter-import issue or Oceananigans version mismatch on the docs runner.

2. GPU — Disk space exhaustion (infrastructure)

The self-hosted GPU runner ran out of disk:

No space left on device : '/opt/actions-runner/_diag/Worker_*.log'

Not a code issue.

3. Reactant Extension — Disk space exhaustion (infrastructure)

Same root cause — downloading JRA55 data failed with:

SystemError: flush: No space left on device

Not a code issue.


ECCO Download Status

ECCO downloads are working again. CI logs confirm successful downloads for all ECCO variants (v2 monthly/daily, v4, v2_darwin, v4_darwin) with multiple dates. Reverted the ECCO4 date-limiting workarounds in 70de82e.

Missing Artifacts (TODO for later)

The NumericalEarthArtifacts data-v1 release is missing files needed for full ECCO4 test coverage as a fallback. Files to add when convenient:

ECCO4Monthly ocean data (for runtests.jl 3 dates + test_ecco4_en4.jl 5 dates):

  • THETA_1993_{02,03,04,05}.nc
  • SALT_1993_{02,03,04,05}.nc

ECCO4Monthly atmosphere data (for test_ecco_atmosphere.jl, 1992-01 to 1992-03):

  • EXFewind_1992_{01,02,03}.nc
  • EXFnwind_1992_{01,02,03}.nc
  • EXFatemp_1992_{01,02,03}.nc
  • EXFaqh_1992_{01,02,03}.nc
  • EXFpress_1992_{01,02,03}.nc
  • EXFlwdn_1992_{01,02,03}.nc
  • oceQsw_1992_{01,02,03}.nc
  • EXFpreci_1992_{01,02,03}.nc

~32 files total to upload to NumericalEarth/NumericalEarthArtifacts release data-v1.

glwagner and others added 12 commits April 23, 2026 14:03
The cds_downloading job was missing the free-disk-space cleanup steps
that cpu_tests already has, causing it to run out of disk during
JRA55 downloads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove /opt, dotnet, and swift directories in the GPU container
to free additional disk space for data downloads and compilation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ation

The branch's location(::Metadata) wrapper calls dataset_location, so
GLORYS needs to override dataset_location rather than location directly.
This ensures free_surface gets (Center, Center, Nothing) and is treated
as a 2D field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@glwagner
Member Author

@simone-silvestri is it expected that the CPU CI takes so long? Also I'm confused about GLORYS -- isn't this tested on main?

@glwagner
Member Author

oh goodness beautiful

Added ECCO credentials to the workflow and removed them from the build step.
Comment thread src/NumericalEarth.jl
@glwagner glwagner closed this Apr 26, 2026
@glwagner glwagner reopened this Apr 26, 2026
@glwagner
Member Author

@xkykai @simone-silvestri the OS Papa tests appear to be failing. Any ideas how to fix them?

@xkykai
Collaborator

xkykai commented Apr 27, 2026

Seems like an HTTP download problem with the server. Probably safe to ignore?

@glwagner glwagner merged commit a5ab400 into main Apr 27, 2026
13 of 18 checks passed
@glwagner glwagner deleted the glw-eq/regional-metadata branch April 27, 2026 03:53
simone-silvestri added a commit that referenced this pull request Apr 27, 2026
Resolution policy:
- metadata.jl/metadata_field.jl: take main's #142 region-based design as the
  base, add the branch's SpatialLayout/StationColumn trait paths on top so
  station datasets keep working through the merge. Phase 2 collapses the
  trait into the Column-region dispatch.
- Dataset modules (ECCO, EN4, ERA5, ETOPO, GLORYS, JRA55, ORCA, WOA): take
  main's `dataset_location()` rename and `metadata_filename(...; region)`
  signature; keep branch's `dataset_url` where the unified download contract
  applies.
- OSPapa folder: deletion wins — all four OSPapa/*.jl files removed; main's
  test_ospapa.jl removed; branch's test_ocean_station_papa.jl + worked
  example in docs/src/developers/ remain.
- Drop main's inline `struct ShiftSouth/AverageNorthSouth` and
  `mangle(_, ::Nothing)` definitions: branch's Datasets.jl provides them.
simone-silvestri added a commit that referenced this pull request Apr 27, 2026
Phase 2 of the metadata-for-users refactor merge. Drops the dedicated
`Datasets` submodule introduced on the branch and relocates its contents
to where main's #142 puts them:

- AbstractDataset, SpatialLayout / GriddedLatLon / StationColumn /
  spatial_layout, and the download contract (dataset_url, authenticate,
  download_file!, download_dataset, preprocess_data) move into
  DataWrangling.jl.
- Unit-conversion tags (Celsius, Kelvin, Millibar, ...), mangle tags
  (ShiftSouth, AverageNorthSouth), and the generic + identity defaults
  for convert_units / mangle / conversion_units move to metadata_field.jl
  (alongside the shipped non-identity methods).
- contract.jl swaps DATASETS_MODULE → CONTRACT_MODULE = DataWrangling so
  default-method classification still works.
- NumericalEarth.jl drops `const Datasets = DataWrangling.Datasets`.

The trait (StationColumn) remains functional for the OceanStationPapa
worked example; Phase 3 will retire it in favour of the `Column` region.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ewquon added a commit to ewquon/NumericalEarth.jl that referenced this pull request Apr 29, 2026
…to eq/era5_pressure_levels

Adopt the bbox→region refactor (NumericalEarth#142): rename `metadata.bounding_box` to
`metadata.region` across ERA5 + CDSAPI extension code, examples, and tests.
Drop the obsolete `location(::ERA5Metadata)` override in favor of the new
generic `dataset_location` dispatch, and the redundant per-Metadatum
`metadata_filename` override (the generic version returns the stored
filename). Adopt `reversed_latitude_axis(::ERA5Dataset) = true` for the new
column-field path's φ-coord flip; ERA5's `retrieve_data` overrides keep
their manual `reverse(data, dims=2)` since they bypass the generic path.
Restore the CDS API environment-setup docstring on `download_dataset(::ERA5Metadatum)`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
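
The latitude-flip trait mentioned in this commit can be sketched as a default-false trait function with a per-dataset override. `NorthToSouthDataset`, `SouthToNorthDataset`, and `orient_latitude` are hypothetical names for this sketch; only `reversed_latitude_axis` and the `reverse(data, dims=2)` call come from the commit message, and the `false` default is an assumption.

```julia
abstract type AbstractDataset end
struct NorthToSouthDataset <: AbstractDataset end  # hypothetical stand-in for ERA5Dataset
struct SouthToNorthDataset <: AbstractDataset end  # hypothetical dataset stored south to north

# Assumed default: latitude is stored in ascending order, so nothing is flipped.
reversed_latitude_axis(::AbstractDataset) = false
reversed_latitude_axis(::NorthToSouthDataset) = true

# data is indexed (longitude, latitude); flip the second dimension when needed.
orient_latitude(dataset, data) =
    reversed_latitude_axis(dataset) ? reverse(data, dims=2) : data

data = [1 2 3;
        4 5 6]

flipped   = orient_latitude(NorthToSouthDataset(), data)
untouched = orient_latitude(SouthToNorthDataset(), data)
```

Keeping the flip behind a trait lets the generic column-field path stay dataset-agnostic, while code paths that bypass it (such as the `retrieve_data` overrides mentioned above) remain responsible for their own reversal.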
@ewquon ewquon added the breaking change 💔 concerning a change which breaks the API label Apr 29, 2026
Labels

breaking change 💔 concerning a change which breaks the API

Development

Successfully merging this pull request may close these issues.

Simpler bounding box creation

5 participants