Summary
Add a MetadataSet — a typed bag of Metadata sharing a dataset, dates, region, and dir, differing only in variable name. Collapses the "many Metadata constructions with identical kwargs" pattern that recurs across examples and internal constructors, and enables a uniform set!(model, mset) that auto-routes verbose dataset names to short model field names via a global alias map.
Co-introduces a generic download verb to replace download_dataset, so the MetadataSet case can dispatch onto existing batched download backends.
Discussion origin: #233 (comment)
Motivation
The friction is concrete. From PR #233's examples/era5_breeze.jl:
meta_common = (region = era5_region, dir = era5_datadir)
set!(u, Metadatum(:eastward_velocity; dataset=ds_pl, date=start_date, meta_common...))
set!(v, Metadatum(:northward_velocity; dataset=ds_pl, date=start_date, meta_common...))
set!(T, Metadatum(:temperature; dataset=ds_pl, date=start_date, meta_common...))
set!(qᵛ, Metadatum(:specific_humidity; dataset=ds_pl, date=start_date, meta_common...))
set!(qᶜ, Metadatum(:specific_cloud_liquid_water_content; dataset=ds_pl, date=start_date, meta_common...))
set!(qⁱ, Metadatum(:specific_cloud_ice_water_content; dataset=ds_pl, date=start_date, meta_common...))
becomes
mset = MetadataSet(:eastward_velocity, :northward_velocity, :temperature,
                   :specific_humidity, :specific_cloud_liquid_water_content,
                   :specific_cloud_ice_water_content;
                   dataset=ds_pl, date=start_date, region=era5_region, dir=era5_datadir)
set!(atmos.model, mset)
The download path earlier in the same file already speaks this language — download_dataset(pl_vars, ds_pl, dates; meta_common...) batches a vector of variable names into one CDS request (ext/NumericalEarthCDSAPIExt.jl:280-336). MetadataSet makes the rest of the workflow symmetric.
Adoption survey
Same multi-variable-shared-kwargs pattern recurs in:
src/DataWrangling/ECCO/ECCO_atmosphere.jl:37-42 — 6 Metadata, varying only name (ECCOPrescribedAtmosphere)
src/DataWrangling/JRA55/JRA55_prescribed_atmosphere.jl:36-42 — 7 JRA55FieldTimeSeries calls (JRA55PrescribedAtmosphere)
examples/ERA5_hourly_data.jl:286-289, :344-347 — 2 + 4 Metadata constructions
examples/one_degree_simulation.jl:86-93, global_climate_simulation.jl:64-72, arctic_simulation.jl:55-86, meridional_heat_transport_ecco.jl:36-43 — coupled ocean+ice splits, 4 Metadatum each
examples/inspect_woa_temperature_salinity.jl:20-24, single_column_os_papa_simulation.jl:58-59, near_global_ocean_simulation.jl:86-87, mediterranean_simulation_with_ecco_restoring.jl:95-96, generate_surface_fluxes.jl:56-62 — ocean T/S pairs
test/runtests.jl:108-122, test_ocean_sea_ice_model.jl:41-50 — test setup loops
Totals: 15 example/test bundle sites, 2 flagship internal PrescribedAtmosphere constructors, ~30 download_dataset call sites touched by the rename, 3 docs files to update (docs/src/Metadata/metadata_overview.md, docs/src/Metadata/supported_variables.md, docs/src/index.md).
Type
struct MetadataSet{V, D, R, N, F}
    names     :: N      # NTuple{K, Symbol}: verbose dataset variable names
    dataset   :: V      # shared
    dates     :: D      # shared; scalar or AbstractVector
    region    :: R      # shared
    dir       :: String # shared
    filenames :: F      # per-variable, auto-built; overridable
end
const MetadatumSet{V} = MetadataSet{V, <:Union{AnyDateTime, Nothing}} where V
Mirrors Metadata field-for-field, with name → names and filename → filenames.
Constructor
MetadataSet(:var1, :var2, ...; dataset, date [or dates], region, dir)
Positional varargs of verbose dataset variable names. Keyword arguments match Metadata/Metadatum: date for a scalar (yields a MetadatumSet), dates for a vector. region, dir, filenames optional with the same defaults as Metadata.
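To make the constructor contract concrete, here is a minimal self-contained sketch. It does not use the real types: `ToyMetadataSet` and `toy_filename` are hypothetical stand-ins, and the filename convention and the `dir = pwd()` default are assumptions made only so the example runs on its own.

```julia
# Hypothetical stand-in for MetadataSet; field layout follows the proposal.
struct ToyMetadataSet{V, D, R, N, F}
    names     :: N
    dataset   :: V
    dates     :: D
    region    :: R
    dir       :: String
    filenames :: F
end

# Assumed filename convention, for illustration only.
toy_filename(name, dataset) = string(dataset, "_", name, ".nc")

# Varargs of variable names, shared keyword arguments.
# `date` is the scalar spelling of `dates`; filenames are built per variable
# unless the caller overrides them.
function ToyMetadataSet(names::Symbol...;
                        dataset,
                        date = nothing,
                        dates = date,
                        region = nothing,
                        dir = pwd(),
                        filenames = map(n -> toy_filename(n, dataset), names))
    length(names) == length(filenames) ||
        throw(ArgumentError("need one filename per variable"))
    return ToyMetadataSet(names, dataset, dates, region, String(dir), filenames)
end

mset = ToyMetadataSet(:temperature, :salinity; dataset = "ToyECCO", date = 1993)
```

Storing `names` and `filenames` as tuples of equal length keeps the variable axis aligned by position, which is what the per-variable lookups rely on.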
Access
The variable axis is exposed via property and indexed access:
mset.eastward_velocity # → Metadata/Metadatum for this variable
mset[:eastward_velocity] # equivalent indexed form
mset[1] # indexed by position in `names`
mset.dataset # struct field still accessible (getproperty fallthrough)
mset.region
keys(mset) # → (:eastward_velocity, :northward_velocity, ...)
length(mset) # number of variables
for m in mset ... end # iterates variable axis, yielding Metadata per variable
NamedTuple(mset) # (; eastward_velocity = mset.eastward_velocity, ...)
metadata_path(mset) # NamedTuple of paths keyed by variable name
Property access via getproperty(mset, name): dispatches to struct field if name ∈ fieldnames(MetadataSet), otherwise looks up the variable. Variables named after struct fields (e.g. a hypothetical :dataset variable) remain accessible via mset[:dataset].
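The fallthrough rule can be sketched on a toy type. `ToySet`, `ToyMetadata` and `names_of` are hypothetical names used only here. The one detail worth noting is that the accessors read real fields with `getfield`, so they never re-enter the overloaded `getproperty`.

```julia
struct ToyMetadata{V, D}
    name    :: Symbol
    dataset :: V
    dates   :: D
end

struct ToySet{V, D, N}
    names   :: N
    dataset :: V
    dates   :: D
end

# Read the real field directly, bypassing the overloaded getproperty.
names_of(s::ToySet) = getfield(s, :names)

# Lookup by variable name builds the per-variable metadata on demand.
function Base.getindex(s::ToySet, name::Symbol)
    name in names_of(s) || throw(KeyError(name))
    return ToyMetadata(name, getfield(s, :dataset), getfield(s, :dates))
end

# Lookup by position in `names`.
Base.getindex(s::ToySet, i::Int) = s[names_of(s)[i]]

# Struct fields win; any other name is looked up as a variable.
function Base.getproperty(s::ToySet, name::Symbol)
    return name in fieldnames(ToySet) ? getfield(s, name) : s[name]
end

Base.keys(s::ToySet)   = names_of(s)
Base.length(s::ToySet) = length(names_of(s))
Base.iterate(s::ToySet, i = 1) = i > length(s) ? nothing : (s[i], i + 1)

# A variable deliberately named like a struct field, to show the collision case.
s = ToySet((:temperature, :dataset), "ToyWOA", 1993)
```

Here `s.dataset` returns the shared dataset, while `s[:dataset]` returns the metadata for the variable that happens to share the field's name.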
Field / FieldTimeSeries
Field(mset, arch=CPU(); kw...) # → NamedTuple{names}(Field, ...)
FieldTimeSeries(mset, arch=CPU(); kw...) # → NamedTuple{names}(FTS, ...)
NamedTuple is keyed by verbose dataset names — fts.eastward_velocity, fts.specific_humidity, etc. PrescribedAtmosphere constructors do the short rename in one explicit block at their existing call site (their public API stays unchanged).
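As a rough illustration of the return shape and of the explicit rename block, the sketch below uses a hypothetical `toy_load` in place of the real `Field` / `FieldTimeSeries` constructors. The short names `u` and `q` are illustrative, not taken from the constructors themselves.

```julia
# Hypothetical loader standing in for Field(metadata, arch) or
# FieldTimeSeries(metadata, arch).
toy_load(name::Symbol) = "data for $name"

varnames = (:eastward_velocity, :specific_humidity)

# One entry per variable, keyed by the verbose dataset name.
fts = NamedTuple{varnames}(map(toy_load, varnames))

# Explicit short-name rename, as a PrescribedAtmosphere-style constructor
# might write it at its call site.
velocities = (; u = fts.eastward_velocity)
tracers    = (; q = fts.specific_humidity)
```

Keeping the rename in one visible block means the verbose-to-short mapping used by a given constructor can be read off in one place.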
set!(model, mset): auto-routing via the global alias map
A single method, leaning on the existing set!(model; kw...) kwarg interface:
function Oceananigans.set!(model, mset::MetadataSet)
    kwargs = (variable_aliases[n] => mset[n]
              for n in mset.names if haskey(variable_aliases, n))
    set!(model; kwargs...)
end
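Variables not in variable_aliases silently fall through, which mirrors current kwarg-set! behavior and enables the ocean+ice split idiom on a single 4-variable set. All four names below have aliases, so the split also relies on aliases that name no field of the target model being skipped, either by set!(model; kw...) itself or by an extra filter in the method above:
mset = MetadataSet(:temperature, :salinity,
                   :sea_ice_thickness, :sea_ice_concentration;
                   date, dataset)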
set!(ocean.model, mset) # picks up :temperature, :salinity
set!(sea_ice.model, mset) # picks up :sea_ice_thickness, :sea_ice_concentration
Also:
set!(fields::NamedTuple, mset::MetadataSet) # element-wise; NT keys = verbose names
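The routing and the ocean+ice split can be sketched on toy types. `ToyModel`, `toy_set!` and `toy_aliases` are hypothetical. The kwarg method here is deliberately strict about unknown names, and the extra check that the target model owns the short name is an assumption of this sketch, not part of the method shown above.

```julia
const toy_aliases = Dict(:temperature => :T, :salinity => :S,
                         :sea_ice_thickness => :h, :sea_ice_concentration => :ℵ)

# Hypothetical model: a bag of named fields.
struct ToyModel
    fields :: Dict{Symbol, Any}
end

# Stand-in for set!(model; kw...): throws on names the model does not own.
function toy_set!(model::ToyModel; kwargs...)
    for (name, value) in kwargs
        haskey(model.fields, name) || throw(ArgumentError("no field $name"))
        model.fields[name] = value
    end
    return nothing
end

# Route verbose names to short names, keeping only aliases the model owns
# (the ownership filter is this sketch's assumption).
function toy_set!(model::ToyModel, data::NamedTuple)
    routed = (toy_aliases[n] => data[n] for n in keys(data)
              if haskey(toy_aliases, n) && haskey(model.fields, toy_aliases[n]))
    return toy_set!(model; routed...)
end

ocean   = ToyModel(Dict{Symbol, Any}(:T => 0, :S => 0))
sea_ice = ToyModel(Dict{Symbol, Any}(:h => 0, :ℵ => 0))

# One bundle for both components; :vorticity has no alias and is skipped.
data = (temperature = 10, salinity = 35, sea_ice_thickness = 2,
        sea_ice_concentration = 0.9, vorticity = 1)

toy_set!(ocean, data)
toy_set!(sea_ice, data)
```

With a strict kwarg method, dropping the ownership check would make `toy_set!(ocean, data)` throw on `:h`, which is why the split needs one of the two mechanisms to skip foreign aliases.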
Global alias map (src/DataWrangling/DataWrangling.jl)
Top-level variable_aliases :: Dict{Symbol,Symbol}, every value traceable to a row in docs/src/appendix/notation.md (or restoring.jl:33-47 for biogeochemistry). Synonyms (e.g. :u_velocity / :eastward_velocity / :eastward_wind all → :u) are retained — they serve as domain disambiguators across dataset modules.
const variable_aliases = Dict{Symbol, Symbol}(
# Ocean & atmosphere state (notation.md existing rows)
:temperature => :T,
:air_temperature => :T,
:salinity => :S,
:u_velocity => :u,
:v_velocity => :v,
:eastward_velocity => :u,
:northward_velocity => :v,
:eastward_wind => :u,
:northward_wind => :v,
:sea_level_pressure => :p,
# Atmosphere moisture / microphysics (Breeze notation.md rows)
:specific_humidity => :qᵛ,
:air_specific_humidity => :qᵛ,
:specific_cloud_liquid_water_content => :qᶜˡ,
:specific_cloud_ice_water_content => :qᶜⁱ,
:specific_rain_water_content => :qʳ,
# Sea ice (notation.md `ℵ` row; `:h` matches ClimaSeaIce field name)
:sea_ice_thickness => :h,
:sea_ice_concentration => :ℵ,
# Freshwater fluxes (NEW notation.md rows)
:rain_freshwater_flux => :Jʳᵃ,
:snow_freshwater_flux => :Jˢⁿ,
# Biogeochemistry (already in restoring.jl:33-47)
:dissolved_inorganic_carbon => :DIC,
:alkalinity => :ALK,
:nitrate => :NO₃,
:phosphate => :PO₄,
:dissolved_organic_phosphorus => :DOP,
:particulate_organic_phosphorus => :POP,
:dissolved_iron => :Fe,
:dissolved_silicate => :SiO₂,
:dissolved_oxygen => :O₂,
)
28 entries. Variables with no entry (e.g. :vorticity, :geopotential, :significant_wave_height, :mesh_mask) are still fully fetchable via download(mset) and accessible via mset.<name>; they simply don't take part in the auto-set! path until a real adoption site needs them.
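Because several verbose names map to one short name, it can help to look at the map from the other direction. The sketch below works on a small excerpt of the dictionary (`toy_aliases`). The helpers `synonyms` and `short_or_nothing` are hypothetical and not part of the proposal.

```julia
# Excerpt of the alias map, enough to show synonym groups.
const toy_aliases = Dict{Symbol, Symbol}(
    :u_velocity        => :u,
    :eastward_velocity => :u,
    :eastward_wind     => :u,
    :temperature       => :T,
    :air_temperature   => :T,
    :salinity          => :S,
)

# Invert the map: short name => sorted list of verbose synonyms.
function synonyms(aliases)
    groups = Dict{Symbol, Vector{Symbol}}()
    for (verbose, short) in aliases
        push!(get!(groups, short, Symbol[]), verbose)
    end
    foreach(sort!, values(groups))
    return groups
end

# Non-throwing lookup: `nothing` marks a variable outside the auto-set! path.
short_or_nothing(name) = get(toy_aliases, name, nothing)

groups = synonyms(toy_aliases)
```

A check of this kind could also back a unit test asserting that every short name in the map corresponds to a documented notation row.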
Notation.md additions
Two new rows in a new "Net surface freshwater fluxes" subsection between "Net ocean fluxes" and "Thermodynamic properties":
| ``J^{\mathrm{ra}}`` | `Jʳᵃ` | rain freshwater flux | Rain mass flux at the surface (kg m⁻² s⁻¹) |
| ``J^{\mathrm{sn}}`` | `Jˢⁿ` | snow freshwater flux | Snow mass flux at the surface (kg m⁻² s⁻¹) |
No other notation additions in this PR — speculative rows for vorticity, geopotential, waves, PV, trace gases, and net radiation aliases are deferred until an adoption site needs them.
Generic download (supersedes download_dataset)
Bundled into Stage 1. The argument is metadata, not a dataset, so the verb-on-object form reads better — and a single generic gives MetadataSet a hook for backend-specific aggregation:
download(::Metadatum) # current per-file behavior
download(::Metadata) # current Metadata behavior (date axis)
download(::MetadataSet) # default: per-element loop
download(::MetadataSet{<:ERA5PressureLevelsDataset}) # batched CDS path
download(::AbstractVector{<:Metadata}) # generic many-metadata
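The dispatch layering can be sketched without touching the network. All names below are hypothetical: `toy_download` stands in for the proposed generic, `ToyPressureLevels` for `ERA5PressureLevelsDataset`, and `requests` is a log that records what each call would have requested.

```julia
abstract type ToyDataset end
struct ToyGeneric        <: ToyDataset end
struct ToyPressureLevels <: ToyDataset end   # stand-in for ERA5PressureLevelsDataset

struct ToyMetadatum{V}
    name    :: Symbol
    dataset :: V
end

struct ToySet{V}
    names   :: Vector{Symbol}
    dataset :: V
end

elements(s::ToySet) = [ToyMetadatum(n, s.dataset) for n in s.names]

# Log of requests; each entry lists the variables bundled into one request.
const requests = Vector{Vector{Symbol}}()

# Per-file behavior: one request per metadatum.
toy_download(m::ToyMetadatum) = (push!(requests, [m.name]); nothing)

# Default for a set: loop over the elements.
toy_download(s::ToySet) = foreach(toy_download, elements(s))

# Backend-specific method: one batched request for the whole set.
toy_download(s::ToySet{<:ToyPressureLevels}) = (push!(requests, copy(s.names)); nothing)

toy_download(ToySet([:temperature, :salinity], ToyGeneric()))
toy_download(ToySet([:temperature, :specific_humidity], ToyPressureLevels()))
```

The generic set produces two single-variable requests and the pressure-level set produces one bundled request, which is the aggregation hook the generic is meant to provide.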
Migration
- New download generic introduced in src/DataWrangling/DataWrangling.jl.
- All per-backend methods renamed download_dataset → download and import lines updated. Files touched:
  - src/DataWrangling/DataWrangling.jl:276 (fallback)
  - src/DataWrangling/ECCO/ECCO.jl:308
  - src/DataWrangling/JRA55/JRA55_metadata.jl:192
  - src/DataWrangling/EN4/EN4.jl:207
  - src/DataWrangling/IBCAO/IBCAO.jl:80, ETOPO/ETOPO.jl:48, IBCSO/IBCSO.jl:75, GEBCO/GEBCO.jl:69, ORCA/ORCA.jl:110
  - src/DataWrangling/OSPapa/OSPapa_flux_observations.jl:97, OSPapa_ocean_observations.jl:69
  - ext/NumericalEarthCDSAPIExt.jl:155, 174, 280, 294, 344, 362, 382
  - ext/NumericalEarthCopernicusMarineExt.jl:16, 24
  - ext/NumericalEarthWOAExt.jl:38
- download_dataset kept as Base.@deprecate alias for one minor release.
- download(::MetadataSet{<:ERA5PressureLevelsDataset}) routes onto the existing batched path at ext/NumericalEarthCDSAPIExt.jl:280-336, preserving the per-day multi-variable CDS bundling.
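For reference, the effect of the deprecation alias can be sketched by hand. This is not the `Base.@deprecate` form named above but a hand-written equivalent using `Base.depwarn`, chosen because it forwards positional and keyword arguments unchanged. `toy_download` and `toy_download_dataset` are hypothetical names.

```julia
# Hypothetical new verb.
toy_download(names; dir = ".") = "downloaded $(join(names, ", ")) to $dir"

# Old name kept as a thin alias: warn, then forward all arguments.
function toy_download_dataset(args...; kwargs...)
    Base.depwarn("`toy_download_dataset` is deprecated, use `toy_download` instead.",
                 :toy_download_dataset)
    return toy_download(args...; kwargs...)
end

result = toy_download_dataset([:temperature, :salinity]; dir = "/tmp")
```

Deprecation warnings are hidden by default and show up when Julia is started with `--depwarn=yes`, so existing call sites keep working quietly during the grace period.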
Stages
| Stage | Scope | Confidence |
|---|---|---|
| 1 | MetadataSet struct + indexing/iteration/getproperty; MetadatumSet alias; variable_aliases dict; set!(model, ::MetadataSet); set!(::NamedTuple, ::MetadataSet); Field(::MetadataSet), FieldTimeSeries(::MetadataSet); download rename + @deprecate; ERA5 batched specialization; notation.md additions; unit tests | high |
| 2 | Refactor ECCOPrescribedAtmosphere and JRA55PrescribedAtmosphere to construct one MetadataSet internally and consume fts.<verbose_name> for the short-name rename block. External API preserved. | high |
| 3 | Adopt MetadataSet across the 15 surveyed example/test bundle sites. Update docs/src/Metadata/metadata_overview.md (new section), supported_variables.md (bundle reference), docs/src/index.md (Quick Start snippet). | high |
Deferred (explicitly out of scope)
- Variable-name unification across modules. Synonyms like :u_velocity / :eastward_velocity / :eastward_wind stay; they serve as domain disambiguators (notably for ECCO4Monthly, which exposes both ocean and atmosphere fields under one dataset struct). Revisit only if/when those datasets are split.
- Speculative variable_aliases entries for :vorticity, :geopotential, :geopotential_height, :potential_vorticity, :ozone_mass_mixing_ratio, :total_cloud_cover, :fraction_of_cloud_cover, :significant_wave_height, :mean_wave_period, :mean_wave_direction, :eastward_stokes_drift, :northward_stokes_drift, :free_surface, :depth, :bottom_height, :mesh_mask, :river_freshwater_flux, :iceberg_freshwater_flux, :evaporation_minus_precipitation, :net_* radiation aliases, :net_heat_flux, :sea_ice_u_velocity, :sea_ice_v_velocity. These remain fetchable via download(mset) and mset.<name>; they simply don't auto-rename in set!(model, mset) until an adoption site requires it.
- :sea_ice_area_fraction as a key: no dataset module uses this name; CF standard name only.
Tasks
- MetadataSet core + tests
- variable_aliases dict at top of DataWrangling.jl
- set!(model, ::MetadataSet) + set!(::NamedTuple, ::MetadataSet) + tests
- Field(::MetadataSet), FieldTimeSeries(::MetadataSet) returning NamedTuples
- Generic download; rename backends; deprecate download_dataset
- download(::MetadataSet) default + ERA5 specialization
- Refactor ECCOPrescribedAtmosphere
- Refactor JRA55PrescribedAtmosphere
- Docs (metadata_overview.md new section, supported_variables.md bundle reference, docs/src/index.md Quick Start)