-
Notifications
You must be signed in to change notification settings - Fork 5
Add write_to_netcdf support for OutputVar #370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -25,6 +25,7 @@ import ..Utils: | |
|
|
||
| export OutputVar, | ||
| read_var, | ||
| write_to_netcdf, | ||
| average_lat, | ||
| weighted_average_lat, | ||
| average_lon, | ||
|
|
@@ -476,6 +477,89 @@ function read_var(paths::Vector{String}; short_name = nothing) | |
| end | ||
| end | ||
|
|
||
| """ | ||
| write_to_netcdf(path::AbstractString, var::OutputVar) | ||
|
|
||
| Write the `OutputVar` to a NetCDF file at `path`, overwriting any existing | ||
| file. The variable name is taken from `short_name(var)`. If not present, an | ||
| error is thrown. The resulting NetCDF file can be read back using | ||
| [`read_var`](@ref). | ||
|
|
||
| Example | ||
| ========= | ||
|
|
||
| ```julia | ||
| var = OutputVar(attribs, dims, dim_attribs, data) | ||
| write_to_netcdf("output.nc", var) | ||
|
AmanKashyap0807 marked this conversation as resolved.
|
||
|
|
||
| # Read it back | ||
| var_read = read_var("output.nc") | ||
| ``` | ||
| """ | ||
| function write_to_netcdf(path::AbstractString, var::OutputVar) | ||
| # check short_name exists | ||
| var_name = short_name(var) | ||
| isempty(var_name) && | ||
| error("OutputVar must have a short_name to be written to NetCDF") | ||
|
Comment on lines
+501
to
+503
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For this, would it be possible to use the empty string? I haven't tested this, so I am not sure.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I looked into this, and an empty string doesn’t seem like a good choice for a NetCDF variable name. CF naming conventions suggest that variable names begin with a letter and be composed of letters, digits, and underscores (section 2.3: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/build/ch02s03.html), so in practice tools expect a non‑empty name. Since variables are accessed by name, an empty short_name could also make the file harder to work with in other tools, so I’d prefer to keep the explicit error and require a meaningful short_name for OutputVars written to NetCDF. |
||
|
|
||
| # check dimensions match data | ||
| var_dim_names = collect(keys(var.dims)) | ||
| ndims(var.data) == length(var.dims) || error( | ||
| "Number of dimensions in data ($(ndims(var.data))) does not match number of dims ($(length(var.dims)))", | ||
| ) | ||
| for (i, (dim_name, dim_val)) in enumerate(var.dims) | ||
| if ndims(dim_val) == 1 | ||
| expected_len = length(dim_val) | ||
| actual_len = size(var.data, i) | ||
| expected_len == actual_len || error( | ||
| "Dimension $dim_name has length $expected_len but data has size $actual_len along dimension $i", | ||
|
AmanKashyap0807 marked this conversation as resolved.
|
||
| ) | ||
| else | ||
| error( | ||
| "Dimension $dim_name must be 1D to be written to NetCDF (ndims=$(ndims(dim_val)))", | ||
|
AmanKashyap0807 marked this conversation as resolved.
|
||
| ) | ||
| end | ||
| end | ||
|
|
||
| NCDatasets.NCDataset(path, "c") do nc | ||
| # Define dimensions and coordinate variables | ||
| for (i, (dim_name, dim_val)) in enumerate(var.dims) | ||
| dim_var = NCDatasets.defVar(nc, dim_name, dim_val, (dim_name,)) | ||
| if haskey(var.dim_attributes, dim_name) | ||
| for (k, v) in var.dim_attributes[dim_name] | ||
| dim_var.attrib[k] = _netcdf_safe_attrib(v) | ||
| end | ||
| end | ||
| end | ||
|
|
||
| nc_var = | ||
| NCDatasets.defVar(nc, var_name, eltype(var.data), var_dim_names) | ||
|
|
||
| nc_var[:] = var.data | ||
|
|
||
| # Write attributes with type safety (overwrite existing keys) | ||
|
AmanKashyap0807 marked this conversation as resolved.
|
||
| for (k, v) in var.attributes | ||
| nc_var.attrib[k] = _netcdf_safe_attrib(v) | ||
| end | ||
| end | ||
| return nothing | ||
| end | ||
|
|
||
| """ | ||
| _netcdf_safe_attrib(v) | ||
|
|
||
| Convert attribute value `v` to a NetCDF-safe type. | ||
| NetCDF attributes must be basic scalar types or strings. | ||
| """ | ||
| function _netcdf_safe_attrib(v) | ||
| v isa AbstractString && return v | ||
| v isa Number && return v | ||
| v isa AbstractArray && eltype(v) <: Union{Number, AbstractString} && | ||
| return v | ||
|
Comment on lines
+555
to
+558
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are different concrete types of v isa AbstractString && string(v)I think you can apply a similar reasoning to the other paths of the function as well.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I’ve updated For numeric attributes, I kept them as-is since NCDatasets.jl already supports the usual Julia numeric types ( Int, Float32, Float64), and anything more exotic will fall back to the final |
||
| # For anything else (Unitful, custom types), convert to string | ||
| return string(v) | ||
| end | ||
|
|
||
| """ | ||
| short_name(var::OutputVar) | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3613,3 +3613,128 @@ end | |
| @test sprint(show, var) == | ||
| "Attributes:\n long_name => hi\nDimension attributes:\n lat:\n units => deg\nData defined over:\n lat with 0 element" | ||
| end | ||
|
|
||
| @testset "Write to NetCDF" begin | ||
|
AmanKashyap0807 marked this conversation as resolved.
|
||
| mktempdir() do tmpdir | ||
| nc_path = joinpath(tmpdir, "test_file.nc") | ||
|
|
||
| # Test 1: Round-trip write and read with small arrays | ||
| lon = collect(range(-60.0, 60.0, 8)) | ||
| lat = collect(range(-30.0, 30.0, 6)) | ||
| data = reshape(collect(1:(length(lon) * length(lat))), (length(lon), length(lat))) | ||
| dims = OrderedDict("lon" => lon, "lat" => lat) | ||
| attribs = Dict( | ||
| "short_name" => "test_var", | ||
| "long_name" => "Test Variable", | ||
| "units" => "m", | ||
| ) | ||
| dim_attribs = OrderedDict( | ||
| "lon" => Dict("units" => "deg"), | ||
| "lat" => Dict("units" => "deg"), | ||
| ) | ||
|
|
||
| var = ClimaAnalysis.OutputVar(attribs, dims, dim_attribs, data) | ||
| ClimaAnalysis.write_to_netcdf(nc_path, var) | ||
|
|
||
| var_read = ClimaAnalysis.read_var(nc_path; short_name = "test_var") | ||
|
AmanKashyap0807 marked this conversation as resolved.
|
||
|
|
||
| @test var_read.attributes["short_name"] == attribs["short_name"] | ||
| @test var_read.attributes["long_name"] == attribs["long_name"] | ||
| @test var_read.attributes["units"] == attribs["units"] | ||
| @test var_read.dims["lon"] == lon | ||
| @test var_read.dims["lat"] == lat | ||
| @test var_read.data ≈ data | ||
| @test var_read.dim_attributes["lon"]["units"] == "deg" | ||
| @test var_read.dim_attributes["lat"]["units"] == "deg" | ||
|
AmanKashyap0807 marked this conversation as resolved.
|
||
|
|
||
| # Test 2: Overwrite behavior (writing to same path replaces file) | ||
| lon2 = collect(range(0.0, 10.0, 5)) | ||
| data2 = collect(range(0.0, step = 1.0, length = length(lon2))) | ||
| dims2 = OrderedDict("lon" => lon2) | ||
| attribs2 = Dict("short_name" => "new_var", "units" => "K") | ||
| dim_attribs2 = OrderedDict("lon" => Dict("units" => "m")) | ||
| var2 = ClimaAnalysis.OutputVar(attribs2, dims2, dim_attribs2, data2) | ||
|
|
||
| ClimaAnalysis.write_to_netcdf(nc_path, var2) | ||
|
|
||
| var2_read = ClimaAnalysis.read_var(nc_path; short_name = "new_var") | ||
| @test var2_read.attributes["short_name"] == "new_var" | ||
| @test var2_read.data ≈ data2 | ||
| @test var2_read.dims["lon"] == lon2 | ||
|
|
||
| # Test 3: Error when short_name is missing | ||
| attribs_bad = Dict("long_name" => "No short name") | ||
| var_bad = ClimaAnalysis.OutputVar(attribs_bad, dims, dim_attribs, data) | ||
| @test_throws ErrorException ClimaAnalysis.write_to_netcdf( | ||
| nc_path, | ||
| var_bad, | ||
| ) | ||
|
|
||
| # Test 4: Attribute type safety (non-string/number attributes) | ||
| attribs_mixed = Dict( | ||
| "short_name" => "mixed_attrs", | ||
| "units" => "kg", | ||
| "custom_array" => [1, 2, 3], | ||
| ) | ||
| var_mixed = | ||
| ClimaAnalysis.OutputVar(attribs_mixed, dims2, dim_attribs2, data2) | ||
| ClimaAnalysis.write_to_netcdf(nc_path, var_mixed) | ||
| var_mixed_read = ClimaAnalysis.read_var( | ||
| nc_path; | ||
| short_name = "mixed_attrs", | ||
| ) | ||
| @test var_mixed_read.attributes["short_name"] == "mixed_attrs" | ||
| @test var_mixed_read.attributes["custom_array"] == [1, 2, 3] | ||
|
|
||
| # Test 5: Float32 round-trip | ||
| float_path = joinpath(tmpdir, "float_file.nc") | ||
| lonf = Float32[0, 1] | ||
| latf = Float32[10, 20] | ||
| dataf = reshape(Float32[1, 2, 3, 4], (length(lonf), length(latf))) | ||
| dimsf = OrderedDict("lon" => lonf, "lat" => latf) | ||
| attribsf = Dict("short_name" => "f32", "units" => "1") | ||
| dim_attribsf = OrderedDict("lon" => Dict("units" => "deg"), "lat" => Dict()) | ||
|
|
||
| var_f32 = ClimaAnalysis.OutputVar(attribsf, dimsf, dim_attribsf, dataf) | ||
| ClimaAnalysis.write_to_netcdf(float_path, var_f32) | ||
| var_f32_read = ClimaAnalysis.read_var(float_path; short_name = "f32") | ||
|
|
||
| @test eltype(var_f32_read.data) == Float32 | ||
| @test var_f32_read.data == dataf | ||
| @test var_f32_read.dims == dimsf | ||
|
|
||
| # Test 6: NaN round-trip | ||
| nan_path = joinpath(tmpdir, "nan_file.nc") | ||
|
AmanKashyap0807 marked this conversation as resolved.
|
||
| xs = [0.0, 1.0, 2.0] | ||
| data_nan = [1.0, NaN, 3.0] | ||
| dims_nan = OrderedDict("x" => xs) | ||
| attribs_nan = Dict("short_name" => "nan_var") | ||
| dim_attribs_nan = OrderedDict("x" => Dict()) | ||
|
|
||
| var_nan = | ||
| ClimaAnalysis.OutputVar(attribs_nan, dims_nan, dim_attribs_nan, data_nan) | ||
| ClimaAnalysis.write_to_netcdf(nan_path, var_nan) | ||
| var_nan_read = ClimaAnalysis.read_var(nan_path; short_name = "nan_var") | ||
|
|
||
| @test var_nan_read.data[1] == 1.0 | ||
| @test isnan(var_nan_read.data[2]) | ||
| @test var_nan_read.data[3] == 3.0 | ||
| @test var_nan_read.dims == dims_nan | ||
|
|
||
| # # Test 7: Scalar (0D) round-trip | ||
| # scalar_path = joinpath(tmpdir, "scalar_file.nc") | ||
|
Comment on lines
+3724
to
+3725
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this not possible in general with NetCDF files?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Scalar (0D) variables are supported in NetCDF. The CF Conventions explicitly describe “scalar coordinate variables” as variables with no dimensions (see CF Conventions section 5.7: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#scalar-coordinate-variables) NCDatasets.jl can also read and write scalar variables, so the limitation here is just that scalar NetCDF I/O hasn’t been wired up and tested in ClimaAnalysis yet, which is why this test is currently commented out. I’m happy to either enable and fix the scalar round-trip test in this PR or open a follow-up issue specifically for scalar support in write_to_netcdf / read_var, depending on what you prefer. |
||
| # data_scalar = reshape([42.0], ()) | ||
| # dims_scalar = OrderedDict{String, Vector{Float64}}() | ||
| # attribs_scalar = Dict("short_name" => "scalar") | ||
| # dim_attribs_scalar = OrderedDict{String, Dict{Any, Any}}() | ||
|
|
||
| # var_scalar = | ||
| # ClimaAnalysis.OutputVar(attribs_scalar, dims_scalar, dim_attribs_scalar, data_scalar) | ||
| # ClimaAnalysis.write_to_netcdf(scalar_path, var_scalar) | ||
| # var_scalar_read = ClimaAnalysis.read_var(scalar_path; short_name = "scalar") | ||
|
|
||
| # @test length(var_scalar_read.dims) == 0 | ||
| # @test ndims(var_scalar_read.data) == 0 | ||
| # @test var_scalar_read.data[] == 42.0 | ||
| end | ||
| end | ||
Uh oh!
There was an error while loading. Please reload this page.