003 feature nn sampler #5

Open
b-barton wants to merge 16 commits into main from 003-feature-nn-sampler

b-barton (Collaborator) commented May 6, 2026

Fixes issue #3. I've branched off 1-feature-initial-oceanosse-workflow-skeleton, so this should be merged after that branch. This branch brings in a sampler that simply finds the nearest neighbour on the model grid to each profile location, along with accompanying tests. At the moment it expects the model dataset to have lat and lon variables and i and j coordinates, and the profile dataset to have profile_id as a coordinate.
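For reviewers, a minimal sketch of what a brute-force nearest-neighbour sampler could look like under the data layout described above. It is not the code in this branch: the function name nearest_neighbour_sample, the dimension names (j, i), and the profile variables lat/lon along profile_id are assumptions. It computes the haversine distance from every profile to every grid point by NumPy broadcasting, so memory grows as profiles x grid points.

```python
import numpy as np
import xarray as xr


def nearest_neighbour_sample(model: xr.Dataset, profiles: xr.Dataset) -> xr.Dataset:
    """Select the model grid point nearest to each profile (hypothetical sketch).

    Assumes ``model`` has 2-D ``lat``/``lon`` (degrees) on dims ``(j, i)`` and
    ``profiles`` has 1-D ``lat``/``lon`` (degrees) along ``profile_id``.
    """
    # Grid coordinates in radians, shaped (1, j, i) for broadcasting.
    mlat = np.radians(model["lat"].transpose("j", "i").values)[np.newaxis]
    mlon = np.radians(model["lon"].transpose("j", "i").values)[np.newaxis]
    # Profile coordinates in radians, shaped (profile, 1, 1).
    plat = np.radians(profiles["lat"].values)[:, np.newaxis, np.newaxis]
    plon = np.radians(profiles["lon"].values)[:, np.newaxis, np.newaxis]

    # Haversine (great-circle) distance on the unit sphere, shape (profile, j, i).
    a = (np.sin((mlat - plat) / 2) ** 2
         + np.cos(plat) * np.cos(mlat) * np.sin((mlon - plon) / 2) ** 2)
    dist = 2 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

    # Closest grid point per profile, converted back to (j, i) indices.
    flat = dist.reshape(dist.shape[0], -1).argmin(axis=1)
    jj, ii = np.unravel_index(flat, dist.shape[1:])

    # Pointwise (vectorised) indexing: one (j, i) pair per profile_id.
    pid = {"profile_id": profiles["profile_id"].values}
    return model.isel(
        j=xr.DataArray(jj, dims="profile_id", coords=pid),
        i=xr.DataArray(ii, dims="profile_id", coords=pid),
    )


# Toy data: a 3 x 4 grid and two profiles.
lat2d, lon2d = np.meshgrid([50.0, 51.0, 52.0], [-10.0, -9.0, -8.0, -7.0], indexing="ij")
model = xr.Dataset(
    {"temp": (("j", "i"), np.arange(12.0).reshape(3, 4))},
    coords={"lat": (("j", "i"), lat2d), "lon": (("j", "i"), lon2d)},
)
profiles = xr.Dataset(
    {"lat": ("profile_id", [51.1, 49.0]), "lon": ("profile_id", [-8.9, -10.2])},
    coords={"profile_id": [101, 102]},
)
sampled = nearest_neighbour_sample(model, profiles)
print(sampled["temp"].values)  # [5. 0.]
```

Passing DataArray indexers that share the profile_id dimension makes isel pick one grid point per profile rather than an outer product of j and i.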

b-barton requested a review from oj-tooth May 6, 2026 10:32
b-barton self-assigned this May 6, 2026
b-barton linked an issue May 6, 2026 that may be closed by this pull request
b-barton (Collaborator, Author) commented May 6, 2026

A profile that is tied for the nearest neighbour (i.e. equidistant from two or more grid points) gives inconsistent test results.
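One possible cause (an assumption, not yet diagnosed here) is that a point which is equidistant in exact arithmetic is not exactly equidistant in floating point, so which neighbour wins depends on rounding or on the search order of the underlying tree. A minimal sketch of a deterministic tie-break, using a hypothetical helper nearest_index_with_tiebreak: distances within a relative tolerance of the minimum are treated as ties, and the tie is resolved by the first (j, i) in row-major order.

```python
import numpy as np


def nearest_index_with_tiebreak(dist: np.ndarray, rtol: float = 1e-9) -> tuple[int, int]:
    """Index (j, i) of the nearest grid point, with deterministic tie-breaking.

    Hypothetical helper: every point within ``rtol`` (relative) of the minimum
    distance counts as tied, and the first tied point in row-major (j, then i)
    order wins, so rounding noise cannot flip the result.
    """
    dmin = dist.min()
    tied = np.isclose(dist, dmin, rtol=rtol, atol=0.0)
    # argmax on a boolean array returns the position of the first True.
    j, i = np.unravel_index(int(np.argmax(tied.ravel())), dist.shape)
    return int(j), int(i)


# Two near-equal distances whose strict minimum lands in different cells.
d1 = np.array([[2.0, 1.0], [1.0 + 1e-12, 3.0]])
d2 = np.array([[2.0, 1.0 + 1e-12], [1.0, 3.0]])
print(nearest_index_with_tiebreak(d1), nearest_index_with_tiebreak(d2))  # (0, 1) (0, 1)
```

A plain argmin would return (0, 1) for d1 but (1, 0) for d2; with the tie-break both map to the same cell, so a test on the tied profile can assert a fixed index. The tolerance is an assumption to tune against the grid spacing.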

oj-tooth (Contributor)

Great work getting started on the sampler here @b-barton.

For the nearest neighbour sampling, there is now an xarray-native way to handle geographical indexing on a curvilinear grid. I've copied the relevant method from NEMODataTree.add_geoindex() below; it is written for NEMO grids, but the same approach applies to any ocean model grid where latitude and longitude are known.

Here, NDPointIndex lets us define a custom index for nearest-neighbour .sel() selection over multi-dimensional (e.g. 2-D latitude/longitude) coordinates, and SklearnGeoBallTreeAdapter backs that index with sklearn's BallTree using the haversine distance metric.
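For reference, the haversine metric is the great-circle distance on a sphere. A small NumPy version of it, assuming inputs in degrees (scikit-learn's "haversine" metric itself expects [latitude, longitude] in radians and returns angular distances in radians), shows why it matters compared with treating lat/lon as planar coordinates: it handles the longitude wrap at ±180° and the convergence of meridians near the poles.

```python
import numpy as np


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle angular distance (radians, unit sphere) between points given in degrees."""
    phi1, lam1, phi2, lam2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((phi2 - phi1) / 2) ** 2
         + np.cos(phi1) * np.cos(phi2) * np.sin((lam2 - lam1) / 2) ** 2)
    # Clip guards against rounding pushing a slightly outside [0, 1].
    return 2 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


# Across the dateline: 2 degrees apart, although the longitudes differ by 358.
print(np.degrees(haversine(0.0, 179.0, 0.0, -179.0)))
# Across the pole at 80N: 20 degrees apart, although the longitudes differ by 180.
print(np.degrees(haversine(80.0, 0.0, 80.0, 180.0)))
# Multiply the result by an Earth radius (e.g. ~6371 km) to get a length.
```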

```python
from typing import Self  # Python 3.11+

import xarray as xr
from xarray.indexes import NDPointIndex
from xoak import SklearnGeoBallTreeAdapter


# Excerpt: this is a method of NEMODataTree and relies on its private
# _get_properties() helper, so it is not runnable outside that class.
def add_geoindex(
    self,
    grid: str,
) -> Self:
    """
    Add geographical index variables to a given NEMO model grid.

    This enables users to index grid variables using geographical
    coordinates (e.g., glamt, gphit) in addition to their (i, j, k)
    dimensions.

    Parameters
    ----------
    grid : str
        Path to NEMO model grid to add geographical indexes (e.g., 'gridT').

    Returns
    -------
    NEMODataTree
        NEMO DataTree with geographical indexes added to specified model grid.

    Examples
    --------
    Add glamt, gphit as geographical indexes to the T-grid of the NEMO parent domain:

    >>> nemo.add_geoindex(grid="gridT")

    """
    # -- Set geographical indexes -- #
    _, dom_prefix, _, grid_suffix = self._get_properties(grid=grid, infer_dom=True)
    lon_name = f"{dom_prefix}glam{grid_suffix}"
    lat_name = f"{dom_prefix}gphi{grid_suffix}"
    # Work on a copy so the original tree is left unchanged.
    self_copy = self.copy()
    self_copy[grid] = (
        self_copy[grid]
        .dataset.assign_coords(
            # Ensure lat/lon are coordinates before indexing them.
            {lat_name: self_copy[grid][lat_name], lon_name: self_copy[grid][lon_name]}
        )
        .set_xindex(
            # NDPointIndex over the (lat, lon) pair, backed by a haversine BallTree.
            (lat_name, lon_name),
            NDPointIndex,
            tree_adapter_cls=SklearnGeoBallTreeAdapter,
        )
    )

    return self_copy
```

In the above context, usage would be:

```python
nemo_geo = nemo.add_geoindex(grid="gridT")

# Nearest T-grid point to 60°N, 30°W.
nemo_geo["gridT"].dataset.sel(gphit=60, glamt=-30, method="nearest")
```

The important point is that we could then simply call .sel() directly with the user-specified latitude and longitude.

Successfully merging this pull request may close this issue: [feature] Create simple nearest neighbour sampler
