Skip to content

Support multiband GeoTIFFs such that all bands are processed #63

@mdales

Description

@mdales

Currently Yirgacheffe works with a single band within any raster file. It defaults to picking the first band in a GeoTIFF, but you can select other bands if you wish by providing the band=n argument when you load a raster.

What you can't do is process all the bands within the raster. This would be useful if you wanted to say crop a multi band dataset to a GeoJSON (i.e., working with data like the 128-band Tessera tiles).

I can imagine we want a way to mark a raster as being multi-band at load time, meaning that you apply to it all the operations in parallel. For example, something perhaps like:

import yirgacheffe as yg

with yg.read_raster("my_multiband_tile.tif", bands=yg.ALL) as raster_layer:
  with yg.read_shape("my_clipping_data.geojson") as clipping:
    clipped = raster_layer * clipping
    clipped.to_geotiff("result.tif")

The result here would be a multiband raster with all the bands clipped.

The challenge then comes when a multiband layer is applied to other things, what does it even mean if someone tries to write:

import yirgacheffe as yg

with yg.read_raster("my_multiband_tile.tif", bands=yg.ALL) as raster_layer:
  with yg.read_shape("my_clipping_data.geojson") as clipping:
    wtf = clipping / raster_layer
    wtf.to_geotiff("result.tif")

How can a multiband layer sit on the RHS of an expression like that? Do we just error? Do we pick the first layer? Do we expand clipping internally to become multiband? That last one is very tempting until someone tries to do:

import yirgacheffe as yg

with yg.read_raster("my_multiband_tile.tif", bands=yg.ALL) as raster_layer:
  wtf = raster_layer * raster_layer
  wtf.to_geotiff("result.tif")

Are we going to generate a result that has the number of bands squared? I feel that is a dangerous thing to make easy to do with the amount of data in a geospatial pipeline.


One thing I do like about this though is that in theory it's another way to start to express parallelism without having to have the ecologist worry about parallelism.

For example, in the classic AOH example we take a set of species ranges as polygon data, and multiply them by habitat and elevation maps. Could we start to use this to do something like:

import yirgacheffe as yg

with yg.read_raster("habitat.tif") as habitat_map:
  with yg.read_raster("elevation.tif") as elevation_map:
    with yg.read_shape("species.gpkg", bands=yg.ALL) as species:
      aohs = habitat_map * elevation_map * species
      aohs.to_geotiff("aohs.tif") 

That doesn't really work, as the habitat and elevation layers in AOH maps need to be adjusted per species, but I feel there's something in this idea that would let yirgacheffe unlock parallelism without the ecologist needing to think in computer terms.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions