Currently Yirgacheffe works with a single band within any raster file. It defaults to picking the first band in a GeoTIFF, but you can select other bands if you wish by providing the band=n argument when you load a raster.
What you can't do is process all the bands within the raster. This would be useful if you wanted to say crop a multi band dataset to a GeoJSON (i.e., working with data like the 128-band Tessera tiles).
I can imagine we want a way to mark a raster as being multi-band at load time, meaning that you apply to it all the operations in parallel. For example, something perhaps like:
import yirgacheffe as yg
with yg.read_raster("my_multiband_tile.tif", bands=yg.ALL) as raster_layer:
with yg.read_shape("my_clipping_data.geojson") as clipping:
clipped = raster_layer * clipping
clipped.to_geotiff("result.tif")
The result here would be a multiband raster with all the bands clipped.
The challenge then comes when a multiband layer is applied to other things, what does it even mean if someone tries to write:
import yirgacheffe as yg
with yg.read_raster("my_multiband_tile.tif", bands=yg.ALL) as raster_layer:
with yg.read_shape("my_clipping_data.geojson") as clipping:
wtf = clipping / raster_layer
wtf.to_geotiff("result.tif")
How can a multiband layer sit on the RHS of an expression like that? Do we just error? Do we pick the first layer? Do we expand clipping internally to become multiband? That last one is very tempting until someone tries to do:
import yirgacheffe as yg
with yg.read_raster("my_multiband_tile.tif", bands=yg.ALL) as raster_layer:
wtf = raster_layer * raster_layer
wtf.to_geotiff("result.tif")
Are we going to generate a result that has the number of bands squared? I feel that is a dangerous thing to make easy to do with the amount of data in a geospatial pipeline.
One thing I do like about this though is that in theory it's another way to start to express parallelism without having to have the ecologist worry about parallelism.
For example, in the classic AOH example we take a set of species ranges as polygon data, and multiply them by habitat and elevation maps. Could we start to use this to do something like:
import yirgacheffe as yg
with yg.read_raster("habitat.tif") as habitat_map:
with yg.read_raster("elevation.tif") as elevation_map:
with yg.read_shape("species.gpkg", bands=yg.ALL) as species:
aohs = habitat_map * elevation_map * species
aohs.to_geotiff("aohs.tif")
That doesn't really work, as the habitat and elevation layers in AOH maps need to be adjusted per species, but I feel there's something in this idea that would let yirgacheffe unlock parallelism without the ecologist needing to think in computer terms.
Currently Yirgacheffe works with a single band within any raster file. It defaults to picking the first band in a GeoTIFF, but you can select other bands if you wish by providing the
band=nargument when you load a raster.What you can't do is process all the bands within the raster. This would be useful if you wanted to say crop a multi band dataset to a GeoJSON (i.e., working with data like the 128-band Tessera tiles).
I can imagine we want a way to mark a raster as being multi-band at load time, meaning that you apply to it all the operations in parallel. For example, something perhaps like:
The result here would be a multiband raster with all the bands clipped.
The challenge then comes when a multiband layer is applied to other things, what does it even mean if someone tries to write:
How can a multiband layer sit on the RHS of an expression like that? Do we just error? Do we pick the first layer? Do we expand clipping internally to become multiband? That last one is very tempting until someone tries to do:
Are we going to generate a result that has the number of bands squared? I feel that is a dangerous thing to make easy to do with the amount of data in a geospatial pipeline.
One thing I do like about this though is that in theory it's another way to start to express parallelism without having to have the ecologist worry about parallelism.
For example, in the classic AOH example we take a set of species ranges as polygon data, and multiply them by habitat and elevation maps. Could we start to use this to do something like:
That doesn't really work, as the habitat and elevation layers in AOH maps need to be adjusted per species, but I feel there's something in this idea that would let yirgacheffe unlock parallelism without the ecologist needing to think in computer terms.