Add Merge & Sparse Tiles #5

Merged
acalcutt merged 263 commits into master from merge
Mar 20, 2026
acalcutt (Collaborator) commented Mar 19, 2026

Summary

This PR adds MBTiles/raster merge functionality and sparse tile support to rio-rgbify, along with a significant refactor of the core processing pipeline, new output format options, an improved CLI command structure, a proper MBTiles database abstraction, a new image encoding module, and a full GitHub Actions CI suite.


New Features

merge Command — Merge Multiple Terrain Sources into One MBTiles

A new merge CLI command reads a JSON configuration file and blends multiple Terrain-RGB MBTiles or raster DEM sources into a single output MBTiles file.

rio merge --config merge_example.json -j 4

Key capabilities:

  • Priority-ordered blending — sources are listed highest-to-lowest priority; where a higher-priority source has valid data it wins, lower-priority sources fill the gaps
  • Per-source masking — each source can specify mask_values (e.g. [0, -1, -10000]) that are treated as no-data, so ocean-only tiles don't overwrite land tiles
  • Per-source height adjustment — apply a fixed vertical offset (meters) to each source before blending
  • MBTiles sources (TerrainRGBMerger) — merges encoded Terrain-RGB tiles from existing MBTiles files; supports both mapbox and terrarium encoding
  • Raster sources (RasterRGBMerger) — merges raw DEM GeoTIFFs directly into the output MBTiles with automatic reprojection
  • Overzoom support — tiles not natively present at a zoom level are upsampled from lower-zoom parent tiles with the chosen resampling method
  • Configurable resampling: nearest, bilinear, cubic, cubic_spline, lanczos, average, mode, gauss
  • Output encoding — choose mapbox or terrarium for the output tiles
  • Output format: png or webp
  • Bounding box filtering: bounds: [west, south, east, north] restricts generated tiles to a geographic region
  • output_nodata — fill pixels with no valid source data with a fixed elevation value instead of leaving them NaN
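
As a rough illustration of the priority rule described above, here is a minimal NumPy sketch. It is not the code in merger.py: the function name blend_by_priority, the tuple layout, and the choice to mask before applying the height adjustment are all assumptions made for the example. The only behaviour taken from this PR is that lower-priority data is written solely where the running result is still NaN.

```python
import numpy as np


def blend_by_priority(sources, output_nodata=None):
    """Blend elevation arrays, highest priority first (hypothetical helper).

    sources: list of (elevation, mask_values, height_adjustment) tuples.
    """
    result = None
    for elevation, mask_values, height_adjustment in sources:
        data = elevation.astype(np.float64)  # astype returns a copy
        if mask_values:
            # Assumption: masked values are dropped before the offset is applied.
            data[np.isin(data, mask_values)] = np.nan
        data = data + height_adjustment
        if result is None:
            result = data
        else:
            # Only fill pixels the higher-priority sources left empty.
            gaps = np.isnan(result)
            result[gaps] = data[gaps]
    if output_nodata is not None:
        result = np.where(np.isnan(result), output_nodata, result)
    return result


high = np.array([[5.0, 0.0], [np.nan, 7.0]])
low = np.array([[1.0, 2.0], [3.0, 4.0]])
merged = blend_by_priority([(high, [0.0], 0.0), (low, [], 10.0)])
```

In this toy case the high-priority source keeps its two valid pixels, and the masked 0 and the NaN are filled from the low-priority source after its +10 m adjustment.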

Example merge_example.json:

{
  "sources": [
    { "path": "/path/to/bathymetry.mbtiles", "encoding": "mapbox", "height_adjustment": -5.0 },
    { "path": "/path/to/base_terrain.mbtiles", "encoding": "mapbox", "mask_values": [-1, 0] },
    { "path": "/path/to/secondary_terrain.mbtiles", "encoding": "terrarium" }
  ],
  "output_path": "/path/to/output.mbtiles",
  "output_encoding": "mapbox",
  "output_format": "webp",
  "resampling": "bilinear",
  "min_zoom": 2,
  "max_zoom": 10,
  "bounds": [-10, 10, 20, 50]
}
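
For readers scripting around the merge command, a small sketch of loading and sanity-checking such a config may help. The key names come from the example above; the function load_merge_config and the specific checks are hypothetical and are not claimed to match what the merge command validates.

```python
import json

VALID_ENCODINGS = {"mapbox", "terrarium"}
VALID_FORMATS = {"png", "webp"}


def load_merge_config(text):
    """Parse a merge config and run basic sanity checks (illustrative only)."""
    config = json.loads(text)
    if not config.get("sources"):
        raise ValueError("config needs at least one source")
    for source in config["sources"]:
        # Only check the encoding when the source declares one.
        if "encoding" in source and source["encoding"] not in VALID_ENCODINGS:
            raise ValueError(f"unknown encoding: {source['encoding']}")
    if "output_format" in config and config["output_format"] not in VALID_FORMATS:
        raise ValueError(f"unknown output format: {config['output_format']}")
    bounds = config.get("bounds")
    if bounds is not None:
        west, south, east, north = bounds
        if not (west < east and south < north):
            raise ValueError("bounds must be [west, south, east, north]")
    return config


example = """
{
  "sources": [{"path": "/path/to/base_terrain.mbtiles", "encoding": "mapbox"}],
  "output_path": "/path/to/output.mbtiles",
  "output_format": "webp",
  "bounds": [-10, 10, 20, 50]
}
"""
config = load_merge_config(example)
```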

Sparse Tiles Mode

Both merge and rgbify support "sparse_tiles": true. When enabled:

  • Tiles where all sources produce only NaN/masked values at the tile's native zoom level are skipped entirely — they are not written to the output MBTiles
  • Overzoomed tiles (interpolated from a lower-zoom parent) are also skipped
  • This significantly reduces output file size for datasets with limited geographic coverage (e.g. land-only or ocean-only sources)
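
The skip rule above can be sketched as a single predicate. The name should_write_tile and its arguments are hypothetical; the sketch only restates the two conditions listed in this section and assumes the merged tile is a float array with NaN marking missing data.

```python
import numpy as np


def should_write_tile(merged, is_overzoomed, sparse_tiles):
    """Return True if a merged tile should be written to the output MBTiles."""
    if not sparse_tiles:
        return True  # dense mode: every tile is written
    if is_overzoomed:
        return False  # sparse mode skips tiles interpolated from a parent
    # Sparse mode also skips tiles with no valid pixel at native zoom.
    return not np.isnan(merged).all()


all_nan = np.full((4, 4), np.nan)
partial = all_nan.copy()
partial[0, 0] = 12.5
```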

Refactored & Improved Modules

mbtiler.py — Complete Rewrite

  • Replaced global-state _tile_worker (using riomucho) with a standalone process_tile function compatible with Python's multiprocessing spawn context (required on macOS/Windows)
  • Uses multiprocessing.get_context("spawn") for reliability across platforms
  • Batch commit model — tiles are committed to SQLite in configurable batches (--batch-size) to avoid excess memory use on large datasets
  • Retry logic on SQLite OperationalError (database locked) with exponential backoff
  • Gaussian blur smoothing (--gaussian-blur-sigma) applied post-reprojection to reduce artefacts at tile seams; sigma scales automatically with zoom level
  • --resampling option replaces the hardcoded bilinear
  • --batch-size option for explicit control over memory/throughput trade-off
  • Removed dependency on riomucho and rasterio.rio.options.creation_options
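
The batch commit model can be illustrated with the standard library alone. This is a sketch under assumptions, not the mbtiler.py implementation: the simplified single tiles table, the helper names, and the default batch size are invented for the example. In the real pipeline the tile iterable would presumably come from a worker pool created via multiprocessing.get_context("spawn").

```python
import sqlite3
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch


def write_in_batches(conn, tiles, batch_size=100):
    """Insert (z, x, y, data) rows, committing once per batch."""
    commits = 0
    for batch in batched(tiles, batch_size):
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT OR REPLACE INTO tiles VALUES (?, ?, ?, ?)", batch
            )
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tiles (zoom_level INTEGER, tile_column INTEGER, "
    "tile_row INTEGER, tile_data BLOB, "
    "PRIMARY KEY (zoom_level, tile_column, tile_row))"
)
# Generator of dummy tiles; only one batch is held in memory at a time.
dummy_tiles = ((8, x, 0, b"tile") for x in range(250))
commit_count = write_in_batches(conn, dummy_tiles, batch_size=100)
```

A smaller batch size bounds how many encoded tiles sit in memory before being flushed, at the cost of more frequent commits.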

scripts/cli.py — Multi-Command Group

  • Restructured from a single @click.command to a @click.group (main_group) with two subcommands: rgbify and merge
  • rgbify gains --batch-size and --resampling options
  • merge reads all configuration from a JSON file (--config)
  • Added --verbose / -v logging flag to both commands

image.py — New Image Encoding Module (replaces encoders.py)

  • ImageEncoder.data_to_rgb() — encodes float elevation arrays to mapbox or terrarium RGB
  • ImageEncoder.save_rgb_to_bytes() — saves to PNG or WebP bytes via Pillow
  • ImageFormat enum (PNG, WEBP)
  • Centralised, testable encoding logic decoupled from I/O
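
For reference, the two encodings can be written out for a single elevation value. The formulas are the published Mapbox Terrain-RGB and Terrarium definitions; the function names are hypothetical and this scalar sketch is not the array-based ImageEncoder.data_to_rgb() added in this PR.

```python
import math

MAPBOX_BASE = -10000.0   # metres
MAPBOX_INTERVAL = 0.1    # metres per step


def encode_mapbox(height):
    """Encode metres as Terrain-RGB: h = -10000 + (R*65536 + G*256 + B) * 0.1."""
    value = round((height - MAPBOX_BASE) / MAPBOX_INTERVAL)
    value = max(0, min(value, 256 ** 3 - 1))  # clamp to 24 bits
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


def decode_mapbox(r, g, b):
    return MAPBOX_BASE + (r * 65536 + g * 256 + b) * MAPBOX_INTERVAL


def encode_terrarium(height):
    """Encode metres as Terrarium: h = R*256 + G + B/256 - 32768."""
    value = max(0.0, min(height + 32768.0, 65535.0 + 255.0 / 256.0))
    whole = math.floor(value)
    return whole // 256, whole % 256, math.floor((value - whole) * 256)


def decode_terrarium(r, g, b):
    return r * 256 + g + b / 256 - 32768.0


sea_level_mapbox = encode_mapbox(0.0)        # (1, 134, 160)
sea_level_terrarium = encode_terrarium(0.0)  # (128, 0, 0)
```

Mapbox quantises to 0.1 m steps and Terrarium to 1/256 m steps, so a round trip through either encoding is only approximately lossless.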

database.py — New MBTiles Database Abstraction

  • MBTilesDatabase context manager wraps SQLite MBTiles creation
  • Creates the standard tiles_shallow / tiles_data / tiles view schema (de-duplicated tile data storage)
  • insert_tile_with_retry() with exponential backoff for concurrent write safety
  • add_metadata() for writing MBTiles metadata rows
  • Handles INSERT OR REPLACE to safely re-run without UNIQUE constraint violations
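
To make the de-duplicated layout and the retry behaviour concrete, here is a standalone sqlite3 sketch. The table and view names follow the list above, but the column names, the SHA-256 content key, the retry defaults, and the helper names create_mbtiles and insert_tile are assumptions and may differ from MBTilesDatabase.

```python
import hashlib
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS tiles_data (
    tile_data_id TEXT PRIMARY KEY,
    tile_data BLOB
);
CREATE TABLE IF NOT EXISTS tiles_shallow (
    zoom_level INTEGER,
    tile_column INTEGER,
    tile_row INTEGER,
    tile_data_id TEXT,
    PRIMARY KEY (zoom_level, tile_column, tile_row)
);
CREATE VIEW IF NOT EXISTS tiles AS
    SELECT s.zoom_level, s.tile_column, s.tile_row, d.tile_data
    FROM tiles_shallow AS s
    JOIN tiles_data AS d ON s.tile_data_id = d.tile_data_id;
"""


def create_mbtiles(path):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def insert_tile(conn, z, x, y, data, retries=5, base_delay=0.05):
    """Insert one tile, retrying with exponential backoff if SQLite is busy."""
    digest = hashlib.sha256(data).hexdigest()  # identical tiles share one blob
    for attempt in range(retries):
        try:
            with conn:  # one transaction per tile
                conn.execute(
                    "INSERT OR REPLACE INTO tiles_data VALUES (?, ?)",
                    (digest, data),
                )
                conn.execute(
                    "INSERT OR REPLACE INTO tiles_shallow VALUES (?, ?, ?, ?)",
                    (z, x, y, digest),
                )
            return
        except sqlite3.OperationalError:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 0.05 s, 0.1 s, 0.2 s, ...


db = create_mbtiles(":memory:")
insert_tile(db, 2, 0, 0, b"ocean")
insert_tile(db, 2, 1, 0, b"ocean")  # same bytes, stored once
insert_tile(db, 2, 0, 0, b"ocean")  # re-run: replaced, no UNIQUE error
```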

Testing

New Test Suite (test/test_merger.py — 32+ tests across 6 classes)

| Class | Coverage |
| --- | --- |
| TestTerrainRGBMergerMergeTiles | Unit tests for merge priority, NaN fill, sparse skipping, output nodata |
| TestTerrainRGBMergerIntegration | End-to-end MBTiles-to-MBTiles merge, height adjustment, bounds filtering |
| TestRasterRGBMergerExtractTile | Raster source tile extraction and CRS handling |
| TestRasterRGBMergerGetMaxZoom | Auto zoom-level detection from raster resolution |
| TestRasterRGBMergerIntegration | GeoTIFF → MBTiles pipeline |
| TestMergeCLI | CLI merge command: basic, sparse, height adjustment, missing config, raster type, terrarium encoding |
| TestLiveMerge | Real-world fixture tests using pre-committed GEBCO and JAXA tiles (z=0–z=2); skipped automatically in CI where fixtures are absent |

Live Test Fixtures (test/fixtures/)

  • gebco_sample.mbtiles — 21 WebP tiles (z=0–2), GEBCO 2024 global bathymetry, 512×512
  • jaxa_sample.mbtiles — 21 WebP tiles (z=0–2), JAXA AW3D30 2024 land elevation, 512×512
  • test/download_fixtures.py — re-download script (handles server-side gzip decompression)

Reference Output Comparison (test/expected/)

  • z0_x0_y0.png, z2_x2_y1.png, z2_x0_y2.png — lossless PNG reference tiles from a known-good merge run
  • test/generate_expected_tiles.py — regeneration script; run when merge behaviour is intentionally changed
  • TestLiveMerge.test_output_matches_expected_tiles — decodes both reference and live output to elevation arrays and asserts np.allclose with atol=1.0 (metres)
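
The comparison can be sketched as follows. This is not the test's code: decode_mapbox_rgb and tiles_match are hypothetical names, and the sketch assumes both tiles are Mapbox-encoded H×W×3 uint8 arrays that have already been read from PNG or WebP.

```python
import numpy as np


def decode_mapbox_rgb(rgb):
    """Decode an HxWx3 uint8 Terrain-RGB array to elevations in metres."""
    rgb = rgb.astype(np.float64)
    steps = rgb[..., 0] * 65536 + rgb[..., 1] * 256 + rgb[..., 2]
    return -10000.0 + steps * 0.1


def tiles_match(reference_rgb, output_rgb, atol=1.0):
    """Compare two tiles in elevation space rather than byte-for-byte."""
    return bool(
        np.allclose(
            decode_mapbox_rgb(reference_rgb),
            decode_mapbox_rgb(output_rgb),
            atol=atol,
        )
    )


reference = np.zeros((2, 2, 3), dtype=np.uint8)
reference[...] = (1, 134, 160)  # 0 m everywhere

slightly_off = reference.copy()
slightly_off[..., 2] += 5       # +0.5 m, inside the tolerance

far_off = reference.copy()
far_off[..., 1] += 1            # +25.6 m, outside the tolerance
```

Comparing decoded elevations instead of raw bytes keeps the check meaningful when the live output is lossy WebP and the reference is lossless PNG.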

Updated Existing Tests

  • test/test_encoders.py — roundtrip tolerance relaxed to atol=0.5 (float precision); covers mapbox and terrarium
  • test/test_mbtiler.py — format-validation test skipped (validation moved to click.Choice)
  • test/test_cli.py — fully updated for new multi-command group; test_merge_command uses real MBTiles databases

GitHub Actions CI (.github/workflows/tests.yml) — New

  • Matrix: Python 3.10, 3.11, 3.12 on ubuntu-latest
  • Installs rasterio and scipy via conda-forge (avoids GDAL compilation)
  • Uploads coverage to Codecov on Python 3.11

Dependencies

| Change | Detail |
| --- | --- |
| Added scipy | Gaussian blur via scipy.ndimage.gaussian_filter |
| Added Pillow | PNG/WebP tile encoding in image.py |
| Added psutil | Per-process CPU affinity logging in mbtiler.py |
| Added mercantile | Tile coordinate utilities (was already used, now declared) |
| Removed riomucho | Replaced by direct multiprocessing pool |

Breaking Changes

Note: These changes are incompatible with scripts that relied on the previous behaviour.

  • The rgbify command no longer supports GeoTIFF output (.tif destination) — output is MBTiles only
  • encoders.py (data_to_rgb) is replaced by image.py (ImageEncoder.data_to_rgb) — any code importing directly from rio_rgbify.encoders will need updating
  • The CLI entry point is now a group (main_group); rgbify is a subcommand (rio rgbify ... still works), and rio merge ... is new

acalcutt added 25 commits March 27, 2025 00:34
This reverts commit 688000f.
Add sparse tile option to merge config / remove output_quantized_alpha
- merger.py: remove double height_adjustment application from _merge_tiles
  (was already applied in _decode_tile, causing doubled elevation offsets)
- database.py: INSERT OR REPLACE to handle duplicate tiles on re-run
- test_cli.py: skip test_cli_good_elev (GeoTIFF output mode removed);
  use MBTilesDatabase for test_merge_command sources (was empty files)
- test_encoders.py: use np.allclose(atol=0.5) for roundtrip tests
  (terrarium ignores interval; float32 precision causes ~0.15 max error)
- test_mbtiler.py: skip test_RGBtiler_format_fails (format validation
  moved to CLI click.Choice)
- test_merger.py: fix _merger() helper to default num_sources=2; use
  _WORLD_BOUNDS to avoid mercantile InvalidLatitudeError at lat=±90
- merger.py: fix source priority — mask should be np.isnan(result) so
  higher-priority (first) sources win; previous mask ~np.isnan(resampled_data)
  caused every subsequent source to overwrite earlier ones (last wins bug)
- cli.py: fix resampling choice 'gaussian' -> 'gauss' to match rasterio's
  Resampling enum member name; tests correctly pass 'gauss'
- test/download_fixtures.py: script to fetch z0-z2 tiles from both tile
  servers into MBTiles fixture files (auto-decompresses gzip responses)
- test/fixtures/gebco_sample.mbtiles: 21 WebP tiles from ocean-rgb endpoint
- test/fixtures/jaxa_sample.mbtiles: 21 WebP tiles from jaxa_terrainrgb_webp endpoint
- test/test_merger.py: TestLiveMerge (6 tests) mirroring merge_bathymetry.json:
  JAXA land (priority 1) over GEBCO bathymetry (priority 2), z0-z2, webp output
  - produces output with tiles at all 3 zoom levels
  - output tiles are valid WebP RGB 512x512 images
  - JAXA land takes priority (positive elevation in East Asia z=2/2/1 tile)
  - sparse_tiles mode produces <= full mode tile count
  - output_nodata config key accepted without error
  Tests skip automatically when fixture files are absent.
- Add test/expected/z{0,2}_x*_y*.png: lossless PNG reference tiles generated
  by running the GEBCO+JAXA merger once and frozen as ground truth
- Add test/generate_expected_tiles.py: regeneration script to re-run whenever
  merger behaviour is intentionally changed
- Add TestLiveMerge.test_output_matches_expected_tiles: decodes output tiles to
  elevation arrays and asserts np.allclose(atol=1.0) against references
- Update _EXPECTED_TILES_DIR to point to existing test/expected/ convention
acalcutt changed the title from "Start to add merge feature" to "Add Merge & Sparse Tiles" on Mar 20, 2026
acalcutt merged commit 512d65f into master on Mar 20, 2026
9 checks passed