pyshed API Reference

Developer-oriented reference for the supported public pyshed Python API. This file mirrors the runtime surface re-exported from python/pyshed/__init__.py and the shipped PEP 561 stub in python/pyshed/__init__.pyi.

Public Exports

The pyshed package exports these names:

  • Engine
  • DelineationResult
  • DelineationUnitMetadata
  • AreaOnlyResult
  • LevelSelection
  • SelectedLevel
  • ResolvedOutlet
  • UpstreamUnits
  • PreMergeDrainageUnit
  • PreMergeDrainageUnits
  • TerminalRefinement
  • DissolvedWatershed
  • BasinGeoParquetWriter
  • UnitBundleGeoParquetWriter
  • ShedError
  • DatasetError
  • ResolutionError
  • AssemblyError
  • bench_trace
  • set_log_level
  • __version__

_pyshed exists as a compiled implementation detail, but its helper functions are not part of the supported public API.

set_log_level

set_log_level(level: str) -> None

Sets the active log level for both the Rust tracing bridge and the Python logging tree.

| Parameter | Type | Meaning |
| --- | --- | --- |
| level | str | Case-insensitive level name: "trace", "debug", "info", "warn"/"warning", or "error"/"critical" |

Records from Rust code route through pyo3-log under loggers named after their crate (_pyshed.*, shed_core.*, hfx_core.*). If any relevant logger has no handler, a StreamHandler is added to that logger automatically, so first-time users see output without calling logging.basicConfig.

Set the PYSHED_LOG environment variable to one of the same level names to enable logging at that level at import time.
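A minimal sketch of taking control of that output with the standard logging module: a handler attached to the documented logger names before the level is raised should make the automatic StreamHandler unnecessary. The logger names come from the paragraph above; the helper name attach_handler and the format string are illustrative assumptions, not part of pyshed.

```python
import logging

# Logger names documented above for records bridged from Rust.
PYSHED_LOGGERS = ("_pyshed", "shed_core", "hfx_core")


def attach_handler(handler: logging.Handler) -> list:
    """Attach one handler to each documented logger (hypothetical helper)."""
    loggers = []
    for name in PYSHED_LOGGERS:
        logger = logging.getLogger(name)
        if handler not in logger.handlers:  # avoid duplicate output on re-runs
            logger.addHandler(handler)
        loggers.append(logger)
    return loggers


handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)
configured = attach_handler(handler)

# With pyshed installed, the level would then be set with:
#     pyshed.set_log_level("debug")
```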

bench_trace

bench_trace(path: os.PathLike[str] | str) -> Iterator[None]

Context manager that writes Rust stage-span benchmark telemetry to path while the context is active.

with bench_trace("trace.jsonl"):
    result = engine.delineate(lat=47.3769, lon=8.5417)

# trace.jsonl now contains JSONL records with kind == "stage".
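The trace file can be read back with the standard library. The sketch below relies only on the documented kind == "stage" field; any other fields in a record are passed through untouched because their names are not specified here. The helper name stage_records is an assumption for illustration.

```python
import json
from typing import Iterable


def stage_records(lines: Iterable[str]) -> list:
    """Parse JSONL lines and keep the records whose "kind" is "stage"."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if isinstance(record, dict) and record.get("kind") == "stage":
            records.append(record)
    return records
```

A typical call would be `with open("trace.jsonl", encoding="utf-8") as fh: stages = stage_records(fh)` once the bench_trace context has exited.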

Engine

Constructor

Engine(
    dataset_path: str,
    *,
    snap_radius: float | None = None,
    snap_strategy: Literal["distance-first", "weight-first"] | None = None,
    snap_threshold: int | None = None,
    clean_epsilon: float | None = None,
    refine: bool = True,
    repair_geometry: Literal["auto", "gdal", "clean"] | Literal[False] | None = "auto",
    parquet_cache: bool | None = None,
    parquet_cache_max_mb: int = 512,
) -> None

Opens an HFX dataset and constructs a delineation engine.

dataset_path must point to an HFX v0.2.1 dataset. HFX v0.1 datasets are rejected with an unsupported-format-version error.

| Parameter | Type | Default | Meaning |
| --- | --- | --- | --- |
| dataset_path | str | | Path or URL to the HFX dataset root directory |
| snap_radius | float \| None | None | Snap-path search radius in metres; must be finite and positive when provided |
| snap_strategy | "distance-first" \| "weight-first" \| None | None | Snap ranking strategy. Defaults to "weight-first" (HFX v0.2 contract). |
| snap_threshold | int \| None | None | Minimum upstream-pixel count for stream-network snapping |
| clean_epsilon | float \| None | None | Topology-cleaning epsilon in degrees |
| refine | bool | True | Whether raster-based terminal refinement is enabled |
| repair_geometry | "auto" \| "gdal" \| "clean" \| False \| None | "auto" | Geometry repair mode. "auto", "clean", False, and None use pure-Rust topology cleaning; "gdal" opts into GDAL repair |
| parquet_cache | bool \| None | None | Enable in-memory Parquet column-chunk cache for repeated delineations. None enables caching for remote URLs and disables it for local paths |
| parquet_cache_max_mb | int | 512 | Maximum cache size in MiB; must be > 0 when parquet_cache=True |

Exceptions

  • DatasetError when the dataset cannot be opened or read.
  • ValueError when a configuration argument is invalid, such as an unknown snap_strategy, unknown repair_geometry, a non-positive snap_radius, or parquet_cache_max_mb=0 when parquet_cache=True.
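When options come from a config file or CLI, it can help to check them before opening a dataset. The following is a sketch that mirrors the ValueError conditions listed above; it is not pyshed's own validation, the helper name check_engine_options is hypothetical, and the Engine constructor remains the authority.

```python
import math

VALID_SNAP_STRATEGIES = ("distance-first", "weight-first")
VALID_REPAIR_MODES = ("auto", "gdal", "clean")


def check_engine_options(
    *,
    snap_radius=None,
    snap_strategy=None,
    repair_geometry="auto",
    parquet_cache=None,
    parquet_cache_max_mb=512,
):
    """Raise ValueError for values the reference documents as invalid."""
    if snap_radius is not None:
        if not math.isfinite(snap_radius) or snap_radius <= 0:
            raise ValueError(f"snap_radius must be finite and positive: {snap_radius!r}")
    if snap_strategy is not None and snap_strategy not in VALID_SNAP_STRATEGIES:
        raise ValueError(f"unknown snap_strategy: {snap_strategy!r}")
    if (
        repair_geometry is not None
        and repair_geometry is not False
        and repair_geometry not in VALID_REPAIR_MODES
    ):
        raise ValueError(f"unknown repair_geometry: {repair_geometry!r}")
    if parquet_cache is True and parquet_cache_max_mb <= 0:
        raise ValueError("parquet_cache_max_mb must be > 0 when parquet_cache=True")
    # Return keyword arguments ready for Engine(dataset_path, **options).
    return {
        "snap_radius": snap_radius,
        "snap_strategy": snap_strategy,
        "repair_geometry": repair_geometry,
        "parquet_cache": parquet_cache,
        "parquet_cache_max_mb": parquet_cache_max_mb,
    }
```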

Tuning Knobs

Use these constructor options when the dataset or outlet coordinates need extra control. They apply to the Engine instance you create.

  • snap_radius sets the search radius in metres for snapping the outlet point onto the river network. Set it on the constructor and reuse that engine for delineations that need the same search distance. Raise it when outlet coordinates sit off the mapped river.

    engine = Engine("/path/to/hfx/dataset", snap_radius=5000)

  • repair_geometry selects the geometry repairer. The default is the pure-Rust topology cleaner, which "auto", "clean", False, and None all select; "gdal" opts into the GDAL repairer instead.

  • parquet_cache controls the per-Engine in-memory cache of recently fetched dataset blocks. The None default enables it for remote URLs and disables it for local paths. Repeated delineations in the same session reuse data already fetched, so overlapping watersheds are faster. The cache is held in memory only, never written to disk, and is separate from the persistent metadata cache kept under HFX_CACHE_DIR or the OS cache directory.

  • parquet_cache_max_mb sets the cache size cap in MiB. It defaults to 512 when caching is enabled and must be greater than zero when parquet_cache=True.

Methods

delineate(
    *,
    lat: float,
    lon: float,
    geometry: bool = True,
) -> DelineationResult | AreaOnlyResult

Delineates the watershed upstream of a single outlet.

| Parameter | Type | Meaning |
| --- | --- | --- |
| lat | float | Outlet latitude in decimal degrees (EPSG:4326) |
| lon | float | Outlet longitude in decimal degrees (EPSG:4326) |
| geometry | bool | When True, return a full DelineationResult; when False, return AreaOnlyResult scalar metadata without geometry accessors |

Type checkers see precise overloads:

Engine.delineate(*, lat: float, lon: float) -> DelineationResult
Engine.delineate(*, lat: float, lon: float, geometry=True) -> DelineationResult
Engine.delineate(*, lat: float, lon: float, geometry=False) -> AreaOnlyResult
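As a sketch of the geometry=False path, the helper below asks only for the area. It takes the engine as an argument so it stays independent of how the Engine was built. The helper name is hypothetical, and the coordinate pre-check uses the conventional EPSG:4326 bounds, which is an assumption about what the engine treats as the valid range.

```python
def watershed_area_km2(engine, lat: float, lon: float) -> float:
    """Return the watershed area for one outlet without building geometry."""
    # Assumed bounds: conventional latitude/longitude limits for EPSG:4326.
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"lat out of range: {lat!r}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"lon out of range: {lon!r}")
    # geometry=False is documented to return an AreaOnlyResult.
    result = engine.delineate(lat=lat, lon=lon, geometry=False)
    return result.area_km2
```

With a real engine this would be called as `watershed_area_km2(engine, 47.3769, 8.5417)`.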

Exceptions

  • ValueError when lat or lon is outside the valid geographic range.
  • ResolutionError when the outlet cannot be resolved to a terminal catchment.
  • DatasetError when underlying dataset reads fail during delineation.
  • AssemblyError when watershed geometry assembly fails.
  • ShedError for other engine failures such as traversal or refinement errors.

delineate_batch(
    outlets: list[dict[str, float]],
    *,
    progress: Callable[[dict], None] | None = None,
) -> list[DelineationResult]

Delineates watersheds for a batch of outlets that share the same engine configuration.

Each outlet must be a dict with exactly these keys:

{"lat": 47.3769, "lon": 8.5417}

Results are returned in input order. The call raises on the first failure in that order rather than returning per-outlet error objects.

When progress is supplied, the batch runs sequentially and the callback is invoked once per outlet (after it completes) with an event dict:

| Key | Type | Present |
| --- | --- | --- |
| index | int | always |
| total | int | always |
| lat | float | always |
| lon | float | always |
| duration_ms | int | always |
| status | str ("ok" or "error") | always |
| n_catchments | int | success only |
| error | str | failure only |

Exceptions raised by the callback are caught and logged at warning level (via Rust's warn! macro); they do not interrupt the batch.

Without progress, the batch runs in parallel via Rayon.

Exceptions

  • KeyError when an outlet dict is missing "lat" or "lon".
  • ValueError when any outlet contains invalid coordinates.
  • The same typed pyshed exceptions as delineate() for engine failures.
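A progress callback can be written against the event keys listed above. The sketch below prints one line per outlet and records failure events; the factory name make_progress_tracker is an assumption for illustration. It reads n_catchments only on success and error only on failure, matching the documented presence rules.

```python
def make_progress_tracker():
    """Build a progress callback and the list it records failures into."""
    failures = []

    def on_progress(event: dict) -> None:
        if event["status"] == "ok":
            detail = f"{event['n_catchments']} catchments"
        else:
            detail = f"failed: {event['error']}"
            failures.append(
                {
                    "index": event["index"],
                    "lat": event["lat"],
                    "lon": event["lon"],
                    "error": event["error"],
                }
            )
        print(
            f"outlet index {event['index']} (total {event['total']}, "
            f"{event['duration_ms']} ms): {detail}"
        )

    return on_progress, failures
```

It would be passed as `engine.delineate_batch(outlets, progress=on_progress)`. Two consequences of the documented behaviour are worth keeping in mind: supplying progress makes the batch sequential, and a bug inside the callback only appears as a logged warning.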

Staged Methods

Engine.delineate() is equivalent to this staged composition:

level = engine.select_level(selection=LevelSelection.FINEST)
outlet = engine.resolve_outlet(level, lat=47.3769, lon=8.5417)
upstream = engine.traverse(outlet)
units = engine.pre_merge_units(upstream)
refinement = engine.refine(outlet, units)
dissolved = engine.dissolve(units, refinement)
result = engine.compose_result(outlet, upstream, units, refinement, dissolved)

The supported order is:

select_level -> resolve_outlet -> traverse -> pre_merge_units -> refine -> dissolve -> compose_result

Each method accepts the typed intermediate from the prior stage. Passing the wrong object type raises TypeError.

| Method | Returns | Meaning |
| --- | --- | --- |
| select_level(selection=LevelSelection.FINEST) | SelectedLevel | Selects the finest loaded HFX level |
| resolve_outlet(level, *, lat, lon) | ResolvedOutlet | Resolves an outlet at that level |
| traverse(outlet) | UpstreamUnits | Traverses same-level upstream unit IDs |
| pre_merge_units(upstream) | PreMergeDrainageUnits | Materializes whole source drainage units and whole-unit WKB |
| refine(outlet, units) | TerminalRefinement | Runs or skips terminal refinement based on engine config and dataset auxiliaries |
| dissolve(units, refinement) | DissolvedWatershed | Produces the final merged geometry and area |
| compose_result(...) | DelineationResult | Packages the same merged result shape returned by delineate() |

LevelSelection.FINEST is currently the only supported selection; multi-level selection is on the roadmap.

PreMergeDrainageUnits contains whole source drainage units, including the whole terminal unit. When terminal refinement is applied, the sum or union of the pre-merge units therefore differs from the final merged area_km2 or geometry_wkb.
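To see how large that divergence is for a given outlet, the staged outputs can be compared directly. This is a sketch, assuming the per-unit area_km2 values are comparable with the dissolved area_km2; the helper name is hypothetical, and small differences may also come from how each area was computed.

```python
def premerge_area_excess_km2(units, dissolved) -> float:
    """Total whole-unit area minus the final dissolved area, in km²."""
    # units.units is the documented list of PreMergeDrainageUnit objects.
    whole_units_total = sum(unit.area_km2 for unit in units.units)
    return whole_units_total - dissolved.area_km2
```

Here units is the value returned by pre_merge_units() and dissolved is the value returned by dissolve().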

DelineationResult

Returned by Engine.delineate() and Engine.delineate_batch().

Properties

| Property | Type | Meaning |
| --- | --- | --- |
| terminal_unit_id | int | Terminal HFX unit ID that the outlet resolved to |
| input_outlet | tuple[float, float] | Original outlet as (lon, lat) |
| resolved_outlet | tuple[float, float] | Outlet used for resolution as (lon, lat) |
| refined_outlet | tuple[float, float] \| None | Raster-refined outlet as (lon, lat), or None if refinement was not applied |
| resolution_method | str | Debug/provenance string describing how outlet resolution happened |
| upstream_unit_ids | list[int] | Upstream unit IDs including the terminal unit |
| upstream_units | list[DelineationUnitMetadata] | Light per-unit metadata without per-unit geometry |
| area_km2 | float | Geodesic watershed area in square kilometres |
| geometry_bbox | tuple[float, float, float, float] \| None | Geometry bounds as (minx, miny, maxx, maxy), or None for empty geometry |
| geometry_wkb | bytes | Watershed geometry encoded as OGC WKB bytes |

Methods

to_geojson() -> str

Serializes the result as a GeoJSON Feature string.

__repr__() -> str

Returns a concise debug representation including the terminal unit ID, area, and upstream unit count.

DelineationResult intentionally does not expose per-unit WKB. Use the pre_merge_units() staged output when whole-unit geometries are required.
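Because geometry_wkb is plain OGC WKB, its header can be inspected with the standard library before handing the bytes to a geometry library. The sketch below reads the byte-order flag and the 32-bit geometry type code defined by the WKB format. Which geometry type pyshed emits is not stated here, so the helper, whose name is illustrative, makes no assumption about it.

```python
import struct

# Geometry type codes from the OGC WKB encoding (2D variants).
WKB_TYPE_NAMES = {
    1: "Point",
    2: "LineString",
    3: "Polygon",
    4: "MultiPoint",
    5: "MultiLineString",
    6: "MultiPolygon",
    7: "GeometryCollection",
}


def wkb_geometry_type(wkb: bytes) -> str:
    """Return the geometry type name encoded in a WKB header."""
    if len(wkb) < 5:
        raise ValueError("WKB is too short to contain a header")
    byte_order = wkb[0]
    if byte_order == 1:
        fmt = "<I"  # little-endian (NDR)
    elif byte_order == 0:
        fmt = ">I"  # big-endian (XDR)
    else:
        raise ValueError(f"invalid WKB byte-order flag: {byte_order}")
    (type_code,) = struct.unpack_from(fmt, wkb, 1)
    return WKB_TYPE_NAMES.get(type_code, f"unknown({type_code})")
```

For a result this would be called as `wkb_geometry_type(result.geometry_wkb)`.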

DelineationUnitMetadata

Light per-unit metadata retained on a merged result.

| Property | Type | Meaning |
| --- | --- | --- |
| id | int | Drainage unit ID |
| level | int | HFX drainage-unit level |
| area_km2 | float | Source unit area |
| up_area_km2 | float \| None | Source upstream area when present |
| outlet | tuple[float, float] | Declared outlet as (lon, lat) |

Staged Intermediate Classes

PreMergeDrainageUnits

| Property | Type | Meaning |
| --- | --- | --- |
| terminal_unit_id | int | Terminal unit ID |
| level | int | Selected HFX level |
| units | list[PreMergeDrainageUnit] | Whole source drainage-unit metadata |
| unit_geometry_wkb | list[bytes] | Whole source drainage-unit WKB geometries |
| R3_NOTE | str | Visible note about whole-unit versus refined merged geometry divergence |

The other staged classes expose the following:

  • PreMergeDrainageUnit exposes id, level, area_km2, up_area_km2, and outlet.
  • TerminalRefinement.status is one of applied, best_effort_skipped, or disabled.
  • DissolvedWatershed exposes area_km2 and geometry_wkb.

GeoParquet Writers

Exports go through explicit writer objects; each write() call writes a complete batch.

BasinGeoParquetWriter().write(
    engine,
    "basins.parquet",
    [result],
    basin_ids=["basin-3"],
)

UnitBundleGeoParquetWriter().write(
    engine,
    "unit-bundle.parquet",
    [units],
    [refinement],
)

BasinGeoParquetWriter

write(
    engine: Engine,
    path: str,
    results: list[DelineationResult],
    *,
    basin_ids: list[str] | None = None,
    method: str | None = None,
    allow_default_basin_id: bool = False,
) -> None

Writes one merged-basin row per DelineationResult. basin_ids must be caller-supplied, filesystem-safe identifiers. Falling back to the terminal unit ID as the basin ID is allowed only when allow_default_basin_id=True and exactly one result is provided.

UnitBundleGeoParquetWriter

write(
    engine: Engine,
    path: str,
    bundles: list[PreMergeDrainageUnits],
    refinements: list[TerminalRefinement],
    *,
    method: str | None = None,
) -> None

Writes one row per pre-merge drainage unit. Row identity is dataset-local unit_id; grouping columns are terminal_unit_id and delineation; geometry is the whole source unit.

For both writers, the delineation value defaults to {fabric_name}/{fabric_version}/{method}. method defaults to d8-best-effort when engine refinement is enabled and to no-refine when refine=False. The actual per-row outcome is stored separately in refinement_status.
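The defaulting rule for that label can be written out as a small function. This is a sketch of the documented rule only: the helper name is hypothetical, and fabric_name and fabric_version are taken as plain arguments because where the writers read them from is not described here.

```python
from typing import Optional


def default_delineation(
    fabric_name: str,
    fabric_version: str,
    *,
    refine: bool = True,
    method: Optional[str] = None,
) -> str:
    """Compose {fabric_name}/{fabric_version}/{method} with the documented defaults."""
    if method is None:
        # Documented defaults: depends on whether engine refinement is enabled.
        method = "d8-best-effort" if refine else "no-refine"
    return f"{fabric_name}/{fabric_version}/{method}"
```

An explicit method argument always wins over the refine-based default.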

AreaOnlyResult

Returned by Engine.delineate(..., geometry=False).

This result exposes scalar metadata and area only. It intentionally does not provide geometry_wkb, geometry_bbox, or to_geojson().

Properties

| Property | Type | Meaning |
| --- | --- | --- |
| terminal_unit_id | int | Terminal HFX unit ID that the outlet resolved to |
| input_outlet | tuple[float, float] | Original outlet as (lon, lat) |
| resolved_outlet | tuple[float, float] | Outlet used for resolution as (lon, lat) |
| refined_outlet | tuple[float, float] \| None | Raster-refined outlet as (lon, lat), or None if refinement was not applied |
| resolution_method | str | Debug/provenance string describing how outlet resolution happened |
| upstream_unit_ids | list[int] | Upstream unit IDs including the terminal unit |
| area_km2 | float | Geodesic watershed area in square kilometres |
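Since every property here is a scalar or a small tuple, an AreaOnlyResult flattens naturally into a row for a CSV or data frame. The sketch below uses only the properties listed above; the helper name and the output column names are illustrative choices. Note the documented (lon, lat) order when unpacking outlets.

```python
def area_row(result) -> dict:
    """Flatten an area-only result into a plain dict of scalars."""
    input_lon, input_lat = result.input_outlet  # documented order is (lon, lat)
    resolved_lon, resolved_lat = result.resolved_outlet
    return {
        "terminal_unit_id": result.terminal_unit_id,
        "input_lat": input_lat,
        "input_lon": input_lon,
        "resolved_lat": resolved_lat,
        "resolved_lon": resolved_lon,
        "refined": result.refined_outlet is not None,
        "resolution_method": result.resolution_method,
        "n_upstream_units": len(result.upstream_unit_ids),
        "area_km2": result.area_km2,
    }
```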

Exceptions

class ShedError(Exception): ...
class DatasetError(ShedError): ...
class ResolutionError(ShedError): ...
class AssemblyError(ShedError): ...

These typed exceptions are raised by the engine so callers can distinguish dataset-open failures, outlet-resolution failures, and geometry-assembly failures from broader engine errors.
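Because the three specific classes all derive from ShedError, the order of except clauses matters: the subclasses have to come before the base class. The sketch below defines stand-in classes that mirror the documented hierarchy so it runs without pyshed installed; real code would import the classes from pyshed instead. The helper name and the returned labels are illustrative.

```python
# Stand-ins mirroring the documented hierarchy; with pyshed installed use:
#     from pyshed import ShedError, DatasetError, ResolutionError, AssemblyError
class ShedError(Exception): ...
class DatasetError(ShedError): ...
class ResolutionError(ShedError): ...
class AssemblyError(ShedError): ...


def describe_failure(action) -> str:
    """Run action() and return a short label for how it ended."""
    try:
        action()
    except DatasetError:
        return "dataset"
    except ResolutionError:
        return "resolution"
    except AssemblyError:
        return "assembly"
    except ShedError:
        # Base class last, so it only catches the remaining engine failures.
        return "engine"
    except ValueError:
        return "invalid-input"
    return "ok"
```

A call such as `describe_failure(lambda: engine.delineate(lat=47.3769, lon=8.5417))` would then separate an unresolvable outlet from a dataset read failure.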

Module Metadata

__version__: str

Installed package version reported by importlib.metadata.version("pyshed").
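The same lookup can be done without importing pyshed, which is useful in environment checks. A minimal sketch using the standard importlib.metadata API; the helper name installed_version is an assumption for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "pyshed") -> Optional[str]:
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None
```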