pyshed API Reference

Developer-oriented reference for the supported public pyshed Python API. This file mirrors the runtime surface re-exported from python/pyshed/__init__.py and the shipped PEP 561 stub in python/pyshed/__init__.pyi.

Public Exports

The pyshed package exports these names:

Engine
DelineationResult
DelineationUnitMetadata
AreaOnlyResult
LevelSelection
SelectedLevel
ResolvedOutlet
UpstreamUnits
PreMergeDrainageUnit
PreMergeDrainageUnits
TerminalRefinement
DissolvedWatershed
BasinGeoParquetWriter
UnitBundleGeoParquetWriter
ShedError
DatasetError
ResolutionError
AssemblyError
bench_trace
set_log_level
__version__

_pyshed exists as a compiled implementation detail, but its helper functions are not part of the supported public API.

set_log_level

set_log_level(level: str) -> None

Sets the active log level for both the Rust tracing bridge and the Python logging tree.

Parameter	Type	Meaning
`level`	`str`	Case-insensitive level name: `"trace"`, `"debug"`, `"info"`, `"warn"`/`"warning"`, or `"error"`/`"critical"`

Records from Rust code route through pyo3-log under loggers named after their crate (_pyshed.*, shed_core.*, hfx_core.*). If any relevant logger has no handler, a StreamHandler is added to that logger automatically, so first-time users see output without calling logging.basicConfig.

Set the PYSHED_LOG environment variable to one of the same level names to enable logging at import time.
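
The accepted level names and their aliases can be normalized before they reach set_log_level, for example when the level comes from user input. The sketch below is illustrative only: canonical_level is a hypothetical helper, not part of pyshed, and stripping whitespace and raising ValueError for unknown names are assumptions about convenient behaviour, not documented pyshed behaviour.

```python
# Alias table taken from the level names listed above.
_LEVEL_ALIASES = {
    "trace": "trace",
    "debug": "debug",
    "info": "info",
    "warn": "warn",
    "warning": "warn",
    "error": "error",
    "critical": "error",
}


def canonical_level(name: str) -> str:
    """Return the canonical spelling of a case-insensitive level name.

    Hypothetical helper: rejecting unknown names with ValueError is an
    assumption made for this sketch.
    """
    key = name.strip().lower()
    try:
        return _LEVEL_ALIASES[key]
    except KeyError:
        raise ValueError(f"unknown log level: {name!r}") from None
```

With pyshed installed, the result could be passed on as set_log_level(canonical_level(user_value)).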

bench_trace

bench_trace(path: os.PathLike[str] | str) -> Iterator[None]

Context manager that writes Rust stage-span benchmark telemetry to path while the context is active.

with bench_trace("trace.jsonl"):
    result = engine.delineate(lat=47.3769, lon=8.5417)

# trace.jsonl now contains JSONL records with kind == "stage".
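
Once the context exits, the trace can be read back with the standard library. This is a minimal sketch: load_stage_records is a hypothetical helper, and it relies only on the kind field mentioned above. It assumes each non-empty line is a JSON object; no other record fields are assumed.

```python
import json
from pathlib import Path


def load_stage_records(path):
    """Return the JSONL records whose "kind" is "stage".

    Hypothetical helper; assumes one JSON object per non-empty line.
    """
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if record.get("kind") == "stage":
                records.append(record)
    return records
```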

Engine

Constructor

Engine(
    dataset_path: str,
    *,
    snap_radius: float | None = None,
    snap_strategy: Literal["distance-first", "weight-first"] | None = None,
    snap_threshold: int | None = None,
    clean_epsilon: float | None = None,
    refine: bool = True,
    repair_geometry: Literal["auto", "gdal", "clean"] | Literal[False] | None = "auto",
    parquet_cache: bool | None = None,
    parquet_cache_max_mb: int = 512,
) -> None

Opens an HFX dataset and constructs a delineation engine.

dataset_path must point to an HFX v0.2.1 dataset. Opening an HFX v0.1 dataset fails with an unsupported-format-version error.

Parameter	Type	Default	Meaning
`dataset_path`	`str`	—	Path or URL to the HFX dataset root directory
`snap_radius`	`float \| None`	`None`	Snap-path search radius in metres; must be finite and positive when provided
`snap_strategy`	`"distance-first" \| "weight-first" \| None`	`None`	Snap ranking strategy. `None` selects `"weight-first"` (HFX v0.2 contract).
`snap_threshold`	`int \| None`	`None`	Minimum upstream-pixel count for stream-network snapping
`clean_epsilon`	`float \| None`	`None`	Topology-cleaning epsilon in degrees
`refine`	`bool`	`True`	Whether raster-based terminal refinement is enabled
`repair_geometry`	`"auto" \| "gdal" \| "clean" \| False \| None`	`"auto"`	Geometry repair mode. `"auto"`, `"clean"`, `False`, and `None` use pure-Rust topology cleaning; `"gdal"` opts into GDAL repair
`parquet_cache`	`bool \| None`	`None`	Enable in-memory Parquet column-chunk cache for repeated delineations. `None` enables caching for remote URLs and disables it for local paths
`parquet_cache_max_mb`	`int`	`512`	Maximum cache size in MiB; must be > 0 when `parquet_cache=True`

Exceptions

DatasetError when the dataset cannot be opened or read.
ValueError when a configuration argument is invalid, such as an unknown snap_strategy, unknown repair_geometry, a non-positive snap_radius, or parquet_cache_max_mb=0 when parquet_cache=True.
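
The ValueError conditions above can be mirrored in a pre-flight check, for example to validate a configuration file before opening a dataset. This is a sketch under stated assumptions: check_engine_options is a hypothetical helper, the error messages are invented, and the real constructor may apply further checks that are not listed here.

```python
import math

_SNAP_STRATEGIES = ("distance-first", "weight-first")
_REPAIR_MODES = ("auto", "gdal", "clean")


def check_engine_options(
    *,
    snap_radius=None,
    snap_strategy=None,
    repair_geometry="auto",
    parquet_cache=None,
    parquet_cache_max_mb=512,
):
    """Raise ValueError for the invalid combinations documented above.

    Hypothetical helper; it does not open a dataset or call pyshed.
    """
    if snap_radius is not None:
        if not (math.isfinite(snap_radius) and snap_radius > 0):
            raise ValueError("snap_radius must be finite and positive")
    if snap_strategy is not None and snap_strategy not in _SNAP_STRATEGIES:
        raise ValueError(f"unknown snap_strategy: {snap_strategy!r}")
    # None and False are accepted and select the default cleaner.
    if repair_geometry is not None and repair_geometry is not False:
        if repair_geometry not in _REPAIR_MODES:
            raise ValueError(f"unknown repair_geometry: {repair_geometry!r}")
    if parquet_cache is True and parquet_cache_max_mb <= 0:
        raise ValueError("parquet_cache_max_mb must be > 0 when caching is on")
```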

Tuning Knobs

Use these constructor options when the dataset or outlet coordinates need extra control. They apply to the Engine instance you create.

snap_radius sets the search radius in metres for snapping the outlet point onto the river network. Set it on the constructor and reuse that engine for delineations that need the same search distance. Raise it when outlet coordinates sit off the mapped river.
```
engine = Engine("/path/to/hfx/dataset", snap_radius=5000)
```
repair_geometry selects the geometry repairer. "auto", "clean", False, and None all use the default pure-Rust topology cleaner; "gdal" opts into the GDAL repairer.
parquet_cache controls the per-Engine in-memory cache of recently fetched dataset blocks. The None default enables it for remote URLs and disables it for local paths. Repeated delineations in the same session reuse data already fetched, so overlapping watersheds are faster. The cache is held in memory only, never written to disk, and is separate from the persistent metadata cache kept under HFX_CACHE_DIR or the OS cache directory.
parquet_cache_max_mb sets the cache size cap in MiB. It defaults to 512 when caching is enabled and must be greater than zero when parquet_cache=True.
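
The None default for parquet_cache can be read as the rule sketched below. effective_parquet_cache is a hypothetical helper, and the remote-URL test (a "://" separator that is not a file:// URL) is an assumption made for illustration; how pyshed actually distinguishes remote from local datasets is not documented here.

```python
from __future__ import annotations


def effective_parquet_cache(dataset_path: str, parquet_cache: bool | None = None) -> bool:
    """Return whether the in-memory cache would be on under the stated rule.

    Hypothetical helper: an explicit True/False wins; None enables the
    cache for remote URLs only.
    """
    if parquet_cache is not None:
        return parquet_cache
    # Assumption: anything with a URL scheme other than file:// is remote.
    is_remote = "://" in dataset_path and not dataset_path.startswith("file://")
    return is_remote
```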

Methods

delineate(
    *,
    lat: float,
    lon: float,
    geometry: bool = True,
) -> DelineationResult | AreaOnlyResult

Delineates the watershed upstream of a single outlet.

Parameter	Type	Meaning
`lat`	`float`	Outlet latitude in decimal degrees (EPSG:4326)
`lon`	`float`	Outlet longitude in decimal degrees (EPSG:4326)
`geometry`	`bool`	When `True`, return a full `DelineationResult`; when `False`, return `AreaOnlyResult` scalar metadata without geometry accessors

Type checkers see precise overloads:

Engine.delineate(*, lat: float, lon: float) -> DelineationResult
Engine.delineate(*, lat: float, lon: float, geometry=True) -> DelineationResult
Engine.delineate(*, lat: float, lon: float, geometry=False) -> AreaOnlyResult

Exceptions

ValueError when lat or lon is outside the valid geographic range.
ResolutionError when the outlet cannot be resolved to a terminal catchment.
DatasetError when underlying dataset reads fail during delineation.
AssemblyError when watershed geometry assembly fails.
ShedError for other engine failures such as traversal or refinement errors.

delineate_batch(
    outlets: list[dict[str, float]],
    *,
    progress: Callable[[dict], None] | None = None,
) -> list[DelineationResult]

Delineates watersheds for a batch of outlets that share the same engine configuration.

Each outlet must be a dict with exactly these keys:

{"lat": 47.3769, "lon": 8.5417}

Results are returned in input order. The call raises on the first failure in that order rather than returning per-outlet error objects.

When progress is supplied, the batch runs sequentially and the callback is invoked once per outlet (after it completes) with an event dict:

Key	Type	Present
`index`	`int`	always
`total`	`int`	always
`lat`	`float`	always
`lon`	`float`	always
`duration_ms`	`int`	always
`status`	`str` (`"ok"` or `"error"`)	always
`n_catchments`	`int`	success only
`error`	`str`	failure only

Exceptions raised by the callback are caught and logged as warnings (via Rust's warn!); they do not interrupt the batch.

Without progress, the batch runs in parallel via Rayon.
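
A progress callback can be any callable that accepts the event dict. The sketch below uses only the keys from the table above; ProgressLog is a hypothetical class, not part of pyshed.

```python
class ProgressLog:
    """Collects batch progress events (hypothetical helper)."""

    def __init__(self):
        self.ok = 0
        self.failed = []  # (index, error message) pairs
        self.total_ms = 0

    def __call__(self, event: dict) -> None:
        self.total_ms += event["duration_ms"]
        if event["status"] == "ok":
            self.ok += 1
        else:
            # "error" is present only on failure events.
            self.failed.append((event["index"], event["error"]))
```

With pyshed installed, an instance could be passed as engine.delineate_batch(outlets, progress=ProgressLog()). Supplying it makes the batch run sequentially, as described above.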

Exceptions

KeyError when an outlet dict is missing "lat" or "lon".
ValueError when any outlet contains invalid coordinates.
The same typed pyshed exceptions as delineate() for engine failures.
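
Building the outlet list through one helper avoids missing keys and catches out-of-range coordinates before the batch starts. This is a sketch: make_outlets is a hypothetical helper, and the bounds used (latitude in [-90, 90], longitude in [-180, 180]) are the conventional EPSG:4326 ranges, assumed here to match what the engine accepts.

```python
from __future__ import annotations

import math


def make_outlets(points: list[tuple[float, float]]) -> list[dict[str, float]]:
    """Turn (lat, lon) pairs into outlet dicts with exactly "lat" and "lon".

    Hypothetical helper; the range limits are an assumption.
    """
    outlets = []
    for index, (lat, lon) in enumerate(points):
        if not (math.isfinite(lat) and -90.0 <= lat <= 90.0):
            raise ValueError(f"outlet {index}: latitude {lat!r} out of range")
        if not (math.isfinite(lon) and -180.0 <= lon <= 180.0):
            raise ValueError(f"outlet {index}: longitude {lon!r} out of range")
        outlets.append({"lat": float(lat), "lon": float(lon)})
    return outlets
```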

Staged Methods

Engine.delineate() is equivalent to this staged composition:

level = engine.select_level(selection=LevelSelection.FINEST)
outlet = engine.resolve_outlet(level, lat=47.3769, lon=8.5417)
upstream = engine.traverse(outlet)
units = engine.pre_merge_units(upstream)
refinement = engine.refine(outlet, units)
dissolved = engine.dissolve(units, refinement)
result = engine.compose_result(outlet, upstream, units, refinement, dissolved)

The supported order is:

select_level -> resolve_outlet -> traverse -> pre_merge_units -> refine -> dissolve -> compose_result

Each method accepts the typed intermediate from the prior stage. Passing the wrong object type raises TypeError.

Method	Returns	Meaning
`select_level(selection=LevelSelection.FINEST)`	`SelectedLevel`	Selects the finest loaded HFX level
`resolve_outlet(level, *, lat, lon)`	`ResolvedOutlet`	Resolves an outlet at that level
`traverse(outlet)`	`UpstreamUnits`	Traverses same-level upstream unit IDs
`pre_merge_units(upstream)`	`PreMergeDrainageUnits`	Materializes whole source drainage units and whole-unit WKB
`refine(outlet, units)`	`TerminalRefinement`	Runs or skips terminal refinement based on engine config and dataset auxiliaries
`dissolve(units, refinement)`	`DissolvedWatershed`	Produces the final merged geometry and area
`compose_result(...)`	`DelineationResult`	Packages the same merged result shape returned by `delineate()`

LevelSelection.FINEST is currently the only supported selection. Multi-level selection is on the roadmap.

PreMergeDrainageUnits contains whole source drainage units, including the whole terminal unit. When terminal refinement is applied, the sum of the pre-merge unit areas and the union of their geometries differ from the final merged area_km2 and geometry_wkb.
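
The size of that divergence can be checked from the staged outputs. The sketch below reads only documented attributes (units.units[i].area_km2 and dissolved.area_km2); whole_unit_area_difference_km2 is a hypothetical helper. It assumes the source unit areas and the merged area are computed comparably, which the reference does not state.

```python
import math


def whole_unit_area_difference_km2(units, dissolved) -> float:
    """Whole-unit area sum minus the final merged area, in km2.

    Hypothetical helper; `units` is a PreMergeDrainageUnits-like object
    and `dissolved` a DissolvedWatershed-like object.
    """
    whole_unit_sum = math.fsum(unit.area_km2 for unit in units.units)
    return whole_unit_sum - dissolved.area_km2
```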

DelineationResult

Returned by Engine.delineate() and Engine.delineate_batch().

Properties

Property	Type	Meaning
`terminal_unit_id`	`int`	Terminal HFX unit ID that the outlet resolved to
`input_outlet`	`tuple[float, float]`	Original outlet as `(lon, lat)`
`resolved_outlet`	`tuple[float, float]`	Outlet used for resolution as `(lon, lat)`
`refined_outlet`	`tuple[float, float] \| None`	Raster-refined outlet as `(lon, lat)`, or `None` if refinement was not applied
`resolution_method`	`str`	Debug/provenance string describing how outlet resolution happened
`upstream_unit_ids`	`list[int]`	Upstream unit IDs including the terminal unit
`upstream_units`	`list[DelineationUnitMetadata]`	Light per-unit metadata without per-unit geometry
`area_km2`	`float`	Geodesic watershed area in square kilometres
`geometry_bbox`	`tuple[float, float, float, float] \| None`	Geometry bounds as `(minx, miny, maxx, maxy)`, or `None` for empty geometry
`geometry_wkb`	`bytes`	Watershed geometry encoded as OGC WKB bytes

Methods

to_geojson() -> str

Serializes the result as a GeoJSON Feature string.

__repr__() -> str

Returns a concise debug representation including the terminal unit ID, area, and upstream unit count.

DelineationResult intentionally does not expose per-unit WKB. Use the pre_merge_units() staged output when whole-unit geometries are required.
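
geometry_wkb can be handed to any WKB-aware library. For a quick sanity check without extra dependencies, the standard OGC WKB header (one byte-order byte followed by a 32-bit geometry type code) can be decoded directly. This is a sketch: wkb_geometry_type is a hypothetical helper, it covers only the basic 2D type codes, and which geometry type pyshed emits is not stated here.

```python
import struct

# Basic 2D geometry type codes from the OGC WKB encoding.
_WKB_TYPES = {
    1: "Point",
    2: "LineString",
    3: "Polygon",
    4: "MultiPoint",
    5: "MultiLineString",
    6: "MultiPolygon",
    7: "GeometryCollection",
}


def wkb_geometry_type(wkb: bytes) -> str:
    """Return the geometry type name encoded in a WKB header."""
    if len(wkb) < 5:
        raise ValueError("WKB payload is shorter than its 5-byte header")
    byte_order = wkb[0]
    if byte_order == 1:
        fmt = "<I"  # little-endian
    elif byte_order == 0:
        fmt = ">I"  # big-endian
    else:
        raise ValueError(f"invalid WKB byte-order marker: {byte_order}")
    (code,) = struct.unpack_from(fmt, wkb, 1)
    return _WKB_TYPES.get(code, f"unknown ({code})")
```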

DelineationUnitMetadata

Light per-unit metadata retained on a merged result.

Property	Type	Meaning
`id`	`int`	Drainage unit ID
`level`	`int`	HFX drainage-unit level
`area_km2`	`float`	Source unit area
`up_area_km2`	`float \| None`	Source upstream area when present
`outlet`	`tuple[float, float]`	Declared outlet as `(lon, lat)`

Staged Intermediate Classes

PreMergeDrainageUnits

Property	Type	Meaning
`terminal_unit_id`	`int`	Terminal unit ID
`level`	`int`	Selected HFX level
`units`	`list[PreMergeDrainageUnit]`	Whole source drainage-unit metadata
`unit_geometry_wkb`	`list[bytes]`	Whole source drainage-unit WKB geometries
`R3_NOTE`	`str`	Visible note about whole-unit versus refined merged geometry divergence

PreMergeDrainageUnit exposes id, level, area_km2, up_area_km2, and outlet.
TerminalRefinement.status is one of applied, best_effort_skipped, or disabled.
DissolvedWatershed exposes area_km2 and geometry_wkb.

GeoParquet Writers

Exports go through explicit writer objects, and each write() call writes a complete batch.

BasinGeoParquetWriter().write(
    engine,
    "basins.parquet",
    [result],
    basin_ids=["basin-3"],
)

UnitBundleGeoParquetWriter().write(
    engine,
    "unit-bundle.parquet",
    [units],
    [refinement],
)

BasinGeoParquetWriter

write(
    engine: Engine,
    path: str,
    results: list[DelineationResult],
    *,
    basin_ids: list[str] | None = None,
    method: str | None = None,
    allow_default_basin_id: bool = False,
) -> None

Writes one merged-basin row per DelineationResult. basin_ids are caller-supplied filesystem-safe identifiers. The terminal-unit-ID default is allowed only when allow_default_basin_id=True and exactly one result is provided.
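
The basin_ids rule can be expressed as a small resolution function. This is a sketch only: resolve_basin_ids is a hypothetical helper, and three details are assumptions not confirmed by this reference: that one ID is required per result, that the default ID is the terminal unit ID rendered with str(), and that violations surface as ValueError.

```python
def resolve_basin_ids(terminal_unit_ids, basin_ids=None, *, allow_default_basin_id=False):
    """Return one basin ID per result, following the rule described above.

    Hypothetical helper; `terminal_unit_ids` holds result.terminal_unit_id
    for each result in the batch.
    """
    if basin_ids is not None:
        # Assumption: exactly one caller-supplied ID per result.
        if len(basin_ids) != len(terminal_unit_ids):
            raise ValueError("basin_ids must have one entry per result")
        return list(basin_ids)
    # The default is allowed only when opted in and for a single result.
    if allow_default_basin_id and len(terminal_unit_ids) == 1:
        return [str(terminal_unit_ids[0])]
    raise ValueError("basin_ids is required unless the single-result default applies")
```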

UnitBundleGeoParquetWriter

write(
    engine: Engine,
    path: str,
    bundles: list[PreMergeDrainageUnits],
    refinements: list[TerminalRefinement],
    *,
    method: str | None = None,
) -> None

Writes one row per pre-merge drainage unit. Row identity is dataset-local unit_id; grouping columns are terminal_unit_id and delineation; geometry is the whole source unit.

For both writers, the delineation value defaults to {fabric_name}/{fabric_version}/{method}. method defaults to d8-best-effort when engine refinement is enabled and to no-refine when refine=False. The actual per-row outcome is stored separately in refinement_status.
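
The default delineation string can be reproduced as follows, for example to predict the grouping value before writing. default_delineation is a hypothetical helper, and the fabric name and version in the usage note are placeholders, not real dataset identifiers.

```python
def default_delineation(fabric_name, fabric_version, *, refine=True, method=None):
    """Build the default {fabric_name}/{fabric_version}/{method} string.

    Hypothetical helper mirroring the defaulting rule described above.
    """
    if method is None:
        method = "d8-best-effort" if refine else "no-refine"
    return f"{fabric_name}/{fabric_version}/{method}"
```

For a placeholder fabric, default_delineation("example-fabric", "1.0", refine=False) yields "example-fabric/1.0/no-refine".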

AreaOnlyResult

Returned by Engine.delineate(..., geometry=False).

This result exposes scalar metadata and area only. It intentionally does not provide geometry_wkb, geometry_bbox, or to_geojson().

Properties

Property	Type	Meaning
`terminal_unit_id`	`int`	Terminal HFX unit ID that the outlet resolved to
`input_outlet`	`tuple[float, float]`	Original outlet as `(lon, lat)`
`resolved_outlet`	`tuple[float, float]`	Outlet used for resolution as `(lon, lat)`
`refined_outlet`	`tuple[float, float] \| None`	Raster-refined outlet as `(lon, lat)`, or `None` if refinement was not applied
`resolution_method`	`str`	Debug/provenance string describing how outlet resolution happened
`upstream_unit_ids`	`list[int]`	Upstream unit IDs including the terminal unit
`area_km2`	`float`	Geodesic watershed area in square kilometres

Exceptions

class ShedError(Exception): ...
class DatasetError(ShedError): ...
class ResolutionError(ShedError): ...
class AssemblyError(ShedError): ...

These typed exceptions are raised by the engine so callers can distinguish dataset open and read failures, outlet-resolution failures, and geometry-assembly failures from broader engine errors.
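
Because the three specific classes derive from ShedError, handlers should list them before the base class. The sketch below shows that ordering. To stay runnable without the package it defines local stand-ins that copy the hierarchy above; real code would import the classes from pyshed instead. failure_category is a hypothetical helper.

```python
# Local stand-ins mirroring the documented hierarchy (not the real classes).
class ShedError(Exception): ...
class DatasetError(ShedError): ...
class ResolutionError(ShedError): ...
class AssemblyError(ShedError): ...


def failure_category(action):
    """Run `action()` and name the kind of engine failure, or None on success."""
    try:
        action()
    except DatasetError:
        return "dataset"
    except ResolutionError:
        return "resolution"
    except AssemblyError:
        return "assembly"
    except ShedError:
        # Must come last: it would otherwise shadow the subclasses.
        return "engine"
    return None
```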

Module Metadata

__version__: str

Installed package version reported by importlib.metadata.version("pyshed").
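
The same lookup works without importing the package, which is useful in environment checks. A minimal sketch using the standard library; installed_version is a hypothetical helper that returns None when the distribution is not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="pyshed"):
    """Return the installed distribution version, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None
```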
