Developer-oriented reference for the supported public pyshed Python API.
This file mirrors the runtime surface re-exported from
python/pyshed/__init__.py and the shipped
PEP 561 stub in python/pyshed/__init__.pyi.
The `pyshed` package exports these names:

- `Engine`
- `DelineationResult`
- `DelineationUnitMetadata`
- `AreaOnlyResult`
- `LevelSelection`
- `SelectedLevel`
- `ResolvedOutlet`
- `UpstreamUnits`
- `PreMergeDrainageUnit`
- `PreMergeDrainageUnits`
- `TerminalRefinement`
- `DissolvedWatershed`
- `BasinGeoParquetWriter`
- `UnitBundleGeoParquetWriter`
- `ShedError`
- `DatasetError`
- `ResolutionError`
- `AssemblyError`
- `bench_trace`
- `set_log_level`
- `__version__`
_pyshed exists as a compiled implementation detail, but its helper functions
are not part of the supported public API.
```python
set_log_level(level: str) -> None
```

Sets the active log level for both the Rust tracing bridge and the Python `logging` tree.
| Parameter | Type | Meaning |
|---|---|---|
| `level` | `str` | Case-insensitive level name: `"trace"`, `"debug"`, `"info"`, `"warn"`/`"warning"`, or `"error"`/`"critical"` |
Records from Rust code route through pyo3-log under loggers named after their
crate (_pyshed.*, shed_core.*, hfx_core.*). If any relevant logger has no
handler, a StreamHandler is added to that logger automatically, so first-time
users see output without calling logging.basicConfig.
Set the `PYSHED_LOG` environment variable to one of the same level names to opt in at import time.
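The name handling and automatic-handler behaviour described above can be approximated with the standard `logging` module. This is an illustrative sketch, not pyshed's implementation: the helper name `apply_level`, the numeric value `5` for `"trace"` (the stdlib has no TRACE constant), and raising `ValueError` for unknown names are all assumptions.

```python
import logging

# Hypothetical name-to-level table. "trace" has no stdlib constant, so this
# sketch assumes numeric level 5; "warn"/"warning" and "error"/"critical"
# are treated as aliases, matching how the level names are grouped.
LEVELS = {
    "trace": 5,
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warn": logging.WARNING,
    "warning": logging.WARNING,
    "error": logging.ERROR,
    "critical": logging.ERROR,
}

# Logger names under which Rust records arrive via pyo3-log.
RUST_LOGGERS = ("_pyshed", "shed_core", "hfx_core")


def apply_level(level: str) -> int:
    """Set each Rust-bridged logger to `level`, adding a handler where none exists."""
    try:
        numeric = LEVELS[level.lower()]  # names are case-insensitive
    except KeyError:
        raise ValueError(f"unknown log level: {level!r}") from None
    for name in RUST_LOGGERS:
        logger = logging.getLogger(name)
        logger.setLevel(numeric)
        if not logger.handlers:
            # Output appears without the caller configuring logging first.
            logger.addHandler(logging.StreamHandler())
    return numeric
```

In application code the supported entry point remains `set_log_level("debug")`; the sketch only shows why a second call does not stack duplicate handlers.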
```python
bench_trace(path: os.PathLike[str] | str) -> Iterator[None]
```

Context manager that writes Rust stage-span benchmark telemetry to `path` while the context is active.
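The trace file holds one JSON object per line. Only the `kind == "stage"` marker is documented in this reference, so the stdlib sketch below filters on that field and leaves every other field of a record untouched; the helper names `parse_stage_records` and `load_stage_records` are hypothetical.

```python
import json
from typing import Iterable


def parse_stage_records(lines: Iterable[str]) -> list:
    """Keep the JSON objects whose "kind" field equals "stage"."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("kind") == "stage":
            records.append(record)
    return records


def load_stage_records(path: str) -> list:
    """Read a bench_trace JSONL file and return only its stage records."""
    with open(path, encoding="utf-8") as fh:
        return parse_stage_records(fh)
```

After the `with bench_trace(...)` block exits, `load_stage_records("trace.jsonl")` returns the stage records as plain dictionaries.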
```python
from pyshed import Engine, bench_trace

engine = Engine("/path/to/hfx/dataset")
with bench_trace("trace.jsonl"):
    result = engine.delineate(lat=47.3769, lon=8.5417)
# trace.jsonl now contains JSONL records with kind == "stage".
```

```python
Engine(
    dataset_path: str,
    *,
    snap_radius: float | None = None,
    snap_strategy: Literal["distance-first", "weight-first"] | None = None,
    snap_threshold: int | None = None,
    clean_epsilon: float | None = None,
    refine: bool = True,
    repair_geometry: Literal["auto", "gdal", "clean"] | Literal[False] | None = "auto",
    parquet_cache: bool | None = None,
    parquet_cache_max_mb: int = 512,
) -> None
```

Opens an HFX dataset and constructs a delineation engine.
`dataset_path` must point to an HFX v0.2.1 dataset. HFX v0.1 datasets are rejected with an
unsupported-format-version error.
| Parameter | Type | Default | Meaning |
|---|---|---|---|
| `dataset_path` | `str` | required | Path or URL to the HFX dataset root directory |
| `snap_radius` | `float \| None` | `None` | Snap-path search radius in metres; must be finite and positive when provided |
| `snap_strategy` | `"distance-first" \| "weight-first" \| None` | `None` | Snap ranking strategy. Defaults to `"weight-first"` (HFX v0.2 contract) |
| `snap_threshold` | `int \| None` | `None` | Minimum upstream-pixel count for stream-network snapping |
| `clean_epsilon` | `float \| None` | `None` | Topology-cleaning epsilon in degrees |
| `refine` | `bool` | `True` | Whether raster-based terminal refinement is enabled |
| `repair_geometry` | `"auto" \| "gdal" \| "clean" \| False \| None` | `"auto"` | Geometry repair mode. `"auto"`, `"clean"`, `False`, and `None` use pure-Rust topology cleaning; `"gdal"` opts into GDAL repair |
| `parquet_cache` | `bool \| None` | `None` | Enables an in-memory Parquet column-chunk cache for repeated delineations. `None` enables caching for remote URLs and disables it for local paths |
| `parquet_cache_max_mb` | `int` | `512` | Maximum cache size in MiB; must be > 0 when `parquet_cache=True` |
Raises:

- `DatasetError` when the dataset cannot be opened or read.
- `ValueError` when a configuration argument is invalid, such as an unknown `snap_strategy`, an unknown `repair_geometry`, a non-positive `snap_radius`, or `parquet_cache_max_mb=0` when `parquet_cache=True`.
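When constructor options come from a config file, it can help to check them before opening a dataset. The sketch below encodes the `ValueError` conditions listed for the constructor; the function names are hypothetical, and pyshed performs its own validation regardless.

```python
import math


def engine_option_errors(
    snap_radius=None,
    snap_strategy=None,
    repair_geometry="auto",
    parquet_cache=None,
    parquet_cache_max_mb=512,
) -> list:
    """Collect the documented invalid-option conditions for Engine(...)."""
    errors = []
    if snap_radius is not None and not (math.isfinite(snap_radius) and snap_radius > 0):
        errors.append("snap_radius must be finite and positive")
    if snap_strategy not in (None, "distance-first", "weight-first"):
        errors.append(f"unknown snap_strategy: {snap_strategy!r}")
    # Identity checks keep values such as 0 or "" from passing as False/None.
    if not (repair_geometry is False or repair_geometry is None
            or repair_geometry in ("auto", "gdal", "clean")):
        errors.append(f"unknown repair_geometry: {repair_geometry!r}")
    if parquet_cache is True and parquet_cache_max_mb <= 0:
        errors.append("parquet_cache_max_mb must be > 0 when parquet_cache=True")
    return errors


def validate_engine_options(**options) -> None:
    """Raise ValueError listing every invalid option at once."""
    errors = engine_option_errors(**options)
    if errors:
        raise ValueError("; ".join(errors))
```

Reporting all problems together, rather than stopping at the first, is a design choice for config validation, not a claim about how `Engine` reports errors.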
Use these constructor options when the dataset or outlet coordinates need extra
control. They apply to the Engine instance you create.
- `snap_radius` sets the search radius in metres for snapping the outlet point onto the river network. Set it on the constructor and reuse that engine for delineations that need the same search distance. Raise it when outlet coordinates sit off the mapped river.

  ```python
  engine = Engine("/path/to/hfx/dataset", snap_radius=5000)
  ```

- `repair_geometry` controls the geometry repairer. Geometry repair defaults to the pure-Rust topology cleaner. `"gdal"` opts into the GDAL repairer. `"auto"`, `"clean"`, `False`, and `None` all use the default cleaner.
- `parquet_cache` controls the per-`Engine` in-memory cache of recently fetched dataset blocks. The `None` default enables it for remote URLs and disables it for local paths. Repeated delineations in the same session reuse data already fetched, so overlapping watersheds are faster. The cache is held in memory only, never written to disk, and is separate from the persistent metadata cache kept under `HFX_CACHE_DIR` or the OS cache directory.
- `parquet_cache_max_mb` sets the cache size cap in MiB. It defaults to `512` when caching is enabled and must be greater than zero when `parquet_cache=True`.
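The `parquet_cache=None` default can be reasoned about with a small resolver. This reference does not define what counts as a "remote URL", so the sketch assumes any URL scheme other than `file` is remote, and treats single-letter schemes as Windows drive letters. All three helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def is_remote(dataset_path: str) -> bool:
    """Assumed rule: a URL scheme other than file:// marks a remote dataset."""
    scheme = urlparse(dataset_path).scheme
    # urlparse reports "c" for C:\data, so single-letter schemes are drive letters.
    return len(scheme) > 1 and scheme != "file"


def cache_enabled(dataset_path: str, parquet_cache: Optional[bool] = None) -> bool:
    """Resolve the None default: on for remote URLs, off for local paths."""
    if parquet_cache is not None:
        return parquet_cache  # an explicit True/False always wins
    return is_remote(dataset_path)


def cache_cap_bytes(parquet_cache_max_mb: int = 512) -> int:
    """Convert the MiB cap to bytes (1 MiB = 1024 * 1024 bytes)."""
    return parquet_cache_max_mb * 1024 * 1024
```

Passing `parquet_cache=True` explicitly is the way to get caching for a local dataset that is delineated many times in one session.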
```python
delineate(
    *,
    lat: float,
    lon: float,
    geometry: bool = True,
) -> DelineationResult | AreaOnlyResult
```

Delineates the watershed upstream of a single outlet.
| Parameter | Type | Meaning |
|---|---|---|
| `lat` | `float` | Outlet latitude in decimal degrees (EPSG:4326) |
| `lon` | `float` | Outlet longitude in decimal degrees (EPSG:4326) |
| `geometry` | `bool` | When `True`, return a full `DelineationResult`; when `False`, return `AreaOnlyResult` scalar metadata without geometry accessors |
Type checkers see precise overloads:

```python
Engine.delineate(*, lat: float, lon: float) -> DelineationResult
Engine.delineate(*, lat: float, lon: float, geometry=True) -> DelineationResult
Engine.delineate(*, lat: float, lon: float, geometry=False) -> AreaOnlyResult
```

Raises:

- `ValueError` when `lat` or `lon` is outside the valid geographic range.
- `ResolutionError` when the outlet cannot be resolved to a terminal catchment.
- `DatasetError` when underlying dataset reads fail during delineation.
- `AssemblyError` when watershed geometry assembly fails.
- `ShedError` for other engine failures, such as traversal or refinement errors.
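The three call shapes listed for `Engine.delineate` follow the standard `typing.overload` pattern keyed on `Literal[True]` and `Literal[False]`. The sketch below uses stand-in classes (`FullResult` and `AreaOnly` are hypothetical placeholders for `DelineationResult` and `AreaOnlyResult`, not pyshed types) to show how a type checker narrows the return type from a literal `geometry` argument.

```python
from typing import Literal, Union, overload


class FullResult:
    """Stand-in for DelineationResult (geometry included)."""


class AreaOnly:
    """Stand-in for AreaOnlyResult (scalar metadata only)."""


@overload
def delineate(*, lat: float, lon: float) -> FullResult: ...
@overload
def delineate(*, lat: float, lon: float, geometry: Literal[True]) -> FullResult: ...
@overload
def delineate(*, lat: float, lon: float, geometry: Literal[False]) -> AreaOnly: ...
def delineate(*, lat: float, lon: float, geometry: bool = True) -> Union[FullResult, AreaOnly]:
    # Runtime body; the overloads above only affect static type checking.
    return FullResult() if geometry else AreaOnly()
```

Because only literal `True`/`False` overloads are listed, a caller passing a plain `bool` variable may not match any of them in a strict type checker; passing the literal directly keeps the narrowing precise.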
```python
delineate_batch(
    outlets: list[dict[str, float]],
    *,
    progress: Callable[[dict], None] | None = None,
) -> list[DelineationResult]
```

Delineates watersheds for a batch of outlets that share the same engine configuration.
Each outlet must be a dict with exactly these keys:

```python
{"lat": 47.3769, "lon": 8.5417}
```

Results are returned in input order. The call raises on the first failure in that order rather than returning per-outlet error objects.
When progress is supplied, the batch runs sequentially and the callback is
invoked once per outlet (after it completes) with an event dict:
| Key | Type | Present |
|---|---|---|
| `index` | `int` | always |
| `total` | `int` | always |
| `lat` | `float` | always |
| `lon` | `float` | always |
| `duration_ms` | `int` | always |
| `status` | `str` (`"ok"` or `"error"`) | always |
| `n_catchments` | `int` | success only |
| `error` | `str` | failure only |
Exceptions raised by the callback are swallowed and logged via warn!; they do
not interrupt the batch.
Without progress, the batch runs in parallel via Rayon.
Raises:

- `KeyError` when an outlet dict is missing `"lat"` or `"lon"`.
- `ValueError` when any outlet contains invalid coordinates.
- The same typed `pyshed` exceptions as `delineate()` for engine failures.
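Because the batch stops at the first failure, checking outlets up front and watching progress events are both useful. The sketch below pre-validates outlet dicts and formats progress events using the keys from the event table above. The coordinate bounds are an assumption about what "valid geographic range" means, extra keys are not checked, and all helper names are hypothetical.

```python
import math
from typing import Iterable


def check_outlets(outlets: Iterable[dict]) -> list:
    """Pre-validate delineate_batch input; return (lon, lat) pairs in input order."""
    pairs = []
    for i, outlet in enumerate(outlets):
        lat, lon = outlet["lat"], outlet["lon"]  # a missing key raises KeyError
        # Assumed bounds for EPSG:4326 decimal degrees.
        if not (math.isfinite(lat) and -90.0 <= lat <= 90.0):
            raise ValueError(f"outlet {i}: latitude {lat!r} out of range")
        if not (math.isfinite(lon) and -180.0 <= lon <= 180.0):
            raise ValueError(f"outlet {i}: longitude {lon!r} out of range")
        pairs.append((lon, lat))  # same (lon, lat) order as result tuples
    return pairs


def format_progress(event: dict) -> str:
    """Render one progress event as a single line."""
    line = (f"index={event['index']} total={event['total']} "
            f"lat={event['lat']} lon={event['lon']} ")
    if event["status"] == "ok":
        line += f"ok n_catchments={event['n_catchments']}"
    else:
        line += f"error: {event['error']}"
    return f"{line} ({event['duration_ms']} ms)"


def print_progress(event: dict) -> None:
    """Progress callback suitable for delineate_batch(progress=...)."""
    print(format_progress(event))

# Usage with a real engine (not run here); supplying a callback makes the batch sequential:
# check_outlets(outlets)
# results = engine.delineate_batch(outlets, progress=print_progress)
```

Since exceptions raised inside the callback are swallowed and only logged, a bug in a callback like `print_progress` shows up in the logs rather than stopping the batch.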
Engine.delineate() is equivalent to this staged composition:
```python
from pyshed import Engine, LevelSelection

engine = Engine("/path/to/hfx/dataset")
level = engine.select_level(selection=LevelSelection.FINEST)
outlet = engine.resolve_outlet(level, lat=47.3769, lon=8.5417)
upstream = engine.traverse(outlet)
units = engine.pre_merge_units(upstream)
refinement = engine.refine(outlet, units)
dissolved = engine.dissolve(units, refinement)
result = engine.compose_result(outlet, upstream, units, refinement, dissolved)
```

The supported order is:

```
select_level -> resolve_outlet -> traverse -> pre_merge_units -> refine -> dissolve -> compose_result
```
Each method accepts the typed intermediate from the prior stage. Passing the
wrong object type raises TypeError.
| Method | Returns | Meaning |
|---|---|---|
| `select_level(selection=LevelSelection.FINEST)` | `SelectedLevel` | Selects the finest loaded HFX level |
| `resolve_outlet(level, *, lat, lon)` | `ResolvedOutlet` | Resolves an outlet at that level |
| `traverse(outlet)` | `UpstreamUnits` | Traverses same-level upstream unit IDs |
| `pre_merge_units(upstream)` | `PreMergeDrainageUnits` | Materializes whole source drainage units and whole-unit WKB |
| `refine(outlet, units)` | `TerminalRefinement` | Runs or skips terminal refinement based on engine config and dataset auxiliaries |
| `dissolve(units, refinement)` | `DissolvedWatershed` | Produces the final merged geometry and area |
| `compose_result(...)` | `DelineationResult` | Packages the same merged result shape returned by `delineate()` |
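Several stages consume intermediates from more than one earlier step (`refine` needs the outlet as well as the units, and `compose_result` needs five objects), so code that runs stages separately must keep earlier results alive. The data flow of the staged example above can be written down and checked mechanically; this is a plain-Python illustration with hypothetical names, not part of pyshed.

```python
# Stage name -> (intermediates it consumes, intermediate it produces),
# transcribed from the staged Engine.delineate() example.
PIPELINE = [
    ("select_level", (), "level"),
    ("resolve_outlet", ("level",), "outlet"),
    ("traverse", ("outlet",), "upstream"),
    ("pre_merge_units", ("upstream",), "units"),
    ("refine", ("outlet", "units"), "refinement"),
    ("dissolve", ("units", "refinement"), "dissolved"),
    ("compose_result", ("outlet", "upstream", "units", "refinement", "dissolved"), "result"),
]


def missing_inputs(stages) -> list:
    """Return (stage, input) pairs whose input no earlier stage produced."""
    produced = set()
    missing = []
    for name, inputs, output in stages:
        missing.extend((name, need) for need in inputs if need not in produced)
        produced.add(output)
    return missing
```

An empty result for `PIPELINE` confirms that the documented order satisfies every stage's inputs; dropping `select_level` leaves `resolve_outlet` without its `level` argument.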
LevelSelection.FINEST is the only valid selection currently supported.
Multi-level selection is on the roadmap.
PreMergeDrainageUnits contains whole source drainage units, including the
whole terminal unit. When terminal refinement is applied, summing or unioning
pre-merge units is not the same as the final merged area_km2 or
geometry_wkb.
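To see how far the whole-unit totals diverge from the merged result for a given outlet, the two areas can be compared directly. A minimal sketch with a hypothetical helper name; it does not assume the sign or size of the difference, which depends on the data.

```python
import math
from typing import Iterable


def premerge_area_difference(unit_areas_km2: Iterable[float], merged_area_km2: float) -> float:
    """Sum of whole pre-merge unit areas minus the merged area, both in km^2."""
    # fsum avoids accumulating rounding error over many small units.
    return math.fsum(unit_areas_km2) - merged_area_km2
```

With the staged outputs, this would be called as `premerge_area_difference((u.area_km2 for u in units.units), result.area_km2)`.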
`DelineationResult` is returned by `Engine.delineate()` and `Engine.delineate_batch()`.
| Property | Type | Meaning |
|---|---|---|
| `terminal_unit_id` | `int` | Terminal HFX unit ID that the outlet resolved to |
| `input_outlet` | `tuple[float, float]` | Original outlet as `(lon, lat)` |
| `resolved_outlet` | `tuple[float, float]` | Outlet used for resolution as `(lon, lat)` |
| `refined_outlet` | `tuple[float, float] \| None` | Raster-refined outlet as `(lon, lat)`, or `None` if refinement was not applied |
| `resolution_method` | `str` | Debug/provenance string describing how outlet resolution happened |
| `upstream_unit_ids` | `list[int]` | Upstream unit IDs, including the terminal unit |
| `upstream_units` | `list[DelineationUnitMetadata]` | Light per-unit metadata without per-unit geometry |
| `area_km2` | `float` | Geodesic watershed area in square kilometres |
| `geometry_bbox` | `tuple[float, float, float, float] \| None` | Geometry bounds as `(minx, miny, maxx, maxy)`, or `None` for empty geometry |
| `geometry_wkb` | `bytes` | Watershed geometry encoded as OGC WKB bytes |
Methods:

- `to_geojson() -> str` serializes the result as a GeoJSON Feature string.
- `__repr__() -> str` returns a concise debug representation including the terminal unit ID, area, and upstream unit count.
DelineationResult intentionally does not expose per-unit WKB. Use the
pre_merge_units() staged output when whole-unit geometries are required.
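Geometry is exposed as OGC WKB, which a geometry library normally decodes (for example `shapely.wkb.loads(result.geometry_wkb)` with Shapely installed). When only the geometry type is needed, the WKB header can be read with the standard library. This sketch handles 2D and ISO Z/M type codes only (not PostGIS EWKB flags), and it does not assume which type pyshed emits.

```python
import struct

# OGC Simple Features geometry type codes for 2D geometries.
WKB_TYPES = {
    1: "Point",
    2: "LineString",
    3: "Polygon",
    4: "MultiPoint",
    5: "MultiLineString",
    6: "MultiPolygon",
    7: "GeometryCollection",
}


def wkb_geometry_type(wkb: bytes) -> str:
    """Read the geometry type from a WKB header (byte-order flag + uint32 type)."""
    if len(wkb) < 5 or wkb[0] not in (0, 1):
        raise ValueError("not a WKB header")
    order = "<" if wkb[0] == 1 else ">"  # 1 = little-endian (NDR), 0 = big-endian (XDR)
    (code,) = struct.unpack(order + "I", wkb[1:5])
    # ISO WKB adds 1000/2000/3000 for Z/M/ZM variants; fold them to the base type.
    return WKB_TYPES.get(code % 1000, f"unknown ({code})")
```

For example, `wkb_geometry_type(result.geometry_wkb)` tells a caller whether to expect a single polygon or a multipolygon before converting it.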
`DelineationUnitMetadata` carries the light per-unit metadata retained on a merged result.
| Property | Type | Meaning |
|---|---|---|
| `id` | `int` | Drainage unit ID |
| `level` | `int` | HFX drainage-unit level |
| `area_km2` | `float` | Source unit area |
| `up_area_km2` | `float \| None` | Source upstream area when present |
| `outlet` | `tuple[float, float]` | Declared outlet as `(lon, lat)` |
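The most common trap with these tuples is their `(lon, lat)` order. A small sketch (hypothetical helper name) that flattens per-unit metadata into dict rows with explicitly named coordinate columns, ready for `csv.DictWriter` or a DataFrame constructor:

```python
def unit_rows(units) -> list:
    """Flatten objects with id/level/area_km2/up_area_km2/outlet into dict rows."""
    rows = []
    for unit in units:
        lon, lat = unit.outlet  # outlets are (lon, lat), not (lat, lon)
        rows.append({
            "id": unit.id,
            "level": unit.level,
            "area_km2": unit.area_km2,
            "up_area_km2": unit.up_area_km2,  # may be None
            "outlet_lon": lon,
            "outlet_lat": lat,
        })
    return rows
```

It works on `result.upstream_units` and, because `PreMergeDrainageUnit` exposes the same attributes, on `units.units` from the staged API.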
`PreMergeDrainageUnits` exposes:

| Property | Type | Meaning |
|---|---|---|
| `terminal_unit_id` | `int` | Terminal unit ID |
| `level` | `int` | Selected HFX level |
| `units` | `list[PreMergeDrainageUnit]` | Whole source drainage-unit metadata |
| `unit_geometry_wkb` | `list[bytes]` | Whole source drainage-unit WKB geometries |
| `R3_NOTE` | `str` | Visible note about whole-unit versus refined merged geometry divergence |
PreMergeDrainageUnit exposes id, level, area_km2, up_area_km2, and
outlet. TerminalRefinement.status is one of applied,
best_effort_skipped, or disabled. DissolvedWatershed exposes
area_km2 and geometry_wkb.
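This reference lists `units` and `unit_geometry_wkb` as separate lists but does not state that they are index-aligned. The sketch below assumes they are, pairs each unit with its geometry, and fails loudly when the lengths differ; the helper name is hypothetical.

```python
def units_with_geometry(bundle) -> list:
    """Return (unit_metadata, unit_wkb) pairs from a PreMergeDrainageUnits-like object.

    Assumption: units[i] and unit_geometry_wkb[i] describe the same unit.
    """
    units = list(bundle.units)
    geoms = list(bundle.unit_geometry_wkb)
    if len(units) != len(geoms):
        raise ValueError(f"{len(units)} units but {len(geoms)} geometries")
    return list(zip(units, geoms))
```

A lookup by unit ID then follows as `{unit.id: wkb for unit, wkb in units_with_geometry(units)}`.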
Exports are explicit writer-object calls and write complete batches.
```python
from pyshed import BasinGeoParquetWriter, UnitBundleGeoParquetWriter

# engine, result, units and refinement come from the staged example above.
BasinGeoParquetWriter().write(
    engine,
    "basins.parquet",
    [result],
    basin_ids=["basin-3"],
)
UnitBundleGeoParquetWriter().write(
    engine,
    "unit-bundle.parquet",
    [units],
    [refinement],
)
```

`BasinGeoParquetWriter.write`:

```python
write(
    engine: Engine,
    path: str,
    results: list[DelineationResult],
    *,
    basin_ids: list[str] | None = None,
    method: str | None = None,
    allow_default_basin_id: bool = False,
) -> None
```

Writes one merged-basin row per `DelineationResult`. `basin_ids` are
caller-supplied filesystem-safe identifiers. The terminal-unit-ID default is
allowed only when allow_default_basin_id=True and exactly one result is
provided.
`UnitBundleGeoParquetWriter.write`:

```python
write(
    engine: Engine,
    path: str,
    bundles: list[PreMergeDrainageUnits],
    refinements: list[TerminalRefinement],
    *,
    method: str | None = None,
) -> None
```

Writes one row per pre-merge drainage unit. Row identity is dataset-local
unit_id; grouping columns are terminal_unit_id and delineation; geometry
is the whole source unit.
For both writers, default delineation is
{fabric_name}/{fabric_version}/{method}. method defaults to
d8-best-effort when engine refinement is enabled and no-refine when
refine=False. The per-row actual outcome is stored separately in
refinement_status.
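The default `delineation` label can be reproduced as below. This reference does not say where pyshed reads `fabric_name` and `fabric_version` from, so the sketch takes them as plain arguments; the helper name and example values are hypothetical.

```python
from typing import Optional


def default_delineation(fabric_name: str, fabric_version: str,
                        method: Optional[str] = None, refine: bool = True) -> str:
    """Build the default delineation value {fabric_name}/{fabric_version}/{method}."""
    if method is None:
        # The default method follows the engine's refine setting.
        method = "d8-best-effort" if refine else "no-refine"
    return f"{fabric_name}/{fabric_version}/{method}"
```

Because the label reflects configuration rather than outcome, a row labelled `.../d8-best-effort` can still carry a `refinement_status` of `best_effort_skipped`.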
`AreaOnlyResult` is returned by `Engine.delineate(..., geometry=False)`.
This result exposes scalar metadata and area only. It intentionally does not
provide geometry_wkb, geometry_bbox, or to_geojson().
| Property | Type | Meaning |
|---|---|---|
| `terminal_unit_id` | `int` | Terminal HFX unit ID that the outlet resolved to |
| `input_outlet` | `tuple[float, float]` | Original outlet as `(lon, lat)` |
| `resolved_outlet` | `tuple[float, float]` | Outlet used for resolution as `(lon, lat)` |
| `refined_outlet` | `tuple[float, float] \| None` | Raster-refined outlet as `(lon, lat)`, or `None` if refinement was not applied |
| `resolution_method` | `str` | Debug/provenance string describing how outlet resolution happened |
| `upstream_unit_ids` | `list[int]` | Upstream unit IDs, including the terminal unit |
| `area_km2` | `float` | Geodesic watershed area in square kilometres |
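Code that accepts either result type can stick to the shared scalar properties and treat geometry as optional. A sketch with a hypothetical helper name; it assumes that reading a missing property raises `AttributeError`, which is standard Python behaviour and lets `getattr` fall back to `None`.

```python
def result_summary(result) -> dict:
    """Summarize a DelineationResult or AreaOnlyResult using shared scalar fields."""
    in_lon, in_lat = result.input_outlet  # (lon, lat)
    return {
        "terminal_unit_id": result.terminal_unit_id,
        "input_lon": in_lon,
        "input_lat": in_lat,
        "area_km2": result.area_km2,
        "n_upstream_units": len(result.upstream_unit_ids),
        # AreaOnlyResult has no geometry accessors, so fall back to None.
        "bbox": getattr(result, "geometry_bbox", None),
    }
```

This lets a screening pass with `geometry=False` and a full pass with `geometry=True` feed the same report format.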
```python
class ShedError(Exception): ...
class DatasetError(ShedError): ...
class ResolutionError(ShedError): ...
class AssemblyError(ShedError): ...
```

These typed exceptions are raised by the engine so callers can distinguish dataset open and read failures, outlet-resolution failures, and geometry-assembly failures from broader engine errors (`ShedError`).
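Since every typed error subclasses `ShedError`, the order of checks matters: testing `ShedError` first would swallow the specific cases. This sketch uses local stand-in classes that mirror the documented hierarchy (real code would import them from `pyshed`); the category labels and `classify_failure` are hypothetical. Invalid coordinates raise `ValueError`, which is not a `ShedError`.

```python
# Stand-ins mirroring the documented hierarchy; import the real ones from pyshed.
class ShedError(Exception): ...
class DatasetError(ShedError): ...
class ResolutionError(ShedError): ...
class AssemblyError(ShedError): ...


# Specific subclasses come before ShedError, which would otherwise match them all.
_CATEGORIES = (
    (ResolutionError, "unresolved-outlet"),
    (DatasetError, "dataset"),
    (AssemblyError, "assembly"),
    (ShedError, "engine"),
    (ValueError, "bad-input"),
)


def classify_failure(exc: BaseException) -> str:
    """Map an exception raised by delineate() to a coarse reporting category."""
    for cls, label in _CATEGORIES:
        if isinstance(exc, cls):
            return label
    return "other"
```

The same ordering applies to `except` clauses in a `try` block around `engine.delineate(...)`.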
`__version__: str` is the installed package version, as reported by `importlib.metadata.version("pyshed")`.