HFX

Just want to delineate watersheds? You do not need this spec. Install the Python package (pip install pyshed) or use the reference engine, shed (github.com/CooperBigFoot/shed). Read on only if you are building or validating HFX datasets.

HFX (HydroFabric Exchange) is an open specification and toolkit for a compiled drainage format. It lets watershed delineation engines consume any source hydrofabric through a single normalized contract. A hydrofabric is a dataset describing a region's river network: the streams, how they connect, and the land area draining to each.

The core idea is simple: adapters compile source-specific hydrofabrics such as HydroBASINS, GRIT, or MERIT Hydro into HFX once, offline. Engines then consume HFX exclusively, with no fabric-specific logic in the hot path.

Why HFX Exists

Today a watershed delineation tool is usually wired to a single hydrofabric, so moving to a different fabric, or comparing results across several, means re-tooling the engine for each one's format, identifiers, and edge cases. The same fragmentation blocks AI agents and automated pipelines, which need one documented, machine-checkable contract to target instead of many bespoke, often undocumented formats. HFX is that contract, and it splits the work so each side stays simple:

Adapters handle the source-specific ETL and normalization, once and offline.
The engine reads one compiled contract, with no fabric-specific logic in its hot path.
A validator checks each compiled dataset, instead of every upstream source format.

Architecture

flowchart LR
    A[Source Hydrofabric<br/>HydroBASINS / GRIT / MERIT Hydro / ...] --> B[Adapter<br/>offline compile step]
    B --> C[HFX Dataset<br/>normalized artifacts]
    C --> D[Delineation Engine<br/>HFX only]

This is a two-layer architecture:

Source-specific adapters run once and produce a self-contained HFX dataset.
The engine consumes only HFX artifacts and applies runtime traversal policy without knowing the source fabric.

HFX Dataset Layout

An HFX dataset is a single folder containing these artifacts:

Artifact	Purpose
`catchments.parquet`	Drainage-unit polygons (the area of land draining to each river reach), levels, parent links, outlets, and bbox columns for row-group pruning (skipping blocks of the Parquet file whose bounding boxes cannot overlap a query, so a remote read fetches less data)
`graph.parquet`	Same-level upstream adjacency graph (a list of which drainage unit flows directly into which, used to traverse upstream)
`snap.parquet`	Optional reach or node geometries used for outlet snapping
`manifest.json`	Dataset metadata describing fabric identity, CRS, topology class (the network's connectivity model, e.g. whether flow paths only branch apart or can also rejoin), counts, and auxiliary declarations
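
The row-group pruning mentioned for catchments.parquet can be sketched in a few lines. This is an illustration only, not the reference engine's code: the RowGroupBBox type, its field names, and the sample extents are hypothetical, and a real reader would presumably take each row group's extent from the min/max statistics of the bbox columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowGroupBBox:
    """Hypothetical per-row-group extent (e.g. from bbox column min/max statistics)."""
    index: int
    minx: float
    miny: float
    maxx: float
    maxy: float


def overlaps(rg, query):
    """Return True when the row group's extent intersects the query bbox (touching counts)."""
    qminx, qminy, qmaxx, qmaxy = query
    return not (rg.maxx < qminx or rg.minx > qmaxx or rg.maxy < qminy or rg.miny > qmaxy)


def row_groups_to_read(groups, query):
    """Keep only the row groups that could hold a catchment overlapping the query."""
    return [rg.index for rg in groups if overlaps(rg, query)]


# Made-up extents for three row groups.
groups = [
    RowGroupBBox(0, -10.0, 40.0, 0.0, 50.0),
    RowGroupBBox(1, 0.0, 40.0, 10.0, 50.0),
    RowGroupBBox(2, 100.0, -10.0, 110.0, 0.0),
]
query = (5.0, 45.0, 6.0, 46.0)  # minx, miny, maxx, maxy in EPSG:4326 degrees
selected = row_groups_to_read(groups, query)
print(selected)
```

Only the row groups returned by row_groups_to_read need to be fetched; the rest can be skipped without reading their bytes, which is what makes remote reads cheaper.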

Auxiliary data such as paired D8 rasters is declared in manifest.json, for example with hfx.aux.d8_raster.v1 entries pointing at flow_dir.tif (the flow-direction grid) and flow_acc.tif (the flow-accumulation grid, which gives the number of upstream cells draining through each cell). In the D8 flow model, each cell drains to whichever of its 8 neighbors lies steepest downhill; the two rasters form a matched pair of grids.
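
To make the D8 model concrete, here is a minimal pure-Python sketch that derives flow direction and flow accumulation from a tiny elevation grid. It is not how any HFX adapter produces its rasters: the function names and sample grid are hypothetical, and directions are stored as (row, column) offsets purely for readability, which should not be read as the encoding used in flow_dir.tif.

```python
import math

# The 8 neighbor offsets (row, col) of the D8 model.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_flow_direction(elev):
    """For each cell, return the offset of its steepest downhill neighbor, or None for a pit."""
    rows, cols = len(elev), len(elev[0])
    direction = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_slope = 0.0
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Slope is drop over distance; diagonal neighbors are sqrt(2) away.
                    slope = (elev[r][c] - elev[nr][nc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best_slope = slope
                        direction[r][c] = (dr, dc)
    return direction


def d8_flow_accumulation(elev, direction):
    """Count, for each cell, how many upstream cells drain through it."""
    rows, cols = len(elev), len(elev[0])
    acc = [[0] * cols for _ in range(rows)]
    # Flow is strictly downhill, so visiting cells from highest to lowest
    # guarantees every donor is finished before its receiver is read.
    cells = sorted(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: elev[rc[0]][rc[1]],
        reverse=True,
    )
    for r, c in cells:
        if direction[r][c] is not None:
            dr, dc = direction[r][c]
            acc[r + dr][c + dc] += acc[r][c] + 1
    return acc


# Made-up elevations sloping toward the bottom-right corner.
elev = [
    [9, 8, 7],
    [8, 5, 4],
    [7, 4, 1],
]
direction = d8_flow_direction(elev)
acc = d8_flow_accumulation(elev, direction)
print(direction[0][0], acc[2][2])
```

In this sample the bottom-right cell is the lowest point, so every other cell ends up draining through it.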

v0.2 Scope

Current design boundaries for HFX v0.2:

Multi-level drainage units with explicit parent relationships.
Same-level graph traversal only; cross-level hierarchy lives in parent_id.
Optional snap features are separate from required unit outlets.
Auxiliary artifacts are manifest-declared, not first-class core files.
EPSG:4326 is required.
Each dataset is self-contained in a single folder.
The manifest describes the data, not engine traversal policy.
The graph supports both tree and DAG topologies. In a tree, each drainage unit drains to at most one downstream unit (terminal outlets have none). In a DAG (directed acyclic graph), a unit may drain to more than one downstream unit, as with braided or anabranching rivers, but flow never loops back on itself.
Adapter implementation is intentionally out of scope for the spec: any tool that produces conformant artifacts is valid.
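
The same-level traversal and the tree/DAG distinction above can be sketched as follows. This is a minimal illustration under stated assumptions: the graph is represented as unique (unit_id, downstream_id) pairs, which is a hypothetical in-memory form and not the actual column layout of graph.parquet, and the function names are invented for the example.

```python
from collections import defaultdict, deque


def build_upstream_index(edges):
    """Map each unit id to the list of units that drain directly into it."""
    upstream = defaultdict(list)
    for unit_id, downstream_id in edges:
        upstream[downstream_id].append(unit_id)
    return upstream


def upstream_units(outlet_id, upstream):
    """Breadth-first walk collecting the outlet and every unit upstream of it."""
    seen = {outlet_id}
    queue = deque([outlet_id])
    while queue:
        current = queue.popleft()
        for neighbor in upstream.get(current, []):
            # In a DAG the same unit can be reached along more than one path.
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def classify_topology(edges):
    """Return 'tree' if no unit has more than one downstream unit, else 'dag'.

    Assumes the edges are unique and already acyclic; cycles are not checked here.
    """
    downstream_count = defaultdict(int)
    for unit_id, _ in edges:
        downstream_count[unit_id] += 1
    return "dag" if any(n > 1 for n in downstream_count.values()) else "tree"


tree_edges = [(1, 3), (2, 3), (3, 4)]
dag_edges = [(1, 2), (1, 3), (2, 4), (3, 4)]  # unit 1 splits, then rejoins at unit 4

print(classify_topology(tree_edges), classify_topology(dag_edges))
print(sorted(upstream_units(4, build_upstream_index(dag_edges))))
```

The seen set is what keeps the walk correct on DAGs: without it, a unit upstream of a split would be visited once per rejoining path.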

Repository Layout

This repository is organized as a spec-first monorepo:

Path	Purpose
`spec/`	Canonical HFX specification and spec changelog
`schemas/`	Machine-readable schema artifacts, starting with the manifest schema
`examples/`	Reference datasets and implementer-facing examples
`conformance/`	Valid and invalid fixtures for validator and interoperability work
`crates/`	Rust toolkit crates, including shared logic and the validator CLI
`adapters/`	Source-fabric compilers (GRIT and MERIT, both working)
`docs/decisions/`	Short decision records for important spec and architecture choices
`scripts/`	Repo helper scripts and release support utilities

Source Of Truth

The primary normative artifact is the development specification at spec/HFX_SPEC.md.

Supporting public interfaces live alongside it:

schemas/manifest.schema.json defines the manifest contract in machine-readable form.
examples/ holds reference datasets for implementers.
conformance/ holds validator fixtures and intentionally invalid datasets.

The validator and the adapters exist to serve the specification, not to define it.

Validator CLI

The validator is published as the hfx-validator crate and installs the hfx binary:

cargo install hfx-validator
hfx ./path/to/dataset

For machine-readable output:

hfx --format json ./path/to/dataset

--strict promotes warnings to errors.
--skip-rasters skips flow_dir.tif and flow_acc.tif checks.
Exit code 0 means the dataset is valid; exit code 1 means it is invalid.
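
For pipelines that call the validator from Python, a thin wrapper along these lines may be enough. It is a sketch that uses only the flags and exit codes listed above; the function names are hypothetical, and it assumes the flags can be combined and placed before the dataset path. The structure of the JSON output is not described here, so the sketch does not parse it.

```python
import subprocess


def build_hfx_command(dataset_path, strict=False, skip_rasters=False, json_output=False):
    """Assemble an argv list using only the documented hfx flags."""
    cmd = ["hfx"]
    if json_output:
        cmd += ["--format", "json"]
    if strict:
        cmd.append("--strict")
    if skip_rasters:
        cmd.append("--skip-rasters")
    cmd.append(str(dataset_path))
    return cmd


def is_valid_dataset(dataset_path, **options):
    """Run the validator: exit code 0 means valid, exit code 1 means invalid."""
    result = subprocess.run(
        build_hfx_command(dataset_path, **options),
        capture_output=True,
        text=True,
    )
    if result.returncode not in (0, 1):
        # Other exit codes are not documented above, so surface them instead of guessing.
        raise RuntimeError(f"unexpected exit code {result.returncode}: {result.stderr}")
    return result.returncode == 0


print(build_hfx_command("./path/to/dataset", strict=True))
```

Calling is_valid_dataset("./path/to/dataset", strict=True) requires the hfx binary to be installed and on PATH.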

Validation behavior is defined against spec/HFX_SPEC.md.

Datasets

One reference dataset is hosted on Cloudflare R2 and is readable directly over HTTPS via range requests, so engines do not have to download it in full. Upstream Tech sponsors the hosting infrastructure only; it is not the publisher or vendor of the data.

GRIT 2.0.0 HFX

An HFX compilation of the Global River Topology (GRIT) vector datasets, with segment (level=0) and reach (level=1) drainage units.

Manifest: https://basin-delineations-public.upstream.tech/grit/hfx-v0.3.0/manifest.json

Property	Value
`format_version`	0.3.0
`fabric_name`	grit
`fabric_version`	1.0.0
`adapter_version`	grit-global-2.0.0
`unit_count`	22,337,300
`topology`	dag
CRS	EPSG:4326
Auxiliaries	two `hfx.aux.snap.v2` snap indexes (segment-stems, reach-stems)
Total size	~43 GB

Objects (base URL https://basin-delineations-public.upstream.tech/grit/hfx-v0.3.0/):

manifest.json
catchments.parquet
graph.parquet
aux/snap_segments.parquet
aux/snap_reaches.parquet
NOTICE, CITATION.txt, README.md (attribution objects)
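
Because the hosted objects can be read with HTTPS range requests, a client can inspect a Parquet file without downloading it. The sketch below fetches only the last 8 bytes of an object, which in the Parquet format hold the footer length (4 bytes, little-endian) followed by the magic bytes PAR1. The function names are hypothetical, and real engines would normally let a Parquet library issue these requests for them.

```python
import struct
import urllib.request

PARQUET_MAGIC = b"PAR1"


def suffix_range_header(num_bytes):
    """HTTP Range header asking for only the last num_bytes of an object."""
    return {"Range": f"bytes=-{num_bytes}"}


def parse_parquet_tail(tail):
    """Given the last 8 bytes of a Parquet file, return the footer metadata length."""
    if len(tail) != 8 or tail[4:] != PARQUET_MAGIC:
        raise ValueError("not the tail of a Parquet file")
    (footer_len,) = struct.unpack("<I", tail[:4])
    return footer_len


def fetch_footer_length(url):
    """Fetch just the 8-byte tail over HTTPS and report the footer size in bytes."""
    request = urllib.request.Request(url, headers=suffix_range_header(8))
    with urllib.request.urlopen(request) as response:
        if response.status != 206:
            # Anything other than 206 Partial Content means the range was not honored.
            raise RuntimeError(f"range request not honored (HTTP {response.status})")
        return parse_parquet_tail(response.read())


# Offline demonstration with a synthetic 8-byte tail.
print(parse_parquet_tail(struct.pack("<I", 1234) + PARQUET_MAGIC))
```

Passing the URL of catchments.parquet or graph.parquet under the base URL above to fetch_footer_length would transfer 8 bytes rather than the whole object; a second range request of that footer length would then retrieve the file metadata.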

License and attribution: CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/), inherited from the source data, so the dataset is for NonCommercial use only. Any use of this dataset must credit the source data authors:

Wortmann, M. et al. (2025) “Global River Topology (GRIT) vector datasets”. Zenodo. doi:10.5281/zenodo.17435232.

DOI: https://doi.org/10.5281/zenodo.17435232

Validate any local copy:

hfx ./path/to/dataset --strict

Versioning

HFX carries two independent version tracks. The SPEC track is the format version itself (format_version, currently 0.3.0). The TOOLKIT track covers the workspace crates hfx-core and hfx-validator, which are released in lockstep; their next curated release is 0.4.0, which implements spec 0.3.0. See docs/VERSIONING.md for the full policy, the spec-to-toolkit mapping table, and the release procedure.

Implementations

HFX is the standard; engines and adapters implement it. shed is the reference engine implementation: it consumes HFX datasets directly and ships the Python bindings.

Contributions are welcome on the adapter side: a HydroBASINS adapter is the most-wanted next adapter. See docs/ADAPTER_GUIDE.md for how to build one and adapters/ for the existing compilers.

Status

HFX spec 0.3.0 is the current format version, and the hosted GRIT 2.0.0 dataset conforms to it. hfx-core and hfx-validator 0.3.0 are published on crates.io; the next curated toolkit release, 0.4.0, implements spec 0.3.0.
