English | 简体中文
One sentence in natural language drives the full calibration workflow — from data validation to model evaluation.
hydroagent_demo.mp4
Demo video (~50 s)
HydroAgent is a hydrological model calibration system built on a Large Language Model (LLM) agentic loop. The user describes a task in plain natural language; the agent autonomously decides which tools to call and in what order, and finally returns calibrated parameters, evaluation metrics, and an analysis report.
It supports conceptual hydrological models such as GR4J and XAJ over the CAMELS-US dataset, with two calibration modes: classical SCE-UA / GA optimization and LLM-guided calibration.
HydroAgent uses a five-layer architecture with clearly separated responsibilities:
Brain agent.py LLM reasoning & tool dispatch (ReAct loop)
Cerebellum skills/*/skill.md Skill workflow guides, injected into the system prompt
Spine adapters/ PackageAdapter — bidirectional intent <-> package translation
Nerve tools/*.py Thin tool-routing functions
Muscle hydromodel pkg Numerical work: SCE-UA optimization, GR4J/XAJ simulation
Beyond the default ReAct loop (agent.py), the system also offers a Plan-and-Execute pipeline mode (pipeline.py), and the ability to spawn sub-agents (e.g. basin explorer, calibration worker) via tools/spawn_agent.py + agents/*.md.
The system prompt is assembled dynamically from seven sections on every agent.run():
| Section | Source | Content |
|---|---|---|
| §1 | skills/system.md |
Identity and behavioral principles |
| §1.5 | policy/*.md |
Behavioral constraints (when to stop, when to ask) |
| §1.7 | skills/expert_hydrologist/skill.md |
Hydrologist cognitive framework (always injected) |
| §2 | skill_registry.py |
Index of available skills and their read paths |
| §3 | adapters/ |
PackageAdapter capability docs |
| §4 | knowledge/*.md |
Domain knowledge (parameter meanings, failure diagnosis) |
| §5 | basin_profiles/ + MEMORY.md |
Basin history and cross-session memory |
git clone https://github.com/zhuanglaihong/HydroAgent.git
cd HydroAgent
python -m venv .venv
# Windows
.venv\Scripts\python.exe -m pip install -e .
# Linux / macOS
.venv/bin/python -m pip install -e .Note on
hydromodel.pip installwill automatically pull thedevbranch ofOuyangWenyu/hydromodelfrom GitHub, not the PyPI release. Thedevbranch contains a fix for a silent parameter-range bug and an ~290× GR4J speed-up that the experiments rely on. If you ever see "changing the parameter range has no effect on the NSE", you have somehow ended up on the PyPI version — reinstall.
cp configs/example_private.py configs/private.pyEdit configs/private.py and fill in the required fields:
OPENAI_API_KEY = "sk-your-api-key"
OPENAI_BASE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1" # or DeepSeek / OpenAI endpoint
DATASET_DIR = r"/path/to/CAMELS_US_parent" # parent directory of CAMELS_US/
RESULT_DIR = r"/path/to/results"Any OpenAI-compatible endpoint works (Qwen, DeepSeek, OpenAI, local Ollama).
HydroAgent offers two user-facing interactive modes and one developer-facing scripting mode:
Terminal mode (concise output, for daily use):
python -m hydroagent # interactive REPL
python -m hydroagent "Calibrate GR4J for basin 12025000" # single query, then exit
python -m hydroagent -w results/my_experiment # set the workspaceWeb mode (browser UI, FastAPI + WebSocket):
python -m hydroagent --server # default http://localhost:7860
python -m hydroagent --server --port 8080Developer scripting mode (paper experiments / system tests):
Instantiate HydroAgent directly in a Python script, recording the full tool-call
sequence and intermediate results:
from hydroagent.agent import HydroAgent
from hydroagent.interface.ui import ConsoleUI
agent = HydroAgent(workspace=Path("results/exp1"), ui=ConsoleUI(mode="dev"))
agent.run("Calibrate GR4J for basin 12025000 with SCE-UA")This is also the path used by the paper experiments.
User> Calibrate GR4J for basin 12025000
Agent:
[tool] validate_basin(["12025000"]) -> data complete
[tool] calibrate_model("gr4j", "SCE_UA") -> train NSE=0.783
[tool] evaluate_model(period="test") -> test NSE=0.747
GR4J calibration for basin 12025000 is complete.
Best parameters: x1=1180.6 mm, x2=-3.94, x3=36.9 mm, x4=1.22 days
Train NSE=0.783, test NSE=0.747; results saved to results/gr4j_12025000/
Rather than proposing parameter values directly, the LLM analyzes whether parameters hit their bounds, dynamically adjusts the search range, and lets SCE-UA iterate:
Round 1: default range -> SCE-UA -> NSE=0.65, x1=1998 (hits upper bound 2000)
LLM diagnosis: "x1 keeps hitting the upper bound; basin storage capacity is
underestimated. Suggest extending to [1, 3000]."
Round 2: new range -> SCE-UA -> NSE=0.74, no boundary-hitting parameters
LLM diagnosis: "NSE acceptable and no parameter at its bound. Converged."
After every successful calibration a basin profile is saved automatically
(basin_profiles/<basin_id>.json). On the next task for the same basin, the profile is
injected into the LLM context to support prior initialization and comparative analysis.
skills/expert_hydrologist/skill.md is an always-injected cognitive skill that encodes a
senior hydrologist's reasoning patterns: three mental models (parameters as hypotheses,
classify-before-search, three-layer error onion), five decision heuristics, and four
honesty boundaries.
When a required capability or data source is outside the current scope, the agent can
extend itself: create_skill generates a new skill package (skill.md + tool
implementation, usable immediately), and create_adapter generates a new PackageAdapter
skeleton and reloads it automatically.
HydroAgent/
├── hydroagent/
│ ├── agent.py # Agentic loop core (ReAct)
│ ├── pipeline.py # Plan-and-Execute pipeline mode (alternative path)
│ ├── llm.py # LLM client (function calling)
│ ├── memory.py # Cross-session memory (session log + MEMORY.md + basin profiles)
│ ├── skill_registry.py # Skill scan, keyword match, cognitive-skill injection
│ ├── skill_states.py # Skill lifecycle state management
│ ├── config.py # Configuration loading
│ ├── adapters/ # PackageAdapter plugin layer (priority-based routing)
│ │ ├── base.py # PackageAdapter base class
│ │ ├── hydromodel/ # hydromodel adapter (GR4J / XAJ / HBV ...)
│ │ ├── hydrodatasource/ # data-source adapter (CAMELS-US)
│ │ └── generic/ # fallback adapter (priority=0)
│ ├── agents/ # sub-agent role definitions (spawned via spawn_agent)
│ ├── interface/
│ │ ├── cli.py # CLI entry + interactive REPL
│ │ ├── ui.py # Rich terminal UI (user / dev modes)
│ │ └── server.py # FastAPI web service
│ ├── tools/ # tool set (auto-discovered & registered)
│ │ ├── validate.py # validate_basin
│ │ ├── simulate.py # run_simulation
│ │ ├── search_memory.py # search_memory
│ │ ├── observe.py # inspect_dir / read_file on-site observation
│ │ ├── ask_user.py # proactive clarification
│ │ ├── task_tools.py # batch task management
│ │ ├── spawn_agent.py # spawn a sub-agent
│ │ ├── create_skill.py # generate a new skill (meta-tool)
│ │ └── create_adapter.py # generate a new adapter (meta-tool)
│ ├── skills/ # skill packages (workflow guide + tool impl)
│ │ ├── calibration/ # standard calibration (SCE-UA / GA / scipy)
│ │ ├── llm_calibration/ # LLM-guided calibration
│ │ ├── evaluation/ # model evaluation
│ │ ├── batch_calibration/ # batch calibration
│ │ ├── model_comparison/ # multi-model comparison
│ │ ├── code_analysis/ # code generation & execution
│ │ ├── visualization/ # visualization
│ │ └── expert_hydrologist/ # hydrologist cognitive framework (always injected)
│ ├── policy/ # behavioral-constraint layer (agent_behavior / calibration_policy ...)
│ ├── knowledge/ # structured domain knowledge
│ │ ├── model_parameters.md # parameter physics & bounds
│ │ ├── failure_modes.md # hydrological failure taxonomy & diagnosis
│ │ ├── calibration_guide.md # calibration experience
│ │ └── datasets.md # dataset notes
│ └── utils/
│ ├── context_utils.py # token estimation & context truncation
│ ├── task_state.py # batch-task state persistence
│ └── basin_validator.py # basin data validation
└── configs/
├── example_private.py # config template (committed)
└── private.py # actual config (gitignored)
| Category | Component |
|---|---|
| LLM | Any OpenAI-compatible endpoint (Qwen / DeepSeek / GPT-4o / local Ollama) |
| Hydrological modeling | hydromodel (GR4J / XAJ / SACSMA) |
| Dataset | CAMELS-US (671 basins, downloaded via aquaFetch) |
| Terminal UI | Rich |
| Web service | FastAPI + WebSocket |
| Python | 3.11+ |
Zhu, S. et al. (2026). Large Language Models as Virtual Hydrologists. Geophysical Research Letters.
The key difference from Zhu et al.: HydroAgent has the LLM adjust the parameter search range (rather than proposing parameter values directly) and pairs it with SCE-UA global optimization, reducing the number of LLM calls by roughly 100x while maintaining equivalent calibration accuracy.
This project is licensed under the PolyForm Noncommercial License 1.0.0, which permits use, modification, and distribution for noncommercial purposes. For commercial use, please contact the author.