A machine learning model that predicts landslide susceptibility across Washington State at 10-meter resolution, trained on geospatial data from 7 sources and validated with rigorous spatial cross-validation.
Named after the geological term for rocky debris at the base of a slope.
- 0.90 AUC-ROC on spatially cross-validated held-out test folds (HUC8 watershed-based splits)
- Oso blind holdout: in the 2014 Oso landslide zone, an area excluded entirely from training, the model correctly flagged 46.8% of landslide pixels as High or Very High risk, compared to ~15% of stable-terrain pixels
- SHAP analysis identified precipitation, elevation, and surface geology as the top predictive features, consistent with known geophysical drivers of slope failure in the Pacific Northwest
- Statewide predictions generated across ~450 million pixels at 10m resolution
Landslides are a persistent natural hazard in Washington State, driven by steep terrain, heavy rainfall, and unstable geology. The 2014 Oso landslide killed 43 people and remains one of the deadliest in U.S. history. Identifying high-risk areas before failure occurs is critical for land-use planning and early warning.
This project builds a pixel-level binary classifier that assigns a landslide probability to every 10m×10m cell across the state, using publicly available geospatial data and gradient-boosted trees.
| Feature | Source | Resolution |
|---|---|---|
| Elevation (DEM) | USGS 3DEP via Google Earth Engine | 10m |
| Slope | Derived from DEM | 10m |
| Aspect (sin/cos) | Derived from DEM | 10m |
| TWI | Computed locally from DEM (D8 flow accumulation) | 10m |
| Precipitation | PRISM 30-year normals via GEE | ~800m, resampled to 10m |
| SAR Backscatter | Sentinel-1 VV, wet season mean via GEE | 10m |
| Surface Geology | WA DNR 1:100K map (1,822 rock type classes) | Vector, rasterized to 10m |
| Landslide Inventory | WA DNR landslide deposits | Vector, rasterized to 10m |
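The derived terrain features in the table can be illustrated with a minimal sketch. It assumes the common TWI definition ln(a / tan β), with the specific catchment area a approximated from a D8 flow-accumulation cell count; the function names, the `+ 1` self-contribution, and the minimum-slope clamp are illustrative assumptions, not necessarily what the project scripts do.

```python
import numpy as np

CELL_SIZE_M = 10.0  # grid resolution stated in the feature table


def encode_aspect(aspect_deg):
    """Map circular aspect (degrees) to a (sin, cos) pair so 0 and 360 coincide."""
    rad = np.deg2rad(aspect_deg)
    return np.sin(rad), np.cos(rad)


def topographic_wetness_index(flow_acc, slope_deg, cell_size=CELL_SIZE_M,
                              min_slope_deg=0.1):
    """TWI = ln(a / tan(beta)).

    flow_acc  : D8 flow accumulation as a count of upstream cells
    slope_deg : slope in degrees
    a         : specific catchment area, approximated here (assumption) as
                (flow_acc + 1) * cell_size, i.e. area per unit contour width
    """
    # Clamp flat cells to a small slope so tan(beta) is never zero.
    slope_rad = np.deg2rad(np.maximum(slope_deg, min_slope_deg))
    specific_area = (np.asarray(flow_acc, dtype=float) + 1.0) * cell_size
    return np.log(specific_area / np.tan(slope_rad))
```

Encoding aspect as sin/cos avoids the artificial jump between 359° and 0°, which a tree model would otherwise treat as distant values.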
All rasters were aligned to a common grid: EPSG:32610 (UTM Zone 10N), 10m resolution, 60,855 × 40,098 pixels.
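As a rough illustration of what a common grid implies, the sketch below works through the grid arithmetic in plain Python. It assumes the stated dimensions are width × height and that the grid is north-up with its origin at the top-left corner; the helper names and the example coordinates are hypothetical.

```python
import math

PIXEL_SIZE_M = 10
GRID_WIDTH_PX, GRID_HEIGHT_PX = 60_855, 40_098  # assumed width x height order


def grid_extent_km(width_px, height_px, pixel_size_m=PIXEL_SIZE_M):
    """Ground extent of the grid in kilometres."""
    return width_px * pixel_size_m / 1000, height_px * pixel_size_m / 1000


def snap_bounds(xmin, ymin, xmax, ymax, pixel_size_m=PIXEL_SIZE_M):
    """Expand bounds outward to multiples of the pixel size so rasters share cell edges."""
    return (
        math.floor(xmin / pixel_size_m) * pixel_size_m,
        math.floor(ymin / pixel_size_m) * pixel_size_m,
        math.ceil(xmax / pixel_size_m) * pixel_size_m,
        math.ceil(ymax / pixel_size_m) * pixel_size_m,
    )


def world_to_pixel(x, y, origin_x, origin_y, pixel_size_m=PIXEL_SIZE_M):
    """Row and column of the cell containing (x, y); origin is the top-left corner."""
    col = int((x - origin_x) // pixel_size_m)
    row = int((origin_y - y) // pixel_size_m)
    return row, col


if __name__ == "__main__":
    print(grid_extent_km(GRID_WIDTH_PX, GRID_HEIGHT_PX))
```

Under these assumptions the grid spans roughly 608.55 km by 400.98 km. Once every layer uses the same snapped origin and pixel size, a given row and column index refers to the same ground cell in every raster.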
- Positive samples: all landslide inventory pixels (excluding Oso holdout zone)
- Negative samples: random non-landslide pixels at a 1:10 positive-to-negative ratio, excluding pixels within a 100m buffer around landslide polygons
- Oso holdout: 5km buffer around the 2014 slide site, excluded entirely from training
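The sampling rules above can be sketched on boolean raster masks. This is a minimal NumPy-only illustration, not the project's script: it assumes the 100m buffer means 10 cells on the 10m grid, approximates the buffer with a square neighbourhood rather than a true distance buffer, and uses hypothetical function and argument names.

```python
import numpy as np


def buffer_mask(mask, radius_px):
    """Dilate a boolean mask by radius_px cells (square neighbourhood approximation)."""
    h, w = mask.shape
    padded = np.pad(mask, radius_px, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    # OR together every shifted copy of the mask within the neighbourhood.
    for dy in range(2 * radius_px + 1):
        for dx in range(2 * radius_px + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def sample_pixels(landslide, holdout, neg_per_pos=10, buffer_px=10, seed=0):
    """Return flat indices of positive and negative training pixels.

    landslide : boolean raster of inventory pixels
    holdout   : boolean raster of the excluded zone (e.g. the Oso buffer)
    """
    positives = np.flatnonzero(landslide & ~holdout)
    # Negatives must lie outside both the holdout zone and the buffered landslides.
    excluded = buffer_mask(landslide, buffer_px) | holdout
    candidates = np.flatnonzero(~excluded)
    n_neg = min(neg_per_pos * positives.size, candidates.size)
    rng = np.random.default_rng(seed)
    negatives = rng.choice(candidates, size=n_neg, replace=False)
    return positives, negatives
```

The buffer keeps negatives away from polygon edges, where mapping uncertainty makes the stable label least reliable.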
Random cross-validation places spatially autocorrelated neighbors in both the training and test sets, which leaks information and can inflate metrics by 10–20 points. This is the most common methodological error in published landslide susceptibility studies. This project instead uses HUC8 watershed boundaries to define 5 geographically disjoint folds, ensuring the model is always tested on terrain it has never seen.
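One way to build such folds is to assign whole watersheds, never individual pixels, to folds. The sketch below assumes each sample carries a HUC8 code and balances fold sizes greedily; the balancing rule and function names are assumptions for illustration, and scikit-learn's `GroupKFold` offers comparable behaviour.

```python
import numpy as np


def assign_group_folds(group_ids, n_folds=5):
    """Assign every group (e.g. a HUC8 code) to one fold, balancing sample counts."""
    group_ids = np.asarray(group_ids)
    groups, counts = np.unique(group_ids, return_counts=True)
    fold_sizes = np.zeros(n_folds, dtype=np.int64)
    fold_of_group = {}
    # Place the largest groups first, each into the currently smallest fold.
    for idx in np.argsort(-counts):
        fold = int(np.argmin(fold_sizes))
        fold_of_group[groups[idx]] = fold
        fold_sizes[fold] += counts[idx]
    return np.array([fold_of_group[g] for g in group_ids])


def spatial_splits(group_ids, n_folds=5):
    """Yield (train_idx, test_idx) pairs in which no group spans both sides."""
    folds = assign_group_folds(group_ids, n_folds)
    for k in range(n_folds):
        yield np.flatnonzero(folds != k), np.flatnonzero(folds == k)
```

Because a watershed never appears on both sides of a split, nearby pixels from the same slope cannot leak between training and test data.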
XGBoost gradient-boosted classifier with tuned hyperparameters:
| Parameter | Value |
|---|---|
| n_estimators | 800 |
| max_depth | 4 |
| learning_rate | 0.2 |
| scale_pos_weight | 20 |
| subsample | 0.8 |
| colsample_bytree | 0.8 |
Training data was subsampled to ~1.5M rows (maintaining the 1:10 ratio) from the full 27M-row sample matrix.
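The configuration and subsampling step could look roughly like the following. The parameter values come from the table; `build_model` and `stratified_subsample` are hypothetical helper names, and the per-class proportional draw is an assumption about how the ratio was maintained.

```python
import numpy as np

# Hyperparameters copied from the table above.
XGB_PARAMS = {
    "n_estimators": 800,
    "max_depth": 4,
    "learning_rate": 0.2,
    "scale_pos_weight": 20,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
}


def build_model(params=XGB_PARAMS):
    """Construct the classifier; xgboost is imported lazily so the rest runs without it."""
    from xgboost import XGBClassifier
    return XGBClassifier(**params)


def stratified_subsample(y, n_rows, seed=0):
    """Pick about n_rows row indices while preserving the class proportions of y."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    frac = n_rows / y.size
    picked = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        k = min(idx.size, max(1, int(round(idx.size * frac))))
        picked.append(rng.choice(idx, size=k, replace=False))
    return np.sort(np.concatenate(picked))
```

A typical use would be `rows = stratified_subsample(y, 1_500_000)` followed by `build_model().fit(X[rows], y[rows])`, where `X` and `y` stand for the full feature matrix and labels.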
The 2014 Oso landslide zone (5km buffer) was held out entirely from training. The final model, trained on all other data, was evaluated on this zone to test whether it could flag a catastrophic, real-world failure site as high risk without ever seeing it.
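A holdout check of this kind reduces to comparing how often landslide pixels and stable pixels are flagged as high risk. The sketch below assumes a single probability cut-off stands in for the "High or Very High" tiers; the threshold value is hypothetical, since the tier boundaries are not stated here.

```python
import numpy as np

HIGH_RISK_THRESHOLD = 0.5  # hypothetical cut-off for "High or Very High"


def high_risk_rates(prob, is_landslide, threshold=HIGH_RISK_THRESHOLD):
    """Fraction of landslide pixels and of stable pixels flagged as high risk."""
    prob = np.asarray(prob, dtype=float)
    is_landslide = np.asarray(is_landslide, dtype=bool)
    flagged = prob >= threshold
    landslide_rate = float(flagged[is_landslide].mean())
    stable_rate = float(flagged[~is_landslide].mean())
    return landslide_rate, stable_rate
```

A large gap between the two rates indicates the model separates failed from stable terrain within the holdout zone rather than labelling the whole area as risky.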
Precipitation dominates, followed by elevation and geology. This aligns with domain knowledge: Washington's landslide-prone areas are concentrated on the wet, mountainous west side of the Cascades, where heavy rainfall saturates unstable soils on steep slopes.
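A feature ranking like this is commonly derived from the mean absolute SHAP value per feature. The sketch below assumes a SHAP value matrix of shape (samples, features) is already available, for instance from `shap.TreeExplainer(model).shap_values(X)`; the helper name is hypothetical and whether the project ranks features exactly this way is an assumption.

```python
import numpy as np


def rank_features_by_mean_abs_shap(shap_values, feature_names):
    """Sort features by mean |SHAP| across samples, largest first."""
    shap_values = np.asarray(shap_values, dtype=float)
    importance = np.abs(shap_values).mean(axis=0)
    order = np.argsort(-importance)
    return [(feature_names[i], float(importance[i])) for i in order]
```

Taking the absolute value first matters: a feature that pushes predictions strongly up in some places and strongly down in others would otherwise average out to near zero.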
An interactive Folium map is available at outputs/maps/susceptibility_map.html. It shows risk tiers overlaid on OpenStreetMap with a marker for the Oso landslide site.
Talus/
├── data/
│ ├── raw/ # Original .gdb files
│ ├── processed/ # Aligned rasters, feature matrices
│ └── labels/ # Landslide inventory shapefile + mask
├── src/ # All scripts
├── outputs/
│ ├── maps/ # Statewide GeoTIFF + interactive HTML
│ ├── figures/ # SHAP plots, static maps
│ └── models/ # Saved model + metrics JSON
└── notebooks/
- Create the conda environment:
conda create -n talus python=3.11
conda activate talus
- Install dependencies:
python -m pip install geopandas rasterio shapely pyproj xgboost scikit-learn imbalanced-learn shap folium matplotlib seaborn numpy
- Run scripts in order from src/.
Note: Raw data files are not included in this repo due to size. DEM, slope, aspect, SAR, and precipitation rasters were exported from Google Earth Engine. The landslide inventory and geology data were downloaded from the Washington DNR.
Python 3.11, XGBoost, scikit-learn, SHAP, rasterio, geopandas, Google Earth Engine, Folium, matplotlib
- Savit Pawar
- Jason Jeong


