jeonjas25/Talus

Talus: Landslide Susceptibility Prediction for Washington State

A machine learning model that predicts landslide susceptibility across Washington State at 10-meter resolution, trained on geospatial data from 7 sources and validated with rigorous spatial cross-validation.

Named after the geological term for rocky debris at the base of a slope.

Statewide Risk Map


Key Results

  • 0.90 AUC-ROC on spatially cross-validated held-out test folds (HUC8 watershed-based splits)
  • Oso blind holdout: the model classified 46.8% of landslide pixels in the 2014 Oso landslide zone (an area entirely excluded from training) as High or Very High risk, compared to ~15% of stable-terrain pixels
  • SHAP analysis identified precipitation, elevation, and surface geology as the top predictive features, consistent with known geophysical drivers of slope failure in the Pacific Northwest
  • Statewide predictions generated across ~450 million pixels at 10m resolution

Background

Landslides are a persistent natural hazard in Washington State, driven by steep terrain, heavy rainfall, and unstable geology. The 2014 Oso landslide killed 43 people and remains one of the deadliest in U.S. history. Identifying high-risk areas before failure occurs is critical for land-use planning and early warning.

This project builds a pixel-level binary classifier that assigns a landslide probability to every 10m×10m cell across the state, using publicly available geospatial data and gradient-boosted trees.


Data Sources

| Feature | Source | Resolution |
| --- | --- | --- |
| Elevation (DEM) | USGS 3DEP via Google Earth Engine | 10m |
| Slope | Derived from DEM | 10m |
| Aspect (sin/cos) | Derived from DEM | 10m |
| TWI | Computed locally from DEM (D8 flow accumulation) | 10m |
| Precipitation | PRISM 30-year normals via GEE | ~800m, resampled to 10m |
| SAR Backscatter | Sentinel-1 VV, wet season mean via GEE | 10m |
| Surface Geology | WA DNR 1:100K map (1,822 rock type classes) | Vector, rasterized to 10m |
| Landslide Inventory | WA DNR landslide deposits | Vector, rasterized to 10m |
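The two derived terrain features that are not self-explanatory, the sin/cos aspect encoding and TWI, can be sketched as follows. This is a minimal illustration and not the project's script: it assumes the common definition TWI = ln(a / tan β), where a is the specific catchment area taken from the D8 flow accumulation. The `+ 1` term, the slope clamp, and the function names are assumptions made for the example.

```python
import math

CELL_SIZE_M = 10.0  # grid resolution stated in this README


def encode_aspect(aspect_deg):
    """Return (sin, cos) of an aspect angle so that 359 and 1 degrees end up close."""
    rad = math.radians(aspect_deg)
    return math.sin(rad), math.cos(rad)


def twi(flow_acc_cells, slope_deg, cell_size=CELL_SIZE_M, min_slope_deg=0.1):
    """Topographic wetness index ln(a / tan(beta)) for a single cell.

    flow_acc_cells: number of upslope cells draining into this cell (D8).
    The +1 (counting the cell itself) and the slope clamp are assumptions
    made here to avoid log(0) and division by zero on flat terrain.
    """
    specific_area = (flow_acc_cells + 1) * cell_size  # m^2 per metre of contour
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_area / math.tan(beta))
```

With this definition, TWI rises as more upslope area drains into a cell and falls as the slope steepens, so high values mark flat, convergent terrain where water collects.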

All rasters were aligned to a common grid: EPSG:32610 (UTM Zone 10N), 10m resolution, 60,855 × 40,098 pixels.


Methodology

Sampling

  • Positive samples: all landslide inventory pixels (excluding Oso holdout zone)
  • Negative samples: random non-landslide pixels at a 1:10 positive-to-negative ratio, drawn from outside a 100m buffer around landslide polygons
  • Oso holdout: 5km buffer around the 2014 slide site, excluded entirely from training
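The sampling rules above can be sketched on a toy grid. This is an illustrative sketch only: the function name, the brute-force distance check, and the pixel-based buffer (100m is 10 pixels at 10m resolution) are assumptions. A statewide run would need a distance transform or rasterized buffered polygons instead of the nested loops used here.

```python
import random


def sample_pixels(labels, holdout, buffer_px=10, neg_per_pos=10, seed=0):
    """Pick training pixels from a small label grid.

    labels:  2D list, 1 = landslide pixel, 0 = everything else.
    holdout: 2D list of bools, True = inside the excluded holdout zone.
    buffer_px: buffer radius in pixels (100 m at 10 m resolution is 10).
    """
    rows, cols = len(labels), len(labels[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    # The buffer applies around every landslide pixel, holdout or not.
    slides = [(r, c) for r, c in cells if labels[r][c] == 1]
    positives = [(r, c) for r, c in slides if not holdout[r][c]]

    def near_slide(r, c):
        return any((r - sr) ** 2 + (c - sc) ** 2 <= buffer_px ** 2
                   for sr, sc in slides)

    candidates = [(r, c) for r, c in cells
                  if labels[r][c] == 0 and not holdout[r][c]
                  and not near_slide(r, c)]
    rng = random.Random(seed)
    n_neg = min(len(candidates), neg_per_pos * len(positives))
    negatives = rng.sample(candidates, n_neg)
    return positives, negatives
```

Keeping negatives out of the buffer avoids labeling pixels as stable when they sit right next to a mapped deposit, where inventory boundaries are least certain.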

Spatial Cross-Validation

Random cross-validation places neighboring, spatially autocorrelated pixels in both the training and test sets, which leaks information and can inflate metrics by 10–20 points. It is the most common methodological error in published landslide susceptibility studies. This project uses HUC8 watershed boundaries to define 5 geographically disjoint folds, ensuring the model is always tested on terrain it has never seen.
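A minimal sketch of watershed-grouped folds follows. The README does not say how watersheds were distributed across the 5 folds, so the greedy size balancing here is an assumption, as are the function names. The only property taken from the text is that no HUC8 watershed appears in both the training and the test side of a split.

```python
from collections import defaultdict


def assign_watershed_folds(huc8_ids, n_folds=5):
    """Map each HUC8 ID to a fold so that no watershed is split across folds.

    huc8_ids: one watershed ID per sample. Largest watersheds are placed
    first, each into the currently smallest fold, to keep fold sizes similar.
    """
    counts = defaultdict(int)
    for ws in huc8_ids:
        counts[ws] += 1
    fold_sizes = [0] * n_folds
    fold_of = {}
    for ws, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        target = fold_sizes.index(min(fold_sizes))
        fold_of[ws] = target
        fold_sizes[target] += n
    return fold_of


def spatial_splits(huc8_ids, n_folds=5):
    """Yield (train_idx, test_idx) pairs, one per fold."""
    fold_of = assign_watershed_folds(huc8_ids, n_folds)
    for k in range(n_folds):
        test = [i for i, ws in enumerate(huc8_ids) if fold_of[ws] == k]
        train = [i for i, ws in enumerate(huc8_ids) if fold_of[ws] != k]
        yield train, test
```

scikit-learn's `GroupKFold` offers the same guarantee when the watershed IDs are passed as `groups`, and would be the more usual choice in a pipeline that already depends on scikit-learn.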

Model

XGBoost gradient-boosted classifier with tuned hyperparameters:

| Parameter | Value |
| --- | --- |
| n_estimators | 800 |
| max_depth | 4 |
| learning_rate | 0.2 |
| scale_pos_weight | 20 |
| subsample | 0.8 |
| colsample_bytree | 0.8 |

The training data was subsampled to ~1.5M rows (maintaining the 1:10 positive-to-negative ratio) from the full 27M-row sample matrix.
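The configuration and the ratio-preserving subsample could look roughly like this. The parameter values are the ones listed in this section; the helper name `subsample_keep_ratio` and its per-class sampling strategy are assumptions, since the README does not describe how the subsample was drawn.

```python
import random

# Hyperparameters as listed in this section; in practice they could be
# passed as keyword arguments to xgboost.XGBClassifier.
XGB_PARAMS = {
    "n_estimators": 800,
    "max_depth": 4,
    "learning_rate": 0.2,
    "scale_pos_weight": 20,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
}


def subsample_keep_ratio(labels, target_rows, seed=0):
    """Return sorted row indices of a subsample that keeps the class ratio.

    labels: sequence of 0/1 class labels for the full sample matrix.
    Each class is sampled with the same fraction, so the ratio is preserved
    up to rounding.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    frac = min(1.0, target_rows / len(labels))
    keep_pos = rng.sample(pos, round(len(pos) * frac))
    keep_neg = rng.sample(neg, round(len(neg) * frac))
    return sorted(keep_pos + keep_neg)
```

Sampling each class separately, rather than drawing rows uniformly, keeps the positive count from drifting by chance when the positive class is small.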

Oso Blind Holdout

The 2014 Oso landslide zone (5km buffer) was held out entirely from training. The final model, trained on all other data, was evaluated on this zone to test whether it could flag a catastrophic, real-world failure site as high risk without ever seeing it.
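The holdout comparison reported under Key Results (share of landslide pixels versus share of stable pixels flagged as high risk) can be computed along these lines. This is a sketch under assumptions: the README does not state the probability cut-offs for the risk tiers, so `threshold` is a hypothetical parameter, and the function name is illustrative.

```python
def high_risk_rates(probs, is_landslide, threshold):
    """Share of landslide and of stable pixels scored at or above `threshold`.

    probs: predicted landslide probabilities for pixels in the holdout zone.
    is_landslide: matching booleans from the landslide inventory.
    threshold: lower bound of the "High" tier (a hypothetical value here;
    the README does not state the tier cut-offs).
    """
    slide = [p for p, y in zip(probs, is_landslide) if y]
    stable = [p for p, y in zip(probs, is_landslide) if not y]
    if not slide or not stable:
        raise ValueError("holdout zone needs both landslide and stable pixels")
    slide_rate = sum(p >= threshold for p in slide) / len(slide)
    stable_rate = sum(p >= threshold for p in stable) / len(stable)
    return slide_rate, stable_rate
```

Reporting both rates matters: a model that flags everything as high risk would score well on landslide pixels alone, so the gap between the two numbers is what shows discrimination.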


Feature Importance (SHAP)

SHAP Bar Plot

SHAP Beeswarm

Precipitation dominates, followed by elevation and geology. This aligns with domain knowledge: Washington's landslide-prone areas are concentrated on the wet, mountainous west side of the Cascades, where heavy rainfall saturates unstable soils on steep slopes.
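For reference, a global ranking like the one described can be derived from per-sample SHAP values by averaging their absolute values per feature. The sketch below assumes the bar plot follows that usual SHAP convention; the function name and the plain-list input are illustrative choices.

```python
def rank_features(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value, largest first.

    shap_rows: one list of per-feature SHAP values per sample, e.g. the
    array returned by shap.TreeExplainer(model).shap_values(X) converted
    with .tolist().
    """
    n = len(shap_rows)
    means = [sum(abs(row[j]) for row in shap_rows) / n
             for j in range(len(feature_names))]
    return sorted(zip(feature_names, means), key=lambda fm: fm[1], reverse=True)
```

Taking absolute values first means a feature that pushes predictions strongly in both directions still ranks high, which a plain mean would cancel out.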


Interactive Map

An interactive Folium map is available at outputs/maps/susceptibility_map.html. It shows risk tiers overlaid on OpenStreetMap with a marker for the Oso landslide site.


Project Structure

Talus/
├── data/
│   ├── raw/              # Original .gdb files
│   ├── processed/        # Aligned rasters, feature matrices
│   └── labels/           # Landslide inventory shapefile + mask
├── src/                  # All scripts
├── outputs/
│   ├── maps/             # Statewide GeoTIFF + interactive HTML
│   ├── figures/          # SHAP plots, static maps
│   └── models/           # Saved model + metrics JSON
└── notebooks/

How to Reproduce

  1. Create the conda environment:
     conda create -n talus python=3.11
     conda activate talus
  2. Install dependencies:
     python -m pip install geopandas rasterio shapely pyproj xgboost scikit-learn imbalanced-learn shap folium matplotlib seaborn numpy
  3. Run scripts in order from src/.

Note: Raw data files are not included in this repo due to size. DEM, slope, aspect, SAR, and precipitation rasters were exported from Google Earth Engine. The landslide inventory and geology data were downloaded from the Washington DNR.


Tech Stack

Python 3.11, XGBoost, scikit-learn, SHAP, rasterio, geopandas, Google Earth Engine, Folium, matplotlib


Team

  • Savit Pawar
  • Jason Jeong
