Sapphirine/MTANetworkFlow

NYC Subway Ridership Forecasting

Hourly predictions per station using ARX modeling, Spark, and geospatial visualization

This repository contains the full data pipeline, modeling code, and visualization tools for forecasting hourly subway ridership at every station in New York City. The project ingests large-scale historical ridership data, performs feature engineering and parallelized, rolling model fitting, and generates geospatial animations of predicted flows across the NYC subway network.


Project Overview

We build an autoregressive model with exogenous features (ARX) to predict station-level ridership, achieving:

  • R² > 0.9 on ~97% of all stations
  • A fast, scalable training + inference pipeline using Apache Spark
  • Geospatial visualizations showing hourly ridership patterns, weekday/weekend dynamics, and commuter surges
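
For readers unfamiliar with ARX models, the following is a minimal sketch of the idea for a single station: hourly ridership is regressed on its own recent values plus exogenous features. The lag order, the hour-of-day and weekend features, the synthetic data, and the helper names (`build_arx_design`, `fit_arx`, `r_squared`) are all illustrative assumptions, not the repository's actual configuration, and plain NumPy least squares stands in for the Spark-based fitting.

```python
import numpy as np


def build_arx_design(y, exog, n_lags):
    """Build rows [1, y[t-1], ..., y[t-n_lags], exog[t]] and targets y[t]."""
    y = np.asarray(y, dtype=float)
    exog = np.asarray(exog, dtype=float)
    rows = []
    for t in range(n_lags, len(y)):
        lags = y[t - n_lags:t][::-1]  # most recent lag first
        rows.append(np.concatenate(([1.0], lags, exog[t])))
    return np.array(rows), y[n_lags:]


def fit_arx(X, target):
    """Ordinary least squares fit of the ARX coefficients."""
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef


def r_squared(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic single-station series (hypothetical data, four weeks of hours).
rng = np.random.default_rng(0)
hours = np.arange(24 * 28)
exog = np.column_stack([
    np.sin(2 * np.pi * hours / 24),          # daily cycle
    np.cos(2 * np.pi * hours / 24),
    ((hours // 24) % 7 >= 5).astype(float),  # weekend indicator
])
true_beta = np.array([30.0, 10.0, -20.0])
y = np.empty(len(hours))
y[0] = 125.0
for t in range(1, len(hours)):
    y[t] = 50.0 + 0.6 * y[t - 1] + exog[t] @ true_beta + rng.normal(0.0, 1.0)

X, target = build_arx_design(y, exog, n_lags=1)
coef = fit_arx(X, target)
r2 = r_squared(target, X @ coef)  # in-sample fit on the synthetic series
print("coefficients:", np.round(coef, 3))
print("in-sample R^2:", round(float(r2), 4))
```

The R² computed here is in-sample on synthetic data and says nothing about the results reported above. Because each station's fit is independent of the others, a function like this is the kind of unit that could be applied once per station in parallel.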

This work relies on:

  • Polars for streaming & partitioning a 200M+ row ridership dataset
  • Spark for parallelized ARX model fitting
  • GeoPandas / Plotly / CartoDB tiles for visualizing subway flow

All details and experiments are documented in the included final report.

About

project by Albert Wen
