Simulation-Driven Railway Delay Prediction: An Imitation Learning Approach

Overview

This repository accompanies “Simulation-Driven Railway Delay Prediction: An Imitation Learning Approach” and contains the full data-engineering, simulation logic, and model training stack described in the paper.

src/data/*.py implements the data pipeline: it downloads and processes raw operational logs into ML-ready datasets and builds itinerary files to improve simulation performance.
src/environment/simulation.py defines a stochastic network simulator that rolls out multiple sampled trajectories across snapshots in a GPU-parallelized fashion.
Policies in src/algorithms/{regression,bc,dcil}/ cover classic regressors, behavioral cloning, and the paper’s Drift-Corrected Imitation Learning (DCIL), instantiated with Transformer, MLP, and XGBoost backbones (see src/models/).
The scripts in src/slurm/ and src/scripts/ reproduce every experiment, analysis plot, and table in the paper, so the whole workflow can be rerun end to end.
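
To make the idea of rolling out several sampled trajectories from one snapshot concrete, here is a minimal CPU-only sketch. It is not the code in src/environment/simulation.py: the function names, the Gaussian noise model, and the 30-second step noise are assumptions chosen for illustration, and the real simulator is policy-driven and GPU-parallelized.

```python
import random
import statistics


def rollout(initial_delay, n_steps, rng):
    """Simulate one trajectory: the delay changes by a random amount at each step."""
    delay = initial_delay
    trajectory = [delay]
    for _ in range(n_steps):
        # Hypothetical noise model: Gaussian change in delay per step (seconds).
        delay += rng.gauss(0.0, 30.0)
        trajectory.append(delay)
    return trajectory


def sample_trajectories(initial_delay, n_steps, n_samples, seed=0):
    """Roll out several independent trajectories from the same starting snapshot."""
    rng = random.Random(seed)
    return [rollout(initial_delay, n_steps, rng) for _ in range(n_samples)]


samples = sample_trajectories(initial_delay=120.0, n_steps=5, n_samples=1000)
final_delays = [trajectory[-1] for trajectory in samples]

# Point prediction: average of the sampled final delays.
prediction = statistics.fmean(final_delays)
# The spread of the samples gives a rough uncertainty estimate.
spread = statistics.stdev(final_delays)
print(f"predicted delay: {prediction:.1f} s, spread: {spread:.1f} s")
```

Sampling many trajectories instead of one yields both a point estimate and a distribution, which is what makes an uncertainty plot possible later on.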

Installation

Prerequisites

Python: 3.12.10
Conda (optional, but recommended) or a plain venv

Using Conda

conda create -n rail-delay-pred python=3.12.10
conda activate rail-delay-pred
pip install --upgrade pip
pip install -r requirements.txt

Data Preparation

Download raw data

Run the following to fetch the raw data for December 2021 through January 2025. The simulation needs one extra month of data before and after the period of interest, which is why the range starts in December and ends in January:

python -m src.data.download_raw_data 'data/raw' 2021 12 2025 1
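
As a quick check on the date arguments, the sketch below enumerates the months covered by `2021 12 2025 1`, treating both ends as inclusive as the sentence above states. The helper `month_range` is hypothetical and is not part of the repository.

```python
def month_range(start_year, start_month, end_year, end_month):
    """Return every (year, month) pair from start to end, both inclusive."""
    months = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


months = month_range(2021, 12, 2025, 1)
print(len(months))            # 38
print(months[0], months[-1])  # (2021, 12) (2025, 1)
```

Dropping the first and last entries leaves the 36 months from January 2022 to December 2024, which is presumably the period the padding months are meant to protect.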

Process raw data

Submit the SLURM job that processes the downloaded files:

sbatch src/slurm/data_creation/raw_data_processing.slurm

Create embeddings

Create the embeddings for stations and lines:

python -m src.data.station_embedding 'data/raw' 'data/embeddings' 2021 12 2025 1 8
python -m src.data.line_embedding 'data/raw' 'data/embeddings/stations_emb_8.pkl' 'data/embeddings' 2021 12 2025 1

Create dataset files

Create the dataset files (configs, data, metadata):

sbatch src/slurm/data_creation/create_dataset.slurm

Create itineraries

Create the itinerary files (used to speed up the simulator):

sbatch src/slurm/data_creation/create_itineraries.slurm

Create eval config

Create the evaluation configuration:

python -m src.data.create_eval_config 'data/eval_configs/cfg.pkl' 30 50 --horizon-obs-bins 0 5 10 15 20 25 30 --delay-delta-bins -5 0 5
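
The two `--*-bins` options take lists of bin edges. The sketch below shows one common way such edges partition a value range, using half-open intervals. How create_eval_config actually interprets the edges (interval convention, units, handling of values outside the range) is an assumption here, and `bin_index` is a hypothetical helper.

```python
from bisect import bisect_right


def bin_index(value, edges):
    """Return i such that edges[i] <= value < edges[i + 1], or None if outside."""
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect_right(edges, value) - 1


horizon_edges = [0, 5, 10, 15, 20, 25, 30]
delay_delta_edges = [-5, 0, 5]

print(len(horizon_edges) - 1)            # 6 bins
print(bin_index(0, horizon_edges))       # 0
print(bin_index(12, horizon_edges))      # 2
print(bin_index(30, horizon_edges))      # None
print(bin_index(-3, delay_delta_edges))  # 0
```

Under this reading, seven horizon edges give six bins, which would line up with the six `hor0` to `hor5` metric names passed to the analysis script in the Experiments section.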

Experiments

Now, you can run the experiments:

Running hyper-parameter tuning

Transformer regression

sbatch src/slurm/experiments/tr_reg/tr_reg_phase1.slurm
sbatch src/slurm/experiments/tr_reg/tr_reg_phase2.slurm

Transformer BC

sbatch src/slurm/experiments/tr_bc/tr_bc_phase1.slurm
sbatch src/slurm/experiments/tr_bc/tr_bc_phase2.slurm

Transformer DCIL

sbatch src/slurm/experiments/tr_dcil/tr_dcil_phase1.slurm
sbatch src/slurm/experiments/tr_dcil/tr_dcil_phase2.slurm
sbatch src/slurm/experiments/tr_dcil/tr_dcil_phase3.slurm

MLP regression

sbatch src/slurm/experiments/mlp_reg/mlp_reg_phase1.slurm
sbatch src/slurm/experiments/mlp_reg/mlp_reg_phase2.slurm

MLP BC

sbatch src/slurm/experiments/mlp_bc/mlp_bc_phase1.slurm
sbatch src/slurm/experiments/mlp_bc/mlp_bc_phase2.slurm

MLP DCIL

sbatch src/slurm/experiments/mlp_dcil/mlp_dcil_phase1.slurm
sbatch src/slurm/experiments/mlp_dcil/mlp_dcil_phase2.slurm
sbatch src/slurm/experiments/mlp_dcil/mlp_dcil_phase3.slurm

XGBoost regression

sbatch src/slurm/experiments/xgb_reg/xgb_reg_phase1.slurm
sbatch src/slurm/experiments/xgb_reg/xgb_reg_phase2.slurm

XGBoost BC

sbatch src/slurm/experiments/xgb_bc/xgb_bc_phase1.slurm
sbatch src/slurm/experiments/xgb_bc/xgb_bc_phase2.slurm
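
If each tuning phase should start only after the previous one has finished successfully (an assumption; the README does not state how the phases depend on each other), the submissions can be chained with sbatch's `--parsable` and `--dependency=afterok:<jobid>` options. The sketch below is a hypothetical helper, not part of the repository. It runs in dry-run mode by default so it only prints the commands; pass `run=subprocess.run` on a cluster to submit for real.

```python
import itertools
import subprocess


def submit_chain(scripts, run=subprocess.run):
    """Submit scripts in order; each job starts only if the previous one succeeded."""
    job_ids = []
    for script in scripts:
        cmd = ["sbatch", "--parsable"]
        if job_ids:
            cmd.append(f"--dependency=afterok:{job_ids[-1]}")
        cmd.append(script)
        result = run(cmd, capture_output=True, text=True, check=True)
        # With --parsable, sbatch prints "jobid" or "jobid;cluster".
        job_ids.append(result.stdout.strip().split(";")[0])
    return job_ids


def make_dry_runner(first_id=1000):
    """Return a stand-in for subprocess.run that prints commands and fakes job IDs."""
    counter = itertools.count(first_id)

    def run(cmd, **kwargs):
        print(" ".join(cmd))
        return subprocess.CompletedProcess(cmd, 0, stdout=f"{next(counter)}\n")

    return run


phases = [
    f"src/slurm/experiments/tr_dcil/tr_dcil_phase{i}.slurm" for i in (1, 2, 3)
]
job_ids = submit_chain(phases, run=make_dry_runner())
```

The same helper works for the two-phase families by passing a shorter list of scripts.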

To visualize the runs of a tuning phase (plots and best-run tables), you can run, for instance:

python -m src.scripts.analyze_experiments "runs/tr_dcil/tuning_phase_3" "mae,hor0,hor1,hor2,hor3,hor4,hor5" "mae" figs/tr_dcil_p3_figs.pdf --param_groups "alpha,beta;traj_len"

For details on the arguments, see the implementation in src/scripts/analyze_experiments.py.
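
One plausible reading of the `--param_groups` value is that semicolons separate groups and commas separate the hyper-parameters inside a group. The sketch below parses the string under that assumption. `parse_param_groups` is a hypothetical function written for illustration, and the script's real parsing may differ.

```python
def parse_param_groups(spec):
    """Split 'a,b;c' into [['a', 'b'], ['c']], ignoring empty entries."""
    groups = []
    for group in spec.split(";"):
        names = [name.strip() for name in group.split(",") if name.strip()]
        if names:
            groups.append(names)
    return groups


print(parse_param_groups("alpha,beta;traj_len"))
# [['alpha', 'beta'], ['traj_len']]
```

Under this reading, the example command would analyze `alpha` and `beta` jointly and `traj_len` on its own.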

Running final evaluations on test set

Run the following to reproduce the final experiments for Tables 1 and 2:

sbatch src/slurm/experiments/tr_reg/tr_reg_final.slurm
sbatch src/slurm/experiments/tr_bc/tr_bc_final.slurm
sbatch src/slurm/experiments/tr_dcil/tr_dcil_final.slurm
sbatch src/slurm/experiments/mlp_reg/mlp_reg_final.slurm
sbatch src/slurm/experiments/mlp_bc/mlp_bc_final.slurm
sbatch src/slurm/experiments/mlp_dcil/mlp_dcil_final.slurm
sbatch src/slurm/experiments/xgb_reg/xgb_reg_final.slurm
sbatch src/slurm/experiments/xgb_bc/xgb_bc_final.slurm

Creating tables

Create the tables. The script prints LaTeX code to the terminal; copy it from the output.

python -m src.scripts.create_tables

Creating uncertainty quantification plot

Run this to reproduce the uncertainty quantification plot (specify your run path and checkpoint number):

python -m src.scripts.uncert_quant "YOUR_RUN_PATH" "YOUR_CHECKPOINT_NUMBER" 'data/dataset' 'data/itineraries' 'data/eval_configs/cfg.pkl' 'figs/uncertainty_quantification.svg' 1.0

Results

The test-set results are those reported in the paper (Section 7, Tables 1–2). Metrics are in seconds and averaged over 10 seeds (mean ± std).
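
For reference, a "mean ± std" entry over seeds can be computed as in the sketch below. The per-seed values are placeholders made up for illustration and are not results from the paper. Whether the paper uses the sample or the population standard deviation is not stated here, so the use of the sample version (n - 1 denominator) is an assumption.

```python
import statistics


def summarize(per_seed_values):
    """Format per-seed metric values as 'mean ± std' with one decimal place."""
    mean = statistics.fmean(per_seed_values)
    std = statistics.stdev(per_seed_values)  # sample standard deviation (n - 1)
    return f"{mean:.1f} ± {std:.1f}"


# Placeholder numbers for illustration only, not results from the paper.
example_mae = [60.0, 62.0, 61.0, 59.0, 63.0, 60.0, 61.0, 62.0, 60.0, 62.0]
print(summarize(example_mae))  # 61.0 ± 1.2
```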

License

This project is licensed under the MIT License.
