This repository accompanies “Simulation-Driven Railway Delay Prediction: An Imitation Learning Approach” and contains the full data-engineering, simulation logic, and model training stack described in the paper.
- src/data/*.py implements the data pipeline: it downloads and processes raw operational logs into ML-ready datasets and builds itinerary files to improve simulation performance.
- src/environment/simulation.py defines a stochastic network simulator that rolls out multiple sampled trajectories across snapshots in a GPU-parallelized fashion.
- Policies in src/algorithms/{regression,bc,dcil}/ cover classic regressors, behavioral cloning, and the paper’s Drift-Corrected Imitation Learning (DCIL), instantiated with Transformer, MLP, and XGBoost backbones (see src/models/).
- The scripts in src/slurm/ and src/scripts/ reproduce every experiment, analysis plot, and table in the paper, making the entire workflow fully reproducible end-to-end.
- Python: 3.12.10
- Conda (optional, but recommended) or a plain venv
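For readers who take the plain venv route mentioned in the list above, the following is a minimal sketch, not a procedure documented by the repository. It assumes a Python 3.12 interpreter is reachable as python3.12 (ideally the 3.12.10 release listed above). The setup_venv function and the PYTHON and VENV_DIR variables are illustrative names introduced here.

```shell
# Hypothetical helper: create and populate a virtual environment.
# PYTHON and VENV_DIR are illustrative overrides, not repository settings.
setup_venv() {
    python_bin="${PYTHON:-python3.12}"
    venv_dir="${VENV_DIR:-.venv}"

    # Create the environment; stop early if the interpreter is missing.
    "$python_bin" -m venv "$venv_dir" || return 1
    # Activate it in the current shell so the pip calls target the venv.
    . "$venv_dir/bin/activate" || return 1
    pip install --upgrade pip
    pip install -r requirements.txt
}
```

Calling setup_venv from the repository root replaces the Conda commands that follow; the remaining steps are the same in either environment.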
conda create -n rail-delay-pred python=3.12.10
conda activate rail-delay-pred
pip install --upgrade pip
pip install -r requirements.txt

Run the following to fetch the raw data for the period December 2021 through January 2025 (having the month before and after is necessary for simulation):
python -m src.data.download_raw_data 'data/raw' 2021 12 2025 1

Submit the SLURM job to process your downloaded files
sbatch src/slurm/data_creation/raw_data_processing.slurm

Create the embeddings for stations and lines
python -m src.data.station_embedding 'data/raw' 'data/embeddings' 2021 12 2025 1 8
python -m src.data.line_embedding 'data/raw' 'data/embeddings/stations_emb_8.pkl' 'data/embeddings' 2021 12 2025 1

Create the dataset files (configs, data, metadata)
sbatch src/slurm/data_creation/create_dataset.slurm

Create the itinerary files (used to make the simulator fast)
sbatch src/slurm/data_creation/create_itineraries.slurm

Create the eval configuration
python -m src.data.create_eval_config 'data/eval_configs/cfg.pkl' 30 50 --horizon-obs-bins 0 5 10 15 20 25 30 --delay-delta-bins -5 0 5

Now you can run the experiments:
Transformer regression
sbatch src/slurm/experiments/tr_reg/tr_reg_phase1.slurm
sbatch src/slurm/experiments/tr_reg/tr_reg_phase2.slurm

Transformer BC
sbatch src/slurm/experiments/tr_bc/tr_bc_phase1.slurm
sbatch src/slurm/experiments/tr_bc/tr_bc_phase2.slurm

Transformer DCIL
sbatch src/slurm/experiments/tr_dcil/tr_dcil_phase1.slurm
sbatch src/slurm/experiments/tr_dcil/tr_dcil_phase2.slurm
sbatch src/slurm/experiments/tr_dcil/tr_dcil_phase3.slurm

MLP regression
sbatch src/slurm/experiments/mlp_reg/mlp_reg_phase1.slurm
sbatch src/slurm/experiments/mlp_reg/mlp_reg_phase2.slurm

MLP BC
sbatch src/slurm/experiments/mlp_bc/mlp_bc_phase1.slurm
sbatch src/slurm/experiments/mlp_bc/mlp_bc_phase2.slurm

MLP DCIL
sbatch src/slurm/experiments/mlp_dcil/mlp_dcil_phase1.slurm
sbatch src/slurm/experiments/mlp_dcil/mlp_dcil_phase2.slurm
sbatch src/slurm/experiments/mlp_dcil/mlp_dcil_phase3.slurm

XGBoost regression
sbatch src/slurm/experiments/xgb_reg/xgb_reg_phase1.slurm
sbatch src/slurm/experiments/xgb_reg/xgb_reg_phase2.slurm

XGBoost BC
sbatch src/slurm/experiments/xgb_bc/xgb_bc_phase1.slurm
sbatch src/slurm/experiments/xgb_bc/xgb_bc_phase2.slurm

To visualize the runs of a tuning phase (plots and best-run tables), you can run, for instance:
python -m src.scripts.analyze_experiments "runs/tr_dcil/tuning_phase_3" "mae,hor0,hor1,hor2,hor3,hor4,hor5" "mae" figs/tr_dcil_p3_figs.pdf --param_groups "alpha,beta;traj_len"

For details on the arguments, see the implementation.
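The tuning submissions listed earlier follow a regular naming pattern, src/slurm/experiments/<exp>/<exp>_phase<k>.slurm, with a third phase only for the DCIL variants. The sketch below is one possible way to submit a whole phase at once. The phase_scripts and submit_phase functions and the DRY_RUN switch are hypothetical helpers introduced here, not part of the repository. Phases are kept separate on the assumption that each phase's runs are inspected before the next is launched.

```shell
# Experiment names as they appear in the sbatch paths of this README.
EXPERIMENTS="tr_reg tr_bc tr_dcil mlp_reg mlp_bc mlp_dcil xgb_reg xgb_bc"

# Print the SLURM script path of every experiment that has the given phase.
# Phases 1 and 2 exist for all experiments; phase 3 only for *_dcil.
phase_scripts() {
    phase="$1"
    for exp in $EXPERIMENTS; do
        case "$phase:$exp" in
            1:*|2:*|3:*_dcil)
                echo "src/slurm/experiments/${exp}/${exp}_phase${phase}.slurm" ;;
        esac
    done
}

# Submit one phase. With DRY_RUN=1 (hypothetical switch) the sbatch
# commands are printed instead of executed.
submit_phase() {
    phase_scripts "$1" | while read -r script; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "sbatch $script"
        else
            sbatch "$script"
        fi
    done
}
```

For example, DRY_RUN=1 submit_phase 3 prints the two DCIL phase-3 submissions without contacting the scheduler, and submit_phase 1 submits all eight phase-1 jobs.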
Run the following to reproduce the final experiments for Tables 1 and 2:
sbatch src/slurm/experiments/tr_reg/tr_reg_final.slurm
sbatch src/slurm/experiments/tr_bc/tr_bc_final.slurm
sbatch src/slurm/experiments/tr_dcil/tr_dcil_final.slurm
sbatch src/slurm/experiments/mlp_reg/mlp_reg_final.slurm
sbatch src/slurm/experiments/mlp_bc/mlp_bc_final.slurm
sbatch src/slurm/experiments/mlp_dcil/mlp_dcil_final.slurm
sbatch src/slurm/experiments/xgb_reg/xgb_reg_final.slurm
sbatch src/slurm/experiments/xgb_bc/xgb_bc_final.slurm

Create the tables (copy the LaTeX code from the terminal output):
python -m src.scripts.create_tables

Run this to reproduce the uncertainty quantification plot (specify your run path and checkpoint number):
python -m src.scripts.uncert_quant "YOUR_RUN_PATH" "YOUR_CHECKPOINT_NUMBER" 'data/dataset' 'data/itineraries' 'data/eval_configs/cfg.pkl' 'figs/uncertainty_quantification.svg' 1.0

The test-set results from the paper (Section 7, Tables 1–2). Metrics are in seconds and averaged over 10 seeds (mean ± std).
This project is licensed under the MIT License.
