
Official leaderboard

Urban Routing Benchmark: Benchmarking MARL algorithms on fleet routing tasks

Connected Autonomous Vehicles (CAVs) have the potential to transform urban mobility by alleviating congestion through optimized, intelligent routing. Unlike human drivers, CAVs leverage collective, data-driven policies generated by machine learning algorithms. Reinforcement learning (RL) can facilitate the development of such collective routing strategies, yet standardized and realistic benchmarks are missing. To that end, we present URB: Urban Routing Benchmark for RL-equipped Connected Autonomous Vehicles. URB is a comprehensive benchmarking environment that unifies evaluation across 29 real-world traffic networks paired with realistic demand patterns. URB comes with a catalog of predefined tasks, multi-agent RL (MARL) algorithm implementations, baseline methods, ten domain-specific performance metrics, and a modular configuration scheme.

Through this broad experimental scheme, URB aims to:

  1. Identify which state-of-the-art algorithms outperform others in this class of tasks,
  2. Drive competition for future algorithmic improvements, and
  3. Clarify the impact of collective CAV routing on congestion, emissions, and sustainability in future cities, equipping policymakers with solid arguments for CAV regulations.

🔗 Workflow

URB (as depicted in the above figure):

  • Runs an experiment script using RouteRL,
  • With the selected algorithm (e.g., an RL algorithm or a baseline method),
  • Opens environment, algorithm and task configuration files from config/,
  • Loads the network and demand from networks/,
  • Executes the scenario defined in the script and configurations, and
  • When training is finished, it uses the raw results to compute a wide set of KPIs.

📝 Task coverage

An example URB experiment can be defined as:

In the town of Nemours, inhabited only by human drivers, a given share of drivers mutate to CAVs at some point and delegate their routing decisions. Then, for a period of time, the CAV agents develop routing strategies to minimize their delay using MARL. This process causes disturbances and influences traffic efficiency and the human travel experience.

URB can accommodate a wide variety of task specifications, including:

  1. Full or mixed autonomy
  2. CAV fleets following different behavior profiles (e.g., malicious, altruistic, selfish)
  3. Varying complexity of traffic networks and demand patterns
  4. Human behavior models: probabilistic vs. greedy
  5. Human adaptations: drivers react to actions of the fleet and change their behaviour
  6. And many more!

🏙️ Traffic network and demand data

This repository ships with 6 traffic networks and associated demand data to experiment with. Two examples:

Gretz Armainvilliers Nangis

More networks and demand data are available here. Users can download the network folder of their choice, place it in networks/, and use it as described below.


📦 Setup

Quickstart: Code Ocean Capsule

Open in Code Ocean

For a quickstart interaction with URB, we provide an executable code capsule on Code Ocean that runs a concise demonstrative experiment using the QMIX algorithm in the St. Arnoult network.

This environment includes all necessary dependencies (including SUMO) preinstalled, enabling reproducibility with a single click. We invite those interested to explore this capsule to examine the experimental workflow and output formats in a fully isolated and controlled setting.

  1. Visit the capsule link.
  2. Create a free CodeOcean account (if you don’t have one).
  3. Click Reproducible Run to execute the code in a controlled and reproducible environment.

Prerequisites

Make sure you have SUMO installed on your system. This must be done separately, following the instructions provided here.
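As a quick sanity check before running experiments, a small sketch like the following can detect whether SUMO is reachable. This helper is hypothetical and not part of URB; it only checks the PATH and the conventional SUMO_HOME environment variable.

```python
import os
import shutil

def sumo_available() -> bool:
    """Heuristic check (not part of URB): True if a `sumo` binary is on
    PATH or the SUMO_HOME environment variable is set."""
    return shutil.which("sumo") is not None or "SUMO_HOME" in os.environ

if not sumo_available():
    print("SUMO not found -- install it first (see the SUMO docs).")
```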

Cloning repository

Clone the URB repository from GitHub:

git clone https://github.com/COeXISTENCE-PROJECT/URB.git

Creating an environment for URB

  • Option 1 (Recommended): Create a virtual environment with venv:
python3.13 -m venv .venv

and then install dependencies by:

cd URB
pip install --force-reinstall --no-cache-dir -r requirements.txt
  • Option 2 (Alternative): Create a conda environment with conda:
conda create -n URB python=3.13.1

and then install dependencies by:

cd URB
conda activate URB
pip install --force-reinstall --no-cache-dir -r requirements.txt

🔬 Running experiments

Usage of URB for Reinforcement Learning algorithms

To run URB with an RL algorithm, use the following command:

python scripts/<script_name> --id <exp_id> --alg-conf <hyperparam_id> --env-conf <env_conf_id> --task-conf <task_id> --net <net_name> --env-seed <env_seed> --torch-seed <torch_seed>

where

  • <script_name> is the script you wish to run. Available scripts are ippo_torchrl, iql_torchrl, mappo_torchrl, vdn_torchrl, qmix_torchrl, iql, and ippo,
  • <exp_id> is your own experiment identifier, for instance random_ing,
  • <hyperparam_id> is the hyperparameterization identifier. It must correspond to a .json filename (without extension) in config/algo_config. The provided scripts automatically select the algorithm-specific subfolder in this directory,
  • <env_conf_id> is the environment configuration identifier. It must correspond to a .json filename (without extension) in config/env_config and parameterizes environment-specific processes such as path generation and disk operations. It is optional and defaults to config1,
  • <task_id> is the task configuration identifier. It must correspond to a .json filename (without extension) in config/task_config and parameterizes the simulated scenario, such as the share of AVs, the duration of human learning, and AV behavior,
  • <net_name> is the name of the network you wish to use. It must be one of the folder names in networks/, i.e. ingolstadt_custom, nangis, nemours, provins, or saint_arnoult,
  • <env_seed> is the random seed for the traffic environment, used for reproducibility. It defaults to 42,
  • <torch_seed> is the random seed for PyTorch. It is optional and defaults to 42.

For example, the following command runs an experiment using:

  • QMIX algorithm, hyperparameterized by config/algo_config/qmix_torchrl/config3.json,
  • The task specified in config/task_config/config4.json,
  • The environment parameterization specified in config/env_config/config1.json (by default),
  • Experiment identifier sai_qmix_0, which will be used as the folder name in results/ to save the experiment data,
  • Saint Arnoult network and demand, from networks/saint_arnoult,
  • Environment (also used for random and numpy) and PyTorch seeds 42 and 0, respectively.
python scripts/qmix_torchrl.py --id sai_qmix_0 --alg-conf config3 --task-conf config4 --net saint_arnoult --env-seed 42 --torch-seed 0
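The flag layout above can be mirrored in a small argparse sketch. This is illustrative only: the actual URB scripts may wire their arguments differently, and the parser below is an assumption built from the documented flags.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of the documented URB CLI; not the actual implementation.
    p = argparse.ArgumentParser(description="URB experiment runner (sketch)")
    p.add_argument("--id", required=True, help="experiment identifier")
    p.add_argument("--alg-conf", required=True, help="file in config/algo_config")
    p.add_argument("--env-conf", default="config1", help="file in config/env_config")
    p.add_argument("--task-conf", required=True, help="file in config/task_config")
    p.add_argument("--net", required=True,
                   choices=["ingolstadt_custom", "nangis", "nemours",
                            "provins", "saint_arnoult"])
    p.add_argument("--env-seed", type=int, default=42)
    p.add_argument("--torch-seed", type=int, default=42)
    return p

# Parse the example command from above (argparse maps --alg-conf to args.alg_conf).
args = build_parser().parse_args(
    "--id sai_qmix_0 --alg-conf config3 --task-conf config4 "
    "--net saint_arnoult --env-seed 42 --torch-seed 0".split()
)
```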

Using URB for baselines

Depending on the selected baseline method, it can be run as follows:

  • Decentralized (per-agent) methods: executed via scripts/baselines.py with the --model <model_name> flag, without specifying --torch-seed. In these methods, each agent trains its own separate model using only agent-level observations. Currently available options are: aon, random, and gawron.
  • Centralized methods: executed in the same manner as RL scripts via scripts/<alg_name>.py, without the --torch-seed flag. These methods use a centralized data structure or model instead of per-agent models or observations. Currently available: greedy.

Command:

# Decentralized (aon, random, gawron)
python scripts/baselines.py --model <model_name> --id <exp_id> --alg-conf <hyperparam_id> --env-conf <env_conf_id> --task-conf <task_id> --net <net_name> --env-seed <env_seed> 
# Centralized (greedy)
python scripts/<script_name> --id <exp_id> --alg-conf <hyperparam_id> --env-conf <env_conf_id> --task-conf <task_id> --net <net_name> --env-seed <env_seed>

Examples:

# aon
python scripts/baselines.py --model aon --id ing_aon --alg-conf config1 --task-conf config2 --net ingolstadt_custom --env-seed 42 
# greedy
python scripts/greedy.py --id ing_greedy --alg-conf config1 --task-conf config2 --net ingolstadt_custom --env-seed 42

📊 Calculating metrics and indicators

Each experiment outputs a set of raw records, which are then processed with the script in this folder into the performance indicators we report, plus several additional metrics that track the quality of the solution and its impact on the system.

All experiment scripts in scripts/ now automatically run analysis/metrics.py at the end of execution. Manual execution is still supported as described below.

Usage

To use the analysis script, run the following command:

python analysis/metrics.py --id <exp_id> --verbose <verbose> --results-folder <results-folder> --skip-clearing <skip-clearing> --skip-collecting <skip-collecting>

that will collect the results from the experiment with identifier <exp_id> and save them in the folder <exp_id>/metrics/. The --verbose flag is optional; if set to True, it prints additional information about the analysis process. The --results-folder flag is optional and specifies a folder to use instead of the default results/. The --skip-clearing and --skip-collecting flags are optional; if set to True, they skip clearing and collecting the experiment results, respectively. These operations need to be done only once, so if you run the analysis script multiple times, you can skip them.

Loss values from learning scripts are saved in a unified CSV format at: results/<exp_id>/losses/losses.csv
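Such a unified CSV can be post-processed with the standard library alone. The column names below are illustrative assumptions; check the actual header of losses.csv before relying on them.

```python
import csv
import io

# Hypothetical losses.csv content; the real column names may differ.
raw = "episode,loss\n0,1.52\n1,1.31\n2,1.18\n"

def mean_loss(csv_text: str) -> float:
    """Average the `loss` column of a losses.csv-style file."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sum(float(r["loss"]) for r in rows) / len(rows)
```

In practice one would open results/<exp_id>/losses/losses.csv and pass its contents to such a helper.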

Reported indicators


The core metric is the travel time $t$, which is both the core term of the utility for human drivers (rational utility maximizers) and of the CAVs' reward. We report the average travel time for the system $\hat{t}$, human drivers $\hat{t}_{HDV}$, and autonomous vehicles $\hat{t}_{CAV}$. We record each during the training phase, the testing phase, and for 50 days before CAVs are introduced to the system ($\hat{t}^{train}$, $\hat{t}^{test}$, $\hat{t}^{pre}$). Using these values, we introduce:

  • CAV advantage as $\hat{t}_{HDV}^{post}$ / $\hat{t}_{CAV}$,
  • Effect of changing to CAV as ${\hat{t}_{HDV}^{pre}}/{\hat{t}_{CAV}}$, and
  • Effect of remaining HDV as ${\hat{t}_{HDV}^{pre}}/{\hat{t}_{HDV}^{test}}$, which reflect the relative performance of HDVs and the CAV fleet from the point of view of individual agents.

To better understand the causes of the changes in travel time, we track the Average speed and Average mileage (directly extracted from SUMO).

We measure the Cost of training, expressed as the average of $\sum_{\tau \in train}(t^\tau_a - \hat{t}^{pre}_a)$ over all agents $a$, i.e. the cumulative disturbance that CAVs cause during the training period. We define $c_{CAV}$ and $c_{HDV}$ accordingly. We call an experiment won by CAVs if their policy was on average faster than the human drivers' behaviour. The final winrate is the percentage of runs won by CAVs.
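The indicators above can be sketched in plain Python. The helper names are hypothetical (the actual implementation lives in analysis/metrics.py), but each function follows the definition in the text.

```python
def cav_advantage(t_hdv_post: float, t_cav: float) -> float:
    # Post-mutation HDV travel time over CAV travel time;
    # values above 1 mean CAVs travel faster than the remaining HDVs.
    return t_hdv_post / t_cav

def cost_of_training(per_agent_series, t_pre):
    # Average over agents a of sum_tau (t_a^tau - t_a^pre):
    # the cumulative disturbance accrued during the training period.
    costs = [
        sum(t - t_pre[a] for t in series)
        for a, series in enumerate(per_agent_series)
    ]
    return sum(costs) / len(costs)

def winrate(runs) -> float:
    # Percentage of (t_cav, t_hdv) runs in which the CAV policy was
    # on average faster than the human drivers' behaviour.
    won = sum(1 for t_cav, t_hdv in runs if t_cav < t_hdv)
    return 100.0 * won / len(runs)
```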

💎 Extending URB

We provide templates for extending the possible experiments that can be conducted using URB.

Adding new scripts

Users can add new experiment scripts for testing different algorithms, different implementations and different training pipelines. The recommended script structure is provided in scripts/base_script.py.

Adding new baselines

  • Decentralized (per-agent) baseline models: users can define and use their own methods by creating a new model that extends baseline_models/BaseLearningModel.
  • Centralized baseline models (or methods that cannot be reasonably implemented in a per-agent manner): the recommended approach is to implement them using the universal scripts/base_script.py template, in the same way as RL methods.
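A minimal sketch of a custom decentralized baseline, assuming a BaseLearningModel with act/learn hooks. The real interface in baseline_models/ may expose different method names and signatures; everything below is illustrative.

```python
import random

class BaseLearningModel:
    # Stand-in for baseline_models/BaseLearningModel; the actual
    # abstract interface in URB may differ.
    def act(self, observation):
        raise NotImplementedError

    def learn(self, action, reward):
        raise NotImplementedError

class EpsilonGreedyBaseline(BaseLearningModel):
    """Illustrative per-agent baseline: tracks a running mean reward
    per route and mostly picks the best-observed route."""

    def __init__(self, n_routes: int, epsilon: float = 0.1, seed: int = 42):
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        self.values = [0.0] * n_routes   # running mean reward per route
        self.counts = [0] * n_routes

    def act(self, observation=None) -> int:
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))  # explore
        return max(range(len(self.values)), key=lambda i: self.values[i])

    def learn(self, action: int, reward: float) -> None:
        # Incremental mean update for the chosen route.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]
```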

New scenarios and hyperparameterizations

Users can extend the possible experiment configurations by adding new .json configuration files under config/algo_config/, config/env_config/, and config/task_config/.
🔎 Results

Below are some results from three networks and two scenarios, reported in our paper URB -- Urban Routing Benchmark for RL-equipped Connected Autonomous Vehicles.

Task 1 - Mixed autonomy: In a given network with a fixed demand pattern, experienced human agents have learned their route-choice strategies (minimized travel times). At some point, a given share of them mutate to CAVs and delegate routing decisions. Then, for a given number of training episodes, the agents develop routing strategies to minimize their delay using MARL.

Travel times relative to human baseline ($t^{pre}$) across episodes in 3 instances (90 to 420 agents). Mean±CI for five runs.

Table 1: Scenario 1 results for three cities (mean±std over five seeded runs).
Pre-CAV mean travel times ($t_{pre}$) are constant per network: St. Arnoult: 3.15, Provins: 2.8, Ingolstadt: 4.21.

| City | Algorithm | $t_{test}$ | $t_{CAV}$ | $t_{HDV}$ | $c_{all}$ | $c_{HDV}$ | $c_{CAV}$ | $\Delta_V$ | $\Delta_L$ | $\mathbf{WR}$ |
|------|-----------|------------|-----------|-----------|-----------|-----------|-----------|------------|------------|------|
| St. Arnoult | IPPO | 3.28 (0.004) | 3.33 (0.013) | 3.25 (0.008) | 0.63 (0.015) | 0.13 (0.004) | 1.38 (0.034) | -0.24 (0.067) | 0.06 (0.004) | 0% |
| St. Arnoult | IQL | 3.36 (0.040) | 3.53 (0.104) | 3.24 (0.005) | 0.66 (0.000) | 0.14 (0.000) | 1.44 (0.004) | -0.37 (0.115) | 0.09 (0.021) | 0% |
| St. Arnoult | MAPPO | 3.35 (0.049) | 3.51 (0.121) | 3.25 (0.004) | 0.66 (0.000) | 0.14 (0.004) | 1.45 (0.000) | -0.27 (0.129) | 0.09 (0.019) | 0% |
| St. Arnoult | QMIX | 3.24 (0.080) | 3.21 (0.206) | 3.25 (0.004) | 0.65 (0.004) | 0.14 (0.005) | 1.43 (0.005) | -0.22 (0.034) | 0.03 (0.040) | 80% |
| St. Arnoult | Greedy | 3.15 | 3.01 | 3.24 | 0.02 | 0.02 | 0.02 | 0.01 | 0.00 | 100% |
| St. Arnoult | Human | 3.15 | N/A | 3.15 | N/A | N/A | N/A | 0.00 | 0.00 | 100% |
| St. Arnoult | AON | 3.15 | 3.01 | 3.25 | 0.55 | 0.09 | 1.21 | -0.06 | 0.00 | 100% |
| St. Arnoult | Random | 3.38 | 3.58 | 3.25 | 0.60 | 0.09 | 1.36 | -0.33 | 0.10 | 0% |
| Provins | IPPO | 2.90 (0.015) | 2.98 (0.040) | 2.85 (0.004) | 0.61 (0.271) | 0.31 (0.217) | 1.05 (0.356) | -0.52 (0.080) | 0.05 (0.009) | 0% |
| Provins | IQL | 2.91 (0.011) | 3.01 (0.027) | 2.85 (0.008) | 1.40 (0.104) | 0.92 (0.068) | 2.12 (0.183) | -0.58 (0.093) | 0.05 (0.007) | 0% |
| Provins | MAPPO | 2.93 (0.011) | 3.05 (0.024) | 2.84 (0.005) | 1.29 (0.162) | 0.83 (0.110) | 2.00 (0.247) | -0.69 (0.038) | 0.06 (0.004) | 0% |
| Provins | QMIX | 2.96 (0.005) | 3.14 (0.000) | 2.85 (0.000) | 0.85 (0.215) | 0.52 (0.176) | 1.35 (0.278) | -0.82 (0.033) | 0.08 (0.000) | 0% |
| Provins | Greedy | 2.80 | 2.74 | 2.84 | 0.05 | 0.05 | 0.06 | 0.01 | 0.00 | 100% |
| Provins | Human | 2.80 | N/A | 2.80 | N/A | N/A | N/A | 0.00 | 0.00 | 100% |
| Provins | AON | 2.81 | 2.76 | 2.84 | 0.47 | 0.19 | 0.99 | -0.14 | 0.00 | 100% |
| Provins | Random | 2.93 | 3.04 | 2.85 | 0.51 | 0.22 | 0.95 | -0.62 | 0.06 | 0% |
| Ingolstadt | IPPO | 4.41 (0.005) | 4.71 (0.030) | 4.21 (0.023) | 2.42 (0.497) | 1.90 (0.505) | 3.19 (0.495) | -0.52 (0.095) | 0.06 (0.004) | 0% |
| Ingolstadt | IQL | 4.46 (0.009) | 4.81 (0.024) | 4.23 (0.009) | 2.54 (0.546) | 1.93 (0.533) | 3.44 (0.562) | -0.69 (0.067) | 0.07 (0.000) | 0% |
| Ingolstadt | MAPPO | 4.45 (0.011) | 4.82 (0.019) | 4.21 (0.008) | 2.76 (0.599) | 2.16 (0.622) | 3.66 (0.562) | -0.72 (0.066) | 0.07 (0.004) | 0% |
| Ingolstadt | QMIX | 4.50 (0.140) | 4.87 (0.325) | 4.24 (0.015) | 1.83 (0.749) | 1.27 (0.710) | 2.67 (0.810) | -0.97 (0.235) | 0.06 (0.045) | 0% |
| Ingolstadt | Greedy | 4.22 | 4.24 | 4.20 | 0.26 | 0.25 | 0.27 | -0.06 | 0.00 | 0% |
| Ingolstadt | Human | 4.21 | N/A | 4.21 | N/A | N/A | N/A | 0.00 | 0.00 | 100% |
| Ingolstadt | AON | 4.29 | 4.37 | 4.23 | 0.87 | 0.55 | 0.24 | -0.45 | -0.01 | 0% |
| Ingolstadt | Random | 4.45 | 4.81 | 4.22 | 0.99 | 0.49 | 1.74 | -0.68 | 0.07 | 0% |

Task 2 - Full autonomy: In St. Arnoult, what happens when all drivers mutate to CAVs and delegate routing decisions?

Note

The parameterization details for the above results are provided in the experiment data repository.


📖 Citation

If you use this software, please cite it as below.

@inproceedings{URB,
  title={URB -- Urban Routing Benchmark for RL-equipped Connected Autonomous Vehicles},
  author={Akman, Ahmet Onur and Psarou, Anastasia and Hoffmann, Micha{\l} and Gorczyca, {\L}ukasz and Kowalski, {\L}ukasz and Gora, Pawe{\l} and Jamr{\'o}z, Grzegorz and Kucharski, Rafa{\l}},
  booktitle={Proceedings of the Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025) Datasets and Benchmarks Track},
  month={December},
  year={2025}
}

Credits

URB is part of COeXISTENCE (ERC Starting Grant, grant agreement No 101075838) and is the joint work of a team at Jagiellonian University in Kraków, Poland: Ahmet Onur Akman, Anastasia Psarou, Łukasz Gorczyca, Michał Hoffmann, Łukasz Kowalski, Paweł Gora, and Grzegorz Jamróz, within the research group of Rafał Kucharski.
