Urban Routing Benchmark: Benchmarking MARL algorithms on the fleet routing tasks

Connected Autonomous Vehicles (CAVs) have the potential to transform urban mobility by alleviating congestion through optimized, intelligent routing. Unlike human drivers, CAVs leverage collective, data-driven policies generated by machine learning algorithms. Reinforcement learning (RL) can facilitate the development of such collective routing strategies, yet standardized and realistic benchmarks are missing. To that end, we present URB: Urban Routing Benchmark for RL-equipped Connected Autonomous Vehicles. URB is a comprehensive benchmarking environment that unifies evaluation across 29 real-world traffic networks paired with realistic demand patterns. URB comes with a catalog of predefined tasks, multi-agent RL (MARL) algorithm implementations, baseline methods, ten domain-specific performance metrics, and a modular configuration scheme.

Through this broad experimental scheme, URB aims to:

Identify which state-of-the-art algorithms outperform others in this class of tasks,
Drive competition for future algorithmic improvements, and
Clarify the impact of collective CAV routing on congestion, emissions, and sustainability in future cities, equipping policymakers with solid arguments for CAV regulations.

🔗 Workflow

URB (as depicted in the above figure):

Runs an experiment script using RouteRL,
With the selected algorithm (e.g., an RL algorithm or a baseline method),
Opens environment, algorithm and task configuration files from config/,
Loads the network and demand from networks/,
Executes the scenario defined in the script and configurations, and
When training is finished, uses the raw results to compute a wide set of KPIs.

📝 Task coverage

An example URB experiment can be defined as:

In the town of Nemours inhabited only by human drivers, at some point, a given share of drivers mutate to CAVs and delegate routing decisions. Then, for a period of time, the CAV agents develop routing strategies to minimize their delay using MARL. This process causes disturbances and influences the traffic efficiency and human travel experience.

URB can accommodate a wide variety of task specifications, including:

Full or mixed autonomy
CAV fleet following different behavior profiles, including: malicious, altruistic, selfish, etc.
Varying complexity of traffic networks and demand patterns
Human behavior models: probabilistic vs. greedy
Human adaptations: drivers react to actions of the fleet and change their behaviour
And many more!

🏙️ Traffic network and demand data

With this repository, URB comes with 6 traffic networks and associated demand data to experiment with. Two examples:


Gretz Armainvilliers	Nangis

More networks and demand data are available here. Users can download the network folder of their choice, place it in networks/, and use it as described below.
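As a small illustration of this step, the sketch below lists the network folders currently present under networks/. The helper name list_networks is hypothetical and not part of URB; it only assumes that each network lives in its own subfolder, as described above.

```python
from pathlib import Path


def list_networks(networks_dir="networks"):
    """Return the sorted names of subfolders in networks_dir (hypothetical helper)."""
    root = Path(networks_dir)
    if not root.is_dir():
        # Not running from the repository root, or the folder is missing.
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


# Prints the folder names found (an empty list if networks/ is not present).
print(list_networks())
```

Run from the repository root, the printed names are the values that can be passed to the --net flag.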

📦 Setup

Quickstart: Code Ocean Capsule

For a quickstart interaction with URB, we provide an executable code capsule on Code Ocean that runs a concise demonstrative experiment using the QMIX algorithm in the St. Arnoult network.

This environment includes all necessary dependencies (including SUMO) preinstalled, enabling reproducibility with a single click. We invite those interested to explore this capsule to examine the experimental workflow and output formats in a fully isolated and controlled setting.

Visit the capsule link.
Create a free Code Ocean account (if you don’t have one).
Click Reproducible Run to execute the code in a controlled and reproducible environment.

Prerequisites

Make sure SUMO is installed on your system. It must be installed separately, following the instructions provided here.

Cloning repository

Clone the URB repository from GitHub:

git clone https://github.com/COeXISTENCE-PROJECT/URB.git

Creating environment for URB

Option 1 (Recommended): Create and activate a virtual environment with venv:

python3.13 -m venv .venv
source .venv/bin/activate

and then install dependencies by:

cd URB
pip install --force-reinstall --no-cache-dir -r requirements.txt

Option 2 (Alternative): Create a conda environment:

conda create -n URB python=3.13.1

and then install dependencies by:

cd URB
conda activate URB
pip install --force-reinstall --no-cache-dir -r requirements.txt

🔬 Running experiments

Usage of URB for Reinforcement Learning algorithms

To run an RL algorithm with URB, use the following command:

python scripts/<script_name> --id <exp_id> --alg-conf <hyperparam_id> --env-conf <env_conf_id> --task-conf <task_id> --net <net_name> --env-seed <env_seed> --torch-seed <torch_seed>

where

<script_name> is the script you wish to run, passed with its .py extension (e.g., qmix_torchrl.py). Available scripts are ippo_torchrl, iql_torchrl, mappo_torchrl, vdn_torchrl, qmix_torchrl, iql, and ippo.
<exp_id> is your own experiment identifier, for instance random_ing.
<hyperparam_id> is the hyperparameterization identifier. It must correspond to a .json filename (without extension) in config/algo_config. Provided scripts automatically select the algorithm-specific subfolder in this directory.
<env_conf_id> is the environment configuration identifier. It must correspond to a .json filename (without extension) in config/env_config. It is used to parameterize environment-specific processes, such as path generation, disk operations, etc. It is optional and defaults to config1.
<task_id> is the task configuration identifier. It must correspond to a .json filename (without extension) in config/task_config. It is used to parameterize the simulated scenario, such as the share of AVs, the duration of human learning, AV behavior, etc.
<net_name> is the name of the network you wish to use. It must be one of the folder names in networks/, e.g., ingolstadt_custom, nangis, nemours, provins, or saint_arnoult.
<env_seed> is the random seed for the traffic environment, used for reproducibility. It defaults to 42.
<torch_seed> is the random seed for PyTorch, used for reproducibility. It is optional and defaults to 42.
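To make the flags and their defaults concrete, here is a minimal argparse sketch that mirrors the parameters listed above. It is not the parser used by URB's scripts: the function name build_parser is hypothetical, and marking --id, --alg-conf, --task-conf, and --net as required is an assumption. Only the defaults (config1 and 42) come from the descriptions above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags (not URB's own code)."""
    p = argparse.ArgumentParser(description="Sketch of the URB experiment CLI")
    # Required-ness of the next four flags is an assumption.
    p.add_argument("--id", required=True, help="experiment identifier")
    p.add_argument("--alg-conf", required=True, help="name in config/algo_config, no .json")
    p.add_argument("--task-conf", required=True, help="name in config/task_config, no .json")
    p.add_argument("--net", required=True, help="folder name in networks/")
    # Documented defaults.
    p.add_argument("--env-conf", default="config1", help="name in config/env_config, no .json")
    p.add_argument("--env-seed", type=int, default=42, help="traffic environment seed")
    p.add_argument("--torch-seed", type=int, default=42, help="PyTorch seed")
    return p


# Parse an example invocation; omitted flags fall back to the documented defaults.
args = build_parser().parse_args(
    ["--id", "my_exp", "--alg-conf", "config3", "--task-conf", "config4", "--net", "saint_arnoult"]
)
print(args.env_conf, args.env_seed, args.torch_seed)  # config1 42 42
```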

For example, the following command runs an experiment using:

QMIX algorithm, hyperparameterized by config/algo_config/qmix_torchrl/config3.json,
The task specified in config/task_config/config4.json,
The environment parameterization specified in config/env_config/config1.json (by default),
Experiment identifier sai_qmix_0, which will be used as the folder name in results/ to save the experiment data,
Saint Arnoult network and demand, from networks/saint_arnoult,
Environment (also used for random and numpy) and PyTorch seeds 42 and 0, respectively.

python scripts/qmix_torchrl.py --id sai_qmix_0 --alg-conf config3 --task-conf config4 --net saint_arnoult --env-seed 42 --torch-seed 0
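Since results are typically reported over several seeded runs, it can be convenient to generate one command per seed. The following sketch only assembles and prints the command lines; the helper build_command and the sai_qmix_<seed> naming scheme are illustrative assumptions, not part of URB.

```python
import shlex


def build_command(script, exp_id, alg_conf, task_conf, net, env_seed, torch_seed):
    """Assemble the documented command line as an argument list (hypothetical helper)."""
    return [
        "python", f"scripts/{script}",
        "--id", exp_id,
        "--alg-conf", alg_conf,
        "--task-conf", task_conf,
        "--net", net,
        "--env-seed", str(env_seed),
        "--torch-seed", str(torch_seed),
    ]


# One command per PyTorch seed, with the seed appended to the experiment id.
commands = [
    build_command("qmix_torchrl.py", f"sai_qmix_{seed}", "config3", "config4",
                  "saint_arnoult", env_seed=42, torch_seed=seed)
    for seed in range(3)
]
for cmd in commands:
    print(shlex.join(cmd))
    # To actually launch a run: subprocess.run(cmd, check=True)
```

The first printed line reproduces the example command above; the others differ only in the experiment identifier and the PyTorch seed.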

Usage of URB for baselines

Depending on the selected baseline method, it can be run as follows:

Decentralized (per-agent) methods: executed via scripts/baselines.py with the --model <model_name> flag, without specifying --torch-seed. In these methods, each agent trains its own separate model using only agent-level observations. Currently available options are: aon, random, and gawron.
Centralized methods: executed in the same manner as RL scripts via scripts/<alg_name>.py, without the --torch-seed flag. These methods use a centralized data structure or model instead of per-agent models or observations. Currently available: greedy.

Command:

# Decentralized (aon, random, gawron)
python scripts/baselines.py --model <model_name> --id <exp_id> --alg-conf <hyperparam_id> --env-conf <env_conf_id> --task-conf <task_id> --net <net_name> --env-seed <env_seed>

# Centralized (greedy)
python scripts/<script_name> --id <exp_id> --alg-conf <hyperparam_id> --env-conf <env_conf_id> --task-conf <task_id> --net <net_name> --env-seed <env_seed>

Examples:

# aon
python scripts/baselines.py --model aon --id ing_aon --alg-conf config1 --task-conf config2 --net ingolstadt_custom --env-seed 42

# greedy
python scripts/greedy.py --id ing_greedy --alg-conf config1 --task-conf config2 --net ingolstadt_custom --env-seed 42

📊 Calculating metrics and indicators

Each experiment outputs a set of raw records, which are then processed by the script in analysis/ into the performance indicators we report, along with several additional metrics that track the quality of the solution and its impact on the system.

All experiment scripts in scripts/ now automatically run analysis/metrics.py at the end of execution. Manual execution is still supported as described below.

Usage

To run the analysis script manually, use the following command:

python analysis/metrics.py --id <exp_id> --verbose <verbose> --results-folder <results-folder> --skip-clearing <skip-clearing> --skip-collecting <skip-collecting>

This command collects the results from the experiment with identifier <exp_id> and saves them in the folder <exp_id>/metrics/. The --verbose flag is optional; if set to True, the script prints additional information about the analysis process. The --results-folder flag is optional; if provided, the script uses <results-folder> instead of the default results/. The --skip-clearing and --skip-collecting flags are optional; if set to True, they skip clearing and collecting the results of the experiment, respectively. These operations need to be done only once, so you can skip them when running the analysis script repeatedly.

Loss values from learning scripts are saved in a unified CSV format at: results/<exp_id>/losses/losses.csv
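A minimal sketch for inspecting this file with the standard library follows. The helper names load_losses and describe are hypothetical, and the code deliberately makes no assumption about the column names, since they are not specified here.

```python
import csv
from pathlib import Path


def load_losses(exp_id, results_folder="results"):
    """Read <results_folder>/<exp_id>/losses/losses.csv into a list of row dicts."""
    path = Path(results_folder) / exp_id / "losses" / "losses.csv"
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


def describe(rows):
    """Return (number of rows, column names) without assuming a specific schema."""
    columns = list(rows[0].keys()) if rows else []
    return len(rows), columns
```

For an experiment saved under results/sai_qmix_0, calling describe(load_losses("sai_qmix_0")) would report how many loss records were written and which columns are available.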

Reported indicators

The core metric is the travel time $t$, which is the core term of both the utility for human drivers (rational utility maximizers) and the CAVs' reward. We report the average travel time for the system $\hat{t}$, human drivers $\hat{t}_{HDV}$, and autonomous vehicles $\hat{t}_{CAV}$. We record each during the training phase, during the testing phase, and for the 50 days before CAVs are introduced to the system ($\hat{t}^{train}$, $\hat{t}^{test}$, $\hat{t}^{pre}$). Using these values, we introduce:

CAV advantage as ${\hat{t}_{HDV}^{post}}/{\hat{t}_{CAV}}$,
Effect of changing to CAV as ${\hat{t}_{HDV}^{pre}}/{\hat{t}_{CAV}}$, and
Effect of remaining HDV as ${\hat{t}_{HDV}^{pre}}/{\hat{t}_{HDV}^{test}}$, which reflect the relative performance of HDVs and the CAV fleet from the point of view of individual agents.
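The three ratios above can be computed directly from the mean travel times. The sketch below uses the St. Arnoult QMIX row of Table 1 in the Results section as a worked example (pre-CAV time 3.15, CAV time 3.21, HDV time 3.25). The function names are hypothetical, and treating the tabulated HDV time as both $\hat{t}_{HDV}^{post}$ and $\hat{t}_{HDV}^{test}$ is an assumption made for illustration only.

```python
def cav_advantage(t_hdv_post, t_cav):
    """Ratio of HDV to CAV travel time after CAVs are introduced."""
    return t_hdv_post / t_cav


def effect_of_changing_to_cav(t_hdv_pre, t_cav):
    """Ratio of pre-CAV human travel time to CAV travel time."""
    return t_hdv_pre / t_cav


def effect_of_remaining_hdv(t_hdv_pre, t_hdv_test):
    """Ratio of pre-CAV human travel time to human travel time in testing."""
    return t_hdv_pre / t_hdv_test


# Values taken from Table 1 (St. Arnoult, QMIX); see the assumption in the lead-in.
t_pre, t_cav, t_hdv = 3.15, 3.21, 3.25
print(round(cav_advantage(t_hdv, t_cav), 3))              # 1.012
print(round(effect_of_changing_to_cav(t_pre, t_cav), 3))  # 0.981
print(round(effect_of_remaining_hdv(t_pre, t_hdv), 3))    # 0.969
```

With these definitions, a CAV advantage above 1 means CAVs travel faster than the remaining human drivers, while values below 1 for the other two ratios mean that travel times got worse relative to the pre-CAV baseline.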

To better understand the causes of the changes in travel time, we track the Average speed and Average mileage (directly extracted from SUMO).

We measure the Cost of training, expressed as the average of $\sum_{\tau \in train}(t^\tau_a - \hat{t}^{pre}_a)$ over all agents $a$, i.e., the cumulative disturbance that CAVs cause during the training period. We define $c_{CAV}$ and $c_{HDV}$ accordingly. We call an experiment won by CAVs if their policy was on average faster than human drivers' behaviour. The final win rate is the percentage of runs that were won by CAVs.
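The two definitions above can be sketched as follows. This is an illustration under stated assumptions, not the code in analysis/metrics.py: the function names and input layouts are hypothetical, the numbers are invented toy data, and counting a tie as a loss for CAVs is an assumption.

```python
def cost_of_training(train_times, pre_times):
    """Average over agents of the summed excess travel time during training.

    train_times: {agent: [travel time on each training day]}
    pre_times:   {agent: mean travel time before CAVs were introduced}
    """
    per_agent = [
        sum(t - pre_times[agent] for t in times)
        for agent, times in train_times.items()
    ]
    return sum(per_agent) / len(per_agent)


def win_rate(runs):
    """Percentage of runs in which the mean CAV time is below the mean HDV time.

    runs: list of (t_cav, t_hdv) pairs, one per run.
    """
    wins = sum(1 for t_cav, t_hdv in runs if t_cav < t_hdv)
    return 100.0 * wins / len(runs)


# Toy data (invented): two agents, three training days each.
train = {"a1": [4.0, 3.5, 3.0], "a2": [5.0, 4.5, 4.0]}
pre = {"a1": 3.0, "a2": 4.0}
print(cost_of_training(train, pre))  # 1.5

# Toy data (invented): five runs, four of them won by CAVs.
runs = [(3.1, 3.25), (3.3, 3.25), (3.0, 3.24), (3.2, 3.25), (3.1, 3.25)]
print(win_rate(runs))  # 80.0
```

Restricting train_times and pre_times to CAV agents or to human drivers gives the group-specific costs $c_{CAV}$ and $c_{HDV}$.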

💎 Extending URB

We provide templates for extending the possible experiments that can be conducted using URB.

Adding new scripts

Users can add new experiment scripts for testing different algorithms, different implementations and different training pipelines. The recommended script structure is provided in scripts/base_script.py.

Adding new baselines

Decentralized (per-agent) baseline models: users can define and use their own methods by creating a new model that extends baseline_models/BaseLearningModel.
Centralized baseline models (or methods that cannot be reasonably implemented in a per-agent manner): the recommended approach is to implement them using the universal scripts/base_script.py template, in the same way as RL methods.

New scenarios and hyperparameterizations

Users can extend possible experiment configurations by adding:

Algorithm hyperparameterization in config/algo_config,
Experiment setting in config/env_config, and
New tasks in config/task_config.
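When adding a new configuration file, it helps to check that it sits where the scripts will look for it. The sketch below resolves the three file locations following the layout used in the QMIX example earlier (config/algo_config/<algorithm>/<id>.json, config/env_config/<id>.json, config/task_config/<id>.json). The helper names are hypothetical, and the contents of the JSON files are not assumed.

```python
import json
from pathlib import Path


def config_paths(alg_name, alg_conf, task_conf, env_conf="config1", root="config"):
    """Return the expected locations of the three configuration files."""
    root = Path(root)
    return {
        "algo": root / "algo_config" / alg_name / f"{alg_conf}.json",
        "env": root / "env_config" / f"{env_conf}.json",
        "task": root / "task_config" / f"{task_conf}.json",
    }


def load_config(path):
    """Load one JSON configuration file into a Python object."""
    with Path(path).open() as f:
        return json.load(f)


# Check which files of the QMIX example exist relative to the current directory.
paths = config_paths("qmix_torchrl", "config3", task_conf="config4")
for kind, path in paths.items():
    print(kind, path, "(found)" if path.exists() else "(missing)")
```

A new hyperparameterization saved as config/algo_config/qmix_torchrl/my_config.json would then be selected with --alg-conf my_config; the file name my_config is only an example.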

🔎 Results

Below are some results from three networks and two scenarios, reported in our paper URB -- Urban Routing Benchmark for RL-equipped Connected Autonomous Vehicles.

Task 1 - Mixed autonomy: In a given network with a fixed demand pattern, experienced human agents have learned their route-choice strategies (minimized travel times). At some point, a given share of them mutate to CAVs and delegate routing decisions. Then, for a given number of training episodes, the agents develop routing strategies to minimize their delay using MARL.

Travel times relative to human baseline ($t^{pre}$) across episodes in 3 instances (90 to 420 agents). Mean±CI for five runs.

Table 1: Scenario 1 results for three cities (mean±std over five seeded runs).
Pre-CAV mean travel times ($t_{pre}$) are constant per network: St. Arnoult: 3.15, Provins: 2.8, Ingolstadt: 4.21.

City	Algorithm	$t_{test}$	$t_{CAV}$	$t_{HDV}$	$c_{all}$	$c_{HDV}$	$c_{CAV}$	$\Delta_V$	$\Delta_L$	$\mathbf{WR}$
St. Arnoult	IPPO	3.28 (0.004)	3.33 (0.013)	3.25 (0.008)	0.63 (0.015)	0.13 (0.004)	1.38 (0.034)	-0.24 (0.067)	0.06 (0.004)	0%
	IQL	3.36 (0.040)	3.53 (0.104)	3.24 (0.005)	0.66 (0.000)	0.14 (0.000)	1.44 (0.004)	-0.37 (0.115)	0.09 (0.021)	0%
	MAPPO	3.35 (0.049)	3.51 (0.121)	3.25 (0.004)	0.66 (0.000)	0.14 (0.004)	1.45 (0.000)	-0.27 (0.129)	0.09 (0.019)	0%
	QMIX	3.24 (0.080)	3.21 (0.206)	3.25 (0.004)	0.65 (0.004)	0.14 (0.005)	1.43 (0.005)	-0.22 (0.034)	0.03 (0.040)	80%
	Greedy	3.15	3.01	3.24	0.02	0.02	0.02	0.01	0.00	100%
	Human	3.15	N/A	3.15	N/A	N/A	N/A	0.00	0.00	100%
	AON	3.15	3.01	3.25	0.55	0.09	1.21	-0.06	0.00	100%
	Random	3.38	3.58	3.25	0.60	0.09	1.36	-0.33	0.10	0%
Provins	IPPO	2.90 (0.015)	2.98 (0.040)	2.85 (0.004)	0.61 (0.271)	0.31 (0.217)	1.05 (0.356)	-0.52 (0.080)	0.05 (0.009)	0%
	IQL	2.91 (0.011)	3.01 (0.027)	2.85 (0.008)	1.40 (0.104)	0.92 (0.068)	2.12 (0.183)	-0.58 (0.093)	0.05 (0.007)	0%
	MAPPO	2.93 (0.011)	3.05 (0.024)	2.84 (0.005)	1.29 (0.162)	0.83 (0.110)	2.00 (0.247)	-0.69 (0.038)	0.06 (0.004)	0%
	QMIX	2.96 (0.005)	3.14 (0.000)	2.85 (0.000)	0.85 (0.215)	0.52 (0.176)	1.35 (0.278)	-0.82 (0.033)	0.08 (0.000)	0%
	Greedy	2.80	2.74	2.84	0.05	0.05	0.06	0.01	0.00	100%
	Human	2.80	N/A	2.80	N/A	N/A	N/A	0.00	0.00	100%
	AON	2.81	2.76	2.84	0.47	0.19	0.99	-0.14	0.00	100%
	Random	2.93	3.04	2.85	0.51	0.22	0.95	-0.62	0.06	0%
Ingolstadt	IPPO	4.41 (0.005)	4.71 (0.030)	4.21 (0.023)	2.42 (0.497)	1.90 (0.505)	3.19 (0.495)	-0.52 (0.095)	0.06 (0.004)	0%
	IQL	4.46 (0.009)	4.81 (0.024)	4.23 (0.009)	2.54 (0.546)	1.93 (0.533)	3.44 (0.562)	-0.69 (0.067)	0.07 (0.000)	0%
	MAPPO	4.45 (0.011)	4.82 (0.019)	4.21 (0.008)	2.76 (0.599)	2.16 (0.622)	3.66 (0.562)	-0.72 (0.066)	0.07 (0.004)	0%
	QMIX	4.50 (0.140)	4.87 (0.325)	4.24 (0.015)	1.83 (0.749)	1.27 (0.710)	2.67 (0.810)	-0.97 (0.235)	0.06 (0.045)	0%
	Greedy	4.22	4.24	4.20	0.26	0.25	0.27	-0.06	0.00	0%
	Human	4.21	N/A	4.21	N/A	N/A	N/A	0.00	0.00	100%
	AON	4.29	4.37	4.23	0.87	0.55	0.24	-0.45	-0.01	0%
	Random	4.45	4.81	4.22	0.99	0.49	1.74	-0.68	0.07	0%

Task 2 - Full autonomy: In St. Arnoult, what happens when all drivers mutate to CAVs and delegate routing decisions?

Note

The parameterization details for the above results are provided in the experiment data repository.

📖 Citation

If you use this software, please cite it as below.

@inproceedings{URB,
  title={URB -- Urban Routing Benchmark for RL-equipped Connected Autonomous Vehicles},
  author={Akman, Ahmet Onur and Psarou, Anastasia and Hoffmann, Micha{\l} and Gorczyca, {\L}ukasz and Kowalski, {\L}ukasz and Gora, Pawe{\l} and Jamr{\'o}z, Grzegorz and Kucharski, Rafa{\l}},
  booktitle={Proceedings of the Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025) Datasets and Benchmarks Track},
  month={December},
  year={2025}
}

Credits

URB is part of COeXISTENCE (ERC Starting Grant, grant agreement No 101075838) and is a team effort at Jagiellonian University in Kraków, Poland by: Ahmet Onur Akman, Anastasia Psarou, Łukasz Gorczyca, Michał Hoffmann, Łukasz Kowalski, Paweł Gora, and Grzegorz Jamróz, within the research group of Rafał Kucharski.
