Connected Autonomous Vehicles (CAVs) have the potential to transform urban mobility by alleviating congestion through optimized, intelligent routing. Unlike human drivers, CAVs leverage collective, data-driven policies generated by machine learning algorithms. Reinforcement learning (RL) can facilitate the development of such collective routing strategies, yet standardized and realistic benchmarks are missing. To that end, we present URB: Urban Routing Benchmark for RL-equipped Connected Autonomous Vehicles. URB is a comprehensive benchmarking environment that unifies evaluation across 29 real-world traffic networks paired with realistic demand patterns. URB comes with a catalog of predefined tasks, multi-agent RL (MARL) algorithm implementations, baseline methods, ten domain-specific performance metrics, and a modular configuration scheme.
Through this broad experimental scheme, URB aims to:
- Identify which state-of-the-art algorithms outperform others in this class of tasks,
- Drive competition for future algorithmic improvements, and
- Clarify the impact of collective CAV routing on congestion, emissions, and sustainability in future cities, equipping policymakers with solid arguments for CAV regulations.
URB (as depicted in the above figure):
- Runs an experiment script using RouteRL,
- With the selected algorithm (e.g., an RL algorithm or a baseline method),
- Opens environment, algorithm and task configuration files from `config/`,
- Loads the network and demand from `networks/`,
- Executes the scenario defined in the script and configurations, and
- When the training is finished, uses raw results to compute a wide set of KPIs.
An example URB experiment can be defined as:
In the town of Nemours inhabited only by human drivers, at some point, a given share of drivers mutate to CAVs and delegate routing decisions. Then, for a period of time, the CAV agents develop routing strategies to minimize their delay using MARL. This process causes disturbances and influences the traffic efficiency and human travel experience.
URB can accommodate a wide variety of task specifications, including:
- Full or mixed autonomy
- CAV fleet following different behavior profiles, including: malicious, altruistic, selfish, etc.
- Varying complexity of traffic networks and demand patterns
- Human behavior models: probabilistic vs. greedy
- Human adaptations: drivers react to the actions of the fleet and change their behavior
- And many more!
With this repository, URB comes with 6 traffic networks and associated demand data to experiment with. Two examples are Gretz Armainvilliers and Nangis.
More networks and demand data are available here. Users can download the network folder of their choice, place it in `networks/`, and use it as described below.
For a quickstart interaction with URB, we provide an executable code capsule on Code Ocean that runs a concise demonstrative experiment using the QMIX algorithm in the St. Arnoult network.
This environment includes all necessary dependencies (including SUMO) preinstalled, enabling reproducibility with a single click. We invite those interested to explore this capsule to examine the experimental workflow and output formats in a fully isolated and controlled setting.
- Visit the capsule link.
- Create a free Code Ocean account (if you don't have one).
- Click Reproducible Run to execute the code in a controlled and reproducible environment.
Make sure you have SUMO installed on your system. This procedure should be carried out separately, by following the instructions provided here.

Clone the URB repository from GitHub by

    git clone https://github.com/COeXISTENCE-PROJECT/URB.git

- Option 1 (Recommended): Create a virtual environment with `venv`:

      python3.13 -m venv .venv

  and then install dependencies by:

      cd URB
      pip install --force-reinstall --no-cache-dir -r requirements.txt

- Option 2 (Alternative): Use a conda environment with `conda`:

      conda create -n URB python=3.13.1

  and then install dependencies by:

      cd URB
      conda activate URB
      pip install --force-reinstall --no-cache-dir -r requirements.txt

To run URB with an RL algorithm, use the following command:

    python scripts/<script_name>.py --id <exp_id> --alg-conf <hyperparam_id> --env-conf <env_conf_id> --task-conf <task_id> --net <net_name> --env-seed <env_seed> --torch-seed <torch_seed>

where

- `<script_name>` is the script you wish to run. Available scripts are `ippo_torchrl`, `iql_torchrl`, `mappo_torchrl`, `vdn_torchrl`, `qmix_torchrl`, `iql`, and `ippo`.
- `<exp_id>` is your own experiment identifier, for instance `random_ing`.
- `<hyperparam_id>` is the hyperparameterization identifier. It must correspond to a `.json` filename (without extension) in `config/algo_config`. Provided scripts automatically select the algorithm-specific subfolder in this directory.
- `<env_conf_id>` is the environment configuration identifier. It must correspond to a `.json` filename (without extension) in `config/env_config`. It is used to parameterize environment-specific processes, such as path generation, disk operations, etc. It is optional and defaults to `config1`.
- `<task_id>` is the task configuration identifier. It must correspond to a `.json` filename (without extension) in `config/task_config`. It is used to parameterize the simulated scenario, such as portion of AVs, duration of human learning, AV behavior, etc.
- `<net_name>` is the name of the network you wish to use. It must be one of the folder names in `networks/`, i.e. `ingolstadt_custom`, `nangis`, `nemours`, `provins`, or `saint_arnoult`.
- `<env_seed>` is the reproducibility random seed for the traffic environment. The default seed is 42.
- `<torch_seed>` is the reproducibility random seed for PyTorch. It is optional and defaults to 42.
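When the same configuration has to be run under several seeds, the command line can be assembled programmatically. The following is a minimal sketch, not part of URB: `build_urb_command` is a hypothetical helper, it only uses the flags documented above, and it assumes that script files carry a `.py` suffix (as in `scripts/qmix_torchrl.py`). The experiment identifiers are illustrative.

```python
import shlex


def build_urb_command(script_name, exp_id, alg_conf, task_conf, net,
                      env_conf=None, env_seed=None, torch_seed=None):
    """Return the argument list for one URB run (hypothetical helper)."""
    cmd = [
        "python", f"scripts/{script_name}.py",
        "--id", exp_id,
        "--alg-conf", alg_conf,
        "--task-conf", task_conf,
        "--net", net,
    ]
    # Optional flags are appended only when given, so the script's own
    # defaults apply otherwise.
    if env_conf is not None:
        cmd += ["--env-conf", env_conf]
    if env_seed is not None:
        cmd += ["--env-seed", str(env_seed)]
    if torch_seed is not None:
        cmd += ["--torch-seed", str(torch_seed)]
    return cmd


# One command per PyTorch seed, each with its own experiment identifier.
commands = [
    build_urb_command("qmix_torchrl", f"sai_qmix_{seed}", "config3", "config4",
                      "saint_arnoult", env_seed=42, torch_seed=seed)
    for seed in range(5)
]

for cmd in commands:
    # Print the shell-quoted command; to launch it from the repository root,
    # pass `cmd` to subprocess.run(cmd, check=True) instead.
    print(shlex.join(cmd))
```

Giving each seed a distinct `--id` keeps the runs in separate result folders, since the identifier is used as the folder name in `results/`.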
For example, the following command runs an experiment using:
- QMIX algorithm, hyperparameterized by `config/algo_config/qmix_torchrl/config3.json`,
- The task specified in `config/task_config/config4.json`,
- The environment parameterization specified in `config/env_config/config1.json` (by default),
- Experiment identifier `sai_qmix_0`, which will be used as the folder name in `results/` to save the experiment data,
- Saint Arnoult network and demand, from `networks/saint_arnoult`,
- Environment (also used for `random` and `numpy`) and PyTorch seeds 42 and 0, respectively.

    python scripts/qmix_torchrl.py --id sai_qmix_0 --alg-conf config3 --task-conf config4 --net saint_arnoult --env-seed 42 --torch-seed 0

Depending on the selected baseline method, it can be run as follows:
- Decentralized (per-agent) methods: executed via `scripts/baselines.py` with the `--model <model_name>` flag, without specifying `--torch-seed`. In these methods, each agent trains its own separate model using only agent-level observations. Currently available options are: `aon`, `random`, and `gawron`.
- Centralized methods: executed in the same manner as RL scripts via `scripts/<alg_name>.py`, without the `--torch-seed` flag. These methods use a centralized data structure or model instead of per-agent models or observations. Currently available: `greedy`.
Command:

    # Decentralized (aon, random, gawron)
    python scripts/baselines.py --model <model_name> --id <exp_id> --alg-conf <hyperparam_id> --env-conf <env_conf_id> --task-conf <task_id> --net <net_name> --env-seed <env_seed>

    # Centralized (greedy)
    python scripts/<script_name>.py --id <exp_id> --alg-conf <hyperparam_id> --env-conf <env_conf_id> --task-conf <task_id> --net <net_name> --env-seed <env_seed>

Examples:

    # aon
    python scripts/baselines.py --model aon --id ing_aon --alg-conf config1 --task-conf config2 --net ingolstadt_custom --env-seed 42

    # greedy
    python scripts/greedy.py --id ing_greedy --alg-conf config1 --task-conf config2 --net ingolstadt_custom --env-seed 42

Each experiment outputs a set of raw records, which are then processed with the script in `analysis/` to compute the performance indicators we report, plus several additional metrics that track the quality of the solution and its impact on the system.
All experiment scripts in scripts/ now automatically run analysis/metrics.py at the end of execution.
Manual execution is still supported as described below.
To run the analysis script manually, use the following command:

    python analysis/metrics.py --id <exp_id> --verbose <verbose> --results-folder <results-folder> --skip-clearing <skip-clearing> --skip-collecting <skip-collecting>

This collects the results from the experiment with identifier `<exp_id>` and saves them in the folder `<exp_id>/metrics/`. The `--verbose` flag is optional; if set to `True`, it prints additional information about the analysis process. The `--results-folder` flag is optional; if given, the folder `<results-folder>` is used instead of the default `results/`. The `--skip-clearing` and `--skip-collecting` flags are optional; if set to `True`, they skip clearing and collecting the results from the experiment, respectively. Those operations have to be done only once, so you can skip them when running the analysis script multiple times.
Loss values from learning scripts are saved in a unified CSV format at:
results/<exp_id>/losses/losses.csv
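As a quick sanity check on a finished run, the loss file can be inspected with the standard library alone. The sketch below assumes only that `losses.csv` has a header row; the column names are not documented here, so the hypothetical helper `summarize_losses` averages every column whose values all parse as numbers and skips the rest. The experiment identifier is illustrative.

```python
import csv
from pathlib import Path


def summarize_losses(csv_path):
    """Return {column: mean} for every fully numeric column (hypothetical helper)."""
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    summary = {}
    if not rows:
        return summary
    for column in rows[0].keys():
        try:
            values = [float(row[column]) for row in rows]
        except (TypeError, ValueError):
            # Non-numeric or incomplete column: leave it out of the summary.
            continue
        summary[column] = sum(values) / len(values)
    return summary


exp_id = "sai_qmix_0"  # illustrative experiment identifier
losses_path = Path("results") / exp_id / "losses" / "losses.csv"

if losses_path.exists():
    for column, mean_value in summarize_losses(losses_path).items():
        print(f"{column}: {mean_value:.4f}")
else:
    print(f"No loss file found at {losses_path}")
```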
The core metric is the travel time, from which we derive:
- CAV advantage as $\hat{t}_{HDV}^{post} / \hat{t}_{CAV}$,
- Effect of changing to CAV as $\hat{t}_{HDV}^{pre} / \hat{t}_{CAV}$, and
- Effect of remaining HDV as $\hat{t}_{HDV}^{pre} / \hat{t}_{HDV}^{test}$,

which reflect the relative performance of HDVs and the CAV fleet from the point of view of individual agents.
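The three ratios above are plain quotients of mean travel times, so they are easy to recompute from reported averages. The sketch below is illustrative and is not the code in `analysis/metrics.py`: the function name and the input values are hypothetical, and the `post` and `test` superscripts are kept as separate arguments exactly as they appear in the definitions.

```python
def travel_time_ratios(t_hdv_pre, t_hdv_post, t_hdv_test, t_cav):
    """Compute the three travel-time ratios from mean travel times."""
    return {
        # > 1 means CAVs travel faster than the remaining human drivers.
        "cav_advantage": t_hdv_post / t_cav,
        # > 1 means a driver who switched to CAV is better off than before.
        "effect_of_changing_to_cav": t_hdv_pre / t_cav,
        # < 1 means drivers who stayed human are worse off than before.
        "effect_of_remaining_hdv": t_hdv_pre / t_hdv_test,
    }


# Hypothetical mean travel times, chosen only to make the arithmetic easy to follow.
ratios = travel_time_ratios(t_hdv_pre=4.0, t_hdv_post=5.0, t_hdv_test=5.0, t_cav=4.0)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

With these inputs the CAV advantage is 1.25, the effect of changing to CAV is 1.00, and the effect of remaining HDV is 0.80: the fleet gains relative to the remaining human drivers only because the latter lose.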
To better understand the causes of the changes in travel time, we track the Average speed and Average mileage (directly extracted from SUMO).
We measure the Cost of training, expressed as the average of:
We provide templates for extending the possible experiments that can be conducted using URB.
Users can add new experiment scripts for testing different algorithms, different implementations and different training pipelines. The recommended script structure is provided in scripts/base_script.py.
- Decentralized (per-agent) baseline models: users can define and use their own methods by creating a new model that extends `baseline_models/BaseLearningModel`.
- Centralized baseline models (or methods that cannot be reasonably implemented in a per-agent manner): the recommended approach is to implement them using the universal `scripts/base_script.py` template, in the same way as RL methods.
Users can extend possible experiment configurations by adding:
- Algorithm hyperparameterization in `config/algo_config`,
- Experiment setting in `config/env_config`, and
- New tasks in `config/task_config`.
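After adding new configuration files, it can help to confirm that the identifiers passed on the command line resolve to existing files before launching a long run. The following is a minimal sketch under stated assumptions: both helper functions are hypothetical, and the algorithm-specific subfolder is assumed to be named after the script, as in `config/algo_config/qmix_torchrl/config3.json`.

```python
from pathlib import Path


def expected_paths(repo_root, script_name, alg_conf, task_conf, net, env_conf="config1"):
    """Map each command-line identifier to the path it should resolve to."""
    root = Path(repo_root)
    return {
        # Assumption: the algorithm subfolder shares the script's name.
        "algorithm config": root / "config" / "algo_config" / script_name / f"{alg_conf}.json",
        "environment config": root / "config" / "env_config" / f"{env_conf}.json",
        "task config": root / "config" / "task_config" / f"{task_conf}.json",
        "network folder": root / "networks" / net,
    }


def missing_inputs(paths):
    """Return the labels of all paths that do not exist on disk."""
    return [label for label, path in paths.items() if not path.exists()]


# Run from the repository root; the identifiers below are illustrative.
paths = expected_paths(".", "qmix_torchrl", "config3", "config4", "saint_arnoult")
missing = missing_inputs(paths)
if missing:
    for label in missing:
        print(f"Missing {label}: {paths[label]}")
else:
    print("All configuration files and the network folder were found.")
```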
Below are some results from three networks and two scenarios, reported in our paper URB -- Urban Routing Benchmark for RL-equipped Connected Autonomous Vehicles.
Task 1 - Mixed autonomy: In a given network with a fixed demand pattern, experienced human agents have learned their route-choice strategies (minimized travel times). At some point, a given share of them mutate to CAVs and delegate routing decisions. Then, for a given number of training episodes, the agents develop routing strategies to minimize their delay using MARL.
Travel times relative to human baseline (
Table 1: Scenario 1 results for three cities (mean±std over five seeded runs).
Pre-CAV mean travel times (
| City | Algorithm | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| St. Arnoult | IPPO | 3.28 (0.004) | 3.33 (0.013) | 3.25 (0.008) | 0.63 (0.015) | 0.13 (0.004) | 1.38 (0.034) | -0.24 (0.067) | 0.06 (0.004) | 0% |
| | IQL | 3.36 (0.040) | 3.53 (0.104) | 3.24 (0.005) | 0.66 (0.000) | 0.14 (0.000) | 1.44 (0.004) | -0.37 (0.115) | 0.09 (0.021) | 0% |
| | MAPPO | 3.35 (0.049) | 3.51 (0.121) | 3.25 (0.004) | 0.66 (0.000) | 0.14 (0.004) | 1.45 (0.000) | -0.27 (0.129) | 0.09 (0.019) | 0% |
| | QMIX | 3.24 (0.080) | 3.21 (0.206) | 3.25 (0.004) | 0.65 (0.004) | 0.14 (0.005) | 1.43 (0.005) | -0.22 (0.034) | 0.03 (0.040) | 80% |
| | Greedy | 3.15 | 3.01 | 3.24 | 0.02 | 0.02 | 0.02 | 0.01 | 0.00 | 100% |
| | Human | 3.15 | N/A | 3.15 | N/A | N/A | N/A | 0.00 | 0.00 | 100% |
| | AON | 3.15 | 3.01 | 3.25 | 0.55 | 0.09 | 1.21 | -0.06 | 0.00 | 100% |
| | Random | 3.38 | 3.58 | 3.25 | 0.60 | 0.09 | 1.36 | -0.33 | 0.10 | 0% |
| Provins | IPPO | 2.90 (0.015) | 2.98 (0.040) | 2.85 (0.004) | 0.61 (0.271) | 0.31 (0.217) | 1.05 (0.356) | -0.52 (0.080) | 0.05 (0.009) | 0% |
| | IQL | 2.91 (0.011) | 3.01 (0.027) | 2.85 (0.008) | 1.40 (0.104) | 0.92 (0.068) | 2.12 (0.183) | -0.58 (0.093) | 0.05 (0.007) | 0% |
| | MAPPO | 2.93 (0.011) | 3.05 (0.024) | 2.84 (0.005) | 1.29 (0.162) | 0.83 (0.110) | 2.00 (0.247) | -0.69 (0.038) | 0.06 (0.004) | 0% |
| | QMIX | 2.96 (0.005) | 3.14 (0.000) | 2.85 (0.000) | 0.85 (0.215) | 0.52 (0.176) | 1.35 (0.278) | -0.82 (0.033) | 0.08 (0.000) | 0% |
| | Greedy | 2.80 | 2.74 | 2.84 | 0.05 | 0.05 | 0.06 | 0.01 | 0.00 | 100% |
| | Human | 2.80 | N/A | 2.80 | N/A | N/A | N/A | 0.00 | 0.00 | 100% |
| | AON | 2.81 | 2.76 | 2.84 | 0.47 | 0.19 | 0.99 | -0.14 | 0.00 | 100% |
| | Random | 2.93 | 3.04 | 2.85 | 0.51 | 0.22 | 0.95 | -0.62 | 0.06 | 0% |
| Ingolstadt | IPPO | 4.41 (0.005) | 4.71 (0.030) | 4.21 (0.023) | 2.42 (0.497) | 1.90 (0.505) | 3.19 (0.495) | -0.52 (0.095) | 0.06 (0.004) | 0% |
| | IQL | 4.46 (0.009) | 4.81 (0.024) | 4.23 (0.009) | 2.54 (0.546) | 1.93 (0.533) | 3.44 (0.562) | -0.69 (0.067) | 0.07 (0.000) | 0% |
| | MAPPO | 4.45 (0.011) | 4.82 (0.019) | 4.21 (0.008) | 2.76 (0.599) | 2.16 (0.622) | 3.66 (0.562) | -0.72 (0.066) | 0.07 (0.004) | 0% |
| | QMIX | 4.50 (0.140) | 4.87 (0.325) | 4.24 (0.015) | 1.83 (0.749) | 1.27 (0.710) | 2.67 (0.810) | -0.97 (0.235) | 0.06 (0.045) | 0% |
| | Greedy | 4.22 | 4.24 | 4.20 | 0.26 | 0.25 | 0.27 | -0.06 | 0.00 | 0% |
| | Human | 4.21 | N/A | 4.21 | N/A | N/A | N/A | 0.00 | 0.00 | 100% |
| | AON | 4.29 | 4.37 | 4.23 | 0.87 | 0.55 | 0.24 | -0.45 | -0.01 | 0% |
| | Random | 4.45 | 4.81 | 4.22 | 0.99 | 0.49 | 1.74 | -0.68 | 0.07 | 0% |
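The RL rows above report each value as a mean with its standard deviation in parentheses, aggregated over five seeded runs. A cell in that format could be produced as in the sketch below. The helper name and the per-seed values are hypothetical, and the sample standard deviation is an assumption, since this section does not state which estimator was used.

```python
import statistics


def mean_std_cell(values):
    """Format per-seed values as 'mean (std)', as in the table cells above."""
    mean = statistics.mean(values)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return f"{mean:.2f} ({std:.3f})"


# Hypothetical mean travel times from five seeded runs of one algorithm.
per_seed_travel_times = [3.2, 3.3, 3.1, 3.4, 3.0]
print(mean_std_cell(per_seed_travel_times))
```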
Task 2 - Full autonomy: In St. Arnoult, what happens when all drivers mutate to CAVs and delegate routing decisions?
Note: The parameterization details for the above results are provided in the experiment data repository.
If you use this software, please cite it as below.
@inproceedings{URB,
title={URB -- Urban Routing Benchmark for RL-equipped Connected Autonomous Vehicles},
author={Akman, Ahmet Onur and Psarou, Anastasia and Hoffmann, Micha{\l} and Gorczyca, {\L}ukasz and Kowalski, {\L}ukasz and Gora, Pawe{\l} and Jamr{\'o}z, Grzegorz and Kucharski, Rafa{\l}},
booktitle={Proceedings of the Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025) Datasets and Benchmarks Track},
month={December},
year={2025}
}
URB is part of COeXISTENCE (ERC Starting Grant, grant agreement No 101075838) and is a team effort at Jagiellonian University in Kraków, Poland by: Ahmet Onur Akman, Anastasia Psarou, Łukasz Gorczyca, Michał Hoffmann, Łukasz Kowalski, Paweł Gora, and Grzegorz Jamróz, within the research group of Rafał Kucharski.






