Deep Reinforcement Learning for Multi-Robot Path Planning using PyTorch, Ray RLlib, and Gymnasium.

Jcorrieri/multiagent-gridworld

Multi-Robot Exploration with RLlib

This is a Deep Reinforcement Learning (DRL) framework for multi-robot coverage that uses Proximal Policy Optimization (PPO) with a centralized critic and decentralized actors (centralized training, decentralized execution, or CTDE) while maintaining connectivity during exploration. Each robot learns a local policy from partial observations, such as LiDAR scans, its visited-cell history, and the states of nearby robots, while benefiting from centralized value estimation during training. The combination of convolutional feature extraction, centralized value estimation, and communication-aware reward shaping yields consistently high coverage and connectivity across multiple robots. The approach scales effectively with team size: we trained successfully with up to 20 robots, and because the policies are decentralized, the learned models generalize to larger teams, performing reliably with up to 50 robots at test time.
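
The general split can be sketched in PyTorch as follows. This is only an illustration of the CTDE idea (a per-robot actor on local observations, a critic on the joint observation during training) with made-up observation shapes and layer sizes; the actual networks used here live in models/arch/ and are wrapped for RLlib in models/rl_wrapper.py.

import torch
import torch.nn as nn

class LocalActor(nn.Module):
    """Per-robot policy: sees only that robot's egocentric grid and state vector (illustrative sizes)."""
    def __init__(self, grid_channels=3, grid_size=11, state_dim=8, num_actions=5):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(grid_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        conv_out = 32 * grid_size * grid_size
        self.head = nn.Sequential(
            nn.Linear(conv_out + state_dim, 128), nn.ReLU(),
            nn.Linear(128, num_actions),  # action logits for this robot only
        )

    def forward(self, local_grid, local_state):
        feats = self.conv(local_grid)
        return self.head(torch.cat([feats, local_state], dim=-1))

class CentralizedCritic(nn.Module):
    """Value function used only during training; conditions on all robots' observations."""
    def __init__(self, obs_dim_per_agent, num_agents):
        super().__init__()
        self.value = nn.Sequential(
            nn.Linear(obs_dim_per_agent * num_agents, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, joint_obs):
        # joint_obs: [batch, num_agents * obs_dim_per_agent]
        return self.value(joint_obs)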

Example Episode

(example simulation animation)

Network Architecture

(network architecture diagram)

Installation (Python v3.10)

Install PyTorch with CUDA support (required) -- replace {cuda_ver} with your CUDA version, e.g. cu121

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu{cuda_ver}

Install Dependencies (Python version 3.10 required for pre-trained models -- I use 3.10.11)

pip install "ray[tune]==2.44.1" "PettingZoo>=1.22.3" dm_tree scipy pygame matplotlib lz4 pyyaml

Usage

Everything is configured through a central configuration file located at config/default.txt.

Environment

Specify the environment parameters for training and testing. The map size must remain consistent between training and testing, but all other parameters can be adjusted as needed.

  • reward_scheme: Select any reward scheme defined in utils.make_reward_scheme.

Training

  • Select a neural network architecture from models/arch/ using the module_file parameter.
  • When specifying an entropy_coeff schedule, multiply each timestep value by num_agents to get an accurate schedule during training (see the sketch after this list).
    • The schedule uses the format [[timestep, value], ..., [timestep, value]].
  • Model checkpoints, results, and a copy of the config file will be saved under experiments/<env_name>/<version>.
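
For example, a minimal sketch of scaling a schedule (the team size and values below are made up):

# Writing an entropy_coeff schedule in environment timesteps, then scaling
# each timestep by the number of agents as recommended above.
num_agents = 5

# schedule you have in mind, expressed in environment timesteps
entropy_coeff_schedule = [[0, 0.01], [1_000_000, 0.001], [2_000_000, 0.0]]

# schedule to actually put in the config
scaled = [[t * num_agents, v] for t, v in entropy_coeff_schedule]
print(scaled)  # [[0, 0.01], [5000000, 0.001], [10000000, 0.0]]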

Once the config file is set up, run python main.py. You can view live metrics by running tensorboard --logdir=~/ray_results and selecting the latest run.

Testing

  • Select the model version to test (e.g., v1).
  • Specify a checkpoint number greater than -1 to test a checkpoint from ...model_path/ckpt/<i>; otherwise the latest saved model is used.
  • Set explore: True to enable stochastic action selection.
  • A CSV file with test results will be saved to ...model_path/test-results/.

Run python main.py --test to begin testing.
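
Once a test run finishes, the results CSV can be inspected with the Python standard library. A minimal sketch, assuming the gridworld scenario and model v1 from the project structure below; adjust the path to your own experiment:

import csv

# hypothetical path following the project structure; adjust to your run
path = "experiments/gridworld/v1/test-results/results.csv"
with open(path, newline="") as f:
    rows = list(csv.DictReader(f))

print(rows[0].keys())            # column names recorded during testing
print(f"{len(rows)} test episodes")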

Testing the Baseline

For the baseline, I implemented the approach from the paper "Multi-robot exploration under the constraints of wireless networking".

  • To test the baseline, change environment:env_name to "baseline" in the config file. This does not use a trained model.

Additional Notes

  • Use utils.gen_train_test_split() to generate 100 new maps for training and testing.
    • Tune obstacle densities by modifying the function.
  • Define new reward schemes and modify existing ones in environment/rewards.py. Add new schemes to utils.make_reward_scheme to use them in your config (see the sketch after this list).
  • Add custom configuration files under config/ and use --config <file_name> to specify during training and testing.
    • python main.py --config <custom_config>
    • python main.py --test --config <custom_config>
  • See the Ray RLlib documentation for Ray-specific details. I use the old API stack (not RLModule).
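
A minimal sketch of the reward-scheme registration pattern described above; every name and signature below is hypothetical, since the real reward functions and utils.make_reward_scheme define their own interfaces:

# environment/rewards.py (hypothetical example scheme)
def frontier_bonus_reward(newly_covered_cells, connectivity_ok):
    """Toy scheme: reward newly covered cells, penalize breaking connectivity."""
    return float(newly_covered_cells) - (0.0 if connectivity_ok else 1.0)

# utils.py -- make_reward_scheme maps the reward_scheme string from the config
# to a reward definition; a new scheme just needs an entry here (illustrative).
def make_reward_scheme(name):
    schemes = {
        "frontier_bonus": frontier_bonus_reward,
        # ...existing schemes defined in environment/rewards.py
    }
    return schemes[name]

With an entry like that in place, setting reward_scheme to the new name in the config selects it.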

Project Structure

/
├── config/                                         # YAML project config files
│   └── default.txt

├── environment/
│   ├── envs/                              
│   │   ├── gridworld.py
│   │   └── baseline.py
│   ├── obstacle_mats/                              # custom obstacle maps (origin at top-left)
│   │   ├── testing/
│   │   │   ├── mat0
│   │   │   └── ...
│   │   └── training/
│   │       ├── mat0
│   │       └── ...
│   └── rewards.py                                  # create and customize reward functions                           

├── experiments/                                    # training and testing files split by scenario
│   ├── gridworld/                               
│   │   └── v0/                                     
│   │       ├── ckpt/                               # model checkpoint(s) during training
│   │       │   ├── 0/                              # checkpoint 0, 1, ..., n
│   │       │   │   └── <rllib_algorithm_files>
│   │       │   └── ...
│   │       ├── saved/                              # the final model after training is finished
│   │       │   └── <rllib_algorithm_files>
│   │       ├── test-results/               
│   │       │   └── results.csv
│   │       ├── train-metrics/
│   │       │   └── metrics_plot.png
│   │       └── config.txt                          # copy of config/<config.txt> used for training
│   └── baseline/
│       ├── v0/  
│       │   └── ...
│       └── ...

├── models/                                         # neural network architectures
│   ├── arch/
│   │   ├── cnn_2conv2linear.py
│   │   └── ...                        
│   └── rl_wrapper.py                               # wrapper for Ray RLlib

├── main.py                                         # main entry point for training and testing
├── test.py                                         
├── train.py
└── utils.py                                        # handles arguments, environments, and metrics
