Reinforcement Learning for Multi-Robot Fleet Navigation under Uncertainty
N-Drill-Master-RL implements deep reinforcement learning solutions for multi-robot fleet management in dynamic warehouse environments. The project explores the intersection of Deep RL and combinatorial optimization, addressing challenges in pick-up and delivery services with robot fleets operating under uncertainty.
- Multi-Agent RL: CTDE (Centralized Training with Decentralized Execution)
- Graph Attention Networks: Scalable inter-agent communication
- PPO-style Policy Gradient: Stable training with clipped objectives
- QMIX-style Value Mixing: Cooperative multi-agent learning
- Gymnasium-Compatible Environment: Easy integration with RL frameworks
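For readers new to the clipped objective mentioned in the list above, here is a minimal sketch of the standard PPO per-sample surrogate in plain Python. It illustrates the general technique only. The function name `clipped_surrogate` and the default `clip_eps=0.2` are assumptions for this example and are not taken from the project's code.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantage, clip_eps=0.2):
    """Per-sample PPO clipped objective (a quantity to maximize).

    new_logp / old_logp: log-probability of the taken action under the
    current and the data-collecting policy. clip_eps is an assumed default.
    """
    # Probability ratio between the new and the old policy.
    ratio = math.exp(new_logp - old_logp)
    # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Positive advantage: the gain stops growing once the ratio passes 1 + clip_eps.
    print(clipped_surrogate(math.log(1.5), 0.0, advantage=2.0))
    # Negative advantage: the objective stays flat once the ratio falls below 1 - clip_eps.
    print(clipped_surrogate(math.log(0.5), 0.0, advantage=-2.0))
```

Taking the minimum removes the incentive to move the policy far from the one that collected the data, which is where the stability comes from.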
This project uses Nix for reproducible environment management. To get started:
git clone https://github.com/nkermani/N-Drill-Master-RL.git
cd N-Drill-Master-RL
# Enter the Nix shell (automatically creates venv and installs dependencies)
nix-shell
# The shell will:
# 1. Create a Python 3.11 virtual environment in .venv/
# 2. Install PyTorch and all requirements
# 3. Activate the environment
#
# To exit: deactivate
# To re-enter: nix-shell
If you have Nix installed, simply run:
nix-shell # or: nix-shell shell.nix
This provides all dependencies automatically.
git clone https://github.com/nkermani/N-Drill-Master-RL.git
cd N-Drill-Master-RL
pip install -e . # Editable install
Dependencies:
- PyTorch >= 2.0 (optional, for model training)
- PyTorch Geometric >= 2.3 (optional, for model training)
- Gymnasium >= 0.29
- NumPy, Pandas, Matplotlib
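Because PyTorch and PyTorch Geometric are optional, it can help to check which of the packages above are importable before running anything. The sketch below uses only the standard library. The helper name `missing_modules` is hypothetical, and the script checks for presence only, not the minimum versions listed above.

```python
from importlib.util import find_spec

# Import names for the packages listed above, mapped to display names.
# torch and torch_geometric are the optional ones.
MODULES = {
    "torch": "PyTorch",
    "torch_geometric": "PyTorch Geometric",
    "gymnasium": "Gymnasium",
    "numpy": "NumPy",
    "pandas": "Pandas",
    "matplotlib": "Matplotlib",
}


def missing_modules(modules=MODULES):
    """Return display names of packages whose import name cannot be found."""
    return [label for name, label in modules.items() if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```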
After entering the nix-shell (see Installation above):
# Run the complete pipeline: environment test, visualization, and training
python run.py
This will:
- Test all imports (PyTorch, Gymnasium, Matplotlib)
- Create the warehouse environment
- Generate visualization plots in visualizations/
- Train the RL model for 100 updates
from src.warehouse_env import WarehouseEnv
env = WarehouseEnv(
num_robots=10,
grid_size=(10, 10),
num_stations=8,
task_arrival_rate=0.3,
max_tasks=50,
seed=42
)
obs, info = env.reset()
print(f"Active tasks: {info['active_tasks']}")
actions = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4] # Example actions per robot
obs, rewards, terminations, truncations, info = env.step(actions)
print(f"Rewards: {rewards}")
print(f"Info: {info}")
import torch
from src.model import AttentionPolicy, GNNEncoder
from src.train import Trainer
policy = AttentionPolicy(
state_dim=6,
action_dim=5,
hidden_dim=64,
num_agents=10
)
value_network = AttentionPolicy(
state_dim=6,
action_dim=5,
hidden_dim=64,
num_agents=10
)
trainer = Trainer(
env=env,
policy=policy,
value_network=value_network,
num_agents=10,
learning_rate=3e-4,
batch_size=64,
max_steps=1000
)
history = trainer.train(num_updates=1000, log_interval=10)
print(f"Final reward: {history['total_reward'][-1]:.2f}")
trainer.save_checkpoint('checkpoints/model.pt')
trainer.load_checkpoint('checkpoints/model.pt')
N-Drill-Master-RL/
├── data/ # Generated datasets
├── notebooks/ # Exploratory analysis
├── src/
│ ├── warehouse_env/ # Multi-agent environment
│ │ ├── warehouse.py # Warehouse environment
│ │ └── __init__.py
│ ├── model/ # RL and GNN models
│ │ ├── attention_policy.py
│ │ ├── qmix.py
│ │ └── __init__.py
│ ├── train.py # Training loops
│ └── __init__.py
├── tests/ # Unit tests
├── requirements.txt
├── README.md
├── SUBJECT.md
├── TECHNICAL_STACK.md
└── EXPLANATIONS.md
| Class | Description |
|---|---|
| `WarehouseEnv` | Gymnasium-compatible multi-agent environment |
| `Robot` | Individual robot agent |
| `Task` | Pick-up/delivery task |

| Class | Description |
|---|---|
| `GNNEncoder` | Graph Attention Network encoder |
| `AttentionPolicy` | Policy network with GAT backbone |
| `CentralizedCritic` | Centralized value function |
| `MixingNetwork` | QMIX-style mixing network |

| Class | Description |
|---|---|
| `Trainer` | PPO-style training loop |
| `PPOAgent` | PPO agent with advantage estimation |
| `ReplayBuffer` | Experience replay buffer |
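The table above mentions advantage estimation without saying which estimator is used. One common choice in PPO implementations is Generalized Advantage Estimation (GAE). The sketch below shows that technique in plain Python under stated assumptions: the function name `compute_gae`, its signature, and the defaults `gamma=0.99` and `lam=0.95` are illustrative and are not confirmed to match `PPOAgent`.

```python
def compute_gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory (illustrative).

    rewards, values, dones: equal-length lists, one entry per time step.
    dones[t] is True if the episode ended at step t.
    last_value: value estimate for the state after the final step.
    Returns (advantages, returns), where returns[t] = advantages[t] + values[t].
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the accumulated estimate.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; no bootstrapping across an episode boundary.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


if __name__ == "__main__":
    adv, ret = compute_gae(
        rewards=[1.0, 2.0, 3.0],
        values=[0.0, 0.0, 0.0],
        last_value=0.0,
        dones=[False, False, False],
        gamma=1.0,
        lam=1.0,
    )
    print(adv, ret)
```

With `gamma=1.0`, `lam=1.0`, and zero value estimates, the advantages reduce to the undiscounted reward-to-go, which makes a convenient sanity check.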
# Inside nix-shell
pytest tests/ -v
# Inside nix-shell
python -c "
from src.train import visualize_training
history = {'policy_loss': [0.5], 'value_loss': [0.3], 'entropy': [1.2], 'total_reward': [100]}
visualize_training(history, save_path='training_curves.png')
"
# Enter nix-shell (first time: creates venv and installs everything)
nix-shell
# Run everything: test environment, create visuals, train model
python run.py
# Exit when done
deactivate
# Re-enter anytime with: nix-shell
Last Updated: April 2026
Author: Nathan Kermani