ROS2 Multi-Agent Coordination Demo

A ROS2 (Humble) implementation of a multi-agent task coordination system with fault tolerance, performance metrics, and algorithm comparison. Demonstrates autonomous task assignment, parallel execution, and real-time status monitoring.

Demo Output

[coordinator]: Assigned T001 (navigate@zone_A) -> agent_1
[coordinator]: Assigned T003 (navigate@zone_C) -> agent_2
[agent_1]: Executing T001 (navigate) for 4s...
[agent_2]: Executing T003 (navigate) for 4s...
[agent_1]: Completed T001
[coordinator]: Task T001 completed by agent_1

Architecture

graph LR
    C[Coordinator Node]
    A1[Agent 1]
    A2[Agent 2]

    C -->|/tasks| A1
    C -->|/tasks| A2
    A1 -->|/agent_status| C
    A2 -->|/agent_status| C
Coordinator Node (coordinator_node.py)

  • Maintains a priority-sorted task queue
  • Monitors agent status via /agent_status topic
  • Assigns tasks to idle agents dynamically
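The queue-and-assign behaviour listed above can be sketched in plain Python. This is a minimal illustration, not the code in coordinator_node.py: the `TaskQueue` class, the `assign_idle` helper, and the convention that a larger priority number means "more urgent" are all assumptions made for the example.

```python
import heapq
import itertools


class TaskQueue:
    """Priority queue; a larger priority value is served first (assumed convention)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: FIFO within one priority

    def push(self, task_id, priority):
        # heapq is a min-heap, so the priority is negated to pop the highest first
        heapq.heappush(self._heap, (-priority, next(self._counter), task_id))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def assign_idle(queue, agent_states):
    """Pair queued tasks with idle agents; returns (task_id, agent) tuples."""
    assignments = []
    for agent, state in agent_states.items():
        if state == "idle" and len(queue) > 0:
            assignments.append((queue.pop(), agent))
            agent_states[agent] = "busy"
    return assignments


queue = TaskQueue()
queue.push("T002", priority=1)
queue.push("T001", priority=3)
queue.push("T003", priority=3)
agents = {"agent_1": "idle", "agent_2": "idle"}
result = assign_idle(queue, agents)
print(result)  # [('T001', 'agent_1'), ('T003', 'agent_2')]
```

With these hypothetical priorities the two high-priority tasks go out first and T002 stays queued until an agent reports idle again, which mirrors the ordering in the demo output.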

Agent Node (agent_node.py)

  • Subscribes to /tasks topic
  • Executes assigned tasks concurrently (threading)
  • Publishes real-time status to /agent_status
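A rough sketch of the agent side, assuming each task runs in its own worker thread so the message callback returns immediately. It is not taken from agent_node.py: the `Agent` class, the message field names, and the `time_scale` parameter are hypothetical, and a plain callback stands in for the /agent_status publisher. The durations come from the Task Types table in this README.

```python
import threading
import time

# Durations in seconds, from the Task Types table
TASK_DURATIONS = {"navigate": 4, "inspect": 6, "collect": 3}


class Agent:
    def __init__(self, name, publish, time_scale=1.0):
        self.name = name
        self.publish = publish        # stand-in for the /agent_status publisher
        self.time_scale = time_scale  # hypothetical knob to shorten sleeps in tests
        self.busy = threading.Event()

    def on_task(self, task):
        """Handle one /tasks message; skip it if addressed elsewhere or already busy."""
        if task["agent"] != self.name or self.busy.is_set():
            return None
        self.busy.set()
        worker = threading.Thread(target=self._execute, args=(task,), daemon=True)
        worker.start()
        return worker

    def _execute(self, task):
        self.publish({"agent": self.name, "status": "busy", "task_id": task["id"]})
        time.sleep(TASK_DURATIONS[task["type"]] * self.time_scale)
        self.busy.clear()
        self.publish({"agent": self.name, "status": "idle", "completed": task["id"]})


statuses = []
agent = Agent("agent_1", statuses.append, time_scale=0.01)
worker = agent.on_task({"id": "T001", "type": "navigate", "agent": "agent_1"})
ignored = agent.on_task({"id": "T003", "type": "navigate", "agent": "agent_2"})
worker.join()
```

Because every agent receives every message on the shared /tasks topic, the filter on the `agent` field is what keeps agent_1 from executing work meant for agent_2.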

Key Features

  • Priority-based scheduling — higher priority tasks assigned first
  • Parallel execution — multiple agents work simultaneously
  • Dynamic load balancing — tasks assigned immediately when agent becomes idle
  • Fault tolerance — heartbeat monitoring with automatic task recovery
  • Algorithm comparison — benchmarking framework for scheduling strategies
  • Scalable — add more agents without changing coordinator logic

Background

This demo extends my experience building multi-agent decision systems (100+ assets, 25 time dimensions, 6 API integrations) into the ROS2 robotics domain, demonstrating the same coordination patterns applied to autonomous robot task management.

Quick Start

Prerequisites

  • Docker Desktop (Apple Silicon / Linux)
  • Git

Run

git clone https://github.com/s870488-dev/ros2-multi-agent-demo.git
cd ros2-multi-agent-demo

# Build Docker image
cd docker && docker compose build

# Start container
docker compose up -d

# Enter container
docker exec -it ros2_dev bash

# Build ROS2 package
cd /ros2_ws && colcon build
source install/setup.bash

# Launch demo (basic)
ros2 launch multi_agent_coordinator demo.launch.py

# Run algorithm comparison
ros2 run multi_agent_coordinator coordinator_comparison priority
ros2 run multi_agent_coordinator coordinator_comparison random
ros2 run multi_agent_coordinator coordinator_comparison least_loaded

Fault Tolerance

The coordinator monitors each agent via heartbeat. If an agent stops responding for >5 seconds, it is marked dead and all in-progress tasks are automatically re-queued and reassigned.

[FAULT] agent_1 heartbeat lost (7.0s) → marking dead
[RECOVER] T002 re-queued (was on agent_1)
[RECOVER] T005 re-queued (was on agent_1)
[RECOVER] T004 re-queued (was on agent_1)
[ASSIGN] T002 → agent_2
[ASSIGN] T005 → agent_2
[ASSIGN] T004 → agent_2

Result: All 6 tasks completed. Zero task loss.

Full log: docs/fault_tolerance_demo.txt
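The heartbeat check described in this section could look roughly like the sketch below. Only the 5-second timeout comes from the README; the `HeartbeatMonitor` class and its method names are illustrative assumptions. Timestamps are passed in explicitly to keep the example deterministic, whereas a real node would more likely read a monotonic clock inside a timer callback.

```python
HEARTBEAT_TIMEOUT = 5.0  # seconds, as stated in this README


class HeartbeatMonitor:
    def __init__(self, timeout=HEARTBEAT_TIMEOUT):
        self.timeout = timeout
        self.last_seen = {}    # agent -> timestamp of its last status message
        self.in_progress = {}  # task_id -> agent the task was assigned to
        self.dead = set()

    def heartbeat(self, agent, now):
        self.last_seen[agent] = now

    def assign(self, task_id, agent):
        self.in_progress[task_id] = agent

    def check(self, now):
        """Mark silent agents dead; return the task ids that need re-queueing."""
        recovered = []
        for agent, seen in self.last_seen.items():
            if agent not in self.dead and now - seen > self.timeout:
                self.dead.add(agent)
                lost = [t for t, a in self.in_progress.items() if a == agent]
                for task_id in lost:
                    del self.in_progress[task_id]
                recovered.extend(lost)
        return recovered


monitor = HeartbeatMonitor()
monitor.heartbeat("agent_1", now=0.0)
monitor.heartbeat("agent_2", now=0.0)
monitor.assign("T002", "agent_1")
monitor.assign("T005", "agent_1")
monitor.heartbeat("agent_2", now=6.0)  # agent_1 has gone silent
recovered = monitor.check(now=7.0)
print(recovered)  # ['T002', 'T005']
```

The recovered ids would then be pushed back onto the task queue, so the next idle agent picks them up through the normal assignment path.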

Algorithm Comparison

Three scheduling strategies were benchmarked on an identical 6-task workload (2× navigate, 2× inspect, 1× collect, varying priority levels):

| Algorithm | Avg Wait (s) | Avg Exec (s) | Runtime (s) | Avg Util (%) |
|---|---|---|---|---|
| Priority | 8.3 | 4.9 | 21.0 | 71.4 |
| Random | 9.3 | 4.9 | 21.0 | 70.2 |
| Least-loaded | 8.3 | 4.9 | 21.0 | 72.6 |

Observation: In execution-time-dominated scenarios, total runtime converges across strategies. The key differentiator is wait time distribution — Priority and Least-loaded consistently assign high-priority tasks earlier, while Random introduces unnecessary delay for critical tasks. Least-loaded achieves marginally higher agent utilization by balancing cumulative workload.
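To make the difference between the three strategies concrete, here is one possible way to express them as selection functions. The README does not spell out how coordinator_comparison implements each strategy, so the split into `pick_task` and `pick_agent`, the dictionary fields, and the assumption that least-loaded still takes the highest-priority task are all guesses made for illustration.

```python
import random


def pick_task(pending, strategy, rng=random):
    """Choose the next task from a list of dicts with 'id' and 'priority'."""
    if strategy == "random":
        return rng.choice(pending)
    # Assumed: 'priority' and 'least_loaded' both take the highest-priority task
    return max(pending, key=lambda task: task["priority"])


def pick_agent(idle_agents, busy_seconds, strategy):
    """Choose which idle agent receives the task."""
    if strategy == "least_loaded":
        # Prefer the agent with the least cumulative execution time so far
        return min(idle_agents, key=lambda agent: busy_seconds[agent])
    return idle_agents[0]


pending = [{"id": "T002", "priority": 1}, {"id": "T001", "priority": 3}]
busy_seconds = {"agent_1": 10.0, "agent_2": 4.0}
idle = ["agent_1", "agent_2"]

by_priority = pick_task(pending, "priority")["id"]
by_random = pick_task(pending, "random", rng=random.Random(0))["id"]
least_loaded_agent = pick_agent(idle, busy_seconds, "least_loaded")
default_agent = pick_agent(idle, busy_seconds, "priority")
```

Under these assumptions, random selection is the only strategy that can hand a low-priority task out ahead of a high-priority one, which is consistent with its higher average wait in the table.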

Raw data: docs/metrics_priority.csv · docs/metrics_random.csv · docs/metrics_least_loaded.csv

ROS2 Topics

| Topic | Type | Description |
|---|---|---|
| /tasks | std_msgs/String (JSON) | Task assignments from coordinator |
| /agent_status | std_msgs/String (JSON) | Agent status reports |
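Since both topics carry JSON inside a std_msgs/String, each side needs to serialize and validate the payload itself. The sketch below shows one way to do that with the standard library; the field names (`id`, `type`, `zone`, `agent`, `priority`) are assumptions for illustration, as the README does not document the actual message schema.

```python
import json

REQUIRED_FIELDS = {"id", "type", "agent"}  # hypothetical minimal schema


def encode_task(task_id, task_type, zone, agent, priority):
    """Build the string that would be placed in the String message's data field."""
    return json.dumps({"id": task_id, "type": task_type, "zone": zone,
                       "agent": agent, "priority": priority})


def decode_task(data):
    """Parse a received payload and reject messages missing required fields."""
    task = json.loads(data)
    missing = REQUIRED_FIELDS - task.keys()
    if missing:
        raise ValueError(f"task message missing fields: {sorted(missing)}")
    return task


payload = encode_task("T001", "navigate", "zone_A", "agent_1", 3)
task = decode_task(payload)

try:
    decode_task('{"id": "T001"}')
    rejected = False
except ValueError:
    rejected = True
```

In a node, `payload` would be assigned to the `data` field of a `std_msgs.msg.String` before publishing, and `decode_task` would be called on `msg.data` in the subscriber callback.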

Task Types

| Type | Duration | Description |
|---|---|---|
| navigate | 4s | Move to target zone |
| inspect | 6s | Inspect target zone |
| collect | 3s | Collect item at zone |

Environment

  • ROS2 Humble (LTS)
  • Python 3.10
  • Docker (linux/arm64 + amd64)
  • Tested on Apple Silicon (M4) and Linux

Visualization

[Figure: Algorithm Comparison]

Analysis

Wait time distribution is the primary differentiator across algorithms:

  • Priority and Least-loaded assign high-priority tasks (T001, T003) first, achieving 2s wait — the theoretical minimum given startup delay
  • Random scheduling assigned inspect tasks (T002, T005) first in the tested run, causing navigate tasks to wait up to 16s
  • In execution-time-dominated workloads, total runtime converges across all strategies (21s) — the scheduling bottleneck only emerges when task arrival rate exceeds agent capacity

Agent utilization remains consistent (70–73%) across all algorithms with 2 agents and 6 tasks, suggesting utilization is bounded by task density rather than scheduling strategy in this configuration.
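The 70–73% range can be sanity-checked from the benchmark table, assuming utilization is defined as total busy time divided by the agent-time available over the run. The README does not state the definition actually used, so this formula is an assumption.

```python
def utilization(avg_exec_s, n_tasks, n_agents, runtime_s):
    """Percentage of available agent-time spent executing tasks (assumed definition)."""
    busy_s = avg_exec_s * n_tasks
    return 100.0 * busy_s / (n_agents * runtime_s)


# Values from the benchmark table: 6 tasks, 4.9 s average execution, 21.0 s runtime
estimate = round(utilization(avg_exec_s=4.9, n_tasks=6, n_agents=2, runtime_s=21.0), 1)
print(estimate)  # 70.0
```

The estimate lands near the reported 70.2–72.6%. Because total work and total runtime are the same for every strategy, this ratio can hardly move, which supports the reading that utilization here is bounded by task density rather than by the scheduler.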

Implication for multi-robot systems: Priority-based scheduling provides predictable latency for critical tasks (e.g. emergency navigation) without sacrificing overall throughput — a desirable property for real-world robot coordination where task urgency varies.
