DreamBot

MIT License ROS

An intelligent robotic system that explores, remembers, and navigates using Vision-Language Models and Vector Databases.


Overview

DreamBot is a ROS-based intelligent robotic system running on the Unitree Go2 quadruped robot. It bridges modern AI technologies (Vision-Language Models) with classical robotics (SLAM, path planning) to create a robot that can understand natural language commands and autonomously navigate to remembered objects.

What DreamBot Does

During Exploration: The robot autonomously wanders around the environment, continuously detecting and memorizing objects it encounters, marking them on a semantic map.

On User Command: Users can instruct the robot to navigate to any previously seen object using natural language (e.g., "go to the red cup").

Key Innovation

  • Bridges perception with action: Combines what objects are with how to reach them
  • Persistent semantic memory: Objects remembered across sessions via Milvus vector database
  • Zero-shot detection: Detect any object using natural language without prior training
  • Hybrid search: Combines semantic (COSINE) and spatial (L2) search for intelligent retrieval

Key Features

Two-Phase Operation

  • Exploration Phase: the robot autonomously explores, detects objects, and builds semantic memory
  • Navigation Phase: the user commands the robot to navigate to any remembered object

Core Capabilities

  • Autonomous Exploration & Memorization - Robot records what it sees and marks objects on the semantic map
  • Zero-Shot Object Detection - Detect any object using natural language via Qwen2.5-VL-72B
  • Semantic Memory System - Store and retrieve objects by meaning using 768-dim vector embeddings
  • Natural Language Navigation - Navigate using conversational commands ("go to the blue cup")
  • 3D Spatial Understanding - Transform 2D detections to 3D world coordinates
  • Persistent Memory - Objects remembered across sessions via Milvus
  • Real-time Visualization - Live 3D bounding boxes in RViz
  • Hybrid Search - Combine semantic (COSINE) and spatial (L2) search

Core Workflow

Phase 1: Autonomous Exploration & Memorization

While the robot wanders the environment:

  1. SLAM builds a map of the surroundings in real-time
  2. Camera continuously captures RGB-D images and point clouds
  3. VLM automatically detects and identifies objects in view
  4. Objects are stored in Milvus database with:
    • Semantic embeddings (768-dim vectors for natural language search)
    • 3D positions in map frame (for navigation)
    • Labels and detailed descriptions
  5. Free navigation space is extracted from costmap
  6. Result: A semantic map with all discovered objects marked on it

Key Insight: The robot remembers what it has seen during exploration, building a persistent memory of the environment.
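The 2D→3D step above can be sketched with a standard pinhole back-projection. The intrinsics below are placeholder values standing in for those published on /d435/color/camera_info; the real detection node additionally applies a TF transform from the camera frame into the map frame:

```python
import numpy as np

def pixel_to_camera_point(u, v, depth, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with metric depth into the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Hypothetical intrinsics; the real values come from /d435/color/camera_info.
fx, fy, cx, cy = 615.0, 615.0, 320.0, 240.0

# A detection centered at pixel (400, 300), seen 1.2 m away.
p_cam = pixel_to_camera_point(400, 300, 1.2, fx, fy, cx, cy)
print(p_cam)  # x right, y down, z forward (camera optical frame)
```

The resulting point is still in the camera's optical frame; transforming it into the map frame (via TF) is what makes the stored position usable for navigation later.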

Phase 2: On-Demand Navigation

When user issues a command:

  1. User says: "go to the red cup" or "navigate to the blue book"
  2. System searches Milvus using semantic similarity (COSINE)
  3. Retrieves the object's 3D position from memory
  4. Finds the nearest free navigation point (L2 distance)
  5. Plans optimal path using TEB local planner
  6. Navigates autonomously to the target location

Key Insight: Users can navigate to any remembered object without re-detecting it - the robot already knows where things are!
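A toy sketch of the retrieval logic in steps 2–4, using 4-dim stand-ins for the 768-dim embeddings and hard-coded free points instead of real Milvus queries:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity, the metric used for semantic search."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy memory: label -> (embedding, 2D position in the map frame).
memory = {
    "red cup":   (np.array([0.9, 0.1, 0.0, 0.0]), np.array([1.5, 2.0])),
    "blue book": (np.array([0.0, 0.9, 0.2, 0.0]), np.array([4.0, 1.0])),
}
query = np.array([0.8, 0.2, 0.0, 0.0])  # stand-in embedding of "red cup"

# Step 2-3: COSINE similarity picks the best-matching remembered object.
label, (_, obj_xy) = max(memory.items(), key=lambda kv: cosine(query, kv[1][0]))

# Step 4: L2 distance picks the nearest free navigation point to that object.
free_points = np.array([[1.0, 1.8], [3.5, 1.2], [0.0, 0.0]])
goal = free_points[np.argmin(np.linalg.norm(free_points - obj_xy, axis=1))]
print(label, goal)
```

The goal point (rather than the object position itself) is what gets sent to move_base, since the object may sit inside an obstacle region of the costmap.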


Technical Pipeline

Detection & Memorization Pipeline

/instruction ──▶ [RGB-D Camera] ──▶ [VLM Detection] ──▶ [2D→3D Projection]
                      │                    │                    │
               /d435/color/        Qwen2.5-VL-72B       TF Transform
               image_raw                                  (camera→map)
                      │                    │                    │
                      └────────────────────┴────────────────────┘
                                           │
                                           ▼
                                   [Milvus Database]
                                   • Object embeddings
                                   • 3D positions
                                   • Labels & descriptions
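For illustration, a record in the shape accepted by /api/insert/object (see the Example API Calls section) can be assembled like this. build_object_record is a hypothetical helper, and embedding generation is assumed to happen server-side from the description text:

```python
import json
import time

def build_object_record(label, description, position, timestamp=None):
    """Assemble one record in the payload shape shown in this README.

    Assumption: the server derives the 768-dim embedding from the
    description on insert, so only raw fields are sent here.
    """
    return {
        "label": label,
        "description": description,
        "timestamp": timestamp if timestamp is not None else int(time.time()),
        "position": list(position),  # [x, y, z] in the map frame
    }

record = build_object_record(
    "cup", "a red ceramic cup on the table", (1.5, 2.0, 0.3), timestamp=0
)
payload = {"data": [record]}
print(json.dumps(payload))
```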

Navigation Execution Pipeline

/goal_input ──▶ [Milvus Search] ──▶ [Free Space Query] ──▶ [move_base]
     │               │                     │                    │
 "red cup"     COSINE search          L2 distance         TEB planner
               (semantic)             (spatial)
     │               │                     │                    │
     └───────────────┴─────────────────────┴────────────────────┘
                             │
                             ▼
                      [Robot Navigation]

Hardware Requirements

  • Robot: Unitree Go2 quadruped robot
  • Camera: Intel RealSense D435 RGB-D camera
  • Computer: ROS-compatible system (Ubuntu 20.04 for ROS Noetic)

Software Dependencies

ROS Packages

  • ROS Noetic
  • slam_toolbox - SLAM mapping and localization
  • move_base - Navigation stack
  • teb_local_planner - Trajectory execution
  • cv_bridge - OpenCV-ROS integration

Installation

# 1. Clone repository
cd ~/catkin_ws/src
git clone <repository_url> DreamBot

# 2. Install Python dependencies
pip install pymilvus flask openai opencv-python numpy pillow

# 3. Build ROS package
cd ~/catkin_ws
catkin_make

# 4. Source the workspace
source devel/setup.bash

Quick Start

Step 0: Setup Simulation Environment (Optional)

If using the Unitree Go2 in simulation mode:

# Start the junior_ctrl controller for Go2
sudo ./devel/lib/unitree_guide/junior_ctrl

In another terminal, control the robot with keyboard:

rosrun teleop_twist_keyboard teleop_twist_keyboard.py

Step 1: Start Milvus Server

python Milvus/server_test.py

The server will start on http://0.0.0.0:5002.

Step 2: Start Full System

roslaunch DreamBot core.launch

This launches:

  • SLAM (slam_toolbox)
  • Navigation stack (move_base + TEB planner)
  • Detection node
  • RViz visualization

Step 3: Explore and Memorize

Drive the robot around using keyboard commands. The system will automatically:

  • Build a map using SLAM
  • Detect and memorize objects in view (when a command is published on /instruction)
  • Mark objects on the semantic map

Step 4: Navigate to Objects

# Navigate to a previously detected object
rostopic pub /goal_input std_msgs/String "data: 'the red cup'"

ROS Topics

Subscribed Topics

  • /d435/color/image_raw (sensor_msgs/Image): RGB images from the RealSense
  • /d435/depth/color/points (sensor_msgs/PointCloud2): point cloud data
  • /d435/color/camera_info (sensor_msgs/CameraInfo): camera intrinsic parameters
  • /instruction (std_msgs/String): detection commands
  • /goal_input (std_msgs/String): navigation target description

Published Topics

  • /detect/labeled_image (sensor_msgs/Image): annotated detection images
  • /detect/marker_array (visualization_msgs/MarkerArray): 3D markers for RViz
  • /detect/points_rgb_frame_color (sensor_msgs/PointCloud2): detected point cloud
  • /robot_pose (geometry_msgs/PoseStamped): current robot pose in the map frame

Milvus API Endpoints

The Milvus server runs on port 5002 by default.

  • POST /api/insert/object: insert detected objects with embeddings
  • GET /api/search/object: semantic search for objects (COSINE)
  • POST /api/insert/point: insert free-space navigation points
  • GET /api/search/point: find the nearest free navigation point (L2)

Example API Calls

# Insert a detected object into memory
import requests

data = {
    "data": [{
        "label": "cup",
        "description": "a red ceramic cup on the table",
        "timestamp": 0,
        "position": [1.5, 2.0, 0.3]
    }]
}
response = requests.post("http://localhost:5002/api/insert/object", json=data)

# Search for an object by semantic similarity (COSINE)
query = {"data": ["red cup"]}
response = requests.get("http://localhost:5002/api/search/object", json=query)

Usage Examples

# Navigate to a remembered object
rostopic pub /goal_input std_msgs/String "data: 'red cup'"

# Navigate using description
rostopic pub /goal_input std_msgs/String "data: 'the blue book on the shelf'"

Project Structure

DreamBot/
├── scripts/                      # ROS nodes
│   ├── detection_node.py         # VLM-based object detection
│   ├── navigation_node.py        # Semantic navigation controller
│   └── save_costmap.py           # Map persistence utilities
│
├── utils/                        # Utility modules
│   ├── model.py                  # VLM integration (Qwen2.5-VL)
│   ├── memory_process.py         # Memory operations
│   ├── visualization.py          # RViz marker utilities
│   ├── pointcloud.py             # Point cloud utilities
│   └── Milvus/
│       └── Milvus_client.py      # HTTP client for Milvus
│
├── Milvus/                       # Database server
│   └── server_test.py            # Flask API server
│
├── launch/                       # ROS launch files
│   ├── core.launch               # Full system launch
│   ├── detect_node.launch        # Detection node only
│   ├── move_base.launch          # Navigation stack
│   ├── slam_toolbox_mapping.launch       # SLAM mapping mode
│   ├── slam_toolbox_localization.launch  # SLAM localization mode
│   └── slam_toolbox_localization_detection.launch
│
├── param/                        # Configuration files
│   ├── slam_toolbox/             # SLAM parameters
│   ├── costmap/                  # Costmap configuration
│   ├── planner/                  # Planner parameters
│   └── move_base_params.yaml     # move_base settings
│
├── rviz_config/                  # RViz configurations
│   └── default.rviz
│
├── CMakeLists.txt
├── package.xml
└── README.md

Launch Files

  • core.launch: full system (SLAM + navigation + detection + RViz)
  • detect_node.launch: detection node only
  • move_base.launch: navigation stack (move_base + TEB)
  • slam_toolbox_mapping.launch: SLAM in mapping mode
  • slam_toolbox_localization.launch: SLAM in localization mode
  • slam_toolbox_localization_detection.launch: localization + detection

Configuration

SLAM Configuration

param/slam_toolbox/mapper_params_localization.yaml
param/slam_toolbox/mapper_params_mapping.yaml

Costmap Configuration

param/costmap/costmap_common_params.yaml
param/costmap/local_costmap_params.yaml
param/costmap/global_costmap_params.yaml

Planner Configuration

param/planner/teb_local_planner_params.yaml
param/planner/base_global_planner_param.yaml
