An intelligent robotic system that explores, remembers, and navigates using Vision-Language Models and Vector Databases.
DreamBot is a ROS-based intelligent robotic system running on the Unitree Go2 quadruped robot. It bridges modern AI technologies (Vision-Language Models) with classical robotics (SLAM, path planning) to create a robot that can understand natural language commands and autonomously navigate to remembered objects.
During Exploration: The robot autonomously wanders around the environment, continuously detecting and memorizing objects it encounters, marking them on a semantic map.
On User Command: Users can instruct the robot to navigate to any previously seen object using natural language (e.g., "go to the red cup").
- Bridges perception with action: Combines what objects are with how to reach them
- Persistent semantic memory: Objects remembered across sessions via Milvus vector database
- Zero-shot detection: Detect any object using natural language without prior training
- Hybrid search: Combines semantic (COSINE) and spatial (L2) search for intelligent retrieval
| Phase | Description |
|---|---|
| Exploration Phase | Robot autonomously explores, detects objects, and builds semantic memory |
| Navigation Phase | User commands robot to navigate to any remembered object |
- Autonomous Exploration & Memorization - Robot records what it sees and marks objects on the semantic map
- Zero-Shot Object Detection - Detect any object using natural language via Qwen2.5-VL-72B
- Semantic Memory System - Store and retrieve objects by meaning using 768-dim vector embeddings
- Natural Language Navigation - Navigate using conversational commands ("go to the blue cup")
- 3D Spatial Understanding - Transform 2D detections to 3D world coordinates
- Persistent Memory - Objects remembered across sessions via Milvus
- Real-time Visualization - Live 3D bounding boxes in RViz
- Hybrid Search - Combine semantic (COSINE) and spatial (L2) search
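The two search metrics play different roles: COSINE compares the direction of embedding vectors (semantic meaning, scale-invariant), while L2 compares raw map coordinates (physical distance). A minimal sketch of both, independent of Milvus, just to make the distinction concrete:

```python
import numpy as np

def cosine_similarity(a, b):
    # Semantic metric: angle between embedding vectors (scale-invariant)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def l2_distance(p, q):
    # Spatial metric: straight-line distance between map positions
    return float(np.linalg.norm(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)))
```

Milvus evaluates the same metrics over its indexes; the functions above are only for illustration.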
While the robot explores the environment:
- SLAM builds a map of the surroundings in real-time
- Camera continuously captures RGB-D images and point clouds
- VLM automatically detects and identifies objects in view
- Objects are stored in the Milvus database with:
  - Semantic embeddings (768-dim vectors for natural language search)
  - 3D positions in the map frame (for navigation)
  - Labels and detailed descriptions
- Free navigation space is extracted from costmap
- Result: A semantic map with all discovered objects marked on it
Key Insight: The robot remembers what it has seen during exploration, building a persistent memory of the environment.
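Each memorized object can be pictured as one record in the payload sent to the project's `/api/insert/object` endpoint. A hypothetical helper (the function name is ours; the field names mirror the insert payload used by this project):

```python
def make_object_record(label, description, position, timestamp=0):
    """Hypothetical helper: package one detection into the JSON shape
    accepted by the /api/insert/object endpoint. Embedding generation
    from the text fields is handled on the server side, not here."""
    return {
        "data": [{
            "label": label,
            "description": description,
            "timestamp": timestamp,
            "position": list(position),  # 3D position in the map frame
        }]
    }
```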
When the user issues a command:
- User says: "go to the red cup" or "navigate to the blue book"
- System searches Milvus using semantic similarity (COSINE)
- Retrieves the object's 3D position from memory
- Finds the nearest free navigation point (L2 distance)
- Plans optimal path using TEB local planner
- Navigates autonomously to the target location
Key Insight: Users can navigate to any remembered object without re-detecting it - the robot already knows where things are!
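The "nearest free navigation point" step is an L2 nearest-neighbor query. In the running system Milvus answers it via `/api/search/point`, but the logic reduces to this pure-Python sketch:

```python
import math

def nearest_free_point(object_position, free_points):
    """Return the free-space point with the smallest L2 distance to the
    remembered object's position; this point becomes the move_base goal.
    (Pure-Python stand-in for the Milvus free-space query.)"""
    return min(free_points, key=lambda p: math.dist(p, object_position))
```

Navigating to the nearest *free* point, rather than the object itself, keeps the goal out of occupied space so the planner can actually reach it.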
```
/instruction ──▶ [RGB-D Camera] ──▶ [VLM Detection] ──▶ [2D→3D Projection]
                       │                   │                    │
                 /d435/color/        Qwen2.5-VL-72B        TF Transform
                  image_raw                                (camera→map)
                       │                   │                    │
                       └───────────────────┴────────────────────┘
                                           │
                                           ▼
                                   [Milvus Database]
                                   • Object embeddings
                                   • 3D positions
                                   • Labels & descriptions
```
```
/goal_input ──▶ [Milvus Search] ──▶ [Free Space Query] ──▶ [move_base]
      │               │                     │                   │
  "red cup"     COSINE search          L2 distance         TEB planner
                 (semantic)             (spatial)
      │               │                     │                   │
      └───────────────┴─────────────────────┴───────────────────┘
                                 │
                                 ▼
                        [Robot Navigation]
```
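The 2D→3D Projection step in the exploration pipeline follows the standard pinhole back-projection. A minimal sketch, assuming intrinsics (`fx`, `fy`, `cx`, `cy`) are read from `/d435/color/camera_info`; the subsequent camera→map TF transform is omitted:

```python
import numpy as np

def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth (meters) into the camera
    frame using the pinhole model. The map-frame position is obtained
    afterwards by applying the camera→map TF transform (not shown)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])
```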
| Component | Specification |
|---|---|
| Robot | Unitree Go2 quadruped robot |
| Camera | Intel RealSense D435 RGB-D camera |
| Computer | ROS-compatible system (Ubuntu 20.04 for ROS Noetic) |
- ROS Noetic
- `slam_toolbox` - SLAM mapping and localization
- `move_base` - Navigation stack
- `teb_local_planner` - Trajectory execution
- `cv_bridge` - OpenCV-ROS integration
```bash
# 1. Clone repository
cd ~/catkin_ws/src
git clone <repository_url> DreamBot

# 2. Install Python dependencies
pip install pymilvus flask openai opencv-python numpy pillow

# 3. Build ROS package
cd ~/catkin_ws
catkin_make

# 4. Source the workspace
source devel/setup.bash
```

If using the Unitree Go2 in simulation mode:
```bash
# Start the junior_ctrl controller for Go2
sudo ./devel/lib/unitree_guide/junior_ctrl
```

In another terminal, control the robot with keyboard:

```bash
rosrun teleop_twist_keyboard teleop_twist_keyboard.py
```

Start the Milvus server:

```bash
python Milvus/server_test.py
```

The server will start on http://0.0.0.0:5002.
```bash
roslaunch DreamBot core.launch
```

This launches:
- SLAM (slam_toolbox)
- Navigation stack (move_base + TEB planner)
- Detection node
- RViz visualization
Drive the robot around using keyboard commands. The system will automatically:
- Build a map using SLAM
- Detect and memorize objects in view (when an instruction is published on /instruction)
- Mark objects on the semantic map
```bash
# Navigate to a previously detected object
rostopic pub /goal_input std_msgs/String "data: 'the red cup'"
```

| Topic | Message Type | Description |
|---|---|---|
| `/d435/color/image_raw` | `sensor_msgs/Image` | RGB images from RealSense |
| `/d435/depth/color/points` | `sensor_msgs/PointCloud2` | Point cloud data |
| `/d435/color/camera_info` | `sensor_msgs/CameraInfo` | Camera intrinsic parameters |
| `/instruction` | `std_msgs/String` | Detection commands |
| `/goal_input` | `std_msgs/String` | Navigation target description |
| Topic | Message Type | Description |
|---|---|---|
| `/detect/labeled_image` | `sensor_msgs/Image` | Annotated detection images |
| `/detect/marker_array` | `visualization_msgs/MarkerArray` | 3D markers for RViz |
| `/detect/points_rgb_frame_color` | `sensor_msgs/PointCloud2` | Detected point cloud |
| `/robot_pose` | `geometry_msgs/PoseStamped` | Current robot pose in map frame |
The Milvus server runs on port 5002 by default.
| Endpoint | Method | Description |
|---|---|---|
| `/api/insert/object` | POST | Insert detected objects with embeddings |
| `/api/search/object` | GET | Semantic search for objects (COSINE) |
| `/api/insert/point` | POST | Insert free space navigation points |
| `/api/search/point` | GET | Find nearest free navigation point (L2) |
```python
# Insert object
import requests

data = {
    "data": [{
        "label": "cup",
        "description": "a red ceramic cup on the table",
        "timestamp": 0,
        "position": [1.5, 2.0, 0.3]
    }]
}
response = requests.post("http://localhost:5002/api/insert/object", json=data)

# Search object
query = {"data": ["red cup"]}
response = requests.get("http://localhost:5002/api/search/object", json=query)
```

```bash
# Navigate to a remembered object
rostopic pub /goal_input std_msgs/String "data: 'red cup'"

# Navigate using description
rostopic pub /goal_input std_msgs/String "data: 'the blue book on the shelf'"
```

```
DreamBot/
├── scripts/                              # ROS nodes
│   ├── detection_node.py                 # VLM-based object detection
│   ├── navigation_node.py                # Semantic navigation controller
│   └── save_costmap.py                   # Map persistence utilities
│
├── utils/                                # Utility modules
│   ├── model.py                          # VLM integration (Qwen2.5-VL)
│   ├── memory_process.py                 # Memory operations
│   ├── visualization.py                  # RViz marker utilities
│   ├── pointcloud.py                     # Point cloud utilities
│   └── Milvus/
│       └── Milvus_client.py              # HTTP client for Milvus
│
├── Milvus/                               # Database server
│   └── server_test.py                    # Flask API server
│
├── launch/                               # ROS launch files
│   ├── core.launch                       # Full system launch
│   ├── detect_node.launch                # Detection node only
│   ├── move_base.launch                  # Navigation stack
│   ├── slam_toolbox_mapping.launch       # SLAM mapping mode
│   ├── slam_toolbox_localization.launch  # SLAM localization mode
│   └── slam_toolbox_localization_detection.launch
│
├── param/                                # Configuration files
│   ├── slam_toolbox/                     # SLAM parameters
│   ├── costmap/                          # Costmap configuration
│   ├── planner/                          # Planner parameters
│   └── move_base_params.yaml             # move_base settings
│
├── rviz_config/                          # RViz configurations
│   └── default.rviz
│
├── CMakeLists.txt
├── package.xml
└── README.md
```
| Launch File | Purpose |
|---|---|
| `core.launch` | Full system: SLAM + navigation + detection + RViz |
| `detect_node.launch` | Detection node only |
| `move_base.launch` | Navigation stack (move_base + TEB) |
| `slam_toolbox_mapping.launch` | SLAM in mapping mode |
| `slam_toolbox_localization.launch` | SLAM in localization mode |
| `slam_toolbox_localization_detection.launch` | Localization + detection |
- `param/slam_toolbox/mapper_params_localization.yaml`
- `param/slam_toolbox/mapper_params_mapping.yaml`
- `param/costmap/costmap_common_params.yaml`
- `param/costmap/local_costmap_params.yaml`
- `param/costmap/global_costmap_params.yaml`
- `param/planner/teb_local_planner_params.yaml`
- `param/planner/base_global_planner_param.yaml`
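As an illustration of the planner layer, here are a few commonly tuned `teb_local_planner` parameters. The values below are illustrative only; the repository's actual tuning lives in `param/planner/teb_local_planner_params.yaml`.

```yaml
# Illustrative TEB tuning sketch, not this repository's actual values
TebLocalPlannerROS:
  max_vel_x: 0.5            # forward speed limit (m/s)
  max_vel_theta: 0.8        # rotational speed limit (rad/s)
  min_obstacle_dist: 0.25   # clearance kept from obstacles (m)
  xy_goal_tolerance: 0.2    # how close counts as "arrived" (m)
```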