An intelligent robotic system that explores, remembers, and navigates using Vision-Language Models and Vector Databases.
DreamBot is a ROS-based intelligent robotic system running on the Unitree Go2 quadruped robot. It bridges modern AI technologies (Vision-Language Models) with classical robotics (SLAM, path planning) to create a robot that can understand natural language commands and autonomously navigate to remembered objects.
During Exploration: The robot autonomously wanders around the environment, continuously detecting and memorizing objects it encounters, marking them on a semantic map.
On User Command: Users can instruct the robot to navigate to any previously seen object using natural language (e.g., "go to the red cup").
- Bridges perception with action: Combines what objects are with how to reach them
- Persistent semantic memory: Objects remembered across sessions via Milvus vector database
- Zero-shot detection: Detect any object using natural language without prior training
- Hybrid search: Combines semantic (COSINE) and spatial (L2) search for intelligent retrieval
| Phase | Description |
|---|---|
| Exploration Phase | Robot autonomously explores, detects objects, and builds semantic memory |
| Navigation Phase | User commands robot to navigate to any remembered object |
- Autonomous Exploration & Memorization - Robot records what it sees and marks objects on the semantic map
- Zero-Shot Object Detection - Detect any object using natural language via Qwen2.5-VL-72B
- Semantic Memory System - Store and retrieve objects by meaning using 768-dim vector embeddings
- Natural Language Navigation - Navigate using conversational commands ("go to the blue cup")
- 3D Spatial Understanding - Transform 2D detections to 3D world coordinates
- Persistent Memory - Objects remembered across sessions via Milvus
- Real-time Visualization - Live 3D bounding boxes in RViz
- Hybrid Search - Combine semantic (COSINE) and spatial (L2) search
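The two search metrics play different roles: COSINE compares the direction of embedding vectors (semantic meaning, scale-invariant), while L2 compares raw map coordinates (physical distance). A minimal sketch of both, independent of Milvus, just to make the distinction concrete:

```python
import numpy as np

def cosine_similarity(a, b):
    # Semantic metric: angle between embedding vectors (scale-invariant)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def l2_distance(p, q):
    # Spatial metric: straight-line distance between map positions
    return float(np.linalg.norm(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)))
```

Milvus evaluates the same metrics over its indexes; the functions above are only for illustration.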
While the robot explores the environment:
- SLAM builds a map of the surroundings in real-time
- Camera continuously captures RGB-D images and point clouds
- VLM automatically detects and identifies objects in view
- Objects are stored in the Milvus database with:
  - Semantic embeddings (768-dim vectors for natural language search)
  - 3D positions in the map frame (for navigation)
  - Labels and detailed descriptions
- Free navigation space is extracted from costmap
- Result: A semantic map with all discovered objects marked on it
Key Insight: The robot remembers what it has seen during exploration, building a persistent memory of the environment.
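Each memorized object can be pictured as one record in the payload sent to the project's `/api/insert/object` endpoint. A hypothetical helper (the function name is ours; the field names mirror the insert payload used by this project):

```python
def make_object_record(label, description, position, timestamp=0):
    """Hypothetical helper: package one detection into the JSON shape
    accepted by the /api/insert/object endpoint. Embedding generation
    from the text fields is handled on the server side, not here."""
    return {
        "data": [{
            "label": label,
            "description": description,
            "timestamp": timestamp,
            "position": list(position),  # 3D position in the map frame
        }]
    }
```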
When the user issues a command:
- User says: "go to the red cup" or "navigate to the blue book"
- System searches Milvus using semantic similarity (COSINE)
- Retrieves the object's 3D position from memory
- Finds the nearest free navigation point (L2 distance)
- Plans optimal path using TEB local planner
- Navigates autonomously to the target location
Key Insight: Users can navigate to any remembered object without re-detecting it - the robot already knows where things are!
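The "nearest free navigation point" step is an L2 nearest-neighbor query. In the running system Milvus answers it via `/api/search/point`, but the logic reduces to this pure-Python sketch:

```python
import math

def nearest_free_point(object_position, free_points):
    """Return the free-space point with the smallest L2 distance to the
    remembered object's position; this point becomes the move_base goal.
    (Pure-Python stand-in for the Milvus free-space query.)"""
    return min(free_points, key=lambda p: math.dist(p, object_position))
```

Navigating to the nearest *free* point, rather than the object itself, keeps the goal out of occupied space so the planner can actually reach it.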
```
/instruction ──▶ [RGB-D Camera] ──▶ [VLM Detection] ──▶ [2D→3D Projection]
                       │                   │                    │
                 /d435/color/        Qwen2.5-VL-72B        TF Transform
                  image_raw                                (camera→map)
                       │                   │                    │
                       └───────────────────┴────────────────────┘
                                           │
                                           ▼
                                   [Milvus Database]
                                   • Object embeddings
                                   • 3D positions
                                   • Labels & descriptions
```
```
/goal_input ──▶ [Milvus Search] ──▶ [Free Space Query] ──▶ [move_base]
      │               │                     │                   │
  "red cup"     COSINE search          L2 distance         TEB planner
                 (semantic)             (spatial)
      │               │                     │                   │
      └───────────────┴─────────────────────┴───────────────────┘
                                 │
                                 ▼
                        [Robot Navigation]
```
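The 2D→3D Projection step in the exploration pipeline follows the standard pinhole back-projection. A minimal sketch, assuming intrinsics (`fx`, `fy`, `cx`, `cy`) are read from `/d435/color/camera_info`; the subsequent camera→map TF transform is omitted:

```python
import numpy as np

def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth (meters) into the camera
    frame using the pinhole model. The map-frame position is obtained
    afterwards by applying the camera→map TF transform (not shown)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])
```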
| Component | Specification |
|---|---|
| Robot | Unitree Go2 quadruped robot |
| Camera | Intel RealSense D435 RGB-D camera |
| Computer | ROS-compatible system (Ubuntu 20.04 for ROS Noetic) |
- ROS Noetic
- `slam_toolbox` - SLAM mapping and localization
- `move_base` - Navigation stack
- `teb_local_planner` - Trajectory execution
- `cv_bridge` - OpenCV-ROS integration
```bash
# 1. Clone repository
cd ~/catkin_ws/src
git clone <repository_url> DreamBot

# 2. Install Python dependencies
pip install pymilvus flask openai opencv-python numpy pillow

# 3. Build ROS package
cd ~/catkin_ws
catkin_make

# 4. Source the workspace
source devel/setup.bash
```

If using the Unitree Go2 in simulation mode:
```bash
# Start the junior_ctrl controller for Go2
sudo ./devel/lib/unitree_guide/junior_ctrl
```

In another terminal, control the robot with keyboard:

```bash
rosrun teleop_twist_keyboard teleop_twist_keyboard.py
```

Start the Milvus server:

```bash
python Milvus/server_test.py
```

The server will start on http://0.0.0.0:5002.
```bash
roslaunch DreamBot core.launch
```

This launches:
- SLAM (slam_toolbox)
- Navigation stack (move_base + TEB planner)
- Detection node
- RViz visualization
Drive the robot around using keyboard commands. The system will automatically:
- Build a map using SLAM
- Detect and memorize objects in view (when an instruction is published on /instruction)
- Mark objects on the semantic map
```bash
# Navigate to a previously detected object
rostopic pub /goal_input std_msgs/String "data: 'the red cup'"
```

| Topic | Message Type | Description |
|---|---|---|
| `/d435/color/image_raw` | `sensor_msgs/Image` | RGB images from RealSense |
| `/d435/depth/color/points` | `sensor_msgs/PointCloud2` | Point cloud data |
| `/d435/color/camera_info` | `sensor_msgs/CameraInfo` | Camera intrinsic parameters |
| `/instruction` | `std_msgs/String` | Detection commands |
| `/goal_input` | `std_msgs/String` | Navigation target description |
| Topic | Message Type | Description |
|---|---|---|
| `/detect/labeled_image` | `sensor_msgs/Image` | Annotated detection images |
| `/detect/marker_array` | `visualization_msgs/MarkerArray` | 3D markers for RViz |
| `/detect/points_rgb_frame_color` | `sensor_msgs/PointCloud2` | Detected point cloud |
| `/robot_pose` | `geometry_msgs/PoseStamped` | Current robot pose in map frame |
The Milvus server runs on port 5002 by default.
| Endpoint | Method | Description |
|---|---|---|
| `/api/insert/object` | POST | Insert detected objects with embeddings |
| `/api/search/object` | GET | Semantic search for objects (COSINE) |
| `/api/insert/point` | POST | Insert free space navigation points |
| `/api/search/point` | GET | Find nearest free navigation point (L2) |
```python
# Insert object
import requests

data = {
    "data": [{
        "label": "cup",
        "description": "a red ceramic cup on the table",
        "timestamp": 0,
        "position": [1.5, 2.0, 0.3]
    }]
}
response = requests.post("http://localhost:5002/api/insert/object", json=data)

# Search object
query = {"data": ["red cup"]}
response = requests.get("http://localhost:5002/api/search/object", json=query)
```

```bash
# Navigate to a remembered object
rostopic pub /goal_input std_msgs/String "data: 'red cup'"

# Navigate using description
rostopic pub /goal_input std_msgs/String "data: 'the blue book on the shelf'"
```

```
DreamBot/
├── scripts/                              # ROS nodes
│   ├── detection_node.py                 # VLM-based object detection
│   ├── navigation_node.py                # Semantic navigation controller
│   └── save_costmap.py                   # Map persistence utilities
│
├── utils/                                # Utility modules
│   ├── model.py                          # VLM integration (Qwen2.5-VL)
│   ├── memory_process.py                 # Memory operations
│   ├── visualization.py                  # RViz marker utilities
│   ├── pointcloud.py                     # Point cloud utilities
│   └── Milvus/
│       └── Milvus_client.py              # HTTP client for Milvus
│
├── Milvus/                               # Database server
│   └── server_test.py                    # Flask API server
│
├── launch/                               # ROS launch files
│   ├── core.launch                       # Full system launch
│   ├── detect_node.launch                # Detection node only
│   ├── move_base.launch                  # Navigation stack
│   ├── slam_toolbox_mapping.launch       # SLAM mapping mode
│   ├── slam_toolbox_localization.launch  # SLAM localization mode
│   └── slam_toolbox_localization_detection.launch
│
├── param/                                # Configuration files
│   ├── slam_toolbox/                     # SLAM parameters
│   ├── costmap/                          # Costmap configuration
│   ├── planner/                          # Planner parameters
│   └── move_base_params.yaml             # move_base settings
│
├── rviz_config/                          # RViz configurations
│   └── default.rviz
│
├── CMakeLists.txt
├── package.xml
└── README.md
```
| Launch File | Purpose |
|---|---|
| `core.launch` | Full system: SLAM + navigation + detection + RViz |
| `detect_node.launch` | Detection node only |
| `move_base.launch` | Navigation stack (move_base + TEB) |
| `slam_toolbox_mapping.launch` | SLAM in mapping mode |
| `slam_toolbox_localization.launch` | SLAM in localization mode |
| `slam_toolbox_localization_detection.launch` | Localization + detection |
- `param/slam_toolbox/mapper_params_localization.yaml`
- `param/slam_toolbox/mapper_params_mapping.yaml`
- `param/costmap/costmap_common_params.yaml`
- `param/costmap/local_costmap_params.yaml`
- `param/costmap/global_costmap_params.yaml`
- `param/planner/teb_local_planner_params.yaml`
- `param/planner/base_global_planner_param.yaml`
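As an illustration of the planner layer, here are a few commonly tuned `teb_local_planner` parameters. The values below are illustrative only; the repository's actual tuning lives in `param/planner/teb_local_planner_params.yaml`.

```yaml
# Illustrative TEB tuning sketch, not this repository's actual values
TebLocalPlannerROS:
  max_vel_x: 0.5            # forward speed limit (m/s)
  max_vel_theta: 0.8        # rotational speed limit (rad/s)
  min_obstacle_dist: 0.25   # clearance kept from obstacles (m)
  xy_goal_tolerance: 0.2    # how close counts as "arrived" (m)
```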