A Spatial Retrieval-Augmented Generation system for latent world models, designed for embodied spatial intelligence in robotics, autonomous navigation, and embodied AI.
This project implements a memory-augmented latent world model that:
- Encodes observations (RGB, depth, proprioception) into compact latent representations
- Stores latent states with spatial metadata in a vector database
- Retrieves relevant past experiences using hybrid spatial + latent similarity search
- Predicts future states by conditioning on retrieved memory context
Result: 15-30% improvement in prediction accuracy (latent MSE) and better sample efficiency for embodied agents.
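As a sketch of the hybrid retrieval idea, the snippet below blends latent cosine similarity with spatial proximity when ranking stored memories. The blend weight `alpha` and the exponential distance kernel are illustrative assumptions, not the project's actual scoring formula (see `src/memory/` for the real implementation):

```python
import numpy as np

def hybrid_score(query_z, query_pose, mem_z, mem_poses, alpha=0.5):
    """Blend latent cosine similarity with spatial proximity (both in [0, 1])."""
    # Cosine similarity in latent space, mapped from [-1, 1] to [0, 1]
    zq = query_z / np.linalg.norm(query_z)
    zm = mem_z / np.linalg.norm(mem_z, axis=1, keepdims=True)
    latent_sim = (zm @ zq + 1.0) / 2.0

    # Spatial proximity: decays exponentially with Euclidean distance
    dist = np.linalg.norm(mem_poses - query_pose, axis=1)
    spatial_sim = np.exp(-dist)

    return alpha * latent_sim + (1.0 - alpha) * spatial_sim

def retrieve_topk(query_z, query_pose, mem_z, mem_poses, k=8):
    """Return indices of the k best-scoring memories, best first."""
    scores = hybrid_score(query_z, query_pose, mem_z, mem_poses)
    return np.argsort(scores)[::-1][:k]
```

In production the latent ranking would be delegated to Qdrant's ANN search, with the spatial term applied as a payload filter or re-ranking step.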
📖 New to Spatial-RAG? See the Practical Usage Guide for real-world applications and examples.
```
┌──────────────────────────────────────────────────────────────────┐
│                    SPATIAL-RAG SYSTEM OVERVIEW                   │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│   PERCEPTION            WORLD MODEL               MEMORY         │
│   ──────────            ───────────               ──────         │
│                                                                  │
│   📷 Camera  ──────►  ┌─────────┐               ┌──────────┐     │
│                       │ Encoder │ ──► z[32] ───►│  Qdrant  │     │
│   🎮 Actions ──────►  └────┬────┘               │ (Vector  │     │
│                            │                    │   DB)    │     │
│   📍 Pose    ──────►       ▼                    └────┬─────┘     │
│                      ┌──────────┐                    │           │
│                      │Transition│◄───────────────────┘           │
│                      └────┬─────┘      Retrieved                 │
│                           │            Memories                  │
│                           ▼                                      │
│                      ┌─────────┐                                 │
│                      │ Decoder │ ──► 🖼️ Predicted Frame          │
│                      └─────────┘                                 │
│                           │                                      │
│                           ▼                                      │
│                      ┌─────────┐                                 │
│                      │ Policy  │ ──► 🤖 Motor Commands           │
│                      └─────────┘                                 │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
```
Key Components:
```
Perception → Encoder → Latent z → Memory Bank (Qdrant)
                ↓                        ↓
            Transition  ◄──────  Retrieval Module
                ↓
            Decoder → Reconstruction
                ↓
            Policy/Planner → Actions
```
```bash
# Clone the repository
git clone <repo-url>
cd Spatial-RAG-Worldmodel

# Install dependencies (optional for local dev)
pip install -r requirements.txt
```

```bash
# Build the shared base image
docker compose build base

# Start core services (Qdrant + API)
docker compose --profile core up -d

# Start UI dashboard
docker compose --profile ui up -d
```

Access:
- 🌐 API: http://localhost:8080
- 📚 API Docs: http://localhost:8080/docs
- 🖥️ UI Dashboard: http://localhost:3000
- 🔍 Qdrant: http://localhost:6333
```bash
# Generate synthetic data
docker compose run --rm generate-data python scripts/simulate_env.py --out data/trajectories --n 500

# Train models
docker compose run --rm train

# Restart API with trained model
docker compose restart api
```

- Open http://localhost:3000
- Click "Generate Random Latent"
- Click "Start Rollout"
- Watch predicted frames stream in real-time!
| Application | Use Case |
|---|---|
| 🚁 Autonomous Drones | Navigate cities using past flight memories |
| 🚗 Self-Driving Cars | Predict pedestrian behavior at intersections |
| 📦 Warehouse Robots | Remember item locations for faster picking |
| 🏠 Home Assistants | Learn house layout, remember where things are |
| 👓 AR Navigation | Predictive overlays based on spatial memory |
📖 See Practical Usage Guide for detailed examples.
All services share optimized images (~25 GB total):

| Service | Purpose |
|---|---|
| `api` | FastAPI inference server |
| `ui` | Next.js dashboard |
| `qdrant` | Vector database |
| `train` | Model training |
| `ros2` | ROS2 robotics node |
| `generate-data` | Synthetic data generation |
| `collect` | Data collection |
```bash
# Start services
docker compose --profile core up -d    # API + Qdrant
docker compose --profile ui up -d      # Dashboard
docker compose --profile ros2 up -d    # ROS2 node

# Run tasks
docker compose run --rm train          # Train model
docker compose run --rm experiment     # Run experiments
docker compose run --rm reports        # Generate reports

# Manage
docker compose logs -f api             # View logs
docker compose ps                      # Check status
docker compose down                    # Stop all
```

Real-time robotics integration with ROS2 Humble. ✅ Fully tested and working!
```
┌──────────────────────────────────────────────────────────────────┐
│                      ROS2 DATA FLOW (@25Hz)                      │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  📷 Camera ───► webcam_bridge.py ───► FastAPI ───► ROS2 Bridge   │
│                       │                  │              │        │
│                    Capture            Encode         Publish     │
│                    Frames             to z[32]       Topics      │
│                                          │              │        │
│                                          ▼              ▼        │
│                                     ┌─────────┐   ┌──────────┐   │
│                                     │ z_next  │   │ /latent  │   │
│                                     │predicted│   │ /latent_ │   │
│                                     └─────────┘   │   next   │   │
│                                                   └────┬─────┘   │
│                                                        │         │
│  🎮 /actions ◄─────────────────────────────────────────┘         │
│       │                                                          │
│       ▼                                                          │
│  🤖 Motors                                                       │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
```
```bash
# Start all ROS2 services
docker compose --profile ros2 up -d

# Stream webcam with predictions to ROS2 (on Windows host)
python scripts/webcam_bridge.py --mode api --camera 0 --fps 5 --display --ros2-bridge http://localhost:8082
```

| Topic | Direction | Rate | Description |
|---|---|---|---|
| `/latent` | Publish | ~25 Hz | Current 32-dim latent state |
| `/latent_next` | Publish | ~25 Hz | Predicted next latent |
| `/actions` | Subscribe | 5 Hz+ | Action commands `[x, y]` |
| `/camera/image_raw` | Subscribe | - | Camera images |
```bash
# Publish actions (forward motion)
docker compose exec ros2-bridge bash -c "source /opt/ros/humble/setup.bash && \
  ros2 topic pub /actions std_msgs/Float32MultiArray '{data: [1.0, 0.5]}' --rate 5"

# Echo latent vector
docker compose exec ros2-bridge bash -c "source /opt/ros/humble/setup.bash && ros2 topic echo /latent --once"

# Echo prediction
docker compose exec ros2-bridge bash -c "source /opt/ros/humble/setup.bash && ros2 topic echo /latent_next --once"

# Check rates (~25Hz)
docker compose exec ros2-bridge bash -c "source /opt/ros/humble/setup.bash && ros2 topic hz /latent"
```

Stream your laptop webcam to the API for real-time encoding:
```bash
# Install on Windows (NOT Docker)
pip install opencv-python requests pillow

# List cameras
python scripts/webcam_bridge.py --mode list

# Stream with live preview
python scripts/webcam_bridge.py --mode api --camera 0 --display
```

Output:

```
Streaming to http://localhost:8080 at 5 FPS
Frame 100: latent mean=-0.0575, FPS=4.8
```
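Under the hood, the bridge has to package each frame as base64 before posting it to `/webcam/encode`. A minimal sketch of that packaging step is below; the payload field names (`"image"`, `"frame_id"`) are assumptions about the request schema — check the API models in `api/` for the real ones:

```python
import base64
import json

def build_webcam_payload(frame_bytes: bytes, frame_id: int) -> str:
    """Package an encoded (e.g. JPEG) frame as a JSON body for /webcam/encode.

    Field names here are illustrative assumptions, not the confirmed schema.
    """
    return json.dumps({
        "image": base64.b64encode(frame_bytes).decode("ascii"),
        "frame_id": frame_id,
    })

def decode_webcam_payload(payload: str) -> bytes:
    """Recover the raw frame bytes (roughly what the server does on receipt)."""
    return base64.b64decode(json.loads(payload)["image"])
```

The resulting string would be POSTed with `Content-Type: application/json` to `http://localhost:8080/webcam/encode`.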
📖 See Webcam Streaming Guide for details.
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/encode` | POST | Image → latent |
| `/webcam/encode` | POST | Base64 image → latent (for webcam) |
| `/predict` | POST | One-step prediction |
| `/rollout` | POST | Multi-step rollout |
| `/stream-rollout` | GET | SSE streaming |
| `/retrieve` | POST | Memory search |
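A stdlib-only client sketch for calling these endpoints is shown below. The `/rollout` body fields (`"z"`, `"actions"`, `"horizon"`) are assumptions — consult the live schema at http://localhost:8080/docs for the actual request models:

```python
import json
import urllib.request

API = "http://localhost:8080"

def rollout_request(z, actions, horizon):
    """Build a JSON body for POST /rollout (field names are assumed)."""
    return {"z": list(z), "actions": [list(a) for a in actions], "horizon": horizon}

def post_json(path, body):
    """POST a JSON body to the API and return the decoded response.

    Requires the core services to be running (docker compose --profile core up -d).
    """
    req = urllib.request.Request(
        API + path,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (against a running API):
#   post_json("/rollout", rollout_request([0.0] * 32, [[1.0, 0.5]] * 10, horizon=10))
```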
| Model | Latent MSE | Improvement |
|---|---|---|
| Baseline | 0.0234 | - |
| Spatial-RAG | 0.0198 | 15.4% |
| Feature | Status | Description |
|---|---|---|
| ✅ Latent Encoding | Done | Real-time camera → 32-dim latent @ 25Hz |
| ✅ Next-State Prediction | Done | /latent_next predictions |
| ✅ Memory Retrieval | Done | Qdrant-based spatial memory |
| ✅ ROS2 Integration | Done | /latent, /actions topics |
| 🔜 Policy Network | Planned | Neural net: latent → motor commands |
| 🔜 Robot Training Data | Planned | Collect from YOUR robot |
| 🔜 Path Planning | Planned | A*/RRT goal navigation |
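To make the planned policy network concrete, here is a shape-level sketch of a latent → motor-command head: a tiny tanh MLP mapping the 32-dim latent to a 2-dim action. The weights are random placeholders purely for illustration — the real policy is not yet implemented and would be trained, not initialized like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder weights: 32-dim latent -> 64 hidden -> 2 motor commands.
W1 = rng.normal(0.0, 0.1, size=(32, 64))
b1 = np.zeros(64)
W2 = rng.normal(0.0, 0.1, size=(64, 2))
b2 = np.zeros(2)

def policy(z: np.ndarray) -> np.ndarray:
    """Map a 32-dim latent to a 2-dim action [x, y].

    tanh on the output keeps motor commands bounded in [-1, 1].
    """
    h = np.tanh(z @ W1 + b1)
    return np.tanh(h @ W2 + b2)
```

The same two-layer shape would drop in naturally as a small PyTorch module trained on (latent, action) pairs collected from the robot.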
📖 See Robot Integration Guide for full autonomy details.
| Option | Price (PKR) | Inference | Best For |
|---|---|---|---|
| Pi 4 (4GB) | ~Rs 18,000 | ~50ms | Budget robots |
| Pi 5 (8GB) | ~Rs 28,000 | ~30ms | Faster autonomy |
| Jetson Orin | ~Rs 60,000+ | ~5ms | Production |
❌ Pi Zero not recommended (too slow for real-time inference)
| Document | Description |
|---|---|
| 📖 Practical Guide | Real-world applications and examples |
| 🤖 Robot Integration | End-to-end robot setup guide |
| 🛠️ Build Guide | Shopping list + assembly instructions |
| 📷 Webcam Streaming | Stream laptop camera to API |
| 🏗️ Design | Architecture and system design |
| 🚀 Deployment | Production deployment guide |
| 📷 Data Collection | Collecting robot data |
| 📋 Quick Reference | Command cheat sheet |
```
Spatial-RAG-Worldmodel/
├── api/                  # FastAPI server
├── ui/                   # Next.js dashboard
├── ros2_ws/              # ROS2 integration
├── src/                  # Core library
│   ├── models/           # Encoder, Transition, Decoder
│   ├── memory/           # Qdrant, Faiss stores
│   └── datasets/         # Data loading
├── scripts/              # Training, export, collection
├── docs/                 # Documentation
└── docker-compose.yml    # Service orchestration
```
| Variable | Default | Description |
|---|---|---|
| `Z_DIM` | 32 | Latent dimension |
| `ACTION_DIM` | 2 | Action dimension |
| `TOPK` | 8 | Number of retrieved memories |
| `QDRANT_HOST` | localhost | Qdrant host |
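A service would presumably pick these up via the environment with the same fall-back defaults, along these lines (a minimal sketch, not the project's actual config loader):

```python
import os

# Defaults mirror the configuration table; override via the environment
# or the docker-compose service definitions.
Z_DIM = int(os.getenv("Z_DIM", "32"))            # Latent dimension
ACTION_DIM = int(os.getenv("ACTION_DIM", "2"))   # Action dimension
TOPK = int(os.getenv("TOPK", "8"))               # Number of retrieved memories
QDRANT_HOST = os.getenv("QDRANT_HOST", "localhost")
```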
MIT License
```bibtex
@software{spatial_rag_worldmodel,
  title={Spatial-RAG World Model for Embodied Spatial Intelligence},
  author={Adnan Sattar},
  year={2025},
  url={https://github.com/adnansattar/Spatial-RAG-Worldmodel}
}
```

