A real-time UI wrapper that adapts OpenEMMA for live autonomous driving in CARLA Simulator (0.9.16).
OpenEMMA was originally designed for offline trajectory prediction on the nuScenes dataset. This project brings it into a real-time CARLA environment with a visual UI, multi-model VLM support, and a Chain-of-Thought (CoT) reasoning display.
- Real-time CARLA driving with route-based pure-pursuit controller
- 4-step Chain-of-Thought pipeline displayed in UI:
  - Scene Description
  - Critical Object Detection
  - Driving Intent
  - Motion Prediction
- 4 VLM backends supported:
  - LLaVA-v1.5-7b (local)
  - LLaMA-3.2-11B-Vision (local, recommended)
  - Qwen2-VL-7B-Instruct (local)
  - GPT-4o (OpenAI API)
- Safety systems: red light detection, lane-keeping correction, stall recovery
- Chase camera with real-time info panel (speed, steering, VLM I/O)
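
The four Chain-of-Thought steps listed above can be pictured as a chain of prompts in which each answer feeds the next step. The sketch below is illustrative only and is not the code in `openemmaUI.py`: the prompt wording, the `run_cot` helper, and the `fake_vlm` stand-in are all assumptions, and a real backend would also receive the camera image.

```python
from typing import Callable, Dict

# Stage names follow the feature list; the prompt wording is hypothetical.
COT_STAGES = [
    ("scene_description", "Describe the driving scene in the image."),
    ("critical_objects",
     "Scene: {scene_description}\nList the objects critical to driving."),
    ("driving_intent",
     "Scene: {scene_description}\nCritical objects: {critical_objects}\n"
     "State the ego vehicle's driving intent."),
    ("motion_prediction",
     "Intent: {driving_intent}\nPredict the ego vehicle's speed and steering."),
]


def run_cot(ask_vlm: Callable[[str], str]) -> Dict[str, str]:
    """Run the stages in order, feeding earlier answers into later prompts."""
    answers: Dict[str, str] = {}
    for name, template in COT_STAGES:
        prompt = template.format(**answers)  # fill in answers from earlier stages
        answers[name] = ask_vlm(prompt)
    return answers


def fake_vlm(prompt: str) -> str:
    # Stand-in for a real VLM backend: echoes the last line of the prompt.
    return f"<answer to: {prompt.splitlines()[-1]}>"


result = run_cot(fake_vlm)
for stage, text in result.items():
    print(f"{stage}: {text}")
```

Keeping the stages in a list makes the order explicit and lets a UI panel display each intermediate answer as it arrives.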
openemmaUI.py              # Main launcher & OpenEMMA agent
├── ui_common/
│   ├── agent_runner.py    # CARLA simulation runner + SafetyLimiter
│   ├── panel.py           # Info panel with color-coded LLM I/O
│   ├── camera.py          # Third-person chase camera
│   ├── renderer.py        # Window compositor (camera + panel)
│   ├── carla_setup.py     # Auto-detect CARLA PythonAPI
│   └── carla_utils.py     # CARLA connection utilities
└── OpenEMMA/              # Original OpenEMMA (cloned separately)
    ├── openemma/
    └── llava/
Control flow:
- Primary control: Route-based pure-pursuit steering (no VLM dependency)
- VLM advisory: CoT runs every 20 frames in a background thread; results modulate target speed and are displayed in the UI panel
- Safety layer: `SafetyLimiter` overrides control for red lights, off-road correction, and stall recovery
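
For readers unfamiliar with pure pursuit, the primary control path above can be sketched with the textbook steering law `delta = atan(2 * L * sin(alpha) / ld)`, where `L` is the wheelbase, `ld` the distance to a lookahead point on the route, and `alpha` the angle between the vehicle heading and that point. This is a generic sketch, not the project's controller: the function name, the 2.8 m wheelbase, and the 40 degree steering limit are assumptions, and the sign of the result depends on the coordinate convention in use.

```python
import math


def pure_pursuit_steer(x, y, yaw, target_x, target_y,
                       wheelbase=2.8, max_steer=math.radians(40.0)):
    """Return a normalized steering command in [-1, 1] toward a lookahead point.

    wheelbase (m) and max_steer (rad) are assumed values for illustration.
    """
    dx, dy = target_x - x, target_y - y
    lookahead = math.hypot(dx, dy)
    if lookahead < 1e-6:
        return 0.0  # already at the target point; no steering needed
    # alpha: angle between the vehicle heading and the line to the target
    alpha = math.atan2(dy, dx) - yaw
    # Pure-pursuit law: delta = atan(2 * L * sin(alpha) / ld)
    delta = math.atan2(2.0 * wheelbase * math.sin(alpha), lookahead)
    # Normalize by the assumed steering limit and clamp
    return max(-1.0, min(1.0, delta / max_steer))


print(pure_pursuit_steer(0.0, 0.0, 0.0, 10.0, 0.0))       # target straight ahead
print(pure_pursuit_steer(0.0, 0.0, 0.0, 10.0, 5.0) > 0)   # target on the +y side
```

Because the law depends only on the route geometry and vehicle pose, steering keeps working even when the VLM is slow or unavailable.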
All experiments and benchmarks were conducted on the following hardware:
| Component | Specification |
|---|---|
| CPU | Intel Core i9-14900K |
| RAM | 128 GB DDR5 |
| GPU | NVIDIA RTX 5090 (32 GB VRAM) |
| OS | Windows 11 Pro |
| CUDA | 12.8 |
| Python | 3.12 |
| CARLA | 0.9.16 |
| Component | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 / Linux | Windows 11 Pro |
| GPU | 16 GB VRAM (LLaVA) | 24+ GB VRAM (LLaMA-3.2-11B) |
| RAM | 16 GB | 32+ GB |
| CUDA | 12.1+ | 12.8+ |
| Python | 3.10 | 3.12 |
| CARLA | 0.9.16 | 0.9.16 |
Download and extract CARLA 0.9.16 from the official releases.
# Example: extract to a known location
# Windows: C:\CARLA_0.9.16\
# Linux: /opt/carla/

# Clone this repository, then clone the original OpenEMMA inside it
git clone https://github.com/justinbrianhwang/OpenEMMA-UI.git
cd OpenEMMA-UI
git clone https://github.com/taco-group/OpenEMMA.git

# Create and activate the conda environment
conda create -n openemma python=3.12 -y
conda activate openemma

# Install PyTorch for your CUDA version (run only one of the two commands)
# CUDA 12.8 example
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
# CUDA 12.1 example
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# Install the Python dependencies
pip install -r requirements.txt

# Find the .whl file in your CARLA installation (pick the one matching your OS and Python version)
pip install /path/to/CARLA_0.9.16/PythonAPI/carla/dist/carla-0.9.16-cp312-cp312-win_amd64.whl

Choose one or more backends:
# LLaMA-3.2-11B-Vision (recommended, ~22GB VRAM)
# Downloads automatically from HuggingFace on first run
# Or pre-download:
python -c "from transformers import MllamaForConditionalGeneration; MllamaForConditionalGeneration.from_pretrained('meta-llama/Llama-3.2-11B-Vision-Instruct')"
# LLaVA-v1.5-7b (~14GB VRAM)
python -c "from transformers import LlavaForConditionalGeneration; LlavaForConditionalGeneration.from_pretrained('llava-hf/llava-1.5-7b-hf')"
# Qwen2-VL-7B (~16GB VRAM)
python -c "from transformers import Qwen2VLForConditionalGeneration; Qwen2VLForConditionalGeneration.from_pretrained('Qwen/Qwen2-VL-7B-Instruct')"
# GPT-4o: No download needed, just set API key
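export OPENAI_API_KEY=sk-...

Start the CARLA server before launching OpenEMMA-UI:
# Windows
cd C:\CARLA_0.9.16
CarlaUE4.exe
# Linux
cd /opt/carla
./CarlaUE4.sh

Then run OpenEMMA-UI with the backend of your choice:
conda activate openemma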
# LLaMA-3.2-11B-Vision (recommended)
python openemmaUI.py --llama
# LLaVA-v1.5-7b
python openemmaUI.py --llava
# Qwen2-VL-7B
python openemmaUI.py --qwen
# GPT-4o (requires OPENAI_API_KEY)
OPENAI_API_KEY=sk-... python openemmaUI.py --gpt
# Specify town
python openemmaUI.py --llama --town Town02
# Custom model path
python openemmaUI.py --model-path /path/to/custom/model

| Key | Action |
|---|---|
| ESC | Quit |
| Mouse | UI interaction |
Watch each VLM backend driving in CARLA Town01 (click thumbnails to play):
[Video thumbnails: LLaVA-v1.5-7b (★★☆☆☆), LLaMA-3.2-11B-Vision (★★★★☆), Qwen2-VL-7B (★★★☆☆), GPT-4o (★★★★★)]
We benchmarked 4 VLM backends on Town01 with identical routes and traffic conditions.
| Model | Scene Quality | Hallucination | Intent Accuracy | VRAM | Cost | Rating |
|---|---|---|---|---|---|---|
| GPT-4o | Excellent | Minimal | High | 0 GB | ~$0.01/frame | ★★★★★ |
| LLaMA-3.2-11B | Good | Minimal | High | ~22 GB | Free | ★★★★☆ |
| Qwen2-VL-7B | Decent | Moderate | Medium | ~16 GB | Free | ★★★☆☆ |
| LLaVA-v1.5-7b | Poor | Severe | Low | ~14 GB | Free | ★★☆☆☆ |
See VLM_Model_Comparison.md for detailed analysis with examples.
Key findings:
- LLaMA-3.2-11B-Vision is the best local model with minimal hallucination
- LLaVA and Qwen suffer from persistent "red traffic light" hallucination on empty roads
- GPT-4o provides the most detailed scene descriptions but incurs API costs (about $0.01 per frame)
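
To put the GPT-4o cost in perspective, a rough hourly estimate can be worked out from the figures in this README. The sketch assumes that the ~$0.01/frame figure applies to each frame actually sent to the VLM (one Chain-of-Thought run every 20 simulation frames) and that the simulation runs at 20 FPS; the frame rate is a hypothetical value, and the real call rate may be lower if a Chain-of-Thought run takes longer than 20 frames.

```python
def gpt4o_cost_per_hour(sim_fps: float,
                        vlm_every_n_frames: int = 20,
                        cost_per_call_usd: float = 0.01) -> float:
    """Estimate hourly API cost when one VLM call is made every N frames."""
    calls_per_second = sim_fps / vlm_every_n_frames
    return calls_per_second * 3600 * cost_per_call_usd


# Assumed 20 FPS simulation: one call per second, 3600 calls per hour.
print(f"${gpt4o_cost_per_hour(sim_fps=20):.2f} per hour")
```

Under these assumptions an hour of driving costs roughly $36, which is worth weighing against the one-time hardware cost of running LLaMA-3.2-11B locally.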
OpenEMMA-UI/
├── README.md                  # This file
├── LICENSE                    # Apache 2.0
├── requirements.txt           # Python dependencies
├── openemmaUI.py              # Main launcher & agent (route + VLM CoT)
├── VLM_Model_Comparison.md    # Detailed VLM benchmark results
├── ui_common/                 # Shared UI & simulation framework
│   ├── __init__.py
│   ├── agent_runner.py        # AgentRunner + SafetyLimiter
│   ├── panel.py               # InfoPanel (speed, steer, LLM I/O)
│   ├── camera.py              # ChaseCameraManager
│   ├── renderer.py            # UIRenderer (compositor)
│   ├── carla_setup.py         # CARLA PythonAPI auto-detection
│   └── carla_utils.py         # CarlaConnection helper
└── OpenEMMA/                  # Clone from taco-group/OpenEMMA
    ├── openemma/
    ├── llava/
    └── ...
CARLA connection refused / module 'carla' has no attribute 'Client'
- Make sure `CarlaUE4.exe` is running before launching OpenEMMA-UI
- Verify the CARLA PythonAPI wheel is installed in your conda env:
pip install /path/to/CARLA_0.9.16/PythonAPI/carla/dist/carla-0.9.16-cp312-cp312-win_amd64.whl
- Check that your Python version matches the wheel (e.g., `cp312` = Python 3.12)
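
The version check in the last bullet can be automated with a few lines of standard-library Python. This is a small helper sketch, not part of the repository; the function names are made up for illustration, and it only compares the CPython tag, not the platform part of the filename.

```python
import re
import sys


def wheel_python_tag(wheel_name: str) -> str:
    """Extract the CPython tag (e.g. 'cp312') from a wheel filename."""
    match = re.search(r"-(cp\d+)-", wheel_name)
    if match is None:
        raise ValueError(f"no CPython tag found in {wheel_name!r}")
    return match.group(1)


def interpreter_tag(version_info=sys.version_info) -> str:
    """Build the tag for an interpreter version, e.g. (3, 12) -> 'cp312'."""
    return f"cp{version_info[0]}{version_info[1]}"


wheel = "carla-0.9.16-cp312-cp312-win_amd64.whl"
print("wheel:", wheel_python_tag(wheel), "interpreter:", interpreter_tag())
if wheel_python_tag(wheel) != interpreter_tag():
    print("Python version does not match the wheel; pip will reject it as unsupported.")
```

Run it inside the activated conda environment so that `sys.version_info` reflects the interpreter that will actually load `carla`.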
CUDA out of memory
- LLaMA-3.2-11B requires ~22 GB VRAM. If your GPU has less, try:
  - `--llava` (14 GB) or `--qwen` (16 GB) instead
  - `--gpt` uses no local VRAM (cloud API)
- Close other GPU-intensive applications before running
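
The fallback advice above amounts to a simple lookup from available VRAM to a launch flag. The sketch below uses the approximate VRAM figures quoted in this README and the benchmark ratings as the preference order; the `suggest_backend` helper is hypothetical and leaves no headroom for CARLA's own GPU usage, so treat the thresholds as optimistic.

```python
# Approximate VRAM needs in GB, taken from the figures quoted in this README.
VRAM_NEEDED_GB = {"--llama": 22, "--qwen": 16, "--llava": 14}


def suggest_backend(free_vram_gb: float) -> str:
    """Return the highest-rated local backend that fits, else the cloud API flag."""
    # Preference order follows the benchmark ratings: LLaMA, then Qwen, then LLaVA.
    for flag in ("--llama", "--qwen", "--llava"):
        if free_vram_gb >= VRAM_NEEDED_GB[flag]:
            return flag
    return "--gpt"  # no local VRAM required


for vram in (32, 16, 14, 8):
    print(f"{vram} GB free -> {suggest_backend(vram)}")
```

If PyTorch is installed, the free-memory figure could be read from `torch.cuda.mem_get_info()`, which returns free and total bytes for the current device.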
Model download stuck / HuggingFace authentication error
- Some models (e.g., LLaMA-3.2) require accepting the license on HuggingFace first
- Login via: `huggingface-cli login`
- Or download manually and use `--model-path /local/path`
Black screen / no camera output
- CARLA rendering may take a few seconds to initialize
- Try switching to a simpler town: `--town Town01`
- Ensure your GPU drivers are up to date
IMU sensor timeout warning
- `[WARNING] Sensor timeout. Missing: {'imu'}` is non-critical
- The system falls back to speed-only estimation automatically
- This occurs occasionally in CARLA 0.9.16 under high GPU load
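
One way to picture this fallback is a sensor collector that waits briefly on each sensor queue and reports what is missing instead of failing. The sketch is a guess at the general pattern, not the code in `agent_runner.py`: `collect_sensors`, `estimate_motion`, the queue layout, and the `yaw_rate` field are all hypothetical.

```python
import queue


def collect_sensors(sensor_queues, timeout_s=0.1):
    """Wait up to timeout_s on each sensor queue; return (readings, missing)."""
    readings, missing = {}, set()
    for name, q in sensor_queues.items():
        try:
            readings[name] = q.get(timeout=timeout_s)
        except queue.Empty:
            missing.add(name)  # sensor did not deliver in time
    if missing:
        print(f"[WARNING] Sensor timeout. Missing: {missing}")
    return readings, missing


def estimate_motion(readings):
    """Use IMU data when present; otherwise fall back to speed only."""
    state = {"speed": readings["speed"]}
    if "imu" in readings:
        state["yaw_rate"] = readings["imu"]["yaw_rate"]
    return state


# Simulate a tick where the speed reading arrived but the IMU did not.
queues = {"speed": queue.Queue(), "imu": queue.Queue()}
queues["speed"].put(5.0)
readings, missing = collect_sensors(queues, timeout_s=0.01)
print(estimate_motion(readings))
```

Treating a missing IMU sample as a degraded mode rather than an error keeps the control loop running through brief stalls under GPU load.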
- VLM is advisory only: The VLM Chain-of-Thought pipeline provides scene understanding displayed in the UI, but actual vehicle control relies on the route-based pure-pursuit controller. The VLM modulates target speed but does not directly steer.
- Hallucination in smaller models: LLaVA-v1.5-7b and Qwen2-VL-7B frequently hallucinate "red traffic light" on empty roads, causing unnecessary stops. Use LLaMA-3.2-11B or GPT-4o for more reliable scene understanding.
- Single-town evaluation: Current benchmarks are conducted on Town01. Results may vary on more complex maps (Town03, Town05) with different traffic patterns.
- No multi-agent traffic: Testing is performed with CARLA's default traffic manager. Dense traffic scenarios have not been extensively evaluated.
- Windows-only testing: While the codebase should work on Linux, it has only been tested on Windows 11.
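
The advisory-only design in the first limitation, combined with the 20-frame cadence from the control-flow notes, can be sketched as a small helper that runs the slow VLM call off the control loop and only scales the target speed. This is an illustration under assumptions, not the project's implementation: the `VlmAdvisor` class, the speed factor in [0, 1], and the skip-if-busy policy are all hypothetical.

```python
import threading


class VlmAdvisor:
    """Runs a slow VLM call in the background and exposes the latest speed factor."""

    def __init__(self, infer, every_n_frames=20):
        self._infer = infer              # callable(frame) -> speed factor in [0, 1]
        self._every = every_n_frames
        self._lock = threading.Lock()
        self._busy = False
        self._speed_factor = 1.0         # full speed until the VLM says otherwise
        self._thread = None

    def on_frame(self, frame_idx, frame):
        """Call once per simulation frame; never blocks the control loop."""
        if frame_idx % self._every != 0:
            return
        with self._lock:
            if self._busy:
                return                   # previous inference still running; skip
            self._busy = True
        self._thread = threading.Thread(target=self._work, args=(frame,), daemon=True)
        self._thread.start()

    def _work(self, frame):
        try:
            factor = self._infer(frame)
            with self._lock:
                self._speed_factor = max(0.0, min(1.0, factor))
        finally:
            with self._lock:
                self._busy = False

    def target_speed(self, base_speed):
        """Scale the controller's base target speed by the latest advice."""
        with self._lock:
            return base_speed * self._speed_factor

    def wait(self):
        """Block until the most recent background inference finishes."""
        if self._thread is not None:
            self._thread.join()


advisor = VlmAdvisor(infer=lambda frame: 0.5)   # stand-in for a real VLM call
for i in range(3):
    advisor.on_frame(i, frame=None)
advisor.wait()
print(advisor.target_speed(30.0))  # 15.0
```

Steering never passes through this class, so a hallucinated answer can at worst slow or stop the vehicle rather than push it off the route.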
If you use this project in your research, please cite both the original OpenEMMA paper and this repository:
@misc{openemma2024,
title={OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving},
author={Shuo Xing and Chengyuan Qian and Hongyuan Hua and Kexin Tian and Yu Zhang and Siheng Chen and Zhengzhong Tu},
year={2024},
eprint={2412.15208},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{openemmaui2025,
title={OpenEMMA-UI: Real-Time VLM-Driven Autonomous Driving in CARLA},
author={Sunjun Hwang},
year={2025},
url={https://github.com/justinbrianhwang/OpenEMMA-UI}
}

This project builds upon the excellent work of the original OpenEMMA authors:

OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving
Shuo Xing, Chengyuan Qian, Hongyuan Hua, Kexin Tian, Yu Zhang, Siheng Chen, Zhengzhong Tu
TACO Research Group, Texas A&M University
Paper: arXiv:2412.15208
Repository: taco-group/OpenEMMA

We sincerely thank the OpenEMMA team for making their research open-source, which made this real-time adaptation possible.
Additional thanks to:
- CARLA Simulator team for the open-source driving simulator
- Hugging Face for hosting the VLM model weights
- The developers of LLaVA, LLaMA, Qwen2-VL, and GPT-4o for their VLM models
This project is licensed under the Apache License 2.0. See the LICENSE file for details.




