VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model


Paper PDF | Project Page | Hugging Face | Code License

⭐ If our project helps you, please give us a star on GitHub to support us!

TODO

  • Partial training code
  • LIBERO evaluation code
  • LIBERO-Plus evaluation code
  • SimplerEnv evaluation code
  • Training code for custom datasets

Environment Setup

git clone https://github.com/ginwind/VLA-JEPA
cd VLA-JEPA

# Create conda environment
conda create -n VLA_JEPA python=3.10 -y
conda activate VLA_JEPA

# Install requirements
pip install -r requirements.txt

# Install FlashAttention2
pip install flash-attn --no-build-isolation

# Install project
pip install -e .
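
Optionally, run a quick sanity check that the environment built correctly (a minimal sketch; assumes requirements.txt pulls in PyTorch):

# Verify that PyTorch sees a GPU and FlashAttention2 imports cleanly
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import flash_attn; print(flash_attn.__version__)"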

This repository's code is based on starVLA.

Training

0️⃣ Pretrained Model Preparation

Download the Qwen3-VL-2B model and the V-JEPA2 encoder.
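
For example, both can be fetched with huggingface-cli (the repository IDs below are assumptions; substitute the exact checkpoints referenced by the paper):

# Repo IDs are placeholders -- verify against the official model cards
huggingface-cli download Qwen/Qwen3-VL-2B-Instruct --local-dir ./pretrained/Qwen3-VL-2B
huggingface-cli download facebook/vjepa2-vitl-fpc64-256 --local-dir ./pretrained/vjepa2-encoder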

1️⃣ Data Preparation

Download the following datasets:

2️⃣ Start Training

Depending on whether you are conducting pre-training or post-training, select the appropriate training script and YAML configuration file from the /scripts directory.

Ensure the following configurations are updated in the YAML file:

  • Set framework.qwenvl.basevlm and framework.vj2_model.base_encoder to the paths of the checkpoints downloaded in step 0️⃣.
  • Set datasets.vla_data.data_root_dir, datasets.video_data.video_dir, and datasets.video_data.text_file to the paths of your datasets.

Once the configuration is updated, you can start training.
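
A minimal sketch of the relevant YAML fields (the nesting is inferred from the dotted key names above; all path values are placeholders):

framework:
  qwenvl:
    basevlm: /path/to/Qwen3-VL-2B              # pretrained VLM checkpoint
  vj2_model:
    base_encoder: /path/to/vjepa2-encoder      # V-JEPA2 encoder checkpoint
datasets:
  vla_data:
    data_root_dir: /path/to/vla_data
  video_data:
    video_dir: /path/to/videos
    text_file: /path/to/video_captions.txt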

Evaluation

Download the model checkpoints from Hugging Face: https://huggingface.co/ginwind/VLA-JEPA
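
One way to fetch them from the command line (the local directory is arbitrary):

huggingface-cli download ginwind/VLA-JEPA --local-dir ./checkpoints/VLA-JEPA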

Environment: Install the required Python packages into your VLA-JEPA environment:

pip install tyro matplotlib mediapy websockets msgpack
pip install numpy==1.24.4

LIBERO

  • LIBERO setup: Prepare the LIBERO benchmark in a separate conda environment following the official LIBERO instructions: https://github.com/Lifelong-Robot-Learning/LIBERO

  • Configuration: In the downloaded checkpoint folder, update config.json and config.yaml to point the following fields to your local checkpoints:

    • framework.qwenvl.basevlm: path to the Qwen3-VL-2B checkpoint
    • framework.vj2_model.base_encoder: path to the V-JEPA encoder checkpoint
  • Evaluation script: Edit examples/LIBERO/eval_libero.sh: set the LIBERO_HOME environment variable (line 4) to your local LIBERO code path, set the sim_python variable (line 9) to the Python executable of the LIBERO conda environment, and set the your_ckpt variable (line 11) to the path of the downloaded LIBERO/checkpoints/VLA-JEPA-LIBERO.pt (see the sketch after this list).

  • Run evaluation: Launch the evaluation (the script runs the four task suites in parallel across 4 GPUs):

bash ./examples/LIBERO/eval_libero.sh
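
For reference, the three edits described in the Evaluation script step look roughly like this (the assignment forms are assumptions; all paths are placeholders):

# examples/LIBERO/eval_libero.sh
LIBERO_HOME=/path/to/LIBERO                               # line 4: local LIBERO code path
sim_python=/path/to/conda/envs/libero/bin/python          # line 9: Python of the LIBERO env
your_ckpt=/path/to/LIBERO/checkpoints/VLA-JEPA-LIBERO.pt  # line 11: downloaded checkpoint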

LIBERO-Plus

  • LIBERO-Plus setup: Clone the LIBERO-Plus repository: https://github.com/sylvestf/LIBERO-plus. Then:

    • In ./examples/LIBERO-Plus/libero_plus_init.py, update line 121 to point to your LIBERO-Plus/libero/libero/benchmark/task_classification.json.
    • Replace the original LIBERO-Plus/libero/libero/benchmark/__init__.py with the provided modified implementation (./examples/LIBERO-Plus/libero_plus_init.py) to enable evaluation over perturbation dimensions (see the sketch after this list).
    • Follow the official LIBERO-Plus installation instructions and build the benchmark in a separate conda environment.

  • Configuration: In the downloaded checkpoint folder, update config.json and config.yaml to point the following fields to your local checkpoints:

    • framework.qwenvl.basevlm: path to the Qwen3-VL-2B checkpoint
    • framework.vj2_model.base_encoder: path to the V-JEPA encoder checkpoint
  • Evaluation script: Edit examples/LIBERO-Plus/eval_libero_plus.sh: set the LIBERO_HOME environment variable (line 4) to your local LIBERO-Plus code path, set the sim_python variable (line 9) to the Python executable of the LIBERO-Plus conda environment, and set the your_ckpt variable (line 11) to the path of the downloaded LIBERO/checkpoints/VLA-JEPA-LIBERO.pt.

  • Run evaluation: Launch the evaluation (the script runs the seven perturbation dimensions in parallel across 7 GPUs):

bash ./examples/LIBERO-Plus/eval_libero_plus.sh
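
A sketch of the setup steps above as shell commands (the repository layout is taken from the paths in the setup step; edit line 121 of libero_plus_init.py before copying):

# Clone LIBERO-Plus
git clone https://github.com/sylvestf/LIBERO-plus
# After updating line 121 of libero_plus_init.py, swap in the modified benchmark __init__.py
cp ./examples/LIBERO-Plus/libero_plus_init.py LIBERO-plus/libero/libero/benchmark/__init__.py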

Notes: Ensure each process has access to a GPU and verify that all checkpoint paths in the configuration files are correct before running the evaluation.

Acknowledgement

We extend our sincere gratitude to the starVLA project and the V-JEPA2 project for their invaluable open-source contributions.
