VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model


Paper PDF | Project Page | Hugging Face | Code License

⭐ If our project helps you, please give us a star on GitHub to support us!

TODO

  • Partial training code
  • LIBERO evaluation code
  • LIBERO-Plus evaluation code
  • SimplerEnv evaluation code
  • Training code for custom datasets

Environment Setup

git clone https://github.com/ginwind/VLA-JEPA
cd VLA-JEPA

# Create conda environment
conda create -n VLA_JEPA python=3.10 -y
conda activate VLA_JEPA

# Install requirements
pip install -r requirements.txt

# Install FlashAttention2
pip install flash-attn --no-build-isolation

# Install project
pip install -e .
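
Optionally, run a quick sanity check that the environment built correctly (a minimal sketch; assumes requirements.txt pulls in PyTorch):

# Verify that PyTorch sees a GPU and FlashAttention2 imports cleanly
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import flash_attn; print(flash_attn.__version__)"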

This repository's code is based on starVLA.

Training

0️⃣ Pretrained Model Preparation

Download the Qwen3-VL-2B model and the V-JEPA2 encoder.
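
For example, both can be fetched with huggingface-cli (the repository IDs below are assumptions; substitute the exact checkpoints referenced by the paper):

# Repo IDs are placeholders -- verify against the official model cards
huggingface-cli download Qwen/Qwen3-VL-2B-Instruct --local-dir ./pretrained/Qwen3-VL-2B
huggingface-cli download facebook/vjepa2-vitl-fpc64-256 --local-dir ./pretrained/vjepa2-encoder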

1️⃣ Data Preparation

Download the following datasets:

2️⃣ Start Training

Depending on whether you are conducting pre-training or post-training, select the appropriate training script and YAML configuration file from the /scripts directory.

Ensure the following configurations are updated in the YAML file:

  • Set framework.qwenvl.basevlm and framework.vj2_model.base_encoder to the paths of the checkpoints downloaded in step 0️⃣.
  • Set datasets.vla_data.data_root_dir, datasets.video_data.video_dir, and datasets.video_data.text_file to the paths of your datasets.

Once the configuration is updated, you can start training.
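
A minimal sketch of the relevant YAML fields (the nesting is inferred from the dotted key names above; all path values are placeholders):

framework:
  qwenvl:
    basevlm: /path/to/Qwen3-VL-2B              # pretrained VLM checkpoint
  vj2_model:
    base_encoder: /path/to/vjepa2-encoder      # V-JEPA2 encoder checkpoint
datasets:
  vla_data:
    data_root_dir: /path/to/vla_data
  video_data:
    video_dir: /path/to/videos
    text_file: /path/to/video_captions.txt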

Evaluation

Download the model checkpoints from Hugging Face: https://huggingface.co/ginwind/VLA-JEPA
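
One way to fetch them from the command line (the local directory is arbitrary):

huggingface-cli download ginwind/VLA-JEPA --local-dir ./checkpoints/VLA-JEPA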

Environment: Install the required Python packages into your VLA-JEPA environment:

pip install tyro matplotlib mediapy websockets msgpack
pip install numpy==1.24.4

LIBERO

  • LIBERO setup: Prepare the LIBERO benchmark in a separate conda environment following the official LIBERO instructions: https://github.com/Lifelong-Robot-Learning/LIBERO

  • Configuration: In the downloaded checkpoint folder, update config.json and config.yaml to point the following fields to your local checkpoints:

    • framework.qwenvl.basevlm: path to the Qwen3-VL-2B checkpoint
    • framework.vj2_model.base_encoder: path to the V-JEPA encoder checkpoint
  • Evaluation script: Edit examples/LIBERO/eval_libero.sh: set the LIBERO_HOME environment variable (line 4) to your local LIBERO code path, set the sim_python variable (line 9) to the Python executable of the LIBERO conda environment, and set the your_ckpt variable (line 11) to the path of the downloaded LIBERO/checkpoints/VLA-JEPA-LIBERO.pt (see the sketch after this list).

  • Run evaluation: Launch the evaluation (the script runs the four task suites in parallel across 4 GPUs):

bash ./examples/LIBERO/eval_libero.sh
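
For reference, the three edits described in the Evaluation script step look roughly like this (the assignment forms are assumptions; all paths are placeholders):

# examples/LIBERO/eval_libero.sh
LIBERO_HOME=/path/to/LIBERO                               # line 4: local LIBERO code path
sim_python=/path/to/conda/envs/libero/bin/python          # line 9: Python of the LIBERO env
your_ckpt=/path/to/LIBERO/checkpoints/VLA-JEPA-LIBERO.pt  # line 11: downloaded checkpoint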

LIBERO-Plus

  • LIBERO-Plus setup: Clone the LIBERO-Plus repository: https://github.com/sylvestf/LIBERO-plus. Then:

    • In ./examples/LIBERO-Plus/libero_plus_init.py, update line 121 to point to your LIBERO-Plus/libero/libero/benchmark/task_classification.json.
    • Replace the original LIBERO-Plus/libero/libero/benchmark/__init__.py with the provided modified implementation (./examples/LIBERO-Plus/libero_plus_init.py) to enable evaluation over perturbation dimensions (see the sketch after this list).
    • Follow the official LIBERO-Plus installation instructions and build the benchmark in a separate conda environment.

  • Configuration: In the downloaded checkpoint folder, update config.json and config.yaml to point the following fields to your local checkpoints:

    • framework.qwenvl.basevlm: path to the Qwen3-VL-2B checkpoint
    • framework.vj2_model.base_encoder: path to the V-JEPA encoder checkpoint
  • Evaluation script: Edit examples/LIBERO-Plus/eval_libero_plus.sh: set the LIBERO_HOME environment variable (line 4) to your local LIBERO-Plus code path, set the sim_python variable (line 9) to the Python executable of the LIBERO-Plus conda environment, and set the your_ckpt variable (line 11) to the path of the downloaded LIBERO/checkpoints/VLA-JEPA-LIBERO.pt.

  • Run evaluation: Launch the evaluation (the script runs the seven perturbation dimensions in parallel across 7 GPUs):

bash ./examples/LIBERO-Plus/eval_libero_plus.sh
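
A sketch of the setup steps above as shell commands (the repository layout is taken from the paths in the setup step; edit line 121 of libero_plus_init.py before copying):

# Clone LIBERO-Plus
git clone https://github.com/sylvestf/LIBERO-plus
# After updating line 121 of libero_plus_init.py, swap in the modified benchmark __init__.py
cp ./examples/LIBERO-Plus/libero_plus_init.py LIBERO-plus/libero/libero/benchmark/__init__.py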

Notes: Ensure each process has access to a GPU and verify that all checkpoint paths in the configuration files are correct before running the evaluation.

Acknowledgement

We extend our sincere gratitude to the starVLA project and the V-JEPA2 project for their invaluable open-source contributions.
