Clarification / release request for navigation task

Hi V-JEPA team,

Thanks for the great paper and release.

I’m especially interested in the navigation results in:

- **Section 3.4: Navigation Planning**
- **Figure 9**
- **Table 7: Open Loop Navigation Planning**

Could you clarify or release the exact setup used for these results?

The main points that seem unclear are:

1. **Frozen or finetuned encoder?**  
   In the first paragraph of Section 3 (Results), the paper says V-JEPA 2.1 is used as a frozen encoder, but the **Table 7 caption** says “we finetune V-JEPA 2.1 on robot navigation datasets”. Which of the two applies to the reported navigation numbers?

2. **Exact world model recipe**  
   Section 3.4 says you train a CDiT on top of V-JEPA 2.1, predict **clean representations instead of noise**, and use **DDIM**.  
   Could you share the exact config / architecture changes relative to NWM?

3. **Reproducibility**  
   If possible, could you release the config / checkpoint / eval code for the navigation experiment?
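
To make point 2 concrete, here is a minimal sketch of how I currently read "predict clean representations instead of noise" combined with DDIM. It is only my assumption, not code from the paper or the release: the names `ddim_step_x0`, `ddim_sample`, and `predict_x0` are hypothetical, the noise schedule values are placeholders, and the conditioning on past frames and actions is omitted. If the actual recipe differs (for example in the schedule, the number of sampling steps, or the loss weighting), that is exactly the detail I am hoping you can share.

```python
import numpy as np


def ddim_step_x0(x_t, x0_pred, abar_t, abar_prev):
    """One deterministic DDIM step (eta = 0) for a model that predicts the clean sample.

    abar_t and abar_prev are cumulative alpha products at the current and the
    next (less noisy) sampling step; abar_t must be strictly below 1.
    """
    # Recover the noise implied by the clean-sample prediction.
    eps_pred = (x_t - np.sqrt(abar_t) * x0_pred) / np.sqrt(1.0 - abar_t)
    # Re-noise the prediction to the next, less noisy level.
    return np.sqrt(abar_prev) * x0_pred + np.sqrt(1.0 - abar_prev) * eps_pred


def ddim_sample(predict_x0, x_T, abar_schedule):
    """Run DDIM from the noisiest level down to a clean sample.

    predict_x0(x, abar) stands in for the world model (hypothetical signature).
    abar_schedule lists cumulative alphas from noisiest to cleanest.
    """
    x = x_T
    # The last step targets abar = 1, i.e. a fully denoised representation.
    abar_next = list(abar_schedule[1:]) + [1.0]
    for abar_t, abar_prev in zip(abar_schedule, abar_next):
        x0_pred = predict_x0(x, abar_t)
        x = ddim_step_x0(x, x0_pred, abar_t, abar_prev)
    return x
```

With a perfect predictor, this sampler returns the clean representation exactly, so it is only meant to pin down the parameterization I am asking about, not the training setup.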

This part of the paper is very interesting, and having the exact setup would make reproduction much easier.

Thanks again.
