Clarification / release request for navigation task #155

@lqh52

Hi V-JEPA team,

Thanks for the great paper and release.

I’m especially interested in the navigation results in:

  • Section 3.4: Navigation Planning
  • Figure 9
  • Table 7: Open Loop Navigation Planning

Could you clarify or release the exact setup used for these results?

The main points that seem unclear are:

  1. Frozen or finetuned encoder?
    In the first paragraph of Section 3 (Results), the paper says V-JEPA 2.1 is used as a frozen encoder, but the Table 7 caption says “we finetune V-JEPA 2.1 on robot navigation datasets”. Which of the two applies to the navigation results?

  2. Exact world model recipe
    Section 3.4 says you train a CDiT on top of V-JEPA 2.1, predict clean representations instead of noise, and use DDIM.
    Could you share the exact config / architecture changes relative to NWM?

  3. Reproducibility
    If possible, could you release the config / checkpoint / eval code for the navigation experiment?
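
For reference on point 2, here is how I currently read "predict clean representations instead of noise" combined with DDIM, as a minimal NumPy sketch. Everything in it is my own assumption rather than something taken from the paper or the released code: the names `ddim_step_x0` and `ddim_sample`, the `predict_x0` callable standing in for the CDiT, the cumulative schedule `abar`, and the deterministic (eta = 0) update. Please correct me if the actual recipe differs.

```python
import numpy as np


def ddim_step_x0(x_t, x0_hat, abar_t, abar_prev):
    """One deterministic DDIM step (eta = 0) for a model that predicts x0.

    x_t       : current noisy latent
    x0_hat    : the model's estimate of the clean latent
    abar_t    : cumulative alpha product at the current step (must be < 1)
    abar_prev : cumulative alpha product at the next, less noisy step
    """
    # Recover the implied noise from the clean-latent prediction.
    eps_hat = (x_t - np.sqrt(abar_t) * x0_hat) / np.sqrt(1.0 - abar_t)
    # Re-noise the prediction to the less noisy level.
    return np.sqrt(abar_prev) * x0_hat + np.sqrt(1.0 - abar_prev) * eps_hat


def ddim_sample(predict_x0, shape, abar, rng):
    """Run the full reverse process, from pure noise down to a clean latent.

    predict_x0 : hypothetical callable (x_t, step_index) -> x0_hat
    abar       : 1-D array, decreasing with the index (index 0 = least noisy)
    """
    x = rng.standard_normal(shape)
    for i in range(len(abar) - 1, -1, -1):
        # After the last step the latent is treated as noise-free.
        abar_prev = abar[i - 1] if i > 0 else 1.0
        x = ddim_step_x0(x, predict_x0(x, i), abar[i], abar_prev)
    return x
```

If the paper's setup instead uses a stochastic sampler (eta > 0) or a different parameterization of the target, that would be useful to know as well.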

This part of the paper is very interesting, and having the exact setup would make reproduction much easier.

Thanks again.
