DINOv2 or v3 - conceptual question #158

@bumi001

Hi, since DINOv2 and DINOv3 use the iBOT loss and learn the embeddings of masked patches from the surrounding patches, can we also consider them Joint Embedding Predictive Architectures (JEPAs)? Perhaps the linear head used in the iBOT loss can be thought of as their predictor, in which case the predictor is applied in both the student and the teacher. One difference is that they use two views, whereas I-JEPA and V-JEPA use the same image or video segment for both the student and the teacher.
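To make the comparison concrete, here is a rough NumPy sketch of the two objectives as I understand them; it is not taken from either codebase. The function names, shapes, and temperatures are my own assumptions, the head is reduced to a single linear map, and teacher centering, EMA updates, and multi-crop augmentation are left out. It is only meant to show where the head or predictor sits in each loss.

```python
import numpy as np

def log_softmax(x, temp):
    # Numerically stable log-softmax over the last axis.
    z = x / temp
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def ibot_style_loss(student_feats, teacher_feats, head_s, head_t, mask,
                    temp_s=0.1, temp_t=0.04):
    # Hypothetical simplification: both branches pass through a head, and the
    # loss is a cross-entropy between prototype distributions at masked patches.
    p_teacher = np.exp(log_softmax(teacher_feats @ head_t, temp_t))
    log_p_student = log_softmax(student_feats @ head_s, temp_s)
    per_patch = -(p_teacher * log_p_student).sum(axis=-1)
    return per_patch[mask].mean()

def jepa_style_loss(context_feats, target_feats, predictor, mask):
    # Hypothetical simplification: the predictor acts on the student (context)
    # side only, and the loss is a regression directly in embedding space.
    pred = context_feats @ predictor
    per_patch = ((pred - target_feats) ** 2).mean(axis=-1)
    return per_patch[mask].mean()

rng = np.random.default_rng(0)
n_patches, dim, n_protos = 16, 8, 32          # toy sizes, chosen arbitrarily
mask = np.zeros(n_patches, dtype=bool)
mask[[2, 5, 7, 11]] = True                    # patches hidden from the student

student = rng.normal(size=(n_patches, dim))   # stand-in for student patch embeddings
teacher = rng.normal(size=(n_patches, dim))   # stand-in for teacher patch embeddings
head = rng.normal(size=(dim, n_protos))       # shared here; the teacher head is an EMA copy in practice
predictor = rng.normal(size=(dim, dim))

loss_ibot = ibot_style_loss(student, teacher, head, head, mask)
loss_jepa = jepa_style_loss(student, teacher, predictor, mask)
print("iBOT-style loss:", loss_ibot)
print("JEPA-style loss:", loss_jepa)
```

In this simplified picture the iBOT-style loss compares distributions over prototypes after a head on both branches, while the JEPA-style loss regresses teacher embeddings from a student-side predictor. That asymmetry is what my question is about.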
