Hi V-JEPA 2 team,
Posting this as a research-collaboration ask, not a feature request. I work on URML, an open spec for substrate-neutral robot intent (Apache 2.0, urml.dev). Two integration vectors with V-JEPA 2 look interesting from URML's side, and I wanted to surface the shape before writing code.
Vector A: URML primitives as V-JEPA 2-AC action-conditioning input. Each URML primitive maps to one or more action-conditioning tokens, and the model's prediction proceeds normally over those tokens. If URML annotation lands on DROID (the OXE annotation proposal, RFC-0046, is the upstream path), V-JEPA 2-AC could be fine-tuned on URML-annotated trajectories and use URML primitive sequences as its action representation.
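To make the boundary question concrete, here is a rough Python sketch of one possible flattening. The `Primitive` class, `encode`, and `primitive_starts` are hypothetical stand-ins, not URML's schema or any V-JEPA 2 API. The sketch assumes each primitive already expands to fixed-width per-step action vectors and appends one extra channel that marks the first step of each primitive.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Primitive:
    """Hypothetical stand-in for a URML primitive (not the real URML schema)."""
    name: str
    steps: List[Sequence[float]]  # assumed: fixed-width per-step action vectors


def encode(primitives: List[Primitive], width: int) -> List[List[float]]:
    """Flatten primitives into per-step conditioning vectors.

    Each output vector is the original step vector plus one boundary channel:
    1.0 on the first step of a primitive, 0.0 on every later step, so the
    primitive boundaries survive the flattening.
    """
    tokens: List[List[float]] = []
    for prim in primitives:
        if not prim.steps:
            raise ValueError(f"primitive {prim.name!r} has no steps")
        for i, step in enumerate(prim.steps):
            if len(step) != width:
                raise ValueError(
                    f"{prim.name!r} step {i} has width {len(step)}, expected {width}"
                )
            tokens.append(list(step) + [1.0 if i == 0 else 0.0])
    return tokens


def primitive_starts(tokens: List[List[float]]) -> List[int]:
    """Recover the token indices where a new primitive begins."""
    return [i for i, tok in enumerate(tokens) if tok[-1] == 1.0]
```

A boundary flag is only one option. Separate marker tokens or a per-primitive embedding would also work, and which one fits V-JEPA 2-AC's conditioning is exactly the question below.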
Vector B: V-JEPA 2 predictions as URML's predictive-safety lane. Before URML's validator accepts a candidate program, the model predicts the end-state video embedding, and URML's safety envelope checks that prediction. The pre-execution simulation thus runs against a learned model of the world rather than an analytical one. No other URML target offers this shape, which is why V-JEPA 2 is the target in URML's outreach I'm most curious to discuss.
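The gate shape, as a minimal Python sketch. Everything here is a labeled assumption. `predict_end_state` is a hypothetical callable standing in for a V-JEPA 2-AC rollout. The "safety envelope" is modeled as a set of reference embeddings of known-unsafe end states plus a cosine-similarity threshold, which is one possible envelope and not URML's actual definition.

```python
import math
from typing import Callable, List, Sequence

Embedding = Sequence[float]


def cosine(a: Embedding, b: Embedding) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def predictive_gate(
    predict_end_state: Callable[[object], Embedding],  # hypothetical world-model hook
    program: object,                                  # candidate URML program (opaque here)
    unsafe_refs: List[Embedding],                     # assumed envelope: unsafe end-state embeddings
    threshold: float,
) -> bool:
    """Accept the program only if its predicted end state is not too close
    to any known-unsafe reference embedding. Runs before execution."""
    predicted = predict_end_state(program)
    return all(cosine(predicted, ref) < threshold for ref in unsafe_refs)
```

The evaluation-only alternative would call the same predictor but log or visualize `predicted` instead of returning an accept/reject decision.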
Full write-up with the proposed encoding, mapping, drawbacks, and alternatives:
https://github.com/URML-MARS/URML/blob/main/docs/rfcs/0052-meta-fair-vjepa2.md
Things I'd want input on before building:
- What is the right token-level encoding for URML primitives in V-JEPA 2-AC's action conditioning? Mapping the DROID action representation is straightforward; how to mark URML primitive boundaries is the open question.
- The predictive-safety lane (Vector B) is novel. Are you comfortable with framing it as a pre-execution gate, or would FAIR prefer URML keep it evaluation-only (visualize predictions, don't gate execution on them)?
- Is URML annotation on DROID trajectories, matching the OXE sidecar shape, acceptable from FAIR's side?
- Where should the bridge live: a standalone urml-vjepa2-bridge package on PyPI, or a contributed example in facebookresearch/vjepa2?
- Anything coming up on the FAIR side (workshop, benchmark, paper) where a URML conformance lane would slot in usefully?
No rush. The world-model angle is the most distinctive thing across this outreach program and I'd rather get the shape right than ship a wrapper you'd want rebuilt.
Ido
greenvh@gmail.com