Thanks for the release. I'm running action-conditioned inference with Cosmos3-Nano via vLLM-Omni. The model card gives per-embodiment dims and says `extra_params["action"]` takes a normalized `(T, D)` array, but only `agibotworld` (FD) and `av` (ID) examples ship, and the exact channel layout + normalizers aren't documented. Could you share, ideally as a per-embodiment table:
- Egocentric (57D): exact ordering, i.e. block order (ego 9D / left vs right hand / wrist 9D + grasp 15D), translation-vs-rotation order in each 9D pose, the 6D-rotation layout, the 5-fingertip order and which of the 21 keypoints they are, the coordinate convention, and the `domain_name` string.
- Robots (Franka 10D, dual Franka 20D, AgiBot 29D, UR/Google/WidowX 10D, UMI 9D): exact channel order (EE translation/rotation/gripper), and whether the actions are pose-delta or joint-angle based.
- Normalizers: where the per-dimension stats live and how to apply them.
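To make the questions concrete, here is a minimal sketch of what I am guessing at for the egocentric case. Only the block sizes (9 + 2 × (9 + 15) = 57) and the `(T, D)` shape come from the model card. The block order, the block names, the z-score scheme, and the placeholder stats are all my assumptions, and they are exactly the parts I would like confirmed or corrected.

```python
import numpy as np

# HYPOTHETICAL layout of the 57D egocentric action. The sizes match the
# model card (9 + 2 * (9 + 15) = 57); the ORDER and the names are guesses.
EGO_BLOCKS = [
    ("ego_pose", 9),      # assumed: 3D translation + 6D rotation
    ("left_wrist", 9),
    ("left_grasp", 15),   # assumed: 5 fingertips x 3D
    ("right_wrist", 9),
    ("right_grasp", 15),
]


def split_blocks(action, blocks):
    """Slice a (T, D) array into named (T, d_i) views following `blocks`."""
    total = sum(size for _, size in blocks)
    if action.ndim != 2 or action.shape[1] != total:
        raise ValueError(f"expected shape (T, {total}), got {action.shape}")
    out, start = {}, 0
    for name, size in blocks:
        out[name] = action[:, start:start + size]
        start += size
    return out


def normalize(action, mean, std, eps=1e-8):
    """Per-dimension z-score. ASSUMED scheme; the real normalizer could
    just as well be min-max or quantile based."""
    return (action - mean) / (std + eps)


T = 16
raw = np.zeros((T, 57), dtype=np.float32)   # placeholder raw actions
mean = np.zeros(57, dtype=np.float32)       # placeholder stats, not real values
std = np.ones(57, dtype=np.float32)

parts = split_blocks(raw, EGO_BLOCKS)
normalized = normalize(raw, mean, std)
extra_params = {"action": normalized}       # the (T, D) array from the model card
```

If the real layout differs, swapping the entries in `EGO_BLOCKS` and replacing `normalize` would be the only changes I need on my side.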
A pointer to the canonical file in cosmos-framework would be perfect. Thanks!