Action channel layout + normalizers for egocentric (57D) and robot embodiments #184

@TQTQliu

Thanks for the release. I'm running action-conditioned inference with Cosmos3-Nano via vLLM-Omni. The model card gives per-embodiment dims and says extra_params["action"] takes a normalized (T, D) array, but only agibotworld (FD) and av (ID) examples ship, and the exact channel layout + normalizers aren't documented. Could you share, ideally as a per-embodiment table:

  1. Egocentric (57D): exact ordering — block order (ego 9D / left vs right hand / wrist 9D + grasp 15D), translation-vs-rotation order in each 9D pose, the 6D-rotation layout, the 5-fingertip order and which of the 21 keypoints they are, coordinate convention, and the domain_name string.
  2. Robots (Franka 10D, dual Franka 20D, AgiBot 29D, UR/Google/WidowX 10D, UMI 9D): exact channel order (EE translation/rotation/gripper) — pose-delta or joint-angle based?
  3. Normalizers: where the per-dimension stats live and how to apply them.
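To make the ask concrete, here is a minimal sketch of what I am currently assuming. Everything in it is a guess rather than the documented format: the block names and their order in `EGO_BLOCKS` are placeholders, and the z-score form of `normalize` (with hypothetical per-dimension `mean` and `std` stats) is only one plausible normalizer. The only part taken from the model card is the block widths, which do sum to 57 (9 + 2 × (9 + 15)).

```python
# Hypothetical block order for the 57D egocentric action. The real order is
# exactly what this issue asks about, so these names are placeholders.
EGO_BLOCKS = [
    ("ego_pose", 9),      # assumed: 3D translation + 6D rotation
    ("left_wrist", 9),    # assumed: 3D translation + 6D rotation
    ("left_grasp", 15),   # assumed: 5 fingertips x 3 coordinates
    ("right_wrist", 9),
    ("right_grasp", 15),
]


def block_slices(blocks):
    """Map each block name to its slice in the flat action vector."""
    slices, start = {}, 0
    for name, width in blocks:
        slices[name] = slice(start, start + width)
        start += width
    return slices, start


def normalize(actions, mean, std, eps=1e-8):
    """Per-dimension z-score of a (T, D) action sequence given as nested lists.

    Assumption: the released normalizers are mean/std statistics. They could
    equally be min/max or quantile based, which would change this function.
    """
    dim = len(mean)
    if len(std) != dim or any(len(row) != dim for row in actions):
        raise ValueError("actions, mean, and std must agree on D")
    return [
        [(x - m) / (s + eps) for x, m, s in zip(row, mean, std)]
        for row in actions
    ]


SLICES, TOTAL_DIM = block_slices(EGO_BLOCKS)
```

With the real table, I would replace `EGO_BLOCKS` with the canonical order and swap `normalize` for whatever the per-dimension stats actually require before passing the (T, D) array as extra_params["action"].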

A pointer to the canonical file in cosmos-framework would be perfect. Thanks!

Labels: enhancement (New feature or request)