Action channel layout + normalizers for egocentric (57D) and robot embodiments #184

Thanks for the release. I'm running action-conditioned inference with Cosmos3-Nano via vLLM-Omni. The model card gives per-embodiment dims and says `extra_params["action"]` takes a normalized `(T, D)` array, but only `agibotworld` (FD) and `av` (ID) examples ship, and the exact channel layout + normalizers aren't documented. Could you share, ideally as a per-embodiment table:

1. **Egocentric (57D):** exact ordering — block order (ego 9D / left vs right hand / wrist 9D + grasp 15D), translation-vs-rotation order in each 9D pose, the 6D-rotation layout, the 5-fingertip order and which of the 21 keypoints they are, coordinate convention, and the `domain_name` string.
2. **Robots (Franka 10D, dual Franka 20D, AgiBot 29D, UR/Google/WidowX 10D, UMI 9D):** exact channel order (EE translation/rotation/gripper), and whether the actions are pose-delta or joint-angle based.
3. **Normalizers:** where the per-dimension stats live and how to apply them.
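
To make the question concrete, here is the layout I am currently guessing at for the 57D egocentric vector, as a minimal Python sketch. Only the block sizes (ego 9D, wrist 9D, grasp 15D) come from the description above. The block order, the left-before-right hand order, and mean/std normalization are my assumptions, and the names `EGO_BLOCKS_GUESS`, `block_slices`, and `normalize_guess` are hypothetical, not anything from cosmos-framework.

```python
import numpy as np

# ASSUMPTION: block order and left-before-right are guesses.
# Only the widths (9D ego, 9D wrist, 15D grasp per hand) come from the issue text.
EGO_BLOCKS_GUESS = [
    ("ego_pose", 9),      # assumed: 3D translation + 6D rotation
    ("left_wrist", 9),    # assumed: 3D translation + 6D rotation
    ("left_grasp", 15),   # assumed: 5 fingertips x 3D position
    ("right_wrist", 9),
    ("right_grasp", 15),
]


def block_slices(blocks):
    """Turn (name, width) pairs into channel slices and return the total width."""
    slices, start = {}, 0
    for name, width in blocks:
        slices[name] = slice(start, start + width)
        start += width
    return slices, start


SLICES, TOTAL_DIM = block_slices(EGO_BLOCKS_GUESS)


def normalize_guess(actions, mean, std, eps=1e-8):
    """ASSUMPTION: per-dimension z-score; the real normalizer may differ (e.g. min/max)."""
    actions = np.asarray(actions, dtype=np.float32)
    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)
    if actions.ndim != 2 or actions.shape[1] != mean.shape[0]:
        raise ValueError(f"expected (T, {mean.shape[0]}) actions, got {actions.shape}")
    return (actions - mean) / (std + eps)
```

If this guess is wrong anywhere (block order, hand order, or the type of normalizer), a corrected version of the `EGO_BLOCKS_GUESS` table per embodiment is exactly what I am after.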

A pointer to the canonical file in cosmos-framework would be perfect. Thanks!
