Hi, thanks for open-sourcing this project — the benchmark design and task taxonomy are genuinely well organized, and the released resources are very helpful for the community. I really appreciate the effort your team put into building and maintaining this benchmark.
I noticed a possible labeling issue in the L4 / Goal-DrivenExecution tasks: the RoboticArm and AgenticNavigation sub-task labels appear to have been swapped. For both session ranges, the metric function and the question text point to the other task, so the current labels are inconsistent with the task contents and descriptions. It might be worth double-checking the annotations for these two categories.

| spatree2 label | session_id | metricfunc | question text | actual task |
| --- | --- | --- | --- | --- |
| AgenticNavigation | 5618–5867 (250) | manipulateeval | "…translation and rotation for the robot arm's end-effector… decompose the end-effector movement into 7 steps" | robotic arm |
| RoboticArm | 5868–6117 (250) | agenticnaveval | "Task: Visual Navigation Action Sequence Generation … expert visual navigation agent … navigate a robot …" | navigation |

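
In case it helps with verification, here is a rough sketch of how the mismatch in the table could be checked programmatically. It is only an illustration under assumptions: the record layout (dicts with `session_id`, `label`, and `metricfunc` keys), the `EXPECTED_METRIC` mapping, and the `find_mismatches` helper are all hypothetical, and the mapping is inferred from the names rather than taken from the benchmark code. The two sample records mirror the table rows and are not real dataset entries.

```python
# Assumed pairing between sub-task label and metric function,
# inferred from the names only (not confirmed by the benchmark code).
EXPECTED_METRIC = {
    "RoboticArm": "manipulateeval",
    "AgenticNavigation": "agenticnaveval",
}


def find_mismatches(records):
    """Return records whose metric function does not match their label."""
    return [
        r
        for r in records
        if r["label"] in EXPECTED_METRIC
        and r["metricfunc"] != EXPECTED_METRIC[r["label"]]
    ]


# Illustrative records mirroring the two table rows (hypothetical layout).
sample = [
    {"session_id": 5618, "label": "AgenticNavigation", "metricfunc": "manipulateeval"},
    {"session_id": 5868, "label": "RoboticArm", "metricfunc": "agenticnaveval"},
]

for r in find_mismatches(sample):
    print(r["session_id"], r["label"], r["metricfunc"])
```

Running something like this over the real annotation file (with the actual field names substituted) should show whether the mismatch covers all 250 sessions in each range or only part of them.
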
Thanks again for releasing such a valuable benchmark and for your contributions to the community.