Skip to content

robocurve/robolens

🤖 RoboLens

The Inspect AI for robotics

An open-source evaluation framework for physical AI and VLA (vision-language-action) models.

Define a robotics benchmark once, then run any policy against any compatible embodiment — a real robot or a simulator — with reproducible logs and first-class Rerun visualization.

CI Docs Python License: MIT Typed Coverage

Documentation · Quickstart · Concepts · For LLMs


One framework, two swappable inputs

LLM evaluations have a single swappable input: the model. Robotics evaluations have two — and RoboLens makes both first-class and orthogonal:

🧠 Policy — the VLA The "brain". Maps an observation + instruction to an action chunk (a horizon of actions executed open-loop, as π0 / ACT / diffusion policies do).
🦾 Embodiment — the robot or sim The "body + world". Produces observations, executes actions, owns the action/observation spaces and control rate. Real-robot-first; sims are a stricter special case.

A Task — a dataset of Scenes (initial conditions, instructions, success targets) plus scorers — is defined independently of both. Before any rollout, RoboLens checks the (policy, embodiment) pair is compatible (action/observation spaces, semantics, control rate, scene realizability) and fails fast if not.

Install

pip install robolens            # core (numpy only)
pip install "robolens[rerun]"   # + Rerun visualization

Quickstart

No hardware or simulator needed — the dependency-free CubePick mock world exercises the whole stack:

from robolens import eval
from robolens.mock import CubePickEmbodiment, ScriptedPolicy
from robolens.scene import Scene
from robolens.scorer import success_at_end
from robolens.task import Task

task = Task(
    name="cubepick-reach",
    scenes=[Scene(id=f"layout-{i}", instruction="reach the cube", init_seed=i) for i in range(5)],
    scorer=success_at_end(),
    max_steps=80,
)

# The two swappable inputs: a policy (VLA) and an embodiment (robot/sim).
(log,) = eval(task, ScriptedPolicy(), CubePickEmbodiment())
print(log.status, log.results.metrics)   # success {'success_at_end': 1.0}

…or from the command line (components resolve from a registry):

robolens list                                          # registered components
robolens run --task cubepick-reach --policy scripted --embodiment cubepick
robolens inspect logs/cubepick-reach_*.json            # results table

Why RoboLens

  • 🌍 Real-world first. Interfaces assume real-robot reality — human-in-the-loop reset, no privileged success oracle, wall-clock control rate. Simulators just offer more (seeding, privileged success, rendering) via opt-in capabilities.
  • 🔁 Reproducible. Every run yields an immutable, schema-versioned EvalLog with the resolved config, git revision, and package versions — re-readable across releases, and re-scorable offline.
  • 🪶 Light core. Depends only on NumPy. Rerun and simulator/VLA backends are optional extras and separately installable plugins.
  • 🛑 Safe unattended. An explicit error taxonomy separates "record and continue" from "halt and require a human", so a faulted robot never auto-advances overnight.
  • 🎞️ Rerun visualization. Stream camera images, 3D poses, joint/action time-series, and success markers to a .rrd recording.
  • 🧩 Pluggable. Ship robolens-maniskill or robolens-openvla as separate packages — entry points make them appear in robolens list automatically.
  • ⚙️ VLA-native. Action chunking, open-loop execution, and ACT/ALOHA temporal ensembling are built in, with action semantics (control mode, rotation representation, gripper, frame) that make compatibility and ensembling correct.

How it maps to Inspect AI

If you know Inspect AI, you already know RoboLens.

Inspect AI RoboLens
Model Policy (VLA) + Embodiment (two inputs)
Task = dataset + solver + scorer Task = scenes + controller + scorer
Sample Scene
Solver chain Controller middleware (chunking, ensembling, smoothing)
eval()EvalLog eval()EvalLog
@task / @solver / @scorer + registry @task / @policy / @embodiment / @scorer + entry points

This repository is the framework (the "Inspect AI for robotics"). Concrete benchmarks (the "Inspect Evals for robotics") and backend adapters live in separate plugin packages.

Documentation

Full guides and an auto-generated API reference live at robocurve.github.io/robolens. LLM-friendly versions: llms.txt and llms-full.txt.

Development

uv venv && uv pip install -e ".[dev]"
uv run pre-commit install          # ruff + mypy on commit, 100% coverage on push
uv run pytest --cov                 # 100% coverage required
uv run ruff check . && uv run mypy

Pre-commit hooks and a blocking CI coverage gate keep main green. See CONTRIBUTING.md and the design docs in plans/.

License

MIT

About

The Inspect AI for robotics — an open-source evaluation framework for physical AI and VLA (vision-language-action) models. Run any policy on any embodiment (real robot or sim), with reproducible logs and Rerun visualization.

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages