🤖 RoboLens

The Inspect AI for robotics

An open-source evaluation framework for physical AI and VLA (vision-language-action) models.

Define a robotics benchmark once, then run any policy against any compatible embodiment — a real robot or a simulator — with reproducible logs and first-class Rerun visualization.

Documentation · Quickstart · Concepts · For LLMs

One framework, two swappable inputs

LLM evaluations have a single swappable input: the model. Robotics evaluations have two — and RoboLens makes both first-class and orthogonal:


Input	Role
🧠 `Policy` — the VLA	The "brain". Maps an observation + instruction to an action chunk (a horizon of actions executed open-loop, as π0 / ACT / diffusion policies do).
🦾 `Embodiment` — the robot or sim	The "body + world". Produces observations, executes actions, and owns the action/observation spaces and control rate. Real-robot-first; a simulator is a special case that adds opt-in capabilities on top.
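
To make the split concrete, here is a minimal sketch of the two inputs and an open-loop chunked rollout. Everything in it is hypothetical: the `Policy.act` and `Embodiment.reset`/`step` protocols, the toy `PointReach` world, and `PlannedChunkPolicy` are illustrative names, not RoboLens's actual API. The policy is scripted and reads the goal directly, which a real VLA would not.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol

import numpy as np

STEP_LIMIT = 0.1  # toy per-step displacement limit shared by world and planner


class Policy(Protocol):
    """Hypothetical 'brain' interface: observation + instruction -> action chunk."""

    def act(self, obs: np.ndarray, instruction: str) -> np.ndarray:
        """Return an action chunk of shape (horizon, action_dim)."""
        ...


class Embodiment(Protocol):
    """Hypothetical 'body + world' interface: produces observations, executes actions."""

    def reset(self, seed: int) -> np.ndarray: ...

    def step(self, action: np.ndarray) -> np.ndarray: ...


@dataclass
class PointReach:
    """Toy 2-D world: a point that should reach a goal."""

    goal: np.ndarray = field(default_factory=lambda: np.ones(2))
    pos: np.ndarray = field(default_factory=lambda: np.zeros(2))

    def reset(self, seed: int) -> np.ndarray:
        # Seeded start position, the kind of control a simulator can offer.
        self.pos = np.random.default_rng(seed).uniform(-1.0, 0.0, size=2)
        return self.pos.copy()

    def step(self, action: np.ndarray) -> np.ndarray:
        # The body saturates each commanded displacement.
        self.pos = self.pos + np.clip(action, -STEP_LIMIT, STEP_LIMIT)
        return self.pos.copy()


class PlannedChunkPolicy:
    """Scripted policy: plans `horizon` steps toward a known goal from the latest observation."""

    def __init__(self, goal: np.ndarray, horizon: int = 4) -> None:
        self.goal = goal
        self.horizon = horizon

    def act(self, obs: np.ndarray, instruction: str) -> np.ndarray:
        chunk, p = [], obs.copy()
        for _ in range(self.horizon):
            a = np.clip(self.goal - p, -STEP_LIMIT, STEP_LIMIT)
            chunk.append(a)
            p = p + a  # predict where this action leaves the point
        return np.stack(chunk)


def rollout(policy: Policy, embodiment: Embodiment, instruction: str,
            seed: int, max_steps: int) -> np.ndarray:
    """Query the policy once per chunk and execute each chunk open-loop."""
    obs = embodiment.reset(seed)
    steps = 0
    while steps < max_steps:
        chunk = policy.act(obs, instruction)
        for action in chunk:  # no re-query inside a chunk
            obs = embodiment.step(action)
            steps += 1
            if steps >= max_steps:
                break
    return obs
```

Calling `rollout(PlannedChunkPolicy(goal=np.ones(2)), PointReach(), "reach the goal", seed=0, max_steps=80)` ends at the goal while querying the policy only once every four steps. Either side can be swapped independently as long as it satisfies its protocol.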

A Task — a dataset of Scenes (initial conditions, instructions, success targets) plus scorers — is defined independently of both. Before any rollout, RoboLens checks that the (policy, embodiment) pair is compatible (action/observation spaces, semantics, control rate, scene realizability) and fails fast if it is not.
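
A fail-fast check can be as simple as comparing declared action semantics before the first step. This sketch assumes a hypothetical `ActionSpec` with a few fields (dimension, control mode, rotation representation, control rate) and a hypothetical `check_compatible` helper; RoboLens's real checks also cover observation spaces and scene realizability, which are omitted here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ActionSpec:
    """Hypothetical description of what a policy emits or an embodiment accepts."""

    dim: int
    control_mode: str   # e.g. "ee_delta" or "joint_pos"
    rotation: str       # e.g. "axis_angle", "quat", "rot6d"
    control_hz: float


class IncompatibleError(ValueError):
    """Raised before any rollout when a (policy, embodiment) pair cannot work."""


def check_compatible(policy: ActionSpec, embodiment: ActionSpec,
                     hz_tolerance: float = 0.05) -> None:
    """Collect every mismatch, then fail fast with one readable error."""
    problems: list[str] = []
    if policy.dim != embodiment.dim:
        problems.append(f"action dim {policy.dim} != {embodiment.dim}")
    if policy.control_mode != embodiment.control_mode:
        problems.append(
            f"control mode {policy.control_mode!r} != {embodiment.control_mode!r}")
    if policy.rotation != embodiment.rotation:
        problems.append(f"rotation {policy.rotation!r} != {embodiment.rotation!r}")
    # Allow a small relative difference in control rate.
    if abs(policy.control_hz - embodiment.control_hz) > hz_tolerance * embodiment.control_hz:
        problems.append(
            f"control rate {policy.control_hz} Hz != {embodiment.control_hz} Hz")
    if problems:
        raise IncompatibleError("; ".join(problems))
```

Collecting all mismatches before raising means a single failed launch reports every problem at once, rather than one per retry.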

Install

pip install robolens            # core (numpy only)
pip install "robolens[rerun]"   # + Rerun visualization

Quickstart

No hardware or simulator needed — the dependency-free CubePick mock world exercises the whole stack:

from robolens import eval
from robolens.mock import CubePickEmbodiment, ScriptedPolicy
from robolens.scene import Scene
from robolens.scorer import success_at_end
from robolens.task import Task

task = Task(
    name="cubepick-reach",
    scenes=[Scene(id=f"layout-{i}", instruction="reach the cube", init_seed=i) for i in range(5)],
    scorer=success_at_end(),
    max_steps=80,
)

# The two swappable inputs: a policy (VLA) and an embodiment (robot/sim).
(log,) = eval(task, ScriptedPolicy(), CubePickEmbodiment())
print(log.status, log.results.metrics)   # success {'success_at_end': 1.0}

…or from the command line (components resolve from a registry):

robolens list                                          # registered components
robolens run --task cubepick-reach --policy scripted --embodiment cubepick
robolens inspect logs/cubepick-reach_*.json            # results table

Why RoboLens

🌍 Real-world first. Interfaces assume real-robot reality — human-in-the-loop reset, no privileged success oracle, wall-clock control rate. Simulators just offer more (seeding, privileged success, rendering) via opt-in capabilities.
🔁 Reproducible. Every run yields an immutable, schema-versioned EvalLog with the resolved config, git revision, and package versions — re-readable across releases, and re-scorable offline.
🪶 Light core. Depends only on NumPy. Rerun ships as an optional extra; simulator and VLA backends are separately installable plugins.
🛑 Safe unattended. An explicit error taxonomy separates "record and continue" from "halt and require a human", so a faulted robot never auto-advances overnight.
🎞️ Rerun visualization. Stream camera images, 3D poses, joint/action time-series, and success markers to a .rrd recording.
🧩 Pluggable. Ship backends such as robolens-maniskill or robolens-openvla as separate packages; their entry points make them appear in `robolens list` automatically.
⚙️ VLA-native. Action chunking, open-loop execution, and ACT/ALOHA temporal ensembling are built in, with action semantics (control mode, rotation representation, gripper, frame) that make compatibility and ensembling correct.
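
Temporal ensembling, as described for ACT, queries the policy at every step and averages all chunk predictions that target the current step, weighting them by w_i = exp(-m * i), where i = 0 is the oldest prediction. The sketch below implements that rule; the `TemporalEnsembler` class and its `update` method are hypothetical names for illustration, not the RoboLens implementation.

```python
from __future__ import annotations

import numpy as np


class TemporalEnsembler:
    """ACT-style temporal ensembling over overlapping action chunks (sketch)."""

    def __init__(self, m: float = 0.01) -> None:
        self.m = m  # decay rate of the exponential weights
        self.t = 0  # current control step
        # target step -> predictions for that step, in the order they were made
        self.pending: dict[int, list[np.ndarray]] = {}

    def update(self, chunk: np.ndarray) -> np.ndarray:
        """Register the chunk predicted now and return the action to execute now."""
        for offset, action in enumerate(chunk):
            self.pending.setdefault(self.t + offset, []).append(action)
        preds = np.stack(self.pending.pop(self.t))
        weights = np.exp(-self.m * np.arange(len(preds)))  # index 0 = oldest
        weights /= weights.sum()
        self.t += 1
        return weights @ preds
```

If the chunk predicted at step 0 says 0 for step 1 and the chunk predicted at step 1 says 1, the executed action lands just under 0.5, because the older prediction carries slightly more weight. A smaller `m` spreads weight more evenly, so new observations are incorporated faster; a larger `m` leans more heavily on the oldest prediction.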

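The "record and continue" versus "halt and require a human" split maps naturally onto an exception hierarchy. In this sketch, `SceneError`, `HaltError`, and `run_scenes` are hypothetical names, not RoboLens's actual taxonomy: a scene-level failure is logged and the loop moves on, while a halt returns immediately with the offending scene so nothing advances until someone intervenes.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable


class EvalError(Exception):
    """Base class for hypothetical evaluation errors."""


class SceneError(EvalError):
    """Recoverable: record the failure for this scene and continue."""


class HaltError(EvalError):
    """Unrecoverable (e.g. a faulted robot): stop and wait for a human."""


def run_scenes(
    scene_ids: Iterable[str], run_one: Callable[[str], str]
) -> tuple[dict[str, str], str | None]:
    """Run scenes in order; return per-scene outcomes and the scene that halted, if any."""
    outcomes: dict[str, str] = {}
    for scene_id in scene_ids:
        try:
            outcomes[scene_id] = run_one(scene_id)
        except SceneError as exc:
            outcomes[scene_id] = f"error: {exc}"  # record and continue
        except HaltError as exc:
            outcomes[scene_id] = f"halted: {exc}"
            return outcomes, scene_id  # never auto-advance past a fault
    return outcomes, None
```

Because `HaltError` and `SceneError` are siblings rather than one catching the other, a hardware fault can never be swallowed by the "record and continue" branch.
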
How it maps to Inspect AI

If you know Inspect AI, you already know RoboLens.

Inspect AI	RoboLens
`Model`	`Policy` (VLA) + `Embodiment` (two inputs)
`Task = dataset + solver + scorer`	`Task = scenes + controller + scorer`
`Sample`	`Scene`
`Solver` chain	`Controller` middleware (chunking, ensembling, smoothing)
`eval()` → `EvalLog`	`eval()` → `EvalLog`
`@task` / `@solver` / `@scorer` + registry	`@task` / `@policy` / `@embodiment` / `@scorer` + entry points
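
Inspect's solver chain composes functions, and a `Controller` middleware stack can be sketched the same way, with each middleware wrapping the controller beneath it. The `Controller`/`Middleware` aliases, `clip_actions`, `ema_smoothing`, and `chain` below are hypothetical illustrations of the pattern, not RoboLens's API, and they operate on single actions rather than chunks for brevity.

```python
from __future__ import annotations

from collections.abc import Callable
from functools import reduce

import numpy as np

Controller = Callable[[np.ndarray], np.ndarray]  # observation -> action
Middleware = Callable[[Controller], Controller]   # wraps one controller in another


def clip_actions(limit: float) -> Middleware:
    """Clamp every action component to [-limit, limit]."""
    def wrap(inner: Controller) -> Controller:
        def control(obs: np.ndarray) -> np.ndarray:
            return np.clip(inner(obs), -limit, limit)
        return control
    return wrap


def ema_smoothing(alpha: float) -> Middleware:
    """Exponential moving average: alpha * new + (1 - alpha) * previous."""
    def wrap(inner: Controller) -> Controller:
        prev: np.ndarray | None = None

        def control(obs: np.ndarray) -> np.ndarray:
            nonlocal prev
            action = inner(obs)
            if prev is not None:
                action = alpha * action + (1 - alpha) * prev
            prev = action
            return action
        return control
    return wrap


def chain(base: Controller, *middleware: Middleware) -> Controller:
    """Wrap `base` with each middleware in turn; the first listed is innermost."""
    return reduce(lambda ctrl, mw: mw(ctrl), middleware, base)


# Toy base controller that echoes the observation; clip first, then smooth.
controller = chain(lambda obs: obs, clip_actions(1.0), ema_smoothing(0.5))
```

Order matters: listing `ema_smoothing` before `clip_actions` would smooth raw actions and clip afterwards, which generally gives a different result.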

This repository is the framework (the "Inspect AI for robotics"). Concrete benchmarks (the "Inspect Evals for robotics") and backend adapters live in separate plugin packages.

Documentation

Full guides and an auto-generated API reference live at robocurve.github.io/robolens. LLM-friendly versions: llms.txt and llms-full.txt.

Development

uv venv && uv pip install -e ".[dev]"
uv run pre-commit install          # ruff + mypy on commit, 100% coverage on push
uv run pytest --cov                 # 100% coverage required
uv run ruff check . && uv run mypy

Pre-commit hooks and a blocking CI coverage gate keep main green. See CONTRIBUTING.md and the design docs in plans/.

License

MIT
