- 🆕 03/2026: Code released.
- ⭐ 03/2026: Benchmark / dataset released.
TIMID is a framework designed to identify and localize temporal mistakes in robotic tasks. Unlike standard action recognition, TIMID focuses on the timing and sequencing of robot executions, detecting when a robot deviates from a "correct" execution path in video streams.
Key Features:
- Time-Sensitive Analysis: Detects mistakes that are only evident when considering the duration and order of actions.
- Benchmark Suite: Includes a comprehensive set of "correct" vs. "failed" robot execution videos for evaluation.
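To illustrate the time-dependent setting (this is a toy sketch, not the actual TIMID model, which is learned from video), a hypothetical checker could compare an observed action sequence and its durations against a reference execution; the action names and tolerance below are made up for the example:

```python
# Illustrative sketch only: shows why both ordering and timing matter when
# deciding whether a robot execution deviates from a "correct" one.

def find_temporal_mistake(reference, observed, tolerance=0.5):
    """Return the index of the first step whose action or duration deviates
    from the reference execution, or None if the execution looks correct.

    reference/observed: lists of (action_name, duration_seconds) tuples.
    tolerance: allowed relative duration deviation (0.5 = +/-50%).
    """
    for i, ((ref_act, ref_dur), (obs_act, obs_dur)) in enumerate(
        zip(reference, observed)
    ):
        if obs_act != ref_act:                            # ordering mistake
            return i
        if abs(obs_dur - ref_dur) > tolerance * ref_dur:  # timing mistake
            return i
    if len(observed) < len(reference):                    # stopped early
        return len(observed)
    return None

reference = [("reach", 2.0), ("grasp", 1.0), ("lift", 1.5)]
ok        = [("reach", 2.1), ("grasp", 0.9), ("lift", 1.4)]
too_slow  = [("reach", 2.0), ("grasp", 3.0), ("lift", 1.5)]  # grasp 3x too long

print(find_temporal_mistake(reference, ok))        # -> None
print(find_temporal_mistake(reference, too_slow))  # -> 1 (the grasp step)
```

Note that a frame-wise action classifier alone would accept `too_slow`: every frame shows a valid action, and only the duration reveals the mistake.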
The code is tested on Ubuntu 22.04 with Python 3.10 and CUDA 12.1.
```sh
# Clone the repository
git clone https://github.com/ropertunizar/TIMID.git
cd TIMID

# Create a virtual environment
python -m venv timid_env
source timid_env/bin/activate

# Install dependencies
pip install -r requirements.txt
```

- Data preparation

Data and pretrained models are hosted on Hugging Face. You can download them from the command line:

```sh
hf download nereagallego/TIMID-data --repo-type=dataset --local-dir .
```
To use the Bridge dataset, please download the first 1,000 episodes. This repository provides the necessary annotations.
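A small helper can select those first 1,000 episodes once the Bridge data is on disk. This is a hypothetical sketch: it assumes each episode is identified by an integer index somewhere in its name (e.g. `episode_12`); the actual layout of the Bridge download may differ.

```python
# Hypothetical helper: keep only the first n Bridge episodes by numeric index.
# The "episode_<i>" naming scheme is an assumption for this example.
import re

def first_n_episodes(names, n=1000):
    """Sort episode names by their numeric index and keep the first n."""
    def index_of(name):
        match = re.search(r"\d+", name)
        return int(match.group()) if match else float("inf")
    return sorted(names, key=index_of)[:n]

episodes = [f"episode_{i}" for i in range(1500)]
subset = first_n_episodes(episodes, n=1000)
print(len(subset), subset[0], subset[-1])  # -> 1000 episode_0 episode_999
```

Sorting numerically (rather than lexicographically) matters here: a plain string sort would place `episode_100` before `episode_2`.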
- Inference

To run a pre-trained model on one of the datasets:

```sh
python main.py --mode infer --model_mode 1 --ckpt_path ckpt/mutex/mutex__7683.pkl --dataset mutex
# dataset: [mutex, ordering, bridge, mutex_real, ordering_real]
# mode: [train, infer]
# model_mode: [1, 2, 3, 4]
```
- Training

To train the model on the benchmark:

```sh
python main.py --mode train --model_mode 1 --dataset mutex
# dataset: [mutex, ordering, bridge, mutex_real, ordering_real]
# mode: [train, infer]
# model_mode: [1, 2, 3, 4]
```
For both training and inference, model modes 2, 3, and 4 correspond to the "Semantic Only", "Temporal Only", and "PEL4VAD" configurations in the ablation study and baseline comparison, respectively.
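The mode numbers above can be kept in a small lookup for logging or scripting. Modes 2-4 follow the names given here; the label for mode 1 is our assumption (the full model), since only modes 2-4 are named explicitly:

```python
# Mapping of --model_mode values to configuration names. Modes 2-4 follow the
# README; the label for mode 1 is an assumption (the full TIMID model).
MODEL_MODES = {
    1: "TIMID (full)",   # assumed label; not named in the README
    2: "Semantic Only",  # ablation: semantic branch only
    3: "Temporal Only",  # ablation: temporal branch only
    4: "PEL4VAD",        # baseline comparison
}

def describe_run(mode, dataset):
    """Human-readable description of a run, validating the mode number."""
    if mode not in MODEL_MODES:
        raise ValueError(f"model_mode must be one of {sorted(MODEL_MODES)}")
    return f"{MODEL_MODES[mode]} on {dataset}"

print(describe_run(3, "mutex"))  # -> Temporal Only on mutex
```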
| Dataset | AP | AR | F1 | ckpt |
|---|---|---|---|---|
| Bridge | 49.72 | 33.77 | 40.22 | link |
| Mutex | 76.83 | 35.89 | 40.1 | link |
| Ordering | 48.71 | 36.89 | 33.45 | link |
| Mutex Real | 72.01 | 23.64 | 23.91 | link |
| Ordering Real | 19.87 | 12.12 | 7.92 | link |
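The exact evaluation protocol behind these numbers (thresholds, averaging over videos) is defined in the repository's code; as a reference point, the sketch below shows the standard frame-level precision/recall/F1 computation for binary mistake labels, which is one common way such scores are derived:

```python
# Illustrative frame-level metrics for binary mistake detection (1 = mistake).
# This is a generic sketch, not the benchmark's exact evaluation protocol.

def precision_recall_f1(pred, gt):
    """pred, gt: equal-length lists of 0/1 frame labels."""
    tp = sum(p and g for p, g in zip(pred, gt))          # correctly flagged
    fp = sum(p and not g for p, g in zip(pred, gt))      # false alarms
    fn = sum(g and not p for p, g in zip(pred, gt))      # missed mistakes
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gt   = [0, 0, 1, 1, 1, 0, 0, 0]   # ground-truth mistake frames
pred = [0, 0, 1, 1, 0, 1, 0, 0]   # one miss, one false alarm
p, r, f = precision_recall_f1(pred, gt)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # -> P=0.67 R=0.67 F1=0.67
```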
Qualitative prediction examples (green is ours): Bridge, Proximity real videos, and Ordering real videos.
This work is released under the AGPL-3.0 license.
@inproceedings{gallego2026timid,
title={TIMID: Time-Dependent Mistake Detection in Videos of Robot Executions},
author={Gallego, Nerea and Salanova, Fernando and Mannarano, Claudio and Mahulea, Cristian and Montijano, Eduardo},
year={2026}
}

This work was partially supported by grants AIA2025-163563-C31 and PID2024-159284NB-I00, funded by MCIN/AEI/10.13039/501100011033 and ERDF, the Office of Naval Research Global grant N62909-24-1-2081, and DGA project T45_23R. The work was also supported by a 2024 DGA scholarship.


