This repository accompanies the paper "ApBot: Robot Operation of Home Appliances by Reading User Manuals", and contains open-sourced code, datasets, simulation tools, and baseline experiments to support research on robotic operation of household appliances.
ApBot enables robots to operate previously unseen appliances by “reading” their user manuals, grounding symbolic actions to real-world control elements, and executing policies robustly and interactively with textual or visual feedback.
The repository is organized into three main parts:

- `code/` – Core implementation for foundation models, symbolic reasoning, visual grounding, simulation, and real-world execution. It consists of:
  - `foundation_models/` – GPT, OWL, and SAM2 integration
  - `simulated/` – Scripts for parsing user manuals, grounding, generating test cases, and running experiments. Includes:
    - `paper_exp/` – Scripts and outputs for reproducing the experiments and baselines reported in the paper
  - `real_world/` – Scripts for running real-world experiments
- `dataset/` – Structured datasets for training and evaluation. It consists of:
  - `simulated/` – Six simulated appliances (e.g., microwave, washer), each with user manuals, control panel images, and symbolic simulators
  - `real_world/` – Real-world recordings of five appliances, structured in the same format as the simulated data
- `README` – Documentation and usage guide
- Structured Appliance Modeling: Automatically extract and build symbolic models (variables, features, actions) from unstructured manuals.
- Vision-Language Grounding: Ground control instructions to appliance control panels using large VLMs (e.g., OWL-V2).
- Closed-loop Execution: Monitor execution visually and update world models based on feedback.
- Simulated + Real-world Evaluation: Test and benchmark baseline methods in both controlled and real scenarios.
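The closed-loop idea can be sketched in a few lines. The toy example below (hypothetical names such as `ToySimulator` and `closed_loop_run`, not this repository's actual API) shows the propose–execute–observe–update cycle that the real pipeline performs with LLM proposals and visual feedback:

```python
class ToySimulator:
    """Stands in for a real appliance: pressing 'power' toggles it."""
    def __init__(self):
        self.display = {"power": "off"}

    def press(self, button):
        if button == "power":
            self.display["power"] = "on" if self.display["power"] == "off" else "off"
        return dict(self.display)  # the "visual feedback" after acting


def closed_loop_run(state, sim, goal, max_steps=5):
    """Act until the tracked state satisfies the goal, trusting observed
    feedback over the model's own prediction after every action."""
    for _ in range(max_steps):
        if all(state.get(k) == v for k, v in goal.items()):
            return True
        # Naive policy: press the button named after a mismatched variable.
        target = next(k for k, v in goal.items() if state.get(k) != v)
        observed = sim.press(target)
        state.update(observed)  # closed-loop model update
    return all(state.get(k) == v for k, v in goal.items())


print(closed_loop_run({"power": "off"}, ToySimulator(), {"power": "on"}))  # True
```

The key design point is the `state.update(observed)` step: the world model is corrected from feedback after every action rather than trusted blindly.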
This directory contains structured code, foundation models, and scripts for appliance control research. It is organized into `foundation_models/`, `real_world/`, and `simulated/` directories.
📁 code/
The heart of the project, containing the logic for visual grounding, reasoning, and simulation.
- 📄 `gpt_4o_model.py` – Used to call GPT
- 📄 `owlv2_crane5_api.py` – Starts a microservice to recognize appliance panel buttons and dials
- 📄 `owlv2_crane5_query.py` – Once the API is running, functions in this script can be used to call OWLv2
- 📄 `__init__.py`
- 📄 `_0_t2a_config.py` – Specifies the root code path
- 📄 `_1_pdf_to_text.py` – Converts PDF user manuals to text
- 📄 `_2_extract_control_panel_element_names_from_user_manual.py` – Extracts button/dial name lists from manuals
- 📄 `_3_detect_bbox_from_photos.py` – Detects bounding boxes from appliance panel images
- 📄 `_4_map_control_panel_element_names_to_bbox_indexes.py` – Maps button/dial names to candidate bounding box indices
- 📄 `_5_json_map_control_panel_element_names_to_bbox_indexes.py` – Formats the previous mapping results into JSON
- 📄 `_6_remove_duplicate_bboxes.py` – Ensures a one-to-one mapping between buttons/dials and bounding box indices
- 📄 `_7_json_map_control_panel_element_names_to_bbox_indexes.py` – (Appears redundant; formats mapping results into JSON again)
- 📄 `_8_visualise_grounding_control_element_name_result.py` – Generates visualization images showing all buttons/dials as labeled bounding boxes
- 📄 `_10_propose_action_names.py` – Proposes actions based on user manuals
- 📄 `_11_generated_grounded_action_json.py` – Maps proposed actions to bounding box coordinates
- 📄 `_12_match_proposed_action_to_oracle_action.py` – Checks whether proposed actions match oracle execution regions using bounding box coordinates
- 📄 `_13_visualisation_grounding_action_result.py` – Generates visualization images displaying grounded actions as labeled bounding boxes
- 📄 `_18_generate_testcases.py` – (Deprecated) Generates ambiguous instructions that can be satisfied by one or multiple policies
- 📄 `_18_generate_testcases_v2.py` – Generates instructions requiring different step sizes to achieve goals
- 📄 `_19_generate_target_state.py` – For each test instruction, generates the target state in the appliance simulator
- 📄 `_0_logic_units.py` – Appliance model templates; combines the contents of variables, features, and actions
- 📄 `_0_sample_machine.py` – Example appliance models
- 📄 `_0_search.py` – Macro-action logic
- 📄 `_1_variables.py` – Appliance variable templates
- 📄 `_2_features.py` – Appliance feature templates, used to specify the macro actions
- 📄 `_3_actions.py` – Appliance dynamics templates, used to specify action effects
- 📄 Various prompt text files – Used by the Python files in the `simulated/` folder
- 📄 `paper_exp/baselines_v1/` – Code for the various baselines in the paper; `_4_HV_M_SR_MA_agnostic` refers to the ApBot algorithm
- 📄 `_17_testcase/` – Prompt used to generate instructions that are ambiguous or specific
- 📄 `_17_testcase_v2/` – Prompt used to generate instructions that require different numbers of steps to finish
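The template files above (`_1_variables.py`, `_2_features.py`, `_3_actions.py`) compose a symbolic appliance model from variables, features, and action dynamics. A minimal sketch of that decomposition, with hypothetical class names rather than the repository's actual templates:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Variable:
    """An appliance state variable with its legal values (cf. _1_variables.py)."""
    name: str
    values: List[str]
    current: str

@dataclass
class Action:
    """A primitive control action and its effect on one variable (cf. _3_actions.py)."""
    name: str
    variable: str
    effect: Callable[[str, List[str]], str]

def cycle(current, values):
    """Common button/dial dynamic: advance to the next value, wrapping around."""
    return values[(values.index(current) + 1) % len(values)]

# Example: a mode dial with three settings, advanced by one button press.
mode = Variable("mode", ["off", "low", "high"], "off")
press_mode = Action("press_mode_button", "mode", cycle)

mode.current = press_mode.effect(mode.current, mode.values)
print(mode.current)  # low
```

Features (macro actions) then sequence such primitive actions to reach a named setting, which is what `_0_search.py` plans over.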
📁 baselines_v1/
- Code for running the paper baselines; ApBot corresponds to `_4_M_SR_MA_agnostic`

📁 real_world/
- Code for running real-world experiments
- Helper functions to load files
📁 task/
- 📄 `calibrate_current_value.py` – During closed-loop execution, if the display shows an unexpected value, calibrates the appliance model according to predefined routines and templates
- 📄 `check_reasoning_file_existance.py` – Checks file existence
- 📄 `check_state_reached.py` – Checks whether the appliance state has reached the one required by an instruction
- 📄 `compare_visual_grounding.py` – Compares the visual grounding performance of ApBot and Molmo
- 📄 `derive_variable_definition.py` – Helper function to update a variable in the appliance model when there is a feedback mismatch
- 📄 `generate_report.py` – After baseline results are out, generates instruction execution results
- 📄 `generate_updated_goal.py` – Updates the ApBot appliance model's target state
- 📄 `interact_with_simulator.py` – Used by ApBot to interact with appliance simulators
- 📄 `mathces_run_action_format.py` – Checks the command formats ApBot uses to interact with appliance simulators
- 📄 `prepare_simulator.py` – Loads appliance simulators
- 📄 `propose_actions.py` – During closed-loop execution, ApBot proposes a suitable action given the current appliance state
- 📄 `propose_feature_list.py` – Proposes macro actions of appliance models
- 📄 `propose_goal_state.py` – Proposes the appliance model's target state
- 📄 `propose_variables.py` – Proposes appliance model variables
- 📄 `propose_world_model_agnostic_to_command.py` – Proposes the appliance model's action dynamics
📁 foundation_models/
Contains foundation model integrations: GPT, OWL, and SAM2.
📁 simulated/
Simulation-related scripts for processing user manuals, grounding control panel elements, and generating test cases.
📁 samples/
Used as example appliance models.
📁 prompts/
Stores all the prompts used.
📁 paper_exp/
Scripts and outputs for reproducing the experiments and baselines reported in the paper.
📁 utils/
Helper functions for reasoning, calibration, and simulator interaction.
📁 real_world/
Scripts for real-world execution. Its structure combines the contents of the dataset folder and the simulated code folder.
This repository contains structured datasets, simulators, and baseline experiment results for appliance control research. It is organized into simulated and real-world datasets.
The simulated dataset includes data for the following appliances, with 5 instances each:
- Dehumidifier
- Bottle Washer
- Rice Cooker
- Microwave Oven
- Bread Maker
- Washing Machine
📁 simulated/
The root directory for all simulated appliance data.
- 📁 0_pdf/ – Raw PDF manuals, manually added
- 📁 1_image/ – Image versions of manuals, generated via code
- 📁 2_text/ – Extracted text from PDFs, generated via code
- 📁 3_extracted_control_panel_element_names/ – Control panel element names extracted from manuals, generated via code
- 📁 0_raw/ – Images copied from websites, manually added
- 📁 1_selected/ – Final selected single image, manually added
- 📁 2_ground_control_panel_elements/
- 📁 1_validity_control_panel/ – One bounding box per image circling a button or dial
- 📁 2_bboxes_on_control_panel/ – JSON files of button/dial bounding boxes
- 📁 3_bboxes_on_control_panel_visualisation/ – Visual summary of all bounding boxes
- 📁 4_query_images_bbox_to_name/ – Red box for query, green boxes as references
- 📁 5_query_images_unique_bbox_id/ – Indexed candidate boxes per button/dial
- 📁 0_control_panel_images_groundtruth_annotation/ – COCO annotations (manual)
- 📁 1_oracle_available_actions/ – Oracle action lists (manual)
- 📁 2_map_oracle_action_to_annotation_label/ – Dict: action names → labels + types (manual)
- 📁 3_oracle_simulator_action_to_bbox_mapping/ – Auto: action names → bbox coords
- 📁 4_simulators_with_states_and_display/ – Sim: action → text display (manual)
- 📁 5_testcases/
- 📁 1_testcases_var_raw/ – Generated task instructions
- 📁 2_testcases_var/ – Selected instructions (manual)
- 📁 3_testcases_var_with_target_states/ – Target states (generated)
- 📁 4_testcases_var_formatted/ – Generated + manually corrected
- 📁 0_control_panel_element_bbox/
- 📁 0_control_panel_element_index/ – Bbox index → element names
- 📁 1_control_panel_element_index_json/ – Element names → bbox indices
- 📁 2_control_panel_element_index_json_unique/ – Unique name → bbox index
- 📁 3_proposed_control_panel_element_bbox/ – Element name → coordinates
- 📁 4_visualised_proposed_control_panel_element_bbox/ – Indexed bbox visualization
- 📁 5_visualised_localised_buttons_no_label/ – Center-indexed bbox visualizations
- 📁 6_visualised_localised_buttons_legends/ – Bbox index → coordinates
- 📁 7_proposed_buttons_to_oracle_action_mapping/ – Proposed bbox → oracle actions
- 📁 9_extracted_control_panel_labels/ – Element names from manual
- 📁 1_action_names/
- 📁 1_proposed_action_names/ – Proposed actions
- 📁 2_proposed_world_model_action_bbox/ – Action name → bbox
- 📁 3_proposed_to_oracle_action_mapping/ – Action name → oracle mapping (ApBot)
- 📁 4_molmo_proposed_action_coords/ – Action name → image coords
- 📁 5_molmo_proposed_actions_visualisation/ – Visualized action positions
- 📁 6_molmo_proposed_to_oracle_action_mapping/ – Action name → oracle mapping (Molmo)
- 📁 1_visual_grounding_action_results/ – Metrics for action visual grounding
- 📁 3_visualize_proposed_actions/ – Visual results of grounded actions
- 📁 6_paper_exp/
- Baseline experiment results
- 📁 11_visualisation/ – Baseline output visualizations
📁 0_websites/
Appliance panel images and user manuals (source websites).
📁 1_user_manual/
Contains different forms of user manuals (PDF, image, text, and parsed elements).
📁 2_control_panel_images/
Control panel images and grounded elements.
📁 3_simulators/
Contains simulators and their associated assets.
📁 4_visual_grounding/
Visual grounding data for control panel elements and action names.
📁 6_results/
Experiment results, including visual grounding metrics, action visualizations, and baseline outputs.

📁 real_world/
The real-world dataset includes data for the following appliances, with 1 instance each:
- Washing Machine
- Rice Cooker
- Blender
- Water Dispenser
- Induction Cooker
The internal folder structure is identical to the simulated dataset.
The baseline experiments evaluate different combinations of perception, appliance model, reasoning, and policy.
| ID | Name | Visual Grounding (Perception) | Access to User Manual | Reasoning | Action Policy | Name in Paper |
|---|---|---|---|---|---|---|
| 1 | NV_M_UR_LP | Oracle | Yes | Unstructured | LLM | LLM as policy w/ grounded actions |
| 2 | HV_M_UR_LP | Proposed | Yes | Unstructured | LLM | LLM as policy w/ image |
| 4 | M_SR_MA | Proposed | Yes | Structured | Macro-actions | ApBot |
| 5 | HV_M_SR_LP | Proposed | Yes | Structured | LLM | ApBot w/o button policy |
| 7 | HV_M_UR_MA | Proposed | Yes | Unstructured | Macro-actions | ApBot w/o model |
| 8 | HV_M_SR_MA_OL | Proposed | Yes | Structured | Macro-actions | ApBot w/o closed-loop update |
| 9 | oracle_V_oracle_M | Oracle | Yes | Structured & Oracle | Macro-actions | N/A |
| 10 | oracle_V_proposed_M | Oracle | Yes | Structured | Macro-actions | N/A |
- Visual Grounding Metrics: Stored in `dataset/simulated/6_results/1_visual_grounding_action_results/`
- Action Visualizations: Stored in `dataset/simulated/6_results/3_visualize_proposed_actions/`
- Baseline Experiment Outputs: Stored in `dataset/simulated/6_results/6_paper_exp/`
- Comparative Visualisations: Stored in `dataset/simulated/6_results/6_paper_exp/11_visualisation/`
To understand and run the full ApBot pipeline, start from the code/simulated/ directory. The pipeline includes reading PDFs, visual grounding, building symbolic models, and testing the model. Below is the step-by-step process:
Convert appliance manuals from PDF to structured elements.
| Step | Description | Script |
|---|---|---|
| 1.1 | Convert PDF manuals into raw text | _1_pdf_to_text.py |
| 1.2 | Extract control panel element names from text | _2_extract_control_panel_element_names_from_user_manual.py |
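Step 1.2 is driven by LLM prompts in the repository; purely to illustrate its input/output shape, a rule-based stand-in might look like this (the regex and function name are hypothetical):

```python
import re

def extract_element_names(manual_text):
    """Toy stand-in for step 1.2: collect names that appear in phrases
    like 'press the Start button' or 'turn the Temperature dial'.
    (The repository does this with an LLM prompt; this only mimics the
    shape of the result: a deduplicated list of element names.)"""
    pattern = r"(?:press|turn|hold)\s+the\s+([A-Z][\w/ ]*?)\s+(?:button|dial|knob)"
    names = re.findall(pattern, manual_text, flags=re.IGNORECASE)
    return sorted(set(names))

text = ("To begin, press the Start button. Turn the Temperature dial "
        "to adjust heat, then press the Start button again.")
print(extract_element_names(text))  # ['Start', 'Temperature']
```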
Use control panel images and align them with textual instructions.
| Step | Description | Script |
|---|---|---|
| 2.1 | Detect bounding boxes from panel images | _3_detect_bbox_from_photos.py |
| 2.2 | Map detected boxes to control element names | _4_map_control_panel_element_names_to_bbox_indexes.py |
| 2.3 | Format mappings into structured JSON | _5_json_map_control_panel_element_names_to_bbox_indexes.py, _7_json_map_control_panel_element_names_to_bbox_indexes.py |
| 2.4 | Remove duplicate or conflicting mappings | _6_remove_duplicate_bboxes.py |
| 2.5 | Visualize grounding results | _8_visualise_grounding_control_element_name_result.py |
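Step 2.4 enforces a one-to-one mapping between element names and bounding boxes. A greedy, hypothetical simplification of that de-duplication logic (not the actual `_6_remove_duplicate_bboxes.py`):

```python
def one_to_one_mapping(candidates):
    """Each element name starts with a ranked list of candidate box
    indices; assign every name its best still-unclaimed box so the
    final mapping is one-to-one."""
    assigned, taken = {}, set()
    for name, boxes in candidates.items():
        for idx in boxes:
            if idx not in taken:
                assigned[name] = idx
                taken.add(idx)
                break
    return assigned

# 'Start' and 'Stop' both rank box 3 first; greedy assignment resolves
# the conflict by giving 'Stop' its next-best box.
candidates = {"Start": [3, 1], "Stop": [3, 2], "Timer": [2, 5]}
print(one_to_one_mapping(candidates))  # {'Start': 3, 'Stop': 2, 'Timer': 5}
```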
Generate appliance variables, features, and macro-actions.
| Step | Description | Script |
|---|---|---|
| 3.1 | Propose actions from manuals | _10_propose_action_names.py |
| 3.2 | Map actions to bounding boxes | _11_generated_grounded_action_json.py |
| 3.3 | Match proposed actions to oracle ground-truth | _12_match_proposed_action_to_oracle_action.py |
| 3.4 | Visualize grounded actions | _13_visualisation_grounding_action_result.py |
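Step 3.3 judges a proposed action correct when its bounding box overlaps the oracle execution region. A minimal sketch using an illustrative IoU threshold (the actual matching criterion in `_12_match_proposed_action_to_oracle_action.py` may differ):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def matches_oracle(proposed_box, oracle_box, threshold=0.5):
    """A proposed action counts as correct when its box overlaps the
    oracle execution region enough; the 0.5 threshold is illustrative."""
    return iou(proposed_box, oracle_box) >= threshold

print(matches_oracle((0, 0, 10, 10), (1, 1, 11, 11)))  # True (IoU ~ 0.68)
```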
Generate test instructions and target appliance states.
| Step | Description | Script |
|---|---|---|
| 4.1 | Generate ambiguous or multistep instructions | _18_generate_testcases_v2.py |
| 4.2 | Generate target states for each test | _19_generate_target_state.py |
Run the main ApBot experiment using the generated appliance model and test instructions.
| Step | Description | Script |
|---|---|---|
| 5.1 | Main file for symbolic reasoning and macro-action execution (ApBot pipeline) | paper_exp/baselines_v1/_4_M_SR_MA_agnostic.py |
Located in `code/utils/task/`, these helper scripts support reasoning, calibration, and simulator interaction during closed-loop execution. For example:

- `interact_with_simulator.py` – ApBot's interface to the simulator
- `generate_updated_goal.py` – Update goals dynamically based on feedback
- `propose_goal_state.py` – Generate target states
- `check_state_reached.py` – Check whether the state matches the goal
- `calibrate_current_value.py` – Adjust the model if observed feedback is off
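For instance, the goal check performed by `check_state_reached.py` amounts to comparing observed variables against the instruction's target values; a toy, simplified stand-in:

```python
def check_state_reached(observed, goal):
    """The goal is met when every variable the instruction cares about
    has its target value; variables the goal omits are ignored."""
    return all(observed.get(k) == v for k, v in goal.items())

observed = {"power": "on", "mode": "defrost", "timer": "5:00"}
print(check_state_reached(observed, {"mode": "defrost", "timer": "5:00"}))  # True
print(check_state_reached(observed, {"mode": "popcorn"}))                   # False
```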