An agentic framework for autonomous scene understanding. This project leverages Large Language Models (LLMs) to guide an agent through an interactive, simulated environment (ALFWorld), gathering clues to infer the inhabitant's occupation via Bayesian-style belief updates.
Traditional SLAM focuses on geometry. This project explores semantic exploration: using pre-trained LLMs to reason about an environment, form hypotheses, and dynamically generate commands to gather information.
The agent operates in the ALFWorld benchmark (AI2-THOR based), exploring a kitchen environment to identify one of four possible inhabitant profiles (Professor, Assassin, Student, Billionaire). It utilizes DSPy for structured LM interactions, employing Chain-of-Thought (CoT) reasoning to drive exploration and maintain a probabilistic belief state.
- LLM-Driven Navigation: No hardcoded heuristics. The agent generates admissible ALFWorld commands (`go to`, `examine`, `open`) based on context and current beliefs.
- Structured Belief Updates: Maintains and updates a probability distribution over possible occupations after every observation.
- Adaptive Exploration: The agent decides what to examine next based on what it has already learned, aiming to maximize information gain.
- Custom Clue Injection: Intercepts simulation calls to inject occupation-specific object descriptions (generated via separate LLMs), testing the agent's ability to ground textual clues in decision-making.
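The clue-injection step described above can be sketched as a thin wrapper around the simulator's observation. This is a minimal illustration, not the project's actual implementation: the `CLUES` table and `inject_clue` helper are hypothetical stand-ins for the dictionary loaded from `object_attributes3.json` and the interception logic.

```python
# Hypothetical clue table; the real project loads object_attributes3.json.
CLUES = {
    "Student": {"textbook": "A worn, coffee-stained textbook."},
    "Professor": {"quill": "A pristine, antique quill."},
}

def inject_clue(observation: str, command: str, true_occupation: str) -> str:
    """If the agent examined an object that carries an occupation-specific
    clue, append that clue to the raw simulator observation."""
    if not command.startswith("examine "):
        return observation
    obj = command[len("examine "):].strip()
    clue = CLUES.get(true_occupation, {}).get(obj)
    return f"{observation} {clue}" if clue else observation
```

The agent never sees the `true_occupation` directly; it only receives the augmented observation text and must ground the clue in its belief update.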
- Frameworks: DSPy (LM programming), ALFWorld (Embodied simulation).
- Models Tested: Gemini 2.0 Flash Lite (API), Gemma 3 4B/12B (Local via Ollama).
- Environment: Python 3.x, WSL 2 (Ubuntu).
The agent loop consists of three primary modules managed via DSPy signatures:
- ChooseNextCommand: Analyzes the interaction history and current belief state to select the next optimal action from the environment's valid action space.
- Execute: Runs the command in ALFWorld. If the command is `examine`, the system retrieves context-specific descriptive clues (e.g., "A worn, coffee-stained textbook" for a Student vs. "A pristine, antique quill" for a Professor).
- UpdateBeliefs: Re-evaluates the probability distribution of occupations based on the new observation using CoT reasoning.
Termination: The episode ends when confidence in a single occupation exceeds a threshold (0.8) or the step limit is reached.
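The belief-update step can be summarized as a Bayesian-style renormalization: the LLM's CoT judgment supplies per-occupation likelihoods for the latest observation, which scale the prior and are renormalized. The sketch below is illustrative only; `update_beliefs` and its signature are assumptions, not the project's actual code.

```python
def update_beliefs(beliefs: dict, likelihoods: dict) -> dict:
    """Bayesian-style update: multiply the prior over occupations by the
    per-occupation likelihoods (here assumed to come from the LLM's CoT
    assessment of the new observation) and renormalize to sum to 1."""
    posterior = {occ: p * likelihoods.get(occ, 1.0) for occ, p in beliefs.items()}
    total = sum(posterior.values())
    return {occ: p / total for occ, p in posterior.items()}

# Uniform prior over the four profiles; an observation that is three times
# as likely under "Student" shifts mass toward that hypothesis.
prior = {o: 0.25 for o in ("Professor", "Assassin", "Student", "Billionaire")}
posterior = update_beliefs(prior, {"Student": 3.0})
```

The episode then terminates as soon as `max(posterior.values())` exceeds the confidence threshold (0.8) or the step limit is hit.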
- Python 3.x
- Ollama (for local inference with smaller language models)
- An ALFWorld-compatible environment (instructions below assume a standard setup).
- Clone the repository:
git clone https://github.com/tancredelg/scene-investigation-agent.git
cd scene-investigation-agent
- Install dependencies:
pip install -r requirements.txt
- (Optional) Pull local models if using Ollama:
ollama pull gemma3:4b-it-qat
All experimental parameters are configured in main.py:
- Mode: Select between single-episode debugging or batch evaluation.
- Model: Switch between local (Ollama) and API (Gemini) backends.
- Parameters: Adjust `confidence_threshold`, `context_length`, and `use_cot`.
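For orientation, the experiment settings might look roughly like the dictionary below. This is a hypothetical sketch of the shape of the configuration; the actual variable names and values live in `main.py` and may differ.

```python
# Hypothetical configuration shape; see main.py for the real settings.
CONFIG = {
    "mode": "batch",                      # or "single" for one debug episode
    "model": "ollama/gemma3:4b-it-qat",   # local backend, or a Gemini model id
    "confidence_threshold": 0.8,          # belief level that ends the episode
    "context_length": "full",             # how much interaction history the LM sees
    "use_cot": True,                      # enable Chain-of-Thought reasoning
}
```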
To run the agent:
python main.py <path_to_alfworld_config.yaml>

Experiments compared local models (Gemma 3 4B) vs. larger API models (Gemini 2.0 Flash Lite) across different context lengths and reasoning modes.
- Best Performance: Gemini 2.0 Flash Lite with Chain-of-Thought and full history context achieved an 82.5% success rate across all occupations.
- Findings: Full interaction history was crucial for preventing loops. CoT significantly improved the agent's ability to correlate subtle object descriptions with specific occupations.
For detailed analysis, ablation studies, and more, please refer to the Project Report.
- agents.py: Core DSPy agent logic and signatures.
- main.py: Main loop, environment initialization, and experiment runner.
- utils.py: Helpers for environment restriction and metadata handling.
- object_attributes3.json: The dictionary of occupation-specific clues injected during simulation.
- output/: Logs of agent reasoning traces and episode results.