Open-source VendingBench Recreation

A open-source recreation of the VendingBench environment for testing AI agent long-term coherence through autonomous vending machine management.

Project Overview

This project recreates the benchmark environment described in the paper "Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents". The benchmark tests AI agents' ability to maintain coherent decision-making over extended periods (20M+ tokens) by managing a simulated vending machine business. I did not come up with this concept all credit to the authors of that paper.

THIS IS STILL IN PROGRESS

Implemented Features

Core Simulation Loop - Time progression, daily cycles, weather simulation
AI Agent System - LLM-based agent with conversation history and context management
Email System - Full inbox/outbox with recipient profiles and AI-generated supplier responses
Search Integration - Perplexity API for real-world supplier and product research
Economic Environment - Customer behavior modeling with price elasticity
Vending Machine - Physical machine simulation with inventory slots
Database Logging - State tracking and analytics
Inventory Management - Orders, storage, vending machine storage

To finish

Lots of the tools & sub-agent - Scratchpad, key-value store, vector database, there are many tools not ready yet you can find in the paper
Stocking the vending machine - Not currently stocking & selling items
Analytics harness - Measuring the outcomes of the simulation

Ideas for future

Plug-and-play Make a frictionless way to pick a model/api & test it even if its new
Extend out functionality Add tooling for the agent, possibly allow it to code its own tooling
Posttraining tests Doing posttraining optimizing for net worth/sales to test for misaligned behaviors

How to run

Prerequisites

Python 3.8+
API Keys for:
- Anthropic Claude (for the AI agent)
- Perplexity (for web search capabilities)
- Eventually more models, and you can just add the keys for the models you want to test

Installation & Setup

Clone the repository

git clone <repository-url>
cd vending-bench-recreation

Set up environment

# Use the provided script for easy setup
./run_simulation.sh

Or manually:

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set up API keys in .env file
cp .env.example .env
# Edit .env with your API keys

Configure API Keys

Create a .env file with:

ANTHROPIC_API_KEY=your_claude_api_key_here
PERPLEXITY_API_KEY=your_perplexity_api_key_here

Run the simulation
```
./run_simulation
```
Change max_messages to modify

How It Works

Look at the paper for a more detailed breakdown of the agent behavior

Agent Initialization - AI agent starts with $500 and basic business knowledge
Daily Cycles - Each day at 6:00 AM, agent receives:
- Weather updates
- Sales reports
- New supplier emails
- Business status summary
Agent Actions - Agent can:
- Send emails to suppliers for product orders
- Search for supplier information and pricing
- Read incoming supplier responses
- Advance time to next day
Simulation - Sim automatically handles:
- Customer purchasing behavior (weather/season dependent)
- Supplier email responses (AI-generated with real market data)
- Financial accounting and daily fees

🔧 Development

Key Configuration

Simulation duration:
Agent model: Change model_type in agent configuration
Starting conditions: Adjust STARTING_BALANCE and DAILY_FEE
Business address: Update delivery address in agent.py system prompt

Monitoring & Analytics

Right now store_state flag is False but if set to true in main_simulation.py it will log all activities to vending_simulation.db:

Agent conversations and tool usage
Financial transactions and balance changes
Email communications and supplier relationships
Inventory levels and sales patterns
Time progression and environmental factors

Contributing - not yet but once it reaches a stable state that represents a proper v1 I would appreciate contribution

📄 License

[License information to be added]

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
.claude		.claude
__pycache__		__pycache__
.env.example		.env.example
.gitignore		.gitignore
Planning.md		Planning.md
README.md		README.md
agent.py		agent.py
database.py		database.py
economic_environment.py		economic_environment.py
email_system.py		email_system.py
main_simulation.py		main_simulation.py
model_client.py		model_client.py
requirements.txt		requirements.txt
run_simulation.sh		run_simulation.sh
search.py		search.py
storage.py		storage.py
tools.py		tools.py
vending_machine.py		vending_machine.py
vending_simulation.db		vending_simulation.db
weather.py		weather.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Open-source VendingBench Recreation

Project Overview

Implemented Features

To finish

Ideas for future

How to run

Prerequisites

Installation & Setup

How It Works

🔧 Development

Key Configuration

Monitoring & Analytics

Contributing - not yet but once it reaches a stable state that represents a proper v1 I would appreciate contribution

📄 License

🔗 References

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Open-source VendingBench Recreation

Project Overview

Implemented Features

To finish

Ideas for future

How to run

Prerequisites

Installation & Setup

How It Works

🔧 Development

Key Configuration

Monitoring & Analytics

Contributing - not yet but once it reaches a stable state that represents a proper v1 I would appreciate contribution

📄 License

🔗 References

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages