Code for Learning Game-Playing Agents with Generative Code Optimization (ICML 2025 PRAL Workshop). We use Trace LLM optimizers (OptoPrime) to optimize Python policies that play Atari games via object-centric representations (OC_Atari). This repo provides a framework that lets LLMs play Atari games through annotated text interfaces (no image or video input required).
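To give a flavor of the annotated text interface, here is a minimal sketch of turning an object-centric game state into text an LLM policy could read instead of pixels. Function and object names are illustrative assumptions, not the repo's actual API:

```python
# Hypothetical sketch: render object-centric state (as OC_Atari-style
# (name, x, y) tuples) into one annotated text line per object.
def annotate_state(objects):
    """Format a list of (name, x, y) game objects as text, one per line."""
    return "\n".join(f"{name}: x={x}, y={y}" for name, x, y in objects)

state = [("Player", 140, 180), ("Ball", 80, 120), ("Enemy", 16, 60)]
print(annotate_state(state))
# → Player: x=140, y=180
#   Ball: x=80, y=120
#   Enemy: x=16, y=60
```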
Paper that includes 3 initial games (Pong, Breakout, Space Invaders): https://openreview.net/forum?id=ZM65X3NoTd
Paper that includes 8 games: http://arxiv.org/abs/2603.23994
Asterix, Breakout, Enduro, Freeway, Pong, Q*bert, Seaquest, Space Invaders
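To illustrate the kind of Python policy the LLM optimizer rewrites, here is a minimal hand-written Pong-style policy over object positions. The action encoding and parameter names are hypothetical, not taken from the repo:

```python
# Hypothetical sketch: a simple Pong-style policy over object-centric state.
# Action encoding (0 = NOOP, 2 = UP, 3 = DOWN) is illustrative only.
def pong_policy(player_y, ball_y, deadzone=4):
    """Move the paddle toward the ball, with a deadzone to avoid jitter."""
    if ball_y < player_y - deadzone:
        return 2  # UP
    if ball_y > player_y + deadzone:
        return 3  # DOWN
    return 0      # NOOP

print(pong_policy(player_y=100, ball_y=50))  # ball above paddle → 2
```

In the paper's setup, the optimizer mutates the body of such a policy based on textual feedback (e.g., episode returns) rather than gradients.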
We compare LLM-optimized policies against deep RL baselines that also use object-centric representations. Our CleanRL fork includes:
We share the training logs on Wandb:
```bash
bash install.sh
```

This will:
- Install `uv` if not already present
- Clone the OC_Atari library into `external/OC_Atari/`
- Install all Python dependencies via `uv sync`
Follow Trace's LLM API setup to use OptoPrime, the supported optimizer.
Each game has a corresponding training script. Run with:
```bash
uv run python <game>_training.py
```

For example:

```bash
uv run python asterix_training.py
uv run python breakout_training.py
uv run python pong_training.py
```

```
├── *_training.py          # Training scripts (one per game)
├── trace_envs/            # Traced environment wrappers (one per game)
├── training_utils.py      # Shared training utilities
├── logging_util.py        # Logging configuration
├── plotting_game_perf.py  # Performance visualization
├── install.sh             # Setup script
├── pyproject.toml         # Dependencies (managed by uv)
├── external/OC_Atari/     # Object-centric Atari library
├── logs/                  # Training logs
└── trace_ckpt/            # Optimizer checkpoints
```