GitHub - cavaunpeu/frozen-lake-rl: Hand-rolled classical RL algorithms on the Frozen Lake Gym environment

Frozen Lake RL

This repository contains a small collection of planning and reinforcement-learning-style algorithms implemented on top of the classic FrozenLake-v1 environment from Gymnasium.

The focus is to provide clear, self-contained reference implementations of:

Policy iteration
Value iteration (synchronous and asynchronous)
Finite-horizon forward search
Branch and bound forward search
Monte Carlo Tree Search (MCTS)

All algorithms operate on the tabular 4x4 Frozen Lake MDP using the transition model exposed by env.unwrapped.P.
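To make the shape of that transition model concrete, here is a minimal sketch that builds a model in the same layout Gymnasium uses for FrozenLake: P[state][action] is a list of (probability, next_state, reward, terminated) tuples. The 2x2 grid and the build_deterministic_model helper are hypothetical, written for illustration only; they are not part of this repository.

```python
# Action encoding used by FrozenLake: 0=LEFT, 1=DOWN, 2=RIGHT, 3=UP.
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3
MOVES = {LEFT: (0, -1), DOWN: (1, 0), RIGHT: (0, 1), UP: (-1, 0)}


def build_deterministic_model(grid):
    """Return P[state][action] = [(prob, next_state, reward, terminated)].

    Hypothetical helper: mimics the layout of env.unwrapped.P for a
    non-slippery map, with states numbered row by row.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    P = {}
    for r in range(n_rows):
        for c in range(n_cols):
            s = r * n_cols + c
            P[s] = {}
            for a, (dr, dc) in MOVES.items():
                if grid[r][c] in "HG":
                    # Holes and the goal are absorbing: stay put, no reward.
                    P[s][a] = [(1.0, s, 0.0, True)]
                    continue
                # Moves that would leave the grid are clamped to the edge.
                nr = min(max(r + dr, 0), n_rows - 1)
                nc = min(max(c + dc, 0), n_cols - 1)
                tile = grid[nr][nc]
                reward = float(tile == "G")  # reward 1 only on reaching the goal
                P[s][a] = [(1.0, nr * n_cols + nc, reward, tile in "HG")]
    return P


# Illustrative 2x2 map: S=start, F=frozen, H=hole, G=goal.
GRID = ["SF", "HG"]
P = build_deterministic_model(GRID)

for prob, next_state, reward, terminated in P[1][DOWN]:
    print(prob, next_state, reward, terminated)
```

With a real environment, the same loop should work over env.unwrapped.P[state][action]; with is_slippery=True each list would typically hold several outcomes rather than one.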

Project structure

run.py: Command-line entry point. Creates the FrozenLake environment and runs a selected algorithm to build a policy, with an optional visualization of a single rollout.
algorithms/base.py: Base class used by all algorithms, plus a shared one_step_lookahead utility.
algorithms/policy_iteration.py: Classic policy evaluation + policy improvement loop.
algorithms/value_iteration.py:
- Core value-iteration routine.
- ValueIterationAlgorithm, SynchronousValueIterationAlgorithm, and AsynchronousValueIterationAlgorithm wrappers that expose a build_policy() method.
algorithms/forward_search.py: Simple depth-limited expectimax-style forward search.
algorithms/branch_and_bound.py: Forward search with a value-iteration-based upper bound for pruning.
algorithms/mcts.py: A basic Monte Carlo Tree Search implementation for Frozen Lake, maintaining visit counts and action-value estimates over states.
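The shared one_step_lookahead utility mentioned above is not reproduced here. As a rough sketch of what such a helper usually computes, the function below returns the expected return of each action from a state, given a transition model in the P[state][action] layout and a current value estimate. The signature, the gamma default, and the toy model are assumptions for illustration, not the repository's actual code.

```python
def one_step_lookahead(P, state, values, gamma=0.9):
    """Return {action: expected return} for one step from `state`.

    Assumed signature; P[state][action] is a list of
    (prob, next_state, reward, terminated) tuples.
    """
    action_values = {}
    for action, transitions in P[state].items():
        action_values[action] = sum(
            # No bootstrapping from the next state once the episode has ended.
            prob * (reward + gamma * values[next_state] * (not terminated))
            for prob, next_state, reward, terminated in transitions
        )
    return action_values


# Toy model: action 0 stays in state 0; action 1 reaches the terminal
# state 1 (reward 1) with probability 0.8 and stays put otherwise.
P = {
    0: {
        0: [(1.0, 0, 0.0, False)],
        1: [(0.8, 1, 1.0, True), (0.2, 0, 0.0, False)],
    },
}
values = {0: 0.5, 1: 0.0}

action_values = one_step_lookahead(P, 0, values)
greedy_action = max(action_values, key=action_values.get)
print(greedy_action)  # → 1
```

Policy improvement and value iteration can both be expressed in terms of this quantity: the former takes the argmax over actions, the latter the max.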

All algorithms implement a common interface:

Constructor: Algorithm(env, ...)
Policy builder: policy = algorithm.build_policy()
Policy: a callable that takes a state index and returns an action index.
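The interface described above can be sketched as follows. The BaseAlgorithm here is a stand-in with an assumed shape (the real one lives in algorithms/base.py), and AlwaysRight is a hypothetical toy algorithm used only to show the calling convention.

```python
from typing import Callable

# A policy maps a state index to an action index.
Policy = Callable[[int], int]


class BaseAlgorithm:
    """Stand-in for the repository's base class (assumed shape)."""

    def __init__(self, env):
        self.env = env

    def build_policy(self) -> Policy:
        raise NotImplementedError


class AlwaysRight(BaseAlgorithm):
    """Hypothetical toy algorithm: ignores the environment and moves right."""

    def build_policy(self) -> Policy:
        def policy(state: int) -> int:
            return 2  # RIGHT in FrozenLake's action encoding

        return policy


algorithm = AlwaysRight(env=None)  # a real run would pass the Gymnasium env
policy = algorithm.build_policy()
action = policy(0)
```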

Installation (with uv)

This project targets Python 3.10+ and uses uv for dependency management and virtual environments.

Clone the repo

git clone <your-fork-or-origin-url> frozen-lake-rl
cd frozen-lake-rl

Install uv (if you don’t already have it)

On macOS / Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

Then restart your shell (or source the profile changes) so that uv is on your PATH, and verify:

uv --version

Let uv handle the environment and dependencies

You don’t need to create a virtualenv or run pip manually. The first time you run any uv run ... command, uv will:

Create an isolated environment for the project.
Install the dependencies declared in pyproject.toml (including gymnasium[toy-text] and numpy).

Usage

From the project root, you can run everything through uv run:

uv run python run.py --algorithm PolicyIteration --visualize

Available algorithms

The --algorithm flag accepts any of the following values:

PolicyIteration: Classic dynamic-programming policy iteration.
SynchronousValueIteration: Value iteration where all states are updated from the value function of the previous sweep.
AsynchronousValueIteration: Value iteration where updates are written back to the same value function as they are computed.
ForwardSearch: Depth-limited forward search over the tabular transition model.
BranchAndBound: Forward search that uses a value-iteration-based upper bound to prune branches.
MonteCarloTreeSearch: A basic MCTS planner using simulated rollouts and an exploration policy over actions.
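The difference between the synchronous and asynchronous variants is easiest to see on a single sweep. The sketch below uses a hypothetical three-state chain (state 0 is the goal) rather than Frozen Lake, and the function names and gamma value are assumptions; it is not the repository's implementation.

```python
def backup(P, state, V, gamma):
    """Bellman optimality backup: best expected return over actions."""
    return max(
        sum(
            prob * (reward + gamma * V[next_state] * (not terminated))
            for prob, next_state, reward, terminated in transitions
        )
        for transitions in P[state].values()
    )


def synchronous_sweep(P, V, gamma=0.9):
    # Every state is backed up from the values of the previous sweep.
    return [backup(P, s, V, gamma) for s in sorted(P)]


def asynchronous_sweep(P, V, gamma=0.9):
    # Updates are written in place, so later states see earlier updates.
    V = list(V)
    for s in sorted(P):
        V[s] = backup(P, s, V, gamma)
    return V


# Chain 2 -> 1 -> 0. Action 0 moves toward the goal, action 1 stays put.
P = {
    0: {0: [(1.0, 0, 0.0, True)], 1: [(1.0, 0, 0.0, True)]},
    1: {0: [(1.0, 0, 1.0, True)], 1: [(1.0, 1, 0.0, False)]},
    2: {0: [(1.0, 1, 0.0, False)], 1: [(1.0, 2, 0.0, False)]},
}
V0 = [0.0, 0.0, 0.0]

print(synchronous_sweep(P, V0))   # → [0.0, 1.0, 0.0]
print(asynchronous_sweep(P, V0))  # → [0.0, 1.0, 0.9]
```

After one sweep, the in-place variant has already propagated the goal reward to state 2, because state 1 was updated earlier in the same sweep. Both variants reach the same values when iterated to convergence; the in-place one may simply need fewer sweeps, depending on the order in which states are visited.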

Example commands:

# Policy iteration with visualization
uv run python run.py --algorithm PolicyIteration --visualize

# Synchronous value iteration (no visualization)
uv run python run.py --algorithm SynchronousValueIteration

# Monte Carlo Tree Search with visualization
uv run python run.py --algorithm MonteCarloTreeSearch --visualize

Environment configuration

By default, run.py creates a deterministic 4x4 Frozen Lake environment:

map_name="4x4"
is_slippery=False
render_mode="human"

If you want to experiment with other maps or stochastic dynamics, you can edit the gym.make(...) call in run.py and re-run the script.
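As a sketch of what such an edit might look like, the helper below collects the FrozenLake-v1 keyword arguments in one place and passes them to gym.make. The function names frozen_lake_kwargs and make_env are hypothetical and do not exist in run.py; map_name, is_slippery, and render_mode are the standard Gymnasium arguments for this environment.

```python
def frozen_lake_kwargs(map_name="8x8", is_slippery=True, render_mode=None):
    """Collect keyword arguments for gym.make("FrozenLake-v1", ...).

    Hypothetical helper; the defaults here select the larger map with
    stochastic (slippery) dynamics and no rendering window.
    """
    return {
        "map_name": map_name,
        "is_slippery": is_slippery,
        "render_mode": render_mode,
    }


def make_env(**overrides):
    # Imported inside the function so the helper above can be used
    # and inspected without creating an environment.
    import gymnasium as gym

    return gym.make("FrozenLake-v1", **frozen_lake_kwargs(**overrides))


kwargs = frozen_lake_kwargs(map_name="4x4", is_slippery=False, render_mode="human")
print(kwargs)
```

Calling make_env(map_name="8x8", is_slippery=True) would then create the stochastic 8x8 variant. With slippery dynamics, each entry of env.unwrapped.P[state][action] can list more than one possible outcome, so algorithms that assume a single successor per action would need care.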

Extending the code

To add a new algorithm:

Create a new file in algorithms/ (or extend an existing one).
Subclass BaseAlgorithm from algorithms.base and implement build_policy(self) so that it returns a callable policy(state) -> action.
Register your algorithm in ALGORITHMS in run.py:

from algorithms.your_algorithm import YourAlgorithm

ALGORITHMS = {
    # ...
    "YourAlgorithm": YourAlgorithm,
}

After that, you can run it via:

uv run python run.py --algorithm YourAlgorithm --visualize
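For orientation, the sketch below shows one way a registry like ALGORITHMS can be wired to the --algorithm and --visualize flags with argparse. It is an assumption about how such an entry point could look, not a copy of run.py; the YourAlgorithm placeholder and parse_args function are hypothetical.

```python
import argparse


class YourAlgorithm:
    """Placeholder standing in for a real BaseAlgorithm subclass."""

    def __init__(self, env):
        self.env = env

    def build_policy(self):
        return lambda state: 0  # always pick action 0


ALGORITHMS = {
    "YourAlgorithm": YourAlgorithm,
}


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Build a Frozen Lake policy.")
    # Restricting choices to the registry keys rejects unknown names early.
    parser.add_argument("--algorithm", choices=sorted(ALGORITHMS), required=True)
    parser.add_argument("--visualize", action="store_true")
    return parser.parse_args(argv)


args = parse_args(["--algorithm", "YourAlgorithm", "--visualize"])
algorithm = ALGORITHMS[args.algorithm](env=None)  # run.py would pass the real env
policy = algorithm.build_policy()
```

Keeping the registry as a plain dictionary means a new algorithm only needs one import and one entry to become selectable from the command line.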

Requirements

Runtime dependencies are declared in pyproject.toml and are automatically installed by uv the first time you run a uv run ... command.
