cavaunpeu/frozen-lake-rl

Frozen Lake RL

This repository contains a small collection of planning and reinforcement-learning algorithms implemented on top of the classic FrozenLake-v1 environment from Gymnasium.

The focus is to provide clear, self-contained reference implementations of:

  • Policy iteration
  • Value iteration (synchronous and asynchronous)
  • Finite-horizon forward search
  • Branch and bound forward search
  • Monte Carlo Tree Search (MCTS)

All algorithms operate on the tabular 4x4 Frozen Lake MDP using the transition model exposed by env.unwrapped.P.
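
As a rough illustration of that transition model, the sketch below uses a hand-written three-state chain laid out the same way as env.unwrapped.P, where P[state][action] is a list of (probability, next_state, reward, terminated) tuples. The toy MDP and the signature of one_step_lookahead are assumptions made for this example; the helper in algorithms/base.py may look different.

```python
# Toy 3-state chain in the same layout as FrozenLake's env.unwrapped.P:
# P[state][action] -> list of (probability, next_state, reward, terminated).
# This MDP is made up for illustration; it is not the 4x4 Frozen Lake map.
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}


def one_step_lookahead(P, state, values, gamma=0.9):
    """Return the expected return of each action from `state` under `values`.

    Hypothetical signature; the repository's helper may take other arguments.
    """
    q_values = []
    for action in sorted(P[state]):
        total = 0.0
        for prob, next_state, reward, terminated in P[state][action]:
            # Do not bootstrap from the value of a terminal transition.
            future = 0.0 if terminated else values[next_state]
            total += prob * (reward + gamma * future)
        q_values.append(total)
    return q_values


values = [0.0, 0.5, 0.0]
q_from_start = one_step_lookahead(P, 0, values)
q_next_to_goal = one_step_lookahead(P, 1, values)
```

Here action 1 from state 1 reaches the rewarding terminal state, so its value is just the immediate reward.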


Project structure

  • run.py: Command-line entry point. Creates the FrozenLake environment and runs a selected algorithm to build a policy, with an optional visualization of a single rollout.
  • algorithms/base.py: Base class used by all algorithms, plus a shared one_step_lookahead utility.
  • algorithms/policy_iteration.py: Classic policy evaluation + policy improvement loop.
  • algorithms/value_iteration.py:
    • Core value-iteration routine.
    • ValueIterationAlgorithm, SynchronousValueIterationAlgorithm, and AsynchronousValueIterationAlgorithm wrappers that expose a build_policy() method.
  • algorithms/forward_search.py: Simple depth-limited expectimax-style forward search.
  • algorithms/branch_and_bound.py: Forward search with a value-iteration-based upper bound for pruning.
  • algorithms/mcts.py: A basic Monte Carlo Tree Search implementation for Frozen Lake, maintaining visit counts and action-value estimates over states.
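
To make the forward-search entry more concrete, here is a minimal sketch of a depth-limited expectimax search over a tabular model. It is not the code in algorithms/forward_search.py: the function name, its signature, the discount factor, and the toy three-state MDP are all assumptions for illustration.

```python
# Toy MDP in the env.unwrapped.P layout (made up for this example):
# P[state][action] -> list of (probability, next_state, reward, terminated).
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}


def forward_search(P, state, depth, gamma=0.9):
    """Return (best_action, value) from a depth-limited expectimax search."""
    if depth == 0:
        # Horizon reached: no action to recommend, value estimate of zero.
        return None, 0.0
    best_action, best_value = None, float("-inf")
    for action in sorted(P[state]):
        value = 0.0
        for prob, next_state, reward, terminated in P[state][action]:
            future = 0.0
            if not terminated:
                # Recurse one level deeper from the successor state.
                _, future = forward_search(P, next_state, depth - 1, gamma)
            value += prob * (reward + gamma * future)
        if value > best_value:
            best_action, best_value = action, value
    return best_action, best_value


action_from_start, value_from_start = forward_search(P, 0, depth=2)
action_near_goal, value_near_goal = forward_search(P, 1, depth=1)
```

The cost grows exponentially with depth, which is the motivation for pruning with an upper bound as in the branch and bound variant.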

All algorithms implement a common interface:

  • Constructor: Algorithm(env, ...)
  • Policy builder: policy = algorithm.build_policy()
  • Policy: a callable that takes a state index and returns an action index.
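
The interface above can be sketched as follows. BaseAlgorithm here is a stand-in written for this example, not a copy of algorithms/base.py, and RandomAlgorithm, n_actions, and seed are hypothetical names. The only details taken from the README are the constructor taking env, the build_policy() method, and the policy being a callable from state index to action index.

```python
import random


class BaseAlgorithm:
    """Stand-in for algorithms.base.BaseAlgorithm (the real class may differ)."""

    def __init__(self, env):
        self.env = env

    def build_policy(self):
        raise NotImplementedError


class RandomAlgorithm(BaseAlgorithm):
    """Hypothetical algorithm that ignores the state and picks a uniform action."""

    def __init__(self, env, n_actions=4, seed=0):
        super().__init__(env)
        self.n_actions = n_actions  # Frozen Lake has 4 actions
        self.rng = random.Random(seed)

    def build_policy(self):
        def policy(state):
            # A policy maps a state index to an action index.
            return self.rng.randrange(self.n_actions)

        return policy


# env=None keeps the sketch runnable without Gymnasium installed.
algorithm = RandomAlgorithm(env=None)
policy = algorithm.build_policy()
action = policy(0)
```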

Installation (with uv)

This project targets Python 3.10+ and uses uv for dependency management and virtual environments.

  1. Clone the repo
git clone <your-fork-or-origin-url> frozen-lake-rl
cd frozen-lake-rl
  2. Install uv (if you don’t already have it)

On macOS / Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

Then restart your shell (or source your shell profile) so that uv is on your PATH, and verify the installation:

uv --version
  3. Let uv handle the environment and dependencies

You don’t need to create a virtualenv or run pip manually. The first time you run any uv run ... command, uv will:

  • Create an isolated environment for the project.
  • Install the dependencies declared in pyproject.toml (including gymnasium[toy-text] and numpy).

Usage

From the project root, you can run everything through uv run:

uv run python run.py --algorithm PolicyIteration --visualize

Available algorithms

The --algorithm flag accepts any of the following values:

  • PolicyIteration: Classic dynamic-programming policy iteration.
  • SynchronousValueIteration: Value iteration where all states are updated from the value function of the previous sweep.
  • AsynchronousValueIteration: Value iteration where updates are written back to the same value function as they are computed.
  • ForwardSearch: Depth-limited forward search over the tabular transition model.
  • BranchAndBound: Forward search that uses a value-iteration-based upper bound to prune branches.
  • MonteCarloTreeSearch: A basic MCTS planner using simulated rollouts and an exploration policy over actions.
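
The difference between the synchronous and asynchronous variants can be sketched in a few lines. This is an illustrative version under assumptions, not the routine in algorithms/value_iteration.py: the function signature, the stopping tolerance, the discount factor, and the toy MDP are all made up for the example.

```python
# Toy MDP in the env.unwrapped.P layout (made up for this example):
# P[state][action] -> list of (probability, next_state, reward, terminated).
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}


def value_iteration(P, gamma=0.9, tol=1e-8, synchronous=True):
    """Return state values after sweeping until the largest change is below tol."""
    n_states = len(P)
    values = [0.0] * n_states
    while True:
        # Synchronous: read from a frozen copy of the previous sweep.
        # Asynchronous: read from the same list that is being written to.
        source = list(values) if synchronous else values
        delta = 0.0
        for state in range(n_states):
            best = max(
                sum(
                    prob * (reward + gamma * (0.0 if done else source[nxt]))
                    for prob, nxt, reward, done in P[state][action]
                )
                for action in P[state]
            )
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values


sync_values = value_iteration(P, synchronous=True)
async_values = value_iteration(P, synchronous=False)
```

Both variants converge to the same values; the asynchronous one can propagate information within a single sweep because later states already see the updated entries.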

Example commands:

# Policy iteration with visualization
uv run python run.py --algorithm PolicyIteration --visualize

# Synchronous value iteration (no visualization)
uv run python run.py --algorithm SynchronousValueIteration

# Monte Carlo Tree Search with visualization
uv run python run.py --algorithm MonteCarloTreeSearch --visualize

Environment configuration

By default, run.py creates a deterministic 4x4 Frozen Lake environment:

  • map_name="4x4"
  • is_slippery=False
  • render_mode="human"

If you want to experiment with other maps or stochastic dynamics, you can edit the gym.make(...) call in run.py and re-run the script.
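
As a sketch of what such an edit might look like, the snippet below collects the defaults listed above and lets you override them. DEFAULT_KWARGS, env_kwargs, and make_env are hypothetical helpers for this example and do not exist in run.py; gym.make("FrozenLake-v1", ...) and its map_name, is_slippery, and render_mode arguments are part of Gymnasium.

```python
# Defaults as described in this README (assumed to mirror run.py).
DEFAULT_KWARGS = {
    "map_name": "4x4",
    "is_slippery": False,
    "render_mode": "human",
}


def env_kwargs(**overrides):
    """Return the keyword arguments for gym.make with any overrides applied."""
    return {**DEFAULT_KWARGS, **overrides}


def make_env(**overrides):
    """Create a FrozenLake-v1 environment (hypothetical helper, not in run.py)."""
    # Imported lazily so the kwargs helper works without Gymnasium installed.
    import gymnasium as gym

    return gym.make("FrozenLake-v1", **env_kwargs(**overrides))


# Example: the larger built-in map with stochastic (slippery) dynamics.
slippery_8x8 = env_kwargs(map_name="8x8", is_slippery=True)
```

Calling make_env(map_name="8x8", is_slippery=True) would then build the corresponding environment, and passing render_mode=None disables the rendering window.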


Extending the code

To add a new algorithm:

  1. Create a new file in algorithms/ (or extend an existing one).
  2. Subclass BaseAlgorithm from algorithms.base and implement build_policy(self) so that it returns a callable policy(state) -> action.
  3. Register your algorithm in ALGORITHMS in run.py:
from algorithms.your_algorithm import YourAlgorithm

ALGORITHMS = {
    # ...
    "YourAlgorithm": YourAlgorithm,
}

After that, you can run it via:

uv run python run.py --algorithm YourAlgorithm --visualize
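
For orientation, here is one way a registry like ALGORITHMS could be wired to the --algorithm and --visualize flags. This is a guess at the general shape, not the contents of run.py: the placeholder YourAlgorithm class, parse_args, and the use of argparse are assumptions for this example.

```python
import argparse


class YourAlgorithm:
    """Placeholder algorithm; a real one would subclass BaseAlgorithm."""

    def __init__(self, env):
        self.env = env

    def build_policy(self):
        # Trivial policy that always returns action index 0.
        return lambda state: 0


# Maps the value of --algorithm to the class that implements it.
ALGORITHMS = {
    "YourAlgorithm": YourAlgorithm,
}


def parse_args(argv=None):
    """Parse the command-line flags described in this README."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--algorithm", choices=sorted(ALGORITHMS), required=True)
    parser.add_argument("--visualize", action="store_true")
    return parser.parse_args(argv)


args = parse_args(["--algorithm", "YourAlgorithm", "--visualize"])
# env=None keeps the sketch runnable without creating a real environment.
algorithm = ALGORITHMS[args.algorithm](env=None)
policy = algorithm.build_policy()
```

With a dictionary lookup like this, registering a new class is the only change needed to make it selectable from the command line.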

Requirements

Runtime dependencies are declared in pyproject.toml and are automatically installed by uv the first time you run a uv run ... command.

About

Hand-rolled classical RL algorithms on the Frozen Lake Gym environment
