This repository contains a small collection of planning / reinforcement-learning style algorithms implemented on top of the classic FrozenLake-v1 environment from Gymnasium.
The focus is to provide clear, self-contained reference implementations of:
- Policy iteration
- Value iteration (synchronous and asynchronous)
- Finite-horizon forward search
- Branch and bound forward search
- Monte Carlo Tree Search (MCTS)
All algorithms operate on the tabular 4x4 Frozen Lake MDP using the transition model exposed by env.unwrapped.P.
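For orientation, the transition model is a nested dictionary: P[state][action] is a list of (probability, next_state, reward, terminated) tuples. The sketch below uses a hand-written three-state model in that same layout, so it runs without Gymnasium. The toy model and the helpers check_model and expected_reward are illustrative assumptions, not part of this repository.

```python
# Toy stand-in with the same layout as env.unwrapped.P (not the real lake):
# P[state][action] -> list of (probability, next_state, reward, terminated).
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    # Action 1 in state 1 is "slippery": it reaches the goal 80% of the time.
    1: {0: [(1.0, 0, 0.0, False)], 1: [(0.8, 2, 1.0, True), (0.2, 1, 0.0, False)]},
    # Terminal state: every action loops back with zero reward.
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}


def check_model(P, tol=1e-9):
    """Return True if the outcome probabilities of every (state, action) sum to 1."""
    for actions in P.values():
        for outcomes in actions.values():
            total = sum(prob for prob, _, _, _ in outcomes)
            if abs(total - 1.0) > tol:
                return False
    return True


def expected_reward(P, state, action):
    """Expected immediate reward of taking `action` in `state`."""
    return sum(prob * reward for prob, _, reward, _ in P[state][action])


model_ok = check_model(P)
reward_from_1_right = expected_reward(P, 1, 1)
```

With a real environment, the same helpers should work on env.unwrapped.P directly, since they only rely on the nested layout described above.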
- run.py: Command-line entry point. Creates the FrozenLake environment and runs a selected algorithm to build a policy, with an optional visualization of a single rollout.
- algorithms/base.py: Base class used by all algorithms, plus a shared one_step_lookahead utility.
- algorithms/policy_iteration.py: Classic policy evaluation + policy improvement loop.
- algorithms/value_iteration.py:
  - Core value-iteration routine.
  - ValueIterationAlgorithm, SynchronousValueIterationAlgorithm, and AsynchronousValueIterationAlgorithm wrappers that expose a build_policy() method.
- algorithms/forward_search.py: Simple depth-limited expectimax-style forward search.
- algorithms/branch_and_bound.py: Forward search with a value-iteration-based upper bound for pruning.
- algorithms/mcts.py: A basic Monte Carlo Tree Search implementation for Frozen Lake, maintaining visit counts and action-value estimates over states.
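To give a feel for the depth-limited, expectimax-style search, here is a minimal sketch over a toy transition model in the same layout as env.unwrapped.P. The function name forward_search, its signature, and the discount factor are assumptions made for illustration; the code in algorithms/forward_search.py may differ.

```python
# Toy three-state chain: 0 (start) -> 1 -> 2 (goal). Actions: 0 = left, 1 = right.
# P[state][action] -> list of (probability, next_state, reward, terminated).
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}


def forward_search(P, state, depth, gamma=0.9):
    """Return (best_action, value) by exhaustive lookahead to `depth` steps."""
    if depth == 0:
        # Horizon reached: no action to recommend, estimate the rest as zero.
        return None, 0.0
    best_action, best_value = None, float("-inf")
    for action, outcomes in P[state].items():
        value = 0.0
        # Expectation over the stochastic outcomes of this action.
        for prob, next_state, reward, terminated in outcomes:
            future = 0.0
            if not terminated:
                _, future = forward_search(P, next_state, depth - 1, gamma)
            value += prob * (reward + gamma * future)
        # Maximization over actions; ties keep the first action seen.
        if value > best_value:
            best_action, best_value = action, value
    return best_action, best_value


shallow = forward_search(P, state=0, depth=1)
deep = forward_search(P, state=0, depth=2)
```

On this chain the goal is two steps away, so a depth of 1 sees no reward from the start state, while a depth of 2 picks action 1 (right). Branch and bound follows the same recursion but skips actions whose upper bound cannot beat the best value found so far.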
All algorithms implement a common interface:
- Constructor: Algorithm(env, ...)
- Policy builder: policy = algorithm.build_policy()
- Policy: a callable that takes a state index and returns an action index.
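The interface can be pictured roughly as follows. This BaseAlgorithm is a hypothetical stand-in written for this example (the real class lives in algorithms/base.py and may look different), and AlwaysRight is a made-up algorithm used only to show the shape of a policy.

```python
from abc import ABC, abstractmethod
from typing import Callable

# A policy maps a state index to an action index.
Policy = Callable[[int], int]


class BaseAlgorithm(ABC):
    """Hypothetical stand-in for the base class in algorithms/base.py."""

    def __init__(self, env):
        self.env = env

    @abstractmethod
    def build_policy(self) -> Policy:
        """Return a callable policy(state) -> action."""


class AlwaysRight(BaseAlgorithm):
    """Made-up algorithm: ignores the environment and always moves right."""

    RIGHT = 2  # FrozenLake action indices: 0 = left, 1 = down, 2 = right, 3 = up

    def build_policy(self) -> Policy:
        return lambda state: self.RIGHT


# No environment is needed for this trivial example, so env is None here.
algorithm = AlwaysRight(env=None)
policy = algorithm.build_policy()
action = policy(0)
```

Returning a plain callable keeps the rollout code independent of how the policy was computed: a lookup table, a search run at decision time, and a constant function all look the same to the caller.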
This project targets Python 3.10+ and uses uv for dependency management and virtual environments.
- Clone the repo
  git clone <your-fork-or-origin-url> frozen-lake-rl
  cd frozen-lake-rl
- Install uv (if you don’t already have it)
  On macOS / Linux:
  curl -LsSf https://astral.sh/uv/install.sh | sh
  Then restart your shell (or source the profile changes) so that uv is on your PATH, and verify:
  uv --version
- Let uv handle the environment and dependencies
  You don’t need to create a virtualenv or run pip manually. The first time you run any uv run ... command, uv will:
  - Create an isolated environment for the project.
  - Install the dependencies declared in pyproject.toml (including gymnasium[toy-text] and numpy).
From the project root, you can run everything through uv run:
uv run python run.py --algorithm PolicyIteration --visualize
The --algorithm flag accepts any of the following values:
- PolicyIteration: Classic dynamic-programming policy iteration.
- SynchronousValueIteration: Value iteration where all states are updated from the value function of the previous sweep.
- AsynchronousValueIteration: Value iteration where updates are written back to the same value function as they are computed.
- ForwardSearch: Depth-limited forward search over the tabular transition model.
- BranchAndBound: Forward search that uses a value-iteration-based upper bound to prune branches.
- MonteCarloTreeSearch: A basic MCTS planner using simulated rollouts and an exploration policy over actions.
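The difference between the synchronous and asynchronous variants comes down to which value table each backup reads from. The sketch below illustrates this on a toy three-state model in the same layout as env.unwrapped.P. The names backup and value_iteration, the order parameter, and the default gamma and theta are assumptions for this example, not the signatures used in algorithms/value_iteration.py.

```python
# Toy three-state chain: 0 (start) -> 1 -> 2 (goal). Actions: 0 = left, 1 = right.
# P[state][action] -> list of (probability, next_state, reward, terminated).
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}


def backup(P, state, values, gamma):
    """Best one-step expected return from `state` under the estimates in `values`."""
    return max(
        sum(
            prob * (reward + (0.0 if terminated else gamma * values[next_state]))
            for prob, next_state, reward, terminated in outcomes
        )
        for outcomes in P[state].values()
    )


def value_iteration(P, order, gamma=0.9, theta=1e-8, synchronous=True):
    """Sweep the states in `order` until the largest change falls below `theta`."""
    values = {state: 0.0 for state in P}
    sweeps = 0
    while True:
        # Synchronous: read from a frozen copy of the previous sweep.
        # Asynchronous: read from the table that is being updated in place.
        source = dict(values) if synchronous else values
        delta = 0.0
        for state in order:
            new_value = backup(P, state, source, gamma)
            delta = max(delta, abs(new_value - values[state]))
            values[state] = new_value
        sweeps += 1
        if delta < theta:
            return values, sweeps


sync_values, sync_sweeps = value_iteration(P, order=[2, 1, 0], synchronous=True)
async_values, async_sweeps = value_iteration(P, order=[2, 1, 0], synchronous=False)
```

Both variants reach the same values on this chain. Sweeping from the goal backwards, the asynchronous variant settles in two sweeps and the synchronous one in three, because in-place updates let the goal reward propagate to the start state within a single sweep. How much this helps in general depends on the sweep order.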
Example commands:
# Policy iteration with visualization
uv run python run.py --algorithm PolicyIteration --visualize
# Synchronous value iteration (no visualization)
uv run python run.py --algorithm SynchronousValueIteration
# Monte Carlo Tree Search with visualization
uv run python run.py --algorithm MonteCarloTreeSearch --visualize
By default, run.py creates a deterministic 4x4 Frozen Lake environment:
- map_name="4x4"
- is_slippery=False
- render_mode="human"
If you want to experiment with other maps or stochastic dynamics, you can edit the gym.make(...) call in run.py and re-run the script.
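As a rough illustration of such an edit, the sketch below builds the keyword arguments separately from the gym.make(...) call. The helpers frozen_lake_kwargs and make_env are hypothetical and do not exist in run.py; map_name, is_slippery, and render_mode are standard FrozenLake-v1 arguments in Gymnasium.

```python
def frozen_lake_kwargs(map_name="4x4", is_slippery=False, render_mode="human"):
    """Keyword arguments for gym.make; the defaults mirror the documented run.py settings."""
    return {
        "map_name": map_name,
        "is_slippery": is_slippery,
        "render_mode": render_mode,
    }


def make_env(**overrides):
    """Create a FrozenLake-v1 environment with selected defaults overridden."""
    # Imported lazily so that frozen_lake_kwargs can be used without Gymnasium.
    import gymnasium as gym

    return gym.make("FrozenLake-v1", **frozen_lake_kwargs(**overrides))


# Stochastic 8x8 lake without a render window.
stochastic_kwargs = frozen_lake_kwargs(map_name="8x8", is_slippery=True, render_mode=None)
# env = make_env(map_name="8x8", is_slippery=True, render_mode=None)
```

With is_slippery=True, the agent can slide perpendicular to the intended direction, so the transition lists in env.unwrapped.P contain several outcomes per action instead of one.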
To add a new algorithm:
- Create a new file in algorithms/ (or extend an existing one).
- Subclass BaseAlgorithm from algorithms.base and implement build_policy(self) so that it returns a callable policy(state) -> action.
- Register your algorithm in ALGORITHMS in run.py:
from algorithms.your_algorithm import YourAlgorithm
ALGORITHMS = {
# ...
"YourAlgorithm": YourAlgorithm,
}
After that, you can run it via:
uv run python run.py --algorithm YourAlgorithm --visualize
Runtime dependencies are declared in pyproject.toml and are automatically installed by uv the first time you run a uv run ... command.
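One plausible way for a registry like ALGORITHMS to connect to the --algorithm flag is sketched below. How run.py actually parses its arguments is not shown here, so parse_args and the placeholder YourAlgorithm class are assumptions for illustration only.

```python
import argparse


class YourAlgorithm:
    """Placeholder algorithm; a real one would subclass BaseAlgorithm."""

    def __init__(self, env):
        self.env = env

    def build_policy(self):
        # Always pick action 0, just to have something callable.
        return lambda state: 0


# Registry mapping command-line names to algorithm classes.
ALGORITHMS = {
    "YourAlgorithm": YourAlgorithm,
}


def parse_args(argv=None):
    """Parse the two flags used in the commands above."""
    parser = argparse.ArgumentParser()
    # Restricting choices to the registry keys rejects unknown names early.
    parser.add_argument("--algorithm", choices=sorted(ALGORITHMS), required=True)
    parser.add_argument("--visualize", action="store_true")
    return parser.parse_args(argv)


args = parse_args(["--algorithm", "YourAlgorithm", "--visualize"])
algorithm = ALGORITHMS[args.algorithm](env=None)  # env omitted in this sketch
policy = algorithm.build_policy()
```

Keeping the registry as a plain dictionary means a new algorithm becomes selectable from the command line as soon as its entry is added.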