Minimalistic implementation of Hierarchical Recurrent Model (HRM).
Ensure Python and PyTorch are installed, and that your machine has at least one GPU and 40 GiB of VRAM in total. Then install the pip dependencies, which should finish within 10 minutes:
pip install -r requirements.txt
This project uses Weights & Biases for experiment tracking and metric visualization. Ensure you're logged in:
wandb login
The following commands pull the required datasets from Hugging Face repositories.
mkdir downloaded-datasets
hf download --repo-type dataset --local-dir ./downloaded-datasets/maze-30x30-hard-1k sapientinc/maze-30x30-hard-1k
hf download --repo-type dataset --local-dir ./downloaded-datasets/sudoku-extreme-1k sapientinc/sudoku-extreme-1k
Run the command below to download the trained Sudoku checkpoint for the dynamics analysis.
hf download --repo-type model --local-dir ./checkpoints/1000_tuned_hrm_new cl-agi/hrm-mini
The original experiments ran on one node with 8 H100 GPUs, where Sudoku takes about 30 minutes. To run on a single GPU, set --nproc-per-node 1 on the command line and multiply the local batch size by 8, e.g. local_batch_size=768. Sudoku then takes about 4 hours per experiment on a single H100. By default the script runs 3 seeds; append seeds=[1] to run a single seed.
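As a rough sketch of the single-GPU instructions above, the snippet below derives the scaled batch size and assembles the training command without running it. The default per-process batch size of 96 is an assumption inferred from 768 / 8, and the variable names are illustrative only; check the config file for the real default. Scaling the local batch this way presumably keeps the global batch size unchanged.

```shell
# Sketch: derive single-GPU settings from the 8-GPU defaults.
ORIG_GPUS=8            # GPUs used in the original experiments
NEW_GPUS=1             # GPUs available locally
ORIG_LOCAL_BATCH=96    # ASSUMPTION: inferred from 768 / 8; verify in the config

# Scale the per-process batch by the ratio of GPU counts.
LOCAL_BATCH=$(( ORIG_LOCAL_BATCH * ORIG_GPUS / NEW_GPUS ))

# Quote seeds=[1] so the shell does not treat the brackets as a glob pattern
# (zsh, for example, aborts with "no matches found" if it is left unquoted).
CMD="OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 torchrun --nproc-per-node ${NEW_GPUS} train.py --config-name tuned_hrm local_batch_size=${LOCAL_BATCH} 'seeds=[1]'"

# Print the command for inspection; run it afterwards with: eval "$CMD"
echo "$CMD"
```

Printing the command first makes it easy to confirm the overrides before committing to a multi-hour run.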
Sudoku-Extreme, 1000 examples. Training should take about 4 H100 GPU-hours (~30 min on 8 H100 GPUs, ~4 hr on 1 H100 GPU).
OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 torchrun --nproc-per-node 8 train.py --config-name tuned_hrm
HRM Full: See above
Recurrent Transformer
OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 torchrun --nproc-per-node 8 train.py --config-name tuned_rt
No dual timescale
OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 torchrun --nproc_per_node 8 train.py --config-name tuned_hrm arch.name=hrm_ablations@HRM arch.L_cycles=1 arch.H_cycles=7
Tied H-L parameters (TRM-style)
OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 torchrun --nproc_per_node 8 train.py --config-name tuned_hrm arch.name=hrm_ablations@HRM +arch.dual_module=False
No H-H links
OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 torchrun --nproc_per_node 8 train.py --config-name tuned_hrm arch.name=hrm_ablations@HRM +arch.hh_link=False
MLP Mixer
OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 torchrun --nproc_per_node 8 train.py --config-name tuned_hrm +arch.is_mlp_mixer=True
Install Jupyter and open visualizations.ipynb. To evaluate a different checkpoint, change the checkpoint path in the first cell. The notebook should take several minutes to run.
Maze 30x30
OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 torchrun --nproc-per-node 8 train.py --config-name tuned_hrm data=maze
For 3-SAT, switch to the SAT branch to train.
We use the Docker image below for our experiments. Use it to reproduce our results exactly.
The exact software versions can be checked inside the image.
docker pull sapientai/pytorch-docker:26.02.14.hopper