Associated code for "Replacing Normalization in Deep Transformers with Lipschitz and Manifold Geometry"
Read our paper here! Replacing Norm.pdf
(Currently under submission)
git clone https://github.com/RahulBirCodes/bluey.git
cd bluey
uv lock
or, if uv is not installed:
pip install .
python3 scripts/sweep.py \
--xy_size 5 \
--add_fake_dim True \
--add_input_noise True \
--project_name bluey-merdifold \
--last_k 50 \
--output_dir jobs
run_job.sh OPTIMIZER ARCH JOB_ID PHASE DEVICE CKPT_ROOT NUM_STEPS CHECKPOINT_EVERY
You can vary the number of devices by passing the optional NUM_GPU argument as the last argument to run_jobs.sh.
chmod +x run_job.sh run_jobs.sh
run_jobs.sh OPTIMIZER ARCH START_ID END_ID PHASE DEVICE CKPT_ROOT NUM_STEPS CHECKPOINT_EVERY [NUM_GPU]
scripts/run_job.sh ManifoldMuonW none 1 sweep cpu checkpoints 200 200
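
Because run_job.sh takes eight positional arguments, it is easy to swap two of them. The sketch below is not part of the repository: build_run_job_command is a hypothetical helper that maps named values onto the positional order given in the usage line above and reproduces the example command. It only builds the argument list and does not run the script.

```python
import shlex

# Positional order copied from the run_job.sh usage line in this README.
RUN_JOB_ARGS = (
    "OPTIMIZER", "ARCH", "JOB_ID", "PHASE",
    "DEVICE", "CKPT_ROOT", "NUM_STEPS", "CHECKPOINT_EVERY",
)


def build_run_job_command(script="scripts/run_job.sh", **values):
    """Hypothetical helper: turn named values into the positional argv list."""
    missing = [name for name in RUN_JOB_ARGS if name not in values]
    if missing:
        raise ValueError(f"missing arguments: {', '.join(missing)}")
    unknown = [name for name in values if name not in RUN_JOB_ARGS]
    if unknown:
        raise ValueError(f"unknown arguments: {', '.join(unknown)}")
    # Emit the values in the order the script expects them.
    return [script] + [str(values[name]) for name in RUN_JOB_ARGS]


# Values taken from the example command above.
cmd = build_run_job_command(
    OPTIMIZER="ManifoldMuonW",
    ARCH="none",
    JOB_ID=1,
    PHASE="sweep",
    DEVICE="cpu",
    CKPT_ROOT="checkpoints",
    NUM_STEPS=200,
    CHECKPOINT_EVERY=200,
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the repository root once the script has been made executable.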