Code for "Relating the Seemingly Unrelated: Principled Understanding of Generalization for Generative Models in Arithmetic Reasoning Tasks" (2024).
Remark: This code is based on https://github.com/karpathy/nanoGPT, with modifications for our arithmetic tasks.
Use the scripts in the "data" folder to generate the task data.
For Addition and Modular Addition:
python data/addition/prepare_linebyline.py
For Multiplication and Modular Multiplication:
python data/multiply/prepare_linebyline.py
Each script creates meta.pkl, train.pkl and val.pkl in its data directory.
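To check the generated files before training, a minimal sketch along the following lines may help. It assumes the three files are ordinary pickle files that load with Python's standard `pickle` module. The README does not describe their contents, so the script only reports each file's top-level type, length, and dictionary keys (if any). The function names `summarize_pickle` and `summarize_data_dir` are hypothetical helpers, not part of this repository, and `data/addition` is assumed to be the output directory.

```python
import pickle
from pathlib import Path


def summarize_pickle(path):
    """Load one pickle file and report its top-level type and size."""
    # Only unpickle files you generated yourself: pickle can execute code.
    with open(path, "rb") as f:
        obj = pickle.load(f)
    summary = {"file": Path(path).name, "type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["length"] = len(obj)
    if isinstance(obj, dict):
        summary["keys"] = sorted(str(k) for k in obj)
    return summary


def summarize_data_dir(data_dir):
    """Summarize meta.pkl, train.pkl and val.pkl if they exist in data_dir."""
    summaries = []
    for name in ("meta.pkl", "train.pkl", "val.pkl"):
        path = Path(data_dir) / name
        if path.exists():
            summaries.append(summarize_pickle(path))
    return summaries


if __name__ == "__main__":
    # Assumed location: the directory that holds the prepare script.
    for summary in summarize_data_dir("data/addition"):
        print(summary)
```

If a file is missing from the output, the corresponding prepare script has probably not been run for that task yet.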
Use GPU for addition/modular addition training:
python train.py config/train_addition_char.py \
--tensorboard_log=True \
--run_id=1 \
--device=cuda:0 \
--line_train=True \
--ckpt_name=ckpt_add.pt
Use GPU for multiplication/modular multiplication training:
python train.py config/train_multiply_char.py \
--tensorboard_log=True \
--run_id=1 \
--device=cuda:0 \
--line_train=True \
--ckpt_name=ckpt_multiply.pt
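The two training commands differ only in the config file and the checkpoint name, so they can be assembled from one helper. The sketch below is a hypothetical convenience wrapper, not part of the repository. It uses only the flags shown above, and `build_train_command`, `RUNS`, and `main` are illustrative names. It defaults to a dry run that prints the commands without starting training.

```python
import shlex
import subprocess


def build_train_command(config, ckpt_name, run_id=1, device="cuda:0"):
    """Return the train.py invocation as an argument list."""
    return [
        "python", "train.py", config,
        "--tensorboard_log=True",
        f"--run_id={run_id}",
        f"--device={device}",
        "--line_train=True",
        f"--ckpt_name={ckpt_name}",
    ]


# (config file, checkpoint name) pairs taken from the commands above.
RUNS = [
    ("config/train_addition_char.py", "ckpt_add.pt"),
    ("config/train_multiply_char.py", "ckpt_multiply.pt"),
]


def main(dry_run=True):
    for config, ckpt_name in RUNS:
        cmd = build_train_command(config, ckpt_name)
        print(shlex.join(cmd))
        if not dry_run:
            # Runs sequentially and stops at the first failing run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main(dry_run=True)
```

Calling `main(dry_run=False)` from the repository root would launch both runs one after the other on the same device.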