ArithmeticLLM

Code for "Relating the Seemingly Unrelated: Principled Understanding of Generalization for Generative Models in Arithmetic Reasoning Tasks" (2024).

Remark: This code is based on https://github.com/karpathy/nanoGPT, which we have modified for our arithmetic tasks.

Use Cases

Generate data

Use the scripts in the "data" folder to generate the task data.
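As a rough illustration of what generating task data can look like, the sketch below writes one arithmetic sample per line. The function name `make_addition_lines`, the `a+b=c` line format, and the `max_digits` and `modulus` parameters are assumptions made for this example; the actual scripts in the "data" folder may use a different format.

```python
import random


def make_addition_lines(n_samples, max_digits=3, modulus=None, seed=0):
    """Return strings like '12+345=357' (hypothetical format, one sample per line).

    If `modulus` is given, the result is reduced modulo that value,
    which gives a modular-addition variant of the same task.
    """
    rng = random.Random(seed)  # fixed seed so the data is reproducible
    upper = 10 ** max_digits - 1
    lines = []
    for _ in range(n_samples):
        a = rng.randint(0, upper)
        b = rng.randint(0, upper)
        c = a + b if modulus is None else (a + b) % modulus
        lines.append(f"{a}+{b}={c}")
    return lines


if __name__ == "__main__":
    for line in make_addition_lines(5):
        print(line)
```

Keeping one sample per line makes it straightforward to split the data by sample later, rather than by raw character offset.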

Prepare data

For Addition and Modular Addition:

python data/addition/prepare_linebyline.py

For Multiplication and Modular Multiplication:

python data/multiply/prepare_linebyline.py

This creates meta.pkl, train.pkl and val.pkl in the corresponding data directory.
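For orientation, here is a minimal sketch of a line-by-line, character-level preparation step that produces these three files. It is not the repository's `prepare_linebyline.py`. The function name `prepare_line_by_line`, the `val_fraction` parameter, and the choice to store each split as a list of token-id lists are assumptions. The `meta.pkl` keys (`vocab_size`, `stoi`, `itos`) follow the convention of nanoGPT's character-level example.

```python
import os
import pickle
import tempfile


def prepare_line_by_line(lines, out_dir, val_fraction=0.1):
    """Encode each line at character level and write meta/train/val pickles."""
    # Vocabulary: every character seen in the data, plus the newline terminator.
    chars = sorted(set("".join(lines)) | {"\n"})
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}

    # One list of token ids per sample; each sample ends with a newline.
    encoded = [[stoi[ch] for ch in line + "\n"] for line in lines]

    # Hold out the last fraction of samples (at least one) for validation.
    n_val = max(1, int(len(encoded) * val_fraction))
    splits = {"train": encoded[:-n_val], "val": encoded[-n_val:]}
    meta = {"vocab_size": len(chars), "stoi": stoi, "itos": itos}

    os.makedirs(out_dir, exist_ok=True)
    for name, obj in {"meta": meta, **splits}.items():
        with open(os.path.join(out_dir, f"{name}.pkl"), "wb") as f:
            pickle.dump(obj, f)
    return meta


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()  # throwaway directory for the demo
    meta = prepare_line_by_line(["1+2=3", "10+5=15", "7+8=15"], out_dir)
    print(sorted(os.listdir(out_dir)), meta["vocab_size"])
```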

Model Training

Use GPU for addition/modular addition training:

python train.py config/train_addition_char.py \
--tensorboard_log=True \
--run_id=1 \
--device=cuda:0 \
--line_train=True \
--ckpt_name=ckpt_add.pt

Use GPU for multiplication/modular multiplication training:

python train.py config/train_multiply_char.py \
--tensorboard_log=True \
--run_id=1 \
--device=cuda:0 \
--line_train=True \
--ckpt_name=ckpt_multiply.pt
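The training commands pass a config file followed by `--key=value` overrides. In upstream nanoGPT, such overrides are parsed with `ast.literal_eval` and checked against existing config values. The sketch below is a simplified, standalone illustration of that idea, not the code used by `train.py`. The function `apply_overrides` and the default values in the demo are hypothetical; only the flag names come from the commands above.

```python
from ast import literal_eval


def apply_overrides(config, argv):
    """Return a copy of `config` updated with '--key=value' arguments."""
    config = dict(config)
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            raise ValueError(f"expected --key=value, got: {arg}")
        key, raw = arg[2:].split("=", 1)
        if key not in config:
            raise KeyError(f"unknown config key: {key}")
        try:
            # 'True' -> True, '1' -> 1, and so on.
            value = literal_eval(raw)
        except (ValueError, SyntaxError):
            # Not a Python literal (e.g. 'cuda:0'); keep it as a string.
            value = raw
        if type(value) is not type(config[key]):
            raise TypeError(f"{key}: expected {type(config[key]).__name__}")
        config[key] = value
    return config


if __name__ == "__main__":
    # Hypothetical defaults, chosen only to make the demo runnable.
    defaults = {
        "tensorboard_log": False,
        "run_id": 0,
        "device": "cuda",
        "line_train": False,
        "ckpt_name": "ckpt.pt",
    }
    args = ["--run_id=1", "--device=cuda:0", "--ckpt_name=ckpt_add.pt"]
    print(apply_overrides(defaults, args))
```

Checking the type of each override against the existing value catches mistakes such as `--run_id=one` before training starts.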
