Second-Order Fine-Tuning without Pain for LLMs: A Hessian-Informed Zeroth-Order Optimizer (ICLR 2025)
In this work, we propose HiZOO, a diagonal Hessian-informed zeroth-order optimizer that fine-tunes LLMs without computing first- or second-order derivatives. To our knowledge, this is the first work to leverage Hessian information to enhance a zeroth-order optimizer for fine-tuning LLMs. Moreover, HiZOO avoids the heavy memory cost of backpropagation while adding only one extra forward pass per step. Extensive experiments on models ranging from 350M to 66B parameters show that HiZOO improves convergence, reducing training steps and enhancing model accuracy.
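The idea above can be sketched on a toy loss. This is a minimal illustrative sketch, not the paper's exact algorithm: the function name `hizoo_step`, the hyperparameter values, and the smoothing scheme are assumptions. It shows the key ingredients: a central-difference gradient estimate from two forward passes, plus one extra forward pass at the unperturbed point to estimate diagonal curvature, which then preconditions both the probe and the update.

```python
import numpy as np

def hizoo_step(params, loss_fn, sigma_diag, mu=1e-3, lr=1e-2, beta=1e-8):
    """One HiZOO-style step (illustrative sketch, not the paper's exact update).

    params     : 1-D numpy parameter vector
    loss_fn    : callable mapping params -> scalar loss (forward pass only)
    sigma_diag : running diagonal Hessian estimate, same shape as params
    """
    u = np.random.randn(*params.shape)              # random perturbation direction
    # Precondition the probe by the inverse square root of the Hessian estimate.
    probe = mu * u / np.sqrt(sigma_diag + 1e-12)

    l_plus = loss_fn(params + probe)                # forward pass 1
    l_minus = loss_fn(params - probe)               # forward pass 2
    l_zero = loss_fn(params)                        # the one extra forward pass

    # Central-difference zeroth-order gradient estimate along u.
    g_hat = (l_plus - l_minus) / (2.0 * mu) * u
    # Second-difference curvature estimate, folded into the diagonal Hessian
    # estimate via exponential smoothing (rate beta is an assumption here).
    h_hat = np.abs(l_plus + l_minus - 2.0 * l_zero) / (mu ** 2) * u * u
    sigma_diag = (1.0 - beta) * sigma_diag + beta * h_hat

    # Hessian-preconditioned SGD-style update.
    params = params - lr * g_hat / np.sqrt(sigma_diag + 1e-12)
    return params, sigma_diag
```

On a simple quadratic loss this converges using only forward evaluations, which is the point: no backpropagation state is ever materialized.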
```bash
conda create -n HiZOO python==3.9.19
conda activate HiZOO
pip install -r requirements.txt
```

This environment supports OPT, LLaMA, Phi, and other recent models.
Use run.py for all functions (zero-shot/ICL/fine-tuning/MeZO/HiZOO):

```bash
python run.py {ARGUMENTS}
```

Please read run.py for a complete list of arguments.
We provide an example script below for reproducing our experiments. All our examples sample 1,000 training examples, 500 validation examples, and 1,000 testing examples.
```bash
# HiZOO (full-parameter fine-tuning of OPT-13B on the WSC dataset)
CUDA_VISIBLE_DEVICES=0 MODEL=facebook/opt-13b TASK=WSC MODE=ft LR=1e-6 EPS=1e-3 HESSIAN_SMOOTH_TYPE=constant1e-8 bash HiZOO.sh
```
```bibtex
@article{zhao2024second,
  title={Second-order fine-tuning without pain for llms: A hessian informed zeroth-order optimizer},
  author={Zhao, Yanjun and Dang, Sizhe and Ye, Haishan and Dai, Guang and Qian, Yi and Tsang, Ivor W},
  journal={arXiv preprint arXiv:2402.15173},
  year={2024}
}
```