A PyTorch implementation of CoLA (Collaborative Low-Rank Adaptation).
For reproducibility, we open-source our data and code. The released artifacts are listed below:

- Conda environment configuration file (`LLaMA-Factory/environment.yaml`)
- Option-filling prompt and its code (`tool/convert.py`)
- The complete converted datasets, including GSM8K and BBH (`LLaMA-Factory/data/gsm8k_train.json`, `LLaMA-Factory/evaluation/gsm8k/test/gsm8k_test.csv`, and `LLaMA-Factory/evaluation/bbh/test/*.csv`)
- The FinEval training and test sets translated into English (`LLaMA-Factory/data/fineval_train-en.json` and `LLaMA-Factory/evaluation/fineval/test/*.csv`)
- Baseline code for the single-domain setting (`peft`); please place it in your own conda environment, e.g., `/home/root/miniconda3/envs/llama_factory/lib/python3.10/site-packages/peft` (see the sketch after this list). Due to inconsistencies across multiple modified Transformers repositories, please refer to the corresponding paper repositories for the multi-domain baselines.
- Complete fine-tuning code (`LLaMA-Factory`)
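One way to use the released single-domain baseline code is to overwrite the `peft` package inside the conda environment created in the setup step below. A minimal sketch, assuming the example Miniconda path above; adjust the conda prefix and Python version to your machine:

```bash
# Replace the installed peft package with the released one
# (run after the llama_factory environment below has been created).
SITE_PACKAGES=/home/root/miniconda3/envs/llama_factory/lib/python3.10/site-packages
mv "$SITE_PACKAGES/peft" "$SITE_PACKAGES/peft.bak"   # optional backup of the stock package
cp -r ./peft "$SITE_PACKAGES/peft"
```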
Enter the `LLaMA-Factory` directory, download the Llama-3.2-3B and Llama-3.1-8B model weights into the `models` directory (a download sketch is given below), and set the `prefix` field in `environment.yaml` to the directory where your Miniconda3/Anaconda is installed. Then create and activate the conda environment:

```bash
conda env create -f environment.yaml
conda activate llama_factory
```

If you encounter any issues during the installation steps above, please refer to issue #1 (🌹 many thanks to ArkNightmaster for sharing information about the environment setup).
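If the base weights are not already available locally, one way to fetch them is via the Hugging Face CLI. A minimal sketch, assuming a recent `huggingface_hub` and the `meta-llama/Llama-3.2-3B` and `meta-llama/Llama-3.1-8B` repositories (the repo IDs are an assumption, not part of this repository):

```bash
# Fetch the base weights into ./models (the gated Meta repos require `huggingface-cli login` first).
huggingface-cli download meta-llama/Llama-3.2-3B --local-dir ./models/Llama-3.2-3B
huggingface-cli download meta-llama/Llama-3.1-8B --local-dir ./models/Llama-3.1-8B
```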
Fine-tune CoLA for each domain with the commands below (a loop over all domains is sketched after this list).

- Generality

```bash
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train --stage sft --do_train --model_name_or_path ./models/[Llama-3.2-3B or Llama-3.1-8B] --dataset databricks-dolly-15k --dataset_dir ./data --template llama3 --finetuning_type cola --output_dir ./saves/[Llama-3.2-3B or Llama-3.1-8B]/Generality --overwrite_cache --overwrite_output_dir --cutoff_len 1024 --preprocessing_num_workers 16 --per_device_train_batch_size 8 --per_device_eval_batch_size 1 --gradient_accumulation_steps 8 --lr_scheduler_type cosine --logging_steps 50 --warmup_steps 20 --save_steps 100 --eval_steps 50 --evaluation_strategy steps --load_best_model_at_end --learning_rate 5e-5 --num_train_epochs 5.0 --val_size 0.1 --plot_loss --fp16 --max_samples 1000 --num_A 2 --num_B 3
```

- Law

```bash
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train --stage sft --do_train --model_name_or_path ./models/[Llama-3.2-3B or Llama-3.1-8B] --dataset us_terms,Lawyer-Instruct --dataset_dir ./data --template llama3 --finetuning_type cola --output_dir ./saves/[Llama-3.2-3B or Llama-3.1-8B]/Law --overwrite_cache --overwrite_output_dir --cutoff_len 1024 --preprocessing_num_workers 16 --per_device_train_batch_size 8 --per_device_eval_batch_size 1 --gradient_accumulation_steps 8 --lr_scheduler_type cosine --logging_steps 50 --warmup_steps 20 --save_steps 100 --eval_steps 50 --evaluation_strategy steps --load_best_model_at_end --learning_rate 5e-5 --num_train_epochs 5.0 --val_size 0.1 --plot_loss --fp16 --max_samples 1000 --num_A 2 --num_B 3
```

- Medicine

```bash
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train --stage sft --do_train --model_name_or_path ./models/[Llama-3.2-3B or Llama-3.1-8B] --dataset iCliniq,GenMedGPT-5k --dataset_dir ./data --template llama3 --finetuning_type cola --output_dir ./saves/[Llama-3.2-3B or Llama-3.1-8B]/Medicine --overwrite_cache --overwrite_output_dir --cutoff_len 1024 --preprocessing_num_workers 16 --per_device_train_batch_size 8 --per_device_eval_batch_size 1 --gradient_accumulation_steps 8 --lr_scheduler_type cosine --logging_steps 50 --warmup_steps 20 --save_steps 100 --eval_steps 50 --evaluation_strategy steps --load_best_model_at_end --learning_rate 5e-5 --num_train_epochs 5.0 --val_size 0.1 --plot_loss --fp16 --max_samples 1000 --num_A 2 --num_B 3
```

- Math

```bash
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train --stage sft --do_train --model_name_or_path ./models/[Llama-3.2-3B or Llama-3.1-8B] --dataset gsm8k --dataset_dir ./data --template llama3 --finetuning_type cola --output_dir ./saves/[Llama-3.2-3B or Llama-3.1-8B]/Math --overwrite_cache --overwrite_output_dir --cutoff_len 1024 --preprocessing_num_workers 16 --per_device_train_batch_size 8 --per_device_eval_batch_size 1 --gradient_accumulation_steps 8 --lr_scheduler_type cosine --logging_steps 50 --warmup_steps 20 --save_steps 100 --eval_steps 50 --evaluation_strategy steps --load_best_model_at_end --learning_rate 5e-5 --num_train_epochs 5.0 --val_size 0.1 --plot_loss --fp16 --max_samples 1000 --num_A 2 --num_B 3
```

- Finance

```bash
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train --stage sft --do_train --model_name_or_path ./models/[Llama-3.2-3B or Llama-3.1-8B] --dataset fineval-en --dataset_dir ./data --template llama3 --finetuning_type cola --output_dir ./saves/[Llama-3.2-3B or Llama-3.1-8B]/Finance --overwrite_cache --overwrite_output_dir --cutoff_len 1024 --preprocessing_num_workers 16 --per_device_train_batch_size 8 --per_device_eval_batch_size 1 --gradient_accumulation_steps 8 --lr_scheduler_type cosine --logging_steps 50 --warmup_steps 20 --save_steps 100 --eval_steps 50 --evaluation_strategy steps --load_best_model_at_end --learning_rate 5e-5 --num_train_epochs 5.0 --val_size 0.1 --plot_loss --fp16 --max_samples 1000 --num_A 2 --num_B 3
```

- Multi-tasking

```bash
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train --stage sft --do_train --model_name_or_path ./models/[Llama-3.2-3B or Llama-3.1-8B] --dataset openorca --dataset_dir ./data --template llama3 --finetuning_type cola --output_dir ./saves/[Llama-3.2-3B or Llama-3.1-8B]/Multi-tasking --overwrite_cache --overwrite_output_dir --cutoff_len 1024 --preprocessing_num_workers 16 --per_device_train_batch_size 8 --per_device_eval_batch_size 1 --gradient_accumulation_steps 8 --lr_scheduler_type cosine --logging_steps 50 --warmup_steps 20 --save_steps 100 --eval_steps 50 --evaluation_strategy steps --load_best_model_at_end --learning_rate 5e-5 --num_train_epochs 5.0 --val_size 0.1 --plot_loss --fp16 --max_samples 1000 --num_A 2 --num_B 3
```
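The six commands above differ only in the dataset and output directory, so they can be driven from a single loop. A minimal sketch (not part of the released scripts), assuming Llama-3.2-3B as the base model:

```bash
#!/usr/bin/env bash
# Train CoLA on every domain sequentially; dataset/output pairs mirror the commands above.
MODEL=Llama-3.2-3B   # or Llama-3.1-8B
declare -A DOMAINS=(
  [Generality]=databricks-dolly-15k
  [Law]=us_terms,Lawyer-Instruct
  [Medicine]=iCliniq,GenMedGPT-5k
  [Math]=gsm8k
  [Finance]=fineval-en
  [Multi-tasking]=openorca
)
for DOMAIN in "${!DOMAINS[@]}"; do
  CUDA_VISIBLE_DEVICES=0 llamafactory-cli train --stage sft --do_train \
    --model_name_or_path ./models/$MODEL --dataset "${DOMAINS[$DOMAIN]}" --dataset_dir ./data \
    --template llama3 --finetuning_type cola --output_dir ./saves/$MODEL/$DOMAIN \
    --overwrite_cache --overwrite_output_dir --cutoff_len 1024 --preprocessing_num_workers 16 \
    --per_device_train_batch_size 8 --per_device_eval_batch_size 1 --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine --logging_steps 50 --warmup_steps 20 --save_steps 100 --eval_steps 50 \
    --evaluation_strategy steps --load_best_model_at_end --learning_rate 5e-5 --num_train_epochs 5.0 \
    --val_size 0.1 --plot_loss --fp16 --max_samples 1000 --num_A 2 --num_B 3
done
```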
Other baselines (LoRA, DoRA, PiSSA, HydraLoRA), taking Generality as an example (a loop over the four methods is sketched after this list).

- LoRA

```bash
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train --stage sft --do_train --model_name_or_path ./models/[Llama-3.2-3B or Llama-3.1-8B] --dataset databricks-dolly-15k --dataset_dir ./data --template llama3 --finetuning_type lora --output_dir ./saves/[Llama-3.2-3B or Llama-3.1-8B]/Generality/lora --overwrite_cache --overwrite_output_dir --cutoff_len 1024 --preprocessing_num_workers 16 --per_device_train_batch_size 8 --per_device_eval_batch_size 1 --gradient_accumulation_steps 8 --lr_scheduler_type cosine --logging_steps 50 --warmup_steps 20 --save_steps 100 --eval_steps 50 --evaluation_strategy steps --load_best_model_at_end --learning_rate 5e-5 --num_train_epochs 5.0 --val_size 0.1 --plot_loss --fp16 --max_samples 1000 --lora_rank 8
```

- DoRA

```bash
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train --stage sft --do_train --model_name_or_path ./models/[Llama-3.2-3B or Llama-3.1-8B] --dataset databricks-dolly-15k --dataset_dir ./data --template llama3 --finetuning_type lora --output_dir ./saves/[Llama-3.2-3B or Llama-3.1-8B]/Generality/dora --overwrite_cache --overwrite_output_dir --cutoff_len 1024 --preprocessing_num_workers 16 --per_device_train_batch_size 8 --per_device_eval_batch_size 1 --gradient_accumulation_steps 8 --lr_scheduler_type cosine --logging_steps 50 --warmup_steps 20 --save_steps 100 --eval_steps 50 --evaluation_strategy steps --load_best_model_at_end --learning_rate 5e-5 --num_train_epochs 5.0 --val_size 0.1 --plot_loss --fp16 --max_samples 1000 --lora_rank 8 --use_dora
```

- PiSSA

```bash
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train --stage sft --do_train --model_name_or_path ./models/[Llama-3.2-3B or Llama-3.1-8B] --dataset databricks-dolly-15k --dataset_dir ./data --template llama3 --finetuning_type lora --output_dir ./saves/[Llama-3.2-3B or Llama-3.1-8B]/Generality/pissa --overwrite_cache --overwrite_output_dir --cutoff_len 1024 --preprocessing_num_workers 16 --per_device_train_batch_size 8 --per_device_eval_batch_size 1 --gradient_accumulation_steps 8 --lr_scheduler_type cosine --logging_steps 50 --warmup_steps 20 --save_steps 100 --eval_steps 50 --evaluation_strategy steps --load_best_model_at_end --learning_rate 5e-5 --num_train_epochs 5.0 --val_size 0.1 --plot_loss --fp16 --max_samples 1000 --lora_rank 8 --pissa_init
```

- HydraLoRA

```bash
CUDA_VISIBLE_DEVICES=0 llamafactory-cli train --stage sft --do_train --model_name_or_path ./models/[Llama-3.2-3B or Llama-3.1-8B] --dataset databricks-dolly-15k --dataset_dir ./data --template llama3 --finetuning_type hydralora --output_dir ./saves/[Llama-3.2-3B or Llama-3.1-8B]/Generality/hydralora --overwrite_cache --overwrite_output_dir --cutoff_len 1024 --preprocessing_num_workers 16 --per_device_train_batch_size 8 --per_device_eval_batch_size 1 --gradient_accumulation_steps 8 --lr_scheduler_type cosine --logging_steps 50 --warmup_steps 20 --save_steps 100 --eval_steps 50 --evaluation_strategy steps --load_best_model_at_end --learning_rate 5e-5 --num_train_epochs 5.0 --val_size 0.1 --plot_loss --fp16 --max_samples 1000 --lora_rank 8 --lora_num 3
```
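Likewise, the baseline runs differ only in the fine-tuning type and the method-specific flags. A minimal sketch (not part of the released scripts) looping over the four methods on Generality; the flag combinations are taken directly from the commands above:

```bash
#!/usr/bin/env bash
# Run the four baselines on the Generality domain; flags mirror the commands above.
MODEL=Llama-3.2-3B   # or Llama-3.1-8B
declare -A BASELINES=(
  [lora]="--finetuning_type lora --lora_rank 8"
  [dora]="--finetuning_type lora --lora_rank 8 --use_dora"
  [pissa]="--finetuning_type lora --lora_rank 8 --pissa_init"
  [hydralora]="--finetuning_type hydralora --lora_rank 8 --lora_num 3"
)
for NAME in "${!BASELINES[@]}"; do
  # ${BASELINES[$NAME]} is left unquoted so it expands into separate flags.
  CUDA_VISIBLE_DEVICES=0 llamafactory-cli train --stage sft --do_train \
    --model_name_or_path ./models/$MODEL --dataset databricks-dolly-15k --dataset_dir ./data \
    --template llama3 ${BASELINES[$NAME]} --output_dir ./saves/$MODEL/Generality/$NAME \
    --overwrite_cache --overwrite_output_dir --cutoff_len 1024 --preprocessing_num_workers 16 \
    --per_device_train_batch_size 8 --per_device_eval_batch_size 1 --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine --logging_steps 50 --warmup_steps 20 --save_steps 100 --eval_steps 50 \
    --evaluation_strategy steps --load_best_model_at_end --learning_rate 5e-5 --num_train_epochs 5.0 \
    --val_size 0.1 --plot_loss --fp16 --max_samples 1000
done
```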
Evaluate each fine-tuned adapter on its corresponding benchmark (a loop over all domains is sketched after this list).

- Generality

```bash
CUDA_VISIBLE_DEVICES=0 llamafactory-cli eval --model_name_or_path ./[Llama-3.2-3B or Llama-3.1-8B] --template llama3 --task mmlu_test_None --lang en --n_shot 0 --batch_size 8 --trust_remote_code --adapter_name_or_path ./saves/[Llama-3.2-3B or Llama-3.1-8B]/Generality
```

- Law

```bash
CUDA_VISIBLE_DEVICES=0 llamafactory-cli eval --model_name_or_path ./[Llama-3.2-3B or Llama-3.1-8B] --template llama3 --task mmlu_test_Law --lang en --n_shot 0 --batch_size 8 --trust_remote_code --adapter_name_or_path ./saves/[Llama-3.2-3B or Llama-3.1-8B]/Law
```

- Medicine

```bash
CUDA_VISIBLE_DEVICES=0 llamafactory-cli eval --model_name_or_path ./[Llama-3.2-3B or Llama-3.1-8B] --template llama3 --task mmlu_test_Medicine --lang en --n_shot 0 --batch_size 8 --trust_remote_code --adapter_name_or_path ./saves/[Llama-3.2-3B or Llama-3.1-8B]/Medicine
```

- Math

```bash
CUDA_VISIBLE_DEVICES=0 llamafactory-cli eval --model_name_or_path ./[Llama-3.2-3B or Llama-3.1-8B] --template llama3 --task gsm8k_test_None --lang en --n_shot 0 --batch_size 8 --trust_remote_code --adapter_name_or_path ./saves/[Llama-3.2-3B or Llama-3.1-8B]/Math
```

- Finance

```bash
CUDA_VISIBLE_DEVICES=0 llamafactory-cli eval --model_name_or_path ./[Llama-3.2-3B or Llama-3.1-8B] --template llama3 --task fineval_test_None --lang en --n_shot 0 --batch_size 8 --trust_remote_code --adapter_name_or_path ./saves/[Llama-3.2-3B or Llama-3.1-8B]/Finance
```

- Multi-tasking

```bash
CUDA_VISIBLE_DEVICES=0 llamafactory-cli eval --model_name_or_path ./[Llama-3.2-3B or Llama-3.1-8B] --template llama3 --task bbh_test_None --lang en --n_shot 0 --batch_size 8 --trust_remote_code --adapter_name_or_path ./saves/[Llama-3.2-3B or Llama-3.1-8B]/Multi-tasking
```
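Each evaluation pairs a benchmark task with the adapter trained on the matching domain, so these runs can also be scripted. A minimal sketch (not part of the released scripts) mirroring the commands above; adjust `--model_name_or_path` to wherever your base weights actually live:

```bash
#!/usr/bin/env bash
# Evaluate every domain adapter on its corresponding benchmark; task/adapter pairs mirror the commands above.
MODEL=Llama-3.2-3B   # or Llama-3.1-8B
declare -A TASKS=(
  [Generality]=mmlu_test_None
  [Law]=mmlu_test_Law
  [Medicine]=mmlu_test_Medicine
  [Math]=gsm8k_test_None
  [Finance]=fineval_test_None
  [Multi-tasking]=bbh_test_None
)
for DOMAIN in "${!TASKS[@]}"; do
  CUDA_VISIBLE_DEVICES=0 llamafactory-cli eval --model_name_or_path ./$MODEL \
    --template llama3 --task "${TASKS[$DOMAIN]}" --lang en --n_shot 0 --batch_size 8 \
    --trust_remote_code --adapter_name_or_path ./saves/$MODEL/$DOMAIN
done
```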
If you find our work valuable, we would appreciate your citation:

```bibtex
@misc{zhou2025colacollaborativelowrankadaptation,
      title={CoLA: Collaborative Low-Rank Adaptation},
      author={Yiyun Zhou and Chang Yao and Jingyuan Chen},
      year={2025},
      eprint={2505.15471},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.15471},
}
```
