Dr. Yang Li, Department of Computer Science, Iowa State University
Daniel Agyei Asante, Department of Computer Science, Iowa State University
📃 Paper
Basel is a principled low-rank compression framework that operates directly on the semantic structure of large language model weight matrices. It identifies the bases that encode high-impact semantic features for the target task and removes those with negligible contribution. This reduces the parameter count and memory footprint and improves inference throughput while preserving task accuracy. Basel achieves up to 2.7× model size reduction over state-of-the-art techniques, enabling efficient deployment of language models on edge devices and in cost-sensitive environments. Basel is validated across mathematical reasoning (GSM8K, MATH), code generation (HumanEval, MBPP), and language modeling (WikiText-2), and it composes well with other compression methods, often outperforming them on their own terms.
🔸 Basel combined with 8-bit quantization achieves better accuracy than 4-bit quantization alone
🔸 Basel outperforms many existing low-rank and pruning-based compression methods
🔸 Models compressed with Basel remain stable at deeper compression levels where pruning-based methods' accuracy collapses
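The parameter-saving mechanics behind any rank-r factorization can be sketched with a plain truncated SVD. Note this is only an illustration of the general idea: Basel's actual basis selection is task-aware rather than purely magnitude-based, and the matrix sizes below are arbitrary.

```python
import numpy as np

# Toy "weight matrix" of a linear layer: d_out x d_in.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))

# Truncated SVD: keep only the top-r bases (singular directions).
U, S, Vt = np.linalg.svd(W, full_matrices=False)
r = 64
A = U[:, :r] * S[:r]          # d_out x r, singular values folded in
B = Vt[:r, :]                 # r x d_in

# Parameter count drops from d_out*d_in to r*(d_out + d_in).
dense_params = W.size          # 512 * 512 = 262144
low_rank_params = A.size + B.size  # 64 * (512 + 512) = 65536

# The factorized layer computes x -> A @ (B @ x), approximating W @ x.
W_approx = A @ B
print(dense_params, low_rank_params)
```

With r = 64 on a 512×512 matrix, the factors hold a quarter of the dense parameters; the quality of the approximation then depends on which bases are kept, which is exactly the choice Basel optimizes for the target task.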
git clone https://github.com/Iowa-State-University-AI-System-Group/Basel.git
cd Basel
pip install --index-url https://download.pytorch.org/whl/cu124 \
torch==2.4.0+cu124 torchvision==0.19.0+cu124 torchaudio==2.4.0+cu124
pip install -r requirements.txt
Part 1 takes a dense model as input and generates a low-rank factorized model together with a dimension file specifying the shapes of its weight matrices.
Compression code: train_bs_part1.py
Part 2 takes the factorized model and dimension file as input, fine-tunes the factorized model, and decompresses it into an equivalent dense model. The decompressed dense model exists solely for convenient performance evaluation of the fine-tuned low-rank model.
Fine-tuning + decompression code: train_bs_part2.py
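The decompression step conceptually folds the two low-rank factors of each layer back into a single dense matrix, so standard evaluation harnesses can run on the result unchanged. A minimal sketch (the factor names and shapes here are hypothetical, not Basel's checkpoint format):

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, r = 256, 256, 32

# Hypothetical fine-tuned low-rank factors of one layer.
A = rng.standard_normal((d_out, r))
B = rng.standard_normal((r, d_in))

# "Decompression": multiply the factors back into one dense matrix.
W_dense = A @ B

# The dense layer computes the same function as the factorized one.
x = rng.standard_normal(d_in)
assert np.allclose(W_dense @ x, A @ (B @ x))
```

Since the dense matrix is exactly the product of the factors, decompression changes only the storage layout, not the function the model computes.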
@article{basel,
title={{Streamlining Language Models via Semantic Basis Analysis}},
author={Li, Yang and Asante, Daniel Agyei and Zhao, Changsheng and Chang, Ernie and Shi, Yangyang and Chandra, Vikas},
journal={Transactions on Machine Learning Research},
year={2025},
}
@article{basis_selection,
title={{Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications}},
author={Li, Yang and Asante, Daniel Agyei and Zhao, Changsheng and Chang, Ernie and Shi, Yangyang and Chandra, Vikas},
journal={arXiv preprint arXiv:2405.15877},
year={2024},
}



