Dr. Yang Li
Department of Computer Science, Iowa State University
Website
Daniel Agyei Asante
Department of Computer Science, Iowa State University
Website

📖 Streamlining Language Models via Semantic Basis Analysis

📃 Paper

Basel is a principled low-rank compression framework that operates directly on the semantic structure of large language model weight matrices. It identifies the bases that encode high-impact semantic features for the target task and removes those with negligible contribution, reducing the parameter count and memory footprint and improving inference throughput while preserving task accuracy. Basel achieves up to 2.7× model size reduction compared to state-of-the-art techniques and enables efficient deployment of language models on edge devices and in cost-sensitive environments. Basel is validated on mathematical reasoning (GSM8K, MATH), code generation (HumanEval, MBPP), and language modeling (WikiText-2). Basel also composes well with other compression methods, such as quantization, and often outperforms them.

🔸 Basel combined with 8-bit quantization achieves better accuracy than 4-bit quantization alone

🔸 Basel outperforms many existing low-rank and pruning-based compression methods

🔸 Models compressed with Basel remain stable at aggressive compression levels where pruning accuracy collapses


🖥️ Software Dependencies

1) Get the repo

git clone https://github.com/Iowa-State-University-AI-System-Group/Basel.git
cd Basel

2) Create & activate a virtual environment
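For example, using Python's built-in venv module (the environment name `.venv` is just a convention; any name works):

```shell
# Create an isolated environment in .venv and activate it
python -m venv .venv
source .venv/bin/activate
```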

3) Install PyTorch

pip install --index-url https://download.pytorch.org/whl/cu124 \
  torch==2.4.0+cu124 torchvision==0.19.0+cu124 torchaudio==2.4.0+cu124

4) Install the project requirements

pip install -r requirements.txt

🧩 Basel Part 1: Compression

Part 1 takes a dense model as input and generates a low-rank factorized model together with a dimension file specifying the shapes of its weight matrices.
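Conceptually, replacing a dense weight with a pair of low-rank factors can be sketched with a plain SVD rank truncation. This is only an illustration of the idea, not the repository's code: Basel selects bases by their semantic impact on the target task, not by singular value alone.

```python
import numpy as np

def factorize(W, r):
    """Factor a dense weight W into low-rank factors U_r @ V_r,
    keeping only the r bases with the largest singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :r] * s[:r]   # absorb singular values into the left factor
    V_r = Vt[:r, :]
    return U_r, V_r          # stored in place of the dense W

rng = np.random.default_rng(0)
# A synthetic 64x64 weight matrix that is exactly rank 8
W = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 64))
U_r, V_r = factorize(W, 8)
# The factors hold 2 * 64 * 8 = 1024 parameters vs. 4096 for the dense W,
# and a genuinely rank-8 matrix is recovered (near) exactly.
print(U_r.shape, V_r.shape)
```

The dimension file described above would record the chosen rank (the shapes of `U_r` and `V_r`) for each factorized layer, so the model can be rebuilt later.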

Compression code: train_bs_part1.py

🚀 Basel Part 2: Finetuning + Decompression

Part 2 takes the factorized model and dimension file as input, fine-tunes the factorized model, and decompresses it into an equivalent dense model. The decompressed dense model is generated solely for convenient performance evaluation of the fine-tuned low-rank model.
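The decompression step amounts to multiplying each pair of fine-tuned low-rank factors back into a dense weight. A minimal sketch, where the `dims` dictionary is a hypothetical stand-in for the dimension file:

```python
import numpy as np

# Hypothetical dimension-file entry: (out_dim, rank, in_dim) per layer
dims = {"layer0": (64, 8, 64)}

rng = np.random.default_rng(1)
out_dim, rank, in_dim = dims["layer0"]
U_r = rng.standard_normal((out_dim, rank))  # stand-in fine-tuned factor
V_r = rng.standard_normal((rank, in_dim))   # stand-in fine-tuned factor

# Merge the factors into an equivalent dense weight for evaluation
W_dense = U_r @ V_r
print(W_dense.shape)
```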

Fine-tuning + decompression code: train_bs_part2.py

💪 What Basel Delivers

1. Mathematical Reasoning Task

2. Programming Task

3. Language Modeling Task

📝 Citation

@article{basel,
      title={{Streamlining Language Models via Semantic Basis Analysis}}, 
      author={Li, Yang and Asante, Daniel Agyei and Zhao, Changsheng and Chang, Ernie and Shi, Yangyang and Chandra, Vikas},
      journal={Transactions on Machine Learning Research}, 
      year={2025},
}

@article{basis_selection,
      title={{Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications}}, 
      author={Li, Yang and Asante, Daniel Agyei and Zhao, Changsheng and Chang, Ernie and Shi, Yangyang and Chandra, Vikas},
      journal={arXiv preprint arXiv:2405.15877}, 
      year={2024},
}
