BGEMM & BWTA (Binary Weight and Ternary Activation)

This repository implements Binary General Matrix Multiply (BGEMM) and Binary Weight and Ternary Activation (BWTA) GEMM with custom CUDA kernels. Thanks to FP6-LLM for the wheels!

Still under development.

Installation

Tested on SM80 and SM86 architectures.

cd lowbit_kernel && make bgemm
cd .. && pip3 install .

Speed and correctness test

cd tests/python
# directly test GEMM loops
python3 test_kernel.py  
# tiny training demo of a 3-layer MLP; set --model to one of bnn_bgemm, bnn_fp16, fp16
python3 test_model_demo.py --model=[bnn_bgemm, bnn_fp16, fp16]

TODO

PyTorch extension and layers in bgemm_linear.py: linear and matmul layers (standard matmul with {-1, 1} $\times$ {-1, 1}, and $A \times V$ with {0, 1} $\times$ {-1, 1}).
Simple MLP demo.
BERT model demo using BGEMM kernel.
More bitwidth support, e.g., $W_1A_{f16}$, $W_1A_{f8}$.
Support arbitrary $N$ (batch size).
Optimize shared memory usage.
Support the larger-bandwidth instruction (m16n8k256) for further speedup.
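
The two product types named in the first TODO item ({-1, 1} $\times$ {-1, 1} and {0, 1} $\times$ {-1, 1}) can be illustrated on the CPU with bit packing and popcount. The sketch below is plain Python and is not the repository's CUDA kernel or its API; all function names (pack_signs, pack_bits, dot_pm1_pm1, dot_01_pm1, bgemm_reference) are hypothetical and chosen for illustration only. It assumes +1 is stored as bit 1 and -1 as bit 0, which is one common convention and may differ from the encoding used in lowbit_kernel.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into an int (assumed encoding: +1 -> bit 1, -1 -> bit 0)."""
    word = 0
    for i, v in enumerate(values):
        if v not in (-1, 1):
            raise ValueError("expected only -1 or +1")
        if v == 1:
            word |= 1 << i
    return word


def pack_bits(values):
    """Pack a sequence of 0/1 values into an int, bit i holding values[i]."""
    word = 0
    for i, v in enumerate(values):
        if v not in (0, 1):
            raise ValueError("expected only 0 or 1")
        if v == 1:
            word |= 1 << i
    return word


def popcount(word):
    """Count the set bits of a non-negative int."""
    return bin(word).count("1")


def dot_pm1_pm1(a_word, b_word, k):
    """Dot product of two length-k {-1, 1} vectors given as packed words.

    Equal bits contribute +1 and differing bits contribute -1,
    so the result is k - 2 * popcount(a XOR b).
    """
    return k - 2 * popcount(a_word ^ b_word)


def dot_01_pm1(a_word, v_word):
    """Dot product of a {0, 1} vector with a {-1, 1} vector, both packed.

    Positions where a is 1 contribute +1 if v is +1 and -1 otherwise,
    so the result is 2 * popcount(a AND v) - popcount(a).
    """
    return 2 * popcount(a_word & v_word) - popcount(a_word)


def bgemm_reference(a_rows, b_cols, k):
    """Reference binary GEMM: a_rows and b_cols are lists of packed {-1, 1} words of length k."""
    return [[dot_pm1_pm1(a, b, k) for b in b_cols] for a in a_rows]


if __name__ == "__main__":
    a = [1, -1, 1, 1]
    b = [1, 1, -1, 1]
    p = [1, 0, 1, 1]
    print(dot_pm1_pm1(pack_signs(a), pack_signs(b), len(a)))
    print(dot_01_pm1(pack_bits(p), pack_signs(b)))
```

A reference like this can serve as a slow correctness check for small shapes; the integer results should match an ordinary dot product over the unpacked values.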

Reference

FP6-LLM, arXiv:2401.14112.

@misc{xia2024fp6llm,
      title={FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design}, 
      author={Haojun Xia and Zhen Zheng and Xiaoxia Wu and Shiyang Chen and Zhewei Yao and Stephen Youn and Arash Bakhtiari and Michael Wyatt and Donglin Zhuang and Zhongzhu Zhou and Olatunji Ruwase and Yuxiong He and Shuaiwen Leon Song},
      year={2024},
      eprint={2401.14112},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
