Pinned
- Awesome-Edge-LLMs (Public)
  This is a repository accompanying the survey Edge AI Meets LLM (coming soon), containing a comprehensive list of papers, codebases, toolchains, and open-source frameworks. It is intended to serve a…
- BGEMM-CUDA (Public)
  BGEMM-CUDA is a CUDA-based low-bit GEMM kernel library for efficient neural network inference. It implements optimized binary and ternary matrix multiplication primitives, including binary-weight a…
- MoE-Slimming (Public)
  Official ICML 2026 Spotlight implementation for structural MoE compression, including attribution-guided channel scoring, coverage-maximized pruning, compact checkpoint construction, and fine-tunin…
  Language: Python
- MP-Sparse-Attn (Public)
  MP-Sparse-Attn provides Triton kernels for Diagonal-Tiled Mixed-Precision Attention, targeting efficient low-bit MXFP inference for Transformer models. It combines tile-level mixed-precision comput…
  Language: Python · 2 stars
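To illustrate why binary GEMM can be so cheap (a sketch, not code from BGEMM-CUDA): if weights and activations are restricted to {-1, +1} and packed one bit per element, a length-n dot product reduces to n - 2 * popcount(a XOR b), because every mismatched position contributes -1 and every matching one +1. The pure-Python functions below (`pack_signs`, `binary_dot`, `binary_gemm`) are hypothetical names and assume that +1 → bit 1, -1 → bit 0 encoding; a real CUDA kernel would pack into machine words and use a hardware popcount, and ternary {-1, 0, +1} values would typically need an extra bit-plane to mark zeros.

```python
def pack_signs(vec):
    """Pack a {-1, +1} vector into an int bitmask: bit i is 1 iff vec[i] == +1."""
    bits = 0
    for i, v in enumerate(vec):
        if v not in (-1, 1):
            raise ValueError("binary GEMM sketch expects entries in {-1, +1}")
        if v == 1:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed length-n sign vectors.

    matches - mismatches = (n - d) - d = n - 2 * d, with d = popcount(a XOR b).
    """
    return n - 2 * bin(a_bits ^ b_bits).count("1")


def binary_gemm(A, B):
    """C = A @ B for A (M x K) and B (K x N) whose entries are all -1 or +1."""
    k = len(A[0])
    if len(B) != k:
        raise ValueError("inner dimensions must match")
    rows = [pack_signs(r) for r in A]                                  # pack rows of A
    cols = [pack_signs([B[i][j] for i in range(k)]) for j in range(len(B[0]))]  # pack columns of B
    return [[binary_dot(r, c, k) for c in cols] for r in rows]
```

For example, `binary_gemm([[1, -1, 1], [-1, -1, 1]], [[1, -1], [1, 1], [-1, 1]])` returns `[[-1, -1], [-3, 1]]`, the same result as an ordinary integer matrix product.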
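Structural channel pruning of an MoE expert can be sketched generically. For an FFN expert with up-projection W_in (d_ff rows of d_model weights) and down-projection W_out (d_model rows of d_ff weights), removing intermediate channel j means deleting row j of W_in and column j of W_out, which leaves a smaller dense expert (a gated expert would also drop row j of its gate projection). The sketch below assumes that layout and uses a simple weight-norm product as a hypothetical stand-in score; it is not MoE-Slimming's attribution-guided scoring or coverage-maximized selection, which the description above does not detail. The function names are illustrative only.

```python
import math


def magnitude_scores(w_in, w_out):
    # Hypothetical stand-in heuristic (NOT the repository's attribution method):
    # score channel j by ||row j of W_in|| * ||column j of W_out||.
    return [
        math.sqrt(sum(v * v for v in w_in[j]))
        * math.sqrt(sum(row[j] ** 2 for row in w_out))
        for j in range(len(w_in))
    ]


def prune_expert_channels(w_in, w_out, scores, keep):
    """Keep the `keep` highest-scoring intermediate channels of one FFN expert.

    w_in:   d_ff rows, each a list of d_model weights (up-projection)
    w_out:  d_model rows, each a list of d_ff weights (down-projection)
    scores: one importance score per intermediate channel (d_ff entries)
    Returns the compact (w_in, w_out) and the kept channel indices.
    """
    if not 0 < keep <= len(scores):
        raise ValueError("keep must be between 1 and d_ff")
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    kept = sorted(ranked[:keep])                          # preserve original channel order
    new_w_in = [list(w_in[j]) for j in kept]              # drop rows of W_in
    new_w_out = [[row[j] for j in kept] for row in w_out]  # drop matching columns of W_out
    return new_w_in, new_w_out, kept
```

With `w_in = [[1, 0], [0, 0.1], [2, 2]]` and `w_out = [[1, 0.1, 1], [0, 0.1, 1]]`, keeping two channels drops the low-magnitude channel 1 and returns kept indices `[0, 2]`.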
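For context on the MXFP formats that MP-Sparse-Attn targets: in the OCP Microscaling (MX) formats, a block of values (32 elements in the specification) shares one power-of-two scale, and each element is stored in a narrow floating-point type such as FP4 E2M1, whose non-negative values are 0, 0.5, 1, 1.5, 2, 3, 4 and 6. The pure-Python sketch below simulates MXFP4 quantization of a single block. It does not model the repository's diagonal tiling or mixed-precision policy, the function names are hypothetical, and ties round toward the smaller magnitude rather than to nearest-even as a simplification.

```python
import math

# Non-negative magnitudes representable in FP4 E2M1 (OCP MX element format)
FP4_E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_EMAX = 2  # exponent of the largest normal E2M1 value: 6 = 1.5 * 2**2


def round_to_fp4(v):
    """Round to the nearest E2M1 value, saturating at +/-6.

    Simplification: on an exact tie, min() keeps the smaller magnitude.
    """
    mag = min(FP4_E2M1_VALUES, key=lambda q: abs(abs(v) - q))
    return math.copysign(mag, v)


def quantize_mxfp4_block(block):
    """Simulated MXFP4 quantization of one block (nominally 32 values).

    Returns (scale_exponent, fp4_elements); element e represents e * 2**scale_exponent.
    """
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return 0, [0.0] * len(block)
    # Shared power-of-two scale chosen so the scaled block max lies in [4, 8)
    scale_exp = math.floor(math.log2(max_abs)) - FP4_EMAX
    scale = 2.0 ** scale_exp
    return scale_exp, [round_to_fp4(v / scale) for v in block]


def dequantize_mxfp4_block(scale_exp, elems):
    return [e * 2.0 ** scale_exp for e in elems]
```

For example, `quantize_mxfp4_block([0.02, 0.3])` returns `(-4, [0.5, 4.0])`, which dequantizes to `[0.03125, 0.25]`: the shared scale is 2**-4, and the coarse FP4 grid is where the quantization error comes from.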