gdn

Here are 9 public repositories matching this topic...

Production-grade runtime patches for vLLM (45+ patches) — Qwen3.6-35B-A3B-FP8 hybrid GDN+MoE on NVIDIA Ampere (SM 80-86). 127 tok/s MTP free-form, 99 tok/s suffix tool-call (max 175). TurboQuant k8v4 KV cache, 256K context verified to 252K. P67 multi-query kernel + Suffix Decoding + adaptive ngram K. Zero source modifications.

  • Updated Apr 27, 2026
  • Python
