BudEcosystem/Awesome-vLLM-plugins

🚀 Awesome vLLM Plugins

A curated list of plugins and extensions built on top of vLLM: hardware backends, custom model integrations, and general plugins that extend vLLM without forking the core.

🎉 PRs welcome! See Contributing for how to add new plugins.

🔌 Hardware / Platform Plugins

✅ Officially Listed Hardware Plugins

These plugins live under the vLLM hardware plugin RFC and are listed in the official installation docs.

| Accelerator | Package / Name | Repository | Notes |
| --- | --- | --- | --- |
| 🔷 Ascend NPU | vllm-ascend | GitHub | Community-maintained Ascend NPU plugin; reference design for hardware plugins. |
| 💠 Intel Gaudi (HPU) | (from source) | GitHub | Gaudi hardware plugin; optimized kernels and integration. |
| 🔵 IBM Spyre AIU | vllm-spyre | GitHub | vLLM backend for IBM Spyre AIU. |
| 💠 Intel OpenVINO | (from source) | GitHub | OpenVINO backend for vLLM to run LLMs efficiently on Intel CPUs and GPUs. Focuses on AVX2-class CPUs and Intel iGPU/dGPU, with support for features like prefix caching and chunked prefill. |

📝 Note: The table above is intentionally kept aligned with the hardware plugin list in the official vLLM docs. Check the vLLM installation page for the latest status and compatibility notes.

🧩 Additional Platform Plugins

Plugins that expose additional accelerators or platform backends using vllm.platform_plugins (or that act as de facto hardware integrations).

| Platform | Package / Name | Repository | Notes |
| --- | --- | --- | --- |
| 🎮 MetaX MACA GPU | (from source) | GitHub | MACA GPU backend with custom platform implementation. |
| 🔶 Tenstorrent | (from source) | GitHub | Platform plugin that extracts the Tenstorrent platform implementation from a vLLM fork and packages it as a standalone plugin, enabling vLLM v1 architecture models to run on Tenstorrent hardware. |
| 🟣 Cambricon MLU | (from source) | GitHub | Hardware plugin for Cambricon MLU accelerators. |
| ⚡ Rebellions ATOM / REBEL NPU | vllm-rbln | GitHub | Plugin for Rebellions NPUs (ATOM / REBEL). |
| 🔴 Baidu Kunlun XPU | (from source) | GitHub | Hardware plugin for Kunlun XPU (Kunlun3 P800). Supports transformer, MoE, embedding, and multimodal models with quantization and LoRA. |
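
As a rough illustration of how a platform plugin hooks in: the vLLM plugin documentation describes the function behind a vllm.platform_plugins entry point as returning the fully qualified name of a platform class, or None when the hardware is not usable in the current environment. The sketch below assumes that contract. The package vllm_my_accel, the runtime module my_accel_runtime, and the class MyAccelPlatform are hypothetical placeholders, not names taken from any plugin listed above, and the runtime_module parameter exists only to make the function easy to exercise.

```python
import importlib.util
from typing import Optional

# pyproject.toml of the hypothetical package would declare:
#   [project.entry-points."vllm.platform_plugins"]
#   my_accel = "vllm_my_accel:register"

# Hypothetical names, used for illustration only.
RUNTIME_MODULE = "my_accel_runtime"
PLATFORM_CLASS = "vllm_my_accel.platform.MyAccelPlatform"


def register(runtime_module: str = RUNTIME_MODULE) -> Optional[str]:
    """Entry-point function for a platform plugin (sketch).

    Returns the platform class path when the vendor runtime can be
    imported, and None otherwise so another platform can be selected.
    """
    if importlib.util.find_spec(runtime_module) is None:
        return None
    return PLATFORM_CLASS
```

Returning a string instead of the class itself keeps the check cheap: nothing heavy is imported until a platform has actually been chosen.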

🧠 Model & Architecture Plugins

Plugins that register new model architectures or large out-of-tree projects using vllm.general_plugins (or similar).

| Model / Architecture | Name | Repository / Distribution | Notes |
| --- | --- | --- | --- |
| 🌀 Diffusion LMs (LLaDA, Dream) | diffuplug | GitHub | Registers diffusion LMs such as LLaDA and Dream with vLLM. Implements custom samplers (e.g. LLaDASampler) and adapters so diffusion language models can run through vLLM's engine. |
| 🤖 OpenVLA (Vision-Language-Action) | openvla-vllm | GitHub | vLLM plugin for OpenVLA that accelerates offline batch inference for vision-language-action models, without needing custom compilation. |

💡 If you know of other model-registration plugins (e.g. for custom multimodal architectures), please open a PR and add them here!
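
For orientation, model registration through vllm.general_plugins usually comes down to one function that calls vLLM's ModelRegistry. The sketch below follows the pattern shown in the vLLM plugin documentation (get_supported_archs and register_model with a "module:Class" string). The architecture name MyCustomLM and the module my_vllm_models.custom_lm are hypothetical, and the optional registry argument is an addition made here so the function can be tried without vLLM installed.

```python
from typing import Any, Optional

# pyproject.toml of the hypothetical package would declare:
#   [project.entry-points."vllm.general_plugins"]
#   register_my_model = "my_vllm_models:register"

# Hypothetical names, used for illustration only.
ARCH_NAME = "MyCustomLM"
MODEL_PATH = "my_vllm_models.custom_lm:MyCustomLM"  # "module:Class" form


def register(registry: Optional[Any] = None) -> None:
    """Register the architecture once; safe to call repeatedly."""
    if registry is None:
        # Imported lazily so the module can be inspected without vLLM.
        from vllm import ModelRegistry
        registry = ModelRegistry
    if ARCH_NAME not in registry.get_supported_archs():
        # Passing a string defers importing the model code until needed.
        registry.register_model(ARCH_NAME, MODEL_PATH)
```

The membership check matters because plugin functions can run in more than one process, so they should be idempotent.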

🎯 Decoder Plugins

Plugins that implement custom decoding strategies, sampling methods, and logits processors for advanced text generation.

| Decoder | Name | Repository | Notes |
| --- | --- | --- | --- |
| 🧠 Chain-of-Thought Decoder | vllm-cot-decoder | GitHub | Confidence-weighted CoT decoding that adapts sampling based on model certainty. Sharpens predictions when confident, enables exploration when uncertain. ~1.1× overhead. |
| 📊 Entropy Decoder | vllm-entropy-decoder | GitHub | Adaptive entropy-based decoding using entropy + varentropy metrics. Dynamically adjusts temperature and filtering based on model uncertainty patterns. |

🔬 These decoder plugins are part of the BudEcosystem vLLM Plugins collection.
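
To make "entropy + varentropy" concrete, here is a minimal, dependency-free sketch of how both quantities can be computed from next-token logits and turned into a temperature choice. It is not the implementation used by either plugin above: the thresholds and scaling factors are invented for illustration, and a real decoder would work on tensors inside a logits processor.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_and_varentropy(probs: Sequence[float]) -> Tuple[float, float]:
    """Return (entropy, varentropy) in nats.

    Entropy is the expected surprisal -log p; varentropy is the
    variance of that surprisal under the same distribution.
    """
    pairs = [(p, -math.log(p)) for p in probs if p > 0.0]
    h = sum(p * s for p, s in pairs)
    v = sum(p * (s - h) ** 2 for p, s in pairs)
    return h, v


def adaptive_temperature(logits: Sequence[float], base: float = 1.0) -> float:
    """Pick a temperature from uncertainty (hypothetical thresholds)."""
    h, v = entropy_and_varentropy(softmax(logits))
    if h < 0.5 and v < 0.5:
        return base * 0.5  # confident: sharpen the distribution
    if h > 3.0:
        return base * 1.3  # uncertain: allow more exploration
    return base
```

With these made-up thresholds, a sharply peaked distribution gets a lower temperature, while a flat distribution over many tokens gets a higher one.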

⚡ Performance / Optimization Plugins

Plugins that provide inference optimizations, speculative decoding, parallelism strategies, and performance enhancements for vLLM.

| Plugin | Name | Repository | Notes |
| --- | --- | --- | --- |
| ❄️ Arctic Inference | arctic-inference | GitHub | Snowflake's comprehensive optimization suite: Arctic Ulysses (sequence parallelism), Shift Parallelism, speculative decoding (Arctic Speculator, Suffix Decoding), SwiftKV, and 16x faster embeddings. Achieves 3.4x faster request completion. |

⚙️ Feature / IO / Integration Plugins

Plugins that focus on features, scheduling, logging, or IO processing rather than new hardware or models. Many of these are small code packages that hook into:

  • 🔧 vllm.general_plugins – for patches and model registration.
  • 📥 vllm.io_processor_plugins – for custom request pre/post-processing.
  • 📊 vllm.stat_logger_plugins – for metrics exporters and custom stats.

Examples you might encounter:

  • 📅 Custom scheduling / patch packages

    • Blog & reference implementation: "vLLM Custom Patches" style plugins that add features like priority-based scheduling via a general plugin, typically enabled through VLLM_PLUGINS or custom environment variables (e.g. VLLM_CUSTOM_PATCHES).
  • 📈 Logging / metrics plugins

    • Packages that register stat loggers via vllm.stat_logger_plugins to export metrics to systems like Prometheus, custom APM, or internal dashboards.
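
The patch-package pattern can be sketched as a general plugin whose entry-point function reads an environment variable and applies only the patches that were requested. Everything specific here is an assumption: the comma-separated format of VLLM_CUSTOM_PATCHES, the patch name priority_scheduler, and the registry helpers are illustrative and not taken from the referenced blog post. The patch body is a stub that records its name where a real one would wrap or replace vLLM code.

```python
import os
from typing import Callable, Dict, List, Mapping

# Hypothetical registry of named patches.
PATCHES: Dict[str, Callable[[], None]] = {}
APPLIED: List[str] = []


def patch(name: str) -> Callable[[Callable[[], None]], Callable[[], None]]:
    """Decorator that records a patch function under a name."""
    def decorator(fn: Callable[[], None]) -> Callable[[], None]:
        PATCHES[name] = fn
        return fn
    return decorator


@patch("priority_scheduler")
def _priority_scheduler() -> None:
    # Stub: a real patch would modify vLLM scheduling behaviour here.
    APPLIED.append("priority_scheduler")


def register(env: Mapping[str, str] = os.environ) -> List[str]:
    """Entry-point function: apply patches named in VLLM_CUSTOM_PATCHES.

    Assumes a comma-separated list, e.g. "priority_scheduler,other".
    """
    raw = env.get("VLLM_CUSTOM_PATCHES", "")
    requested = [n.strip() for n in raw.split(",") if n.strip()]
    for name in requested:
        if name not in PATCHES:
            raise ValueError(f"unknown patch: {name!r}")
        PATCHES[name]()
    return requested
```

Failing loudly on an unknown name is a design choice for this sketch; it turns a typo in the variable into a startup error rather than a feature that silently never activates.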

🚧 Most of these are still emerging and tend to be internal to companies. This section is intentionally left open for community contributions.

📚 Official Docs & Specs

📖 Tutorials, Examples & Blog Posts

🏛️ Official Resources

🌐 Community Guides

🤝 Contributing

Spotted a new plugin, blog post, or tutorial?

  1. ➕ Add an entry under the appropriate section:
    • Include:
      • Name
      • Short one-line description
      • Link (GitHub, docs, PyPI, or HF card)
      • Plugin type (general / platform / IO / stats / etc.)
  2. 🔤 Sort alphabetically within each subsection.
  3. 📬 Open a PR with a brief explanation:
    • What does the plugin do?
    • Which plugin group(s) does it register under?
    • Minimum vLLM version (if known).

This repo aims to list plugins, not generic vLLM forks or random scripts. As a rule of thumb: if it doesn't register entry points under a vllm.*_plugins group, it probably doesn't belong here.


⭐ Star this repo if you find it useful! ⭐
