Update vllm/vllm-openai Docker tag to v0.22.0#79

Open
renovate[bot] wants to merge 1 commit into main from renovate/vllm-vllm-openai-0.x

renovate Bot commented Jun 12, 2026

This PR contains the following updates:

| Package | Update | Change | Pending |
|---|---|---|---|
| vllm/vllm-openai (source) | minor | v0.21.0 → v0.22.0 | v0.23.0 (+1) |
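
The "minor" label above follows from comparing the two tags component by component. A minimal sketch of that classification, assuming plain `vMAJOR.MINOR.PATCH` tags; `parse_tag` and `classify_update` are hypothetical helper names, and Renovate's own versioning logic handles many more tag shapes than this:

```python
import re


def parse_tag(tag: str) -> tuple:
    """Parse a tag such as 'v0.22.0' into (major, minor, patch) integers."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"not a plain semver tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


def classify_update(current: str, new: str) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for a tag change."""
    old_v, new_v = parse_tag(current), parse_tag(new)
    if new_v <= old_v:
        return "none"  # same version or a downgrade
    if new_v[0] != old_v[0]:
        return "major"
    if new_v[1] != old_v[1]:
        return "minor"
    return "patch"


print(classify_update("v0.21.0", "v0.22.0"))  # minor
```

Note that for a 0.x series a minor bump may still carry breaking changes, which is why the Deprecations & Removals section below is worth reading before merging.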

Release Notes

vllm-project/vllm (vllm/vllm-openai)

v0.22.0

Compare Source

Highlights

This release features 459 commits from 230 contributors (63 new)!

  • DeepSeek V4 maturity: DeepSeek V4 received a major hardening pass this cycle — the model was reorganized into a dedicated vllm/models/deepseek_v4/ package (#​43004, #​43039, #​43073, #​43077, #​43149), gained NVFP4 fused MoE support (#​42209), full + piecewise CUDA graph (#​42604), and MTP speculative decoding (#​43385). A large set of fused kernels (MegaMoE, mhc, Q-norm, indexer, sparse MLA) and ROCm parity fixes landed alongside accuracy fixes (#​42810, #​43710).
  • Model Runner V2 advances toward default: MRv2 is now the default for Qwen3 dense models, and vLLM falls back to MRv1 for features that MRv2 does not yet support (#39337). Also landed: sleep-mode weight reload (#42673), update_config (#42783), and shared KV-cache layers (#35045), plus many correctness fixes.
  • Experimental Rust frontend: A new Rust front-end integration landed (#​40848), with the implementation moved into the tree (#​43283) and a DP Supervisor for data-parallel serving (#​40841).
  • Batch invariance, faster: Batch-invariant inference gained Cutlass FP8 support for a 28.9% end-to-end latency improvement (#​40408), compile-mode support on SM80 (#​42456), and an NVFP4 Cutlass linear path (#​39912).
  • Multi-tier KV cache offloading: A new multi-tier KV cache offloading framework (#​40020) with a Python filesystem secondary tier (#​41735), DSv4 support (#​43142), and Mooncake disk offloading (#​42689) extends offloading beyond CPU memory.
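
To make the multi-tier offloading idea concrete, here is a toy sketch of a bounded in-memory tier that spills evicted blocks to a filesystem tier and promotes them back on access. It is not vLLM's implementation: the `TwoTierCache` class, the `.blk` file naming, and the LRU policy are all assumptions chosen for illustration.

```python
import os
import tempfile
from collections import OrderedDict
from typing import Optional


class TwoTierCache:
    """Toy two-tier block store (hypothetical, not the vLLM framework)."""

    def __init__(self, capacity: int, spill_dir: str) -> None:
        self.capacity = capacity      # max blocks held in the primary tier
        self.spill_dir = spill_dir    # directory acting as the secondary tier
        self.primary = OrderedDict()  # key -> block bytes, oldest first

    def _path(self, key: str) -> str:
        # Assumes keys are safe to use as file names.
        return os.path.join(self.spill_dir, f"{key}.blk")

    def put(self, key: str, block: bytes) -> None:
        self.primary[key] = block
        self.primary.move_to_end(key)
        # Spill least-recently-used blocks once the primary tier is full.
        while len(self.primary) > self.capacity:
            old_key, old_block = self.primary.popitem(last=False)
            with open(self._path(old_key), "wb") as f:
                f.write(old_block)

    def get(self, key: str) -> Optional[bytes]:
        if key in self.primary:
            self.primary.move_to_end(key)
            return self.primary[key]
        path = self._path(key)
        if os.path.exists(path):
            with open(path, "rb") as f:
                block = f.read()
            os.remove(path)
            self.put(key, block)  # promote back into the primary tier
            return block
        return None


with tempfile.TemporaryDirectory() as tmp:
    cache = TwoTierCache(capacity=2, spill_dir=tmp)
    for name in ("a", "b", "c"):
        cache.put(name, name.encode())
    print(sorted(os.listdir(tmp)))  # ['a.blk']
    print(cache.get("a"))           # b'a'
```

A real offloading framework also has to deal with asynchronous transfers, block alignment, and failure propagation, which this sketch leaves out.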
Model Support
Engine Core
Hardware & Performance
Large Scale Serving
  • Disaggregated serving (NIXL): lease-renewal TTL for KV blocks on P (#​41383), handshake-failure policy honoring (#​40364), GDN support for PD with NIXL (#​41869), multi-node TP>8 fix (#​39907), side-channel host-selection fix (#​41806).
  • Mooncake: disk offloading in MooncakeStoreConnector (#​42689), HMA support for DSV4 (#​42828), operation metrics (#​43392), load-failure propagation (#​42788), block-aligned full hits (#​43494), finish-after-preemption handling (#​43281).
  • Data parallel: DP Supervisor (#​40841), publish request counts at engine-step start (#​41626), forward X-data-parallel-rank header (#​42330).
  • EPLB: change default EPLB communicator (#​43110), VLM-wrapper init fix (#​39805), remove dead torch.accelerator.synchronize() (#​40733).
  • LoRA: one-shot Triton kernel for MoE LoRA (#​42290), simultaneous 2D & 3D MoE LoRA adapters (#​42242), reduced 2D-weight memory under EP (#​42737), MoE LoRA align-kernel grid fix (#​40131).
Quantization
  • MXFP4: linear layers + compressed-tensors integration (#​41664), CPU W4A16 MoE (#​41922), XPU mxfp8 MoE (#​41918).
  • NVFP4: DeepSeek V4 fused MoE (#​42209), ModelOpt W4A16 NVFP4 fused MoE + mixed-precision dispatch (#​42566), batch-invariant NVFP4 Cutlass linear (#​39912), FlashInfer TRTLLM NvFP4 monolithic MoE routing fix (#​43223), TRTLLM NVFP4 MoE chunking fix (#​43599).
  • Quark: load Quark NVFP4 checkpoints (#​35859), W8A8 INT8 garbage-output fix on Step-3.5-Flash (#​41892), W4A4 oracle refactor (#​41436).
  • AutoRound: W4A16 support (#​39778).
  • ModelOpt: Qwen3.5/3.6 VLM quantized prefix mapping (#​42546).
  • Framework: rework quantization_config to use QuantKey with activation override (#41566), MoE W4A8 CT migrated to oracle (#42680), AWQ Marlin MoE onto modular WNA16 oracle (#42483), GPTQ consolidation (gptq_marlin → auto_gptq) (#38288).
API & Frontend
  • Rust frontend: integration (#​40848), in-tree code move (#​43283), utility call-ID newtype (#​43405), simplified AuthenticationMiddleware path extraction (#​43426).
  • Responses API: chat_template_kwargs support (#​42272), message-merging fix (#​42189), empty channel/recipient harmony fix (#​35540).
  • Completions: thinking_token_budget support (#​42116) with inverted-condition fix (#​41674); map reasoning_effort to enable_thinking (#​43401).
  • Frontend: truncation side for OpenAI endpoints (#43260), normalize reasoning_content → reasoning (#42664), reworked fastokens integration (#43168), consolidated Speech-to-Text entrypoints (#42370, #42274), beam-search consolidation via BeamSearchMixin (#42946), score/rerank chat-template instructions (#42412).
  • Auth: API-key authorization for /v2 endpoints (#​42594).
  • Offline API: pooling offline API split into PoolingOfflineMixin (#​42267), split offline inference APIs/utils (#​43553).
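
As a rough illustration of the Responses API item above, the following sketch builds (but does not send) a request that passes chat_template_kwargs. Several details are assumptions rather than facts from the release notes: that the field sits at the top level of the request body, that `enable_thinking` is a key the model's chat template understands, the `/v1/responses` path on `localhost:8000`, and the placeholder model name and API key. `build_responses_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_responses_request(base_url, api_key, model, prompt, enable_thinking):
    """Build a POST request for a Responses-style endpoint without sending it."""
    payload = {
        "model": model,
        "input": prompt,
        # Assumption: chat_template_kwargs is accepted as a top-level field
        # and forwarded to the model's chat template.
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Placeholder values; replace with the deployment's real settings.
req = build_responses_request(
    "http://localhost:8000", "EMPTY", "my-model", "Hello", False
)
print(req.get_method(), req.full_url)
print(json.loads(req.data)["chat_template_kwargs"])
```

Passing the request to `urllib.request.urlopen(req)` would send it; check the server's own API documentation for the exact fields it accepts before relying on this shape.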
Build & Dependencies
Deprecations & Removals
  • Removed old locations of get_tokenizer and resolve_hf_chat_template (#​35024).
  • Marked env vars now covered by --moe-backend / --linear-backend (#​43148).
  • Removed deprecated MLA prefill arguments (#​42555).
  • Removed dead CUDA kernels and dead code (#​42767, #​42889, #​43144).

Contributors

@​yewentao256, @​haosdent, @​njhill, @​mgoin, @​jeejeelee, @​AndreasKaratzas, @​NickLucche, @​sfeng33, @​noooop, @​WoosukKwon, @​khluu, @​taneem-ibrahim, @​Dao007forever, @​vadiklyutiy, @​bnellnm, @​ivanium, @​tjtanaa, @​mmangkad, @​hmellor, @​DarkLight1337, @​hickeyma, @​zhenwei-intel, @​jikunshang, @​ronensc, @​benchislett, @​hao-aaron, @​arpera, @​zyongye, @​gau-nernst, @​frida-andersson, @​ZhanqiuHu, @​cleonard530, @​akii96, @​bedeks, @​Isotr0py, @​JasonKeyiL, @​bigPYJ1151, @​zhewenl, @​weizhoublue, @​zxd1997066, @​gnovack, @​chaojun-zhang, @​majian4work, @​chaunceyjiang, @​pschlan-amd, @​amitz-nv, @​yma11, @​dsikka, @​tc-mb, @​shanjiaz, @​jperezdealgaba, @​yzong-rh, @​viktorpusTT, @​TheEpicDolphin, @​MatthewBonanni, @​shen-shanshan, @​hallerite, @​zufangzhu, @​bbrowning, @​divakar-amd, @​ianliuy, @​esmeetu, @​rasmith, @​louie-tsai, @​pmaybank, @​liulanze, @​ZJY0516, @​TheDuyIT, @​wzhao18, @​jinzhen-lin, @​BugenZhao, @​ashwing, @​fuergaosi233, @​hqhq1025, @​shaharmor98, @​pisceskkk, @​lkm2835, @​noa-neria, @​Rohan138, @​whx-sjtu, @​vrdn-23, @​alexagriffith, @​Flink-ddd, @​jeffreywang-anyscale, @​skyloevil, @​ymoslem, @​Lucaskabela, @​kg6-sleipnir, @​woernfl, @​tdoublep, @​GOavi101, @​jmamou, @​PeaBrane, @​KaivalyaMDabhadkar, @​BWAAEEEK, @​MrZ20, @​afierka-intel, @​JoursBleu, @​hissu-hyvarinen, @​mwawrzos, @​CynicDora, @​NoeliaBentancor, @​johncalesp, @​fynnsu, @​fxmarty-amd, @​walterbm, @​liangel-02, @​lgeiger, @​he-yufeng, @​abinggo, @​KrxGu, @​hks-9697-v2, @​Sarah-Salah, @​rebklee, @​aoshen02, @​haic0, @​libinta, @​Zhenzhong1, @​xhx1022, @​b-mu, @​WindChimeRan, @​tpopp, @​charlifu, @​chengyinie, @​ricky-chaoju, @​lyd1992, @​daniel-devlab, @​paulyu12, @​bobofang11235, @​laudney, @​BadrBasowid, @​maeehart, @​PatchouliTIS, @​chunxiaozheng, @​blake-snc, @​southfreebird, @​rbrugaro-amd, @​rasdani, @​dusthunter, @​qizzzh, @​ProExpertProg, @​qianlihuang, @​alec-flowers, @​JisoLya, @​gaozihao-shy, @​rishaps, @​xyang16, @​wendyliu235, @​hlin99, @​tianmu-li, 
@​yuwenzho, @​inisis, @​kfirtoledo, @​roikoren755, @​liranschour, @​vllm-agent, @​blancsw, @​netanel-haber, @​BowenBao, @​czhu-cohere, @​amitport, @​tuukkjs, @​revit13, @​ofirzaf, @​qyYue1389, @​junyanxu, @​gracie-guo, @​sagearc, @​xinyu-intel, @​yiwen101, @​DomBrown, @​tomeras91, @​Dogacel, @​maxdebayser, @​fadara01, @​Terrencezzj, @​izikgo, @​wangrui6, @​kebe7jun, @​rishitdholakia13, @​j9smith, @​meena-at-work, @​dllehr-amd, @​alexeldeib, @​sonusflow, @​lucianommartins, @​AAISSJ, @​DaoyuanLi2816, @​zexplorerhj, @​zhangxin81, @​velonica0, @​fuscof-ibm, @​anishesg, @​zhengluo-nv, @​ylangtsou, @​fangyuchu, @​zx3xyy, @​simondanielsson, @​ruizhang99, @​zixi-qi, @​xwu-intel, @​yufufi, @​wdhongtw, @​mrjunwan-lang, @​wangxiyuan, @​wasnertobias, @​ilmarkov, @​sychen52, @​zhandaz, @​russellb, @​SandishKumarHN, @​juhi10071998, @​itayalroy, @​djmmoss, @​SumanthRH, @​mayuyuace, @​zhougit86, @​meenchen, @​lucifer1004, @​popkart-EZ, @​jzakrzew, @​ffggs, @​huanghua1994, @​orozery, @​danisereb, @​rshavitt, @​Yihuki, @​QingZhou-YangHY, @​Jie-Fang, @​bbartels

New Contributors


Configuration

📅 Schedule: (in timezone Europe/Zurich)

  • Branch creation
    • At any time (no schedule defined)
  • Automerge
    • At any time (no schedule defined)

🚦 Automerge: Enabled.

Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

renovate Bot added the renovate label Jun 12, 2026