Update vllm/vllm-openai Docker tag to v0.22.0 #79
Open
renovate[bot] wants to merge 1 commit into
This PR contains the following updates:
Change: v0.21.0 → v0.22.0
Pending: v0.23.0 (+1)
Release Notes
vllm-project/vllm (vllm/vllm-openai)
v0.22.0 (Compare Source)
Highlights
This release features 459 commits from 230 contributors (63 new)!
vllm/models/deepseek_v4/ package (#43004, #43039, #43073, #43077, #43149), gained NVFP4 fused MoE support (#42209), full + piecewise CUDA graph (#42604), and MTP speculative decoding (#43385). A large set of fused kernels (MegaMoE, mhc, Q-norm, indexer, sparse MLA) and ROCm parity fixes landed alongside accuracy fixes (#42810, #43710).
update_config (#42783), and shared KV-cache layers (#35045), plus many correctness fixes.
Model Support
hf_overrides docs (#42163); EXAONE-4.5 aligned with Transformers update (#42246).
extract_hidden_states (#39949), non-MTP speculation for NemotronH (#43130), shared MTP weights in MRv2 (#42538).
mm_projector dtype fix (#42081).
anyOf/oneOf/$ref resolution re-land (#37831), shared coerce_to_schema_type across MiniMax-M2 / DeepSeek-V3.2 / Seed-OSS parsers (#43006, #43019, #43140).
Engine Core
update_config (#42783), shared KV-cache layers (#35045), FP32 gumbel sampling (#41775), auto-fallback to MRv1 with connectors (#42955), logprob_token_ids correctness (#43125, #41761), prompt-logprobs size fix (#42778).
reset_cache() (#41956), per-request tracking (#42507), store-deferral fix (#41945).
ExpertMapManager (#41046), experts moved to experts/ (#42334), RoutedExperts alias for FusedMoE (#40735), EPLB refactoring for FusedMoE (#41055).
Hardware & Performance
head_dim=512 for FlashInfer TRTLLM attention (#38822), FlashInfer Blackwell GDN prefill (#40717), GDN prefill kernel for SM100 (#43273).
compute_prefill_context/_v_up_proj optimizations (#42460, #42561), penalties Triton kernel (#40657), do_not_specialize in fused FP8 RoPE (#42849), FULL CUDA graph capture for TRITON_MLA decode (#42885).
--cpu-distributed-timeout-seconds (#42968).
Large Scale Serving
X-data-parallel-rank header (#42330).
torch.accelerator.synchronize() (#40733).
Quantization
quantization_config to use QuantKey with activation override (#41566), MoE W4A8 CT migrated to oracle (#42680), AWQ Marlin MoE onto modular WNA16 oracle (#42483), GPTQ consolidation (gptq_marlin → auto_gptq) (#38288).
API & Frontend
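One entry in the API & Frontend notes reads reasoning_content → reasoning (#42664). Assuming this denotes a renamed field on response messages (the note itself does not spell that out), a client that must work on both sides of this upgrade could read whichever key is present. The following is a minimal sketch only: the helper name extract_reasoning and the plain-dict message shape are hypothetical and not part of vLLM.

```python
from typing import Any, Mapping, Optional


def extract_reasoning(message: Mapping[str, Any]) -> Optional[str]:
    """Return reasoning text from a chat message dict, or None if absent.

    Assumption: the message is a plain dict that carries the reasoning text
    under either the newer "reasoning" key or the older "reasoning_content"
    key. The newer key is checked first.
    """
    for key in ("reasoning", "reasoning_content"):
        value = message.get(key)
        # Skip missing, None, or empty values and try the next key.
        if isinstance(value, str) and value:
            return value
    return None
```

Checking the newer key first means that a response carrying both fields resolves to the newer one, so the fallback can be dropped later without changing behavior.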
AuthenticationMiddleware path extraction (#43426).
chat_template_kwargs support (#42272), message-merging fix (#42189), empty channel/recipient harmony fix (#35540).
thinking_token_budget support (#42116) with inverted-condition fix (#41674); map reasoning_effort to enable_thinking (#43401).
reasoning_content → reasoning (#42664), reworked fastokens integration (#43168), consolidated Speech-to-Text entrypoints (#42370, #42274), beam-search consolidation via BeamSearchMixin (#42946), score/rerank chat-template instructions (#42412).
/v2 endpoints (#42594).
PoolingOfflineMixin (#42267), split offline inference APIs/utils (#43553).
Build & Dependencies
manylinux_2_28 base (#41668).
nvidia-cutlass-dsl to 4.5.2 (#42991, #43230, #43745); llguidance to 1.7 (#42150); triton_kernels downgraded to v3.5.1 for gpt-oss (#43135).
setuptools-rust dependency (#43287, #43377), pinned protoc in rust-build stages (#43292).
vllm-openai target (#40275), build mooncake-transfer-engine from source (#42114), AINIC & Thor NIC support (#40453); Python-only installation made optional (#42293).
humming MoE backend dependency added, reverted, then restored with CuPy runtime fix (#42540, #43492, #43530).
Deprecations & Removals
get_tokenizer and resolve_hf_chat_template (#35024).
--moe-backend/--linear-backend (#43148).
Contributors
@yewentao256, @haosdent, @njhill, @mgoin, @jeejeelee, @AndreasKaratzas, @NickLucche, @sfeng33, @noooop, @WoosukKwon, @khluu, @taneem-ibrahim, @Dao007forever, @vadiklyutiy, @bnellnm, @ivanium, @tjtanaa, @mmangkad, @hmellor, @DarkLight1337, @hickeyma, @zhenwei-intel, @jikunshang, @ronensc, @benchislett, @hao-aaron, @arpera, @zyongye, @gau-nernst, @frida-andersson, @ZhanqiuHu, @cleonard530, @akii96, @bedeks, @Isotr0py, @JasonKeyiL, @bigPYJ1151, @zhewenl, @weizhoublue, @zxd1997066, @gnovack, @chaojun-zhang, @majian4work, @chaunceyjiang, @pschlan-amd, @amitz-nv, @yma11, @dsikka, @tc-mb, @shanjiaz, @jperezdealgaba, @yzong-rh, @viktorpusTT, @TheEpicDolphin, @MatthewBonanni, @shen-shanshan, @hallerite, @zufangzhu, @bbrowning, @divakar-amd, @ianliuy, @esmeetu, @rasmith, @louie-tsai, @pmaybank, @liulanze, @ZJY0516, @TheDuyIT, @wzhao18, @jinzhen-lin, @BugenZhao, @ashwing, @fuergaosi233, @hqhq1025, @shaharmor98, @pisceskkk, @lkm2835, @noa-neria, @Rohan138, @whx-sjtu, @vrdn-23, @alexagriffith, @Flink-ddd, @jeffreywang-anyscale, @skyloevil, @ymoslem, @Lucaskabela, @kg6-sleipnir, @woernfl, @tdoublep, @GOavi101, @jmamou, @PeaBrane, @KaivalyaMDabhadkar, @BWAAEEEK, @MrZ20, @afierka-intel, @JoursBleu, @hissu-hyvarinen, @mwawrzos, @CynicDora, @NoeliaBentancor, @johncalesp, @fynnsu, @fxmarty-amd, @walterbm, @liangel-02, @lgeiger, @he-yufeng, @abinggo, @KrxGu, @hks-9697-v2, @Sarah-Salah, @rebklee, @aoshen02, @haic0, @libinta, @Zhenzhong1, @xhx1022, @b-mu, @WindChimeRan, @tpopp, @charlifu, @chengyinie, @ricky-chaoju, @lyd1992, @daniel-devlab, @paulyu12, @bobofang11235, @laudney, @BadrBasowid, @maeehart, @PatchouliTIS, @chunxiaozheng, @blake-snc, @southfreebird, @rbrugaro-amd, @rasdani, @dusthunter, @qizzzh, @ProExpertProg, @qianlihuang, @alec-flowers, @JisoLya, @gaozihao-shy, @rishaps, @xyang16, @wendyliu235, @hlin99, @tianmu-li, @yuwenzho, @inisis, @kfirtoledo, @roikoren755, @liranschour, @vllm-agent, @blancsw, @netanel-haber, @BowenBao, @czhu-cohere, @amitport, @tuukkjs, @revit13, 
@ofirzaf, @qyYue1389, @junyanxu, @gracie-guo, @sagearc, @xinyu-intel, @yiwen101, @DomBrown, @tomeras91, @Dogacel, @maxdebayser, @fadara01, @Terrencezzj, @izikgo, @wangrui6, @kebe7jun, @rishitdholakia13, @j9smith, @meena-at-work, @dllehr-amd, @alexeldeib, @sonusflow, @lucianommartins, @AAISSJ, @DaoyuanLi2816, @zexplorerhj, @zhangxin81, @velonica0, @fuscof-ibm, @anishesg, @zhengluo-nv, @ylangtsou, @fangyuchu, @zx3xyy, @simondanielsson, @ruizhang99, @zixi-qi, @xwu-intel, @yufufi, @wdhongtw, @mrjunwan-lang, @wangxiyuan, @wasnertobias, @ilmarkov, @sychen52, @zhandaz, @russellb, @SandishKumarHN, @juhi10071998, @itayalroy, @djmmoss, @SumanthRH, @mayuyuace, @zhougit86, @meenchen, @lucifer1004, @popkart-EZ, @jzakrzew, @ffggs, @huanghua1994, @orozery, @danisereb, @rshavitt, @Yihuki, @QingZhou-YangHY, @Jie-Fang, @bbartels
New Contributors
Configuration
📅 Schedule: (in timezone Europe/Zurich)
🚦 Automerge: Enabled.
♻ Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.