[vllm] fix: reset all caches after weight updates #6522
Code Review
This pull request introduces a clear_all_caches method in vllm_async_server.py to clear KV, multi-modal, and encoder caches, and updates vllm_rollout.py to call this new method instead of only clearing the KV cache after updating weights. Feedback was provided to replace fragile hardcoded vLLM version checks with dynamic hasattr checks to verify the presence of the cache-clearing methods on the engine.
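Why the caches need resetting at all can be shown with a toy model. The sketch below is not vLLM code: `ToyModel`, its `forward` method, and the `clear_cache` flag are hypothetical names used only for illustration. It assumes nothing beyond the general fact that a cached result computed with old weights is stale once the weights change.

```python
class ToyModel:
    """Hypothetical model that caches outputs per input (illustration only)."""

    def __init__(self, weight):
        self.weight = weight
        self._cache = {}  # input -> output computed with the weights at that time

    def forward(self, x):
        # Reuse a cached output if one exists, otherwise compute and store it.
        if x not in self._cache:
            self._cache[x] = self.weight * x
        return self._cache[x]

    def update_weight(self, weight, clear_cache):
        self.weight = weight
        if clear_cache:
            # Drop every entry computed with the old weights.
            self._cache.clear()


# Without clearing, the cached output still reflects the old weight.
stale = ToyModel(weight=2)
stale.forward(3)
stale.update_weight(5, clear_cache=False)
stale_result = stale.forward(3)

# With clearing, the output is recomputed with the new weight.
fresh = ToyModel(weight=2)
fresh.forward(3)
fresh.update_weight(5, clear_cache=True)
fresh_result = fresh.forward(3)

print(stale_result, fresh_result)
```

The same reasoning applies to each cache named in the summary: any of them left untouched after a weight update can keep serving results derived from the previous weights.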
    if self.node_rank == 0:
        if _VLLM_VERSION >= version.parse("0.9.0"):
            await self.engine.reset_mm_cache()
        if _VLLM_VERSION >= version.parse("0.16.0"):
            await self.engine.reset_encoder_cache()
Hardcoded version checks such as _VLLM_VERSION >= version.parse('0.16.0') are fragile and can lead to bugs. The check tests the reported version string, not what the engine can actually do: on a custom fork or a build with backported features that reports a version below 0.16.0, it evaluates to False and reset_encoder_cache() is never called, even though the installed vLLM supports it.
A much more robust and idiomatic approach is to dynamically check for the presence of these methods on the engine using hasattr(). This ensures compatibility across different vLLM versions, custom forks, or backported features without relying on hardcoded version strings.
Suggested change:
-    if self.node_rank == 0:
-        if _VLLM_VERSION >= version.parse("0.9.0"):
-            await self.engine.reset_mm_cache()
-        if _VLLM_VERSION >= version.parse("0.16.0"):
-            await self.engine.reset_encoder_cache()
+    if self.node_rank == 0:
+        if hasattr(self.engine, 'reset_mm_cache'):
+            await self.engine.reset_mm_cache()
+        if hasattr(self.engine, 'reset_encoder_cache'):
+            await self.engine.reset_encoder_cache()
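The feature-detection idea in the suggestion can be exercised without vLLM installed. The following is a minimal sketch, assuming a stand-in engine: `StubEngine` and `clear_available_caches` are hypothetical names, and only the method names `reset_mm_cache` and `reset_encoder_cache` come from the diff above. It uses `getattr` with a default, which is equivalent to the `hasattr` check but fetches the method in the same step.

```python
import asyncio


class StubEngine:
    """Hypothetical stand-in for an engine; records which resets ran."""

    def __init__(self):
        self.calls = []

    async def reset_mm_cache(self):
        self.calls.append("reset_mm_cache")

    # reset_encoder_cache is deliberately absent, simulating an older build.


async def clear_available_caches(engine, method_names):
    """Await every reset method the engine exposes and skip the missing ones."""
    cleared = []
    for name in method_names:
        method = getattr(engine, name, None)  # None when the attribute is missing
        if callable(method):
            await method()
            cleared.append(name)
    return cleared


RESET_METHODS = ("reset_mm_cache", "reset_encoder_cache")
engine = StubEngine()
cleared = asyncio.run(clear_available_caches(engine, RESET_METHODS))
print(cleared)
```

Returning the list of methods that actually ran makes it easy to log which caches were skipped on a given build, something a silent version gate does not reveal.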
What does this PR do?
This change adds a clear_all_caches helper on the vLLM HTTP server and calls it after weight updates.
Checklist Before Starting
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
{modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
If the PR involves multiple modules, list them comma-separated, like [megatron, fsdp, doc]
{type} is in feat, fix, refactor, chore, test
If the PR breaks an API, add [BREAKING] to the beginning of the title. Example: [BREAKING][fsdp, megatron] feat: dynamic batching
Test
API and Usage Example
# Add code snippet or script demonstrating how to use this
Design & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
Run the pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Once the PR is ready for CI, post in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
If the PR is related to the recipe submodule, please also update the reference to the submodule commit via git submodule update --remote or cd recipe && git pull origin main.