Commit 4daa543

Fix TypeError when disabling on-device sampling
Fixed incorrect import of Sampler class that caused a TypeError when NEURON_ON_DEVICE_SAMPLING_DISABLED=1 was set or when on_device_sampling_config was explicitly set to None.

The bug occurred because the code was importing the sampler module instead of the Sampler class:
- Before: from vllm.v1.sample import sampler as Sampler
- After: from vllm.v1.sample.sampler import Sampler

This caused a "TypeError: 'module' object is not callable" error when trying to instantiate the sampler at line 81.

This fix enables CPU sampling mode, which is required for structured outputs and grammar-constrained generation that are not supported by on-device sampling.

Tested on AWS Trainium (trn1.2xlarge) with TinyLlama-1.1B-Chat-v1.0 using structured output via response_format parameter.
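The failure mode described above can be reproduced without vLLM installed. The following is a minimal sketch, not the project's code: `sampler_module`, the placeholder `Sampler` class, and the `try_instantiate` helper are all hypothetical stand-ins. They mimic a module named `sampler` that defines a class named `Sampler`, and show why calling the module raises while calling the class succeeds.

```python
import types

# Hypothetical stand-in for a module file "sampler.py" that defines a class
# "Sampler". This is NOT vLLM's real module or class.
sampler_module = types.ModuleType("sampler")


class Sampler:
    """Placeholder class used only for this illustration."""

    def __init__(self):
        self.ready = True


sampler_module.Sampler = Sampler


def try_instantiate(obj):
    """Call obj() and return (instance, None), or (None, message) on TypeError."""
    try:
        return obj(), None
    except TypeError as exc:
        return None, str(exc)


# Equivalent of "from pkg import sampler as Sampler": the name is bound to the
# module object, so calling it fails.
instance_before, error_before = try_instantiate(sampler_module)

# Equivalent of "from pkg.sampler import Sampler": the name is bound to the
# class, so calling it constructs an instance.
instance_after, error_after = try_instantiate(sampler_module.Sampler)

print("module call error:", error_before)
print("class call ok:", isinstance(instance_after, Sampler))
```

Aliasing the module to a capitalized name is what makes this kind of bug easy to miss: the call site reads like a class instantiation, and the error only surfaces on the code path that actually constructs the object.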
1 parent 4fac6b7 commit 4daa543

1 file changed

Lines changed: 1 addition & 1 deletion

vllm_neuron/worker/neuronx_distributed_model_loader.py

@@ -51,7 +51,7 @@
 )
 from vllm.model_executor.layers.logits_processor import LogitsProcessor
 from vllm.v1.outputs import SamplerOutput
-from vllm.v1.sample import sampler as Sampler
+from vllm.v1.sample.sampler import Sampler
 
 from vllm_neuron.worker.constants import (
     NEURON_MULTI_MODAL_MODELS,
