Fix TypeError when disabling on-device sampling

tmuttaki · tmuttaki · commit 4daa5436ec30 · 2026-04-03T16:59:55.000-04:00
Fix the incorrect import of the Sampler class, which caused a TypeError when
NEURON_ON_DEVICE_SAMPLING_DISABLED=1 was set or when on_device_sampling_config
was explicitly set to None.

The bug occurred because the code imported the sampler module under the alias
Sampler instead of importing the Sampler class:
  - Before: from vllm.v1.sample import sampler as Sampler
  - After: from vllm.v1.sample.sampler import Sampler

Because the name Sampler was bound to a module rather than a class,
instantiating the sampler at line 81 raised
"TypeError: 'module' object is not callable".
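
The failure mode can be reproduced without vLLM. The sketch below is only an
analogy: it uses the standard library's email.message module and its Message
class as stand-ins for vllm.v1.sample.sampler and Sampler, and the names
BadMessage and try_instantiate are hypothetical helpers introduced for
illustration.

```python
import inspect

# Analogous to the buggy import: binds the *module* to a class-like name.
from email import message as BadMessage
# Analogous to the fixed import: binds the *class* itself.
from email.message import Message


def try_instantiate(obj):
    """Call obj() and return the instance, or the TypeError text on failure."""
    try:
        return obj()
    except TypeError as exc:
        return str(exc)


module_result = try_instantiate(BadMessage)  # calling a module fails
class_result = try_instantiate(Message)      # calling the class succeeds

print(inspect.ismodule(BadMessage), module_result)
print(inspect.isclass(Message), type(class_result).__name__)
```

The aliased import succeeds silently, so the mistake only surfaces on the code
path that actually calls the name, which here is the path taken when on-device
sampling is disabled.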

This fix makes CPU sampling mode work. That mode is required for structured
outputs and grammar-constrained generation, neither of which is supported by
on-device sampling.

Tested on AWS Trainium (trn1.2xlarge) with TinyLlama-1.1B-Chat-v1.0, using
structured output via the response_format parameter.
diff --git a/vllm_neuron/worker/neuronx_distributed_model_loader.py b/vllm_neuron/worker/neuronx_distributed_model_loader.py
@@ -51,7 +51,7 @@
 )
 from vllm.model_executor.layers.logits_processor import LogitsProcessor
 from vllm.v1.outputs import SamplerOutput
-from vllm.v1.sample import sampler as Sampler
+from vllm.v1.sample.sampler import Sampler
 
 from vllm_neuron.worker.constants import (
     NEURON_MULTI_MODAL_MODELS,
