Commit 4daa543
committed
Fix TypeError when disabling on-device sampling
Fixed incorrect import of Sampler class that caused a TypeError when
NEURON_ON_DEVICE_SAMPLING_DISABLED=1 was set or when on_device_sampling_config
was explicitly set to None.
The bug occurred because the code was importing the sampler module instead of
the Sampler class:
- Before: from vllm.v1.sample import sampler as Sampler
- After: from vllm.v1.sample.sampler import Sampler
This caused a "TypeError: 'module' object is not callable" error when trying
to instantiate the sampler at line 81.
This fix enables CPU sampling mode, which is required for structured outputs
and grammar-constrained generation that are not supported by on-device sampling.
Tested on AWS Trainium (trn1.2xlarge) with TinyLlama-1.1B-Chat-v1.0 using
structured output via response_format parameter.1 parent 4fac6b7 commit 4daa543
1 file changed
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
51 | 51 | | |
52 | 52 | | |
53 | 53 | | |
54 | | - | |
| 54 | + | |
55 | 55 | | |
56 | 56 | | |
57 | 57 | | |
| |||
0 commit comments