
Fix TypeError when disabling on-device sampling#28

Open
tmuttaki wants to merge 1 commit into vllm-project:release-0.4.1 from tmuttaki:fix/cpu-sampling-import-error

Conversation

@tmuttaki tmuttaki commented Apr 3, 2026

Summary

Fixed an incorrect import of the Sampler class that caused a TypeError when on-device sampling was disabled.

Fixes #29

Problem

When the NEURON_ON_DEVICE_SAMPLING_DISABLED=1 environment variable is set, or when on_device_sampling_config is explicitly set to None in the Neuron config overrides, the server fails to start with:

  File "vllm_neuron/worker/neuronx_distributed_model_loader.py", line 81, in __init__
    self.sampler = Sampler()
TypeError: 'module' object is not callable

Root Cause

Line 54 of vllm_neuron/worker/neuronx_distributed_model_loader.py imported the sampler module instead of the Sampler class:

# Before (buggy):
from vllm.v1.sample import sampler as Sampler

This bound the name Sampler to the module object itself, which is not callable as a constructor.
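The bug class is easy to reproduce with the standard library alone. The snippet below uses json.decoder as a stand-in for vllm.v1.sample.sampler, since both packages expose a module and a class whose names are easy to confuse:

```python
# Buggy pattern: this binds the json.decoder *module* under a
# class-like alias, just as the buggy vLLM import bound a module
# under the name Sampler.
from json import decoder as JSONDecoder

try:
    JSONDecoder()       # a module object is not callable
except TypeError as exc:
    print(exc)          # reports: 'module' object is not callable

# Fixed pattern: import the class from *inside* the module.
from json.decoder import JSONDecoder

decoder = JSONDecoder()  # now a real class, instantiable as expected
print(type(decoder).__name__)
```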

Solution

Changed the import to correctly import the Sampler class:

# After (fixed):
from vllm.v1.sample.sampler import Sampler
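For context, here is a hypothetical sketch of the constructor path the fix touches. The real class lives in vllm_neuron/worker/neuronx_distributed_model_loader.py; the class name and constructor signature below are stand-ins based on this PR's description, not the actual source:

```python
# Stand-in for the real class; the fixed import in the actual code is:
#   from vllm.v1.sample.sampler import Sampler
class Sampler:
    pass


class NeuronModelLoader:
    """Hypothetical sketch of the loader's sampler selection."""

    def __init__(self, on_device_sampling_config=None):
        if on_device_sampling_config is None:
            # CPU sampling path: instantiate vLLM's Sampler. With the
            # buggy module import, this line raised
            # TypeError: 'module' object is not callable.
            self.sampler = Sampler()
        else:
            # On-device sampling path: the Neuron device performs
            # sampling, so no CPU-side Sampler is constructed.
            self.sampler = None
```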

Testing

Tested on AWS Trainium (trn1.2xlarge) with TinyLlama-1.1B-Chat-v1.0:

  1. Server startup: Successfully starts with NEURON_ON_DEVICE_SAMPLING_DISABLED=1
  2. CPU sampling: Confirmed via logs: "CPU sampling enabled: on_device_sampling_config is None"
  3. Structured outputs: Successfully tested with response_format parameter using JSON schemas
  4. Generation quality: Validated JSON output conforms to provided schemas

Example structured output test:

curl http://localhost:8009/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "prompt": "Generate a person profile with name, age, and city",
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "person_profile",
        "schema": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
            "city": {"type": "string"}
          },
          "required": ["name", "age", "city"]
        }
      }
    }
  }'

Result:

{
  "name": "John Doe",
  "age": 30,
  "city": "New York"
}
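The same structured-output test can be sketched in Python with only the standard library. The endpoint URL and model name are the ones used in the curl test above; adjust them for your deployment. The conforms helper is a minimal hand-rolled check for this flat schema, not a full JSON Schema validator:

```python
import json
import urllib.request

SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "city": {"type": "string"},
    },
    "required": ["name", "age", "city"],
}


def conforms(obj, schema):
    """Check required keys and primitive types for a flat object schema."""
    types = {"string": str, "integer": int}
    if not isinstance(obj, dict):
        return False
    if any(key not in obj for key in schema["required"]):
        return False
    return all(
        isinstance(obj.get(key), types[spec["type"]])
        for key, spec in schema["properties"].items()
    )


def request_profile(base_url="http://localhost:8009"):
    """POST the structured-output request and parse the generated JSON."""
    payload = {
        "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        "prompt": "Generate a person profile with name, age, and city",
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "person_profile", "schema": SCHEMA},
        },
    }
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The generated JSON string is in the first completion choice's text.
    return json.loads(body["choices"][0]["text"])


# The result shown above passes the schema check:
assert conforms({"name": "John Doe", "age": 30, "city": "New York"}, SCHEMA)
```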

Impact

  • Enables CPU sampling mode: Required for structured outputs and grammar-constrained generation
  • No breaking changes: Only affects code path when on-device sampling is explicitly disabled
  • Backward compatible: Default behavior (on-device sampling enabled) is unchanged

Related Issues

This fix enables users to:

  • Use structured outputs via response_format parameter
  • Apply grammar constraints that aren't supported by on-device sampling
  • Access all vLLM sampling features beyond temperature, top_k, and top_p

Fixed incorrect import of Sampler class that caused a TypeError when
NEURON_ON_DEVICE_SAMPLING_DISABLED=1 was set or when on_device_sampling_config
was explicitly set to None.

The bug occurred because the code was importing the sampler module instead of
the Sampler class:
  - Before: from vllm.v1.sample import sampler as Sampler
  - After: from vllm.v1.sample.sampler import Sampler

This caused a "TypeError: 'module' object is not callable" error when trying
to instantiate the sampler at line 81.

This fix enables CPU sampling mode, which is required for structured outputs
and grammar-constrained generation that are not supported by on-device sampling.

Tested on AWS Trainium (trn1.2xlarge) with TinyLlama-1.1B-Chat-v1.0 using
structured output via response_format parameter.

Signed-off-by: Tahmid Muttaki <tmuttaki@redhat.com>

tmuttaki commented Apr 3, 2026

Fixes #29

