[Bug]: [NCC_INLA001] neuronx-cc fails to compile Llama-3.3-70B context_encoding_model with NxDI 0.7 — "type must be boolean, but is null" #16

@EzioEzi0

Your current environment


The output of `python collect_env.py`:

```text
Python: 3.12.3
PyTorch: 2.9.0+cu128
OS: Linux-6.17.0-1007-aws-x86_64-with-glibc2.39
```

Instance Type

trn1.32xlarge

Python Environment (`pip list | grep -E "torch|neuron|nki|vllm|nxdi|nixl"`):

```text
libneuronxla                 2.2.14584.0+06ac23d1
neuronx-cc                   2.22.12471.0+b4a00d10
neuronx-distributed          0.16.25997+f431c02e
neuronx-distributed-inference 0.7.15063+bafa28d5
optimum-neuron               0.4.3
torch                        2.9.0
torch-neuronx                2.9.0.2.11.19912+e48cd891
torch-xla                    2.9.0
torchaudio                   2.9.0
torchvision                  0.24.0
vllm                         0.13.0
vllm-neuron                  0.3.0
```

🐛 Describe the bug

neuronx-cc fails to compile context_encoding_model HLO for Llama-3.3-70B-Instruct using vllm-neuron 0.3.0 + NxDI 0.7 on trn1.32xlarge. The token_generation_model compiles fine — all buckets pass. But context_encoding_model consistently fails on several buckets with:

```text
[INTERNAL_ERROR] [NCC_INLA001] Unhandled exception with message:
[json.exception.type_error.302] type must be boolean, but is null
```

Tried TP=32 with max_model_len=8192 and 4096, and TP=16 with max_model_len=4096; the same error every time. The bug is in the context_encoding HLO compilation path and is not sensitive to TP size or sequence length.

```python
# Minimal reproduction
from vllm import LLM, SamplingParams

# Run with: NEURON_RT_VISIBLE_CORES=0-31 VLLM_PLUGINS=neuron python test.py
llm = LLM(
    model="meta-llama/Llama-3.3-70B-Instruct",
    max_num_seqs=4,
    max_model_len=4096,
    block_size=32,
    num_gpu_blocks_override=1024,
    tensor_parallel_size=32,
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(temperature=0.8))
```

Failed compiler invocation from logs:

```shell
neuronx-cc compile --framework=XLA \
  /tmp/nxd_model/context_encoding_model/_tp0_bk18/model.MODULE_*.hlo_module.pb \
  --target=trn1 --auto-cast=none --model-type=transformer \
  --tensorizer-options=--enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma \
  --lnc=1 -O1 \
  --internal-hlo2tensorizer-options= --modular-flow-mac-threshold=10 --verify-hlo=true
```
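To rule vLLM out and replay the failure in isolation, the logged invocation can be rebuilt and run directly against a saved HLO protobuf. A minimal sketch: the flags are copied from the log above, but how the tensorizer sub-options are grouped into argv elements is an assumption from the log formatting, and the `/tmp/nxd_model` glob assumes the artifact layout shown in the failing path.

```python
import glob
import subprocess


def build_neuronx_cc_cmd(hlo_path: str) -> list[str]:
    """Reconstruct the failing neuronx-cc invocation from the log above."""
    return [
        "neuronx-cc", "compile", "--framework=XLA",
        hlo_path,
        "--target=trn1", "--auto-cast=none", "--model-type=transformer",
        # Assumption: the tensorizer sub-options travel as one argv element.
        "--tensorizer-options=--enable-ccop-compute-overlap "
        "--cc-pipeline-tiling-factor=2 --vectorize-strided-dma",
        "--lnc=1", "-O1",
        "--internal-hlo2tensorizer-options=",
        "--modular-flow-mac-threshold=10", "--verify-hlo=true",
    ]


if __name__ == "__main__":
    # Replay the compile for every saved context_encoding_model bucket.
    pattern = ("/tmp/nxd_model/context_encoding_model/"
               "_tp0_bk*/model.MODULE_*.hlo_module.pb")
    for hlo in sorted(glob.glob(pattern)):
        print("compiling", hlo)
        subprocess.run(build_neuronx_cc_cmd(hlo), check=False)
```

If the standalone replay reproduces the `NCC_INLA001` error, that confirms the failure is entirely inside the compiler, independent of the vLLM/NxDI runtime.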

Failed buckets: _tp0_bk16–_tp0_bk19 and _tp0_bk25–_tp0_bk29, depending on configuration.
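The failed buckets above can be enumerated from the artifact directory rather than scraped from logs. A sketch, assuming the `/tmp/nxd_model` layout from the failing path and that a successful compile leaves a `*.neff` file next to the bucket's HLO (both assumptions; the exact cache layout may differ per NxDI version):

```python
import glob
import os


def failed_buckets(model_dir: str) -> list[str]:
    """Return bucket dirs that have a saved HLO but no compiled NEFF."""
    failed = []
    for bucket in sorted(glob.glob(os.path.join(model_dir, "_tp0_bk*"))):
        has_hlo = glob.glob(os.path.join(bucket, "*.hlo_module.pb"))
        has_neff = glob.glob(os.path.join(bucket, "*.neff"))
        if has_hlo and not has_neff:
            failed.append(os.path.basename(bucket))
    return failed


if __name__ == "__main__":
    print(failed_buckets("/tmp/nxd_model/context_encoding_model"))
```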


