[Bug]: [NCC_INLA001] neuronx-cc fails to compile Llama-3.3-70B context_encoding_model with NxDI 0.7 — "type must be boolean, but is null" #16
Your current environment
The output of python collect_env.py
Python: 3.12.3
PyTorch: 2.9.0+cu128
OS: Linux-6.17.0-1007-aws-x86_64-with-glibc2.39
Instance Type
trn1.32xlarge
Python Environment (pip list | grep -E "torch|neuron|nki|vllm|nxdi|nixl")
libneuronxla 2.2.14584.0+06ac23d1
neuronx-cc 2.22.12471.0+b4a00d10
neuronx-distributed 0.16.25997+f431c02e
neuronx-distributed-inference 0.7.15063+bafa28d5
optimum-neuron 0.4.3
torch 2.9.0
torch-neuronx 2.9.0.2.11.19912+e48cd891
torch-xla 2.9.0
torchaudio 2.9.0
torchvision 0.24.0
vllm 0.13.0
vllm-neuron 0.3.0
🐛 Describe the bug
neuronx-cc fails to compile context_encoding_model HLO for Llama-3.3-70B-Instruct using vllm-neuron 0.3.0 + NxDI 0.7 on trn1.32xlarge. The token_generation_model compiles fine — all buckets pass. But context_encoding_model consistently fails on several buckets with:
[INTERNAL_ERROR] [NCC_INLA001] Unhandled exception with message:
[json.exception.type_error.302] type must be boolean, but is null
Tried TP=32 with max_model_len=8192, TP=32 with max_model_len=4096, and TP=16 with max_model_len=4096; every combination fails with the same error. The bug sits in the context_encoding HLO compilation path and is not sensitive to TP size or sequence length.
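To tabulate which buckets fail on a given run, the captured log can be scraped; this is a minimal sketch (not part of the repro), assuming the log lines look like the excerpts in this report — bucket paths containing `_tp{N}_bk{M}` and errors tagged like `[NCC_INLA001]`:

```python
import re

# Hypothetical helper: map each failing context_encoding_model bucket to the
# compiler error code that follows it in the log. The log format is assumed
# to match the excerpts quoted in this report.
BUCKET_RE = re.compile(r"context_encoding_model/(_tp\d+_bk\d+)/")
ERROR_RE = re.compile(r"\[(NCC_[A-Z0-9]+)\]")

def failed_buckets(log_text: str) -> dict[str, str]:
    """Return {bucket_name: error_code} for buckets whose compile errored."""
    failures = {}
    current_bucket = None
    for line in log_text.splitlines():
        m = BUCKET_RE.search(line)
        if m:
            current_bucket = m.group(1)
        e = ERROR_RE.search(line)
        if e and current_bucket is not None:
            failures[current_bucket] = e.group(1)
            current_bucket = None
    return failures

log = """\
neuronx-cc compile --framework=XLA /tmp/nxd_model/context_encoding_model/_tp0_bk18/model.MODULE_x.hlo_module.pb
[INTERNAL_ERROR] [NCC_INLA001] Unhandled exception with message:
neuronx-cc compile --framework=XLA /tmp/nxd_model/context_encoding_model/_tp0_bk19/model.MODULE_y.hlo_module.pb
[INTERNAL_ERROR] [NCC_INLA001] Unhandled exception with message:
"""
print(failed_buckets(log))  # {'_tp0_bk18': 'NCC_INLA001', '_tp0_bk19': 'NCC_INLA001'}
```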
# Minimal reproduction
from vllm import LLM, SamplingParams
# Run with: NEURON_RT_VISIBLE_CORES=0-31 VLLM_PLUGINS=neuron python test.py
llm = LLM(
model="meta-llama/Llama-3.3-70B-Instruct",
max_num_seqs=4,
max_model_len=4096,
block_size=32,
num_gpu_blocks_override=1024,
tensor_parallel_size=32,
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(temperature=0.8))

Failed compiler invocation from logs:
neuronx-cc compile --framework=XLA \
/tmp/nxd_model/context_encoding_model/_tp0_bk18/model.MODULE_*.hlo_module.pb \
--target=trn1 --auto-cast=none --model-type=transformer \
--tensorizer-options=--enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma \
--lnc=1 -O1 \
--internal-hlo2tensorizer-options= --modular-flow-mac-threshold=10 --verify-hlo=true
Failed buckets: _tp0_bk16–_tp0_bk19, _tp0_bk25–_tp0_bk29 depending on config.
Before submitting a new issue...
- Make sure you already searched for relevant issues.