System Info
We've noticed that when the dtype passed to lora_plugin while building the engine differs from the storage-type passed to hf_lora_convert, the LoRA weights are not applied at all and we get the base model response, even when passing the correct lora_task_id. This happens without any warnings or errors, which makes the problem hard to diagnose.
Example:
trtllm-build \
--checkpoint_dir ${UNIFIED_CKPT_PATH} \
--output_dir ${ENGINE_PATH} \
--lora_plugin bfloat16
and
python3 tensorrt_llm/examples/hf_lora_convert.py -i ${ENGINE_PATH}/lora/0 -o tmp/lora_prefetch/1 --storage-type float16
always produces the base model response during inference.
However, switching the build lora_plugin to either auto or float16 returns the right response.
Who can help?
No response
Information
Tasks
Reproduction
- Run trtllm-build with --lora_plugin and hf_lora_convert.py with --storage-type set to different dtypes
Expected behavior
Warning or error if LoRA doesn't work due to this mismatch
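To illustrate the kind of check we have in mind, here is a minimal sketch in Python. It is not TensorRT-LLM code: the function name check_lora_dtype and the message text are hypothetical, and it only compares the two dtype strings a user would pass to --lora_plugin and --storage-type. When lora_plugin is "auto" the dtype is resolved at build time, so this sketch skips the comparison in that case.

```python
from typing import Optional


def check_lora_dtype(lora_plugin: str, storage_type: str) -> Optional[str]:
    """Return a warning message if the dtypes differ, otherwise None.

    Hypothetical helper for illustration only; it is not part of
    TensorRT-LLM. `lora_plugin` is the value given to --lora_plugin at
    build time, `storage_type` the value given to --storage-type.
    """
    if lora_plugin == "auto":
        # "auto" is resolved during the build, so the effective dtype
        # is not known from the flag alone; skip the comparison.
        return None
    if lora_plugin != storage_type:
        return (
            f"lora_plugin dtype '{lora_plugin}' does not match LoRA "
            f"storage type '{storage_type}'; LoRA weights may not be applied"
        )
    return None


if __name__ == "__main__":
    # The combination from the example above triggers the message.
    message = check_lora_dtype("bfloat16", "float16")
    if message is not None:
        print(f"WARNING: {message}")
```

A check like this could run either in the conversion script or when the LoRA weights are loaded at inference time; where it belongs is up to the maintainers.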
Actual behavior
Fails silently: the base model response is returned with no warning or error.
Additional notes
We used the llama3 example