Hi QeRL team, thanks for the amazing work!
I am currently running RL training using QeRL with two types of models:
- A BF16 model (e.g., Qwen2.5-3B-Instruct)
- An NVFP4 weight-only quantized model (Qwen2.5-3B-NVFP4)
While BF16 models work perfectly with PEFT LoRA (via LoraConfig + prepare_model_for_kbit_training),
the NVFP4 model crashes during PEFT adapter injection because CompressedLinear modules do not expose a .weight attribute:
AttributeError: 'CompressedLinear' object has no attribute 'weight'
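For reference, this is roughly how the adapter gets attached in my run (simplified from the GRPOTrainer call in the traceback below; the model path and target_modules here are just placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Placeholder path; in my run the model is loaded and wrapped inside GRPOTrainer.
model = AutoModelForCausalLM.from_pretrained("Qwen2.5-3B-NVFP4")
model = prepare_model_for_kbit_training(model)

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # illustrative
    task_type="CAUSAL_LM",
)

# Works for the BF16 model, but for the NVFP4 checkpoint raises:
# AttributeError: 'CompressedLinear' object has no attribute 'weight'
model = get_peft_model(model, peft_config)
```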
I also noticed that NVFP4 models store weights as:
- weight_packed
- weight_scale
- weight_global_scale
instead of the usual weight tensor.
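For example, listing the checkpoint keys for a single projection only shows the packed/scale tensors (shard filename and layer path are illustrative):

```python
from safetensors import safe_open

# Inspect one shard of the NVFP4 checkpoint; adjust the filename to your shard.
with safe_open("Qwen2.5-3B-NVFP4/model.safetensors", framework="pt") as f:
    keys = [k for k in f.keys() if "layers.0.self_attn.q_proj" in k]

print(keys)
# ['model.layers.0.self_attn.q_proj.weight_packed',
#  'model.layers.0.self_attn.q_proj.weight_scale',
#  'model.layers.0.self_attn.q_proj.weight_global_scale']
# ...but no 'model.layers.0.self_attn.q_proj.weight'
```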
This leads to the following question:
❓ Question
Does QeRL intentionally use different LoRA mechanisms for BF16 vs NVFP4 models?
My current understanding:
• BF16 / FP16 models
Use the standard PEFT LoRA (weight injection into PyTorch Linear layers).
• NVFP4 models
Must rely on the vLLM LoRA adapter path, because NVFP4's CompressedLinear has no .weight and PEFT cannot add LoRA matrices to the packed FP4 format (a rough sketch of what I mean is just below).
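To make the second point concrete, here is a hypothetical sketch (not QeRL's actual code, just what I imagine a side-path adapter would look like) of LoRA sitting beside the quantized layer without ever touching .weight:

```python
import torch.nn as nn

class SideLoRA(nn.Module):
    """Hypothetical wrapper: adds a LoRA path next to a quantized linear
    without accessing base_layer.weight (which CompressedLinear lacks)."""

    def __init__(self, base_layer, in_features, out_features, r=16, alpha=32):
        super().__init__()
        self.base_layer = base_layer          # e.g. a CompressedLinear
        self.lora_A = nn.Linear(in_features, r, bias=False)
        self.lora_B = nn.Linear(r, out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)    # adapter starts as a no-op
        self.scaling = alpha / r

    def forward(self, x):
        # Quantized matmul stays inside base_layer; LoRA runs in BF16 beside it.
        return self.base_layer(x) + self.lora_B(self.lora_A(x)) * self.scaling
```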
Error message from running bash dapo_qwen2.5-3b_nvfp4_single_gpu.sh:
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/at0839/zonghan.ai12/Joe/QeRL/qerl.py", line 105, in <module>
[rank0]: main(data_args, training_args, model_args)
[rank0]: File "/home/at0839/zonghan.ai12/Joe/QeRL/qerl.py", line 88, in main
[rank0]: trainer = GRPOTrainer(
[rank0]: File "/home/at0839/zonghan.ai12/Joe/QeRL/trl_trainer/grpo_trainer.py", line 572, in __init__
[rank0]: model = get_peft_model(model, peft_config)
[rank0]: File "/home/at0839/zonghan.ai12/.conda/envs/qerl/lib/python3.10/site-packages/peft/mapping_func.py", line 114, in get_peft_model
[rank0]: return PeftModel(
[rank0]: File "/home/at0839/zonghan.ai12/.conda/envs/qerl/lib/python3.10/site-packages/peft/peft_model.py", line 129, in __init__
[rank0]: self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
[rank0]: File "/home/at0839/zonghan.ai12/.conda/envs/qerl/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 295, in __init__
[rank0]: self.inject_adapter(self.model, adapter_name, low_cpu_mem_usage=low_cpu_mem_usage, state_dict=state_dict)
[rank0]: File "/home/at0839/zonghan.ai12/.conda/envs/qerl/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 801, in inject_adapter
[rank0]: self._create_and_replace(
[rank0]: File "/home/at0839/zonghan.ai12/.conda/envs/qerl/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 249, in _create_and_replace
[rank0]: new_module = self._create_new_module(lora_config, adapter_name, target, device_map=device_map, **kwargs)
[rank0]: File "/home/at0839/zonghan.ai12/.conda/envs/qerl/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 336, in _create_new_module
[rank0]: new_module = dispatcher(target, adapter_name, lora_config=lora_config, **kwargs)
[rank0]: File "/home/at0839/zonghan.ai12/.conda/envs/qerl/lib/python3.10/site-packages/peft/tuners/lora/layer.py", line 2282, in dispatch_default
[rank0]: new_module = Linear(target, adapter_name, **kwargs)
[rank0]: File "/home/at0839/zonghan.ai12/.conda/envs/qerl/lib/python3.10/site-packages/peft/tuners/lora/layer.py", line 619, in __init__
[rank0]: LoraLayer.__init__(self, base_layer, **kwargs)
[rank0]: File "/home/at0839/zonghan.ai12/.conda/envs/qerl/lib/python3.10/site-packages/peft/tuners/lora/layer.py", line 126, in __init__
[rank0]: in_features, out_features = _get_in_out_features(base_layer)
[rank0]: File "/home/at0839/zonghan.ai12/.conda/envs/qerl/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 161, in _get_in_out_features
[rank0]: if torch_supports_dtensor and isinstance(module.weight, torch.distributed.tensor.DTensor):
[rank0]: File "/home/at0839/zonghan.ai12/.conda/envs/qerl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1940, in __getattr__
[rank0]: raise AttributeError(
[rank0]: AttributeError: 'CompressedLinear' object has no attribute 'weight'