[For RL] Keep attrs after folding weight and fix empty extra state for Megatron#779
Conversation
Signed-off-by: Meng Xin <mxin@nvidia.com>
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
📝 Walkthrough

This PR adds an optional `keep_attrs` flag to weight folding so quantizer attrs can be kept after the weight is folded, and fixes empty extra state handling for Megatron.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Codecov Report

❌ Patch coverage check failed.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #779      +/-   ##
==========================================
- Coverage   73.09%   73.02%   -0.07%
==========================================
  Files         205      205
  Lines       22301    22324      +23
==========================================
+ Hits        16300    16302       +2
- Misses       6001     6022      +21
```
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
modelopt/torch/quantization/plugins/vllm.py (1)

231-248: ⚠️ Potential issue | 🟠 Major

`keep_attrs` is accepted but ignored in the vLLM fold path. Line 231 adds `keep_attrs`, but the method never branches on it. This makes the new flag ineffective for these modules and inconsistent with the base fold behavior.

💡 Proposed fix

```diff
 @torch.no_grad()
 def fold_weight(self, keep_attrs: bool = False):
@@
     self.w13_weight_quantizer.disable()
@@
     self.w2_weight_quantizer.disable()
+
+    if not keep_attrs:
+        for quantizer in (self.w13_weight_quantizer, self.w2_weight_quantizer):
+            for attr_name in ("_pre_quant_scale", "_amax"):
+                if hasattr(quantizer, attr_name):
+                    delattr(quantizer, attr_name)
     torch.cuda.empty_cache()
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `modelopt/torch/quantization/plugins/vllm.py` around lines 231-248: the fold_weight method currently ignores the keep_attrs flag. Update fold_weight in the vLLM plugin to honor keep_attrs: keep the folding loop as is, but only disable the quantizers and remove their attributes when keep_attrs is False. After folding each weight (self.w13_weight and self.w2_weight via self.w13_weight_quantizer and self.w2_weight_quantizer), if not keep_attrs, call .disable() and then delete or set to None the corresponding quantizer attributes so the behavior matches the base fold_weight semantics; if keep_attrs is True, do not disable or remove them. Ensure torch.cuda.empty_cache() remains after folding.

modelopt/torch/utils/dataset_utils.py (1)
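The pattern the reviewer proposes — fold the quantized weight in place, then strip calibration attributes unless `keep_attrs` is set — can be illustrated with a standalone toy. The classes and values below are stand-ins for this sketch, not the actual modelopt API:

```python
class ToyQuantizer:
    """Stand-in for a weight quantizer that carries calibration attrs."""

    def __init__(self, amax: float):
        self._amax = amax
        self.enabled = True

    def disable(self):
        self.enabled = False

    def __call__(self, weight):
        # Fake "quantization": clamp each value to the calibrated range.
        return [max(-self._amax, min(self._amax, w)) for w in weight]


class ToyQuantModule:
    def __init__(self):
        self.weight = [0.5, 2.0, -3.0]
        self.weight_quantizer = ToyQuantizer(amax=1.0)

    def fold_weight(self, keep_attrs: bool = False):
        # Bake the quantized values into the stored weight.
        self.weight = self.weight_quantizer(self.weight)
        self.weight_quantizer.disable()
        if not keep_attrs:
            # Drop calibration state once it has been folded in.
            for attr_name in ("_pre_quant_scale", "_amax"):
                if hasattr(self.weight_quantizer, attr_name):
                    delattr(self.weight_quantizer, attr_name)


kept = ToyQuantModule()
kept.fold_weight(keep_attrs=True)
print(hasattr(kept.weight_quantizer, "_amax"))     # True: attrs kept

dropped = ToyQuantModule()
dropped.fold_weight()
print(hasattr(dropped.weight_quantizer, "_amax"))  # False: attrs removed
```

Either way the weight ends up folded; the flag only controls whether the calibration attrs survive for later reuse (e.g., re-calibration during RL).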
165-176: ⚠️ Potential issue | 🟠 Major

`.jsonl.gz` is documented but not actually supported. Line 165 documents `.jsonl.gz`, but Line 175 only checks `.jsonl`. Compressed JSONL inputs will bypass the new code path and fail as unsupported dataset names.

💡 Proposed fix

```diff
@@
 import copy
+import gzip
 import json
@@
 def _get_jsonl_text_samples(jsonl_path: str, num_samples: int) -> list[str]:
@@
-    with open(jsonl_path, encoding="utf-8") as f:
+    open_fn = gzip.open if jsonl_path.endswith(".gz") else open
+    with open_fn(jsonl_path, mode="rt", encoding="utf-8") as f:
@@
-    if dataset_name.endswith(".jsonl"):
+    if dataset_name.endswith((".jsonl", ".jsonl.gz")):
         return _get_jsonl_text_samples(dataset_name, num_samples)
```

Also applies to: 261-263
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `modelopt/torch/utils/dataset_utils.py` around lines 165-176: the code documents support for ".jsonl.gz" but only checks for ".jsonl". Update the dataset_name check (the condition that calls _get_jsonl_text_samples) to also detect ".jsonl.gz" (e.g., endswith((".jsonl", ".jsonl.gz"))), and ensure _get_jsonl_text_samples can read gzipped files (or add a gzip-aware wrapper) so compressed JSONL is handled. Apply the same fix to the other occurrence that currently only checks ".jsonl" (the block where dataset_name is re-evaluated later).
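The reviewer's gzip-aware reading strategy can be exercised in isolation. A self-contained sketch — the function name and the `"text"` field are assumptions for this example; the real `_get_jsonl_text_samples` lives in `modelopt/torch/utils/dataset_utils.py`:

```python
import gzip
import json
import os
import tempfile


def get_jsonl_text_samples(jsonl_path: str, num_samples: int) -> list[str]:
    """Read up to num_samples "text" fields from a plain or gzipped JSONL file."""
    # Pick the opener based on the extension; gzip.open in "rt" mode yields
    # decoded text lines just like the built-in open.
    open_fn = gzip.open if jsonl_path.endswith(".gz") else open
    samples: list[str] = []
    with open_fn(jsonl_path, mode="rt", encoding="utf-8") as f:
        for line in f:
            if len(samples) >= num_samples:
                break
            line = line.strip()
            if line:
                samples.append(json.loads(line)["text"])
    return samples


# Round-trip check against a gzipped temp file.
rows = [{"text": "hello"}, {"text": "world"}, {"text": "ignored"}]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.jsonl.gz")
    with gzip.open(path, mode="wt", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    samples = get_jsonl_text_samples(path, 2)

print(samples)  # ['hello', 'world']
```

Because the same opener handles both cases, the dispatch condition only needs `endswith((".jsonl", ".jsonl.gz"))` and no separate code path for compressed inputs.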
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- modelopt/torch/opt/plugins/megatron.py
- modelopt/torch/quantization/model_quant.py
- modelopt/torch/quantization/nn/modules/quant_linear.py
- modelopt/torch/quantization/nn/modules/quant_module.py
- modelopt/torch/quantization/plugins/huggingface.py
- modelopt/torch/quantization/plugins/vllm.py
- modelopt/torch/utils/dataset_utils.py
- modelopt/torch/utils/plugins/megatron_generate.py
What does this PR do?
Type of change: improvement
Overview:
Usage
`mtq.fold_weight(keep_attrs=True)` will keep quantizer attrs after folding the weight.

Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
New Features
Bug Fixes