
[For RL] Keep attrs after folding weight and fix empty extra state for Megatron #779

Open
mxinO wants to merge 5 commits into main from mxin/rl-patches-2

Conversation

mxinO (Contributor) commented Jan 14, 2026

What does this PR do?

Type of change: improvement

Overview:

  • For quantization-aware reinforcement learning, after folding the rollout model's weights, we want to keep the quantization attrs for the next step.
  • Minor fix for an empty extra state.
  • Support building a dataloader from a JSONL file, which is useful for using training data as calibration data. I can split this into a separate PR if necessary.

Usage

mtq.fold_weight(keep_attrs=True) keeps the quantizer attrs after folding the weights.
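The keep_attrs branch can be illustrated with a stdlib-only toy (ToyQuantizer and ToyQuantLinear are hypothetical stand-ins, not the actual ModelOpt classes; the real fold_weight also folds the fake-quantized weight into the stored weight):

```python
class ToyQuantizer:
    """Hypothetical stand-in for a weight quantizer after calibration."""

    def __init__(self):
        self._amax = 448.0           # calibrated absolute-max value
        self._pre_quant_scale = 0.5  # smoothing scale applied before quantization


class ToyQuantLinear:
    """Toy module sketching the keep_attrs branch in fold_weight."""

    def __init__(self):
        self.weight_quantizer = ToyQuantizer()

    def fold_weight(self, keep_attrs: bool = False):
        # ... the real method folds the fake-quantized weight here ...
        if not keep_attrs:
            # Default behavior: drop the calibration attrs after folding.
            for attr in ("_amax", "_pre_quant_scale"):
                if hasattr(self.weight_quantizer, attr):
                    delattr(self.weight_quantizer, attr)
```

With keep_attrs=True the calibrated attrs survive the fold, so the next RL step can re-quantize without recalibrating.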

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: NA
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: No

Additional Information

Summary by CodeRabbit

  • New Features

    • Added support for loading dataset samples directly from JSONL/JSONL.GZ files
    • Added optional parameter to skip logits return in generation prefill operations
    • Enhanced weight folding operations to optionally preserve quantization attributes during model optimization
  • Bug Fixes

    • Fixed handling of empty tensor states to prevent deserialization errors in Megatron module

Signed-off-by: Meng Xin <mxin@nvidia.com>

copy-pr-bot bot commented Jan 14, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.



coderabbitai bot commented Jan 14, 2026

📝 Walkthrough

Walkthrough

This PR adds an optional keep_attrs parameter to weight folding methods across quantization modules, enabling conditional preservation of internal quantization attributes. Additionally, it introduces JSONL file support to dataset utilities and adds optional early-return parameters for megatron generation functions. An early return guards against empty tensor deserialization in the Megatron module.
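The empty-tensor guard can be sketched with a stdlib-only analogue (set_extra_state, the raw bytes argument, and the pickle payload are illustrative assumptions; the actual _modelopt_set_extra_state operates on a Megatron byte tensor):

```python
import pickle


def set_extra_state(state_bytes):
    """Hypothetical guard mirroring the early return for empty extra state."""
    # An empty buffer (e.g. a freshly initialized module with no saved
    # state) would make deserialization raise, so return before decoding.
    if state_bytes is None or len(state_bytes) == 0:
        return None
    return pickle.loads(state_bytes)
```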

Changes

Cohort / File(s) / Summary

  • Weight folding refactoring (modelopt/torch/quantization/model_quant.py, modelopt/torch/quantization/nn/modules/quant_module.py, modelopt/torch/quantization/nn/modules/quant_linear.py, modelopt/torch/quantization/plugins/huggingface.py, modelopt/torch/quantization/plugins/vllm.py): Added a keep_attrs: bool = False parameter to fold_weight methods across the quantization module hierarchy. Internal quantization attributes (_pre_quant_scale, _amax, _svdquant_lora_a, _svdquant_lora_b) are now conditionally removed based on the parameter instead of always being deleted.
  • Dataset utilities enhancement (modelopt/torch/utils/dataset_utils.py): Introduced a _get_jsonl_text_samples helper to load samples from JSONL files, validating that each line is a JSON object containing a "text" field. Updated get_dataset_samples and get_dataset_dataloader to accept .jsonl and .jsonl.gz file paths as inputs.
  • Megatron plugins (modelopt/torch/opt/plugins/megatron.py, modelopt/torch/utils/plugins/megatron_generate.py): Added an early return for empty tensors in the Megatron module's _modelopt_set_extra_state. Added an optional skip_return_logits parameter to Megatron prefill/generation functions to control whether logits are computed and returned.
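A minimal sketch of the JSONL loading path described above (the function name and the "text"-field validation mirror the summary; the gzip branch is an assumption based on the documented .jsonl.gz support, not necessarily the merged code):

```python
import gzip
import json


def load_jsonl_text_samples(path: str, num_samples: int) -> list[str]:
    """Hypothetical loader mirroring _get_jsonl_text_samples."""
    # Transparently handle plain and gzip-compressed JSONL files.
    open_fn = gzip.open if path.endswith(".gz") else open
    samples: list[str] = []
    with open_fn(path, mode="rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # Each line must be a JSON object with a "text" field.
            if not isinstance(record, dict) or "text" not in record:
                raise ValueError(
                    "each JSONL line must be a JSON object with a 'text' field"
                )
            samples.append(record["text"])
            if len(samples) >= num_samples:
                break
    return samples
```

This lets the same training JSONL that feeds fine-tuning double as the calibration dataset.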

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 55.56%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check: ✅ Passed. The title accurately summarizes the main changes: keeping attributes after folding weights for RL and fixing empty extra state for Megatron, matching the key modifications across multiple quantization modules and the Megatron plugin.
  • Description check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.



codecov bot commented Jan 14, 2026

Codecov Report

❌ Patch coverage is 13.88889% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.02%. Comparing base (d78797b) to head (3df771a).

Files with missing lines:
  • modelopt/torch/utils/dataset_utils.py: 9.52% patch coverage, 19 lines missing ⚠️
  • ...lopt/torch/quantization/nn/modules/quant_linear.py: 14.28% patch coverage, 6 lines missing ⚠️
  • ...lopt/torch/quantization/nn/modules/quant_module.py: 16.66% patch coverage, 5 lines missing ⚠️
  • modelopt/torch/quantization/model_quant.py: 50.00% patch coverage, 1 line missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #779      +/-   ##
==========================================
- Coverage   73.09%   73.02%   -0.07%     
==========================================
  Files         205      205              
  Lines       22301    22324      +23     
==========================================
+ Hits        16300    16302       +2     
- Misses       6001     6022      +21     

☔ View full report in Codecov by Sentry.

mxinO added 4 commits January 14, 2026 01:31
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
@mxinO mxinO marked this pull request as ready for review February 25, 2026 07:48
@mxinO mxinO requested review from a team as code owners February 25, 2026 07:48
@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
modelopt/torch/quantization/plugins/vllm.py (1)

231-248: ⚠️ Potential issue | 🟠 Major

keep_attrs is accepted but ignored in vLLM fold path.

Line 231 adds keep_attrs, but the method never branches on it. This makes the new flag ineffective for these modules and inconsistent with the base fold behavior.

💡 Proposed fix
 @torch.no_grad()
 def fold_weight(self, keep_attrs: bool = False):
@@
     self.w13_weight_quantizer.disable()
@@
     self.w2_weight_quantizer.disable()
+
+    if not keep_attrs:
+        for quantizer in (self.w13_weight_quantizer, self.w2_weight_quantizer):
+            for attr_name in ("_pre_quant_scale", "_amax"):
+                if hasattr(quantizer, attr_name):
+                    delattr(quantizer, attr_name)
 
     torch.cuda.empty_cache()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/plugins/vllm.py` around lines 231 - 248, The
fold_weight method currently ignores the keep_attrs flag; update fold_weight in
vLLM plugin to honor keep_attrs by performing folding loop as is but only
disabling and removing quantizer attributes when keep_attrs is False: after
folding each weight (self.w13_weight and self.w2_weight using
self.w13_weight_quantizer and self.w2_weight_quantizer), if not keep_attrs call
.disable() and then delete or set to None the corresponding quantizer attributes
(self.w13_weight_quantizer, self.w2_weight_quantizer) so the behavior matches
the base fold_weight semantics; if keep_attrs is True, do not disable or remove
those attributes. Ensure torch.cuda.empty_cache() remains after folding.
modelopt/torch/utils/dataset_utils.py (1)

165-176: ⚠️ Potential issue | 🟠 Major

.jsonl.gz is documented but not actually supported.

Line 165 documents .jsonl.gz, but Line 175 only checks .jsonl. Compressed JSONL inputs will bypass the new code path and fail as unsupported dataset names.

💡 Proposed fix
@@
 import copy
+import gzip
 import json
@@
-def _get_jsonl_text_samples(jsonl_path: str, num_samples: int) -> list[str]:
+def _get_jsonl_text_samples(jsonl_path: str, num_samples: int) -> list[str]:
@@
-    with open(jsonl_path, encoding="utf-8") as f:
+    open_fn = gzip.open if jsonl_path.endswith(".gz") else open
+    with open_fn(jsonl_path, mode="rt", encoding="utf-8") as f:
@@
-    if dataset_name.endswith(".jsonl"):
+    if dataset_name.endswith((".jsonl", ".jsonl.gz")):
         return _get_jsonl_text_samples(dataset_name, num_samples)

Also applies to: 261-263

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/utils/dataset_utils.py` around lines 165 - 176, The code
documents support for ".jsonl.gz" but only checks for ".jsonl"; update the
dataset_name checks (the condition that now calls _get_jsonl_text_samples) to
also detect ".jsonl.gz" (e.g., endswith(".jsonl") or endswith(".jsonl.gz")) and
ensure _get_jsonl_text_samples can read gzipped files (or add a gzip-aware
wrapper) so compressed JSONL is handled; apply the same fix to the other
occurrence that currently only checks ".jsonl" (the block around where the
dataset_name is re-evaluated later).

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d78797b and 3df771a.

📒 Files selected for processing (8)
  • modelopt/torch/opt/plugins/megatron.py
  • modelopt/torch/quantization/model_quant.py
  • modelopt/torch/quantization/nn/modules/quant_linear.py
  • modelopt/torch/quantization/nn/modules/quant_module.py
  • modelopt/torch/quantization/plugins/huggingface.py
  • modelopt/torch/quantization/plugins/vllm.py
  • modelopt/torch/utils/dataset_utils.py
  • modelopt/torch/utils/plugins/megatron_generate.py

