
[For RL] Keep attrs after folding weight and fix empty extra state for Megatron #779

Open
mxinO wants to merge 5 commits into main from mxin/rl-patches-2

Conversation

mxinO (Contributor) commented Jan 14, 2026

What does this PR do?

Type of change: improvement

Overview:

  • For quantization-aware reinforcement learning, after folding the rollout model's weights, we want to keep the quantization attrs for the next step.
  • Minor fix for an empty extra state.
  • Support building a dataloader from a JSONL file, which is useful for using training data as calibration data. I can split this into a separate PR if necessary.

Usage

mtq.fold_weight(keep_attrs=True) keeps the quantizer attrs after folding the weights.
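The keep_attrs branch can be illustrated with a stdlib-only toy (ToyQuantizer and ToyQuantLinear are hypothetical stand-ins, not the actual ModelOpt classes; the real fold_weight also folds the fake-quantized weight into the stored weight):

```python
class ToyQuantizer:
    """Hypothetical stand-in for a weight quantizer after calibration."""

    def __init__(self):
        self._amax = 448.0           # calibrated absolute-max value
        self._pre_quant_scale = 0.5  # smoothing scale applied before quantization


class ToyQuantLinear:
    """Toy module sketching the keep_attrs branch in fold_weight."""

    def __init__(self):
        self.weight_quantizer = ToyQuantizer()

    def fold_weight(self, keep_attrs: bool = False):
        # ... the real method folds the fake-quantized weight here ...
        if not keep_attrs:
            # Default behavior: drop the calibration attrs after folding.
            for attr in ("_amax", "_pre_quant_scale"):
                if hasattr(self.weight_quantizer, attr):
                    delattr(self.weight_quantizer, attr)
```

With keep_attrs=True the calibrated attrs survive the fold, so the next RL step can re-quantize without recalibrating.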

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: NA
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: No

Additional Information

Summary by CodeRabbit

  • New Features

    • Added support for loading dataset samples directly from JSONL/JSONL.GZ files
    • Added optional parameter to skip logits return in generation prefill operations
    • Enhanced weight folding operations to optionally preserve quantization attributes during model optimization
  • Bug Fixes

    • Fixed handling of empty tensor states to prevent deserialization errors in Megatron module

Signed-off-by: Meng Xin <mxin@nvidia.com>

copy-pr-bot bot commented Jan 14, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.



coderabbitai bot commented Jan 14, 2026

📝 Walkthrough

Walkthrough

This PR adds an optional keep_attrs parameter to weight folding methods across quantization modules, enabling conditional preservation of internal quantization attributes. Additionally, it introduces JSONL file support to dataset utilities and adds optional early-return parameters for megatron generation functions. An early return guards against empty tensor deserialization in the Megatron module.
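The empty-tensor guard can be sketched with a stdlib-only analogue (set_extra_state, the raw bytes argument, and the pickle payload are illustrative assumptions; the actual _modelopt_set_extra_state operates on a Megatron byte tensor):

```python
import pickle


def set_extra_state(state_bytes):
    """Hypothetical guard mirroring the early return for empty extra state."""
    # An empty buffer (e.g. a freshly initialized module with no saved
    # state) would make deserialization raise, so return before decoding.
    if state_bytes is None or len(state_bytes) == 0:
        return None
    return pickle.loads(state_bytes)
```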

Changes

Cohort / File(s) / Summary

  • Weight folding refactoring (modelopt/torch/quantization/model_quant.py, modelopt/torch/quantization/nn/modules/quant_module.py, modelopt/torch/quantization/nn/modules/quant_linear.py, modelopt/torch/quantization/plugins/huggingface.py, modelopt/torch/quantization/plugins/vllm.py): Added a keep_attrs: bool = False parameter to fold_weight methods across the quantization module hierarchy. Internal quantization attributes (_pre_quant_scale, _amax, _svdquant_lora_a, _svdquant_lora_b) are now conditionally removed based on the parameter instead of always being deleted.
  • Dataset utilities enhancement (modelopt/torch/utils/dataset_utils.py): Introduced a _get_jsonl_text_samples helper to load samples from JSONL files, validating that each line is a JSON object containing a "text" field. Updated get_dataset_samples and get_dataset_dataloader to accept .jsonl and .jsonl.gz file paths as inputs.
  • Megatron plugins (modelopt/torch/opt/plugins/megatron.py, modelopt/torch/utils/plugins/megatron_generate.py): Added an early return for empty tensors in the Megatron module's _modelopt_set_extra_state. Added an optional skip_return_logits parameter to Megatron prefill/generation functions to control whether logits are computed and returned.
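A minimal sketch of the JSONL loading path described above (the function name and the "text"-field validation mirror the summary; the gzip branch is an assumption based on the documented .jsonl.gz support, not necessarily the merged code):

```python
import gzip
import json


def load_jsonl_text_samples(path: str, num_samples: int) -> list[str]:
    """Hypothetical loader mirroring _get_jsonl_text_samples."""
    # Transparently handle plain and gzip-compressed JSONL files.
    open_fn = gzip.open if path.endswith(".gz") else open
    samples: list[str] = []
    with open_fn(path, mode="rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # Each line must be a JSON object with a "text" field.
            if not isinstance(record, dict) or "text" not in record:
                raise ValueError(
                    "each JSONL line must be a JSON object with a 'text' field"
                )
            samples.append(record["text"])
            if len(samples) >= num_samples:
                break
    return samples
```

This lets the same training JSONL that feeds fine-tuning double as the calibration dataset.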

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 55.56%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check: ✅ Passed. The title accurately summarizes the main changes: keeping attributes after folding weights for RL and fixing empty extra state for Megatron, matching the key modifications across multiple quantization modules and the Megatron plugin.
  • Description check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.



codecov bot commented Jan 14, 2026

Codecov Report

❌ Patch coverage is 13.88889% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.02%. Comparing base (d78797b) to head (3df771a).

Files with missing lines:
  • modelopt/torch/utils/dataset_utils.py: 9.52% patch coverage, 19 lines missing ⚠️
  • ...lopt/torch/quantization/nn/modules/quant_linear.py: 14.28% patch coverage, 6 lines missing ⚠️
  • ...lopt/torch/quantization/nn/modules/quant_module.py: 16.66% patch coverage, 5 lines missing ⚠️
  • modelopt/torch/quantization/model_quant.py: 50.00% patch coverage, 1 line missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #779      +/-   ##
==========================================
- Coverage   73.09%   73.02%   -0.07%     
==========================================
  Files         205      205              
  Lines       22301    22324      +23     
==========================================
+ Hits        16300    16302       +2     
- Misses       6001     6022      +21     

☔ View full report in Codecov by Sentry.

mxinO added 4 commits January 14, 2026 01:31
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
@mxinO mxinO marked this pull request as ready for review February 25, 2026 07:48
@mxinO mxinO requested review from a team as code owners February 25, 2026 07:48
@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
modelopt/torch/quantization/plugins/vllm.py (1)

231-248: ⚠️ Potential issue | 🟠 Major

keep_attrs is accepted but ignored in vLLM fold path.

Line 231 adds keep_attrs, but the method never branches on it. This makes the new flag ineffective for these modules and inconsistent with the base fold behavior.

💡 Proposed fix
 @torch.no_grad()
 def fold_weight(self, keep_attrs: bool = False):
@@
     self.w13_weight_quantizer.disable()
@@
     self.w2_weight_quantizer.disable()
+
+    if not keep_attrs:
+        for quantizer in (self.w13_weight_quantizer, self.w2_weight_quantizer):
+            for attr_name in ("_pre_quant_scale", "_amax"):
+                if hasattr(quantizer, attr_name):
+                    delattr(quantizer, attr_name)
 
     torch.cuda.empty_cache()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/plugins/vllm.py` around lines 231 - 248, The
fold_weight method currently ignores the keep_attrs flag; update fold_weight in
vLLM plugin to honor keep_attrs by performing folding loop as is but only
disabling and removing quantizer attributes when keep_attrs is False: after
folding each weight (self.w13_weight and self.w2_weight using
self.w13_weight_quantizer and self.w2_weight_quantizer), if not keep_attrs call
.disable() and then delete or set to None the corresponding quantizer attributes
(self.w13_weight_quantizer, self.w2_weight_quantizer) so the behavior matches
the base fold_weight semantics; if keep_attrs is True, do not disable or remove
those attributes. Ensure torch.cuda.empty_cache() remains after folding.
modelopt/torch/utils/dataset_utils.py (1)

165-176: ⚠️ Potential issue | 🟠 Major

.jsonl.gz is documented but not actually supported.

Line 165 documents .jsonl.gz, but Line 175 only checks .jsonl. Compressed JSONL inputs will bypass the new code path and fail as unsupported dataset names.

💡 Proposed fix
@@
 import copy
+import gzip
 import json
@@
-def _get_jsonl_text_samples(jsonl_path: str, num_samples: int) -> list[str]:
+def _get_jsonl_text_samples(jsonl_path: str, num_samples: int) -> list[str]:
@@
-    with open(jsonl_path, encoding="utf-8") as f:
+    open_fn = gzip.open if jsonl_path.endswith(".gz") else open
+    with open_fn(jsonl_path, mode="rt", encoding="utf-8") as f:
@@
-    if dataset_name.endswith(".jsonl"):
+    if dataset_name.endswith((".jsonl", ".jsonl.gz")):
         return _get_jsonl_text_samples(dataset_name, num_samples)

Also applies to: 261-263

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/utils/dataset_utils.py` around lines 165 - 176, The code
documents support for ".jsonl.gz" but only checks for ".jsonl"; update the
dataset_name checks (the condition that now calls _get_jsonl_text_samples) to
also detect ".jsonl.gz" (e.g., endswith(".jsonl") or endswith(".jsonl.gz")) and
ensure _get_jsonl_text_samples can read gzipped files (or add a gzip-aware
wrapper) so compressed JSONL is handled; apply the same fix to the other
occurrence that currently only checks ".jsonl" (the block around where the
dataset_name is re-evaluated later).

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d78797b and 3df771a.

📒 Files selected for processing (8)
  • modelopt/torch/opt/plugins/megatron.py
  • modelopt/torch/quantization/model_quant.py
  • modelopt/torch/quantization/nn/modules/quant_linear.py
  • modelopt/torch/quantization/nn/modules/quant_module.py
  • modelopt/torch/quantization/plugins/huggingface.py
  • modelopt/torch/quantization/plugins/vllm.py
  • modelopt/torch/utils/dataset_utils.py
  • modelopt/torch/utils/plugins/megatron_generate.py

