Integrate Automated QDQ autotuner - part 3.2#838
Integrate Automated QDQ autotuner - part 3.2#838willg-nv wants to merge 11 commits intoNVIDIA:mainfrom
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughIntroduces a new ONNX quantization autotuning module that enables automatic Q/DQ (Quantize/Dequantize) node insertion and optimization using pattern-based region analysis. Provides a comprehensive framework for discovering optimal insertion points, profiling schemes, and exporting quantized models. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Autotuner as QDQAutotuner
participant RegionSearch as CombinedRegionSearch
participant Profiler as Profiling System
participant Inserter as Q/DQ Insertion
participant Exporter as ONNX Exporter
User->>Autotuner: initialize(config, pattern_cache)
Autotuner->>Autotuner: Load model & init state
User->>RegionSearch: discover regions
RegionSearch-->>Autotuner: return regions
loop For each region
User->>Autotuner: set_profile_region(region)
Autotuner->>Autotuner: Commit profiling outcomes
Autotuner->>Profiler: Prepare region-pattern pairs
loop Generate candidates
User->>Autotuner: generate()
Autotuner->>Inserter: Build insertion scheme
Inserter->>Inserter: Insert Q/DQ nodes
User->>Autotuner: submit(latency_ms)
Autotuner->>Autotuner: Track performance metrics
end
end
User->>Autotuner: export_onnx(best=True)
Autotuner->>Inserter: Apply best scheme
Inserter->>Exporter: Finalize Q/DQ graph
Exporter-->>User: return quantized ONNX bytes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@modelopt/onnx/quantization/autotune/autotuner.py`:
- Around line 1024-1029: The try/except around graph.cleanup().toposort()
swallows all exceptions (except Exception as e) and merely logs a warning, which
can hide serious graph corruption; update the handler in autotuner.py to either
catch only expected exception types (e.g., specific cleanup/toposort exceptions)
or log the error and re-raise it so execution stops on unexpected failures —
locate the graph.cleanup().toposort() call and replace the broad except with
either a narrowed except for known recoverable exceptions or add a raise after
logger.warning/failure log to propagate the error.
- Line 622: Remove the redundant local import "from datetime import datetime"
(the one added at line with the single import statement) in autotuner.py; the
module already imports datetime at the top of the file, so delete this local
import to avoid duplication and potential shadowing (look for the statement
"from datetime import datetime" inside the function or block and remove it).
- Around line 912-918: The zero-point arrays q_zp_values (and the corresponding
dq_zp_values) are created with a hardcoded dtype np.int8 which can mismatch the
QuantizeLinear/DequantizeLinear output type when quant_type is "uint8" or other
types; update their construction to use the same dtype as the computed
quant_dtype instead of np.int8 so q_zp_values and dq_zp_values match the
quantized output element type used when building q_inputs and dq_inputs (refer
to q_scale_values, q_zp_values, q_inputs and the corresponding dq_* variables to
locate where to change the dtype).
- Around line 1013-1021: The import of get_tensor_consumer_node_indices is wrong
and causes an import error; replace that import with get_tensor_consumer_nodes
and update any usage names accordingly (the code that uses tensor_users_map
already expects a defaultdict(list) so no KeyError handling is needed).
Specifically, change the symbol imported from
modelopt.onnx.quantization.graph_utils from get_tensor_consumer_node_indices to
get_tensor_consumer_nodes and ensure tensor_users_map is assigned from
get_tensor_consumer_nodes(...) where used in the autotuner (references:
get_tensor_consumer_node_indices, get_tensor_consumer_nodes, tensor_users_map).
🧹 Nitpick comments (4)
modelopt/onnx/quantization/autotune/autotuner.py (4)
229-229: Consider defining config attributes explicitly.Using
getattr(self.config, "maximum_generation_attempts", 100)with defaults (also seen at lines 718-719 and 744) suggests these attributes may not be formally defined on theConfigclass. This pattern makes it harder to discover available configuration options.💡 Suggestion
Consider adding these attributes to the
Configclass with documented defaults rather than relying ongetattrfallbacks:# In Config class maximum_generation_attempts: int = 100 top_percent_to_mutate: float = 0.1 minimum_schemes_to_mutate: int = 1 maximum_mutations: int = 3
333-335: Replace assertions with explicit checks for runtime validation.Assertions on lines 333-335 (and similarly at line 314) are used for validating runtime conditions. Since assertions can be disabled with
python -O, these should be explicit checks for production code.🛡️ Proposed fix
- full_insertion_scheme = pattern.get_full_insertion_scheme(region, self.graph) - assert full_insertion_scheme is not None - all_region_ips = pattern.matches(region, self.graph, full_insertion_scheme) - assert isinstance(all_region_ips, set) + full_insertion_scheme = pattern.get_full_insertion_scheme(region, self.graph) + if full_insertion_scheme is None: + logger.warning(f"Failed to get full insertion scheme for region {region.id}") + continue + all_region_ips = pattern.matches(region, self.graph, full_insertion_scheme) + if not isinstance(all_region_ips, set): + raise TypeError(f"Expected set from pattern.matches, got {type(all_region_ips)}")
972-985: Assertions used for critical runtime validation.These assertions validate critical invariants (node index bounds, input index bounds, tensor name matching) but can be disabled with
python -O. Consider using explicit checks withValueError/IndexErrorfor production safety.🛡️ Proposed fix
if node_index is not None: - assert node_index < len(graph.nodes), "Node index out of range" + if node_index >= len(graph.nodes): + raise IndexError(f"Node index {node_index} out of range (max: {len(graph.nodes) - 1})") target_node = graph.nodes[node_index] - assert input_index is not None, "Input index must be set when node index is set" - assert input_index < len(target_node.inputs), ( - f"Input index out of range for node {target_node.name}" - ) + if input_index is None: + raise ValueError("Input index must be set when node index is set") + if input_index >= len(target_node.inputs): + raise IndexError(f"Input index {input_index} out of range for node {target_node.name}") original_tensor = target_node.inputs[input_index] - assert tensor_name == original_tensor.name, ( - f"Tensor name mismatch for node {target_node.name} input {input_index}" - ) + if tensor_name != original_tensor.name: + raise ValueError(f"Tensor name mismatch: expected '{tensor_name}', got '{original_tensor.name}'") else: - assert tensor_name in tensor_map, f"Tensor {tensor_name} not found in tensor map" - assert input_index is None, "Input index must be None when node index is None" + if tensor_name not in tensor_map: + raise KeyError(f"Tensor {tensor_name} not found in tensor map") + if input_index is not None: + raise ValueError("Input index must be None when node index is None")
1042-1049: Consider iterative approach for deep region hierarchies.
_visit_region_recursivelyuses recursion which could hit Python's stack limit for very deep region hierarchies. While this is unlikely for typical ONNX models, an iterative approach would be more robust.♻️ Iterative alternative
def _visit_region_recursively(self, region: Region) -> list[Region]: """Iteratively traverse region hierarchy and collect all regions.""" regions = [] stack = [region] while stack: current = stack.pop() regions.append(current) stack.extend(current.get_children()) return regions
1ffcf7f to
bd18dfa
Compare
tests/unit/onnx/quantization/autotune/autotune/test_autotuner.py
Outdated
Show resolved
Hide resolved
|
/ok to test b02fef1 |
c5340bb to
68ffa5a
Compare
|
/ok to test 68ffa5a |
68ffa5a to
ebba6e6
Compare
|
/ok to test 6124ac9 |
b610760 to
4ad0b32
Compare
There was a problem hiding this comment.
Pull request overview
Implements the core QDQAutotuner workflow for ONNX quantization autotuning, including region discovery, scheme generation/submission, Q/DQ insertion and export, plus state/pattern-cache persistence and unit tests. This builds the central driver class that later CLI/integration pieces can orchestrate.
Changes:
- Added
QDQAutotunerBase/QDQAutotunerimplementation (region search, scheme generation, submit/export, save/load state, pattern-cache seeding/import). - Extended autotune “common” structures with
PatternSchemes,PatternCache, and expandedConfig. - Added unit tests and shared test model utilities; updated insertion point handling to treat linear ops uniformly.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
modelopt/onnx/quantization/autotune/autotuner.py |
Adds the main autotuner implementation, including export/insertion, profiling workflow, and persistence. |
modelopt/onnx/quantization/autotune/common.py |
Introduces/extends core data structures: PatternSchemes, PatternCache, and Config. |
modelopt/onnx/quantization/autotune/insertion_points.py |
Updates insertion-point resolution/skipping to use is_linear_op (Conv/ConvTranspose/Gemm/MatMul). |
modelopt/onnx/quantization/autotune/__init__.py |
Exposes public API for autotune package. |
tests/unit/onnx/quantization/autotune/test_autotuner.py |
Adds unit coverage for initialization, region discovery, export, scheme workflow, and state I/O. |
tests/_test_utils/onnx/quantization/autotune/models.py |
Adds a minimal shared ONNX model builder for autotuner tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
## What does this PR do? This PR integrates benchmark module to QDQ autotunner. This benchamrk module is used to evaluate ONNX model perf. This PR is 1/3 of #703. Once all small PRs are merged #703 could be closed. PR 3.1: #837 PR 3.2 #838 PR 3.3: #839 ## Testing <!-- Mention how have you tested your change if applicable. --> ## Before your PR is "*Ready for review*" <!-- If you haven't finished some of the above items you can still open `Draft` PR. --> - **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed. - **Is this change backward compatible?**: Yes - **Did you write any new necessary tests?**: No - **Did you add or update any necessary documentation?**: No, document will be added in part 4. - **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: No, change log will be updated when all changes are merged. ## Additional Information <!-- E.g. related issue. --> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added ONNX quantization autotuning capabilities with a consolidated module providing streamlined import paths for core components. * Introduced unified benchmarking framework supporting TensorRT-based model evaluation with both command-line and Python API implementations. * Added support for timing cache persistence, custom plugin libraries, shape validation, and dynamic input shape configuration for flexible model testing and optimization. <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
4ad0b32 to
9a94dc4
Compare
What does this PR do?
This PR implements QDQAutotuner class. This class is used to drive the main Autotuner workflow.
The workflow is:
This PR is part 2/4 of #703.
PR 3.1: #837
PR 3.2 #838
PR 3.3: #839
Overview: ?
Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.