feat(quant): Quantizer class with BaseQuantPass pipeline (#964)#985
Open
DingmaomaoBJTU wants to merge 6 commits into
- Add passes/ sub-package with BaseQuantPass ABC
- Implement FP16Pass, RTNPass, QDQPass; each accepts WinMLQuantizationConfig and reads only the fields relevant to that pass
- Add Quantizer class: chains passes sequentially, uses tempfile for intermediates, merges QuantizeResult stats across passes
- Add expand_precision(mode, config) to map precision strings to pass lists (supports 'fp16', 'rtn', 'static', 'dynamic', 'w4a16')
- Keep quantize_onnx() as backward-compatible entry point
- Add tests/unit/test_quant_passes.py (19 tests, all passing)
- QDQPass.run(): forward use_external_data to final save_onnx call
- WinMLQuantizationConfig: add 'w4a16' to mode Literal; to_dict() now serialises rtn_* and fp16_* fields when mode is 'w4a16'
- quantize_onnx(): raise TypeError on unrecognised kwargs instead of silently discarding them
- Tests: add TestW4a16Config (3 cases) and TestQuantizeOnnxKwargsGuard (1 case)
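The kwargs guard described in the last commit can be illustrated with a minimal sketch. The real quantize_onnx() signature is not shown on this page, so the function name quantize_onnx_sketch, the _KNOWN_KWARGS set, and the returned dict are all hypothetical; only the "raise TypeError instead of discarding" behaviour comes from the commit message.

```python
from __future__ import annotations

from typing import Any

# Hypothetical set of accepted keyword arguments; the real list is not shown in this PR.
_KNOWN_KWARGS = {"mode", "use_external_data"}


def quantize_onnx_sketch(model_path: str, output_path: str, **kwargs: Any) -> dict[str, Any]:
    """Reject unrecognised kwargs up front instead of silently dropping them."""
    unknown = sorted(set(kwargs) - _KNOWN_KWARGS)
    if unknown:
        raise TypeError(
            f"quantize_onnx() got unexpected keyword argument(s): {', '.join(unknown)}"
        )
    # Stand-in for the real work: echo back what would be passed on to the pipeline.
    return {"model_path": model_path, "output_path": output_path, **kwargs}
```

Failing loudly here means a misspelled option such as rtn_bit=4 surfaces at the call site rather than producing a model quantized with defaults.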
- Add TYPE_CHECKING import block for Quantizer, expand_precision, and quantize_onnx so mypy resolves their types instead of falling back to Any? (fixes 'Any? not callable [misc]' in hf.py and onnx.py)
- Same TYPE_CHECKING imports satisfy CodeQL's 'Explicit export is not defined' alerts for those names in __all__
- Remove trailing ... after docstring in BaseQuantPass.run() to fix CodeQL 'Statement has no effect' alert
… onto main
w4a16 is a composite pipeline concept, not a single-pass quantization mode. Removing it from the mode Literal keeps config.py focused on atomic pass modes (static, dynamic, rtn, fp16). Multi-pass pipelines are expressed through Quantizer + expand_precision at a higher level.
Changes:
- config.py: revert mode Literal to [static, dynamic, rtn, fp16], revert to_dict() guards back to equality checks, remove w4a16 docstring example
- quantizer.py: remove w4a16 from _COMPOSITE_PRECISIONS and docstrings
- __init__.py: update module docstring example
- commands/build.py: fix stale 'single-pass' comment
- tests: remove TestW4a16Config and test_w4a16_returns_rtn_then_fp16
5a4da8c to 519b562
xieofxie
reviewed
Jun 26, 2026
… pipeline
Rename:
- passes/qdq.py → passes/static.py; QDQPass → StaticPass throughout
- Update all imports, __all__, quantizer.py pass_factories, and tests
Multi-precision --precision:
- precision_option() gains multiple=True support
- quantize command accepts repeated --precision flags; len > 1 routes to
_run_multi_precision() which chains expand_precision() calls into a
single Quantizer pipeline
- Default output path for multi-pass: {stem}_{p1}_{p2}.onnx
- Calibration-unused warning emitted when no static pass is in pipeline
E2E tests (TestMultiPrecision):
- test_int4_then_fp16_pipeline: verifies MatMulNBits nodes (RTN) and
FLOAT16 initializers (FP16 pass) are both present in output
- test_pipeline_default_output_path: verifies auto-named output file
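The {stem}_{p1}_{p2}.onnx naming rule above can be sketched in a few lines. The helper name default_pipeline_output is hypothetical, and placing the output next to the input model is an assumption; the commit message only specifies the file-name pattern.

```python
from pathlib import Path
from typing import Sequence


def default_pipeline_output(model_path: str, precisions: Sequence[str]) -> Path:
    """Build {stem}_{p1}_{p2}.onnx from the input path and the ordered precisions."""
    src = Path(model_path)
    suffix = "_".join(precisions)
    # Assumption: the output is written alongside the input model.
    return src.with_name(f"{src.stem}_{suffix}.onnx")
```

For example, default_pipeline_output("models/net.onnx", ["int4", "fp16"]) yields a path whose file name is net_int4_fp16.onnx, which is the shape of name test_pipeline_default_output_path is described as checking.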
- static.py: fix model_name -> model_id (WinMLQuantizationConfig has no model_name field; correct field is model_id); fixes mypy [attr-defined]
- cli.py: widen precision_option default type to str | tuple[str,...] | None so passing default=() for multiple=True passes mypy; fixes [arg-type]
- quantizer.py: make expand_precision mode optional, falling back to config.mode when not provided; removes redundant arg from quantize_onnx caller (addresses reviewer: 'why have mode param when config has it')
- quantize.py: remove redundant 'from typing import cast' inside _run_multi_precision (cast already imported at module level)
- passes/base.py: add note in run() docstring explaining why file-based I/O is used (addresses reviewer suggestion about in-memory model proto)
- tests: add test_no_mode_uses_config_mode to cover expand_precision(config=) path
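The optional-mode fallback in expand_precision might look roughly like the sketch below. ConfigSketch and PassSketch are hypothetical stand-ins for WinMLQuantizationConfig and the real pass classes, the keyword-only config parameter is an assumption, and raising ValueError for an unknown mode is also an assumption, since the PR text does not name the exception type.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for WinMLQuantizationConfig; only mode is modelled."""
    mode: str = "static"


@dataclass
class PassSketch:
    """Hypothetical stand-in for a pass instance: records its name and config."""
    name: str
    config: ConfigSketch


# Mode string -> pass class name, following the mapping given in the PR description.
_MODE_TO_PASS = {
    "fp16": "FP16Pass",
    "rtn": "RTNPass",
    "static": "StaticPass",
    "dynamic": "StaticPass",
}


def expand_precision_sketch(
    mode: str | None = None, *, config: ConfigSketch
) -> list[PassSketch]:
    """Resolve a mode to a list of passes, defaulting to config.mode."""
    resolved = mode if mode is not None else config.mode
    try:
        pass_name = _MODE_TO_PASS[resolved]
    except KeyError:
        # Assumption: the real exception type for unknown modes is not stated.
        raise ValueError(f"unknown precision mode: {resolved!r}") from None
    return [PassSketch(pass_name, config)]
```

With this shape, a caller such as quantize_onnx() can pass only config, while the multi-precision CLI path can still override the mode per pass.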
Summary
Refactors the quantizer from a flat dispatch function into an extensible pass-based pipeline, as tracked in #964.
Changes
New: passes/ sub-package
- passes/base.py: BaseQuantPass, with __init__(config) + abstract run(model_path, output_path) -> QuantizeResult
- passes/fp16.py: FP16Pass, reads fp16_keep_io_types, fp16_op_block_list from config
- passes/rtn.py: RTNPass, reads rtn_bits, rtn_block_size, rtn_symmetric, rtn_accuracy_level from config
- passes/static.py: StaticPass
All passes accept a single WinMLQuantizationConfig; each reads only its relevant fields.
Refactored: quantizer.py
- Quantizer(passes): chains passes sequentially; single-pass takes the direct path, multi-pass routes intermediates through a TemporaryDirectory; merges QuantizeResult stats across passes
- expand_precision(mode, config): maps mode strings to pass lists; mode is optional and falls back to config.mode:
  - "fp16" → [FP16Pass(config)]
  - "rtn" → [RTNPass(config)]
  - "static" / "dynamic" → [StaticPass(config)]
- quantize_onnx(): kept as backward-compatible entry point; now delegates to Quantizer
Updated: commands/quantize.py
- --precision now accepts multiple values to compose a pass pipeline (e.g. --precision int4 --precision fp16 runs RTN then FP16)
- Default output path for that example: {stem}_int4_fp16.onnx
Tests
tests/unit/test_quant_passes.py: 19 tests, all passing:
- TestExpandPrecision (7): mapping correctness, unknown mode, None config, mode-from-config fallback
- TestQuantizerSinglePass (4): path routing, missing model, exception handling, empty passes guard
- TestQuantizeOnnxKwargsGuard (1): unexpected kwargs raise TypeError
- TestQuantizerMultiPass (4): chaining, stat merging, abort on failure, warning accumulation
- TestFP16PassConfig (1): config field wiring
- TestRTNPassConfig (2): config field wiring, accuracy_level=0 → None
tests/e2e/test_quantize_e2e.py: TestMultiPrecision (2 tests):
- test_int4_then_fp16_pipeline: verifies MatMulNBits nodes (RTN) + FLOAT16 initializers (FP16) are both present
- test_pipeline_default_output_path: verifies auto-named output file
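For readers skimming the description, the chaining behaviour of Quantizer can be approximated with the self-contained sketch below. It is not the code in this PR: ResultSketch, PassSketch, CopyPass, and QuantizerSketch are hypothetical names, the result fields (success, stats, warnings) are assumed, and merging stats with dict.update is one possible policy. What it does take from the description is the single-pass direct path, the TemporaryDirectory for intermediates, the abort on failure, and the accumulation of warnings.

```python
from __future__ import annotations

import shutil
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ResultSketch:
    """Hypothetical stand-in for QuantizeResult."""
    success: bool = True
    stats: dict[str, int] = field(default_factory=dict)
    warnings: list[str] = field(default_factory=list)


class PassSketch(ABC):
    @abstractmethod
    def run(self, model_path: Path, output_path: Path) -> ResultSketch:
        """Read model_path, write the transformed model to output_path."""


class CopyPass(PassSketch):
    """Toy pass: copies the file and reports one stat under its own label."""

    def __init__(self, label: str) -> None:
        self.label = label

    def run(self, model_path: Path, output_path: Path) -> ResultSketch:
        shutil.copyfile(model_path, output_path)
        return ResultSketch(stats={self.label: 1}, warnings=[f"{self.label}: toy pass"])


class QuantizerSketch:
    def __init__(self, passes: list[PassSketch]) -> None:
        if not passes:
            raise ValueError("at least one pass is required")
        self.passes = list(passes)

    def run(self, model_path: Path, output_path: Path) -> ResultSketch:
        # Single pass: write straight to the final output, no temp files.
        if len(self.passes) == 1:
            return self.passes[0].run(model_path, output_path)

        merged = ResultSketch()
        with tempfile.TemporaryDirectory() as tmp:
            current = model_path
            for index, quant_pass in enumerate(self.passes):
                is_last = index == len(self.passes) - 1
                target = output_path if is_last else Path(tmp) / f"stage_{index}.onnx"
                result = quant_pass.run(current, target)
                merged.stats.update(result.stats)  # assumed merge policy
                merged.warnings.extend(result.warnings)
                if not result.success:
                    merged.success = False
                    break  # abort the remaining passes on failure
                current = target
        return merged
```

Only the last pass writes to output_path, so a failure in an earlier pass leaves no partial output behind once the temporary directory is cleaned up.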