feat(quant): Quantizer class with BaseQuantPass pipeline (#964) by DingmaomaoBJTU · Pull Request #985 · microsoft/winml-cli

DingmaomaoBJTU · 2026-06-26T05:42:07Z

Summary

Refactors the quantizer from a flat dispatch function into an extensible pass-based pipeline, as tracked in #964.

Changes

New: `passes/` sub-package

| File | Class | What it does |
| --- | --- | --- |
| `passes/base.py` | `BaseQuantPass` | ABC: `__init__(config)` plus abstract `run(model_path, output_path) -> QuantizeResult` |
| `passes/fp16.py` | `FP16Pass` | Reads `fp16_keep_io_types`, `fp16_op_block_list` from config |
| `passes/rtn.py` | `RTNPass` | Reads `rtn_bits`, `rtn_block_size`, `rtn_symmetric`, `rtn_accuracy_level` from config |
| `passes/static.py` | `StaticPass` | Reads all QDQ/calibration fields from config |

All passes accept a single `WinMLQuantizationConfig`; each pass reads only the fields relevant to it.
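A minimal sketch of the pass contract described above, not the PR's actual code. `QuantConfig` and `QuantizeResult` here are hypothetical stand-ins that carry only a few of the fields named in the table; the real `WinMLQuantizationConfig` and result type likely hold more.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class QuantConfig:
    """Hypothetical stand-in for WinMLQuantizationConfig (a few fields only)."""
    mode: str = "fp16"
    fp16_keep_io_types: bool = True
    fp16_op_block_list: list[str] = field(default_factory=list)
    rtn_bits: int = 4
    rtn_block_size: int = 128


@dataclass
class QuantizeResult:
    """Hypothetical stand-in for the real result type."""
    success: bool
    output_path: str
    warnings: list[str] = field(default_factory=list)


class BaseQuantPass(ABC):
    """Every pass takes the shared config and implements run()."""

    def __init__(self, config: QuantConfig) -> None:
        self.config = config

    @abstractmethod
    def run(self, model_path: str, output_path: str) -> QuantizeResult:
        """Read the model at model_path and write the result to output_path."""


class FP16Pass(BaseQuantPass):
    def __init__(self, config: QuantConfig) -> None:
        super().__init__(config)
        # Only the fp16_* fields are read; the rtn_* fields are ignored here.
        self.keep_io_types = config.fp16_keep_io_types
        self.op_block_list = list(config.fp16_op_block_list)

    def run(self, model_path: str, output_path: str) -> QuantizeResult:
        # A real pass would convert the model; this sketch only reports success.
        return QuantizeResult(success=True, output_path=output_path)
```

Because `run()` is abstract, instantiating `BaseQuantPass` directly raises `TypeError`, so every pass has to provide its own file-to-file transformation.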

Refactored: `quantizer.py`

- `Quantizer(passes)`: chains passes sequentially. A single pass takes the direct path; multiple passes route intermediates through a `TemporaryDirectory`. `QuantizeResult` stats are merged across passes.
- `expand_precision(mode, config)`: maps mode strings to pass lists; `mode` is optional and falls back to `config.mode`:
  - `"fp16"` → `[FP16Pass(config)]`
  - `"rtn"` → `[RTNPass(config)]`
  - `"static"` / `"dynamic"` → `[StaticPass(config)]`
- `quantize_onnx()`: kept as a backward-compatible entry point; now delegates to `Quantizer`
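The chaining behaviour can be sketched roughly as follows. This is an illustration under assumptions, not the code in `quantizer.py`: `Result` and `CopyPass` are hypothetical stand-ins (the toy pass copies the file and appends a tag instead of quantizing), the intermediate file names are invented, and raising `ValueError` for an unknown mode or an empty pass list is a guess at the real behaviour.

```python
import shutil
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Result:
    """Hypothetical stand-in for QuantizeResult."""
    success: bool = True
    nodes_quantized: int = 0
    warnings: list = field(default_factory=list)


class CopyPass:
    """Toy pass: copies the input and appends its tag instead of quantizing."""

    def __init__(self, tag: str) -> None:
        self.tag = tag

    def run(self, model_path: str, output_path: str) -> Result:
        shutil.copyfile(model_path, output_path)
        with open(output_path, "a") as f:
            f.write(self.tag)
        return Result(nodes_quantized=1, warnings=[f"{self.tag} ran"])


class Quantizer:
    def __init__(self, passes) -> None:
        if not passes:
            raise ValueError("at least one pass is required")  # assumed guard
        self.passes = list(passes)

    def run(self, model_path: str, output_path: str) -> Result:
        # Single pass: write straight to the final output, no temp files.
        if len(self.passes) == 1:
            return self.passes[0].run(model_path, output_path)
        merged = Result()
        with tempfile.TemporaryDirectory() as tmp:
            current = model_path
            for i, p in enumerate(self.passes):
                last = i == len(self.passes) - 1
                target = output_path if last else str(Path(tmp) / f"step_{i}.onnx")
                r = p.run(current, target)
                # Merge stats and warnings from every pass that ran.
                merged.nodes_quantized += r.nodes_quantized
                merged.warnings.extend(r.warnings)
                if not r.success:
                    merged.success = False
                    break  # abort the pipeline on the first failure
                current = target
        return merged


def expand_precision(mode=None, config=None):
    """Map a mode string to a pass list; mode falls back to config.mode."""
    if mode is None:
        mode = getattr(config, "mode", None)
    tags = {"fp16": "fp16", "rtn": "rtn", "static": "static", "dynamic": "static"}
    if mode not in tags:
        raise ValueError(f"unknown precision mode: {mode!r}")  # assumed behaviour
    return [CopyPass(tags[mode])]
```

Concatenating the lists returned by `expand_precision("rtn")` and `expand_precision("fp16")` and handing the result to `Quantizer` gives a two-pass pipeline whose only surviving file is the final output; the intermediate lives in the temporary directory and is deleted when the run ends.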

Updated: `commands/quantize.py`

- `--precision` now accepts multiple values to compose a pass pipeline (e.g. `--precision int4 --precision fp16` runs RTN then FP16)
- The default output name for a multi-pass run joins the precisions in order, e.g. `{stem}_int4_fp16.onnx`
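As an illustration of the repeated-flag behaviour and the default output name, here is a small sketch. It uses the standard library's `argparse` as a stand-in for the project's own `precision_option()` helper, whose implementation is not shown in this PR description. The function names and the choice to place the output next to the input model are assumptions.

```python
import argparse
from pathlib import Path


def default_output_path(model_path: str, precisions: list[str]) -> Path:
    """Build {stem}_{p1}_{p2}.onnx; placing it beside the input is an assumption."""
    src = Path(model_path)
    return src.with_name(f"{src.stem}_{'_'.join(precisions)}.onnx")


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="quantize-sketch")
    parser.add_argument("model")
    # action="append" collects each repeated --precision in command-line order.
    parser.add_argument("--precision", action="append", default=[])
    return parser.parse_args(argv)


args = parse_args(["model.onnx", "--precision", "int4", "--precision", "fp16"])
print(args.precision)
print(default_output_path(args.model, args.precision))
```

Keeping the flag order matters because the pipeline runs the passes in the order given, so `int4` then `fp16` is not the same request as `fp16` then `int4`.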

Tests

tests/unit/test_quant_passes.py — 19 tests, all passing:

- `TestExpandPrecision` (7): mapping correctness, unknown mode, `None` config, mode-from-config fallback
- `TestQuantizerSinglePass` (4): path routing, missing model, exception handling, empty passes guard
- `TestQuantizeOnnxKwargsGuard` (1): unexpected kwargs raise `TypeError`
- `TestQuantizerMultiPass` (4): chaining, stat merging, abort on failure, warning accumulation
- `TestFP16PassConfig` (1): config field wiring
- `TestRTNPassConfig` (2): config field wiring, `accuracy_level=0` → `None`

tests/e2e/test_quantize_e2e.py — TestMultiPrecision (2 tests):

- `test_int4_then_fp16_pipeline`: verifies `MatMulNBits` nodes (RTN) and FLOAT16 initializers (FP16) are both present
- `test_pipeline_default_output_path`: verifies the auto-named output file

- Add passes/ sub-package with BaseQuantPass ABC
- Implement FP16Pass, RTNPass, QDQPass; each accepts WinMLQuantizationConfig and reads only the fields relevant to that pass
- Add Quantizer class: chains passes sequentially, uses tempfile for intermediates, merges QuantizeResult stats across passes
- Add expand_precision(mode, config) to map precision strings to pass lists (supports 'fp16', 'rtn', 'static', 'dynamic', 'w4a16')
- Keep quantize_onnx() as backward-compatible entry point
- Add tests/unit/test_quant_passes.py (19 tests, all passing)

- QDQPass.run(): forward use_external_data to final save_onnx call
- WinMLQuantizationConfig: add 'w4a16' to mode Literal; to_dict() now serialises rtn_* and fp16_* fields when mode is 'w4a16'
- quantize_onnx(): raise TypeError on unrecognised kwargs instead of silently discarding them
- Tests: add TestW4a16Config (3 cases) and TestQuantizeOnnxKwargsGuard (1 case)
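The kwargs guard mentioned in this commit could look something like the sketch below. The set of accepted keyword names and the return value are hypothetical; only the idea of raising `TypeError` rather than silently dropping unknown keywords comes from the commit message.

```python
# Hypothetical set of legacy keyword arguments the entry point still accepts.
_KNOWN_KWARGS = {"rtn_bits", "rtn_block_size", "fp16_keep_io_types"}


def quantize_onnx(model_path: str, output_path: str, **kwargs) -> dict:
    """Sketch of a backward-compatible entry point that rejects unknown kwargs."""
    unknown = sorted(set(kwargs) - _KNOWN_KWARGS)
    if unknown:
        # Fail loudly so a misspelled option is not silently ignored.
        names = ", ".join(unknown)
        raise TypeError(f"quantize_onnx() got unexpected keyword argument(s): {names}")
    # A real implementation would build a config and delegate to Quantizer.
    return {"model_path": model_path, "output_path": output_path, **kwargs}
```

With this guard, a typo such as `rtn_bit=4` surfaces as an error at the call site instead of producing a model quantized with default settings.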

- Add TYPE_CHECKING import block for Quantizer, expand_precision, and quantize_onnx so mypy resolves their types instead of falling back to Any? (fixes 'Any? not callable [misc]' in hf.py and onnx.py)
- Same TYPE_CHECKING imports satisfy CodeQL's 'Explicit export is not defined' alerts for those names in __all__
- Remove trailing ... after docstring in BaseQuantPass.run() to fix CodeQL 'Statement has no effect' alert

… onto main

w4a16 is a composite pipeline concept, not a single-pass quantization mode. Removing it from the mode Literal keeps config.py focused on atomic pass modes (static, dynamic, rtn, fp16). Multi-pass pipelines are expressed through Quantizer + expand_precision at a higher level.

Changes:
- config.py: revert mode Literal to [static, dynamic, rtn, fp16], revert to_dict() guards back to equality checks, remove w4a16 docstring example
- quantizer.py: remove w4a16 from _COMPOSITE_PRECISIONS and docstrings
- __init__.py: update module docstring example
- commands/build.py: fix stale 'single-pass' comment
- tests: remove TestW4a16Config and test_w4a16_returns_rtn_then_fp16

… pipeline

Rename:
- passes/qdq.py → passes/static.py; QDQPass → StaticPass throughout
- Update all imports, __all__, quantizer.py pass_factories, and tests

Multi-precision --precision:
- precision_option() gains multiple=True support
- quantize command accepts repeated --precision flags; len > 1 routes to _run_multi_precision() which chains expand_precision() calls into a single Quantizer pipeline
- Default output path for multi-pass: {stem}_{p1}_{p2}.onnx
- Calibration-unused warning emitted when no static pass is in pipeline

E2E tests (TestMultiPrecision):
- test_int4_then_fp16_pipeline: verifies MatMulNBits nodes (RTN) and FLOAT16 initializers (FP16 pass) are both present in output
- test_pipeline_default_output_path: verifies auto-named output file

- static.py: fix model_name -> model_id (WinMLQuantizationConfig has no model_name field; correct field is model_id); fixes mypy [attr-defined]
- cli.py: widen precision_option default type to str | tuple[str,...] | None so passing default=() for multiple=True passes mypy; fixes [arg-type]
- quantizer.py: make expand_precision mode optional, falling back to config.mode when not provided; removes redundant arg from quantize_onnx caller (addresses reviewer: 'why have mode param when config has it')
- quantize.py: remove redundant 'from typing import cast' inside _run_multi_precision (cast already imported at module level)
- passes/base.py: add note in run() docstring explaining why file-based I/O is used (addresses reviewer suggestion about in-memory model proto)
- tests: add test_no_mode_uses_config_mode to cover expand_precision(config=) path

DingmaomaoBJTU requested a review from a team as a code owner June 26, 2026 05:42

github-advanced-security AI found potential problems Jun 26, 2026

Comment thread src/winml/modelkit/quant/__init__.py Fixed

Comment thread src/winml/modelkit/quant/__init__.py Fixed

Comment thread src/winml/modelkit/quant/passes/base.py Fixed

github-actions Bot added 4 commits June 26, 2026 15:53

DingmaomaoBJTU force-pushed the dingmaomaobjtu-feat-quantizer-pass-pipeline branch from 5a4da8c to 519b562 Compare June 26, 2026 07:56

DingmaomaoBJTU wants to merge 6 commits into main from dingmaomaobjtu-feat-quantizer-pass-pipeline

3 participants
