feat(quant): Quantizer class with BaseQuantPass pipeline (#964) #985

Open
DingmaomaoBJTU wants to merge 6 commits into main from dingmaomaobjtu-feat-quantizer-pass-pipeline

@DingmaomaoBJTU DingmaomaoBJTU commented Jun 26, 2026

Collaborator

Summary

Refactors the quantizer from a flat dispatch function into an extensible pass-based pipeline, as tracked in #964.

Changes

New: passes/ sub-package

| File | Class | What it does |
| --- | --- | --- |
| passes/base.py | BaseQuantPass | ABC: __init__(config) plus abstract run(model_path, output_path) -> QuantizeResult |
| passes/fp16.py | FP16Pass | Reads fp16_keep_io_types, fp16_op_block_list from config |
| passes/rtn.py | RTNPass | Reads rtn_bits, rtn_block_size, rtn_symmetric, rtn_accuracy_level from config |
| passes/static.py | StaticPass | Reads all QDQ/calibration fields from config |

All passes accept a single WinMLQuantizationConfig — each reads only its relevant fields.
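
As a rough illustration of the pass contract in the table above, the sketch below shows an abstract base class and one concrete pass that picks out only its own config fields. `QuantConfig` is a hypothetical stand-in for WinMLQuantizationConfig, the fields of `QuantizeResult` are assumptions (the PR does not list them), and the body of `run()` is a placeholder rather than a real FP16 conversion.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class QuantConfig:
    """Hypothetical stand-in for WinMLQuantizationConfig."""
    mode: str = "fp16"
    fp16_keep_io_types: bool = True
    fp16_op_block_list: List[str] = field(default_factory=list)
    rtn_bits: int = 4
    rtn_block_size: int = 32


@dataclass
class QuantizeResult:
    """Assumed result shape; the real fields are not shown in this PR."""
    success: bool
    output_path: Optional[Path] = None
    warnings: List[str] = field(default_factory=list)


class BaseQuantPass(ABC):
    """Every pass receives the full config and implements run()."""

    def __init__(self, config: QuantConfig) -> None:
        self.config = config

    @abstractmethod
    def run(self, model_path: Path, output_path: Path) -> QuantizeResult:
        """Read the model at model_path and write the result to output_path."""


class FP16Pass(BaseQuantPass):
    """Reads only the fp16_* fields and ignores everything else."""

    def __init__(self, config: QuantConfig) -> None:
        super().__init__(config)
        self.keep_io_types = config.fp16_keep_io_types
        self.op_block_list = list(config.fp16_op_block_list)

    def run(self, model_path: Path, output_path: Path) -> QuantizeResult:
        # Placeholder: a real pass would convert weights to float16 here.
        output_path.write_bytes(model_path.read_bytes())
        return QuantizeResult(success=True, output_path=output_path)
```

Passing one shared config object keeps pass constructors uniform, which is what lets a pipeline instantiate any pass the same way.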

Refactored: quantizer.py

  • Quantizer(passes) — chains passes sequentially; single-pass takes the direct path, multi-pass routes intermediates through a TemporaryDirectory; merges QuantizeResult stats across passes
  • expand_precision(mode, config) — maps mode strings to pass lists; mode is optional and falls back to config.mode:
    • "fp16" → [FP16Pass(config)]
    • "rtn" → [RTNPass(config)]
    • "static" / "dynamic" → [StaticPass(config)]
  • quantize_onnx() — kept as backward-compatible entry point; now delegates to Quantizer
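
The chaining behaviour described for Quantizer can be sketched roughly as follows. This is a minimal approximation and not the code in quantizer.py: the `nodes_changed` stat, the `ValueError` on an empty pass list, and the `TagPass` helper are all assumptions made for illustration.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class QuantizeResult:
    """Assumed result shape; nodes_changed is a hypothetical stat."""
    success: bool
    output_path: Optional[Path] = None
    warnings: List[str] = field(default_factory=list)
    nodes_changed: int = 0


class Quantizer:
    def __init__(self, passes) -> None:
        if not passes:
            # Assumed guard; the PR only says an empty list is rejected.
            raise ValueError("Quantizer needs at least one pass")
        self.passes = list(passes)

    def run(self, model_path: Path, output_path: Path) -> QuantizeResult:
        # Single pass: write straight to the final output, no temp files.
        if len(self.passes) == 1:
            return self.passes[0].run(model_path, output_path)

        merged = QuantizeResult(success=True, output_path=output_path)
        with tempfile.TemporaryDirectory() as tmp:
            current = model_path
            last = len(self.passes) - 1
            for i, quant_pass in enumerate(self.passes):
                # Only the last pass writes to the caller's output path.
                target = output_path if i == last else Path(tmp) / f"pass_{i}.onnx"
                result = quant_pass.run(current, target)
                merged.warnings.extend(result.warnings)
                merged.nodes_changed += result.nodes_changed
                if not result.success:
                    # Abort the pipeline on the first failing pass.
                    merged.success = False
                    merged.output_path = None
                    return merged
                current = target
        return merged


class TagPass:
    """Hypothetical pass used only to exercise the pipeline."""

    def __init__(self, tag: str) -> None:
        self.tag = tag

    def run(self, model_path: Path, output_path: Path) -> QuantizeResult:
        output_path.write_text(model_path.read_text() + self.tag)
        return QuantizeResult(success=True, output_path=output_path, nodes_changed=1)
```

Running `Quantizer([TagPass("a"), TagPass("b")])` on a file applies the passes in order, with the intermediate file living only inside the temporary directory.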

Updated: commands/quantize.py

  • --precision now accepts multiple values to compose a pass pipeline (e.g. --precision int4 --precision fp16 runs RTN then FP16)
  • Default output name for multi-pass: {stem}_int4_fp16.onnx
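
The default multi-pass output name can be derived from the input path and the ordered precision list. The helper below is a hypothetical sketch of that naming rule; `default_output_path` is not a name taken from the PR.

```python
from pathlib import Path
from typing import Sequence


def default_output_path(model_path: Path, precisions: Sequence[str]) -> Path:
    """Build {stem}_{p1}_{p2}.onnx next to the input model."""
    suffix = "_".join(precisions)
    return model_path.with_name(f"{model_path.stem}_{suffix}.onnx")
```

For example, `default_output_path(Path("models/net.onnx"), ["int4", "fp16"])` yields `models/net_int4_fp16.onnx`, so the file name records the order in which the passes ran.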

Tests

tests/unit/test_quant_passes.py — 19 tests, all passing:

  • TestExpandPrecision (7) — mapping correctness, unknown mode, None config, mode-from-config fallback
  • TestQuantizerSinglePass (4) — path routing, missing model, exception handling, empty passes guard
  • TestQuantizeOnnxKwargsGuard (1) — unexpected kwargs raise TypeError
  • TestQuantizerMultiPass (4) — chaining, stat merging, abort on failure, warning accumulation
  • TestFP16PassConfig (1) — config field wiring
  • TestRTNPassConfig (2) — config field wiring, accuracy_level=0 → None
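
Two of the behaviours covered above are small enough to sketch directly: rejecting unexpected keyword arguments, and mapping accuracy_level=0 to None. The allowed-keyword set, the return value, and the helper name `normalize_accuracy_level` are assumptions for illustration; the real quantize_onnx() signature is not shown in this PR.

```python
from typing import Optional

# Hypothetical set of keyword arguments the entry point understands.
_ALLOWED_KWARGS = {"mode", "config"}


def quantize_onnx(model_path, output_path=None, **kwargs):
    """Reject unknown kwargs instead of silently discarding them."""
    unknown = sorted(set(kwargs) - _ALLOWED_KWARGS)
    if unknown:
        raise TypeError(
            "quantize_onnx() got unexpected keyword argument(s): " + ", ".join(unknown)
        )
    # Placeholder return; the real function delegates to Quantizer.
    return {"model_path": model_path, "output_path": output_path, **kwargs}


def normalize_accuracy_level(level: int) -> Optional[int]:
    """Treat 0 as 'not set' so the backend default applies."""
    return None if level == 0 else level
```

Raising TypeError matches what Python itself does for an unexpected keyword argument, so a misspelled option fails at the call site.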

tests/e2e/test_quantize_e2e.py (TestMultiPrecision, 2 tests):

  • test_int4_then_fp16_pipeline — verifies MatMulNBits nodes (RTN) + FLOAT16 initializers (FP16) are both present
  • test_pipeline_default_output_path — verifies auto-named output file
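
The check in test_int4_then_fp16_pipeline can be approximated by the predicate below. It is a sketch and not the test's actual code: it is duck-typed so that it works on any object exposing `graph.node` and `graph.initializer` (for instance the result of `onnx.load`), and it hard-codes 10, the value of `onnx.TensorProto.FLOAT16`, to avoid importing onnx. The function name is hypothetical.

```python
from types import SimpleNamespace

FLOAT16 = 10  # numeric value of onnx.TensorProto.FLOAT16


def has_int4_and_fp16(model) -> bool:
    """True if both the RTN pass and the FP16 pass left their mark."""
    # RTN weight-only quantization rewrites MatMul into MatMulNBits nodes.
    has_matmul_nbits = any(n.op_type == "MatMulNBits" for n in model.graph.node)
    # The FP16 pass converts remaining float initializers to FLOAT16.
    has_fp16_init = any(t.data_type == FLOAT16 for t in model.graph.initializer)
    return has_matmul_nbits and has_fp16_init


# Tiny fake model, just to show the shape of the objects being inspected.
fake_model = SimpleNamespace(
    graph=SimpleNamespace(
        node=[SimpleNamespace(op_type="MatMulNBits")],
        initializer=[SimpleNamespace(data_type=FLOAT16)],
    )
)
```

Checking for both artifacts in one output file confirms that the second pass consumed the first pass's output rather than the original model.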

@DingmaomaoBJTU DingmaomaoBJTU requested a review from a team as a code owner June 26, 2026 05:42
Comment thread src/winml/modelkit/quant/__init__.py Fixed
Comment thread src/winml/modelkit/quant/__init__.py Fixed
Comment thread src/winml/modelkit/quant/passes/base.py Fixed
github-actions Bot added 4 commits June 26, 2026 15:53
- Add passes/ sub-package with BaseQuantPass ABC
- Implement FP16Pass, RTNPass, QDQPass — each accepts WinMLQuantizationConfig
  and reads only the fields relevant to that pass
- Add Quantizer class: chains passes sequentially, uses tempfile for
  intermediates, merges QuantizeResult stats across passes
- Add expand_precision(mode, config) to map precision strings to pass lists
  (supports 'fp16', 'rtn', 'static', 'dynamic', 'w4a16')
- Keep quantize_onnx() as backward-compatible entry point
- Add tests/unit/test_quant_passes.py (19 tests, all passing)
- QDQPass.run(): forward use_external_data to final save_onnx call
- WinMLQuantizationConfig: add 'w4a16' to mode Literal; to_dict() now
  serialises rtn_* and fp16_* fields when mode is 'w4a16'
- quantize_onnx(): raise TypeError on unrecognised kwargs instead of
  silently discarding them
- Tests: add TestW4a16Config (3 cases) and TestQuantizeOnnxKwargsGuard (1 case)
- Add TYPE_CHECKING import block for Quantizer, expand_precision, and
  quantize_onnx so mypy resolves their types instead of falling back to
  Any? (fixes 'Any? not callable [misc]' in hf.py and onnx.py)
- Same TYPE_CHECKING imports satisfy CodeQL's 'Explicit export is not
  defined' alerts for those names in __all__
- Remove trailing ... after docstring in BaseQuantPass.run() to fix
  CodeQL 'Statement has no effect' alert
… onto main

w4a16 is a composite pipeline concept, not a single-pass quantization mode.
Removing it from the mode Literal keeps config.py focused on atomic pass
modes (static, dynamic, rtn, fp16). Multi-pass pipelines are expressed
through Quantizer + expand_precision at a higher level.

Changes:
- config.py: revert mode Literal to [static, dynamic, rtn, fp16], revert
  to_dict() guards back to equality checks, remove w4a16 docstring example
- quantizer.py: remove w4a16 from _COMPOSITE_PRECISIONS and docstrings
- __init__.py: update module docstring example
- commands/build.py: fix stale 'single-pass' comment
- tests: remove TestW4a16Config and test_w4a16_returns_rtn_then_fp16
@DingmaomaoBJTU DingmaomaoBJTU force-pushed the dingmaomaobjtu-feat-quantizer-pass-pipeline branch from 5a4da8c to 519b562 Compare June 26, 2026 07:56
Comment thread src/winml/modelkit/quant/passes/base.py
Comment thread src/winml/modelkit/quant/passes/static.py
Comment thread src/winml/modelkit/quant/quantizer.py Outdated
github-actions Bot added 2 commits June 26, 2026 16:16
… pipeline

Rename:
- passes/qdq.py → passes/static.py; QDQPass → StaticPass throughout
- Update all imports, __all__, quantizer.py pass_factories, and tests

Multi-precision --precision:
- precision_option() gains multiple=True support
- quantize command accepts repeated --precision flags; len > 1 routes to
  _run_multi_precision() which chains expand_precision() calls into a
  single Quantizer pipeline
- Default output path for multi-pass: {stem}_{p1}_{p2}.onnx
- Calibration-unused warning emitted when no static pass is in pipeline

E2E tests (TestMultiPrecision):
- test_int4_then_fp16_pipeline: verifies MatMulNBits nodes (RTN) and
  FLOAT16 initializers (FP16 pass) are both present in output
- test_pipeline_default_output_path: verifies auto-named output file
- static.py: fix model_name -> model_id (WinMLQuantizationConfig has no
  model_name field; correct field is model_id) — fixes mypy [attr-defined]
- cli.py: widen precision_option default type to str | tuple[str,...] | None
  so passing default=() for multiple=True passes mypy — fixes [arg-type]
- quantizer.py: make expand_precision mode optional, falling back to
  config.mode when not provided; removes redundant arg from quantize_onnx
  caller (addresses reviewer: 'why have mode param when config has it')
- quantize.py: remove redundant 'from typing import cast' inside
  _run_multi_precision (cast already imported at module level)
- passes/base.py: add note in run() docstring explaining why file-based I/O
  is used (addresses reviewer suggestion about in-memory model proto)
- tests: add test_no_mode_uses_config_mode to cover expand_precision(config=)
  path
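
The optional-mode fallback in expand_precision could look roughly like the sketch below. The pass classes and `QuantConfig` are minimal stand-ins, `_PASS_FACTORIES` is a hypothetical name, and raising ValueError for an unknown mode or for a call with neither argument is an assumption; the PR only states that unknown modes and a None config are tested.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QuantConfig:
    """Hypothetical stand-in for WinMLQuantizationConfig."""
    mode: str = "static"


class _Pass:
    def __init__(self, config: QuantConfig) -> None:
        self.config = config


class FP16Pass(_Pass): ...
class RTNPass(_Pass): ...
class StaticPass(_Pass): ...


# Mode string -> pass class; "static" and "dynamic" share one pass.
_PASS_FACTORIES = {
    "fp16": FP16Pass,
    "rtn": RTNPass,
    "static": StaticPass,
    "dynamic": StaticPass,
}


def expand_precision(
    mode: Optional[str] = None, config: Optional[QuantConfig] = None
) -> List[_Pass]:
    if config is None:
        if mode is None:
            raise ValueError("either mode or config must be provided")
        config = QuantConfig(mode=mode)
    # An explicit mode wins; otherwise fall back to config.mode.
    resolved = mode if mode is not None else config.mode
    try:
        factory = _PASS_FACTORIES[resolved]
    except KeyError:
        raise ValueError(f"unknown precision mode: {resolved!r}") from None
    return [factory(config)]
```

With this shape, a single-precision caller can write `expand_precision(config=cfg)`, while a multi-precision caller can reuse one config and vary only the mode per pass.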
3 participants