Context
Following the architecture pattern established by optim/optimizer.py (pipe-based pipeline), refactor the quantizer from standalone functions into a class-based pass pipeline for better extensibility and custom multi-pass support.
Current State
- quant/quantizer.py has a quantize_onnx() function with _quantize_fp16, _quantize_rtn, and _quantize_qdq helper functions
- Multi-pass logic (w4a16 → [int4, fp16]) is handled internally via _run_multi_pass()
- Users cannot customize pass ordering or per-pass parameters in multi-pass scenarios
Proposed Design
    # quant/passes/base.py
    class BaseQuantPass:
        def should_run(self, config) -> bool: ...
        def build_config(self, precision, config) -> PassConfig: ...
        def run(self, model_path, config) -> Path: ...

    # quant/passes/rtn.py, fp16.py, qdq.py
    class RTNPass(BaseQuantPass): ...
    class FP16Pass(BaseQuantPass): ...
    class QDQPass(BaseQuantPass): ...

    # quant/quantizer.py
    class Quantizer:
        passes = [RTNPass, QDQPass, FP16Pass]

        def quantize(self, model_path, *, precision=None, config=None, passes=None):
            resolved_passes = passes or self._resolve_passes(precision, config)
            for qpass in resolved_passes:
                # Each pass consumes the previous pass's output path.
                model_path = qpass.run(model_path, ...)
            return model_path
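To make the control flow concrete, here is a minimal runnable sketch of the pipeline. It is an illustration under assumptions, not the proposed implementation: the passes only derive output file names instead of touching a model, should_run() and build_config() are omitted, and the DEFAULT_PASSES table is a hypothetical stand-in for _resolve_passes().

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from pathlib import Path


class BaseQuantPass(ABC):
    """Hypothetical base class; the real signature may differ."""

    @abstractmethod
    def run(self, model_path: Path) -> Path:
        """Transform the model at model_path and return the output path."""


class RTNPass(BaseQuantPass):
    def __init__(self, bits: int = 4) -> None:
        self.bits = bits

    def run(self, model_path: Path) -> Path:
        # Stand-in for real weight quantization: only derives the output name.
        return model_path.with_name(f"{model_path.stem}.int{self.bits}.onnx")


class FP16Pass(BaseQuantPass):
    def run(self, model_path: Path) -> Path:
        # Stand-in for a real FP16 conversion.
        return model_path.with_name(f"{model_path.stem}.fp16.onnx")


# Assumed mapping from a precision string to its default pass list.
DEFAULT_PASSES = {
    "w4a16": lambda: [RTNPass(bits=4), FP16Pass()],
    "fp16": lambda: [FP16Pass()],
}


class Quantizer:
    def quantize(self, model_path, *, precision=None, passes=None) -> Path:
        if passes is None:
            if precision not in DEFAULT_PASSES:
                raise ValueError(f"unknown precision: {precision!r}")
            passes = DEFAULT_PASSES[precision]()
        model_path = Path(model_path)
        for qpass in passes:
            # Chain the passes: each one receives the previous output.
            model_path = qpass.run(model_path)
        return model_path
```

With these stubs, Quantizer().quantize("model.onnx", precision="w4a16") returns Path("model.int4.fp16.onnx"), and passing passes=[FP16Pass()] bypasses the precision lookup entirely.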
Benefits
- Extensibility: adding a new quantization algorithm means adding a new Pass class
- Custom multi-pass: users can pass an explicit passes=[RTNPass(bits=4), FP16Pass()]
- Consistency: mirrors the optim/optimizer.py pipe architecture
- Per-pass config: each pass can have different parameters in multi-pass scenarios
- Pass coordination: passes can inspect previous pass outputs (e.g., FP16 skips already-quantized ops)
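The pass coordination point could work roughly as follows. This is a sketch under assumptions: PassContext, run_pipeline, and the op_types parameter are hypothetical names, and a plain dict of node name to op type stands in for a real ONNX graph.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PassContext:
    """Hypothetical state shared between passes in one pipeline run."""

    quantized_ops: set[str] = field(default_factory=set)
    log: list[str] = field(default_factory=list)


class RTNPass:
    def __init__(self, bits=4, op_types=("MatMul",)):
        self.bits = bits
        self.op_types = set(op_types)

    def run(self, ops, ctx):
        # ops maps node name -> op type; a stand-in for a real graph.
        for name, op_type in ops.items():
            if op_type in self.op_types:
                ctx.quantized_ops.add(name)
                ctx.log.append(f"rtn{self.bits}:{name}")


class FP16Pass:
    def run(self, ops, ctx):
        for name in ops:
            if name in ctx.quantized_ops:
                continue  # already handled by an earlier pass
            ctx.log.append(f"fp16:{name}")


def run_pipeline(ops, passes):
    """Run the passes in order, sharing one context between them."""
    ctx = PassContext()
    for qpass in passes:
        qpass.run(ops, ctx)
    return ctx
```

For ops = {"proj": "MatMul", "norm": "LayerNormalization"} and passes=[RTNPass(bits=4), FP16Pass()], the resulting log is ["rtn4:proj", "fp16:norm"]: the FP16 pass leaves the node that RTN already quantized untouched.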
Scope
- Convert _quantize_fp16, _quantize_rtn, _quantize_qdq into Pass classes
- Add the BaseQuantPass ABC
- Refactor quantize_onnx() into Quantizer.quantize()
- Keep quantize_onnx() as a thin wrapper
Related
- src/winml/modelkit/optim/optimizer.py: reference architecture
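For the Scope item that keeps quantize_onnx() as a thin wrapper, a backward-compatible shim might look like the sketch below. The parameter list of quantize_onnx() is an assumption, since the issue does not show its current signature, and the Quantizer here is a stub that only derives an output name.

```python
from pathlib import Path


class Quantizer:
    """Stub for the proposed class; only derives an output file name."""

    def quantize(self, model_path, *, precision=None, config=None, passes=None):
        path = Path(model_path)
        return path.with_name(f"{path.stem}.{precision}.onnx")


def quantize_onnx(model_path, precision=None, config=None):
    """Legacy entry point (assumed signature); delegates to Quantizer."""
    return Quantizer().quantize(model_path, precision=precision, config=config)
```

Existing callers would keep using quantize_onnx() unchanged, while new code could call Quantizer.quantize() directly to supply explicit passes.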