Refactor quantizer into Quantizer class with BaseQuantPass pipeline #964

Description

@DingmaomaoBJTU

Context

Following the architecture pattern established by optim/optimizer.py (pipe-based pipeline), refactor the quantizer from standalone functions into a class-based pass pipeline for better extensibility and custom multi-pass support.

Current State

  • quant/quantizer.py has quantize_onnx() function with _quantize_fp16, _quantize_rtn, _quantize_qdq helper functions
  • Multi-pass logic (w4a16 → [int4, fp16]) is handled internally via _run_multi_pass()
  • Users cannot customize pass ordering or per-pass parameters in multi-pass scenarios

Proposed Design

# quant/passes/base.py
class BaseQuantPass:
    def should_run(self, config) -> bool: ...
    def build_config(self, precision, config) -> PassConfig: ...
    def run(self, model_path, config) -> Path: ...

# quant/passes/rtn.py, fp16.py, qdq.py
class RTNPass(BaseQuantPass): ...
class FP16Pass(BaseQuantPass): ...
class QDQPass(BaseQuantPass): ...

# quant/quantizer.py
class Quantizer:
    passes = [RTNPass, QDQPass, FP16Pass]

    def quantize(self, model_path, *, precision=None, config=None, passes=None):
        resolved_passes = passes or self._resolve_passes(precision, config)
        for qpass in resolved_passes:
            model_path = qpass.run(model_path, ...)
        return model_path  # path produced by the last pass
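
To make the proposed control flow concrete, here is a minimal runnable sketch of the design above. It is an illustration under assumptions, not the intended implementation: the `PassConfig` field, the `name` and `precisions` class attributes, and the int8 → QDQ mapping are hypothetical; only the w4a16 → [int4, fp16] split comes from this issue. The passes are stubs that derive an output path and do not touch an ONNX model, and `build_config` is left out for brevity.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Sequence


@dataclass
class PassConfig:
    # Hypothetical container: the issue names PassConfig but not its fields.
    precision: Optional[str] = None


class BaseQuantPass:
    name = "base"
    # Hypothetical attribute: precisions for which this pass is picked by default.
    precisions: tuple = ()

    def should_run(self, config: PassConfig) -> bool:
        return config.precision in self.precisions

    def run(self, model_path: Path, config: PassConfig) -> Path:
        # Stub: a real pass would rewrite the model file; here only the
        # output path is derived so that the pipeline order stays visible.
        return model_path.with_name(f"{model_path.stem}.{self.name}{model_path.suffix}")


class RTNPass(BaseQuantPass):
    name = "rtn"
    precisions = ("int4", "w4a16")


class QDQPass(BaseQuantPass):
    name = "qdq"
    precisions = ("int8",)  # assumption, not stated in the issue


class FP16Pass(BaseQuantPass):
    name = "fp16"
    precisions = ("fp16", "w4a16")


class Quantizer:
    # Default order: weight quantization first, FP16 conversion last.
    passes = [RTNPass, QDQPass, FP16Pass]

    def _resolve_passes(self, precision, config) -> list:
        candidates = [cls() for cls in self.passes]
        return [p for p in candidates if p.should_run(config)]

    def quantize(self, model_path, *, precision=None, config=None,
                 passes: Optional[Sequence[BaseQuantPass]] = None) -> Path:
        config = config or PassConfig(precision=precision)
        resolved = list(passes) if passes else self._resolve_passes(precision, config)
        model_path = Path(model_path)
        for qpass in resolved:
            # Each pass consumes the previous pass's output path.
            model_path = qpass.run(model_path, config)
        return model_path


out = Quantizer().quantize("model.onnx", precision="w4a16")
print(out)
```

In this sketch the class-level `passes` list holds classes that are instantiated on demand, while the explicit `passes=` argument takes ready-made instances, which keeps per-pass parameters with the caller.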

Benefits

  1. Extensibility — adding a new quantization algorithm = adding a new Pass class
  2. Custom multi-pass — users can pass explicit passes=[RTNPass(bits=4), FP16Pass()]
  3. Consistency — mirrors optim/optimizer.py pipe architecture
  4. Per-pass config — each pass can have different parameters in multi-pass scenarios
  5. Pass coordination — passes can inspect previous pass outputs (e.g., FP16 skips already-quantized ops)
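
Benefits 2, 4, and 5 can be illustrated with a toy example. This is a sketch under loud assumptions: the model is stood in for by a plain dict of op name to `{"type", "dtype"}`, the `op_types` parameter and the `run_passes` helper are hypothetical, and `run` takes the dict directly instead of the `model_path` and `config` arguments proposed above. It shows only how an explicit pass list with per-pass parameters might behave and how a later pass can skip what an earlier one already quantized.

```python
from dataclasses import dataclass


class BaseQuantPass:
    def run(self, model: dict) -> dict:
        raise NotImplementedError


@dataclass
class RTNPass(BaseQuantPass):
    # Per-pass parameters (hypothetical names apart from `bits`).
    bits: int = 4
    op_types: tuple = ("MatMul",)

    def run(self, model: dict) -> dict:
        out = {}
        for name, op in model.items():
            if op["type"] in self.op_types and op["dtype"] == "fp32":
                # Mark the op as weight-quantized to the requested bit width.
                out[name] = {**op, "dtype": f"int{self.bits}"}
            else:
                out[name] = op
        return out


class FP16Pass(BaseQuantPass):
    def run(self, model: dict) -> dict:
        # Pass coordination: only ops still in fp32 are converted, so ops
        # already quantized by an earlier pass are left untouched.
        return {
            name: ({**op, "dtype": "fp16"} if op["dtype"] == "fp32" else op)
            for name, op in model.items()
        }


def run_passes(model: dict, passes) -> dict:
    # Hypothetical helper mirroring the loop in Quantizer.quantize().
    for qpass in passes:
        model = qpass.run(model)
    return model


model = {
    "proj": {"type": "MatMul", "dtype": "fp32"},
    "norm": {"type": "LayerNormalization", "dtype": "fp32"},
}
result = run_passes(model, [RTNPass(bits=4), FP16Pass()])
print(result)
```

Swapping in `RTNPass(bits=8)` or reordering the list changes the outcome without touching either pass class, which is the customization the current `_run_multi_pass()` does not expose.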

Scope

Related
