Refactor quantizer into Quantizer class with BaseQuantPass pipeline #964

## Context

Following the pipe-based architecture established by `optim/optimizer.py`, refactor the quantizer from standalone functions into a class-based pass pipeline. This makes it easier to add new algorithms and lets users define custom multi-pass runs.

## Current State

- `quant/quantizer.py` exposes a `quantize_onnx()` function backed by the helper functions `_quantize_fp16`, `_quantize_rtn`, and `_quantize_qdq`
- Multi-pass logic (`w4a16 → [int4, fp16]`) is handled internally via `_run_multi_pass()`
- Users cannot customize pass ordering or per-pass parameters in multi-pass scenarios

## Proposed Design

```python
# quant/passes/base.py
from abc import ABC, abstractmethod
from pathlib import Path

class BaseQuantPass(ABC):
    def should_run(self, config) -> bool: ...
    def build_config(self, precision, config) -> "PassConfig": ...

    @abstractmethod
    def run(self, model_path, config) -> Path: ...

# quant/passes/rtn.py, fp16.py, qdq.py
class RTNPass(BaseQuantPass): ...
class FP16Pass(BaseQuantPass): ...
class QDQPass(BaseQuantPass): ...

# quant/quantizer.py
class Quantizer:
    # Available pass classes
    passes = [RTNPass, QDQPass, FP16Pass]

    def quantize(self, model_path, *, precision=None, config=None, passes=None) -> Path:
        resolved_passes = passes or self._resolve_passes(precision, config)
        for qpass in resolved_passes:
            # Each pass consumes the model produced by the previous pass
            model_path = qpass.run(model_path, ...)
        return model_path
```
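To make the control flow concrete, here is a minimal runnable sketch of how `Quantizer.quantize()` might resolve passes from a precision string and chain them. It assumes a hypothetical `PRECISION_PASSES` table that reproduces the existing `w4a16 → [int4, fp16]` split, and a simple `PassConfig` dataclass whose fields are also hypothetical. The passes here only derive output file names instead of rewriting ONNX graphs, so the example shows ordering and path threading, not real quantization.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class PassConfig:
    # Hypothetical per-pass settings; a real config would carry algorithm options.
    precision: str | None
    options: dict = field(default_factory=dict)


class BaseQuantPass(ABC):
    def should_run(self, config: dict) -> bool:
        # By default every resolved pass runs; subclasses may opt out.
        return True

    def build_config(self, precision: str | None, config: dict) -> PassConfig:
        return PassConfig(precision=precision, options=dict(config))

    @abstractmethod
    def run(self, model_path: Path, config: PassConfig) -> Path: ...


class RTNPass(BaseQuantPass):
    def __init__(self, bits: int = 4):
        self.bits = bits

    def run(self, model_path: Path, config: PassConfig) -> Path:
        # Stand-in: a real pass would write a quantized ONNX file here.
        return model_path.with_name(f"{model_path.stem}.int{self.bits}.onnx")


class FP16Pass(BaseQuantPass):
    def run(self, model_path: Path, config: PassConfig) -> Path:
        return model_path.with_name(f"{model_path.stem}.fp16.onnx")


# Hypothetical precision -> pass-list factories (fresh instances per call).
PRECISION_PASSES = {
    "fp16": lambda: [FP16Pass()],
    "int4": lambda: [RTNPass(bits=4)],
    "w4a16": lambda: [RTNPass(bits=4), FP16Pass()],
}


class Quantizer:
    def _resolve_passes(self, precision: str | None, config: dict) -> list[BaseQuantPass]:
        # config is unused in this sketch; kept for parity with the proposed signature.
        if precision not in PRECISION_PASSES:
            raise ValueError(f"unsupported precision: {precision!r}")
        return PRECISION_PASSES[precision]()

    def quantize(self, model_path, *, precision=None, config=None, passes=None) -> Path:
        config = config or {}
        resolved = passes or self._resolve_passes(precision, config)
        path = Path(model_path)
        for qpass in resolved:
            if not qpass.should_run(config):
                continue
            # Each pass consumes the previous pass's output path.
            path = qpass.run(path, qpass.build_config(precision, config))
        return path
```

With this sketch, `Quantizer().quantize("model.onnx", precision="w4a16")` yields `model.int4.fp16.onnx`, while `passes=[FP16Pass()]` bypasses precision resolution entirely. Building fresh pass instances per call keeps per-pass state (such as `bits`) from leaking between runs.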

## Benefits

1. **Extensibility** — adding a new quantization algorithm only requires adding a new Pass class
2. **Custom multi-pass** — users can pass explicit `passes=[RTNPass(bits=4), FP16Pass()]`
3. **Consistency** — mirrors the pipe architecture of `optim/optimizer.py`
4. **Per-pass config** — each pass can have different parameters in multi-pass scenarios
5. **Pass coordination** — passes can inspect previous pass outputs (e.g., FP16 skips already-quantized ops)
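Pass coordination could work by threading shared state through the pipeline. The sketch below is one possible approach, not the proposed implementation. It uses a hypothetical `PassContext` that records which ops an earlier pass quantized, and a toy `ToyModel` (op name → op type) in place of an ONNX graph. `RTNPass` quantizes `MatMul` ops and records them, and `FP16Pass` treats the recorded set as a block list, which is the behavior #963 asks for with `fp16_op_block_list`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyModel:
    # Hypothetical stand-in for an ONNX graph: op name -> op type.
    ops: dict[str, str]
    # op name -> precision assigned so far; starts as "fp32".
    dtypes: dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        for name in self.ops:
            self.dtypes.setdefault(name, "fp32")


@dataclass
class PassContext:
    # Hypothetical shared state passed to every pass in the pipeline.
    quantized_ops: set[str] = field(default_factory=set)


class RTNPass:
    def __init__(self, bits: int = 4, op_types=("MatMul",)):
        self.bits = bits
        self.op_types = set(op_types)

    def run(self, model: ToyModel, ctx: PassContext) -> ToyModel:
        for name, op_type in model.ops.items():
            if op_type in self.op_types:
                model.dtypes[name] = f"int{self.bits}"
                ctx.quantized_ops.add(name)  # record for later passes
        return model


class FP16Pass:
    def run(self, model: ToyModel, ctx: PassContext) -> ToyModel:
        for name in model.ops:
            # Skip ops an earlier pass already quantized (acts as a block list).
            if name not in ctx.quantized_ops:
                model.dtypes[name] = "fp16"
        return model


def run_pipeline(model: ToyModel, passes) -> ToyModel:
    ctx = PassContext()
    for qpass in passes:
        model = qpass.run(model, ctx)
    return model
```

Running `run_pipeline(ToyModel(ops={"proj": "MatMul", "norm": "LayerNormalization"}), [RTNPass(bits=4), FP16Pass()])` leaves `proj` at `int4` and converts only `norm` to `fp16`. A real FP16 pass could instead derive the block list by inspecting quantized initializers in the input graph; the explicit context just makes the dependency visible.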

## Scope

- Extract `_quantize_fp16`, `_quantize_rtn`, `_quantize_qdq` into Pass classes
- Create `BaseQuantPass` ABC
- Refactor `quantize_onnx()` into `Quantizer.quantize()`
- Keep backward-compatible `quantize_onnx()` as a thin wrapper
- Also addresses #963 (cross-pass op coordination)
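The backward-compatible wrapper could be as small as the sketch below. The `quantize_onnx()` signature shown (model path plus optional `precision` and `config`) is an assumption, and `Quantizer` is a stub that only records its arguments. The point is that legacy callers may pass arguments positionally, while the proposed `Quantizer.quantize()` takes them keyword-only, so the wrapper forwards them by name and contains no pass logic of its own.

```python
from __future__ import annotations

from pathlib import Path


class Quantizer:
    """Stub for the proposed class; a real one would resolve and run passes."""

    calls: list[dict] = []  # records invocations so delegation is observable

    def quantize(self, model_path, *, precision=None, config=None, passes=None) -> Path:
        Quantizer.calls.append(
            {"model_path": model_path, "precision": precision, "config": config, "passes": passes}
        )
        return Path(model_path)


def quantize_onnx(model_path, precision=None, config=None) -> Path:
    """Legacy entry point kept as a thin wrapper (hypothetical signature)."""
    # Forward positional legacy arguments as keywords; all logic lives in Quantizer.
    return Quantizer().quantize(model_path, precision=precision, config=config)
```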

## Related

- PR #872 (precision-driven quantization — current implementation)
- #963 (fp16_op_block_list cross-pass sharing)
- `src/winml/modelkit/optim/optimizer.py` — reference architecture
