12 changes: 6 additions & 6 deletions docs/commands/quantize.md
@@ -21,8 +21,8 @@ $ winml quantize [options]
|---|---|---|---|---|
| `--model` | `-m` | path | *(required)* | Input ONNX model file. |
| `--output` | `-o` | path | `{input}_qdq.onnx` | Output path for the quantized model. |
-| `--task` | | string | — | Task name (e.g., `image-classification`, `text-classification`) used to select a task-appropriate calibration dataset. Pair with `--model-name` so the dataset is preprocessed exactly the way the model expects. Without `--task`, calibration falls back to synthetic random data. |
-| `--model-name` | | string | — | HuggingFace model ID (e.g., `microsoft/resnet-50`) used to load the matching preprocessor/tokenizer for calibration. Only used when `--task` is provided. |
+| `--task` | | string | — | Task name (e.g., `image-classification`, `text-classification`) used to select a task-appropriate calibration dataset. Pair with `--model-id` so the dataset is preprocessed exactly the way the model expects. Without `--task`, calibration falls back to synthetic random data. |
+| `--model-id` | | string | — | HuggingFace model ID (e.g., `microsoft/resnet-50`) used to load the matching preprocessor/tokenizer for calibration. Only used when `--task` is provided. |
| `--precision` | `-p` | string | `None` | Precision shorthand: `int8`, `int16`, or mixed-precision like `w8a16`. Overridden by explicit `--weight-type` / `--activation-type`. |
| `--samples` | | integer | `10` | Number of calibration samples used to compute quantization ranges. |
| `--method` | | choice | `minmax` | Calibration algorithm: `minmax`, `entropy`, or `percentile`. |
@@ -44,14 +44,14 @@ Precision can be set at a coarse level with `--precision` or tuned per tensor
type with `--weight-type` and `--activation-type`; explicit type flags always
override `--precision`.

-Calibration data is selected from `--task` and `--model-name`. For a supported
+Calibration data is selected from `--task` and `--model-id`. For a supported
task, a built-in default calibration dataset is loaded and preprocessed through
the model's own tokenizer or image processor, so the calibration tensors match
what the model will see at inference time. For an unsupported task — or when
`--task` is omitted entirely — calibration falls back to synthetic random data
synthesized from the ONNX input specification. Random-data calibration is fast
and always works, but the resulting scales are typically less accurate than
-dataset-driven calibration, so always provide `--task` and `--model-name` when
+dataset-driven calibration, so always provide `--task` and `--model-id` when
the model task is supported.

## Examples
@@ -79,7 +79,7 @@ Total time: 4.31s

```bash
# Task-aware calibration: real samples preprocessed through the model's own image processor
-winml quantize -m resnet50.onnx --task image-classification --model-name microsoft/resnet-50 --samples 128
+winml quantize -m resnet50.onnx --task image-classification --model-id microsoft/resnet-50 --samples 128
```

```bash
@@ -104,7 +104,7 @@ winml quantize -m bert-base-uncased.onnx --precision int16

## Common pitfalls

-- **Calibration uses synthetic random data by default.** Without `--task` and `--model-name`, scales and zero-points are computed from random tensors synthesized from the ONNX input specification — the model never sees realistic activations, so accuracy after quantization can degrade noticeably. Always pass `--task` and `--model-name` for supported tasks (e.g., `--task image-classification --model-name microsoft/resnet-50`) so calibration runs on real samples preprocessed through the model's own tokenizer or image processor.
+- **Calibration uses synthetic random data by default.** Without `--task` and `--model-id`, scales and zero-points are computed from random tensors synthesized from the ONNX input specification — the model never sees realistic activations, so accuracy after quantization can degrade noticeably. Always pass `--task` and `--model-id` for supported tasks (e.g., `--task image-classification --model-id microsoft/resnet-50`) so calibration runs on real samples preprocessed through the model's own tokenizer or image processor.
- **`--weight-type` / `--activation-type` silently override `--precision`.** If you pass both, the explicit type flags win. Omit `--precision` when setting types explicitly to avoid confusion.
- **Low sample counts can hurt accuracy.** The default of 10 samples is sufficient for quick testing, but production models typically need 64–256 representative samples for good calibration.
- **`--per-channel` increases model size.** Per-channel quantization stores a separate scale and zero-point per output channel; this can noticeably inflate the model file size compared to per-tensor mode.
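
The first pitfall in this list comes down to how a calibration range turns into a scale and zero-point. The following is a minimal sketch, assuming a textbook asymmetric min-max scheme on unsigned 8-bit values; it is not taken from the `winml` source. `minmax_qparams`, `quantize`, and `dequantize` are illustrative helpers, and both sample lists are made up.

```python
def minmax_qparams(samples, qmin=0, qmax=255):
    """Return (scale, zero_point) for asymmetric affine quantization."""
    # Extend the observed range to include 0.0 so zero is exactly representable.
    lo = min(0.0, min(samples))
    hi = max(0.0, max(samples))
    if hi == lo:
        return 1.0, qmin
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a float to an integer code, saturating at the range limits."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))


def dequantize(q, scale, zero_point):
    """Map an integer code back to the float it represents."""
    return (q - zero_point) * scale


# Made-up stand-ins: random calibration tensors vs. real activations.
synthetic = [0.0, 0.25, 0.5, 1.0]
realistic = [-2.0, 0.5, 3.0, 6.0]

s_syn, z_syn = minmax_qparams(synthetic)
s_real, z_real = minmax_qparams(realistic)

# An activation of 6.0 saturates when the range came from the synthetic data,
# but survives (up to rounding error) when the range came from realistic data.
clipped = dequantize(quantize(6.0, s_syn, z_syn), s_syn, z_syn)
kept = dequantize(quantize(6.0, s_real, z_real), s_real, z_real)
print(f"synthetic range: 6.0 -> {clipped:.3f}")
print(f"realistic range: 6.0 -> {kept:.3f}")
```

The point of the sketch is only the qualitative effect: a range learned from unrepresentative samples clips real activations, which is why the docs recommend `--task` with `--model-id` for supported tasks.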
2 changes: 1 addition & 1 deletion docs/reference/index.md
@@ -98,7 +98,7 @@ Set to `null` to skip quantization.
| `per_channel` | `bool` | `false` | Per-channel quantization. |
| `symmetric` | `bool` | `false` | Symmetric quantization. |
| `task` | `str \| null` | `null` | Task for dataset-aware calibration. |
-| `model_name` | `str \| null` | `null` | Model ID for calibration dataset resolution. |
+| `model_id` | `str \| null` | `null` | Model ID for calibration dataset resolution. |
| `dataset_name` | `str \| null` | `null` | Override calibration dataset. |
| `distribution` | `str` | `"uniform"` | Random distribution for dummy data. |
| `seed` | `int \| null` | `null` | Random seed for reproducibility. |
2 changes: 1 addition & 1 deletion docs/samples/bert-config-build.md
@@ -39,7 +39,7 @@ This writes a `WinMLBuildConfig` JSON file to `bert_config.json`. The file captu
"samples": 10,
"calibration_method": "minmax",
"task": "text-classification",
-"model_name": "bert-base-uncased"
+"model_id": "bert-base-uncased"
... // truncated: per_channel, symmetric, distribution, ...
},
"compile": null
Original file line number Diff line number Diff line change
@@ -72,7 +72,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "sentence-similarity",
-"model_name": "BAAI/bge-large-en-v1.5"
+"model_id": "BAAI/bge-large-en-v1.5"
},
"loader": {
"task": "sentence-similarity",
Original file line number Diff line number Diff line change
@@ -60,7 +60,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "feature-extraction",
-"model_name": "BAAI/bge-m3"
+"model_id": "BAAI/bge-m3"
},
"compile": null,
"loader": {
Original file line number Diff line number Diff line change
@@ -60,7 +60,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "sentence-similarity",
-"model_name": "BAAI/bge-m3"
+"model_id": "BAAI/bge-m3"
},
"compile": null,
"loader": {
Original file line number Diff line number Diff line change
@@ -60,7 +60,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "fill-mask",
-"model_name": "FacebookAI/roberta-base"
+"model_id": "FacebookAI/roberta-base"
},
"loader": {
"task": "fill-mask",
Original file line number Diff line number Diff line change
@@ -60,7 +60,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "fill-mask",
-"model_name": "FacebookAI/roberta-large"
+"model_id": "FacebookAI/roberta-large"
},
"compile": null,
"loader": {
Original file line number Diff line number Diff line change
@@ -60,7 +60,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "fill-mask",
-"model_name": "FacebookAI/xlm-roberta-base"
+"model_id": "FacebookAI/xlm-roberta-base"
},
"loader": {
"task": "fill-mask",
Original file line number Diff line number Diff line change
@@ -58,7 +58,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "token-classification",
-"model_name": "Isotonic/distilbert_finetuned_ai4privacy_v2"
+"model_id": "Isotonic/distilbert_finetuned_ai4privacy_v2"
},
"compile": null,
"loader": {
Original file line number Diff line number Diff line change
@@ -48,7 +48,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "image-feature-extraction",
-"model_name": "StanfordAIMI/dinov2-base-xray-224"
+"model_id": "StanfordAIMI/dinov2-base-xray-224"
},
"compile": null,
"loader": {
Original file line number Diff line number Diff line change
@@ -73,7 +73,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "question-answering",
-"model_name": "ahotrod/electra_large_discriminator_squad2_512"
+"model_id": "ahotrod/electra_large_discriminator_squad2_512"
},
"compile": null,
"loader": {
Original file line number Diff line number Diff line change
@@ -48,7 +48,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "image-classification",
-"model_name": "apple/mobilevit-small"
+"model_id": "apple/mobilevit-small"
},
"compile": null,
"loader": {
Original file line number Diff line number Diff line change
@@ -60,7 +60,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "text-classification",
-"model_name": "cardiffnlp/twitter-roberta-base-sentiment-latest"
+"model_id": "cardiffnlp/twitter-roberta-base-sentiment-latest"
},
"loader": {
"task": "text-classification",
Original file line number Diff line number Diff line change
@@ -72,7 +72,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "token-classification",
-"model_name": "dbmdz/bert-large-cased-finetuned-conll03-english"
+"model_id": "dbmdz/bert-large-cased-finetuned-conll03-english"
},
"compile": null,
"loader": {
Original file line number Diff line number Diff line change
@@ -75,7 +75,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "question-answering",
-"model_name": "deepset/bert-large-uncased-whole-word-masking-squad2"
+"model_id": "deepset/bert-large-uncased-whole-word-masking-squad2"
},
"loader": {
"task": "question-answering",
Original file line number Diff line number Diff line change
@@ -63,7 +63,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "question-answering",
-"model_name": "deepset/roberta-base-squad2"
+"model_id": "deepset/roberta-base-squad2"
},
"loader": {
"task": "question-answering",
Original file line number Diff line number Diff line change
@@ -63,7 +63,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "question-answering",
-"model_name": "deepset/tinyroberta-squad2"
+"model_id": "deepset/tinyroberta-squad2"
},
"loader": {
"task": "question-answering",
Original file line number Diff line number Diff line change
@@ -48,7 +48,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "image-classification",
-"model_name": "dima806/fairface_age_image_detection"
+"model_id": "dima806/fairface_age_image_detection"
},
"compile": null,
"loader": {
Original file line number Diff line number Diff line change
@@ -61,7 +61,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "question-answering",
-"model_name": "distilbert/distilbert-base-cased-distilled-squad"
+"model_id": "distilbert/distilbert-base-cased-distilled-squad"
},
"compile": null,
"loader": {
Original file line number Diff line number Diff line change
@@ -61,7 +61,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "question-answering",
-"model_name": "distilbert/distilbert-base-uncased-distilled-squad"
+"model_id": "distilbert/distilbert-base-uncased-distilled-squad"
},
"compile": null,
"loader": {
Original file line number Diff line number Diff line change
@@ -58,7 +58,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "text-classification",
-"model_name": "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
+"model_id": "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
},
"compile": null,
"loader": {
Original file line number Diff line number Diff line change
@@ -58,7 +58,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "fill-mask",
-"model_name": "distilbert/distilbert-base-uncased"
+"model_id": "distilbert/distilbert-base-uncased"
},
"compile": null,
"loader": {
Original file line number Diff line number Diff line change
@@ -48,7 +48,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "image-feature-extraction",
-"model_name": "facebook/dino-vitb16"
+"model_id": "facebook/dino-vitb16"
},
"loader": {
"task": "image-feature-extraction",
Original file line number Diff line number Diff line change
@@ -48,7 +48,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "image-feature-extraction",
-"model_name": "facebook/dino-vits16"
+"model_id": "facebook/dino-vits16"
},
"loader": {
"task": "image-feature-extraction",
Original file line number Diff line number Diff line change
@@ -48,7 +48,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "image-feature-extraction",
-"model_name": "facebook/dinov2-base"
+"model_id": "facebook/dinov2-base"
},
"loader": {
"task": "image-feature-extraction",
Original file line number Diff line number Diff line change
@@ -48,7 +48,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "image-feature-extraction",
-"model_name": "facebook/dinov2-large"
+"model_id": "facebook/dinov2-large"
},
"loader": {
"task": "image-feature-extraction",
Original file line number Diff line number Diff line change
@@ -48,7 +48,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "image-feature-extraction",
-"model_name": "facebook/dinov2-small"
+"model_id": "facebook/dinov2-small"
},
"loader": {
"task": "image-feature-extraction",
Original file line number Diff line number Diff line change
@@ -72,7 +72,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "fill-mask",
-"model_name": "google-bert/bert-base-multilingual-cased"
+"model_id": "google-bert/bert-base-multilingual-cased"
},
"compile": null,
"loader": {
Original file line number Diff line number Diff line change
@@ -72,7 +72,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "fill-mask",
-"model_name": "google-bert/bert-base-multilingual-uncased"
+"model_id": "google-bert/bert-base-multilingual-uncased"
},
"loader": {
"task": "fill-mask",
Original file line number Diff line number Diff line change
@@ -72,7 +72,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "fill-mask",
-"model_name": "google-bert/bert-base-uncased"
+"model_id": "google-bert/bert-base-uncased"
},
"loader": {
"task": "fill-mask",
Original file line number Diff line number Diff line change
@@ -75,7 +75,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "question-answering",
-"model_name": "google-bert/bert-large-uncased-whole-word-masking-finetuned-squad"
+"model_id": "google-bert/bert-large-uncased-whole-word-masking-finetuned-squad"
},
"compile": null,
"loader": {
Original file line number Diff line number Diff line change
@@ -48,7 +48,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "image-feature-extraction",
-"model_name": "google/vit-base-patch16-224-in21k"
+"model_id": "google/vit-base-patch16-224-in21k"
},
"loader": {
"task": "image-feature-extraction",
Original file line number Diff line number Diff line change
@@ -48,7 +48,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "image-classification",
-"model_name": "google/vit-base-patch16-224"
+"model_id": "google/vit-base-patch16-224"
},
"compile": null,
"loader": {
Original file line number Diff line number Diff line change
@@ -60,7 +60,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "zero-shot-classification",
-"model_name": "joeddav/xlm-roberta-large-xnli"
+"model_id": "joeddav/xlm-roberta-large-xnli"
},
"compile": null,
"loader": {
Original file line number Diff line number Diff line change
@@ -66,7 +66,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "feature-extraction",
-"model_name": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K"
+"model_id": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K"
},
"loader": {
"task": "feature-extraction",
Original file line number Diff line number Diff line change
@@ -48,7 +48,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "image-segmentation",
-"model_name": "mattmdjaga/segformer_b2_clothes"
+"model_id": "mattmdjaga/segformer_b2_clothes"
},
"compile": null,
"loader": {
Original file line number Diff line number Diff line change
@@ -48,7 +48,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "image-feature-extraction",
-"model_name": "microsoft/rad-dino"
+"model_id": "microsoft/rad-dino"
},
"loader": {
"task": "image-feature-extraction",
Original file line number Diff line number Diff line change
@@ -48,7 +48,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "image-classification",
-"model_name": "microsoft/resnet-18"
+"model_id": "microsoft/resnet-18"
},
"compile": null,
"loader": {
Original file line number Diff line number Diff line change
@@ -48,7 +48,7 @@
"op_types_to_quantize": null,
"nodes_to_exclude": null,
"task": "image-classification",
-"model_name": "microsoft/resnet-50"
+"model_id": "microsoft/resnet-50"
},
"compile": null,
"loader": {
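
Configs written before this rename still carry `model_name`. The following is a minimal migration sketch, assuming the key lives in a top-level section called `quantize`; that section name is not visible in the hunks above, so treat it as a placeholder. `rename_key`, `migrate_config`, and `migrate_file` are hypothetical helpers, not part of `winml`.

```python
import json
from pathlib import Path


def rename_key(section, old="model_name", new="model_id"):
    """Rename `old` to `new` in a dict in place; return True if it changed."""
    if not isinstance(section, dict) or old not in section:
        return False
    if new in section:
        raise ValueError(f"both {old!r} and {new!r} are present")
    # Rebuild the dict so the renamed key keeps its original position.
    items = [(new if key == old else key, value) for key, value in section.items()]
    section.clear()
    section.update(items)
    return True


def migrate_config(config, section_name="quantize"):
    """Apply the rename inside one top-level section (name is an assumption)."""
    return rename_key(config.get(section_name))


def migrate_file(path, section_name="quantize"):
    """Rewrite a JSON config file only if the rename actually changed it."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8"))
    changed = migrate_config(config, section_name)
    if changed:
        path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return changed
```

Restricting the rename to one named section, rather than walking the whole document, avoids touching an unrelated `model_name` key elsewhere in the config. Calling `migrate_file` a second time on the same file is a no-op.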