Measure which layers are most sensitive to quantization using COCO prompts and image quality metrics (FID and CLIP).
```bash
# Full analysis (all 314 layers)
python main.py --eval_sensitivity --device cuda --sensitivity_bits 4

# Custom configuration
python main.py --eval_sensitivity \
    --device cuda \
    --sensitivity_bits 8 \
    --num_prompts 100 \
    --batch_size 50

# Analyze a subset of layers (for testing)
python main.py --eval_sensitivity --device cuda --max_layers 10
```

| Flag | Type | Default | Description |
|---|---|---|---|
| `--max_layers` | int | None (all) | Maximum layers to analyze |
| `--sensitivity_bits` | int | 4 | Quantization bits for testing (4, 8, or 16) |
| `--num_prompts` | int | 100 | Number of COCO prompts to use |
| `--coco_path` | str | None | Path to COCO prompts file |
| `--prompt_seed` | int | 42 | Random seed for prompt shuffling |
| `--batch_size` | int | 1 | Batch size for generation |
```text
Sensitivity Analysis Summary
============================================================
Model: CompVis/stable-diffusion-v1-4
Quantization: 4-bit
Test prompts: 5
Layers analyzed: 314

Top 10 Most Sensitive Layers:
Rank  Layer Name                                         Sensitivity
-------------------------------------------------------------------
1     conv_in                                            0.9845
2     time_embedding.linear_2                            0.8723
3     down_blocks.0.attentions.0.transformer_blocks...   0.7654
...
```

Results are written to `results/sensitivity_analysis_<model>_<bits>bit.json`.
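For downstream analysis, the result file can be loaded with a few lines of Python. Note this sketch assumes a flat layer-name-to-score JSON layout, which is an assumption about the file format; adjust the keys to match the actual output.

```python
import json

def top_sensitive_layers(path, k=10):
    """Return the k layers with the highest sensitivity scores.

    Assumed (hypothetical) JSON layout: {"layer_name": score, ...}.
    """
    with open(path) as f:
        data = json.load(f)
    # Sort descending by score so the most sensitive layers come first.
    ranked = sorted(data.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]
```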
Sensitivity is computed by min-max normalizing two weighted metrics:
- FID (60%): distribution similarity (a higher FID increase means more degradation)
- CLIP (40%): semantic consistency (a larger CLIP-score drop means more degradation)

Scores fall in [0.0, 1.0] after normalization:
- 0.0 - 0.2: Low sensitivity (safe for 4-bit quantization)
- 0.2 - 0.5: Medium sensitivity (consider 8-bit)
- 0.5 - 0.8: High sensitivity (keep at 8-bit or 16-bit)
- 0.8 - 1.0: Critical sensitivity (keep at 16-bit)
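The weighted scoring described above can be sketched as follows. Function and argument names are illustrative, not the tool's actual API.

```python
def min_max(values):
    """Min-max normalize a list of values to [0.0, 1.0]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def sensitivity_scores(fid_deltas, clip_drops, w_fid=0.6, w_clip=0.4):
    """Combine per-layer FID increase and CLIP-score drop into one score.

    A larger FID increase and a larger CLIP drop both indicate more
    degradation, so both are normalized to [0, 1] and weighted
    (60% FID, 40% CLIP) before being summed.
    """
    fid_n = min_max(fid_deltas)
    clip_n = min_max(clip_drops)
    return [w_fid * f + w_clip * c for f, c in zip(fid_n, clip_n)]
```

A layer whose quantization caused the largest FID increase and the largest CLIP drop in the batch scores 1.0; the least affected layer scores 0.0.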
Typical observations across layer types:
- Input/output layers: highly sensitive (conv_in, conv_out)
- Time embeddings: critical for diffusion quality
- Attention layers: variable sensitivity (some robust, others critical)
- Residual blocks: generally moderate sensitivity
Interpreting the scores:
- Higher sensitivity means the layer needs higher precision: keep it at 8-bit or 16-bit
- Lower sensitivity means the layer tolerates lower precision: safe to quantize to 4-bit
- Use the ranking for mixed-precision planning: allocate bits accordingly in Phase 3
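A minimal bit-allocation policy based on the sensitivity ranges above might look like this. This is an illustrative sketch, not the tool's built-in planner; the 0.2-0.8 band is treated conservatively as 8-bit.

```python
def plan_precision(layer_scores):
    """Build a {layer_name: bit_width} plan from normalized sensitivity.

    Thresholds follow the ranges above:
      score <  0.2 -> 4-bit (low sensitivity)
      score <  0.8 -> 8-bit (medium/high sensitivity)
      score >= 0.8 -> 16-bit (critical sensitivity)
    """
    def bits(score):
        if score >= 0.8:
            return 16
        if score >= 0.2:
            return 8
        return 4
    return {name: bits(score) for name, score in layer_scores.items()}
```

The resulting plan can then feed the Phase 3 mixed-precision optimization, e.g. `plan_precision({"conv_in": 0.98, "attn": 0.3, "res": 0.1})`.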
Next steps:
- Review the sensitivity ranking to identify critical layers
- Use these results as input for mixed-precision optimization (Phase 3)
- Try different bit widths to find the sweet spot for your use case