
Phase 2: Sensitivity Analysis

Measure which layers are most sensitive to quantization using COCO prompts and image quality metrics (FID and CLIP).

Usage

# Full analysis (all 314 layers)
python main.py --eval_sensitivity --device cuda --sensitivity_bits 4

# Custom configuration
python main.py --eval_sensitivity \
    --device cuda \
    --sensitivity_bits 8 \
    --num_prompts 100 \
    --batch_size 50

# Analyze subset of layers (for testing)
python main.py --eval_sensitivity --device cuda --max_layers 10

CLI Options

Flag                Type  Default     Description
--max_layers        int   None (all)  Maximum layers to analyze
--sensitivity_bits  int   4           Quantization bits for testing (4, 8, or 16)
--num_prompts       int   100         Number of COCO prompts to use
--coco_path         str   None        Path to COCO prompts file
--prompt_seed       int   42          Random seed for prompt shuffling
--batch_size        int   1           Batch size for generation
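The --num_prompts and --prompt_seed flags amount to a deterministic shuffle-then-truncate over the COCO caption list. A minimal sketch of that behavior (the function name and the caption list are illustrative, not the repository's actual code):

```python
import random

def select_prompts(prompts, num_prompts=100, seed=42):
    """Deterministically shuffle the full prompt list, then keep the
    first `num_prompts` entries (mirrors --num_prompts / --prompt_seed)."""
    shuffled = list(prompts)               # copy: leave the caller's list intact
    random.Random(seed).shuffle(shuffled)  # seeded RNG -> reproducible order
    return shuffled[:num_prompts]

# Example: the same seed always yields the same subset.
coco_captions = [f"caption {i}" for i in range(1000)]
subset = select_prompts(coco_captions, num_prompts=5, seed=42)
```

Because the shuffle is seeded, reruns with identical flags evaluate every layer against the same prompt subset, which keeps per-layer scores comparable.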

Output

Sensitivity Analysis Summary
============================================================
Model: CompVis/stable-diffusion-v1-4
Quantization: 4-bit
Test prompts: 5
Layers analyzed: 314

Top 10 Most Sensitive Layers:
Rank  Layer Name                                       Sensitivity
-------------------------------------------------------------------
1     conv_in                                          0.9845
2     time_embedding.linear_2                          0.8723
3     down_blocks.0.attentions.0.transformer_blocks... 0.7654
...

Result File: results/sensitivity_analysis_<model>_<bits>bit.json

Understanding Sensitivity Scores

Calculation Method

Each layer's raw score is a weighted combination of two metric deltas, min-max normalized to [0, 1] across all layers:

  • FID (60%): Distribution similarity (higher FID = more degradation)
  • CLIP (40%): Semantic consistency (larger drop in CLIP score = more degradation)
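Concretely, the weighting and normalization can be sketched as follows; the 0.6/0.4 weights come from the text above, while the function shape and input format are illustrative:

```python
def sensitivity_scores(fid_delta, clip_drop, w_fid=0.6, w_clip=0.4):
    """Combine per-layer FID increase and CLIP-score drop into [0, 1] scores.

    fid_delta, clip_drop: dicts mapping layer name -> degradation relative
    to the full-precision baseline (larger = worse in both cases).
    """
    def min_max(values):
        lo, hi = min(values.values()), max(values.values())
        span = hi - lo or 1.0  # guard against division by zero
        return {k: (v - lo) / span for k, v in values.items()}

    fid_n, clip_n = min_max(fid_delta), min_max(clip_drop)
    return {k: w_fid * fid_n[k] + w_clip * clip_n[k] for k in fid_delta}
```

After normalization, the least-degraded layer scores 0.0 and the most-degraded scores 1.0, so scores are only meaningful relative to the other layers in the same run.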

Score Range

[0.0, 1.0] after normalization:

  • 0.0 - 0.2: Low sensitivity (safe for 4-bit quantization)
  • 0.2 - 0.5: Medium sensitivity (consider 8-bit)
  • 0.5 - 0.8: High sensitivity (keep at 8-bit or 16-bit)
  • 0.8 - 1.0: Critical sensitivity (keep at 16-bit)
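The bands above translate directly into a bit-width lookup. A small helper like this (illustrative, not part of the CLI) makes the mapping explicit:

```python
def recommend_bits(score):
    """Map a normalized sensitivity score onto the bands listed above."""
    if score < 0.2:
        return 4    # low sensitivity: safe for 4-bit quantization
    if score < 0.5:
        return 8    # medium sensitivity: consider 8-bit
    if score < 0.8:
        return 8    # high sensitivity: keep at 8-bit (or 16-bit if budget allows)
    return 16       # critical sensitivity: keep at 16-bit
```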

Typical Patterns

  • Input/output layers: Highly sensitive (conv_in, conv_out)
  • Time embeddings: Critical for diffusion quality
  • Attention layers: Variable sensitivity (some robust, others critical)
  • Residual blocks: Generally moderate sensitivity

Interpretation

  • Higher sensitivity = needs higher precision: Keep these layers at 8-bit or 16-bit
  • Lower sensitivity = can use lower precision: Safe to quantize to 4-bit
  • Use for mixed-precision planning: Allocate bits wisely in Phase 3
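As a starting point for Phase 3 planning, the scores can be bucketed into bit tiers. A sketch, with threshold placement taken from the score-range bands (the tier boundaries are a reasonable default, not a prescription):

```python
def precision_plan(scores):
    """Group layers into bit tiers for mixed-precision planning:
    score < 0.2 -> 4-bit, 0.2-0.8 -> 8-bit, >= 0.8 -> 16-bit."""
    plan = {4: [], 8: [], 16: []}
    for layer, score in scores.items():
        if score < 0.2:
            plan[4].append(layer)
        elif score < 0.8:
            plan[8].append(layer)
        else:
            plan[16].append(layer)
    return plan

example = precision_plan({"conv_in": 0.95, "res_block": 0.4, "proj": 0.05})
```

The resulting tier lists can then be fed to whatever per-layer quantization configuration Phase 3 accepts.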

Next Steps

  1. Review the sensitivity ranking to identify critical layers
  2. Use these results as input for mixed-precision optimization (Phase 3)
  3. Consider different bit widths to find the sweet spot for your use case