Automatically generate optimal mixed-precision quantization configurations within a computational budget. The greedy "Bang-for-Buck" algorithm maximizes quality per unit of computational cost.
The optimizer solves the Multiple-Choice Knapsack Problem for bit allocation:
- Start at minimum cost: All layers initialized to 4-bit quantization
- Calculate efficiency ratios: For each possible upgrade, compute `sensitivity / cost_increase`
- Greedy selection: Iteratively upgrade the layer with the highest efficiency ratio
- Budget constraint: Stop when the computational budget is exhausted
Key Insight: Layers with high sensitivity and low computational cost provide the best "bang for buck" when upgraded to higher precision.
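Written out in generic notation (the symbols below are standard for the MCKP, not taken from the codebase), the problem the greedy heuristic approximates is:

```math
\max_{b} \sum_{l} s_l(b_l) \quad \text{s.t.} \quad \sum_{l} c_l(b_l) \le B, \qquad b_l \in \{4, 8, 16\}
```

where `s_l(b_l)` is layer *l*'s sensitivity credit at bit-width `b_l`, `c_l(b_l)` is its BOPs cost, and `B` is the budget. Greedy bang-for-buck is a fast heuristic for this problem, not an exact solver.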
```bash
# Run Phase 3 optimization
python main.py --optimize_mixed_precision \
  --flops_file results/flops_analysis/flops_analysis_unet.json \
  --sensitivity_file results/sensitivity_analysis/sensitivity_100_prompts.json \
  --budget_multiplier 0.5
```
```bash
# Try different budget levels
python main.py --optimize_mixed_precision --budget_multiplier 0.3  # Conservative
python main.py --optimize_mixed_precision --budget_multiplier 0.5  # Balanced
python main.py --optimize_mixed_precision --budget_multiplier 0.7  # Aggressive
```

The `--budget_multiplier` parameter controls the computational budget as a fraction of the maximum BOPs:
Formula: `budget = multiplier × max_cost`

where `max_cost` is the total BOPs with every layer at 16-bit (~3989 GBOPs for SD 1.4).
- 0.0: 0 GBOPs (0% of max)
- 0.3: 1,197 GBOPs (30% of max)
- 0.5: 1,995 GBOPs (50% of max)
- 0.75: 2,992 GBOPs (75% of max)
- 1.0: 3,989 GBOPs (100% of max, all layers at 16-bit)
The multiplier is simply a fraction of the maximum cost. The greedy algorithm automatically allocates bits to maximize quality within the given budget: lower budgets result in more 4-bit layers, while higher budgets allow more 8-bit and 16-bit layers.
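As a sanity check, the budget arithmetic can be sketched in a few lines of Python (the layer names and MAC counts below are illustrative, not from the real FLOPs analysis; BOPs are assumed to scale as MACs × weight bits × activation bits):

```python
# Hypothetical per-layer MAC counts (the real values come from the FLOPs file).
layer_macs = {"conv_in": 1.5e9, "mid_block.attn": 3.0e9}

def layer_bops(macs, wbit, abit):
    """Bit-operations for one layer: MACs scaled by weight and activation bits."""
    return macs * wbit * abit

# Maximum cost: every layer at 16-bit weights and activations.
max_cost = sum(layer_bops(m, 16, 16) for m in layer_macs.values())

# The budget is simply a fraction of that maximum.
budget_multiplier = 0.5
budget = budget_multiplier * max_cost

print(f"max cost: {max_cost / 1e9:.1f} GBOPs, budget: {budget / 1e9:.1f} GBOPs")
```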
```
======================================================================
GREEDY BANG-FOR-BUCK OPTIMIZATION
======================================================================
Budget multiplier: 0.50
Target budget: 997361.03 GBOPs
Min cost (all 4-bit): 249340.26 GBOPs
Max cost (all 16-bit): 3989444.14 GBOPs
Total possible upgrades: 628

Running greedy selection...
  Iteration 50: Cost = 249340.74 / 997361.03 GBOPs
  Iteration 100: Cost = 249342.16 / 997361.03 GBOPs
  ...
  Iteration 600: Cost = 748633.61 / 997361.03 GBOPs

Optimization complete after 611 upgrades
Final cost: 967934.64 GBOPs
Budget utilization: 97.0%

Bit-width Distribution:
   4-bit:   3 layers ( 1.0%)
   8-bit:  11 layers ( 3.5%)
  16-bit: 300 layers (95.5%)

Mixed-precision optimization completed!
Config saved to: results/mixed_precision/quantization_config_0.50.json
```
```json
{
  "metadata": {
    "budget_multiplier": 0.5,
    "num_layers": 314,
    "flops_file": "results/flops_analysis/flops_analysis_unet.json",
    "sensitivity_file": "results/sensitivity_analysis/sensitivity_100_prompts.json"
  },
  "bit_allocation": {
    "conv_in": 16,
    "time_embedding.linear_1": 8,
    "down_blocks.0.resnets.0.conv1": 4,
    ...
  },
  "quantization_config": {
    "conv_in": {"wbit": 16, "abit": 16},
    "time_embedding.linear_1": {"wbit": 8, "abit": 12},
    "down_blocks.0.resnets.0.conv1": {"wbit": 4, "abit": 8},
    ...
  }
}
```

Note: Activation bits (`abit`) are automatically set using the heuristic `abit = min(wbit + 4, 16)` for better quality.
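The activation-bit heuristic is simple enough to sketch directly (the function name is illustrative, not from the codebase):

```python
def activation_bits(wbit: int) -> int:
    """Heuristic for activation precision: abit = min(wbit + 4, 16).

    Activations are kept a few bits wider than weights, capped at 16-bit,
    matching the pairs in the config above (4 -> 8, 8 -> 12, 16 -> 16).
    """
    return min(wbit + 4, 16)

print({w: activation_bits(w) for w in (4, 8, 16)})  # {4: 8, 8: 12, 16: 16}
```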
Budget utilization measures how efficiently the budget is used:
- >95%: Excellent - budget fully utilized
- 80-95%: Good - some budget left unused
- <80%: Suboptimal - may indicate issue with constraints
The bit-width distribution shows the allocation across layers:
- Conservative (0.3): Most layers at 4-bit, critical layers upgraded
- Balanced (0.5): Mix of 4/8/16-bit based on sensitivity
- Aggressive (0.7): Most layers at 8/16-bit for maximum quality
In practice, expect roughly:
- Budget 0.3: ~5-10% of layers at 8/16-bit, the rest at 4-bit
- Budget 0.5: ~10-20% of layers at 8/16-bit, the rest at 4-bit
- Budget 0.7: ~30-50% of layers at 8/16-bit, the rest at 4-bit
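To check which regime a given run landed in, the distribution can be tallied straight from the saved config's `bit_allocation` map (the allocation below is a toy example; a real one would come from `json.load` on the file under `results/mixed_precision/`):

```python
from collections import Counter

# Toy allocation in the shape of the saved config's "bit_allocation" map.
bit_allocation = {
    "conv_in": 16,
    "time_embedding.linear_1": 8,
    "down_blocks.0.resnets.0.conv1": 4,
    "conv_out": 16,
}

counts = Counter(bit_allocation.values())
total = len(bit_allocation)
for b in (4, 8, 16):
    share = 100.0 * counts.get(b, 0) / total
    print(f"{b:>2}-bit: {counts.get(b, 0)} layers ({share:.1f}%)")
```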
For each possible upgrade (e.g., 4→8 or 8→16):

```
efficiency = sensitivity_score / (cost_new - cost_old)
```

Upgrades from 8→16 bit provide less quality improvement, so their efficiency is down-weighted:

```
efficiency_8to16 = (sensitivity * 0.2) / cost_increase
```
At each iteration:
- Find all valid upgrades (those starting from each layer's current bit-width)
- Filter upgrades that fit within remaining budget
- Select upgrade with highest efficiency ratio
- Apply upgrade and update current cost
Termination: Algorithm stops when no more upgrades fit within budget.
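The loop above can be sketched as follows (toy names and numbers; the real optimizer behind `--optimize_mixed_precision` reads costs and sensitivities from the FLOPs and sensitivity files). The 0.2 down-weighting of 8→16 upgrades is included:

```python
def greedy_bang_for_buck(layers, budget):
    """Greedy MCKP heuristic: start all layers at 4-bit, then repeatedly apply
    the affordable upgrade (4->8 or 8->16) with the best efficiency ratio.

    `layers` maps name -> (sensitivity, {bits: cost}).
    Returns the final bit assignment and total cost.
    """
    bits = {name: 4 for name in layers}
    cost = sum(costs[4] for _, costs in layers.values())

    while True:
        best = None  # (efficiency, name, new_bits, cost_increase)
        for name, (sens, costs) in layers.items():
            if bits[name] == 16:
                continue  # already at maximum precision
            new = 8 if bits[name] == 4 else 16
            delta = costs[new] - costs[bits[name]]
            # 8->16 upgrades yield less quality improvement: down-weight by 0.2.
            gain = sens if new == 8 else 0.2 * sens
            if cost + delta <= budget:  # only consider upgrades that fit
                eff = gain / delta
                if best is None or eff > best[0]:
                    best = (eff, name, new, delta)
        if best is None:
            break  # no affordable upgrade remains: terminate
        _, name, new, delta = best
        bits[name] = new
        cost += delta
    return bits, cost

# Toy example: two layers with equal costs but different sensitivity.
layers = {
    "attn": (5.0, {4: 10.0, 8: 20.0, 16: 40.0}),
    "conv": (1.0, {4: 10.0, 8: 20.0, 16: 40.0}),
}
bits, cost = greedy_bang_for_buck(layers, budget=40.0)
print(bits, cost)  # the more sensitive layer is upgraded first
```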
| Flag | Type | Default | Description |
|---|---|---|---|
| `--optimize_mixed_precision` | flag | - | Run Phase 3 optimization (required) |
| `--flops_file` | str | `results/flops_analysis/flops_analysis_unet.json` | Path to FLOPs analysis JSON |
| `--sensitivity_file` | str | `results/sensitivity_analysis/sensitivity_100_prompts.json` | Path to sensitivity analysis JSON |
| `--budget_multiplier` | float | 0.5 | Budget multiplier (0.0 to 1.0) |
| `--output_dir` | str | `results` | Output directory |
| `--no_save` | flag | - | Don't save results |
| `--quiet` | flag | - | Suppress detailed output |
After generating optimal configurations:
- Apply quantization to your Stable Diffusion model using the generated config
- Generate images with the quantized model (see Phase 4)
- Measure quality using FID and CLIP scores
- Compare results across different budget levels
- Select optimal budget that balances quality and computational cost