
Phase 3: Bang-for-Buck Mixed-Precision Optimization

Automatically generate optimal mixed-precision quantization configurations within a computational budget. The greedy "Bang-for-Buck" algorithm maximizes quality per unit of computational cost.

How It Works

The optimizer solves the Multiple-Choice Knapsack Problem for bit allocation:

  1. Start at minimum cost: All layers initialized to 4-bit quantization
  2. Calculate efficiency ratios: For each possible upgrade, compute sensitivity / cost_increase
  3. Greedy selection: Iteratively upgrade layers with highest efficiency ratio
  4. Budget constraint: Stop when computational budget is exhausted

Key Insight: Layers with high sensitivity and low computational cost provide the best "bang for buck" when upgraded to higher precision.

Usage

# Run Phase 3 optimization
python main.py --optimize_mixed_precision \
    --flops_file results/flops_analysis/flops_analysis_unet.json \
    --sensitivity_file results/sensitivity_analysis/sensitivity_100_prompts.json \
    --budget_multiplier 0.5

# Try different budget levels
python main.py --optimize_mixed_precision --budget_multiplier 0.3  # Conservative
python main.py --optimize_mixed_precision --budget_multiplier 0.5  # Balanced
python main.py --optimize_mixed_precision --budget_multiplier 0.7  # Aggressive

Budget Multiplier

The --budget_multiplier parameter sets the computational budget as a fraction of the maximum BOPs:

Formula: budget = multiplier × max_cost

Where max_cost is the total BOPs when all layers are at 16-bit (~3989 GBOPs for SD 1.4)

Examples (for SD 1.4 U-Net)

  • 0.0: 0 GBOPs (0% of max)
  • 0.3: 1,197 GBOPs (30% of max)
  • 0.5: 1,995 GBOPs (50% of max)
  • 0.75: 2,992 GBOPs (75% of max)
  • 1.0: 3,989 GBOPs (100% of max, all layers at 16-bit)

The multiplier is simply a fraction of the maximum cost. The greedy algorithm automatically allocates bits to maximize quality within the given budget: lower budgets result in more 4-bit layers; higher budgets allow more 8-bit and 16-bit layers.
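As a sanity check, the formula can be evaluated directly. The snippet below uses the ~3,989 GBOPs all-16-bit figure quoted above for the SD 1.4 U-Net; the function name is illustrative:

```python
MAX_COST_GBOPS = 3989.0  # all layers at 16-bit (SD 1.4 U-Net, figure from above)

def compute_budget(multiplier: float) -> float:
    # budget = multiplier * max_cost
    return multiplier * MAX_COST_GBOPS

# compute_budget(0.3) gives ~1197 GBOPs, matching the Examples list above
```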

Output

======================================================================
GREEDY BANG-FOR-BUCK OPTIMIZATION
======================================================================

Budget multiplier: 0.50
Target budget: 997361.03 GBOPs
Min cost (all 4-bit): 249340.26 GBOPs
Max cost (all 16-bit): 3989444.14 GBOPs

Total possible upgrades: 628
Running greedy selection...
  Iteration 50: Cost = 249340.74 / 997361.03 GBOPs
  Iteration 100: Cost = 249342.16 / 997361.03 GBOPs
  ...
  Iteration 600: Cost = 748633.61 / 997361.03 GBOPs

Optimization complete after 611 upgrades
Final cost: 967934.64 GBOPs
Budget utilization: 97.0%

Bit-width Distribution:
  4-bit:    3 layers (  1.0%)
  8-bit:   11 layers (  3.5%)
  16-bit: 300 layers ( 95.5%)

Mixed-precision optimization completed!
   Config saved to: results/mixed_precision/quantization_config_0.50.json

Output Files

1. Quantization Config

{
  "metadata": {
    "budget_multiplier": 0.5,
    "num_layers": 314,
    "flops_file": "results/flops_analysis/flops_analysis_unet.json",
    "sensitivity_file": "results/sensitivity_analysis/sensitivity_100_prompts.json"
  },
  "bit_allocation": {
    "conv_in": 16,
    "time_embedding.linear_1": 8,
    "down_blocks.0.resnets.0.conv1": 4,
    ...
  },
  "quantization_config": {
    "conv_in": {"wbit": 16, "abit": 16},
    "time_embedding.linear_1": {"wbit": 8, "abit": 12},
    "down_blocks.0.resnets.0.conv1": {"wbit": 4, "abit": 8},
    ...
  }
}

Note: Activation bits (abit) are automatically set using the heuristic abit = min(wbit + 4, 16) for better quality.
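The heuristic is a one-liner, and the W/A pairs in the example config above follow from it (the function name is illustrative, not the project's API):

```python
def activation_bits(wbit: int) -> int:
    # Activations get 4 more bits than weights, capped at 16.
    return min(wbit + 4, 16)

# Matches the example config above: W16/A16, W8/A12, W4/A8
```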

Interpretation

Budget Utilization

Measures how efficiently the budget is used:

  • >95%: Excellent - budget fully utilized
  • 80-95%: Good - some budget left unused
  • <80%: Suboptimal - may indicate an issue with the constraints
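Utilization is simply final cost divided by target budget. Plugging in the figures from the sample run above:

```python
# Figures taken from the sample optimizer log above.
final_cost = 967_934.64      # GBOPs actually spent
target_budget = 997_361.03   # GBOPs allowed

utilization = final_cost / target_budget
print(f"Budget utilization: {utilization:.1%}")  # -> Budget utilization: 97.0%
```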

Bit-width Distribution

Shows allocation across layers:

  • Conservative (0.3): Most layers at 4-bit, critical layers upgraded
  • Balanced (0.5): Mix of 4/8/16-bit based on sensitivity
  • Aggressive (0.7): Most layers at 8/16-bit for maximum quality

Typical Results (314-layer U-Net)

  • Budget 0.3: ~5-10% layers at 8/16-bit, rest at 4-bit
  • Budget 0.5: ~10-20% layers at 8/16-bit, rest at 4-bit
  • Budget 0.7: ~30-50% layers at 8/16-bit, rest at 4-bit

Algorithm Details

Efficiency Ratio

For each possible upgrade (e.g., 4→8 or 8→16):

efficiency = sensitivity_score / (cost_new - cost_old)

Diminishing Returns

Upgrades from 8→16 bit provide less quality improvement than 4→8 upgrades, so their sensitivity is discounted:

efficiency_8to16 = (sensitivity * 0.2) / cost_increase

Greedy Selection

At each iteration:

  1. Find all valid upgrades (layer at correct starting bit-width)
  2. Filter upgrades that fit within remaining budget
  3. Select upgrade with highest efficiency ratio
  4. Apply upgrade and update current cost

Termination: Algorithm stops when no more upgrades fit within budget.
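Putting the pieces together, the selection loop can be sketched as follows. This is a minimal illustration under assumed inputs, not the project's actual implementation: the layers dict schema (a per-bit-width cost table plus a sensitivity score per layer) and the function name are made up for the example, while the efficiency ratio, the 0.2 discount for 8→16 upgrades, and the termination rule mirror the description above.

```python
def bang_for_buck(layers, budget):
    """Greedy Bang-for-Buck bit allocation (sketch).

    layers: {name: {"cost": {4: ..., 8: ..., 16: ...}, "sensitivity": ...}}
    Costs are BOPs per layer at each bit-width; schema is illustrative.
    """
    bits = {name: 4 for name in layers}                 # start at minimum cost
    cost = sum(info["cost"][4] for info in layers.values())

    while True:
        best = None
        for name, info in layers.items():
            b = bits[name]
            if b == 16:
                continue                                 # already at max precision
            nb = {4: 8, 8: 16}[b]                        # next step up
            delta = info["cost"][nb] - info["cost"][b]   # cost increase
            if cost + delta > budget:
                continue                                 # must fit remaining budget
            sens = info["sensitivity"]
            if b == 8:
                sens *= 0.2                              # diminishing returns 8->16
            eff = sens / delta                           # efficiency ratio
            if best is None or eff > best[0]:
                best = (eff, name, nb, delta)
        if best is None:
            break                                        # no upgrade fits: terminate
        _, name, nb, delta = best
        bits[name] = nb                                  # apply best upgrade
        cost += delta
    return bits, cost
```

On a toy two-layer problem, the sensitive layer is upgraded first and the loop stops as soon as neither remaining upgrade fits the budget.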

CLI Options

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --optimize_mixed_precision | flag | - | Run Phase 3 optimization (required) |
| --flops_file | str | results/flops_analysis/flops_analysis_unet.json | Path to FLOPs analysis JSON |
| --sensitivity_file | str | results/sensitivity_analysis/sensitivity_100_prompts.json | Path to sensitivity analysis JSON |
| --budget_multiplier | float | 0.5 | Budget multiplier (0.0 to 1.0) |
| --output_dir | str | results | Output directory |
| --no_save | flag | - | Don't save results |
| --quiet | flag | - | Suppress detailed output |

Next Steps

After generating optimal configurations:

  1. Apply quantization to your Stable Diffusion model using the generated config
  2. Generate images with the quantized model (see Phase 4)
  3. Measure quality using FID and CLIP scores
  4. Compare results across different budget levels
  5. Select optimal budget that balances quality and computational cost
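For step 1, a downstream consumer only needs the quantization_config section of the saved JSON. The sketch below inlines a config fragment mirroring the schema shown earlier purely for illustration; in practice you would json.load the file at results/mixed_precision/quantization_config_0.50.json:

```python
import json

# Inlined stand-in for the saved config file, following the
# schema of the "Quantization Config" example above.
config_json = """
{
  "quantization_config": {
    "conv_in": {"wbit": 16, "abit": 16},
    "time_embedding.linear_1": {"wbit": 8, "abit": 12},
    "down_blocks.0.resnets.0.conv1": {"wbit": 4, "abit": 8}
  }
}
"""
cfg = json.loads(config_json)

# Group layers by weight bit-width, e.g. to apply per-precision quantizers.
by_width = {}
for layer, q in cfg["quantization_config"].items():
    by_width.setdefault(q["wbit"], []).append(layer)
```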