Generate images with multiple quantization budgets and compare quality against the baseline FP16 model.
```bash
# Generate 50 images for baseline + 9 budget levels (0.1 to 0.9)
python generate_experiment.py --num_prompts 50

# This creates:
# experiments/2025-10-31_14-30-45_50prompts/
#   baseline/      # FP16 images
#   budget_0.1/    # Heavily quantized
#   budget_0.2/
#   ...
#   budget_0.9/    # Lightly quantized
#   prompts.txt    # List of prompts used
#   config.json    # Experiment metadata
```

The `generate_experiment.py` script automates the complete evaluation workflow:
- Creates a timestamped directory in `experiments/`
- Loads N COCO prompts from `prompts/coco_val2017.txt`
- Generates baseline images with the FP16 model
- For each budget level:
  - Runs the greedy optimizer to get a bit allocation
  - Applies mixed-precision quantization
  - Generates images with the quantized model
  - Saves the quantization config
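The scaffolding step of this workflow can be sketched in a few lines of standard-library Python. The helper name `make_experiment_dir` is hypothetical, and the real script's internals may differ; this only mirrors the directory layout described above:

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

def make_experiment_dir(base, num_prompts, budget_levels):
    """Create the timestamped experiment layout described above.
    (Hypothetical helper; the actual script may differ in details.)"""
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    root = Path(base) / f"{stamp}_{num_prompts}prompts"
    (root / "baseline").mkdir(parents=True)        # FP16 images go here
    for b in budget_levels:
        (root / f"budget_{b}").mkdir()             # one folder per budget level
    (root / "prompts.txt").touch()                 # filled in after prompt selection
    (root / "config.json").write_text(
        json.dumps({"num_prompts": num_prompts, "budget_levels": budget_levels})
    )
    return root

root = make_experiment_dir(tempfile.mkdtemp(), 50, [0.1, 0.5, 0.9])
print(sorted(p.name for p in root.iterdir()))
# → ['baseline', 'budget_0.1', 'budget_0.5', 'budget_0.9', 'config.json', 'prompts.txt']
```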
```bash
# Basic: 50 prompts, default budgets (0.1 to 0.9)
python generate_experiment.py --num_prompts 50

# Custom budget levels
python generate_experiment.py \
    --num_prompts 100 \
    --budget_levels 0.3 0.5 0.7

# Specify all options
python generate_experiment.py \
    --num_prompts 50 \
    --device cuda \
    --model_path CompVis/stable-diffusion-v1-4 \
    --flops_file results/flops_analysis/flops_analysis_unet.json \
    --sensitivity_file results/sensitivity_analysis/sensitivity_100_prompts.json \
    --budget_levels 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 \
    --seed 42 \
    --prompt_seed 42
```

| Flag | Type | Default | Description |
|---|---|---|---|
| `--num_prompts` | int | required | Number of COCO prompts to use |
| `--coco_path` | str | None | Path to COCO prompts file (default: `prompts/coco_val2017.txt`) |
| `--model_path` | str | `CompVis/stable-diffusion-v1-4` | Hugging Face model identifier |
| `--device` | str | `cuda` | Device: `cuda` or `cpu` |
| `--flops_file` | str | `results/flops_analysis/flops_analysis_unet.json` | Path to FLOPs analysis |
| `--sensitivity_file` | str | `results/sensitivity_analysis/sensitivity_100_prompts.json` | Path to sensitivity analysis |
| `--budget_levels` | float[] | `[0.1, 0.2, ..., 0.9]` | Budget multipliers (space-separated) |
| `--seed` | int | 42 | Random seed for image generation |
| `--prompt_seed` | int | 42 | Random seed for prompt selection |
| `--experiment_dir` | str | `experiments` | Base directory for experiments |
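The flag set above maps naturally onto an `argparse` definition. The actual parser code isn't shown in this document, so this sketch is an assumption that simply reproduces the table's types and defaults:

```python
import argparse

def build_parser():
    # Mirrors the flags table above; defaults are taken from that table.
    p = argparse.ArgumentParser(description="Generate quantization experiment images")
    p.add_argument("--num_prompts", type=int, required=True)
    p.add_argument("--coco_path", type=str, default=None)
    p.add_argument("--model_path", type=str, default="CompVis/stable-diffusion-v1-4")
    p.add_argument("--device", type=str, default="cuda", choices=["cuda", "cpu"])
    p.add_argument("--flops_file", type=str,
                   default="results/flops_analysis/flops_analysis_unet.json")
    p.add_argument("--sensitivity_file", type=str,
                   default="results/sensitivity_analysis/sensitivity_100_prompts.json")
    p.add_argument("--budget_levels", type=float, nargs="+",
                   default=[round(0.1 * i, 1) for i in range(1, 10)])  # 0.1 .. 0.9
    p.add_argument("--seed", type=int, default=42)
    p.add_argument("--prompt_seed", type=int, default=42)
    p.add_argument("--experiment_dir", type=str, default="experiments")
    return p

args = build_parser().parse_args(
    ["--num_prompts", "50", "--budget_levels", "0.3", "0.5", "0.7"]
)
print(args.num_prompts, args.budget_levels)  # → 50 [0.3, 0.5, 0.7]
```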
For `--num_prompts 50` with 9 budget levels:

```
experiments/2025-10-31_14-30-45_50prompts/
├── baseline/
│   ├── image_0000.png
│   ├── image_0001.png
│   └── ... (50 images)
├── budget_0.1/
│   ├── image_0000.png
│   ├── image_0001.png
│   ├── ... (50 images)
│   └── quantization_config.json
├── budget_0.2/
│   └── ... (same structure)
├── budget_0.3/
├── budget_0.4/
├── budget_0.5/
├── budget_0.6/
├── budget_0.7/
├── budget_0.8/
├── budget_0.9/
├── prompts.txt      # List of prompts used
└── config.json      # Experiment metadata
```
Total images generated: N_prompts × (1 + N_budgets)
- Example: 50 prompts × 10 configs = 500 images
After generating images, use `evaluate_experiment.py` to compute FID and CLIP scores:

```bash
# Basic evaluation (prints summary table)
python evaluate_experiment.py experiments/2025-10-31_14-30-45_50prompts/

# Save results to CSV and JSON
python evaluate_experiment.py experiments/2025-10-31_14-30-45_50prompts/ --save_results

# Specify device
python evaluate_experiment.py experiments/2025-10-31_14-30-45_50prompts/ \
    --device cuda \
    --save_results
```

- FID (Fréchet Inception Distance): Measures distribution similarity between baseline and quantized images (lower is better)
- CLIP Score: Measures semantic consistency between prompts and generated images (higher is better)
- Bit-width Distribution: Number of layers at each precision level (4-bit, 8-bit, 16-bit)
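FID reduces to the Fréchet distance between two Gaussians fitted to Inception features of the two image sets. A minimal numpy/scipy sketch of that final formula (the evaluator likely uses a library implementation that also handles feature extraction):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """d^2 = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2))."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerics
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Identical feature statistics give FID ~ 0; numerical noise can even
# produce a value printed as -0.000, as in the summary table below.
mu, sigma = np.zeros(4), np.eye(4)
print(frechet_distance(mu, sigma, mu, sigma))
```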
With `--save_results`, two files are written:

- `evaluation_results.csv`: Table with all metrics
- `evaluation_results.json`: Complete results with metadata
```
================================================================================
EXPERIMENT EVALUATION SUMMARY
================================================================================
Experiment: 2025-11-03_13-27-00_100prompts
Number of prompts: 100
Baseline CLIP: 0.2960
--------------------------------------------------------------------------------------------
Budget      FID     CLIP   CLIP Δ       Target         Used   Util%   4-bit   8-bit  16-bit
--------------------------------------------------------------------------------------------
   0.1   76.958   0.3025  -0.0065     398944.4     398932.1    99.9      14      50     250
   0.2    5.936   0.2969  -0.0009     797888.8     797865.3   100.0       7      35     272
   0.3    7.143   0.2969  -0.0009    1196833.2    1196798.4    99.9       4      27     283
   0.4   -0.000   0.2960   0.0000    1595777.7    1595732.1    99.9       4      10     300
   0.5   -0.000   0.2960   0.0000    1994722.1    1994665.8    99.9       3      11     300
   0.6   -0.000   0.2960   0.0000    2393666.5    2393599.5    99.9       0       8     306
   0.7   -0.000   0.2960   0.0000    2792610.9    2792533.2    99.9       0       4     310
   0.8   -0.000   0.2960   0.0000    3191555.3    3191466.9    99.9       0       3     311
   0.9   -0.000   0.2960   0.0000    3590499.7    3590400.6    99.9       0       2     312
--------------------------------------------------------------------------------------------
Interpretation:
- FID: Lower is better (measures distribution similarity between baseline and quantized images)
- CLIP: Higher is better (measures semantic consistency with prompts)
- CLIP Δ: Degradation from baseline (lower is better)
- Target: Target budget in GBOPs (budget_multiplier × max_cost)
- Used: Actual BOPs used in GBOPs
- Util%: Budget utilization (Used / Target × 100)
================================================================================
```
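The Target and Util% columns follow directly from the definitions in the interpretation block. As a worked check for budget 0.1 (the full-precision cost `max_cost` is not printed in the summary, so the value here is inferred from the table and is an assumption):

```python
max_cost = 3_989_444.0   # full-precision cost in GBOPs (inferred: Target / multiplier)
multiplier = 0.1
used = 398_932.1         # BOPs actually allocated by the optimizer (from the table)

target = multiplier * max_cost       # Target column: 398944.4 GBOPs
utilization = used / target * 100.0  # Util% column: just under 100%
print(f"target={target:.1f} GBOPs, util={utilization:.2f}%")
```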
| Flag | Type | Default | Description |
|---|---|---|---|
| `experiment_dir` | str | required | Path to experiment directory |
| `--device` | str | `cuda` | Device: `cuda` or `cpu` |
| `--save_results` | flag | - | Save results to CSV and JSON |
| `--output_path` | str | `<experiment_dir>/evaluation_results.csv` | Custom output path |
- Use `--prompt_seed` for reproducibility
- Start with 50-100 prompts for quick validation
- Use 500-1000 prompts for final evaluation
- CUDA (NVIDIA): Best performance, recommended
- CPU: Very slow, only for testing
- Test 3-5 budgets initially: `0.3 0.5 0.7`
- Expand to 9 levels for comprehensive analysis: `0.1 0.2 ... 0.9`
- Focus on the range where quality transitions occur
- ~10-15 seconds per image (NVIDIA A100)
- ~20-30 seconds per image (NVIDIA 3090)
- Each PNG image: ~1-2 MB
- 50 prompts × 10 configs = ~500-1000 MB
- Plan accordingly for large experiments
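The runtime and storage figures above combine into a quick back-of-the-envelope estimator. The per-image averages used here are assumptions picked from the middle of the listed ranges (A100-class GPU, mid-range PNG size):

```python
def estimate(num_prompts, num_budgets, sec_per_image=12.5, mb_per_image=1.5):
    """Rough wall-clock and disk estimates for one experiment.
    Per-image averages are assumed, drawn from the ranges above."""
    images = num_prompts * (1 + num_budgets)  # baseline + one image set per budget
    hours = images * sec_per_image / 3600
    disk_mb = images * mb_per_image
    return images, hours, disk_mb

images, hours, disk_mb = estimate(num_prompts=50, num_budgets=9)
print(images, round(hours, 1), disk_mb)  # → 500 1.7 750.0
```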
- Plot quality vs cost curves: Visualize tradeoffs
  - X-axis: Computational cost (GBOPs)
  - Y-axis: Quality (FID, CLIP)
  - Find the optimal budget level
- Compare bit-width distributions: Analyze allocation patterns
  - Which layers are kept at high precision?
  - How does allocation change with budget?
- Select the optimal budget: Balance quality and computational cost
  - Budget 0.3-0.4: Often provides the best quality/cost tradeoff
  - Budget 0.5+: Minimal quality loss, higher cost
  - Budget <0.3: Significant quality degradation
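The budget-selection step can be automated with a simple threshold rule: take the lowest-cost budget whose FID stays within an acceptable tolerance. The FID values below are copied from the sample summary; the tolerance itself is an arbitrary illustrative choice:

```python
# (budget, FID vs baseline) pairs from the sample summary above
results = [(0.1, 76.958), (0.2, 5.936), (0.3, 7.143),
           (0.4, 0.0), (0.5, 0.0), (0.6, 0.0)]

FID_TOLERANCE = 10.0  # assumed acceptable distribution shift; tune per application

# Lowest-cost budget whose quality stays within tolerance
optimal = min(b for b, fid in results if fid <= FID_TOLERANCE)
print(optimal)  # → 0.2
```

A stricter tolerance (e.g. near-zero FID) would instead select budget 0.4, matching the range the guidance above calls the usual quality/cost sweet spot.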