hassan-hfk/YOLOv7-Selective-Quantization

YOLOv7-tiny Selective Quantization — Phase 1

Produces a fully INT8-quantized TFLite model, ready for deployment
on the NXP i.MX 8M Plus NPU through a TFLite delegate.


Folder Structure

phase1/
├── run_pipeline.py              ← run everything in one command
├── config.yaml                  ← all settings
├── requirements.txt
├── yolov7-tiny.pt               ← PUT THIS HERE (download below)
├── yolov7-main/                 ← PUT THIS HERE (from Nick's files)
├── calibration_images/          ← PUT IMAGES HERE (copy from yolov7-main/inference/images/)
├── scripts/
│   ├── utils.py                 ← shared helpers (auto-used, don't run directly)
│   ├── step1_sensitivity.py
│   ├── step2_selective_ptq.py
│   ├── step3_export_onnx.py
│   ├── step4_export_tflite.py
│   └── step5_benchmark.py
├── results/                     ← auto-created
├── quantized_models/            ← auto-created
└── benchmark_reports/           ← auto-created
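
Before running anything, it can help to confirm that the manually placed inputs from the tree above are present. The following is a minimal sketch, not part of the repository: the function name missing_inputs and the REQUIRED list are illustrative, built only from the entries marked "PUT THIS HERE" plus config.yaml.

```python
from pathlib import Path

# Paths taken from the folder tree above (relative to phase1/).
REQUIRED = [
    "config.yaml",
    "yolov7-tiny.pt",
    "yolov7-main/models",
    "calibration_images",
]

def missing_inputs(root="."):
    """Return the required paths that do not exist under root, in REQUIRED order."""
    root = Path(root)
    return [rel for rel in REQUIRED if not (root / rel).exists()]
```

Calling missing_inputs() from inside phase1/ returns an empty list when everything is in place, and otherwise lists what still has to be copied or downloaded.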

Setup — Do This Once

1. Install Python packages

pip install -r requirements.txt

Python 3.9, 3.10, or 3.11 recommended.

2. Place yolov7-main folder

Copy the yolov7-main folder (from Nick's files) into phase1/:

phase1/
└── yolov7-main/
    ├── models/
    ├── utils/
    └── ...

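3. Fix one line in yolov7-main (required for PyTorch 2.6+)

PyTorch 2.6 changed the default of torch.load to weights_only=True, which rejects the pickled model object stored in the YOLOv7 checkpoint. Open yolov7-main/models/experimental.py and find this call (around line 252):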

ckpt = torch.load(w, map_location=map_location)

Change it to:

ckpt = torch.load(w, map_location=map_location, weights_only=False)

Save and close.

4. Download yolov7-tiny.pt weights

# Option A — command line
wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-tiny.pt

# Option B — browser
# Go to: https://github.com/WongKinYiu/yolov7/releases/tag/v0.1
# Download yolov7-tiny.pt
# Place it in phase1/

5. Add calibration images

# Copy the 6 sample images (enough for Phase 1)
cp yolov7-main/inference/images/* calibration_images/
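
On systems without cp, or when calibration_images/ does not exist yet, the same copy can be done from Python. This is a sketch under assumptions: the helper name copy_calibration_images and the IMAGE_SUFFIXES set are illustrative, and the formats the pipeline actually accepts may differ.

```python
import shutil
from pathlib import Path

# Assumed set of image extensions; adjust to whatever the pipeline accepts.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}

def copy_calibration_images(src="yolov7-main/inference/images",
                            dst="calibration_images"):
    """Copy image files from src into dst, creating dst if needed.

    Returns the number of files copied.
    """
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    copied = 0
    for path in sorted(src.iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            shutil.copy2(path, dst / path.name)
            copied += 1
    return copied
```

A return value of 0 means no images were found in the source folder, which would later surface as the "No images found" error.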

Run the Pipeline

# Make sure you are in the phase1/ folder
cd phase1

# Run all 5 steps
python run_pipeline.py

# Or run from a specific step (useful if one step fails)
python run_pipeline.py 2    # start from step 2
python run_pipeline.py 4    # start from step 4 (TFLite)

Or run steps individually:

python scripts/step1_sensitivity.py
python scripts/step2_selective_ptq.py
python scripts/step3_export_onnx.py
python scripts/step4_export_tflite.py
python scripts/step5_benchmark.py
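
To illustrate the "start from step N" behaviour, here is a minimal sketch of a resumable runner. It is not the actual run_pipeline.py, whose implementation is not shown here; the names STEPS, steps_from, and run are hypothetical, and only the script paths come from the list above.

```python
import subprocess
import sys

# Script paths as listed above, in execution order.
STEPS = [
    "scripts/step1_sensitivity.py",
    "scripts/step2_selective_ptq.py",
    "scripts/step3_export_onnx.py",
    "scripts/step4_export_tflite.py",
    "scripts/step5_benchmark.py",
]

def steps_from(start=1):
    """Return the scripts to run when starting at the 1-based step number."""
    if not 1 <= start <= len(STEPS):
        raise ValueError(f"step must be between 1 and {len(STEPS)}, got {start}")
    return STEPS[start - 1:]

def run(start=1):
    """Run each remaining step in order, stopping at the first failure."""
    for script in steps_from(start):
        print(f"Running {script}")
        # check=True raises CalledProcessError if the step exits non-zero.
        subprocess.run([sys.executable, script], check=True)
```

With this shape, run(4) would execute only the TFLite export and the benchmark, which is useful when earlier outputs already exist on disk.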

What Each Step Does

Step  Script                   Time       Output
1     step1_sensitivity.py     ~5-10 min  layer_sensitivity_report.csv, selective_quant_plan.csv
2     step2_selective_ptq.py   ~1 min     yolov7_tiny_selective_int8.pt, ptq_layer_summary.json
3     step3_export_onnx.py     ~1 min     yolov7_tiny_fp32.onnx
4     step4_export_tflite.py   ~2-3 min   yolov7_tiny_fp32.tflite, yolov7_tiny_int8.tflite
5     step5_benchmark.py       ~2 min     benchmark_report.json, benchmark_report.txt

Final Outputs

results/
├── layer_sensitivity_report.csv   ← every layer scored: keep_fp32 / quantize
├── selective_quant_plan.csv       ← the plan applied in Step 2
├── quantization_summary.json      ← top fragile layers, stats
└── ptq_layer_summary.json         ← which layers were actually quantized

quantized_models/
├── yolov7_tiny_fp32.onnx          ← FP32 ONNX
├── yolov7_tiny_fp32.tflite        ← FP32 TFLite (baseline)
└── yolov7_tiny_int8.tflite        ← INT8 TFLite — deploy this on NXP

benchmark_reports/
├── benchmark_report.json
└── benchmark_report.txt           ← human-readable before/after summary

Reading the Results

layer_sensitivity_report.csv

Column              What it means
recommendation      keep_fp32 = too fragile / quantize_candidate = safe to quantize
proxy_cosine        How similar the output is after fake-quantizing. 1.0 = no change
proxy_relative_mae  Output error after quantization. Below 2% = safe
sensitivity_score   Higher = more dangerous to quantize
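
To make the two proxy metrics concrete, the sketch below fake-quantizes a small vector and compares it with the original. It assumes symmetric per-tensor INT8 quantization and a sum-normalized relative MAE; the formulas step1_sensitivity.py actually uses are not shown here and may differ. All function names are illustrative.

```python
import math

def fake_quantize(values, num_bits=8):
    """Symmetric per-tensor fake quantization: snap to the integer grid, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for INT8
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)                     # all zeros: nothing to quantize
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0 if list(a) == list(b) else 0.0
    return dot / norm

def relative_mae(reference, approx):
    """Sum of absolute errors divided by the sum of absolute reference values."""
    err = sum(abs(r - q) for r, q in zip(reference, approx))
    ref = sum(abs(r) for r in reference)
    return err / ref if ref else 0.0

outputs = [0.5, -1.0, 0.25, 2.0]                # stand-in for a layer's FP32 output
quantized = fake_quantize(outputs)
print(cosine_similarity(outputs, quantized), relative_mae(outputs, quantized))
```

A layer whose cosine stays close to 1.0 and whose relative MAE stays under the 2% threshold would be a quantize_candidate under the reading given in the table.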

benchmark_report.txt

  • Size reduction: how much smaller the INT8 TFLite model is than the FP32 one
  • Latency: measured on your workstation CPU. Expect lower latency on the NXP NPU
  • Output cosine similarity: proxy for output quality. 0.999+ = near-identical
  • Detection delta: difference in the number of detections between the FP32 and INT8 models
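
The size reduction figure can be reproduced by hand from the two TFLite files. This is a small sketch, assuming the reduction is reported as a percentage of the FP32 file size; the function name size_reduction and the returned keys are illustrative and not taken from step5_benchmark.py.

```python
from pathlib import Path

def size_reduction(fp32_path, int8_path):
    """Compare two model files and return sizes in bytes plus the percent reduction."""
    fp32_bytes = Path(fp32_path).stat().st_size
    int8_bytes = Path(int8_path).stat().st_size
    return {
        "fp32_bytes": fp32_bytes,
        "int8_bytes": int8_bytes,
        "reduction_percent": 100.0 * (1 - int8_bytes / fp32_bytes),
    }
```

For example, size_reduction("quantized_models/yolov7_tiny_fp32.tflite", "quantized_models/yolov7_tiny_int8.tflite") compares the two models produced by Step 4.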

If TensorFlow Won't Install (Step 4)

TF is large. If you can't install it locally, run Step 4 on Google Colab:

  1. Upload quantized_models/yolov7_tiny_fp32.onnx and calibration_images/ to Colab
  2. In Colab:
!pip install tensorflow onnx-tf onnx
# upload step4_export_tflite.py and config.yaml
# run it
  3. Download yolov7_tiny_int8.tflite back
  4. Run Step 5 locally

Common Errors

No module named 'models'
→ yolov7_repo_dir in config.yaml is wrong. It must point to the folder containing models/

UnpicklingError or WeightsOnly error on torch.load
→ Apply the weights_only=False fix in Step 3 of Setup above

No images found in calibration_images/
→ Copy images into phase1/calibration_images/

ONNX not found when running step4
→ Run step3 first

qnnpack backend error on Windows
→ Change backend: "qnnpack" to backend: "fbgemm" in config.yaml
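
Instead of editing the backend by hand, a script could fall back automatically when the configured engine is unavailable. The sketch below is an assumption about how that might look, not code from this repository: pick_backend is a hypothetical helper, and the supported list is passed in so the example runs without PyTorch. With PyTorch installed, that list could come from torch.backends.quantized.supported_engines.

```python
def pick_backend(preferred, supported):
    """Return preferred if available, otherwise the first known fallback.

    preferred: backend name from config.yaml, e.g. "qnnpack"
    supported: engine names available in the local PyTorch build
    """
    if preferred in supported:
        return preferred
    for fallback in ("fbgemm", "qnnpack"):
        if fallback in supported:
            return fallback
    raise RuntimeError(f"No usable quantization backend among {supported}")
```

On a build that reports only fbgemm, pick_backend("qnnpack", ["none", "fbgemm"]) returns "fbgemm", matching the manual fix above.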

