Produces a fully INT8-quantized TFLite model, ready to deploy on the
NXP i.MX 8M Plus NPU through a TFLite delegate.
phase1/
├── run_pipeline.py ← run everything in one command
├── config.yaml ← all settings
├── requirements.txt
├── yolov7-tiny.pt ← PUT THIS HERE (download below)
├── yolov7-main/ ← PUT THIS HERE (from Nick's files)
├── calibration_images/ ← PUT IMAGES HERE (copy from yolov7-main/inference/images/)
├── scripts/
│ ├── utils.py ← shared helpers (auto-used, don't run directly)
│ ├── step1_sensitivity.py
│ ├── step2_selective_ptq.py
│ ├── step3_export_onnx.py
│ ├── step4_export_tflite.py
│ └── step5_benchmark.py
├── results/ ← auto-created
├── quantized_models/ ← auto-created
└── benchmark_reports/ ← auto-created
pip install -r requirements.txt
Python 3.9, 3.10, or 3.11 recommended.
Copy the yolov7-main folder (from Nick's files) into phase1/:
phase1/
└── yolov7-main/
├── models/
├── utils/
└── ...
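Open yolov7-main/models/experimental.py and find the torch.load call near line 252:
ckpt = torch.load(w, map_location=map_location)
Change it to:
ckpt = torch.load(w, map_location=map_location, weights_only=False)
Save and close. This is needed because PyTorch 2.6 and later default to weights_only=True, which refuses to unpickle the full model object stored in the checkpoint. Only load checkpoints from a source you trust this way.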
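If you would rather not edit the file by hand, the same change can be scripted. The following is a minimal sketch, not part of the pipeline: `patch_torch_load` and `patch_file` are hypothetical helper names, and it assumes the call in experimental.py matches the text shown above exactly.

```python
from pathlib import Path

# The exact call text shown above, and its patched form.
OLD_CALL = "torch.load(w, map_location=map_location)"
NEW_CALL = "torch.load(w, map_location=map_location, weights_only=False)"


def patch_torch_load(source: str) -> str:
    """Return `source` with every unpatched torch.load call replaced.

    OLD_CALL is not a substring of NEW_CALL, so running this twice is harmless.
    """
    return source.replace(OLD_CALL, NEW_CALL)


def patch_file(path: Path) -> bool:
    """Patch the file in place; return True if anything changed."""
    original = path.read_text(encoding="utf-8")
    patched = patch_torch_load(original)
    if patched == original:
        return False
    path.write_text(patched, encoding="utf-8")
    return True


if __name__ == "__main__":
    # Assumes the script is run from phase1/.
    target = Path("yolov7-main/models/experimental.py")
    if target.exists():
        print("patched" if patch_file(target) else "nothing to change")
    else:
        print(f"{target} not found; run this from phase1/")
```

If the script reports "nothing to change", either the fix is already applied or the call in your copy is written differently, in which case edit it manually.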
# Option A — command line
wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-tiny.pt
# Option B — browser
# Go to: https://github.com/WongKinYiu/yolov7/releases/tag/v0.1
# Download yolov7-tiny.pt
# Place it in phase1/
# Copy the 6 sample images (enough for Phase 1)
cp yolov7-main/inference/images/* calibration_images/
# Make sure you are in the phase1/ folder
cd phase1
# Run all 5 steps
python run_pipeline.py
# Or run from a specific step (useful if one step fails)
python run_pipeline.py 2 # start from step 2
python run_pipeline.py 4 # start from step 4 (TFLite)
Or run steps individually:
python scripts/step1_sensitivity.py
python scripts/step2_selective_ptq.py
python scripts/step3_export_onnx.py
python scripts/step4_export_tflite.py
python scripts/step5_benchmark.py

| Step | Script | Time | Output |
|---|---|---|---|
| 1 | step1_sensitivity.py | ~5-10 min | layer_sensitivity_report.csv, selective_quant_plan.csv |
| 2 | step2_selective_ptq.py | ~1 min | yolov7_tiny_selective_int8.pt, ptq_layer_summary.json |
| 3 | step3_export_onnx.py | ~1 min | yolov7_tiny_fp32.onnx |
| 4 | step4_export_tflite.py | ~2-3 min | yolov7_tiny_fp32.tflite, yolov7_tiny_int8.tflite |
| 5 | step5_benchmark.py | ~2 min | benchmark_report.json, benchmark_report.txt |
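To make the "start from step N" behaviour concrete, here is a rough sketch of how a runner could map a step number to the scripts in the table. It is an illustration only, not the actual contents of run_pipeline.py; `scripts_from` and `run_from` are hypothetical names.

```python
import subprocess
import sys

# Script order taken from the table above.
STEP_SCRIPTS = [
    "scripts/step1_sensitivity.py",
    "scripts/step2_selective_ptq.py",
    "scripts/step3_export_onnx.py",
    "scripts/step4_export_tflite.py",
    "scripts/step5_benchmark.py",
]


def scripts_from(start_step: int = 1) -> list[str]:
    """Return the scripts to run when starting at the 1-based `start_step`."""
    if not 1 <= start_step <= len(STEP_SCRIPTS):
        raise ValueError(f"step must be between 1 and {len(STEP_SCRIPTS)}")
    return STEP_SCRIPTS[start_step - 1:]


def run_from(start_step: int = 1) -> None:
    """Run each remaining step in order, stopping at the first failure."""
    for script in scripts_from(start_step):
        # check=True raises CalledProcessError if the step exits non-zero.
        subprocess.run([sys.executable, script], check=True)
```

Stopping at the first failure matters here because each step consumes the previous step's output, for example Step 4 needs the ONNX file written by Step 3.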
results/
├── layer_sensitivity_report.csv ← every layer scored: keep_fp32 / quantize_candidate
├── selective_quant_plan.csv ← the plan applied in Step 2
├── quantization_summary.json ← top fragile layers, stats
└── ptq_layer_summary.json ← which layers were actually quantized
quantized_models/
├── yolov7_tiny_fp32.onnx ← FP32 ONNX
├── yolov7_tiny_fp32.tflite ← FP32 TFLite (baseline)
└── yolov7_tiny_int8.tflite ← INT8 TFLite — deploy this on NXP
benchmark_reports/
├── benchmark_report.json
└── benchmark_report.txt ← human-readable before/after summary
| Column | What it means |
|---|---|
| recommendation | keep_fp32 = too fragile to quantize; quantize_candidate = safe to quantize |
| proxy_cosine | How similar the output is after fake-quantizing. 1.0 = no change |
| proxy_relative_mae | Output error after quantization. Below 2% = safe |
| sensitivity_score | Higher = more dangerous to quantize |
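For intuition about the two proxy columns, the sketch below fake-quantizes a small vector and compares it with the original. The exact formulas used by step1_sensitivity.py are not shown in this README, so treat these as common textbook definitions (symmetric per-tensor INT8, cosine similarity, mean absolute error divided by mean absolute reference value) rather than the script's implementation. All function names are illustrative.

```python
import math


def fake_quantize(values, num_bits=8):
    """Symmetric per-tensor fake quantization: snap to the integer grid, then map back to float."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


def relative_mae(reference, candidate):
    """Mean absolute error, relative to the mean absolute reference value."""
    n = len(reference)
    abs_err = sum(abs(r - c) for r, c in zip(reference, candidate)) / n
    ref_mag = sum(abs(r) for r in reference) / n
    return abs_err / ref_mag if ref_mag else 0.0


if __name__ == "__main__":
    fp32_out = [0.5, -1.25, 2.0, 0.03, -0.7]  # made-up activations
    int8_out = fake_quantize(fp32_out)
    print("proxy_cosine      :", cosine_similarity(fp32_out, int8_out))
    print("proxy_relative_mae:", relative_mae(fp32_out, int8_out))
```

A layer whose activations span a wide range with a few large outliers tends to score worse on both metrics, because the outliers stretch the quantization scale and coarsen the grid for every other value.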
- Size reduction: how much smaller the INT8 TFLite model is than the FP32 one
- Latency: measured on your workstation CPU. The NXP NPU is expected to be faster
- Output cosine similarity: a proxy for output quality. 0.999+ = near-identical
- Detection delta: the difference in the number of detections between the two models
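The metrics listed above can be computed along these lines. This is a sketch under assumptions, not the code in step5_benchmark.py: the function names are made up, and the choice of median over a few warm-up-excluded runs is one reasonable way to time inference.

```python
import statistics
import time


def size_reduction_percent(fp32_bytes: int, int8_bytes: int) -> float:
    """Percentage by which the INT8 file is smaller than the FP32 file."""
    return 100.0 * (1.0 - int8_bytes / fp32_bytes)


def median_latency_ms(run_once, warmup: int = 3, runs: int = 20) -> float:
    """Median wall-clock time of `run_once()` in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        run_once()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def detection_delta(fp32_count: int, int8_count: int) -> int:
    """Signed difference in detection count (negative = INT8 found fewer)."""
    return int8_count - fp32_count
```

In practice the byte counts would come from `os.path.getsize` on the two .tflite files, and `run_once` would wrap a single interpreter invocation on a fixed input image.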
TensorFlow is large. If you can't install it locally, run Step 4 on Google Colab:
- Upload quantized_models/yolov7_tiny_fp32.onnx and calibration_images/ to Colab
- In Colab:
!pip install tensorflow onnx-tf onnx
# upload step4_export_tflite.py and config.yaml
# run it
- Download yolov7_tiny_int8.tflite back
- Run Step 5 locally
No module named 'models'
→ yolov7_repo_dir in config.yaml is wrong. Must point to the folder containing models/
UnpicklingError or WeightsOnly error on torch.load
→ Apply the weights_only=False fix in Step 3 of Setup above
No images found in calibration_images/
→ Copy images into phase1/calibration_images/
ONNX not found when running step4
→ Run step3 first
qnnpack backend error on Windows
→ Change backend: "qnnpack" to backend: "fbgemm" in config.yaml
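Most of the errors above come from a missing file or folder, so a quick check before running the pipeline can save a failed run. The sketch below is a hypothetical helper, not part of the repository. It assumes the default layout from the folder tree (with yolov7_repo_dir pointing at yolov7-main/) and an assumed set of image extensions.

```python
from pathlib import Path

# Assumed image extensions; adjust to whatever the pipeline actually accepts.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def preflight(root: Path, needs_onnx: bool = False) -> list[str]:
    """Return a list of human-readable problems found under `root` (phase1/)."""
    problems = []

    # "No module named 'models'": the repo folder must contain models/.
    if not (root / "yolov7-main" / "models").is_dir():
        problems.append("yolov7-main/models/ missing: check yolov7_repo_dir in config.yaml")

    if not (root / "yolov7-tiny.pt").is_file():
        problems.append("yolov7-tiny.pt missing: download it into phase1/")

    # "No images found in calibration_images/"
    cal_dir = root / "calibration_images"
    has_images = cal_dir.is_dir() and any(
        p.suffix.lower() in IMAGE_SUFFIXES for p in cal_dir.iterdir()
    )
    if not has_images:
        problems.append("no images in calibration_images/: copy some in")

    # "ONNX not found when running step4"
    if needs_onnx and not (root / "quantized_models" / "yolov7_tiny_fp32.onnx").is_file():
        problems.append("yolov7_tiny_fp32.onnx missing: run step3 first")

    return problems


if __name__ == "__main__":
    for problem in preflight(Path("."), needs_onnx=False):
        print("PROBLEM:", problem)
```

Pass `needs_onnx=True` when you intend to start from Step 4, since that is the only step that requires the ONNX file to exist already.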
Phase 1 — YOLOv7-tiny Selective Quantization