
jam2miller/EdgeKeywordDetection

Speech Commands CNN Model - Quantization Project

This project implements and evaluates CNN models for speech command recognition using the Speech Commands v0.02 dataset, with a focus on Post-Training Quantization (PTQ) for model compression and deployment.

Project Overview

  • Dataset: Speech Commands v0.02 (36 classes)
  • Model: CNN with 3 convolutional blocks (1.2M parameters)
  • Framework: TensorFlow/Keras (Keras 3)
  • Quantization: INT8 via Post-Training Quantization (PTQ), plus a Quantization-Aware Training (QAT) variant
  • Results: 93.49% INT8 (PTQ) test accuracy with 11.73x model compression

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Download Dataset

If you don't have the dataset yet, download it:

cd Code
python -c "from utils import download_files; download_files()"

This will download and extract the Speech Commands v0.02 dataset (~2.3 GB) to Code/data/speech_commands_v0_extracted/.

Note: The dataset must be at Code/data/speech_commands_v0_extracted/ for the test scripts to work.
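A quick check of that location can save a confusing failure later. The helper below is hypothetical (`check_dataset_dir` is not part of the repository) and assumes the extracted dataset keeps one sub-folder per class holding `.wav` clips, which is how Speech Commands is distributed:

```python
from pathlib import Path


# Hypothetical helper, not part of the repository: confirms the extracted
# dataset sits where the test scripts expect it and lists the class folders.
def check_dataset_dir(root="data/speech_commands_v0_extracted"):
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(
            f"{root} not found; run download_files() from utils.py first"
        )
    # Assumption: one sub-folder per class, each holding .wav clips
    return sorted(
        d.name for d in root.iterdir()
        if d.is_dir() and any(d.glob("*.wav"))
    )
```

Run from `Code/`, `check_dataset_dir()` should return the class folder names; an empty list or a `FileNotFoundError` means the download step needs to be repeated.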

3. Test the FP32 Model

cd Code
python test_tf_keras_model.py

Expected output: ~93.46% test accuracy

4. Test the Quantized (PTQ) Model

cd Code
python test_ptq_model.py

Expected output: ~93.49% INT8 accuracy, 11.73x compression

5. Test the Quantization-Aware Training (QAT) Model

cd Code
python test_qat_model.py

Expected output: ~90.12% INT8 accuracy, 11.79x compression

Project Structure

Key Files

Test Scripts

  • test_tf_keras_model.py - Tests the FP32 TensorFlow/Keras model
  • test_ptq_model.py - Tests and compares FP32 vs INT8 PTQ quantized models
  • test_qat_model.py - Tests and compares FP32 vs INT8 QAT quantized models

Quantization Pipeline

  • ptq_pipeline_tf_keras.py - Post-Training Quantization pipeline (generates quantized model)

Model Files

  • model_weights/tf_keras_weights.h5 - Trained FP32 model weights (14.08 MB)
  • model_weights/model_ptq_int8.tflite - Quantized INT8 model (1.20 MB)
  • model_weights/model_fp32.keras - Full FP32 model (saved format)

Core Code

  • preprocessing.py - Audio preprocessing (STFT → mel spectrogram → log scale)
  • utils.py - Dataset loading and train/test split utilities
  • conv_block_model.py - CNN model architecture definition

Training Notebooks

  • tf_keras_files/tf_keras_model.ipynb - TensorFlow/Keras training notebook
  • qat_pipeline_final.ipynb - Quantization-Aware Training (QAT) notebook (PyTorch)
  • temp_file_pytorch_backend.ipynb - PyTorch training notebook

Dataset

  • Code/data/speech_commands_v0_extracted/ - Extracted Speech Commands dataset
    • 36 classes (yes, no, up, down, left, right, etc.)
    • ~105,835 audio files
    • 70/15/15 train/val/test split
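A 70/15/15 split can be made reproducible by sorting the file list and shuffling it with a fixed seed. The sketch below is an assumption about how such a split might look, not the code in `utils.py`; the function name `split_files` and the seed value are hypothetical:

```python
import random


# Hypothetical sketch of a reproducible 70/15/15 split; utils.py may differ.
def split_files(paths, train_frac=0.70, val_frac=0.15, seed=42):
    paths = sorted(paths)            # sort first so the split ignores listing order
    rng = random.Random(seed)        # local RNG leaves the global random state untouched
    rng.shuffle(paths)
    n_train = int(len(paths) * train_frac)
    n_val = int(len(paths) * val_frac)
    train = paths[:n_train]
    val = paths[n_train:n_train + n_val]
    test = paths[n_train + n_val:]   # the remainder (~15%) becomes the test set
    return train, val, test
```

Speech Commands file names begin with a speaker hash (`<speaker>_nohash_<n>.wav`), so a file-level split like this can place the same speaker in both train and test; grouping files by that prefix before splitting avoids the leak.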

Detailed Usage

Testing the FP32 Model

Purpose: Verify the trained FP32 model works correctly.

cd Code
python test_tf_keras_model.py

What it does:

  1. Loads the model architecture
  2. Loads weights from tf_keras_files/tf_keras_weights.h5
  3. Loads and preprocesses test data
  4. Evaluates model accuracy
  5. Shows 10 sample predictions

Expected output:

Test accuracy = 0.9346
Sample 0: True = wow | Predicted = wow
...
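Steps 4 and 5 above reduce to an argmax over the softmax outputs. A minimal standard-library sketch, assuming predictions arrive as per-class probability lists and labels as integer indices; the helper names are illustrative, and the script itself works on Keras/NumPy arrays:

```python
# Illustrative helpers; the actual test script works on Keras/NumPy outputs.
def argmax(probs):
    """Index of the largest probability (the first one wins on ties)."""
    return max(range(len(probs)), key=lambda i: probs[i])


def accuracy(prob_rows, labels):
    """Fraction of rows whose top-scoring class matches the label."""
    correct = sum(argmax(p) == y for p, y in zip(prob_rows, labels))
    return correct / len(labels)


def sample_report(prob_rows, labels, class_names, n=10):
    """Format the first n predictions like the script's sample output."""
    return [
        f"Sample {i}: True = {class_names[y]} | Predicted = {class_names[argmax(p)]}"
        for i, (p, y) in enumerate(zip(prob_rows[:n], labels[:n]))
    ]
```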

Testing the Quantized Model

Purpose: Compare FP32 vs INT8 quantized model performance.

cd Code
python test_ptq_model.py

What it does:

  1. Loads FP32 model and quantized TFLite model
  2. Evaluates both on the same test set
  3. Compares accuracies and model sizes
  4. Shows side-by-side sample predictions
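When an INT8 TFLite model uses integer inputs and outputs, each float input has to be mapped through the tensor's affine parameters, x_q = round(x / scale) + zero_point, and each integer output mapped back. In the TFLite Python API, `interpreter.get_input_details()[0]["quantization"]` returns that `(scale, zero_point)` pair. Whether `test_ptq_model.py` feeds integer or float tensors is not stated here, so the plain-Python sketch below only illustrates the arithmetic:

```python
# Affine INT8 quantization as used by TFLite:
#   real_value = scale * (int8_value - zero_point)
INT8_MIN, INT8_MAX = -128, 127


def quantize(values, scale, zero_point):
    """Map floats to int8 codes: round to nearest (Python's round() breaks
    ties to even), shift by the zero point, then saturate to [-128, 127]."""
    return [
        max(INT8_MIN, min(INT8_MAX, round(v / scale) + zero_point))
        for v in values
    ]


def dequantize(codes, scale, zero_point):
    """Map int8 codes back to approximate floats."""
    return [scale * (q - zero_point) for q in codes]
```

Values far outside the calibrated range saturate at the int8 limits, which is one reason the calibration data should resemble real inputs.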

Expected output:

FP32 Accuracy:      93.4551% (0.934551)
INT8 Accuracy:      93.4929% (0.934929)
Accuracy Drop:      -0.0378% (-0.04%)
Model Size (FP32):  14.08 MB
Model Size (INT8):  1.20 MB
Compression Ratio:  11.73x
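The comparison lines are simple arithmetic on the figures above. The sketch below reproduces them with illustrative function names (not taken from the script); the accuracy drop is a difference in percentage points, so a negative value means the INT8 model scored slightly higher:

```python
def accuracy_drop_pp(fp32_acc, int8_acc):
    """Accuracy drop in percentage points (negative = INT8 did better)."""
    return fp32_acc - int8_acc


def compression_ratio(fp32_mb, int8_mb):
    """Ratio of the two model file sizes."""
    return fp32_mb / int8_mb


# Values from the PTQ expected output above
drop = accuracy_drop_pp(93.4551, 93.4929)
ratio = compression_ratio(14.08, 1.20)
print(f"Accuracy Drop: {drop:.4f}%")        # Accuracy Drop: -0.0378%
print(f"Compression Ratio: {ratio:.2f}x")   # Compression Ratio: 11.73x
```

Storing a weight in INT8 instead of FP32 saves a factor of 4, so a ratio near 11.7x implies the 14.08 MB file holds more than the raw FP32 weights (saved optimizer state is one possibility); the figure compares file sizes rather than parameter storage alone.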

Testing the QAT Model

Purpose: Compare FP32 vs INT8 QAT model performance.

cd Code
python test_qat_model.py

What it does:

  1. Loads FP32 model and QAT TFLite model
  2. Evaluates both on the same test set
  3. Compares accuracies and model sizes
  4. Shows side-by-side sample predictions

Expected output:

FP32 Accuracy:      93.4299% (0.934299)
INT8 Accuracy:      90.1228% (0.901228)
Accuracy Drop:      3.3071% (+3.31%)
Model Size (FP32):  14.08 MB
Model Size (INT8):  1.19 MB
Compression Ratio:  11.79x
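QAT differs from PTQ in that the network sees quantization error during training: the forward pass uses a quantize-then-dequantize ("fake-quantized") copy of each tensor while the stored weights stay in floating point. A minimal plain-Python sketch of that operation, assuming symmetric per-tensor INT8 with scale = max|x| / 127 (a common choice; the notebook's exact scheme is not documented here):

```python
def fake_quantize(values, num_bits=8):
    """Quantize-dequantize a list of floats with a symmetric per-tensor scale.

    The result lies on the INT8 grid, which is what a QAT forward pass sees
    in place of the original tensor. Training frameworks bypass the rounding
    in the backward pass (straight-through estimator); this sketch covers
    only the forward computation.
    """
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)                  # all zeros: nothing to quantize
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]
```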

Running PTQ Pipeline

Purpose: Generate a new quantized model from the FP32 model.

cd Code
python ptq_pipeline_tf_keras.py

What it does:

  1. Loads the FP32 model
  2. Creates calibration dataset (200 samples)
  3. Applies INT8 quantization using TensorFlow Lite
  4. Saves quantized model to tf_keras_files/model_ptq_int8.tflite
  5. Evaluates both models and shows comparison

Note: This takes several minutes due to calibration and conversion.
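Steps 2 to 4 correspond to TensorFlow Lite's full-integer post-training quantization. The sketch below assumes `model` is the loaded Keras model and `calibration_x` is a NumPy array of preprocessed spectrograms shaped (N, 128, 64, 1); the function names and output path are hypothetical, and the pipeline script may configure the converter differently (for example, keeping float inputs and outputs):

```python
import numpy as np


def make_representative_dataset(calibration_x, num_samples=200):
    """Return the callable TFLiteConverter expects: each yield is a list with
    one float32 batch of size 1, used only to measure activation ranges."""
    subset = calibration_x[:num_samples].astype(np.float32)

    def representative_dataset():
        for sample in subset:
            yield [sample[np.newaxis, ...]]

    return representative_dataset


def convert_to_int8_tflite(model, calibration_x, out_path="model_ptq_int8.tflite"):
    import tensorflow as tf  # imported here so the generator above needs only NumPy

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = make_representative_dataset(calibration_x)
    # Require INT8 kernels for every op; conversion fails rather than
    # silently keeping some ops in float
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    tflite_bytes = converter.convert()
    with open(out_path, "wb") as f:
        f.write(tflite_bytes)
    return len(tflite_bytes)  # size in bytes, for the compression comparison
```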

Model Architecture

The CNN model consists of:

  • 3 Convolutional Blocks:
    • Block 1: 64 filters, stride 2
    • Block 2: 128 filters
    • Block 3: 256 filters
  • Each block: Conv2D → BatchNorm → ReLU → MaxPooling → SpatialDropout
  • Global Average Pooling
  • Dense layers: 256 → 36 (softmax output)

Input: Mel spectrogram (128×64×1)
Output: 36 class probabilities
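A Keras sketch of this layout follows. The 3×3 kernels, "same" padding, 2×2 pooling, the dropout rate and the ReLU on the 256-unit dense layer are assumptions not stated above, so the parameter count will not necessarily match the 1.2M quoted earlier; `conv_block_model.py` holds the real definition. The small pure-Python helper `trace_shapes` (hypothetical) traces feature-map shapes under the same assumptions:

```python
def build_model(input_shape=(128, 64, 1), num_classes=36, dropout=0.1):
    import keras  # Keras 3; set KERAS_BACKEND=tensorflow before this import
    from keras import layers

    inputs = keras.Input(shape=input_shape)
    x = inputs
    for filters, stride in [(64, 2), (128, 1), (256, 1)]:
        # Conv2D -> BatchNorm -> ReLU -> MaxPooling -> SpatialDropout
        x = layers.Conv2D(filters, 3, strides=stride, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
        x = layers.MaxPooling2D(pool_size=2)(x)
        x = layers.SpatialDropout2D(dropout)(x)  # dropout rate is an assumption
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(256, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)


def trace_shapes(h=128, w=64, blocks=((64, 2), (128, 1), (256, 1))):
    """Feature-map shape after each block, assuming 'same' conv and 2x2 pooling."""
    shapes = []
    for filters, stride in blocks:
        h, w = -(-h // stride), -(-w // stride)  # 'same' padding: ceil(size / stride)
        h, w = h // 2, w // 2                    # MaxPooling2D(2) with 'valid' padding
        shapes.append((h, w, filters))
    return shapes
```

Calling `build_model().summary()` prints per-layer shapes and the parameter count for comparison with the repository's model.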

Preprocessing Pipeline

The preprocessing matches the training pipeline:

  1. Audio normalization (int16 → float32, /32768.0)
  2. STFT (frame_length=256, frame_step=128)
  3. Mel spectrogram (64 mel bins, 0-8000 Hz)
  4. Log scale (log(x + 1e-6))
  5. Resize to (128, 64, 1)
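The five steps map onto `tf.signal` and `tf.image` calls roughly as follows. This is a sketch, not the code in `preprocessing.py`: the 16 kHz sample rate is that of Speech Commands, while using the STFT magnitude (rather than power) and bilinear resizing are assumptions. The helper `stft_frame_count` shows why step 5 is needed: a one-second clip yields 124 frames, not 128.

```python
def stft_frame_count(num_samples=16000, frame_length=256, frame_step=128):
    """Frames produced by tf.signal.stft with pad_end=False (its default)."""
    return 1 + (num_samples - frame_length) // frame_step


def log_mel_spectrogram(audio_int16, sample_rate=16000):
    import tensorflow as tf

    # 1. int16 PCM -> float32 in [-1, 1)
    audio = tf.cast(audio_int16, tf.float32) / 32768.0
    # 2. STFT; fft_length defaults to 256, giving 129 frequency bins
    stft = tf.signal.stft(audio, frame_length=256, frame_step=128)
    magnitude = tf.abs(stft)                        # (frames, 129)
    # 3. Project onto 64 mel bins spanning 0-8000 Hz
    mel_matrix = tf.signal.linear_to_mel_weight_matrix(
        num_mel_bins=64,
        num_spectrogram_bins=magnitude.shape[-1],
        sample_rate=sample_rate,
        lower_edge_hertz=0.0,
        upper_edge_hertz=8000.0,
    )
    mel = tf.tensordot(magnitude, mel_matrix, 1)    # (frames, 64)
    # 4. Log compression; the epsilon avoids log(0) on silent frames
    log_mel = tf.math.log(mel + 1e-6)
    # 5. Add a channel axis and resize to the model's (128, 64, 1) input
    return tf.image.resize(log_mel[..., tf.newaxis], (128, 64))
```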

Results Summary

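| Metric        | FP32 Model  | INT8 Model (PTQ) |
|---------------|-------------|------------------|
| Test accuracy | 93.46%      | 93.49%           |
| Model size    | 14.08 MB    | 1.20 MB          |
| Compression   | 1x          | 11.73x           |
| Format        | H5 weights  | TFLite           |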

See "Testing the Quantized Model" section above for detailed comparison output.

Troubleshooting

Model weights not loading

  • Ensure tf_keras_files/tf_keras_weights.h5 exists
  • The script will try manual HDF5 loading if standard loading fails

Dataset not found

  • Ensure Code/data/speech_commands_v0_extracted/ exists
  • If missing, download the dataset:
    cd Code
    python -c "from utils import download_files; download_files()"

Quantized model not found

  • Run ptq_pipeline_tf_keras.py first to generate the TFLite model
  • Check that tf_keras_files/model_ptq_int8.tflite exists

Import errors

  • Ensure all dependencies are installed: pip install -r requirements.txt
  • Use Keras 3 with TensorFlow backend (set KERAS_BACKEND=tensorflow)
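Keras 3 reads `KERAS_BACKEND` once, when it is first imported, so the variable has to be set before any `import keras`, including imports inside the project modules. A minimal guard with a hypothetical function name:

```python
import os
import sys


def ensure_tensorflow_backend():
    """Select the TensorFlow backend, or fail loudly if it is too late to do so."""
    if "keras" in sys.modules:
        # Keras is already loaded, so the backend is fixed for this process.
        backend = sys.modules["keras"].backend.backend()
        if backend != "tensorflow":
            raise RuntimeError(
                f"Keras already imported with backend {backend!r}; "
                "set KERAS_BACKEND=tensorflow before importing it"
            )
        return
    os.environ["KERAS_BACKEND"] = "tensorflow"
```

Call it at the top of a script, before importing the project modules; exporting `KERAS_BACKEND=tensorflow` in the shell works just as well.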

Dependencies

See requirements.txt for full list. Core dependencies:

  • tensorflow>=2.15.0
  • keras>=3.0.0
  • numpy>=1.24.0
  • h5py>=3.8.0

Citation

If you use this code, please cite:

  • Speech Commands Dataset: Warden, P. (2018). "Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition"
  • TensorFlow Lite: Google (2024). "TensorFlow Lite: Machine Learning for Mobile and Edge Devices"

About

Final Project Code
