Speech Commands CNN Model - Quantization Project

This project implements and evaluates CNN models for speech command recognition using the Speech Commands v0.02 dataset, with a focus on Post-Training Quantization (PTQ) for model compression and deployment.

Project Overview

Dataset: Speech Commands v0.02 (36 classes)
Model: CNN with 3 convolutional blocks (1.2M parameters)
Framework: TensorFlow/Keras (Keras 3)
Quantization: Post-Training Quantization (PTQ) to INT8
Results: 93.49% INT8 accuracy (FP32: 93.46%) with 11.73x model compression (14.08 MB → 1.20 MB)

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Download Dataset

If you don't have the dataset yet, download it:

cd Code
python -c "from utils import download_files; download_files()"

This will download and extract the Speech Commands v0.02 dataset (~2.3 GB) to Code/data/speech_commands_v0_extracted/.

Note: The dataset must be at Code/data/speech_commands_v0_extracted/ for the test scripts to work.
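Before running the test scripts, it can help to confirm that the extracted dataset is where they expect it. The helper below is a minimal sketch and not part of the repository: `list_class_dirs` is a hypothetical name, and it assumes the usual Speech Commands layout of one sub-folder per label.

```python
from pathlib import Path


def list_class_dirs(root):
    """Return the sorted names of the sub-folders under the dataset root.

    Hypothetical helper (not from utils.py); assumes one folder per label.
    """
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"Dataset folder not found: {root}")
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    # Path is relative to the Code/ directory, where the scripts are run.
    try:
        labels = list_class_dirs("data/speech_commands_v0_extracted")
        print(f"Found {len(labels)} label folders")
    except FileNotFoundError as err:
        print(err)
```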

3. Test the FP32 Model

cd Code
python test_tf_keras_model.py

Expected output: ~93.46% test accuracy

4. Test the Quantized (PTQ) Model

cd Code
python test_ptq_model.py

Expected output: ~93.49% INT8 accuracy, 11.73x compression

5. Test the Quantization-Aware Training (QAT) Model

cd Code
python test_qat_model.py

Expected output: ~90.12% INT8 accuracy, 11.79x compression

Project Structure

Key Files

Test Scripts

test_tf_keras_model.py - Tests the FP32 TensorFlow/Keras model
test_ptq_model.py - Compares the FP32 model against the INT8 PTQ model
test_qat_model.py - Compares the FP32 model against the INT8 QAT model

Quantization Pipeline

ptq_pipeline_tf_keras.py - Post-Training Quantization pipeline (generates quantized model)

Model Files

model_weights/tf_keras_weights.h5 - Trained FP32 model weights (14.08 MB)
model_weights/model_ptq_int8.tflite - Quantized INT8 model (1.20 MB)
model_weights/model_fp32.keras - Full FP32 model (saved format)

Core Code

preprocessing.py - Audio preprocessing (STFT → mel spectrogram → log scale)
utils.py - Dataset loading and train/test split utilities
conv_block_model.py - CNN model architecture definition

Training Notebooks

tf_keras_files/tf_keras_model.ipynb - TensorFlow/Keras training notebook
qat_pipeline_final.ipynb - Quantization Aware Training (QAT) notebook (PyTorch)
temp_file_pytorch_backend.ipynb - PyTorch training notebook

Dataset

Code/data/speech_commands_v0_extracted/ - Extracted Speech Commands dataset
- 36 classes (yes, no, up, down, left, right, etc.)
- ~105,835 audio files
- 70/15/15 train/val/test split
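A 70/15/15 split can be produced by shuffling file indices with a fixed seed and cutting the list twice. The sketch below only illustrates the idea. `split_indices` and the seed are assumptions, and the actual logic in utils.py may differ (for example, it may split per class).

```python
import random


def split_indices(n_items, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle indices 0..n_items-1 and cut them into train/val/test lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = int(n_items * train_frac)
    n_val = int(n_items * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder, so no file is dropped
    return train, val, test


train_idx, val_idx, test_idx = split_indices(105_835)
print(len(train_idx), len(val_idx), len(test_idx))
```

Giving the remainder to the test set keeps the three parts disjoint and makes them cover every file even when the fractions do not divide the total evenly.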

Detailed Usage

Testing the FP32 Model

Purpose: Verify the trained FP32 model works correctly.

cd Code
python test_tf_keras_model.py

What it does:

Loads the model architecture
Loads weights from tf_keras_files/tf_keras_weights.h5
Loads and preprocesses test data
Evaluates model accuracy
Shows 10 sample predictions

Expected output:

Test accuracy = 0.9346
Sample 0: True = wow | Predicted = wow
...

Testing the Quantized Model

Purpose: Compare FP32 vs INT8 quantized model performance.

cd Code
python test_ptq_model.py

What it does:

Loads FP32 model and quantized TFLite model
Evaluates both on the same test set
Compares accuracies and model sizes
Shows side-by-side sample predictions
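If the TFLite model was converted with INT8 input and output tensors (an assumption; the source does not state the I/O types), evaluating it means mapping float features to INT8 and mapping the outputs back, using the scale and zero point stored in the model. The affine relation is value = scale × (q − zero_point). The helper names and the example parameters below are illustrative only.

```python
def quantize(value, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to INT8: q = round(value / scale) + zero_point, clipped."""
    # Note: Python's round() rounds exact ties to even; a runtime may differ at ties.
    q = round(value / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an INT8 value back to float: value = scale * (q - zero_point)."""
    return scale * (q - zero_point)


# Illustrative parameters only; real ones come from the model, e.g.
# interpreter.get_input_details()[0]["quantization"] -> (scale, zero_point).
scale, zero_point = 0.5, -3

q = quantize(3.2, scale, zero_point)
restored = dequantize(q, scale, zero_point)
print(q, restored)
```

The round trip loses at most half a quantization step for values inside the representable range; values outside it are clipped to -128 or 127.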

Expected output:

FP32 Accuracy:      93.4551% (0.934551)
INT8 Accuracy:      93.4929% (0.934929)
Accuracy Drop:      -0.0378% (-0.04%)
Model Size (FP32):  14.08 MB
Model Size (INT8):  1.20 MB
Compression Ratio:  11.73x
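The two derived figures follow directly from the four measured ones: the accuracy drop is FP32 accuracy minus INT8 accuracy (a negative value means the INT8 model scored slightly higher on this test set), and the compression ratio is FP32 size divided by INT8 size. A minimal sketch using the numbers above, where `compression_summary` is a hypothetical helper name:

```python
def compression_summary(fp32_acc, int8_acc, fp32_mb, int8_mb):
    """Return (accuracy drop in percentage points, size ratio)."""
    return fp32_acc - int8_acc, fp32_mb / int8_mb


drop, ratio = compression_summary(93.4551, 93.4929, 14.08, 1.20)
print(f"Accuracy Drop:      {drop:.4f}%")
print(f"Compression Ratio:  {ratio:.2f}x")
```

The test script may compute the ratio from exact byte counts rather than rounded megabytes, so plugging in rounded sizes can give a slightly different last digit.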

Testing the QAT Model

Purpose: Compare FP32 vs INT8 QAT model performance.

cd Code
python test_qat_model.py

What it does:

Loads FP32 model and QAT TFLite model
Evaluates both on the same test set
Compares accuracies and model sizes
Shows side-by-side sample predictions

Expected output:

FP32 Accuracy:      93.4299% (0.934299)
INT8 Accuracy:      90.1228% (0.901228)
Accuracy Drop:      3.3071% (+3.31%)
Model Size (FP32):  14.08 MB
Model Size (INT8):  1.19 MB
Compression Ratio:  11.79x

Running PTQ Pipeline

Purpose: Generate a new quantized model from the FP32 model.

cd Code
python ptq_pipeline_tf_keras.py

What it does:

Loads the FP32 model
Creates calibration dataset (200 samples)
Applies INT8 quantization using TensorFlow Lite
Saves quantized model to tf_keras_files/model_ptq_int8.tflite
Evaluates both models and shows comparison
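The calibration and conversion steps can be sketched with the standard TensorFlow Lite converter API. This is an illustration under assumptions, not the code in ptq_pipeline_tf_keras.py: the function names are hypothetical, and restricting the converter to INT8 kernels with INT8 input/output tensors is one common configuration that the pipeline may or may not use.

```python
import numpy as np


def make_representative_dataset(samples, num_samples=200):
    """Build the calibration generator that the TFLite converter expects.

    `samples` is an array of preprocessed inputs shaped (N, 128, 64, 1).
    """
    def generator():
        for sample in samples[:num_samples]:
            # One float32 batch of size 1 per step, wrapped in a list.
            yield [sample[np.newaxis, ...].astype(np.float32)]
    return generator


def convert_to_int8(model, calibration_samples):
    """Convert a Keras model to a full-integer TFLite flatbuffer (bytes)."""
    import tensorflow as tf  # imported here so the helper above needs only NumPy

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = make_representative_dataset(calibration_samples)
    # Assumption: INT8-only kernels and INT8 input/output tensors.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    return converter.convert()
```

The returned bytes can be written to disk with `pathlib.Path(...).write_bytes(...)`. The converter runs the calibration samples through the float model to estimate activation ranges, which is why conversion takes noticeably longer than a plain export.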

Note: This takes several minutes due to calibration and conversion.

Model Architecture

The CNN model consists of:

3 Convolutional Blocks:
- Block 1: 64 filters, stride 2
- Block 2: 128 filters
- Block 3: 256 filters
Each block: Conv2D → BatchNorm → ReLU → MaxPooling → SpatialDropout
Global Average Pooling
Dense layers: 256 → 36 (softmax output)

Input: Mel spectrogram (128×64×1)
Output: 36 class probabilities
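The layer order above can be written as a short Keras 3 model. This is a sketch, not the contents of conv_block_model.py: the kernel size, padding, pooling size, dropout rate, and applying stride 2 to the Conv2D layer are all assumptions, as is reading "256 → 36" as a 256-unit hidden layer followed by the softmax layer. Its parameter count therefore need not match the 1.2M reported above.

```python
# (filters, conv stride) per block, as listed above.
CONV_BLOCKS = [(64, 2), (128, 1), (256, 1)]
NUM_CLASSES = 36


def build_model(input_shape=(128, 64, 1), dropout_rate=0.1):
    """Assemble the CNN in the layer order described above.

    Assumptions (not given in the source): 3x3 kernels, "same" padding,
    2x2 max pooling, the dropout rate, and stride 2 applied to the Conv2D.
    """
    import keras  # imported lazily; requires Keras 3 with a backend installed
    from keras import layers

    model = keras.Sequential([keras.Input(shape=input_shape)])
    for filters, stride in CONV_BLOCKS:
        model.add(layers.Conv2D(filters, 3, strides=stride, padding="same"))
        model.add(layers.BatchNormalization())
        model.add(layers.ReLU())
        model.add(layers.MaxPooling2D(pool_size=2))
        model.add(layers.SpatialDropout2D(dropout_rate))
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dense(NUM_CLASSES, activation="softmax"))
    return model
```

Calling `build_model().summary()` prints the resulting layer shapes and parameter counts for comparison with the trained model.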

Preprocessing Pipeline

The preprocessing matches the training pipeline:

Audio normalization (int16 → float32, /32768.0)
STFT (frame_length=256, frame_step=128)
Mel spectrogram (64 mel bins, 0-8000 Hz)
Log scale (log(x + 1e-6))
Resize to (128, 64, 1)
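The five steps map onto `tf.signal` and `tf.image` calls fairly directly. The sketch below is an approximation of preprocessing.py under stated assumptions: 16 kHz mono clips, a magnitude (not power) spectrogram, and the default bilinear resize. The function names are hypothetical.

```python
def num_stft_frames(num_samples, frame_length=256, frame_step=128):
    """Frame count of an STFT without end padding."""
    return 1 + (num_samples - frame_length) // frame_step


def preprocess(waveform_int16, sample_rate=16000):
    """int16 waveform -> log-mel 'image' of shape (128, 64, 1)."""
    import tensorflow as tf  # imported lazily so the helper above has no dependencies

    # 1. Normalize int16 samples to float32 in [-1, 1).
    audio = tf.cast(waveform_int16, tf.float32) / 32768.0
    # 2. STFT; assumption: the magnitude is used rather than the power.
    stft = tf.signal.stft(audio, frame_length=256, frame_step=128)
    magnitude = tf.abs(stft)  # (frames, 129)
    # 3. Project the linear-frequency bins onto 64 mel bins covering 0-8000 Hz.
    mel_matrix = tf.signal.linear_to_mel_weight_matrix(
        num_mel_bins=64,
        num_spectrogram_bins=magnitude.shape[-1],
        sample_rate=sample_rate,
        lower_edge_hertz=0.0,
        upper_edge_hertz=8000.0,
    )
    mel = tf.matmul(magnitude, mel_matrix)  # (frames, 64)
    # 4. Log scale with a small offset to avoid log(0).
    log_mel = tf.math.log(mel + 1e-6)
    # 5. Add a channel axis and resize to the model's input size.
    return tf.image.resize(log_mel[..., tf.newaxis], (128, 64))


# A 1-second clip at 16 kHz yields this many frames before the resize step.
print(num_stft_frames(16000))
```

Under these settings a one-second clip produces slightly fewer than 128 frames, which is why the final resize step is needed to reach a fixed (128, 64, 1) input.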

Results Summary

| Metric | FP32 Model | INT8 Model (PTQ) |
|---|---|---|
| Accuracy | 93.46% | 93.49% |
| Model Size | 14.08 MB | 1.20 MB |
| Compression | 1x | 11.73x |
| Format | H5 weights | TFLite |

See the "Testing the Quantized Model" section above for the detailed comparison output.

Troubleshooting

Model weights not loading

Ensure tf_keras_files/tf_keras_weights.h5 exists
The script will try manual HDF5 loading if standard loading fails

Dataset not found

Ensure Code/data/speech_commands_v0_extracted/ exists

If missing, download the dataset:

cd Code
python -c "from utils import download_files; download_files()"

Quantized model not found

Run ptq_pipeline_tf_keras.py first to generate the TFLite model
Check that tf_keras_files/model_ptq_int8.tflite exists

Import errors

Ensure all dependencies are installed: pip install -r requirements.txt
Use Keras 3 with TensorFlow backend (set KERAS_BACKEND=tensorflow)
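Keras 3 reads `KERAS_BACKEND` once, when `keras` is first imported, so the variable has to be set before that import. A minimal sketch for setting it from Python instead of the shell:

```python
import os

# Must run before the first `import keras`; changing it afterwards has no effect.
os.environ["KERAS_BACKEND"] = "tensorflow"

# Place Keras imports after this point, for example:
# import keras
print(os.environ["KERAS_BACKEND"])
```

Putting these lines at the very top of a script or notebook avoids picking up a different backend from a user-level Keras configuration.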

Dependencies

See requirements.txt for full list. Core dependencies:

tensorflow>=2.15.0
keras>=3.0.0
numpy>=1.24.0
h5py>=3.8.0

Citation

If you use this code, please cite:

Speech Commands Dataset: Warden, P. (2018). "Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition." arXiv:1804.03209.
TensorFlow Lite: Google (2024). "TensorFlow Lite: Machine Learning for Mobile and Edge Devices"
