ESP32 Audio Classifier

Real-time DSP audio analysis system for ESP32 with advanced feature extraction and JSON data streaming.

Features

Real-time FFT: 2048-point, 21.5 Hz resolution
Spectral Features: Centroid, spread, flatness, rolloff
Temporal Features: Zero-crossing rate, RMS energy, peak detection
Note Detection: 12-semitone chromatic scale (Goertzel algorithm)
Frequency Bands: 16 logarithmic bands (60 Hz - 16 kHz)
JSON Streaming: ~10 Hz updates over serial
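
The Goertzel-based note detection listed above can be sketched in a few lines. This is a minimal illustration, not the firmware's code: the function names, the choice of octave 4, and the A4 = 440 Hz tuning are assumptions made for the example.

```python
import math

SAMPLE_RATE = 44100  # Hz, per the System Architecture section
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def goertzel_power(samples, target_hz, sample_rate=SAMPLE_RATE):
    """Return the squared magnitude of `samples` at `target_hz`."""
    coeff = 2.0 * math.cos(2.0 * math.pi * target_hz / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def note_frequencies(octave=4):
    """Equal-tempered frequencies for the 12 notes of one octave (assumes A4 = 440 Hz)."""
    base_midi = 12 * (octave + 1)  # MIDI number of C in this octave; C4 is 60
    return [440.0 * 2.0 ** ((base_midi + i - 69) / 12.0) for i in range(12)]


def detect_note(samples, octave=4):
    """Return (note_name, powers) for the strongest of the 12 semitones."""
    powers = [goertzel_power(samples, f) for f in note_frequencies(octave)]
    best = max(range(12), key=lambda i: powers[i])
    return NOTE_NAMES[best], powers


# Example: a pure 440 Hz tone in one 2048-sample frame
frame = [math.sin(2.0 * math.pi * 440.0 * n / SAMPLE_RATE) for n in range(2048)]
note, _ = detect_note(frame)
```

With a 2048-sample frame the resolution is about 21.5 Hz, so the lowest semitones of octave 4 (C4 and C#4 are roughly 15.6 Hz apart) overlap more than the higher ones; a longer frame or a higher octave would separate them better.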

Hardware

ESP32 DevKit V1
INMP441 I2S digital microphone
USB cable (power + serial)

Quick Start

Pinout

INMP441    ESP32
WS    →    GPIO 32
SD    →    GPIO 35
SCK   →    GPIO 33
GND   →    GND
3V3   →    3V3

Build

cd audio-signal-processor
pio run --target upload

Monitor

pio device monitor --baud 115200

Output Format

JSON with spectral, temporal, and frequency band features:

{
  "timestamp_ms": 12345678,
  "spectral": {"centroid_hz": 2450.5, ...},
  "temporal": {"zcr": 0.15, "rms_energy": 0.35, ...},
  "freq_bands": [...],
  "peaks": [...],
  "note_detection": [...]
}
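
On the host side, a frame like the one above can be reduced to a few fields with the standard `json` module. The sketch below assumes the firmware emits one complete JSON object per line, which the README does not state; `summarize_frame` is a hypothetical helper, and the sample line reuses only the keys and values shown above, with the elided arrays left empty.

```python
import json


def summarize_frame(line):
    """Parse one JSON line from the serial stream into a short summary dict."""
    frame = json.loads(line)
    spectral = frame.get("spectral", {})
    temporal = frame.get("temporal", {})
    return {
        "t_s": frame["timestamp_ms"] / 1000.0,
        "centroid_hz": spectral.get("centroid_hz"),
        "zcr": temporal.get("zcr"),
        "rms_energy": temporal.get("rms_energy"),
        "n_bands": len(frame.get("freq_bands", [])),
    }


# Illustrative input only: values copied from the example above, arrays left empty.
sample = (
    '{"timestamp_ms": 12345678, "spectral": {"centroid_hz": 2450.5}, '
    '"temporal": {"zcr": 0.15, "rms_energy": 0.35}, '
    '"freq_bands": [], "peaks": [], "note_detection": []}'
)
summary = summarize_frame(sample)
```

Using `.get` for the nested fields keeps the parser tolerant of frames that omit a section.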

System Architecture

Audio Input (44.1 kHz)
    ↓
Audio Frame Buffer (2048 samples)
    ↓
Preprocessing (DC removal, windowing)
    ↓
FFT Analysis (2048-point)
    ↓
Feature Extraction (48 features)
    ├─ Spectral: centroid, spread, flatness, rolloff
    ├─ Temporal: ZCR, RMS energy, peak amplitude
    ├─ Frequency: 16 logarithmic bands (60 Hz - 16 kHz)
    ├─ MFCC: 13 coefficients
    └─ Chroma: 12 note bins
    ↓
ML Classification (TensorFlow Lite)
    ├─ Normalize features
    └─ Genre prediction (10 genres)
    ↓
JSON Output (serial @ 115200 baud)
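
The preprocessing and feature stages of the pipeline above can be sketched in plain Python. This is an illustration under assumptions, not the firmware implementation: the Hann window, the function names, and the naive DFT (used here instead of a real FFT to avoid dependencies) are choices made for the example, and the frame is shortened to 256 samples to keep it fast.

```python
import cmath
import math


def preprocess(samples):
    """Remove the DC offset, then apply a Hann window (window type is an assumption)."""
    n = len(samples)
    mean = sum(samples) / n
    return [
        (x - mean) * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
        for i, x in enumerate(samples)
    ]


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0.0) != (b >= 0.0)
    )
    return crossings / (len(samples) - 1)


def rms_energy(samples):
    """Root-mean-square amplitude of the frame."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def magnitude_spectrum(samples):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (illustration only)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(mags, sample_rate, n_fft):
    """Magnitude-weighted mean frequency in Hz."""
    total = sum(mags)
    if total == 0.0:
        return 0.0
    return sum(k * sample_rate / n_fft * m for k, m in enumerate(mags)) / total


SAMPLE_RATE = 44100
N = 256  # small frame so the naive DFT stays fast; the pipeline above uses 2048
tone_hz = 16 * SAMPLE_RATE / N  # exactly on bin 16 (2756.25 Hz)
tone = [math.sin(2.0 * math.pi * tone_hz * i / SAMPLE_RATE) for i in range(N)]
frame = [0.5 + x for x in tone]  # add a DC offset for preprocess() to remove

zcr = zero_crossing_rate(tone)
rms = rms_energy(tone)
centroid = spectral_centroid(magnitude_spectrum(preprocess(frame)), SAMPLE_RATE, N)
```

For a single tone the centroid lands close to the tone frequency, which makes a convenient sanity check before moving on to real audio.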

ML Pipeline & Training

Feature Extraction (ml/feature_extraction.py)
- Extract 48 audio features from audio files
- Output: CSV with feature vectors
Model Training (ml/train_model.py)
- Train neural network on extracted features
- Split: 64% train, 16% validation, 20% test (subject to change)
- Output: TensorFlow Lite model for ESP32
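
A 64/16/20 split is what results from holding out 20% for testing and then taking 20% of the remainder for validation. The sketch below shows that arithmetic with the standard library only; `split_indices` is a hypothetical helper, not the code in ml/train_model.py, and it does not stratify by genre.

```python
import random


def split_indices(n_samples, test_frac=0.20, val_frac_of_rest=0.20, seed=0):
    """Shuffle indices and split them into train/validation/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for a reproducible split
    n_test = round(n_samples * test_frac)
    n_val = round((n_samples - n_test) * val_frac_of_rest)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


# 1000 is an assumed dataset size, chosen so the test set has 200 samples.
train, val, test = split_indices(1000)
```

A stratified split (equal samples per genre in each subset) would need the labels as an extra input.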
Deployment
- Quantized model runs on ESP32 in real-time (might change)
- Classifies audio into 10 music genres
Model Accuracy
- Overall test accuracy: 62.5% across 10 genres (200 test samples)
- Classical music has the highest per-genre accuracy at 90% (18/20 correct), likely due to its distinct spectral profile: a low zero-crossing rate, narrow frequency spread, and strong harmonic structure make it easy to separate from other genres
- Metal (75%) and pop (75%) also perform well, while genres with overlapping characteristics like disco (35%) and rock (40%) are harder to distinguish
- The confusion matrix shows that most misclassifications occur between sonically similar genres (e.g., rock confused with metal/country, disco confused with reggae/rock)
Training Results and Initial Data
- [Figures: correlation matrix and training history]

Documentation

ARCHITECTURE.md - Technical deep-dive
FEATURES_REFERENCE.md - Feature explanations
DSP Features API - API documentation

License

MIT
