A powerful CLI tool for analyzing audio samples, with a focus on ML training data quality checks.
Features:

- Comprehensive Audio Analysis: Extract a wide range of audio features (see the brief librosa sketch after this list), including:
  - MFCCs (Mel-frequency cepstral coefficients)
  - Spectral features (centroid, rolloff, flux, bandwidth)
  - Fundamental frequency (F0/pitch) analysis
  - Zero-crossing rate (ZCR)
  - Formants (F1, F2, F3)
  - Voice quality metrics (jitter, shimmer, HNR)
  - Energy metrics (RMS, peak levels, crest factor, energy entropy)
  - Quality indicators (SNR estimate, clipping detection, DC offset, speech-to-silence ratio)
- Rich Terminal UI: Beautiful, formatted output using the rich library
- Visualization: ASCII spectrogram in the terminal plus high-resolution PNG export
- Batch Processing: Analyze entire directories of audio files
- CSV Export: Export batch results for further analysis
- Multiple Formats: Support for WAV, MP3, FLAC, OGG, M4A
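Most of these features correspond to standard librosa calls. The snippet below is only a rough illustration of that kind of extraction, not audiolens's internal implementation; the file name and parameter values are placeholders.

```python
# Illustrative feature extraction with librosa; audiolens's internal pipeline
# and parameter choices may differ.
import numpy as np
import librosa

y, sr = librosa.load("audio.wav", sr=None)  # keep the file's native sample rate

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # (13, n_frames)
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)  # per-frame spectral centroid (Hz)
zcr = librosa.feature.zero_crossing_rate(y)               # per-frame zero-crossing rate
rms = librosa.feature.rms(y=y)                            # per-frame RMS energy
f0 = librosa.yin(y, fmin=librosa.note_to_hz("C2"),
                 fmax=librosa.note_to_hz("C7"), sr=sr)    # fundamental frequency track

print("MFCC means:", np.round(mfcc.mean(axis=1), 2))
print(f"Spectral centroid: {centroid.mean():.1f} Hz")
print(f"Mean F0: {np.mean(f0):.1f} Hz")
```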
Install with pipx (recommended):

```bash
# Install pipx if you don't have it
pip install --user pipx

# Install audiolens globally
pipx install git+https://github.com/yourusername/audiolens.git

# Or from local source
pipx install /path/to/audiolens
```

Why pipx? It installs CLI tools in isolated environments, avoiding dependency conflicts.
Or, if the package is published on PyPI:

```bash
pipx install audiolens
```

To work on audiolens itself, install from source in a virtual environment:

```bash
git clone https://github.com/yourusername/audiolens.git
cd audiolens

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in editable mode
pip install -e .
```

A plain pip install from a clone also works:

```bash
# Clone and install
git clone https://github.com/yourusername/audiolens.git
cd audiolens
pip install --user .
```

audiolens requires Python 3.8+ and the following packages:
- librosa - Audio feature extraction
- numpy - Numerical computing
- scipy - Scientific computing
- rich - Terminal UI
- soundfile - Audio I/O
- matplotlib - Visualization
- praat-parselmouth - Voice quality metrics (optional but recommended)
All dependencies are installed automatically.
Analyze a single file:

```bash
audiolens analyze audio.wav
```

Add `--visualize` for a spectrogram:

```bash
audiolens analyze audio.wav --visualize
```

This displays an ASCII spectrogram in your terminal and saves a high-resolution PNG image.
Batch-process a directory, optionally exporting the results to CSV:

```bash
audiolens analyze /path/to/audio/files/

audiolens analyze /path/to/audio/files/ --output results.csv
```

Analysis parameters can be tuned from the command line:

```bash
audiolens analyze audio.wav --n-mfcc 20 --sample-rate 22050 --hop-length 256
```

Full command reference:

```text
audiolens analyze [OPTIONS] INPUT

Arguments:
  INPUT                Input audio file or directory

Options:
  -o, --output PATH    Output CSV file for batch results
  -v, --visualize      Display spectrogram visualization
  --sample-rate INT    Target sample rate (default: use file's native rate)
  --n-mfcc INT         Number of MFCCs to compute (default: 13)
  --hop-length INT     Hop length for frame-based analysis (default: 512)
  --n-fft INT          FFT window size (default: 2048)
  --formats STR        Comma-separated audio formats (default: wav,mp3,flac,ogg,m4a)
  --help               Show help message
```
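If you want to drive the CLI from a Python script (for example, as part of a data-prep pipeline), a minimal sketch using only the options documented above could look like this; the dataset path and output file name are placeholders.

```python
# Hypothetical wrapper script: invoke the audiolens CLI via subprocess.
# The dataset path and output file name are placeholders.
import subprocess
from pathlib import Path

dataset = Path("./dataset/train")

subprocess.run(
    [
        "audiolens", "analyze", str(dataset),
        "--output", "train_analysis.csv",
        "--formats", "wav",
    ],
    check=True,  # raise CalledProcessError if the analysis fails
)
```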
Each analysis reports the following metrics:

- Duration
- Sample rate
- Number of channels
- Spectral centroid (mean & std)
- Spectral rolloff (mean & std)
- Spectral bandwidth (mean & std)
- Spectral flux (mean & std)
- F0 fundamental frequency (mean, std, min, max)
- Formants F1, F2, F3 (mean values)
- Jitter (pitch variation)
- Shimmer (amplitude variation)
- HNR (Harmonics-to-Noise Ratio)
- RMS energy (mean & std)
- Peak amplitude
- Crest factor
- Energy entropy
- Zero-crossing rate
- SNR estimate
- Speech-to-silence ratio
- Clipping detection & percentage
- DC offset
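The quality indicators near the end of this list have simple signal-level definitions. The sketch below is one illustrative way to compute a few of them with numpy and soundfile; the exact formulas and thresholds audiolens uses may differ.

```python
# Illustrative quality indicators; audiolens's exact formulas may differ.
import numpy as np
import soundfile as sf

y, sr = sf.read("audio.wav")
if y.ndim > 1:
    y = y.mean(axis=1)  # mix down multi-channel audio for simplicity

dc_offset = float(np.mean(y))
peak = float(np.max(np.abs(y)))
rms = float(np.sqrt(np.mean(y ** 2)))
crest_factor = peak / rms if rms > 0 else float("inf")

# Treat samples at (or extremely close to) full scale as clipped
clipping_pct = 100.0 * float(np.mean(np.abs(y) >= 0.999))

print(f"DC offset:      {dc_offset:+.5f}")
print(f"Peak amplitude: {peak:.3f}")
print(f"Crest factor:   {crest_factor:.2f}")
print(f"Clipping:       {clipping_pct:.3f}% of samples")
```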
Screen an ML training dataset for common quality issues:

```bash
# Analyze all wav files in a directory and export to CSV
audiolens analyze ./dataset/train/ --output train_analysis.csv --formats wav
# Check for common issues
# The output will highlight:
# - Low SNR (< 10 dB = poor quality)
# - Clipping (> 0.1% clipped samples)
# - High DC offset (> 0.01)
```

Inspect a single speech recording in detail:

```bash
# Analyze a single speech file with full voice quality metrics
audiolens analyze speech.wav --visualize
# This will show:
# - Pitch characteristics (F0)
# - Formants (vowel quality)
# - Jitter & shimmer (voice stability)
# - HNR (voice clarity)
```
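The jitter, shimmer, and HNR values shown here come from Praat-style analysis (praat-parselmouth is listed as an optional dependency). As a rough illustration, this sketch applies the standard Praat recipes via parselmouth; the settings shown are Praat defaults, not necessarily the ones audiolens uses.

```python
# Standard Praat voice-quality recipe via praat-parselmouth (illustrative only;
# audiolens may use different settings internally).
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("speech.wav")
pitch = snd.to_pitch()
pulses = call([snd, pitch], "To PointProcess (cc)")

jitter = call(pulses, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
shimmer = call([snd, pulses], "Get shimmer (local)", 0, 0, 0.0001, 0.02, 1.3, 1.6)

harmonicity = snd.to_harmonicity_cc()
hnr = call(harmonicity, "Get mean", 0, 0)

print(f"Jitter (local):  {jitter:.4f}")
print(f"Shimmer (local): {shimmer:.4f}")
print(f"Mean HNR:        {hnr:.1f} dB")
```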
Batch-analyze a set of recordings and compare them:

```bash
# Batch analyze and compare
audiolens analyze ./recordings/ --output comparison.csv
# Open comparison.csv in Excel/pandas to compare:
# - SNR across files
# - Spectral characteristics
# - Clipping issues
# - Energy levels
```

A short pandas sketch for screening this CSV appears after the use-case list below.

Typical use cases:

- Detect clipping, noise, and other artifacts
- Ensure consistent audio quality across dataset
- Identify outliers based on spectral features
- Validate SNR requirements
- Analyze pitch and formants
- Measure voice quality (jitter, shimmer, HNR)
- Detect speech vs. silence regions
- Verify audio processing pipelines
- Check for DC offset after processing
- Ensure no clipping introduced
- Validate spectral characteristics
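As referenced in the batch example above, a minimal pandas sketch for screening an exported CSV might look like the following. The column names used here ("file", "snr_db", "clipping_pct", "dc_offset") are assumptions for illustration; check the header of your own CSV for the actual names.

```python
# Hypothetical screening of audiolens batch results; the column names used
# here are assumptions, so check your CSV header for the real ones.
import pandas as pd

df = pd.read_csv("comparison.csv")

low_snr = df[df["snr_db"] < 10]               # poor-quality recordings
clipped = df[df["clipping_pct"] > 0.1]        # files with clipping
dc_issues = df[df["dc_offset"].abs() > 0.01]  # suspicious DC offset

print(f"{len(low_snr)} files below 10 dB SNR")
print(f"{len(clipped)} files with more than 0.1% clipped samples")
print(f"{len(dc_issues)} files with |DC offset| above 0.01")
print(low_snr[["file", "snr_db"]].sort_values("snr_db").head())
```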
Run the development checks with:

```bash
# Run tests
pytest tests/

# Format code
black src/

# Type check
mypy src/
```

Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with librosa for audio analysis
- Uses Praat-Parselmouth for voice quality features
- Terminal UI powered by rich
If you use audiolens in your research, please cite:
```bibtex
@software{audiolens,
  title  = {audiolens: Audio Analysis for ML Training Data},
  author = {Your Name},
  year   = {2025},
  url    = {https://github.com/yourusername/audiolens}
}
```

Planned improvements:

- Add more quality metrics (PESQ, STOI, etc.)
- Support for multi-channel analysis
- Integration with audio augmentation libraries
- Interactive TUI mode
- Plugin system for custom analyzers
- Real-time audio analysis mode
- Web-based dashboard for batch results