DataAthleteChamp/Audio-Explorers-2026

Audio Explorers 2026

Demant Audio Explorers 2026 case challenge: analyse a 4-channel hearing-aid recording to identify, separate, and enhance a target talker from a multi-talker mixture.

Status / disclaimer. This repository is a one-shot submission for the Demant Audio Explorers 2026 case challenge, shared publicly for reference and learning. It is not a maintained product. Issues and PRs are welcome but responses are best-effort. See CONTRIBUTING.md for setup notes.

Quick Start

# Install dependencies
pip install -r requirements.txt

# Run the Streamlit app
streamlit run app.py

# Run tests
python3 -m pytest tests/ -v

Project Structure

├── app.py                          # Streamlit app (Demo + Experiment modes)
├── requirements.txt
├── CLAUDE.md                       # AI assistant instructions
├── src/
│   ├── array_geometry.py           # Shared mic positions, steering utilities
│   ├── audio_io.py                 # Load/save multi-channel WAV files
│   ├── doa_estimation.py           # GCC-PHAT, TDOA, spatial clustering, speaker counting
│   ├── target_enhancement.py       # DAS & MVDR beamforming, two-stage enhancement, Wiener post-filter
│   ├── post_processing.py          # Presence boost, compression, spectral subtraction, comfort noise, LUFS
│   ├── preprocessing.py            # Highpass filter, resample, normalize, mono extraction
│   ├── transcription.py            # Whisper ASR wrapper (lazy loading, language auto-detect)
│   ├── gender_classification.py    # ECAPA-TDNN → wav2vec2 → F0 fallback gender classifier
│   ├── enhancement_evaluation.py   # Structured method comparison harness
│   ├── quality_metrics.py          # SI-SDR, segmental SNR, PESQ evaluation
│   ├── visualization.py            # Spectrograms, polar plots, F0 contours
│   └── experiment.py               # A/B preprocessing comparison framework
├── tests/                          # 70 tests across 7 test files
├── data/raw/                       # Input audio files (mixture.wav, example_mixture.wav)
├── output/                         # Enhanced audio output
├── report/                         # LaTeX report and figures
├── notebooks/                      # Exploratory Jupyter notebooks
├── docs/                           # Research notes and design decisions
└── scripts/                        # Utility scripts

Application

The Streamlit app (app.py) has two modes:

Demo Mode — Full pipeline:

  • Load mixture.wav or upload a custom 4-channel WAV
  • DOA estimation with GCC-PHAT spatial clustering → detects N speakers
  • Beamforming (DAS or MVDR) toward each speaker direction
  • Two intuitive sliders:
    • 🔇 Background Suppression (0–1): controls mask sharpness, Wiener floor, spectral subtraction
    • 🔊 Target Boost (0–1): controls presence EQ, compression, LUFS target
  • Whisper transcription per direction
  • Gender classification (ECAPA-TDNN → wav2vec2 → F0 fallback)
  • Talker-of-interest reasoning
  • Enhanced WAV download

Experiment Mode — A/B preprocessing comparison for evaluating different enhancement strategies side by side.
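The beamforming step listed under Demo Mode can be illustrated with a minimal delay-and-sum (DAS) sketch. This is not the code in src/target_enhancement.py: the function name, the coordinate convention (x forward, y towards the left ear), the 343 m/s speed of sound, and the mic coordinates derived from the assumed BTE geometry are all assumptions made for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

# Hypothetical coordinates (metres) in the channel order left front, left rear,
# right front, right rear, built from ~12 mm front-rear and ~17 cm inter-aural.
ASSUMED_MIC_POSITIONS = np.array([
    [0.006, 0.085],
    [-0.006, 0.085],
    [0.006, -0.085],
    [-0.006, -0.085],
])


def delay_and_sum(signals, mic_positions, azimuth_deg, fs):
    """Far-field delay-and-sum beamformer for a planar array.

    signals:       (n_mics, n_samples) array
    mic_positions: (n_mics, 2) x/y coordinates in metres
    azimuth_deg:   look direction, counter-clockwise from the +x axis
    """
    signals = np.asarray(signals, dtype=float)
    mic_positions = np.asarray(mic_positions, dtype=float)
    n_samples = signals.shape[1]

    theta = np.deg2rad(azimuth_deg)
    direction = np.array([np.cos(theta), np.sin(theta)])  # unit vector towards the source

    # A mic further along `direction` hears the plane wave earlier by (p . d) / c.
    advance = mic_positions @ direction / SPEED_OF_SOUND  # seconds

    # Undo each channel's advance with a phase ramp so all channels line up,
    # then average. The shift is circular, which is acceptable for a sketch.
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectra = np.fft.rfft(signals, axis=1)
    aligned = spectra * np.exp(-2j * np.pi * freqs[None, :] * advance[:, None])
    return np.fft.irfft(aligned.mean(axis=0), n=n_samples)
```

Calling `delay_and_sum(x, ASSUMED_MIC_POSITIONS, azimuth_deg=90, fs=44100)` on a `(4, n_samples)` array steers towards the left side under this convention. MVDR replaces the plain average with weights derived from a noise covariance estimate and is not shown here.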

Pipeline Overview

mixture.wav (44.1 kHz, 4ch)
    → Preprocessing (highpass, normalize)
    → DOA Estimation (GCC-PHAT, TDOA, spatial clustering)
    → Beamforming (DAS / MVDR)
    → Post-processing (Wiener filter, spectral subtraction, presence boost, compression, LUFS)
    → Whisper ASR
    → Gender Classification (ECAPA-TDNN → wav2vec2 → F0 fallback)
    → Talker-of-Interest Reasoning
    → Enhanced WAV
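The DOA stage of the pipeline relies on GCC-PHAT to estimate time differences of arrival between microphone pairs. The sketch below shows the textbook form of the technique for one pair. It is not taken from src/doa_estimation.py, and the function name and parameters are assumptions.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` (seconds) with GCC-PHAT.

    A positive result means `sig` lags behind `ref`.
    """
    n = len(sig) + len(ref)  # zero-pad so the circular correlation does not wrap
    sig_spec = np.fft.rfft(sig, n=n)
    ref_spec = np.fft.rfft(ref, n=n)

    cross = sig_spec * np.conj(ref_spec)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        # Restrict the search to physically plausible lags (at least one sample).
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs
```

Setting `max_tau` to the largest delay the array geometry allows keeps spurious peaks outside the physical range from being picked. The resulting per-pair delays are what a spatial clustering step would then group into speaker directions.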

Testing

70 tests across 7 files:

test_doa_estimation.py        — GCC-PHAT, weighted GCC, DOA precision
test_target_enhancement.py    — DAS, MVDR, enhance_target, numerical stability
test_speaker_estimation.py    — Speaker count validation (4 speakers detected)
test_gender_classification.py — F0 + wav2vec2 hybrid, edge cases
test_quality_metrics.py       — SI-SDR, segmental SNR
test_voice_isolator.py        — Suppression/boost dial mapping, param passthrough
test_doa_printout.py          — Integration test (requires data files)

python3 -m pytest tests/ -v
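Among the metrics covered by the tests, SI-SDR has a compact standard definition: project the estimate onto the reference, then compare the energy of that projection with the energy of the residual. The following is a minimal sketch of that definition, not the code in src/quality_metrics.py. The function name and the `eps` guard are assumptions.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB for 1-D signals."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)

    # Remove DC so a constant offset does not count as signal or distortion.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Optimal scaling of the reference that best explains the estimate.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    residual = estimate - target

    return 10.0 * np.log10(
        (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    )
```

Because of the projection step, multiplying the estimate by any non-zero gain leaves the score unchanged, which is why the metric is suited to comparing enhancement methods that differ in output level.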

Key Dependencies

numpy · scipy · librosa · openai-whisper · torch · transformers · soundfile · matplotlib · streamlit · pyloudnorm · nara-wpe (optional)

See requirements.txt for pinned versions.

Audio Facts

| Property | Value |
|---|---|
| Sample rate | 44.1 kHz |
| Channels | 4 |
| Duration | ~21 seconds |
| Channel order | left front, left rear, right front, right rear |
| Assumed BTE geometry | ~12 mm front-rear, ~17 cm inter-aural |
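The assumed geometry bounds the time differences a DOA estimator can observe. As a rough worked example, assuming a speed of sound of 343 m/s and straight-line propagation between microphones (head diffraction is ignored), the largest possible inter-microphone delay at 44.1 kHz is:

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
SAMPLE_RATE = 44_100    # Hz, from the table above


def max_delay_samples(spacing_m, fs=SAMPLE_RATE, c=SPEED_OF_SOUND):
    """Largest delay (in samples) between two mics `spacing_m` metres apart."""
    return spacing_m / c * fs


front_rear = max_delay_samples(0.012)   # ~12 mm front-rear pair
inter_aural = max_delay_samples(0.17)   # ~17 cm inter-aural pair

print(f"front-rear:  {front_rear:.2f} samples")   # about 1.54
print(f"inter-aural: {inter_aural:.2f} samples")  # about 21.86
```

The front-rear pair spans only about one and a half samples, so delays on that pair need sub-sample resolution to be useful, while the inter-aural pair spans roughly 22 samples.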

Input files live in data/raw/: mixture.wav (challenge input) and example_mixture.wav (development).

For deeper technical notes see docs/.

Limitations & Assumptions

  • Front/back ambiguity. Each ear carries only two closely spaced microphones (~12 mm apart), so the 4-mic BTE array cannot fully resolve front-vs-back without extra cues; reported azimuths are most reliable in the lateral hemisphere.
  • Assumed geometry. ~12 mm front-rear spacing, ~17 cm inter-aural distance. These are reasonable for hearing-aid form factors but were not measured for the specific device that produced the recording.
  • Far-field plane-wave model. Beamforming and DOA assume sources are far enough that wavefronts are approximately planar at the array.
  • Short clip (≈21 s). Statistical estimates (covariances, embeddings) are computed from limited data, so their reliability is bounded.
  • Gender classifier. Lower confidence than spatial and transcription evidence, especially in the 150–180 Hz F0 ambiguity zone. Treat as a hint, not ground truth.
  • Whisper hallucinations. Short, noisy, or non-speech segments can produce spurious transcripts; cross-check against energy/voice-activity evidence.
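The F0 ambiguity zone mentioned for the gender classifier can be made concrete with a small sketch of an F0-only fallback. Only the 150–180 Hz zone comes from the text above. The function name, the use of the median, and the returned labels are assumptions, and the ECAPA-TDNN and wav2vec2 stages are not shown.

```python
from statistics import median

AMBIGUOUS_LOW_HZ = 150.0   # lower edge of the ambiguity zone
AMBIGUOUS_HIGH_HZ = 180.0  # upper edge of the ambiguity zone


def f0_gender_hint(f0_track_hz):
    """Return a coarse hint from per-frame F0 values (0 or None = unvoiced)."""
    voiced = [f for f in f0_track_hz if f is not None and f > 0]
    if not voiced:
        return "unknown"  # no voiced frames, nothing to base a hint on

    typical_f0 = median(voiced)  # median is robust to octave-error outliers
    if typical_f0 < AMBIGUOUS_LOW_HZ:
        return "male"
    if typical_f0 > AMBIGUOUS_HIGH_HZ:
        return "female"
    return "ambiguous"  # inside the 150-180 Hz zone: defer to other evidence
```

Returning an explicit "ambiguous" label, rather than forcing a binary decision, matches the advice to treat the classifier as a hint and to weigh it below spatial and transcription evidence.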

How to Obtain the Input Audio

The 4-channel mixture WAV files are provided as part of the Demant Audio Explorers 2026 challenge materials and are not redistributed by this repo. If you have access to the challenge package, place the files at:

data/raw/mixture.wav
data/raw/example_mixture.wav

You can also point the Streamlit app at any other 4-channel 44.1 kHz WAV via its file uploader. A minimal synthetic test mixture is generated by scripts/generate_synthetic_test.py and stored in data/synthetic/.
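Before feeding a custom file to the app, it can help to confirm that it matches the expected layout. The sketch below uses only the Python standard library. The helper name and its return shape are assumptions and not part of src/audio_io.py. The `wave` module handles uncompressed PCM and may reject other encodings, in which case a library such as soundfile is the better choice.

```python
import wave


def check_wav(path, expected_channels=4, expected_rate=44_100):
    """Return (problems, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate

    problems = []
    if channels != expected_channels:
        problems.append(f"expected {expected_channels} channels, found {channels}")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    return problems, duration
```

For example, `check_wav("data/raw/mixture.wav")` would return an empty problem list and a duration of roughly 21 seconds for a file matching the Audio Facts section.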

Acknowledgments

  • Demant for the case challenge and the recordings.
  • The open-source projects this work builds on: NumPy, SciPy, librosa, soundfile, PyTorch, torchaudio, OpenAI Whisper, Hugging Face Transformers, SpeechBrain, pyannote-audio, Streamlit, pyloudnorm, nara-wpe, PESQ, pystoi.
  • Academic prior work cited in docs/papers/reference-set-01/README.md.

Citation

If you reference this work, please use the metadata in CITATION.cff. A short BibTeX-style snippet:

@software{audio_explorers_2026,
  title  = {Audio Explorers 2026 — Multi-Talker 4-Channel Hearing-Aid Pipeline},
  author = {DataAthleteChamp},
  year   = {2026},
  url    = {https://github.com/DataAthleteChamp/Audio-Explorers-2026},
  license = {MIT}
}

License

Released under the MIT License.
