Demant Audio Explorers 2026 case challenge: analyse a 4-channel hearing-aid recording to identify, separate, and enhance a target talker from a multi-talker mixture.
Status / disclaimer. This repository is a one-shot submission for the Demant Audio Explorers 2026 case challenge, shared publicly for reference and learning. It is not a maintained product. Issues and PRs are welcome but responses are best-effort. See CONTRIBUTING.md for setup notes.
# Install dependencies
pip install -r requirements.txt
# Run the Streamlit app
streamlit run app.py
# Run tests
python3 -m pytest tests/ -v
├── app.py # Streamlit app (Demo + Experiment modes)
├── requirements.txt
├── CLAUDE.md # AI assistant instructions
├── src/
│ ├── array_geometry.py # Shared mic positions, steering utilities
│ ├── audio_io.py # Load/save multi-channel WAV files
│ ├── doa_estimation.py # GCC-PHAT, TDOA, spatial clustering, speaker counting
│ ├── target_enhancement.py # DAS & MVDR beamforming, two-stage enhancement, Wiener post-filter
│ ├── post_processing.py # Presence boost, compression, spectral subtraction, comfort noise, LUFS
│ ├── preprocessing.py # Highpass filter, resample, normalize, mono extraction
│ ├── transcription.py # Whisper ASR wrapper (lazy loading, language auto-detect)
│ ├── gender_classification.py # ECAPA-TDNN → wav2vec2 → F0 fallback gender classifier
│ ├── enhancement_evaluation.py # Structured method comparison harness
│ ├── quality_metrics.py # SI-SDR, segmental SNR, PESQ evaluation
│ ├── visualization.py # Spectrograms, polar plots, F0 contours
│ └── experiment.py # A/B preprocessing comparison framework
├── tests/ # 70 tests across 7 test files
├── data/raw/ # Input audio files (mixture.wav, example_mixture.wav)
├── output/ # Enhanced audio output
├── report/ # LaTeX report and figures
├── notebooks/ # Exploratory Jupyter notebooks
├── docs/ # Research notes and design decisions
└── scripts/ # Utility scripts
The Streamlit app (app.py) has two modes:
Demo Mode — Full pipeline:
- Load mixture.wav or upload a custom 4-channel WAV
- DOA estimation with GCC-PHAT spatial clustering → detects N speakers
- Beamforming (DAS or MVDR) toward each speaker direction
- Two intuitive sliders:
- 🔇 Background Suppression (0–1): controls mask sharpness, Wiener floor, spectral subtraction
- 🔊 Target Boost (0–1): controls presence EQ, compression, LUFS target
- Whisper transcription per direction
- Gender classification (ECAPA-TDNN → wav2vec2 → F0 fallback)
- Talker-of-interest reasoning
- Enhanced WAV download
Experiment Mode — A/B preprocessing comparison for evaluating different enhancement strategies side by side.
mixture.wav (44.1 kHz, 4ch)
→ Preprocessing (highpass, normalize)
→ DOA Estimation (GCC-PHAT, TDOA, spatial clustering)
→ Beamforming (DAS / MVDR)
→ Post-processing (Wiener filter, spectral subtraction, presence boost, compression, LUFS)
→ Whisper ASR
→ Gender Classification (ECAPA-TDNN → wav2vec2 → F0 fallback)
→ Talker-of-Interest Reasoning
→ Enhanced WAV
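
The DOA stage in the pipeline above uses GCC-PHAT to estimate time differences of arrival between microphone pairs. The following is a minimal sketch of that technique, assuming NumPy. The function name `gcc_phat` and its parameters are illustrative and are not taken from src/doa_estimation.py, which may differ in windowing, weighting, and peak interpolation.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref`, in seconds (GCC-PHAT).

    A positive result means `sig` arrives later than `ref`.
    Illustrative helper, not the repository's implementation.
    """
    n = len(sig) + len(ref)            # zero-pad to avoid circular wrap-around
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)

    cross = sig_f * np.conj(ref_f)     # cross-power spectrum
    cross /= np.abs(cross) + 1e-12     # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)      # generalized cross-correlation

    max_shift = n // 2
    if max_tau is not None:
        # Restrict the search to physically plausible lags (at least 1 sample).
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so the lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs
```

With the assumed ~17 cm inter-aural spacing, a `max_tau` of roughly 0.5 ms (0.17 m divided by an assumed 343 m/s speed of sound) limits the search to lags a real source could produce.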
70 tests across 7 files:
test_doa_estimation.py — GCC-PHAT, weighted GCC, DOA precision
test_target_enhancement.py — DAS, MVDR, enhance_target, numerical stability
test_speaker_estimation.py — Speaker count validation (4 speakers detected)
test_gender_classification.py — F0 + wav2vec2 hybrid, edge cases
test_quality_metrics.py — SI-SDR, segmental SNR
test_voice_isolator.py — Suppression/boost dial mapping, param passthrough
test_doa_printout.py — Integration test (requires data files)
python3 -m pytest tests/ -v
numpy · scipy · librosa · openai-whisper · torch · transformers · soundfile · matplotlib · streamlit · pyloudnorm · nara-wpe (optional)
See requirements.txt for pinned versions.
| Property | Value |
|---|---|
| Sample rate | 44.1 kHz |
| Channels | 4 |
| Duration | ~21 seconds |
| Channel order | left front, left rear, right front, right rear |
| Assumed BTE geometry | ~12 mm front-rear, ~17 cm inter-aural |
Input files live in data/raw/: mixture.wav (challenge input) and example_mixture.wav (development).
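
The sample rate and assumed geometry in the table bound the delays any microphone pair can observe. The short calculation below is a sketch of that arithmetic. The speed of sound (343 m/s) is an assumption, and the constant and function names are illustrative rather than taken from src/array_geometry.py.

```python
SPEED_OF_SOUND = 343.0   # m/s, assumed (air at about 20 degrees C)
SAMPLE_RATE = 44_100     # Hz, from the table
FRONT_REAR_M = 0.012     # ~12 mm, assumed geometry from the table
INTER_AURAL_M = 0.17     # ~17 cm, assumed geometry from the table


def max_tdoa_samples(spacing_m, fs=SAMPLE_RATE, c=SPEED_OF_SOUND):
    """Largest possible delay between two mics (sound along their axis), in samples."""
    return spacing_m / c * fs


front_rear = max_tdoa_samples(FRONT_REAR_M)     # about 1.5 samples
inter_aural = max_tdoa_samples(INTER_AURAL_M)   # about 21.9 samples
```

Under these assumptions the front-rear pair spans only about one and a half samples of delay, so integer-lag estimates on that pair are coarse and sub-sample interpolation is likely needed.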
For deeper technical notes see docs/.
- Single-side array, front/back ambiguity. A 4-mic BTE array on one side cannot fully resolve front-vs-back without extra cues; reported azimuths are most reliable in the lateral hemisphere.
- Assumed geometry. ~12 mm front-rear spacing, ~17 cm inter-aural distance. These are reasonable for hearing-aid form factors but were not measured for the specific device that produced the recording.
- Far-field plane-wave model. Beamforming and DOA assume sources are far enough that wavefronts are approximately planar at the array.
- Short clip (≈21 s). Statistical estimates (covariances, embeddings) are computed from limited data, so confidence in them is bounded.
- Gender classifier. Lower confidence than spatial and transcription evidence, especially in the 150–180 Hz F0 ambiguity zone. Treat as a hint, not ground truth.
- Whisper hallucinations. Short, noisy, or non-speech segments can produce spurious transcripts; cross-check against energy/voice-activity evidence.
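
The front/back ambiguity listed above follows directly from the far-field plane-wave model: for a single microphone pair, the delay depends only on the angle between the source and the pair's axis, so mirrored directions give the same delay. The sketch below illustrates this for a left-right pair. The azimuth convention (0° straight ahead, 90° to the right), the 343 m/s speed of sound, and the function name are assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed
INTER_AURAL_M = 0.17     # ~17 cm, assumed geometry


def pair_delay(azimuth_deg, spacing_m=INTER_AURAL_M, c=SPEED_OF_SOUND):
    """Far-field delay in seconds across a left-right microphone pair.

    Assumed convention: 0 deg = straight ahead, 90 deg = directly right.
    """
    return spacing_m * math.sin(math.radians(azimuth_deg)) / c


front = pair_delay(30.0)    # source in front, to the right
back = pair_delay(150.0)    # its mirror image behind the listener

# Both directions produce the same delay, so this pair cannot tell them apart.
same_delay = math.isclose(front, back, rel_tol=1e-9)
```

Separating the two cases requires an additional cue, such as a second pair on a different axis or level differences between microphones.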
The 4-channel mixture WAV files are provided as part of the Demant Audio Explorers 2026 challenge materials and are not redistributed by this repo. If you have access to the challenge package, place the files at:
data/raw/mixture.wav
data/raw/example_mixture.wav
You can also point the Streamlit app at any other 4-channel 44.1 kHz WAV via
its file uploader. A minimal synthetic test mixture is generated by
scripts/generate_synthetic_test.py and stored in data/synthetic/.
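
Before uploading a custom recording, it can help to confirm that its header matches the expected 4-channel, 44.1 kHz format. The sketch below uses only Python's standard-library `wave` module. The function name `check_wav_format` is hypothetical and not part of this repository, and `wave` reads PCM files only, so a floating-point WAV may raise `wave.Error` here even if other loaders accept it.

```python
import wave


def check_wav_format(path, expected_channels=4, expected_rate=44_100):
    """Return a list of problems found in the WAV header; empty means it matches."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        n_frames = wav.getnframes()

    if channels != expected_channels:
        problems.append(f"expected {expected_channels} channels, found {channels}")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    if n_frames == 0:
        problems.append("file contains no audio frames")
    return problems
```

For example, `check_wav_format("data/raw/mixture.wav")` returns an empty list when the file has four channels at 44.1 kHz and at least one frame.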
- Demant for the case challenge and the recordings.
- The open-source projects this work builds on: NumPy, SciPy, librosa, soundfile, PyTorch, torchaudio, OpenAI Whisper, Hugging Face Transformers, SpeechBrain, pyannote-audio, Streamlit, pyloudnorm, nara-wpe, PESQ, PyStoI.
- Academic prior work cited in docs/papers/reference-set-01/README.md.
If you reference this work, please use the metadata in CITATION.cff. A short BibTeX-style snippet:
@software{audio_explorers_2026,
title = {Audio Explorers 2026 — Multi-Talker 4-Channel Hearing-Aid Pipeline},
author = {DataAthleteChamp},
year = {2026},
url = {https://github.com/DataAthleteChamp/Audio-Explorers-2026},
license = {MIT}
}
Released under the MIT License.