Audio Explorers 2026

Demant Audio Explorers 2026 case challenge: analyse a 4-channel hearing-aid recording to identify, separate, and enhance a target talker from a multi-talker mixture.

Status / disclaimer. This repository is a one-shot submission for the Demant Audio Explorers 2026 case challenge, shared publicly for reference and learning. It is not a maintained product. Issues and PRs are welcome but responses are best-effort. See CONTRIBUTING.md for setup notes.

Quick Start

# Install dependencies
pip install -r requirements.txt

# Run the Streamlit app
streamlit run app.py

# Run tests
python3 -m pytest tests/ -v

Project Structure

├── app.py                          # Streamlit app (Demo + Experiment modes)
├── requirements.txt
├── CLAUDE.md                       # AI assistant instructions
├── src/
│   ├── array_geometry.py           # Shared mic positions, steering utilities
│   ├── audio_io.py                 # Load/save multi-channel WAV files
│   ├── doa_estimation.py           # GCC-PHAT, TDOA, spatial clustering, speaker counting
│   ├── target_enhancement.py       # DAS & MVDR beamforming, two-stage enhancement, Wiener post-filter
│   ├── post_processing.py          # Presence boost, compression, spectral subtraction, comfort noise, LUFS
│   ├── preprocessing.py            # Highpass filter, resample, normalize, mono extraction
│   ├── transcription.py            # Whisper ASR wrapper (lazy loading, language auto-detect)
│   ├── gender_classification.py    # ECAPA-TDNN → wav2vec2 → F0 fallback gender classifier
│   ├── enhancement_evaluation.py   # Structured method comparison harness
│   ├── quality_metrics.py          # SI-SDR, segmental SNR, PESQ evaluation
│   ├── visualization.py            # Spectrograms, polar plots, F0 contours
│   └── experiment.py               # A/B preprocessing comparison framework
├── tests/                          # 70 tests across 7 test files
├── data/raw/                       # Input audio files (mixture.wav, example_mixture.wav)
├── output/                         # Enhanced audio output
├── report/                         # LaTeX report and figures
├── notebooks/                      # Exploratory Jupyter notebooks
├── docs/                           # Research notes and design decisions
└── scripts/                        # Utility scripts

Application

The Streamlit app (app.py) has two modes:

Demo Mode — Full pipeline:

1. Load mixture.wav or upload a custom 4-channel WAV
2. DOA estimation with GCC-PHAT spatial clustering → detects N speakers
3. Beamforming (DAS or MVDR) toward each speaker direction
4. Two intuitive sliders:
   - 🔇 Background Suppression (0–1): controls mask sharpness, Wiener floor, spectral subtraction
   - 🔊 Target Boost (0–1): controls presence EQ, compression, LUFS target
5. Whisper transcription per direction
6. Gender classification (ECAPA-TDNN → wav2vec2 → F0 fallback)
7. Talker-of-interest reasoning
8. Enhanced WAV download

Experiment Mode — A/B preprocessing comparison for evaluating different enhancement strategies side by side.
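
To make the DOA step in Demo Mode more concrete, here is a minimal sketch of textbook GCC-PHAT delay estimation between two channels. It is not the code in src/doa_estimation.py; the function name `gcc_phat`, its signature, and the sign convention are assumptions chosen for illustration.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Delay of `sig` relative to `ref` in seconds (positive: `sig` lags `ref`)."""
    n = len(sig) + len(ref)                  # zero-pad so the correlation is linear
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        # Restrict the search to physically plausible lags.
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Put negative lags first so index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    return (int(np.argmax(np.abs(cc))) - max_shift) / fs
```

For the inter-aural pair, a `max_tau` slightly above 0.17 / 343 s (about 0.5 ms) would be a plausible search bound under the geometry assumed in this README. The result is quantised to whole samples; sub-sample interpolation around the peak is left out of this sketch.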

Pipeline Overview

mixture.wav (44.1 kHz, 4ch)
    → Preprocessing (highpass, normalize)
    → DOA Estimation (GCC-PHAT, TDOA, spatial clustering)
    → Beamforming (DAS / MVDR)
    → Post-processing (Wiener filter, spectral subtraction, presence boost, compression, LUFS)
    → Whisper ASR
    → Gender Classification (ECAPA-TDNN → wav2vec2 → F0 fallback)
    → Talker-of-Interest Reasoning
    → Enhanced WAV
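
The beamforming stage can be illustrated with a frequency-domain delay-and-sum (DAS) sketch under the far-field assumption. The mic coordinates, the azimuth convention (0° = front, positive toward the left ear), and the function names below are assumptions derived from the approximate geometry in Audio Facts; they are not taken from src/array_geometry.py or src/target_enhancement.py.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

# Hypothetical 2-D mic positions in metres (x = forward, y = toward the left ear),
# using ~12 mm front-rear spacing and ~17 cm inter-aural distance.
MIC_XY = np.array([
    [0.006, 0.085],    # left front
    [-0.006, 0.085],   # left rear
    [0.006, -0.085],   # right front
    [-0.006, -0.085],  # right rear
])


def far_field_delays(azimuth_deg, mic_xy=MIC_XY, c=SPEED_OF_SOUND):
    """Plane-wave arrival time (s) at each mic, relative to the array origin."""
    az = np.deg2rad(azimuth_deg)
    direction = np.array([np.cos(az), np.sin(az)])  # unit vector toward the source
    # Mics with a larger projection onto the source direction hear the wave earlier.
    return -(mic_xy @ direction) / c


def delay_and_sum(x, fs, azimuth_deg):
    """Steer x (shape: n_samples x n_mics) toward azimuth_deg; returns a mono signal."""
    n = x.shape[0]
    delays = far_field_delays(azimuth_deg)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum = np.fft.rfft(x, axis=0)
    # Undo each mic's delay with a linear phase shift, then average the channels.
    # FFT-based shifting is circular, which is acceptable for a short sketch.
    phase = np.exp(2j * np.pi * freqs[:, None] * delays[None, :])
    return np.fft.irfft((spectrum * phase).mean(axis=1), n=n)
```

DAS only aligns and averages the channels, so its interference rejection is limited at this array size; MVDR additionally uses a noise covariance estimate, which this sketch does not attempt.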

Testing

70 tests across 7 files:

test_doa_estimation.py        — GCC-PHAT, weighted GCC, DOA precision
test_target_enhancement.py    — DAS, MVDR, enhance_target, numerical stability
test_speaker_estimation.py    — Speaker count validation (4 speakers detected)
test_gender_classification.py — F0 + wav2vec2 hybrid, edge cases
test_quality_metrics.py       — SI-SDR, segmental SNR
test_voice_isolator.py        — Suppression/boost dial mapping, param passthrough
test_doa_printout.py          — Integration test (requires data files)

python3 -m pytest tests/ -v
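
For readers unfamiliar with the SI-SDR metric exercised by test_quality_metrics.py, the following is a sketch of the standard scale-invariant SDR definition. The function name and the `eps` guard are assumptions; the implementation in src/quality_metrics.py may differ in detail.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length 1-D signals."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Remove DC so a constant offset does not count as signal or distortion.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the optimally scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))
```

Because the target is rescaled by projection, multiplying the estimate by any non-zero gain leaves the score unchanged, which is why this metric suits pipelines that apply loudness normalisation at the end.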

Key Dependencies

numpy · scipy · librosa · openai-whisper · torch · transformers · soundfile · matplotlib · streamlit · pyloudnorm · nara-wpe (optional)

See requirements.txt for pinned versions.

Audio Facts

| Property | Value |
|---|---|
| Sample rate | 44.1 kHz |
| Channels | 4 |
| Duration | ~21 seconds |
| Channel order | left front, left rear, right front, right rear |
| Assumed BTE geometry | ~12 mm front-rear, ~17 cm inter-aural |

Input files live in data/raw/: mixture.wav (challenge input) and example_mixture.wav (development).
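
As a worked example of what the assumed geometry implies, the short calculation below converts the two mic spacings into the largest possible inter-channel delay in samples at 44.1 kHz. The speed of sound (343 m/s) and the straight-line path between the ears are simplifying assumptions; head shadow and diffraction are ignored.

```python
SPEED_OF_SOUND = 343.0   # m/s at roughly 20 °C (assumed)
FS = 44_100              # Hz, sample rate of the recording


def max_tdoa_samples(spacing_m, fs=FS, c=SPEED_OF_SOUND):
    """Largest possible delay between two mics (endfire arrival), in samples."""
    return spacing_m / c * fs


front_rear = max_tdoa_samples(0.012)   # ~12 mm front-rear spacing
inter_aural = max_tdoa_samples(0.17)   # ~17 cm inter-aural distance

print(f"front-rear : {front_rear:.2f} samples")   # 1.54
print(f"inter-aural: {inter_aural:.2f} samples")  # 21.86
```

The front-rear pair spans less than two samples end to end, so whole-sample delay estimates carry little directional information for that pair, while the inter-aural pair offers roughly ±22 samples of usable range.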

For deeper technical notes see docs/.

Limitations & Assumptions

Front/back ambiguity. The array is bilateral (two mics per ear, see Audio Facts), and with only ~12 mm front-rear spacing it cannot fully resolve front-vs-back without extra cues; reported azimuths are most reliable in the lateral hemisphere.
Assumed geometry. ~12 mm front-rear spacing, ~17 cm inter-aural distance. These are reasonable for hearing-aid form factors but were not measured for the specific device that produced the recording.
Far-field plane-wave model. Beamforming and DOA assume sources are far enough that wavefronts are approximately planar at the array.
Short clip (≈21 s). Statistical estimates (covariances, embeddings) are computed from limited data, so confidence is bounded.
Gender classifier. Lower confidence than spatial and transcription evidence, especially in the 150–180 Hz F0 ambiguity zone. Treat as a hint, not ground truth.
Whisper hallucinations. Short, noisy, or non-speech segments can produce spurious transcripts; cross-check against energy/voice-activity evidence.
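
The cross-check suggested for Whisper hallucinations could look like the energy gate below: a transcript segment is trusted only if enough frames inside its time span carry signal energy. This is an illustrative sketch, not code from the repository; the function names, the -40 dBFS threshold, and the 30 % activity ratio are hypothetical values that would need tuning.

```python
import numpy as np


def frame_energy_db(x, fs, frame_s=0.02):
    """Per-frame RMS level in dBFS for a mono signal scaled to [-1, 1]."""
    frame = max(1, int(fs * frame_s))
    n_frames = len(x) // frame
    frames = np.asarray(x[: n_frames * frame], dtype=float).reshape(n_frames, frame)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return 20.0 * np.log10(rms + 1e-12)


def segment_is_active(x, fs, start_s, end_s, threshold_db=-40.0, min_active_ratio=0.3):
    """True if enough frames in [start_s, end_s) exceed the energy threshold."""
    segment = x[int(start_s * fs): int(end_s * fs)]
    levels = frame_energy_db(segment, fs)
    if levels.size == 0:
        return False  # segment shorter than one frame: nothing to support it
    return bool(np.mean(levels > threshold_db) >= min_active_ratio)
```

With openai-whisper, each entry in the `segments` list of the transcription result carries `start` and `end` times in seconds, so a segment could be passed through `segment_is_active` and flagged when it returns False.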

How to Obtain the Input Audio

The 4-channel mixture WAV files are provided as part of the Demant Audio Explorers 2026 challenge materials and are not redistributed by this repo. If you have access to the challenge package, place the files at:

data/raw/mixture.wav
data/raw/example_mixture.wav

You can also point the Streamlit app at any other 4-channel 44.1 kHz WAV via its file uploader. A minimal synthetic test mixture is generated by scripts/generate_synthetic_test.py and stored in data/synthetic/.
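
Before pointing the app at a custom file, it can help to confirm that the file matches the expected layout. The sketch below uses only the Python standard library; the helper name `check_mixture_format` is hypothetical and not part of this repository. Depending on the Python version, the stdlib `wave` module may reject float or other non-PCM WAV files, so such files would need a library like soundfile instead.

```python
import wave


def check_mixture_format(path, expected_channels=4, expected_rate=44_100):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        rate = wf.getframerate()
    if channels != expected_channels:
        problems.append(f"expected {expected_channels} channels, found {channels}")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    return problems
```

For example, `check_mixture_format("data/raw/mixture.wav")` should return an empty list for the challenge input if the file is stored as PCM.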

Acknowledgments

Demant for the case challenge and the recordings.
The open-source projects this work builds on: NumPy, SciPy, librosa, soundfile, PyTorch, torchaudio, OpenAI Whisper, Hugging Face Transformers, SpeechBrain, pyannote.audio, Streamlit, pyloudnorm, nara-wpe, PESQ, pystoi.
Academic prior work cited in docs/papers/reference-set-01/README.md.

Citation

If you reference this work, please use the metadata in CITATION.cff. A short BibTeX-style snippet:

@software{audio_explorers_2026,
  title  = {Audio Explorers 2026 — Multi-Talker 4-Channel Hearing-Aid Pipeline},
  author = {DataAthleteChamp},
  year   = {2026},
  url    = {https://github.com/DataAthleteChamp/Audio-Explorers-2026},
  license = {MIT}
}

License

Released under the MIT License.
