# MultiChannel_RIR_Generation

Source code for "Development of a Microphone Array Room Impulse Response Dataset for Evaluating Multichannel Acoustic Generation (多チャンネル音響生成を評価するためのマイクロフォンアレイ室内インパルス応答データセットの構築)".
This source code is based on Learning Neural Acoustic Fields.

A neural acoustic field learns from a Room Impulse Response (RIR) dataset and can then estimate the RIR at arbitrary positions. We extended Neural Acoustic Fields so that they can learn from multichannel RIRs.
In addition, datasets can be constructed through Pyroomacoustics simulations, and multichannel RIR estimation can be evaluated by the accuracy of Direction of Arrival (DoA) estimation, i.e., estimating the direction of the sound source.

The source code in this repository has been verified to work on Google Colaboratory.

## Requirements (in addition to the usual Python stack)

- PyTorch 1.9 (1.10 should also work)
- Pyroomacoustics 0.7.3
- h5py

## Project structure

- `model`: the extended NAF network
- `model_pipeline`: training, testing, and evaluation of the network
- `preprocess`: preprocessing of the RIR dataset for the network
- `simulation`: Pyroomacoustics simulation for creating the dataset

## Dataset

### Simulation Data

- Create a simulation RIR dataset with Pyroomacoustics:

  ```shell
  python ./simulation/simulation.py
  ```

  Modify the simulation environment by adjusting the following parameters inside `./simulation/simulation.py`:

  ```python
  # The number of x-coordinates for placing the speakers and microphone arrays
  position_num_x = 13
  # The number of y-coordinates for placing the speakers and microphone arrays
  position_num_y = 13
  # The z-coordinate of the speakers and microphone arrays [m]
  position_z = 1.35
  # The spacing between microphone arrays [m]
  blank_space = 0.5
  # The radius of the circular microphone array [m]
  mic_radius = 0.1
  # The number of channels in the microphone array
  mic_num = 4
  # The microphone directivity flag (should be set to False)
  mic_directivity_flg = False
  # The path to record the placement coordinates
  points_path = "./wav_data/points.txt"
  # The path to record the maximum and minimum values of the placement coordinates
  minmax_path = "./minmax/minmax.pkl"
  # The path to record the simulated RIRs
  results_dir = "./wav_data/raw/"
  # Reverberation time
  rt60 = 0.5  # seconds
  # Room dimensions; a two-element list gives a two-dimensional room
  room_dim = [7.0, 6.4, 2.7]  # meters
  sampling_rate = 48000  # Hz
  ```
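For orientation, the sketch below shows one plausible reading of these parameters (it is not taken from `simulation.py`): a grid of placement positions spaced `blank_space` apart and centered in the room, with a circular `mic_num`-channel array around each grid point. All derived names (`xs`, `ys`, `grid`, `offsets`) are illustrative.

```python
import numpy as np

# Parameters as in the listing above
room_dim = [7.0, 6.4, 2.7]
position_num_x, position_num_y = 13, 13
position_z = 1.35
blank_space = 0.5
mic_radius = 0.1
mic_num = 4

# Grid of placement positions spaced `blank_space` apart, centered in the room
# (assumed layout; the actual script may place positions differently)
xs = room_dim[0] / 2 + blank_space * (np.arange(position_num_x) - (position_num_x - 1) / 2)
ys = room_dim[1] / 2 + blank_space * (np.arange(position_num_y) - (position_num_y - 1) / 2)
grid = [(x, y, position_z) for x in xs for y in ys]

# Channel offsets of a circular `mic_num`-channel array around each grid point
angles = 2 * np.pi * np.arange(mic_num) / mic_num
offsets = np.stack([mic_radius * np.cos(angles),
                    mic_radius * np.sin(angles),
                    np.zeros(mic_num)], axis=1)

print(len(grid), offsets.shape)  # 169 placement positions, (4, 3) offsets
```

With the default values this yields 13 × 13 = 169 candidate positions spanning 6 m in x (0.5 m to 6.5 m), which fits inside the 7.0 m room dimension.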

### Real Data

- Download the real RIR dataset from Google Drive:

  ```shell
  gdown https://drive.google.com/uc?id=1ed4MeDcsWhquXO_a3mNQCJ5i21TR8JpQ
  unzip real_wav_data.zip
  ```

## Usage

### Preprocess

- Split the dataset into training and test sets:

  ```shell
  python ./preprocess/make_train_test_split.py
  ```

- Preprocess the RIR waveform data to match the format of the network output (e.g., by converting it into spectrograms):

  ```shell
  python ./preprocess/make_data.py
  ```
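The exact preprocessing lives in `make_data.py`; as a rough illustration of the spectrogram-conversion step, a log-magnitude STFT of a single-channel RIR can be computed with plain NumPy. The function name and framing parameters below are illustrative, not the script's actual values.

```python
import numpy as np

def stft_log_magnitude(rir, n_fft=512, hop=128):
    """Log-magnitude STFT of a single-channel RIR (illustrative sketch)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(rir) - n_fft) // hop
    # Slice the waveform into overlapping windowed frames
    frames = np.stack([rir[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Real FFT of each frame, then a floored log magnitude
    spec = np.fft.rfft(frames, n=n_fft, axis=1)
    return np.log(np.abs(spec) + 1e-8)

rir = np.random.randn(4096)          # stand-in for a loaded RIR waveform
S = stft_log_magnitude(rir)
print(S.shape)                       # (29, 257): frames x frequency bins
```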
      

### Train

- Train the network.

  For more details on the options, refer to `./model_pipeline/options.py`.
  The trained model is saved in the directory specified by the `save_loc` and `exp_name` options.

  ```shell
  python ./model_pipeline/train/train.py --exp_name sim_data_exp --epochs 300 --phase_alpha 3.0 --dir_ch 4
  ```
      

### Test

- Perform inference on the test data.

  The options must be the same as those used in training.
  The inference output is saved in the directory specified by the `--save_loc` and `--inference_loc` options.

  ```shell
  python ./model_pipeline/test/test.py --exp_name sim_data_exp --epochs 300 --phase_alpha 3.0 --dir_ch 4
  ```

- Perform inference on the training data:

  ```shell
  python ./model_pipeline/test/test_train_data.py --exp_name sim_data_exp --epochs 300 --phase_alpha 3.0 --dir_ch 4
  ```
      

### Evaluation

- Compute the spectral loss from the inference results on the test data.

  The options must be the same as those used in training.
  The results are printed to standard output.

  ```shell
  python ./model_pipeline/evaluation/compute_spectral_loss.py --exp_name sim_data_exp --epochs 300 --phase_alpha 3.0 --dir_ch 4
  ```

- Compute the T60 error from the inference results on the test data.

  The options must be the same as those used in training.
  The results are printed to standard output.

  ```shell
  python ./model_pipeline/evaluation/compute_T60_err.py --exp_name sim_data_exp --epochs 300 --phase_alpha 3.0 --dir_ch 4
  ```
      
- Compute the DoA error from the inference results on the test data.

  Do not specify any options.
  The results are printed to standard output.

  ```shell
  python ./model_pipeline/evaluation/compute_DoA_err.py
  ```

  Modify `./model_pipeline/evaluation/compute_DoA_err.py` directly to change the parameters:

  ```python
  # The radius of the circular microphone array [m]
  mic_radius = 0.1
  # The number of channels in the microphone array
  mic_num = 4
  # Sampling rate [Hz]
  fs = 22050
  # The number of FFT points used in each STFT window
  n_fft = 512
  # The path to load the placement coordinates
  points_path = "./wav_data/points.txt"
  # The path to record the DoA results
  write_path = "./DoA.pkl"
  ```
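Once estimated and ground-truth azimuths are available, a DoA error is naturally measured as the wrapped angular difference, so that 350° vs. 10° counts as a 20° error rather than 340°. A minimal sketch (the helper below is illustrative, not part of the repository):

```python
def doa_error_deg(est_deg, true_deg):
    """Smallest absolute angular difference between two azimuths, in degrees."""
    # Shift the raw difference into (-180, 180], then take the magnitude
    diff = (est_deg - true_deg + 180.0) % 360.0 - 180.0
    return abs(diff)

print(doa_error_deg(350.0, 10.0))   # 20.0
print(doa_error_deg(90.0, 100.0))   # 10.0
```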

## Citation

```bibtex
@article{加藤 雅大2024,
  title={多チャンネル音響生成を評価するためのマイクロフォンアレイ室内インパルス応答データセットの構築},
  author={加藤 雅大 and 小島 諒介},
  journal={人工知能学会研究会資料 人工知能基本問題研究会},
  volume={128},
  pages={40--45},
  year={2024},
  doi={10.11517/jsaifpai.128.0_40}
}
```
