# MultiChannel_RIR_Generation

Source code for "Development of a Microphone Array Room Impulse Response Dataset for Evaluating Multichannel Acoustic Generation (多チャンネル音響生成を評価するためのマイクロフォンアレイ室内インパルス応答データセットの構築)".
This source code is based on Learning Neural Acoustic Fields.

A neural acoustic field learns from a Room Impulse Response (RIR) dataset and can then estimate the RIR at arbitrary positions. We extended Neural Acoustic Fields so that they can learn from multichannel RIRs.
In addition, datasets can be constructed through Pyroomacoustics simulations, and multichannel RIR estimation can be evaluated by the accuracy of Direction of Arrival (DoA) estimation, i.e., estimating the direction of the sound source.

The source code in this repository has been verified to work on Google Colaboratory.

## Requirements (in addition to the usual Python stack)

- PyTorch 1.9 (1.10 should also work)
- Pyroomacoustics 0.7.3
- h5py

## Project structure

- `model`: the extended NAF network
- `model_pipeline`: training, testing, and evaluation of the network
- `preprocess`: preprocessing of the RIR dataset for the network
- `simulation`: Pyroomacoustics simulation for creating the dataset

## Dataset

### Simulation Data

- Create a simulation RIR dataset with Pyroomacoustics:

  ```shell
  python ./simulation/simulation.py
  ```

  Modify the simulation environment by adjusting the following parameters inside `./simulation/simulation.py`:

  ```python
  # The number of x-coordinates for placing the speakers and microphone arrays
  position_num_x = 13
  # The number of y-coordinates for placing the speakers and microphone arrays
  position_num_y = 13
  # The z-coordinate of the speakers and microphone arrays [m]
  position_z = 1.35
  # The spacing between microphone arrays [m]
  blank_space = 0.5
  # The radius of the circular microphone array [m]
  mic_radius = 0.1
  # The number of channels in the microphone array
  mic_num = 4
  # The microphone directivity flag (should be set to False)
  mic_directivity_flg = False
  # The path to record the placement coordinates
  points_path = "./wav_data/points.txt"
  # The path to record the maximum and minimum values of the placement coordinates
  minmax_path = "./minmax/minmax.pkl"
  # The path to record the simulated RIRs
  results_dir = "./wav_data/raw/"
  # Reverberation time
  rt60 = 0.5  # seconds
  # Room dimensions; a two-element list gives a two-dimensional room
  room_dim = [7.0, 6.4, 2.7]  # meters
  sampling_rate = 48000  # Hz
  ```
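For orientation, the sketch below shows one plausible reading of these parameters (it is not taken from `simulation.py`): a grid of placement positions spaced `blank_space` apart and centered in the room, with a circular `mic_num`-channel array around each grid point. All derived names (`xs`, `ys`, `grid`, `offsets`) are illustrative.

```python
import numpy as np

# Parameters as in the listing above
room_dim = [7.0, 6.4, 2.7]
position_num_x, position_num_y = 13, 13
position_z = 1.35
blank_space = 0.5
mic_radius = 0.1
mic_num = 4

# Grid of placement positions spaced `blank_space` apart, centered in the room
# (assumed layout; the actual script may place positions differently)
xs = room_dim[0] / 2 + blank_space * (np.arange(position_num_x) - (position_num_x - 1) / 2)
ys = room_dim[1] / 2 + blank_space * (np.arange(position_num_y) - (position_num_y - 1) / 2)
grid = [(x, y, position_z) for x in xs for y in ys]

# Channel offsets of a circular `mic_num`-channel array around each grid point
angles = 2 * np.pi * np.arange(mic_num) / mic_num
offsets = np.stack([mic_radius * np.cos(angles),
                    mic_radius * np.sin(angles),
                    np.zeros(mic_num)], axis=1)

print(len(grid), offsets.shape)  # 169 placement positions, (4, 3) offsets
```

With the default values this yields 13 × 13 = 169 candidate positions spanning 6 m in x (0.5 m to 6.5 m), which fits inside the 7.0 m room dimension.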

### Real Data

- Download the real RIR dataset from Google Drive:

  ```shell
  gdown https://drive.google.com/uc?id=1ed4MeDcsWhquXO_a3mNQCJ5i21TR8JpQ
  unzip real_wav_data.zip
  ```

## Usage

### Preprocess

- Split the dataset into training and test sets:

  ```shell
  python ./preprocess/make_train_test_split.py
  ```

- Preprocess the RIR waveform data to match the format of the network output (e.g., by converting it into spectrograms):

  ```shell
  python ./preprocess/make_data.py
  ```
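The exact preprocessing lives in `make_data.py`; as a rough illustration of the spectrogram-conversion step, a log-magnitude STFT of a single-channel RIR can be computed with plain NumPy. The function name and framing parameters below are illustrative, not the script's actual values.

```python
import numpy as np

def stft_log_magnitude(rir, n_fft=512, hop=128):
    """Log-magnitude STFT of a single-channel RIR (illustrative sketch)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(rir) - n_fft) // hop
    # Slice the waveform into overlapping windowed frames
    frames = np.stack([rir[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Real FFT of each frame, then a floored log magnitude
    spec = np.fft.rfft(frames, n=n_fft, axis=1)
    return np.log(np.abs(spec) + 1e-8)

rir = np.random.randn(4096)          # stand-in for a loaded RIR waveform
S = stft_log_magnitude(rir)
print(S.shape)                       # (29, 257): frames x frequency bins
```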
      

### Train

- Train the network.

  For more details on the options, refer to `./model_pipeline/options.py`.
  The trained model is saved in the directory specified by the `save_loc` and `exp_name` options.

  ```shell
  python ./model_pipeline/train/train.py --exp_name sim_data_exp --epochs 300 --phase_alpha 3.0 --dir_ch 4
  ```
      

### Test

- Perform inference on the test data.

  The options must be the same as those used in training.
  The inference output is saved in the directory specified by the `--save_loc` and `--inference_loc` options.

  ```shell
  python ./model_pipeline/test/test.py --exp_name sim_data_exp --epochs 300 --phase_alpha 3.0 --dir_ch 4
  ```

- Perform inference on the training data:

  ```shell
  python ./model_pipeline/test/test_train_data.py --exp_name sim_data_exp --epochs 300 --phase_alpha 3.0 --dir_ch 4
  ```
      

### Evaluation

- Compute the spectral loss from the inference results on the test data.

  The options must be the same as those used in training.
  The results are printed to standard output.

  ```shell
  python ./model_pipeline/evaluation/compute_spectral_loss.py --exp_name sim_data_exp --epochs 300 --phase_alpha 3.0 --dir_ch 4
  ```

- Compute the T60 error from the inference results on the test data.

  The options must be the same as those used in training.
  The results are printed to standard output.

  ```shell
  python ./model_pipeline/evaluation/compute_T60_err.py --exp_name sim_data_exp --epochs 300 --phase_alpha 3.0 --dir_ch 4
  ```
      
- Compute the DoA error from the inference results on the test data.

  Do not specify any options.
  The results are printed to standard output.

  ```shell
  python ./model_pipeline/evaluation/compute_DoA_err.py
  ```

  Modify `./model_pipeline/evaluation/compute_DoA_err.py` directly to change the parameters:

  ```python
  # The radius of the circular microphone array [m]
  mic_radius = 0.1
  # The number of channels in the microphone array
  mic_num = 4
  # Sampling rate [Hz]
  fs = 22050
  # The number of FFT points used in each STFT window
  n_fft = 512
  # The path to load the placement coordinates
  points_path = "./wav_data/points.txt"
  # The path to record the DoA results
  write_path = "./DoA.pkl"
  ```
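Once estimated and ground-truth azimuths are available, a DoA error is naturally measured as the wrapped angular difference, so that 350° vs. 10° counts as a 20° error rather than 340°. A minimal sketch (the helper below is illustrative, not part of the repository):

```python
def doa_error_deg(est_deg, true_deg):
    """Smallest absolute angular difference between two azimuths, in degrees."""
    # Shift the raw difference into (-180, 180], then take the magnitude
    diff = (est_deg - true_deg + 180.0) % 360.0 - 180.0
    return abs(diff)

print(doa_error_deg(350.0, 10.0))   # 20.0
print(doa_error_deg(90.0, 100.0))   # 10.0
```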

## Citation

```bibtex
@article{加藤 雅大2024,
  title={多チャンネル音響生成を評価するためのマイクロフォンアレイ室内インパルス応答データセットの構築},
  author={加藤 雅大 and 小島 諒介},
  journal={人工知能学会研究会資料 人工知能基本問題研究会},
  volume={128},
  pages={40--45},
  year={2024},
  doi={10.11517/jsaifpai.128.0_40}
}
```
