Is your feature request related to a problem? Please describe.
Add a VAD module for downstream tasks like Speech Recognition.
Describe the solution you'd like
The output should be probability estimates of speech vs no speech. Or just binary. 1 indicates speech 0 indicates no speech.
0.1 0.1 01. 0.5 0.6 0.7 0.9 1.0 1.0 1.0 0.4
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Some VAD systems process MFCC as images with CNN. Not sure how the module should be designed in that case.
Is your feature request related to a problem? Please describe.
Add a VAD module for downstream tasks like Speech Recognition.
Describe the solution you'd like
The output should be probability estimates of speech vs no speech. Or just binary. 1 indicates speech 0 indicates no speech.
0.1 0.1 01. 0.5 0.6 0.7 0.9 1.0 1.0 1.0 0.4
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Some VAD systems process MFCC as images with CNN. Not sure how the module should be designed in that case.