Open
Labels: question (Further information is requested)
Description
I'm wondering how you achieve decoding 10 s of audio in 70 ms, as stated: "The SenseVoice-Small model utilizes a non-autoregressive end-to-end framework, leading to exceptionally low inference latency. It requires only 70ms to process 10 seconds of audio, which is 15 times faster than Whisper-Large."

In my test, decoding 5 s of audio on a GPU took 200 ms.

I'm not using ONNX or quantization. Is that why my run is so much slower than the figure you report?
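To compare the two numbers directly, here is a rough sketch that expresses both as a real-time factor (RTF = processing time / audio duration). It also includes a generic timing helper with untimed warm-up runs, since the first GPU calls often include one-off setup cost. `decode` is a hypothetical zero-argument callable wrapping a single inference call, not a SenseVoice API. On GPU it should block until results are ready (for example by calling `torch.cuda.synchronize()` inside it), otherwise the measured time may be too low.

```python
import time
from typing import Callable


def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF = processing time / audio duration; lower means faster."""
    return processing_s / audio_s


def time_decode(decode: Callable[[], object], warmup: int = 3, runs: int = 10) -> float:
    """Average wall-clock seconds per call, after `warmup` untimed calls.

    `decode` is a hypothetical callable wrapping one inference call. On GPU it
    should synchronize before returning so the timing covers the actual work.
    """
    for _ in range(warmup):
        decode()
    start = time.perf_counter()
    for _ in range(runs):
        decode()
    return (time.perf_counter() - start) / runs


claimed = real_time_factor(0.070, 10.0)   # 70 ms for 10 s, from the quoted claim
observed = real_time_factor(0.200, 5.0)   # 200 ms for 5 s, my measurement
print(f"claimed RTF:  {claimed:.3f}")
print(f"observed RTF: {observed:.3f}")
print(f"gap: {observed / claimed:.1f}x")
```

With these figures, my run's RTF (0.040) is roughly 5.7 times the claimed one (0.007). That gap seems larger than warm-up or timing noise alone would explain.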