decoding speed #280

@housebaby

I wonder how to decode 10 s of audio in 70 ms, as you mention in: "The SenseVoice-Small model utilizes a non-autoregressive end-to-end framework, leading to exceptionally low inference latency. It requires only 70ms to process 10 seconds of audio, which is 15 times faster than Whisper-Large."

It took me 200 ms to decode 5 s of audio on a GPU.
However, I am not using ONNX or quantization. Is that why my decoding is more time-consuming than the figure you state?
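
To make the comparison concrete, here is a minimal sketch of how the two figures relate as real-time factors (processing time divided by audio duration), plus a small timing helper. It assumes nothing about the SenseVoice API: `real_time_factor` and `median_latency` are hypothetical names, and `run` stands for whatever zero-argument callable wraps the decode call. The warm-up and synchronization steps are there because the first GPU calls and asynchronous kernel launches can distort a single timed run. Whether that explains any part of my 200 ms is only a guess.

```python
import time
from statistics import median
from typing import Callable, Optional


def real_time_factor(latency_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration (lower is faster)."""
    return latency_s / audio_s


def median_latency(run: Callable[[], object],
                   sync: Optional[Callable[[], None]] = None,
                   warmup: int = 3,
                   iters: int = 10) -> float:
    """Median wall-clock seconds per call of `run`, after warm-up calls.

    `run` and `sync` are placeholders (assumptions): `run` wraps one decode,
    `sync` blocks until the device has finished all queued work.
    """
    for _ in range(warmup):
        run()  # warm-up calls are not timed
    if sync is not None:
        sync()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run()
        if sync is not None:
            sync()  # wait for queued GPU work before stopping the clock
        samples.append(time.perf_counter() - start)
    return median(samples)


claimed = real_time_factor(0.070, 10.0)   # quoted figure: 70 ms for 10 s
observed = real_time_factor(0.200, 5.0)   # my run: 200 ms for 5 s
slowdown = observed / claimed             # how far my run is from the claim
```

With these numbers the quoted real-time factor is 0.007 and mine is 0.04, so my run is roughly 5.7 times slower per second of audio. With PyTorch on a GPU, `sync=torch.cuda.synchronize` would be one way to fill in the synchronization hook.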


Labels: question (Further information is requested)
