STT Failure: "No Voice detected" when using Wyoming Whisper with Sherpa-ONNX #64

@pvossel

Spoken commands fail with the error "No Voice detected" (Home Assistant logs show stt-no-text-recognized) when the Wyoming STT engine is configured to use the Sherpa-ONNX library (e.g., with Parakeet-TDT models).

The wake word is detected reliably, and VAD (Voice Activity Detection) start and end events appear in the logs as expected, but no text is transcribed.

Technical Details:
App Version: Ava v0.4.5 (knoop7 fork)
STT Engine: rhasspy/wyoming-whisper:latest
Library: sherpa (specifically sherpa-onnx-nemo-parakeet-tdt-0.6b-v3-int8)

Behavior:
Switching back to the standard faster-whisper library with the medium-int8 model resolves the issue immediately.

The brownard/Ava fork works correctly with the Sherpa library, which suggests an audio encoding or streaming protocol mismatch in the knoop7 audio pipeline when it communicates with Sherpa-based Wyoming servers.

- type: stt-vad-start
  timestamp: "2026-04-08T07:45:41.956267+00:00"
- type: stt-vad-end
  timestamp: "2026-04-08T07:45:42.835689+00:00"
- type: error
  data:
    code: stt-no-text-recognized
    message: No text recognized

It seems that the audio stream format or chunking produced by this fork is not compatible with the Sherpa library, which appears to be stricter about its input than faster-whisper.
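For anyone trying to narrow this down, here is a minimal sketch of how a captured recording from each fork could be compared. It assumes the server expects 16 kHz, 16-bit, mono PCM, which is my assumption and not something confirmed in this issue. The helper names (`describe_wav`, `format_mismatches`, `split_chunks`, `make_silence_wav`) are hypothetical and only use the Python standard library; they do not talk to the Wyoming server.

```python
import io
import wave

# ASSUMPTION: target format is 16 kHz, 16-bit, mono PCM (not confirmed by the issue).
EXPECTED = {"rate": 16000, "width": 2, "channels": 1}


def describe_wav(wav_bytes: bytes) -> dict:
    """Read the header of an in-memory WAV capture and return its audio parameters."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return {
            "rate": wav.getframerate(),
            "width": wav.getsampwidth(),
            "channels": wav.getnchannels(),
            "frames": wav.getnframes(),
        }


def format_mismatches(info: dict, expected: dict = EXPECTED) -> list:
    """List every parameter that differs from the assumed target format."""
    return [
        f"{key}: got {info[key]}, expected {value}"
        for key, value in expected.items()
        if info[key] != value
    ]


def split_chunks(pcm: bytes, samples_per_chunk: int, width: int = 2, channels: int = 1) -> list:
    """Split raw PCM into fixed-size chunks; the last chunk may be shorter."""
    step = samples_per_chunk * width * channels
    return [pcm[i:i + step] for i in range(0, len(pcm), step)]


def make_silence_wav(rate: int, width: int, channels: int, seconds: float) -> bytes:
    """Build a silent WAV in memory, used here only to exercise the helpers."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (int(rate * seconds) * width * channels))
    return buf.getvalue()


if __name__ == "__main__":
    capture = make_silence_wav(48000, 2, 2, 0.1)  # stand-in for a real capture
    for problem in format_mismatches(describe_wav(capture)):
        print(problem)
```

Running the same checks on audio captured from the knoop7 fork and from the brownard fork would show whether the sample rate, sample width, channel count, or chunk sizes differ between the two, which might help confirm or rule out the format theory.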
