Feat/realtime voice vad p2 by ailuckly · Pull Request #20 · ailuckly/VocaTa

ailuckly · 2026-04-19T07:03:59Z

No description provided.

Two fixes for false VAD triggers caused by ambient noise:

1. SPEECH_THRESHOLD: 0.02 → 0.03, MIN_SPEECH_FRAMES: 5 → 8 (~1 s). A higher bar for what counts as "speech" filters out ambient noise.
2. Empty-pipeline loop protection: when handleProcessComplete fires without any TTS having played (most likely an empty pipeline triggered by noise), delay resumeRecording by 2 s instead of 300 ms. This prevents the rapid loop: noise → empty STT → complete → resume → noise → repeat.
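A minimal sketch of the two tuning changes, assuming each frame arrives as a Float32Array of PCM samples in [-1, 1] and that loudness is measured as RMS. The constant values (0.03, 8 frames, 300 ms, 2 s) come from the commit message; the helper names (frameRms, SpeechOnsetDetector, resumeDelayMs) are hypothetical, and requiring the 8 loud frames to be consecutive is an assumption, not necessarily what AudioManager does.

```javascript
// Values from the commit message; everything else is illustrative.
const SPEECH_THRESHOLD = 0.03;           // was 0.02
const MIN_SPEECH_FRAMES = 8;             // was 5 (~1 s of audio)
const NORMAL_RESUME_DELAY_MS = 300;
const EMPTY_PIPELINE_RESUME_DELAY_MS = 2000;

// Root-mean-square level of one PCM frame (samples assumed in [-1, 1]).
function frameRms(samples) {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  return Math.sqrt(sumSquares / samples.length);
}

// Hypothetical onset detector: speech "starts" only after MIN_SPEECH_FRAMES
// consecutive frames above SPEECH_THRESHOLD, so short noise bursts are ignored.
class SpeechOnsetDetector {
  constructor() {
    this.loudFrames = 0;
    this.hasSpeechStarted = false;
  }

  pushFrame(samples) {
    if (frameRms(samples) > SPEECH_THRESHOLD) {
      this.loudFrames += 1;
      if (this.loudFrames >= MIN_SPEECH_FRAMES) this.hasSpeechStarted = true;
    } else {
      this.loudFrames = 0; // assumption: a quiet frame resets the run
    }
    return this.hasSpeechStarted;
  }
}

// Delay before resuming recording once a pipeline completes: a pipeline that
// played no TTS was probably triggered by noise, so back off for longer.
function resumeDelayMs(ttsPlayed) {
  return ttsPlayed ? NORMAL_RESUME_DELAY_MS : EMPTY_PIPELINE_RESUME_DELAY_MS;
}
```

With these values, a steady 0.025 RMS hum that would have passed the old 0.02 threshold no longer counts as speech.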

Root cause: ambient noise exceeded SPEECH_THRESHOLD and falsely set hasSpeechStarted=true. When the noise subsided, VAD silence detection fired even though the user never actually spoke. Raising the thresholds helped but did not eliminate the issue in noisy environments.

Fix: add an sttConfirmedSpeech flag. VAD silence detection now requires BOTH local RMS silence AND at least one non-empty intermediate STT result from the server. This ensures the user actually spoke before the system considers ending the recording.

Flow: recording starts → local RMS detects audio → audio is sent to the server → server STT returns partial text → confirmSpeechFromSTT() → VAD silence detection is now armed → user stops speaking → 1.3 s of silence → automatic audio_end.

Without STT confirmation, ambient noise can still trigger local speech detection, but VAD will never fire audio_end.
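The gating rule can be sketched as a small state holder. SILENCE_FRAMES_REQUIRED, monitoringOnly, sttConfirmedSpeech and confirmSpeechFromSTT() appear in the commit and review; the 125 ms frame duration is an assumption inferred from "8 frames ≈ 1 s", and the SilenceGate class and onSilence callback are hypothetical stand-ins for the real AudioManager code.

```javascript
// Assumed frame duration: "MIN_SPEECH_FRAMES = 8 (~1 s)" suggests ~125 ms.
const FRAME_MS = 125;
const SPEECH_THRESHOLD = 0.03;
// 1.3 s of silence, rounded up to whole frames (11 frames at 125 ms).
const SILENCE_FRAMES_REQUIRED = Math.ceil(1300 / FRAME_MS);

class SilenceGate {
  constructor(onSilence) {
    this.onSilence = onSilence;       // would send audio_end in the real client
    this.silenceFrameCount = 0;
    this.monitoringOnly = false;      // true while frames are not being sent
    this.sttConfirmedSpeech = false;  // armed only by non-empty STT text
    this.fired = false;
  }

  // Called when the server returns non-empty partial STT text.
  confirmSpeechFromSTT() {
    this.sttConfirmedSpeech = true;
  }

  // Called once per audio frame with that frame's RMS level.
  onFrameRms(rms) {
    if (rms > SPEECH_THRESHOLD) {
      this.silenceFrameCount = 0;
      return;
    }
    this.silenceFrameCount += 1;
    // Both conditions: enough local silence AND server-confirmed speech.
    if (this.silenceFrameCount >= SILENCE_FRAMES_REQUIRED
        && !this.monitoringOnly
        && this.sttConfirmedSpeech
        && !this.fired) {
      this.fired = true;
      this.onSilence();
    }
  }
}
```

Noise followed by quiet never reaches onSilence here, because nothing calls confirmSpeechFromSTT(); that is exactly the property the commit relies on.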

After TTS playback, resumeRecording() re-enabled audio frame sending on the frontend but never sent audio_start to the server. The server's audioSink was null (cleared by doFinally after the previous pipeline), so all audio frames were silently dropped in handleBinaryMessage.

Fix: send wsClient.startAudioRecording() (audio_start) alongside resumeRecording() in both the playback state listener (normal TTS end) and handleProcessComplete (empty-pipeline case). This creates a new server-side audioSink and pipeline for the next conversation round.
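To see why frames vanished, here is a hypothetical in-memory model of the server-side session (the real server code is not part of this PR excerpt). It assumes the sink exists only between audio_start and the end of a pipeline, drops binary frames while it is null, and rejects a second audio_start, mirroring the behavior the commit messages describe. MockAudioSession and its method names are illustrative only.

```javascript
// Hypothetical stand-in for the server's per-connection audio state.
class MockAudioSession {
  constructor() {
    this.audioSink = null;  // null until audio_start arrives
    this.dropped = 0;       // frames discarded because no sink existed
  }

  handleTextMessage(type) {
    if (type === "audio_start") {
      if (this.audioSink !== null) {
        // Server error text: "an audio session is already in progress".
        return { error: "已有进行中的音频会话" };
      }
      this.audioSink = [];  // new sink (and, on the real server, a new pipeline)
      return { ok: true };
    }
    if (type === "audio_end") {
      const frameCount = (this.audioSink ?? []).length;
      this.audioSink = null;  // pipeline finished, like doFinally clearing the sink
      return { ok: true, frames: frameCount };
    }
    return { error: `unknown message: ${type}` };
  }

  handleBinaryMessage(frame) {
    if (this.audioSink === null) {
      this.dropped += 1;  // silently dropped: nothing to feed
      return;
    }
    this.audioSink.push(frame);
  }
}
```

Resuming frame sending without audio_start only increments dropped; sending audio_start first gives the next round a sink to write into.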

…tion

Root cause: startAudioCall() calls clearQueue(), which triggers notifyPlaybackState(false). The playback listener sees that the call is active and schedules resumeRecording() + startAudioRecording() after 300 ms. But startRecording() has already sent audio_start. The server rejects the second audio_start with "已有进行中的音频会话" ("an audio session is already in progress"), handleError stops recording, and the session enters a broken state in which subsequent pipelines receive no audio (Xunfei timeout).

Fix: resumeRecording() now returns a boolean (true only when actually resuming from monitoring mode). Callers send audio_start only when it returns true. This prevents:
- a double audio_start on the initial connection (monitoringOnly=false → no-op)
- a stale audio_start after error recovery (recordingState=idle → no-op)

Also adds a fallback in handleProcessComplete: if the recording hardware was stopped by an error, it falls back to a full startRecording() instead of resumeRecording().
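A sketch of the boolean-returning resume and its two callers, under the assumption that recording state reduces to recordingState ("idle" or "recording") plus the monitoringOnly flag. RecorderSketch, onPlaybackEnded and onProcessComplete are hypothetical names; only resumeRecording(), startRecording(), startAudioRecording(), monitoringOnly and recordingState come from the commit.

```javascript
// Hypothetical, trimmed-down client; wsClient only needs startAudioRecording().
class RecorderSketch {
  constructor(wsClient) {
    this.wsClient = wsClient;
    this.recordingState = "idle";  // "idle" | "recording"
    this.monitoringOnly = false;   // true: mic open, frames not sent
  }

  startRecording() {
    this.recordingState = "recording";
    this.monitoringOnly = false;
    this.wsClient.startAudioRecording();  // sends audio_start
  }

  // VAD pause: keep the mic open but stop sending frames.
  pauseRecording() {
    if (this.recordingState === "recording") this.monitoringOnly = true;
  }

  // Returns true only when actually leaving monitoring mode.
  resumeRecording() {
    if (this.recordingState !== "recording" || !this.monitoringOnly) {
      return false;  // initial connection or after an error: no-op
    }
    this.monitoringOnly = false;
    return true;
  }

  // Playback listener (normal TTS end).
  onPlaybackEnded() {
    if (this.resumeRecording()) this.wsClient.startAudioRecording();
  }

  // handleProcessComplete path, with the hardware-stopped fallback.
  onProcessComplete() {
    if (this.recordingState === "idle") {
      this.startRecording();  // full restart already sends audio_start
    } else if (this.resumeRecording()) {
      this.wsClient.startAudioRecording();
    }
  }
}
```

The clearQueue()-triggered notification on connect now lands in resumeRecording() while monitoringOnly is false, so it returns false and no second audio_start goes out.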

…trigger

After the user speaks and VAD pauses recording, sttConfirmedSpeech stayed true from the previous STT result. When TTS ended and recording resumed, the stale sttConfirmedSpeech let VAD silence fire immediately (ambient noise met all the other conditions). This sent an empty audio_end, creating an empty pipeline that took 15 s to time out on Xunfei and looked to the user like a "stuck mic".

Fix: reset sttConfirmedSpeech=false in pauseRecording(). Each new recording round must get fresh STT confirmation before VAD can fire.
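A hypothetical reduction of this fix: the confirmation flag is cleared whenever VAD pauses recording, so a resumed round cannot reuse the previous round's STT result. RoundState and canFireSilence are illustrative names, not the project's API.

```javascript
// Hypothetical sketch: the per-round STT flag lives next to the recording mode.
class RoundState {
  constructor() {
    this.monitoringOnly = false;
    this.sttConfirmedSpeech = false;
  }

  confirmSpeechFromSTT() {
    this.sttConfirmedSpeech = true;
  }

  // VAD paused recording at the end of the user's turn.
  pauseRecording() {
    this.monitoringOnly = true;
    // Without this reset, the previous round's confirmation would let
    // ambient silence fire audio_end as soon as recording resumes.
    this.sttConfirmedSpeech = false;
  }

  resumeRecording() {
    this.monitoringOnly = false;
  }

  // Same three conditions as the VAD silence check in the review diff.
  canFireSilence(silenceFrames, required) {
    return silenceFrames >= required && !this.monitoringOnly && this.sttConfirmedSpeech;
  }
}
```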

Copilot

Pull request overview

This PR tunes the client-side realtime voice VAD behavior in VocaTaAIChat/AudioManager to reduce environment-noise false triggers by tightening VAD thresholds and gating “silence stop” on an STT-confirmed speech signal, while also preventing redundant audio_start messages when resuming.

Changes:

- Adjust VAD thresholds and minimum speech frame requirements to be more noise-resistant.
- Add an sttConfirmedSpeech gate so VAD silence only triggers after STT has produced valid text.
- Change AudioManager.resumeRecording() to return a boolean and only send audio_start when a resume actually occurred; add a longer resume delay in handleProcessComplete when no TTS was played.


Copilot · 2026-04-19T07:07:52Z

+                // VAD silence trigger condition: sending mode + STT has confirmed speech
+                if (this.silenceFrameCount >= SILENCE_FRAMES_REQUIRED
+                    && !this.monitoringOnly
+                    && this.sttConfirmedSpeech) {


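VAD silence triggering now depends on sttConfirmedSpeech. If STT produces no text in a given round (for example, recognition fails, returns an empty result, or only returns after audio_end), the silence callback will never fire here, so in continuous mode the client keeps sending audio and never sends audio_end. Consider adding a fallback: for example, also allow silence detection/stop once continuous speech has lasted for a certain duration, or unlock silence detection when any STT result (including final) or server status event is received.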
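One way to implement the reviewer's first suggestion, sketched under assumptions: silence detection is also armed after a hypothetical FALLBACK_SPEECH_FRAMES count of loud frames (40, roughly 5 s at an assumed 125 ms per frame), even if no STT text has arrived. The class name and constant are not from the PR, and the merged code may not include such a fallback at all.

```javascript
const SPEECH_THRESHOLD = 0.03;
const SILENCE_FRAMES_REQUIRED = 11;  // ~1.3 s at an assumed 125 ms per frame
const FALLBACK_SPEECH_FRAMES = 40;   // hypothetical: ~5 s of loud frames

class GateWithFallback {
  constructor(onSilence) {
    this.onSilence = onSilence;      // would send audio_end
    this.sttConfirmedSpeech = false;
    this.speechFrameCount = 0;       // cumulative loud frames this round
    this.silenceFrameCount = 0;
    this.fired = false;
  }

  confirmSpeechFromSTT() {
    this.sttConfirmedSpeech = true;
  }

  // Armed by STT text or, as a fallback, by enough local speech frames.
  isArmed() {
    return this.sttConfirmedSpeech || this.speechFrameCount >= FALLBACK_SPEECH_FRAMES;
  }

  onFrameRms(rms) {
    if (rms > SPEECH_THRESHOLD) {
      this.speechFrameCount += 1;
      this.silenceFrameCount = 0;
      return;
    }
    this.silenceFrameCount += 1;
    if (!this.fired && this.silenceFrameCount >= SILENCE_FRAMES_REQUIRED && this.isArmed()) {
      this.fired = true;
      this.onSilence();
    }
  }
}
```

The trade-off is that sustained ambient noise can eventually arm the fallback as well, so its frame count would need to sit well above MIN_SPEECH_FRAMES.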

Copilot · 2026-04-19T07:07:52Z

+    // STT returned valid text → the user is really speaking; enable VAD silence detection
+    if (message.text && message.text.trim().length > 0 && !message.isFinal) {


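confirmSpeechFromSTT() is only triggered when !message.isFinal, yet silence detection depends on that confirmation flag. If the server only sends final results (or the first non-empty result is already final), VAD silence detection will never be enabled for the entire conversation. Consider confirming on any non-empty text (including final), or at least setting the confirmation flag when a final result is non-empty.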

Suggested change

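-// STT returned valid text → the user is really speaking; enable VAD silence detection
-if (message.text && message.text.trim().length > 0 && !message.isFinal) {
+// STT returned any valid text (including final) → the user is really speaking; enable VAD silence detection
+if (message.text && message.text.trim().length > 0) {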
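The difference between the two predicates in the suggestion is easiest to see by replaying a final-only STT stream. The message shape ({ text, isFinal }) is inferred from the snippet above; the function names are hypothetical.

```javascript
// Predicate before the suggestion: only partial (non-final) text confirms speech.
function confirmsSpeechOld(message) {
  return Boolean(message.text && message.text.trim().length > 0 && !message.isFinal);
}

// Predicate after the suggestion: any non-empty text, final or not.
function confirmsSpeechNew(message) {
  return Boolean(message.text && message.text.trim().length > 0);
}

// Replays a round of STT messages; true if any of them would arm VAD silence detection.
function isArmedAfter(messages, predicate) {
  return messages.some(predicate);
}
```

If the server's first non-empty result is final, the old predicate never arms the gate for that round, while the new one does; whitespace-only or missing text still arms nothing under either.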

ailuckly added 5 commits April 19, 2026 10:59

Copilot AI review requested due to automatic review settings April 19, 2026 07:03

ailuckly merged commit f0e48e9 into develop Apr 19, 2026
6 checks passed

Copilot started reviewing on behalf of ailuckly April 19, 2026 07:04

Copilot AI reviewed Apr 19, 2026

ailuckly deleted the feat/realtime-voice-vad-p2 branch May 7, 2026 12:03

ailuckly merged 5 commits into develop from feat/realtime-voice-vad-p2


2 participants
