
Feat/realtime voice vad p2#18

Merged: ailuckly merged 2 commits into develop from feat/realtime-voice-vad-p2 on Apr 16, 2026


@ailuckly (Owner)

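📌 Changes

  • e.g., add user login API
  • e.g., fix the error message shown when login fails

✅ Testing

  • Runs locally
  • Self-tested
  • CI pipeline passes

PR submission reminders:

  • Make sure the commit subject follows the Conventional Commits format (feat/fix/docs/style/refactor/test/chore)
  • Make sure the code has passed local testing
  • Make sure no sensitive information (passwords, keys, etc.) has been committed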

Transforms push-to-talk into a continuous conversation mode:

Frontend (aiChat.ts):
- AudioManager: reduce ScriptProcessorNode buffer 4096→2048 (128ms/frame)
  for finer VAD granularity; add RMS-based silence detection with
  ~0.8s window (6 frames); add barge-in detection when user speaks
  while AI is playing TTS; add isMuted flag for per-frame gating
- VocaTaAIChat: startAudioCall() now immediately starts recording
  (GPT-voice style); auto-restarts listening after TTS ends; registers
  VAD silence callback (auto audio_end) and barge-in callback;
  adds muteMic()/unmuteMic()/micMuted API
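
The silence-detection arithmetic described above can be sketched as follows. This is a minimal illustration, not the code in aiChat.ts: the 16 kHz sample rate is inferred from 2048 samples ≈ 128 ms, and the `SilenceDetector` class, the `RMS_THRESHOLD` value, and the `onSilence` callback name are hypothetical.

```typescript
// Hypothetical sketch of RMS-based end-of-speech detection.
const SAMPLE_RATE = 16000;   // assumption: 2048 samples / 16 kHz = 128 ms per frame
const FRAME_SIZE = 2048;     // ScriptProcessorNode buffer size from the PR
const SILENCE_FRAMES = 6;    // 6 frames * 128 ms = 768 ms, i.e. the ~0.8 s window
const RMS_THRESHOLD = 0.01;  // assumption: illustrative value only

// Root-mean-square energy of one PCM frame (samples in [-1, 1]).
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  const sumSquares = frame.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumSquares / frame.length);
}

class SilenceDetector {
  private silentFrames = 0;
  private heardSpeech = false;
  private readonly onSilence: () => void;

  constructor(onSilence: () => void) {
    this.onSilence = onSilence;
  }

  // Call once per audio frame; isMuted gates the frame entirely.
  processFrame(frame: Float32Array, isMuted = false): void {
    if (isMuted) return;
    if (rms(frame) >= RMS_THRESHOLD) {
      this.heardSpeech = true;
      this.silentFrames = 0;
      return;
    }
    // Ignore silence that precedes any speech in this utterance.
    if (!this.heardSpeech) return;
    this.silentFrames++;
    if (this.silentFrames >= SILENCE_FRAMES) {
      this.heardSpeech = false;
      this.silentFrames = 0;
      this.onSilence(); // e.g. send audio_end
    }
  }
}
```

Requiring speech before counting silent frames is a design choice of this sketch; it keeps an idle microphone from repeatedly signalling end-of-speech.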

Frontend (ChatPage.vue):
- Phone button starts call + recording immediately (no mic click needed)
- Mic button is now mute/unmute toggle (red when muted)
- voiceStatusText updated: "正在聆听..." (Listening...) | "麦克风已静音" (Microphone muted) | "AI 回答中" (AI responding)
- Removed old push-to-talk hint text
- Removed VAD polling interval (now internal to AudioManager)
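
As a rough illustration of the UI behavior listed above, the status text can be derived from call state in one pure function. The `VoiceState` shape, the function body, and the precedence (AI speech over mute) are assumptions for this sketch, not taken from ChatPage.vue; the strings are English renderings of the PR's status text.

```typescript
// Hypothetical state shape; the real component may track this differently.
interface VoiceState {
  inCall: boolean;
  micMuted: boolean;
  aiSpeaking: boolean;
}

// Derive the status label from state. Precedence is an assumption.
function voiceStatusText(state: VoiceState): string {
  if (!state.inCall) return "";
  if (state.aiSpeaking) return "AI responding";
  if (state.micMuted) return "Microphone muted";
  return "Listening...";
}

// The mic button only flips the mute flag; recording itself keeps running.
function toggleMute(state: VoiceState): VoiceState {
  return { ...state, micMuted: !state.micMuted };
}
```

Keeping the label a pure function of state means the component does not need a polling interval to keep the text in sync.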

Backend (XunfeiWebSocketSttClient.java):
- vad_eos: 3000→1000ms (frontend VAD handles ~0.8s, server is fallback)

TTFA improvement: ~4-6s → ~2.2s
Copilot AI review requested due to automatic review settings April 16, 2026 05:32
@ailuckly ailuckly merged commit 0b9f2ef into develop Apr 16, 2026
1 check passed

Copilot AI left a comment


Pull request overview

This PR upgrades the realtime voice chat experience toward “continuous listening” with client-side VAD and barge-in support, plus a UI mute control and related copy/docs updates.

Changes:

  • Web: Move VAD state handling into AudioManager, add mic mute UI/state, and adjust voice status text.
  • Web: Add VAD-based auto-stop on silence and barge-in trigger wiring for interrupting AI speech.
  • Server/Docs: Tune Xunfei STT vad_eos and rewrite README to emphasize realtime voice interaction.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| vocata-web/src/views/ChatPage.vue | Updates voice UI to support mute state and removes UI-side VAD polling. |
| vocata-web/src/utils/aiChat.ts | Implements VAD logic in AudioManager, continuous mode auto-restart, and barge-in callbacks. |
| vocata-server/src/main/java/com/vocata/ai/stt/impl/XunfeiWebSocketSttClient.java | Reduces STT end-of-speech timeout (vad_eos). |
| README.md | Replaces long-form intro with a more product/experience-oriented README and updated links. |


return this.isMuted
}

async playAudio(audioBuffer: ArrayBuffer): Promise<void> { try {
Comment on lines +1180 to +1185
this.audioManager.setBargeInCallback(() => {
  // Log text: "Barge-in: user interjected, interrupting the AI"
  console.log('🎤 Barge-in:用户插话,打断 AI')
  this.audioManager.clearQueue()
  // Send audio_start → triggers handleBargeIn when the server is in the SPEAKING state
  this.wsClient?.startAudioRecording()
})
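
For context, one way a detector could decide when to invoke a barge-in callback like the one registered here is sketched below. It is not the PR's implementation: the `BargeInDetector` class, the default threshold, and the two-consecutive-frame rule are all assumptions chosen for illustration.

```typescript
// Hypothetical barge-in detector: fires once when the user speaks over TTS.
class BargeInDetector {
  private voicedFrames = 0;
  private readonly onBargeIn: () => void;
  private readonly threshold: number;
  private readonly minFrames: number;

  constructor(onBargeIn: () => void, threshold = 0.05, minFrames = 2) {
    this.onBargeIn = onBargeIn;
    this.threshold = threshold; // assumption: set above the silence threshold to tolerate speaker echo
    this.minFrames = minFrames; // assumption: consecutive voiced frames required
  }

  // frameRms: RMS energy of the current mic frame.
  processFrame(frameRms: number, aiIsPlaying: boolean, isMuted: boolean): void {
    if (!aiIsPlaying || isMuted || frameRms < this.threshold) {
      this.voicedFrames = 0;
      return;
    }
    this.voicedFrames++;
    // Strict equality: fire once per continuous voiced run, not on every frame.
    if (this.voicedFrames === this.minFrames) this.onBargeIn();
  }
}
```

In a setup like the PR's, the callback would clear the playback queue and start a new recording; requiring more than one voiced frame is a guess at how to reduce false triggers from TTS audio leaking into the microphone.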