
Feat/realtime voice stt p0 p1 #17

Merged
ailuckly merged 3 commits into develop from feat/realtime-voice-stt-p0-p1 on Apr 15, 2026

@ailuckly
Owner

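📌 Changes

  • e.g., Add user login API
  • e.g., Fix the error message shown when login fails

✅ Test verification

  • Passes when run locally
  • Self-tested
  • CI pipeline passes

PR submission checklist:

  • Make sure commit subjects follow the Conventional Commits spec (feat/fix/docs/style/refactor/test/chore)
  • Make sure the code passes local tests
  • Make sure no sensitive information (passwords, keys, etc.) is committed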

xunfei.stt.* was only defined in test/prod profiles, causing
XunfeiWebSocketSttClient.isAvailable() to return false on the local
profile and silently fall back to QiniuSttClient. Moving the env-var
mappings to application.yml makes them available across all profiles.

Also updates .env.example with XUNFEI_STT_* fields and corrects
AI_STT_PROVIDER default to xunfei.

Three issues fixed:

1. Authorization missing base64 encoding: the Xunfei API requires the
   authorization string to be base64-encoded before it is URL-encoded.
   It was previously only URL-encoded, causing HTTP 401 on every connection.

2. Invalid business parameter: `max_rg` is not a valid Xunfei IAT
   field. Removed it along with other non-standard params (rlang,
   vinfo, speex_size) that caused "param validate error" responses.

3. wpgs mode returns only the last segment in the final frame: Added
   ConcurrentSkipListMap<sn, text> to accumulate recognition segments
   across frames. Handles pgs=rpl (replacement) and pgs=apd (append)
   so the final isFinal=true result contains the complete utterance.
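
Item 1 is about ordering: the authorization string is base64-encoded first and URL-encoded second. A minimal sketch of the handshake URL, assuming the HMAC-SHA256 signing scheme described in Xunfei's public IAT WebSocket documentation. The class `XunfeiAuthUrl`, its method names, and the example host `iat-api.xfyun.cn` with path `/v2/iat` are assumptions for illustration, not code from this PR.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Base64;
import java.util.Locale;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper; names and endpoint are illustrative assumptions.
final class XunfeiAuthUrl {

    // HTTP-date in GMT, e.g. "Wed, 15 Apr 2026 17:07:00 GMT"
    static String httpDateNow() {
        return DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                .format(ZonedDateTime.now(ZoneOffset.UTC));
    }

    static String build(String host, String path, String apiKey, String apiSecret, String date) {
        // 1. String to sign: host, date and request line, separated by "\n"
        String signatureOrigin = "host: " + host + "\n"
                + "date: " + date + "\n"
                + "GET " + path + " HTTP/1.1";

        // 2. HMAC-SHA256 with the API secret, then base64
        String signature;
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(apiSecret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            signature = Base64.getEncoder()
                    .encodeToString(mac.doFinal(signatureOrigin.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 signing failed", e);
        }

        // 3. Plain-text authorization value
        String authorizationOrigin = String.format(
                "api_key=\"%s\", algorithm=\"hmac-sha256\", headers=\"host date request-line\", signature=\"%s\"",
                apiKey, signature);

        // 4. The step that was missing: base64-encode the authorization string...
        String authorization = Base64.getEncoder()
                .encodeToString(authorizationOrigin.getBytes(StandardCharsets.UTF_8));

        // 5. ...and only then URL-encode it (and the other query parameters)
        return "wss://" + host + path
                + "?authorization=" + URLEncoder.encode(authorization, StandardCharsets.UTF_8)
                + "&date=" + URLEncoder.encode(date, StandardCharsets.UTF_8)
                + "&host=" + URLEncoder.encode(host, StandardCharsets.UTF_8);
    }
}
```

The resulting string could be passed to `HttpClient.newHttpClient().newWebSocketBuilder().buildAsync(URI.create(url), listener)`; the client in this PR may assemble the URL differently.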
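
Item 3 can be sketched as a small accumulator keyed by `sn`. This assumes the wpgs semantics from Xunfei's IAT documentation: `pgs=apd` appends the segment, while `pgs=rpl` first discards the earlier segments whose `sn` falls in the inclusive range `rg=[from, to]`. The class `WpgsAccumulator` and its `accept` signature are hypothetical, and JSON parsing is left out so the sketch stays self-contained.

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical accumulator for wpgs (dynamic correction) results.
final class WpgsAccumulator {
    // sn -> text of that segment; the skip-list map keeps keys sorted by sn
    private final ConcurrentSkipListMap<Integer, String> segments = new ConcurrentSkipListMap<>();

    /**
     * Applies one result frame and returns the full utterance so far.
     * Assumes frames for one session arrive one at a time.
     *
     * @param sn   sequence number of this frame's result
     * @param pgs  "apd" (append) or "rpl" (replace)
     * @param rg   for "rpl", the inclusive [from, to] sn range being replaced; otherwise ignored
     * @param text recognized text carried by this frame
     */
    String accept(int sn, String pgs, int[] rg, String text) {
        if ("rpl".equals(pgs) && rg != null && rg.length == 2 && rg[0] <= rg[1]) {
            // Drop the segments this frame corrects; clear() on the view removes them from the map
            segments.subMap(rg[0], true, rg[1], true).clear();
        }
        segments.put(sn, text);
        return String.join("", segments.values());
    }

    // Start a new utterance
    void reset() {
        segments.clear();
    }
}
```

Assuming `status=2` marks the last frame, the string returned for that frame is the complete utterance to emit with `isFinal=true`.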

Two related fixes:

1. saveMessage blocks Netty IO thread: Mono.fromCallable with a blocking
   JDBC call was running on the reactive pipeline thread (Netty IO),
   causing doFinally to be delayed. This left audioSink non-null when
   the user started the second recording, resulting in a silent
   "already in progress" rejection. Fixed by adding
   .subscribeOn(Schedulers.boundedElastic()).

2. Server error silently deadlocks frontend recording: When the server
   rejected audio_start with an error, handleError() only logged it.
   The frontend stayed in voiceState='recording' with no active server
   pipeline, so no STT results ever came back. Fixed by calling
   stopRecording() on any server error received while recording.
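
The `saveMessage` fix above uses Reactor's `.subscribeOn(Schedulers.boundedElastic())` to move the blocking `Mono.fromCallable` work onto a pool meant for blocking calls. The sketch below is not Reactor code: it is a plain-JDK analogue of the same idea, using `CompletableFuture.supplyAsync` with a dedicated executor so the caller (standing in for the Netty IO thread) never blocks. `OffloadBlocking` and the `blocking-io` thread name are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical JDK-only analogue of subscribeOn(Schedulers.boundedElastic()).
final class OffloadBlocking {
    // Dedicated pool for blocking work (e.g. JDBC); daemon threads so the JVM can exit
    static final ExecutorService BLOCKING_POOL = Executors.newFixedThreadPool(4, runnable -> {
        Thread thread = new Thread(runnable, "blocking-io");
        thread.setDaemon(true);
        return thread;
    });

    // Runs blockingCall on BLOCKING_POOL; the calling thread returns immediately
    static <T> CompletableFuture<T> offload(Supplier<T> blockingCall) {
        return CompletableFuture.supplyAsync(blockingCall, BLOCKING_POOL);
    }
}
```

Unlike this fixed pool of 4 (an arbitrary choice), `boundedElastic` creates worker threads on demand up to a cap and evicts idle ones.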

Copilot AI review requested due to automatic review settings on April 15, 2026 17:07
ailuckly merged commit ad98175 into develop on Apr 15, 2026
2 checks passed

Copilot AI left a comment

Pull request overview

This PR improves the realtime voice STT experience by enhancing client-side error handling and adding/adjusting server-side support for Xunfei WebSocket STT, including config wiring and response parsing suited for wpgs dynamic correction.

Changes:

  • Web: stop recording when a server error is received to avoid a stuck recording state.
  • Server: add Xunfei STT/TTS configuration entries and improve Xunfei WebSocket STT streaming response handling (wpgs segment accumulation + auth encoding update).
  • Server: run blocking message persistence on boundedElastic to avoid blocking reactive threads.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| vocata-web/src/utils/aiChat.ts | Stops recording on server error to prevent the UI/audio pipeline from getting stuck. |
| vocata-server/src/main/resources/application.yml | Adds Xunfei STT/TTS configuration section. |
| vocata-server/src/main/java/com/vocata/ai/stt/impl/XunfeiWebSocketSttClient.java | Updates Xunfei STT WebSocket parsing for wpgs and adjusts authorization encoding. |
| vocata-server/src/main/java/com/vocata/ai/pipeline/StreamingPipelineOrchestrator.java | Schedules blocking DB work on boundedElastic. |
| .env.example | Adds Xunfei STT env vars and switches example STT provider to xunfei. |


Comment on lines +148 to +150
```yaml
app-id: ${XUNFEI_STT_APP_ID:your-xunfei-stt-app-id}
api-key: ${XUNFEI_STT_API_KEY:your-xunfei-stt-api-key}
secret-key: ${XUNFEI_STT_SECRET_KEY:your-xunfei-stt-secret-key}
```

Copilot AI Apr 15, 2026

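This sets default placeholder values (your-xunfei-stt-...) for the xunfei.stt.* properties. However, XunfeiWebSocketSttClient#isAvailable() currently only excludes the placeholder strings "your-xunfei-app-id/api-key/secret-key". So when only AI_STT_PROVIDER=xunfei is set and no real credentials are provided, appId/apiKey/secretKey fall back to these defaults, are wrongly treated as "configured", and the subsequent connection to iFlytek (Xunfei) fails. Suggestion: either remove these defaults (leave them empty so the application can clearly tell it is not configured), or make the placeholder check in isAvailable() match the defaults here.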
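
Either option closes the gap. A minimal sketch of the second one, assuming every placeholder default shares the `your-` prefix (true of both `your-xunfei-app-id` and `your-xunfei-stt-app-id`); it also rejects the empty defaults that the first option would produce. `XunfeiCredentialCheck` and its methods are hypothetical, not the actual `isAvailable()` body.

```java
// Hypothetical credential check; the "your-" prefix rule is an assumption.
final class XunfeiCredentialCheck {
    // Null, blank, or placeholder values ("your-...") count as "not configured"
    static boolean isConfigured(String value) {
        return value != null && !value.isBlank() && !value.startsWith("your-");
    }

    // Available only when all three credentials look real
    static boolean isAvailable(String appId, String apiKey, String secretKey) {
        return isConfigured(appId) && isConfigured(apiKey) && isConfigured(secretKey);
    }
}
```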

Suggested change
```diff
- app-id: ${XUNFEI_STT_APP_ID:your-xunfei-stt-app-id}
- api-key: ${XUNFEI_STT_API_KEY:your-xunfei-stt-api-key}
- secret-key: ${XUNFEI_STT_SECRET_KEY:your-xunfei-stt-secret-key}
+ app-id: ${XUNFEI_STT_APP_ID:}
+ api-key: ${XUNFEI_STT_API_KEY:}
+ secret-key: ${XUNFEI_STT_SECRET_KEY:}
```

Copilot uses AI. Check for mistakes.
Comment on lines +122 to +136
```java
int code = response.path("code").asInt(-1);
if (code != 0) {
    String msg = response.path("message").asText("未知错误");
    logger.error("🎤【科大讯飞WebSocket STT】API错误 code={}: {}", code, msg);
    SttResult errorResult = new SttResult();
    errorResult.setText("API错误: " + msg);
    errorResult.setConfidence(0.0);
    errorResult.setFinal(true);
    Map<String, Object> errMeta = new HashMap<>();
    errMeta.put("error", msg);
    errMeta.put("provider", "XunfeiWebSocketSTT");
    errorResult.setMetadata(errMeta);
    sink.next(errorResult);
    return WebSocket.Listener.super.onText(webSocket, data, last);
}
```

Copilot AI Apr 15, 2026

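When response.code != 0, this branch only calls sink.next(errorResult) and returns from onText; it does not actively close the WebSocket, stop the heartbeat, or terminate the sink. The audio stream therefore keeps sending frames, which can cause repeated errors and tie up resources, and the caller may never receive a completion signal. Suggestion: in this branch, stop heartbeatActive, send a close on webSocket, and call sink.complete() or sink.error(...) (and make sure no further audio frames are sent).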
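
A minimal sketch of that shutdown path, assuming the listener owns a `heartbeatActive` flag and can reach the socket and the sink. Closing and failing are passed in as a `Runnable` and a `Consumer<Throwable>` so the sketch runs without a live connection; in the real listener they might be `() -> webSocket.sendClose(WebSocket.NORMAL_CLOSURE, "api error")` and `sink::error`. `SttErrorShutdown` and its methods are hypothetical. It picks `sink.error(...)` over `sink.complete()` so callers can tell a provider failure from a normal end of stream.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical helper for the code != 0 branch.
final class SttErrorShutdown {
    private final AtomicBoolean heartbeatActive = new AtomicBoolean(true);
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final Runnable closeSocket;         // e.g. () -> webSocket.sendClose(WebSocket.NORMAL_CLOSURE, "api error")
    private final Consumer<Throwable> failSink; // e.g. sink::error

    SttErrorShutdown(Runnable closeSocket, Consumer<Throwable> failSink) {
        this.closeSocket = closeSocket;
        this.failSink = failSink;
    }

    // Stops the heartbeat, closes the socket and terminates the sink; only the first call has an effect
    void onApiError(int code, String message) {
        if (!closed.compareAndSet(false, true)) {
            return;
        }
        heartbeatActive.set(false);
        closeSocket.run();
        failSink.accept(new IllegalStateException("Xunfei STT error code=" + code + ": " + message));
    }

    boolean heartbeatActive() {
        return heartbeatActive.get();
    }

    // The audio sender calls this for every frame, so nothing is sent after an error
    boolean trySendAudio(Runnable sendFrame) {
        if (closed.get()) {
            return false;
        }
        sendFrame.run();
        return true;
    }
}
```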

Comment on lines +138 to +141
```java
JsonNode dataNode = response.path("data");
int frameStatus = dataNode.path("status").asInt(-1);
JsonNode result = dataNode.path("result");
```

Copilot AI Apr 15, 2026

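With the parsing logic now inlined here, the class's original parseWebSocketResponse(...) method is no longer called (nothing else in the current repository references it either). This leaves dead code and raises the risk of maintenance overhead and logic drift later on. Suggestion: delete the method, or reuse it to parse the response and build the wpgs segment merging on top of it.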

```diff
 private Mono<Message> saveMessage(Long conversationId, String content, SenderType senderType, Long userId) {
-    return Mono.fromCallable(() -> {
-        Message message = new Message();
+    return Mono.fromCallable(() -> { Message message = new Message();
```

Copilot AI Apr 15, 2026

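The start of the Mono.fromCallable lambda has been squeezed onto one line ({ Message message...), which looks like an accidentally deleted line break and indentation. This can trip formatter or Checkstyle/lint rules and hurts readability. Suggestion: put the statement after { on its own line and align the indentation with the project's usual style.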

Suggested change
```diff
-return Mono.fromCallable(() -> { Message message = new Message();
+return Mono.fromCallable(() -> {
+    Message message = new Message();
```

ailuckly deleted the feat/realtime-voice-stt-p0-p1 branch on May 7, 2026 12:03