Feat/realtime voice stt p0 p1 #17
xunfei.stt.* was only defined in test/prod profiles, causing XunfeiWebSocketSttClient.isAvailable() to return false on the local profile and silently fall back to QiniuSttClient. Moving the env-var mappings to application.yml makes them available across all profiles. Also updates .env.example with XUNFEI_STT_* fields and corrects AI_STT_PROVIDER default to xunfei.
Three issues fixed:
1. Authorization missing base64 encoding: the Xunfei API requires the authorization string to be base64-encoded before URL-encoding. Previously it was only URL-encoded, causing HTTP 401 on every connection.
2. Invalid business parameter: `max_rg` is not a valid Xunfei IAT field. Removed it along with other non-standard params (rlang, vinfo, speex_size) that caused "param validate error" responses.
3. wpgs mode returns only the last segment in the final frame: added ConcurrentSkipListMap<sn, text> to accumulate recognition segments across frames. It handles pgs=rpl (replacement) and pgs=apd (append) so the final isFinal=true result contains the complete utterance.
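The segment accumulation in item 3 can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class name `WpgsAccumulator` and its methods are hypothetical, and it assumes that a `pgs=rpl` frame carries an inclusive `rg` range of earlier `sn` values that the new segment replaces.

```java
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical helper illustrating wpgs segment merging; not the PR's actual class. */
final class WpgsAccumulator {
    // Segments keyed by sentence number (sn); the skip list keeps them in sn order.
    private final ConcurrentSkipListMap<Integer, String> segments = new ConcurrentSkipListMap<>();

    /**
     * Records one recognition frame.
     *
     * @param sn      sentence number of this segment
     * @param pgs     "apd" (append) or "rpl" (replace)
     * @param rgStart first sn replaced, inclusive (only read when pgs is "rpl")
     * @param rgEnd   last sn replaced, inclusive (only read when pgs is "rpl")
     * @param text    recognized text of this segment
     */
    void accept(int sn, String pgs, int rgStart, int rgEnd, String text) {
        if ("rpl".equals(pgs) && rgStart <= rgEnd) {
            // Assumption: rg is an inclusive range of earlier segments to discard.
            segments.subMap(rgStart, true, rgEnd, true).clear();
        }
        segments.put(sn, text);
    }

    /** Concatenates all surviving segments in sn order. */
    String currentText() {
        StringBuilder sb = new StringBuilder();
        for (String segment : segments.values()) {
            sb.append(segment);
        }
        return sb.toString();
    }
}
```

With this shape, the final `isFinal=true` result would be built from `currentText()` rather than from the last frame alone.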
Two related fixes:
1. saveMessage blocks Netty IO thread: Mono.fromCallable with a blocking JDBC call was running on the reactive pipeline thread (Netty IO), causing doFinally to be delayed. This left audioSink non-null when the user started the second recording, resulting in a silent "already in progress" rejection. Fixed by adding .subscribeOn(Schedulers.boundedElastic()).
2. Server error silently deadlocks frontend recording: when the server rejected audio_start with an error, handleError() only logged it. The frontend stayed in voiceState='recording' with no active server pipeline, so no STT results ever came back. Fixed by calling stopRecording() on any server error received while recording.
Pull request overview
This PR improves the realtime voice STT experience by enhancing client-side error handling and adding/adjusting server-side support for Xunfei WebSocket STT, including config wiring and response parsing suited for wpgs dynamic correction.
Changes:
- Web: stop recording when a server error is received to avoid a stuck recording state.
- Server: add Xunfei STT/TTS configuration entries and improve Xunfei WebSocket STT streaming response handling (wpgs segment accumulation + auth encoding update).
- Server: run blocking message persistence on `boundedElastic` to avoid blocking reactive threads.
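For the auth encoding update listed above, the ordering (sign, base64-encode the whole authorization string, then URL-encode) can be sketched as below. This is an illustration under assumptions, not the PR's code: the class `XunfeiAuthSketch`, its method, and the exact field layout of the authorization string follow a reading of Xunfei's public IAT documentation and should be checked against it.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch of building a signed Xunfei WebSocket URL. */
final class XunfeiAuthSketch {
    static String buildUrl(String host, String path, String date, String apiKey, String apiSecret) {
        try {
            // Assumed signing input: host line, date line, and the HTTP request line.
            String signatureOrigin = "host: " + host + "\n"
                    + "date: " + date + "\n"
                    + "GET " + path + " HTTP/1.1";
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(apiSecret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            String signature = Base64.getEncoder()
                    .encodeToString(mac.doFinal(signatureOrigin.getBytes(StandardCharsets.UTF_8)));

            // Assumed layout of the plain-text authorization string.
            String authorizationOrigin = String.format(
                    "api_key=\"%s\", algorithm=\"%s\", headers=\"%s\", signature=\"%s\"",
                    apiKey, "hmac-sha256", "host date request-line", signature);

            // Step the fix adds: base64-encode the whole string first...
            String authorization = Base64.getEncoder()
                    .encodeToString(authorizationOrigin.getBytes(StandardCharsets.UTF_8));

            // ...and only then URL-encode it into the query string.
            return "wss://" + host + path
                    + "?authorization=" + URLEncoder.encode(authorization, StandardCharsets.UTF_8)
                    + "&date=" + URLEncoder.encode(date, StandardCharsets.UTF_8)
                    + "&host=" + URLEncoder.encode(host, StandardCharsets.UTF_8);
        } catch (java.security.GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 signing failed", e);
        }
    }
}
```

Skipping the base64 step and URL-encoding the plain string directly is the variant the commit message describes as producing HTTP 401.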
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| vocata-web/src/utils/aiChat.ts | Stops recording on server error to prevent the UI/audio pipeline from getting stuck. |
| vocata-server/src/main/resources/application.yml | Adds Xunfei STT/TTS configuration section. |
| vocata-server/src/main/java/com/vocata/ai/stt/impl/XunfeiWebSocketSttClient.java | Updates Xunfei STT WebSocket parsing for wpgs and adjusts authorization encoding. |
| vocata-server/src/main/java/com/vocata/ai/pipeline/StreamingPipelineOrchestrator.java | Schedules blocking DB work on boundedElastic. |
| .env.example | Adds Xunfei STT env vars and switches example STT provider to xunfei. |
    app-id: ${XUNFEI_STT_APP_ID:your-xunfei-stt-app-id}
    api-key: ${XUNFEI_STT_API_KEY:your-xunfei-stt-api-key}
    secret-key: ${XUNFEI_STT_SECRET_KEY:your-xunfei-stt-secret-key}
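This sets default placeholder values (your-xunfei-stt-...) for the xunfei.stt.* properties. However, XunfeiWebSocketSttClient#isAvailable() currently excludes only the placeholder strings "your-xunfei-app-id/api-key/secret-key". As a result, when only AI_STT_PROVIDER=xunfei is set and no real credentials are provided, appId/apiKey/secretKey fall back to these defaults and are wrongly treated as "configured", and the subsequent connection to Xunfei fails. Suggestion: either remove these defaults (leave them empty so the application clearly detects missing configuration), or keep the placeholder check in isAvailable() consistent with the defaults here.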
Suggested change:

    - app-id: ${XUNFEI_STT_APP_ID:your-xunfei-stt-app-id}
    - api-key: ${XUNFEI_STT_API_KEY:your-xunfei-stt-api-key}
    - secret-key: ${XUNFEI_STT_SECRET_KEY:your-xunfei-stt-secret-key}
    + app-id: ${XUNFEI_STT_APP_ID:}
    + api-key: ${XUNFEI_STT_API_KEY:}
    + secret-key: ${XUNFEI_STT_SECRET_KEY:}
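The second option from the comment above, keeping the availability check robust against any placeholder default, might look like the sketch below. The class `SttCredentialCheck` and its methods are hypothetical, and the "your-" prefix rule is an assumption drawn from the placeholder values quoted in this thread, not the existing isAvailable() implementation.

```java
import java.util.Locale;

/** Hypothetical credential check; not the PR's isAvailable() implementation. */
final class SttCredentialCheck {
    /** A value counts as configured only if it is non-blank and not a "your-..." placeholder. */
    static boolean isConfigured(String value) {
        if (value == null || value.isBlank()) {
            return false;
        }
        // Assumption: every placeholder default starts with "your-".
        return !value.trim().toLowerCase(Locale.ROOT).startsWith("your-");
    }

    /** All three credentials must be real for the client to be considered available. */
    static boolean isAvailable(String appId, String apiKey, String secretKey) {
        return isConfigured(appId) && isConfigured(apiKey) && isConfigured(secretKey);
    }
}
```

A prefix check like this covers both the old "your-xunfei-app-id" strings and the new "your-xunfei-stt-app-id" defaults, and it treats the empty defaults from the suggested change as unconfigured as well.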
    int code = response.path("code").asInt(-1);
    if (code != 0) {
        String msg = response.path("message").asText("未知错误");
        logger.error("🎤【科大讯飞WebSocket STT】API错误 code={}: {}", code, msg);
        SttResult errorResult = new SttResult();
        errorResult.setText("API错误: " + msg);
        errorResult.setConfidence(0.0);
        errorResult.setFinal(true);
        Map<String, Object> errMeta = new HashMap<>();
        errMeta.put("error", msg);
        errMeta.put("provider", "XunfeiWebSocketSTT");
        errorResult.setMetadata(errMeta);
        sink.next(errorResult);
        return WebSocket.Listener.super.onText(webSocket, data, last);
    }
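When response.code != 0, this branch only calls sink.next(errorResult) and then returns from onText. It does not close the WebSocket, stop the heartbeat, or terminate the sink. The audio stream therefore keeps sending frames, which can cause repeated errors and wasted resources, and the caller may never receive a completion signal. Suggestion: in this branch, stop heartbeatActive, send a close on webSocket, and call sink.complete() or sink.error(...) (and make sure no further audio frames are sent).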
    JsonNode dataNode = response.path("data");
    int frameStatus = dataNode.path("status").asInt(-1);
    JsonNode result = dataNode.path("result");
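With the parsing logic inlined here, the class's original parseWebSocketResponse(...) is no longer called (there are no other references in the repository either). That leaves dead code and raises the risk of maintenance overhead and logic drift. Suggestion: delete the method, or reuse it to parse the response and build the wpgs segment merging on top of it.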
      private Mono<Message> saveMessage(Long conversationId, String content, SenderType senderType, Long userId) {
    -     return Mono.fromCallable(() -> {
    -         Message message = new Message();
    +     return Mono.fromCallable(() -> { Message message = new Message();
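The start of the Mono.fromCallable lambda has been squeezed onto a single line ({ Message message...), which looks like an accidentally deleted line break or indentation. It may trip the formatter or Checkstyle/lint rules, and it hurts readability. Suggestion: move the statement after { onto its own line and indent it following the project's usual format.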
Suggested change:

    - return Mono.fromCallable(() -> { Message message = new Message();
    + return Mono.fromCallable(() -> {
    +     Message message = new Message();