fix: correct Qiniu STT endpoint and robust response parsing #14
- QiniuSttClient: add error metadata to failed STT results so they can be distinguished from valid recognition results
- StreamingPipelineOrchestrator: filter out STT error results before forwarding to LLM pipeline. Errors still sent to client for display.
Docker layer caching caused deployments to skip recompilation even when source code changed via git pull. Split into explicit build --no-cache + up --wait to ensure fresh builds on every deploy.
- Change default QINIU_STT_ENDPOINT from openai.qiniu.com to api.qnaigc.com (the correct API host for qnaigc voice/asr)
- Fix parseAsrResponse to handle multiple response formats (data.text, data.result.text, top-level text) and correctly set error metadata so isSttError() filter in Orchestrator can detect API errors
- Add subscribeOn(boundedElastic) to uploadAudioToQiniu to avoid blocking Reactor thread during synchronous file upload
…t/realtime-voice-stt-p0-p1
Pull request overview
This PR fixes Qiniu STT integration issues by updating the default ASR endpoint, improving ASR response parsing/error signaling, and adjusting execution to avoid blocking Reactor threads during audio upload.
Changes:
- Update default Qiniu STT endpoint to https://api.qnaigc.com/v1.
- Make parseAsrResponse handle multiple response shapes and attach error metadata for downstream filtering.
- Run synchronous upload work on boundedElastic and tweak staging deploy build steps.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| vocata-server/src/main/resources/application.yml | Updates default STT endpoint configuration. |
| vocata-server/src/main/java/com/vocata/ai/stt/impl/QiniuSttClient.java | Adds error metadata and broadens ASR response parsing; offloads upload to boundedElastic. |
| vocata-server/src/main/java/com/vocata/ai/pipeline/StreamingPipelineOrchestrator.java | Filters STT error results (via metadata) out of LLM path. |
| .github/workflows/cd-staging.yml | Splits compose build/up and forces no-cache builds. |
    @@ -144,13 +144,13 @@ public Flux<PipelineEvent> processVoiceMessage(String conversationUuid,
    .filter(r -> r.getText() != null && !r.getText().trim().isEmpty())
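sttFlux first filters on non-empty text (line 144), but some branches of the current STT error handling set text to empty (for example, the error and unknown-format branches of Qiniu's parseAsrResponse). This makes the actual behavior inconsistent with the comment saying that error results let the frontend know what happened, and the voice pipeline may end silently when STT fails (without any STT/Error event). Suggest letting error results through (for example, have the filter condition allow results for which isSttError(r) is true, or have the STT side set a displayable, non-empty text for errors).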
Suggested change:
    - .filter(r -> r.getText() != null && !r.getText().trim().isEmpty())
    + .filter(r -> isSttError(r) || (r.getText() != null && !r.getText().trim().isEmpty()))
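The intent of the suggested filter can be sketched outside Reactor as two predicates: one deciding what the client sees, one deciding what reaches the LLM. This is a minimal illustration, not the project's code: `SttFilterSketch`, its nested `SttResult`, `forClient`, and `forLlm` are hypothetical names, and the body of `isSttError` assumes that an error is flagged by an `error` entry in the result metadata, as the commit messages describe.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class SttFilterSketch {
    // Hypothetical stand-in for the project's STT result type.
    static final class SttResult {
        final String text;
        final Map<String, Object> metadata;

        SttResult(String text, Map<String, Object> metadata) {
            this.text = text;
            this.metadata = metadata;
        }
    }

    // Assumption: an error result carries a non-null "error" entry in its metadata.
    static boolean isSttError(SttResult r) {
        return r.metadata != null && r.metadata.get("error") != null;
    }

    static boolean hasText(SttResult r) {
        return r.text != null && !r.text.trim().isEmpty();
    }

    // Results shown to the client: errors (even with empty text) plus real recognitions.
    static List<SttResult> forClient(List<SttResult> results) {
        return results.stream()
                .filter(r -> isSttError(r) || hasText(r))
                .collect(Collectors.toList());
    }

    // Results forwarded to the LLM: real recognitions only, never errors.
    static List<SttResult> forLlm(List<SttResult> results) {
        return results.stream()
                .filter(r -> !isSttError(r) && hasText(r))
                .collect(Collectors.toList());
    }
}
```

With this split, an error result whose text is empty still reaches the client, while blank results without an error flag are dropped from both paths.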
    // Check the top-level error field ({"error": "..."} or {"error": {...}})
    if (response.containsKey("error") && response.get("error") != null) {
        String errorMessage = response.get("error").toString();
        logger.error("七牛云ASR API错误: {}", errorMessage);
        metadata.put("error", errorMessage);
        result.setText("");
        result.setConfidence(0.0);
    } else if (response.containsKey("data")) {
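When the ASR returns a top-level error or an unknown format, result.text is set to "" here (lines 469-470 and 494-495). The current orchestrator first filters STT results on non-empty text, so these errors are dropped outright: the frontend sees no error message, and the downstream flow may complete with no input. Suggest keeping metadata.error while also setting text to a displayable error message (or having the upstream generate and pass through an error event based on metadata.error).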
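One way to read the first option in this comment (keep `metadata.error` and also set a displayable text) is sketched below. It is an illustration under assumptions: `SttErrorResultSketch`, its nested `SttResult`, `fromTopLevelError`, and the `ERROR_PREFIX` wording are all hypothetical and do not come from the PR.

```java
import java.util.HashMap;
import java.util.Map;

final class SttErrorResultSketch {
    // Hypothetical, minimal stand-in for the project's STT result type.
    static final class SttResult {
        String text = "";
        double confidence = 0.0;
        final Map<String, Object> metadata = new HashMap<>();
    }

    // Hypothetical display prefix; the real wording is a product decision.
    static final String ERROR_PREFIX = "Speech recognition failed: ";

    // Builds a result from a parsed ASR response, handling only the top-level error case.
    static SttResult fromTopLevelError(Map<String, ?> response) {
        SttResult result = new SttResult();
        Object error = response.get("error");
        if (error != null) {
            String message = error.toString();
            result.metadata.put("error", message);  // machine-readable flag for downstream filters
            result.text = ERROR_PREFIX + message;   // non-empty, so a text-based filter keeps it
            result.confidence = 0.0;                // an error is never a confident recognition
        }
        return result;
    }
}
```

Because the error flag stays in the metadata, a downstream check on that flag can still keep the message away from the LLM even though the text is no longer empty.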
    // Format 1: data.text
    if (data.containsKey("text")) {
        String t = (String) data.get("text");
        return StringUtils.hasText(t) ? t : "";
    }
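extractTextFromData returns an empty string when the field exists but its content is blank (line 522), while the caller parseAsrResponse treats text != null as success and assigns a confidence of 0.95 (lines 476-479). This misclassifies an empty result as a success and records a high confidence. Suggest treating blank text as unrecognized (return null, or have the caller use StringUtils.hasText(text) to determine success) and keeping the confidence at 0 when nothing is recognized.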
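A dependency-free sketch of this suggestion follows. It covers the three response shapes named in the PR description (`data.text`, `data.result.text`, top-level `text`) and returns `null` for blank text so that the caller cannot mistake it for a success. `AsrTextSketch`, `extractText`, `nonBlank`, and `confidenceFor` are hypothetical names; the lookup order is an assumption, and only the 0.95 value is taken from the comment above.

```java
import java.util.Map;

final class AsrTextSketch {
    // Returns the recognized text, or null when no usable (non-blank) text is present.
    static String extractText(Map<String, ?> response) {
        Object data = response.get("data");
        if (data instanceof Map) {
            Map<?, ?> d = (Map<?, ?>) data;
            String text = nonBlank(d.get("text"));                       // shape 1: data.text
            if (text != null) {
                return text;
            }
            Object result = d.get("result");
            if (result instanceof Map) {
                text = nonBlank(((Map<?, ?>) result).get("text"));       // shape 2: data.result.text
                if (text != null) {
                    return text;
                }
            }
        }
        return nonBlank(response.get("text"));                           // shape 3: top-level text
    }

    // Treats null, non-string, and whitespace-only values as "not recognized".
    static String nonBlank(Object value) {
        if (!(value instanceof String)) {
            return null;
        }
        String s = (String) value;
        return s.trim().isEmpty() ? null : s;
    }

    // Confidence stays at 0 unless text was actually recognized.
    static double confidenceFor(String text) {
        return text != null ? 0.95 : 0.0;
    }
}
```

Returning `null` instead of `""` keeps a single success test (`text != null`) valid in the caller, which is the simpler of the two options the comment offers.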
    echo "[2/5] deploy compose stack"
    echo "当前提交: $GIT_SHA"
    docker compose --env-file "$ENV_FILE" up -d --build --wait
    docker compose --env-file "$ENV_FILE" build --no-cache
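Changing the staging deploy flow to run docker compose build --no-cache on every deploy (line 71) discards the image layer cache entirely and significantly increases build/deploy time as well as disk and network load on the server. If the goal is only to make sure the latest code is picked up, building with the cache by default is usually more appropriate. Suggest removing --no-cache, or using a no-cache build only for manual triggers or when an explicit parameter enables it.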
Suggested change:
    - docker compose --env-file "$ENV_FILE" build --no-cache
    + docker compose --env-file "$ENV_FILE" build
…ture
Xunfei IAT requires raw PCM (audio/L16;rate=16000) but MediaRecorder outputs webm/opus containers. Switch to AudioWorklet-compatible ScriptProcessorNode approach:
- Create dedicated 16kHz AudioContext for recording
- Capture Float32 samples, convert to Int16 PCM in-line
- Send 4096-frame chunks (~256ms) directly as binary WebSocket frames
- Remove MediaRecorder, handleAudioChunk, handleMediaRecorderStop
- Update audio_start format field from 'mediarecorder' to 'pcm'
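The Float32-to-Int16 conversion and the chunk arithmetic in this commit can be illustrated as follows. The actual change runs in the browser; this Java sketch only mirrors the arithmetic, for instance for checking chunk sizes on the server side. `PcmSketch` and its methods are hypothetical names, and the little-endian byte order and the asymmetric scaling are assumptions, not details confirmed by the commit.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PcmSketch {
    static final int SAMPLE_RATE_HZ = 16_000;   // from the commit message
    static final int FRAMES_PER_CHUNK = 4_096;  // from the commit message

    // Converts normalized float samples in [-1, 1] to 16-bit signed PCM bytes.
    // Assumption: little-endian output; out-of-range samples are clamped.
    static byte[] floatToPcm16(float[] samples) {
        ByteBuffer out = ByteBuffer.allocate(samples.length * 2).order(ByteOrder.LITTLE_ENDIAN);
        for (float sample : samples) {
            float clamped = Math.max(-1.0f, Math.min(1.0f, sample));
            // Asymmetric scaling so that -1.0 maps to -32768 and +1.0 maps to 32767.
            int scaled = clamped < 0 ? Math.round(clamped * 32768f) : Math.round(clamped * 32767f);
            out.putShort((short) scaled);
        }
        return out.array();
    }

    // Duration of one chunk in milliseconds for a mono stream.
    static double chunkDurationMs(int frames, int sampleRateHz) {
        return frames * 1000.0 / sampleRateHz;
    }
}
```

At 16 kHz, a 4096-frame chunk lasts 4096 / 16000 s = 256 ms, which matches the figure in the commit message, and carries 4096 × 2 = 8192 bytes of 16-bit mono PCM.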
Summary
- Change the default endpoint: openai.qiniu.com/v1 → api.qnaigc.com/v1 (the correct host for voice/asr API)
- Fix parseAsrResponse to handle multiple response formats (data.text, data.result.text, top-level text) and correctly set error metadata so isSttError() filter in Orchestrator properly detects and discards error results before LLM
- Add subscribeOn(boundedElastic) to uploadAudioToQiniu to avoid blocking Reactor thread during sync file upload
Root cause
QINIU_STT_ENDPOINT was never set on the staging server, so it fell back to the default https://openai.qiniu.com/v1 instead of the correct https://api.qnaigc.com/v1, causing all /voice/asr calls to fail with "access denied".
Test plan