fix: correct Qiniu STT endpoint and robust response parsing #14

Merged: ailuckly merged 5 commits into develop from feat/realtime-voice-stt-p0-p1 on Apr 12, 2026

@ailuckly
Owner

Summary

  • Fix default STT endpoint from openai.qiniu.com/v1 to api.qnaigc.com/v1 (the correct host for the voice/asr API)
  • Improve parseAsrResponse to handle multiple response formats (data.text, data.result.text, top-level text) and correctly set error metadata so isSttError() filter in Orchestrator properly detects and discards error results before LLM
  • Add subscribeOn(boundedElastic) to uploadAudioToQiniu to avoid blocking Reactor thread during sync file upload
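
The parsing behavior described in the second bullet can be sketched roughly as follows. This is a minimal illustration, not the actual QiniuSttClient code: the class names SttResultSketch and AsrResponseSketch, the asText helper, and the "unrecognized ASR response format" message are hypothetical. The lookup order (data.text, data.result.text, top-level text), the "error" metadata key, and the 0.95 confidence come from this PR's discussion. Treating blank text as missing follows the reviewer's suggestion later in this conversation, not necessarily the merged code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the project's STT result type. */
final class SttResultSketch {
    String text = "";
    double confidence = 0.0;
    final Map<String, Object> metadata = new HashMap<>();
}

final class AsrResponseSketch {

    /** Tries data.text, then data.result.text, then top-level text. */
    static SttResultSketch parse(Map<String, ?> response) {
        SttResultSketch result = new SttResultSketch();

        // A top-level "error" field marks the result as failed via metadata.
        Object error = response.get("error");
        if (error != null) {
            result.metadata.put("error", error.toString());
            return result;
        }

        String text = null;
        Object data = response.get("data");
        if (data instanceof Map) {
            Map<?, ?> dataMap = (Map<?, ?>) data;
            text = asText(dataMap.get("text"));
            Object inner = dataMap.get("result");
            if (text == null && inner instanceof Map) {
                text = asText(((Map<?, ?>) inner).get("text"));
            }
        }
        if (text == null) {
            text = asText(response.get("text"));
        }

        if (text == null) {
            // Unknown shape: flag it so downstream filters can drop it (message is an assumption).
            result.metadata.put("error", "unrecognized ASR response format");
            return result;
        }
        result.text = text;
        result.confidence = 0.95;
        return result;
    }

    /** Returns the value as a string, or null when it is absent, not a string, or blank. */
    private static String asText(Object value) {
        if (value instanceof String && !((String) value).trim().isEmpty()) {
            return (String) value;
        }
        return null;
    }

    public static void main(String[] args) {
        SttResultSketch r = parse(Collections.singletonMap("text", "hello"));
        System.out.println(r.text + " " + r.confidence);
    }
}
```

Keeping failures in metadata rather than in text lets a downstream check such as isSttError() separate them from recognized speech without parsing the message.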

Root cause

The QINIU_STT_ENDPOINT was never set on the staging server, so it fell back to the default https://openai.qiniu.com/v1 instead of the correct https://api.qnaigc.com/v1, causing all /voice/asr calls to fail with "access denied".
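
The fallback described above can be illustrated with a small sketch. It assumes the endpoint is resolved from an environment-style map. The real project reads it through application.yml, and the class and method names here are hypothetical. Treating a blank value the same as an unset one is a choice made for this sketch only.

```java
import java.util.Map;

final class SttEndpointSketch {

    /** Host named in this PR as the correct one for the voice/asr API. */
    static final String DEFAULT_ENDPOINT = "https://api.qnaigc.com/v1";

    /** Returns the configured endpoint, or the default when the variable is unset or blank. */
    static String resolve(Map<String, String> env) {
        String configured = env.get("QINIU_STT_ENDPOINT");
        if (configured == null || configured.trim().isEmpty()) {
            return DEFAULT_ENDPOINT;
        }
        return configured.trim();
    }

    public static void main(String[] args) {
        System.out.println(resolve(System.getenv()));
    }
}
```

With a correct default, a server that never sets QINIU_STT_ENDPOINT still reaches the right host, which is the situation the staging environment was in.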

Test plan

  • Voice chat: verify STT recognition triggers and returns recognized text in stt_result events
  • Verify STT errors no longer get sent to LLM as user input

- QiniuSttClient: add error metadata to failed STT results so they
  can be distinguished from valid recognition results
- StreamingPipelineOrchestrator: filter out STT error results before
  forwarding to LLM pipeline. Errors still sent to client for display.
Docker layer caching caused deployments to skip recompilation even
when source code changed via git pull. Split into explicit build
--no-cache + up --wait to ensure fresh builds on every deploy.
- Change default QINIU_STT_ENDPOINT from openai.qiniu.com to api.qnaigc.com
  (the correct API host for qnaigc voice/asr)
- Fix parseAsrResponse to handle multiple response formats (data.text,
  data.result.text, top-level text) and correctly set error metadata
  so isSttError() filter in Orchestrator can detect API errors
- Add subscribeOn(boundedElastic) to uploadAudioToQiniu to avoid
  blocking Reactor thread during synchronous file upload
Copilot AI review requested due to automatic review settings April 12, 2026 12:06

Copilot AI left a comment

Pull request overview

This PR fixes Qiniu STT integration issues by updating the default ASR endpoint, improving ASR response parsing/error signaling, and adjusting execution to avoid blocking Reactor threads during audio upload.

Changes:

  • Update default Qiniu STT endpoint to https://api.qnaigc.com/v1.
  • Make parseAsrResponse handle multiple response shapes and attach error metadata for downstream filtering.
  • Run synchronous upload work on boundedElastic and tweak staging deploy build steps.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| vocata-server/src/main/resources/application.yml | Updates default STT endpoint configuration. |
| vocata-server/src/main/java/com/vocata/ai/stt/impl/QiniuSttClient.java | Adds error metadata and broadens ASR response parsing; offloads upload to boundedElastic. |
| vocata-server/src/main/java/com/vocata/ai/pipeline/StreamingPipelineOrchestrator.java | Filters STT error results (via metadata) out of LLM path. |
| .github/workflows/cd-staging.yml | Splits compose build/up and forces no-cache builds. |

@@ -144,13 +144,13 @@ public Flux<PipelineEvent> processVoiceMessage(String conversationUuid,
.filter(r -> r.getText() != null && !r.getText().trim().isEmpty())

Copilot AI Apr 12, 2026

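sttFlux first filters on non-empty text (line 144), but some branches of the current STT error handling set text to empty (for example, the error and unknown-format branches of Qiniu's parseAsrResponse). This makes the code comment saying that error results let the frontend know what happened inconsistent with the actual behavior, and it can cause the voice pipeline to end silently when STT fails (with no STT or Error event at all). Consider letting error results through, for example by having the filter pass results where isSttError(r) is true, or by having the STT side set displayable, non-empty text for errors.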

Suggested change
- .filter(r -> r.getText() != null && !r.getText().trim().isEmpty())
+ .filter(r -> isSttError(r) || (r.getText() != null && !r.getText().trim().isEmpty()))
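
The two filtering paths implied by this suggestion can be sketched as below: errors reach the client for display, but never reach the LLM. This is only an illustration using plain java.util.stream instead of Reactor's Flux. The Result type, and the assumption that isSttError checks for an "error" key in metadata, are hypothetical and not taken from the orchestrator source.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class SttFilterSketch {

    /** Hypothetical minimal result type. */
    static final class Result {
        final String text;
        final Map<String, Object> metadata = new HashMap<>();

        Result(String text) {
            this.text = text;
        }

        static Result error(String message) {
            Result r = new Result("");
            r.metadata.put("error", message);
            return r;
        }
    }

    /** Assumed check: a result is an error when metadata carries an "error" entry. */
    static boolean isSttError(Result r) {
        return r.metadata.containsKey("error");
    }

    static boolean hasText(Result r) {
        return r.text != null && !r.text.trim().isEmpty();
    }

    /** Results shown to the client: recognized text plus errors. */
    static List<Result> forClient(List<Result> results) {
        return results.stream()
                .filter(r -> isSttError(r) || hasText(r))
                .collect(Collectors.toList());
    }

    /** Results forwarded to the LLM: recognized text only, never errors. */
    static List<Result> forLlm(List<Result> results) {
        return results.stream()
                .filter(r -> !isSttError(r) && hasText(r))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Result> all = Arrays.asList(
                new Result("hello"), Result.error("access denied"), new Result("  "));
        System.out.println(forClient(all).size() + " for client, " + forLlm(all).size() + " for LLM");
    }
}
```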

Comment on lines +464 to 471
// Check the top-level error field ({"error": "..."} or {"error": {...}})
if (response.containsKey("error") && response.get("error") != null) {
String errorMessage = response.get("error").toString();
logger.error("七牛云ASR API错误: {}", errorMessage);
metadata.put("error", errorMessage);
result.setText("");
result.setConfidence(0.0);
} else if (response.containsKey("data")) {

Copilot AI Apr 12, 2026

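When ASR returns a top-level error or an unknown format, result.text is set to "" here (lines 469-470 and 494-495). Because the orchestrator currently filters STT results on non-empty text first, these errors are dropped outright: the frontend never sees an error message, and the downstream flow may complete with no input. Consider keeping metadata.error while also setting text to a displayable error message, or having the upstream generate and forward an error event based on metadata.error.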

Comment on lines +519 to +523
// Format 1: data.text
if (data.containsKey("text")) {
String t = (String) data.get("text");
return StringUtils.hasText(t) ? t : "";
}

Copilot AI Apr 12, 2026

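extractTextFromData returns an empty string when the field exists but its content is blank (line 522), while the caller parseAsrResponse treats text != null as success and assigns a confidence of 0.95 (lines 476-479). An empty result is therefore misclassified as a success and recorded with high confidence. Consider treating blank text as unrecognized (return null, or have the caller use StringUtils.hasText(text) to decide success), and keep the confidence at 0 when nothing was recognized.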

echo "[2/5] deploy compose stack"
echo "当前提交: $GIT_SHA"
- docker compose --env-file "$ENV_FILE" up -d --build --wait
+ docker compose --env-file "$ENV_FILE" build --no-cache

Copilot AI Apr 12, 2026

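Changing the staging deploy flow to run docker compose build --no-cache every time (line 71) discards the image layer cache entirely, which significantly increases build and deploy time as well as disk and network load on the server. If the goal is only to pick up the latest code, building with the cache is usually the better default. Consider removing --no-cache, or using uncached builds only on manual triggers or when an explicit parameter enables them.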

Suggested change
- docker compose --env-file "$ENV_FILE" build --no-cache
+ docker compose --env-file "$ENV_FILE" build

…ture

Xunfei IAT requires raw PCM (audio/L16;rate=16000) but MediaRecorder
outputs webm/opus containers. Switch to AudioWorklet-compatible
ScriptProcessorNode approach:
- Create dedicated 16kHz AudioContext for recording
- Capture Float32 samples, convert to Int16 PCM in-line
- Send 4096-frame chunks (~256ms) directly as binary WebSocket frames
- Remove MediaRecorder, handleAudioChunk, handleMediaRecorderStop
- Update audio_start format field from 'mediarecorder' to 'pcm'
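
The Float32-to-Int16 conversion and the chunk timing in this commit message can be checked with a short sketch. The actual capture code runs in the browser in JavaScript; the Java version below is only an illustration of the arithmetic. The class and method names, the little-endian byte order, and the asymmetric scaling (32768 for negatives, 32767 for positives) are assumptions rather than details taken from the commit.

```java
final class PcmSketch {

    static final int SAMPLE_RATE_HZ = 16_000;
    static final int CHUNK_FRAMES = 4096;

    /** Converts float samples in [-1, 1] to 16-bit PCM bytes (little-endian assumed). */
    static byte[] floatToPcm16(float[] samples) {
        byte[] out = new byte[samples.length * 2];
        for (int i = 0; i < samples.length; i++) {
            // Clamp first so out-of-range input cannot overflow the 16-bit range.
            float clamped = Math.max(-1f, Math.min(1f, samples[i]));
            int value = clamped < 0
                    ? Math.round(clamped * 32768f)
                    : Math.round(clamped * 32767f);
            out[2 * i] = (byte) (value & 0xFF);            // low byte
            out[2 * i + 1] = (byte) ((value >> 8) & 0xFF); // high byte
        }
        return out;
    }

    /** Duration of one chunk in milliseconds: frames divided by sample rate. */
    static double chunkDurationMs() {
        return 1000.0 * CHUNK_FRAMES / SAMPLE_RATE_HZ;
    }

    public static void main(String[] args) {
        System.out.println(chunkDurationMs()); // prints 256.0
    }
}
```

At 16 kHz a 4096-frame chunk lasts 4096 / 16000 s = 256 ms and occupies 8192 bytes as 16-bit mono PCM.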
@ailuckly ailuckly merged commit 74b4e80 into develop Apr 12, 2026
5 checks passed