fix(tts): Fix initial noise in streaming playback by biggsun · Pull Request #27 · groxaxo/Qwen3-TTS-Openai-Fastapi

biggsun · 2026-05-15T03:18:20Z

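On the first emit, the codec buffer holds far fewer frames than the decode window,
so the zero-padding codes pass through the decoder's attention layers and contaminate the output for the real frames.
Use decode() to decode only the actual frames instead, avoiding zero-padding artifacts.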
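The mechanism can be illustrated with a toy model. This is a minimal sketch, not the Qwen3-TTS decoder: the scalar embeddings in `EMBED`, the mean-mixing `toy_decode` (a crude stand-in for unmasked self-attention), and right-padding in `pad_to_size` are all hypothetical assumptions. It shows that because codebook index 0 is a real code rather than silence, padding a short buffer with it changes the output for the real frames even after the padded positions are discarded.

```python
# Toy model (hypothetical, not the Qwen3-TTS decoder): each frame's output is its
# own embedding plus the mean embedding of the whole sequence, a crude stand-in
# for self-attention mixing context into every position.

EMBED = {0: 5.0, 1: 1.0, 2: -1.0, 3: 2.0}  # assumed scalar embeddings; index 0 is NOT silence


def toy_decode(codes):
    """Return one output value per frame; every frame sees the whole sequence."""
    emb = [EMBED[c] for c in codes]
    ctx = sum(emb) / len(emb)
    return [e + ctx for e in emb]


def pad_to_size(codes, size, pad_index=0):
    """Right-pad with codebook index 0, as a fixed decode window might (assumed)."""
    return codes + [pad_index] * (size - len(codes))


real = [1, 2, 3]
clean = toy_decode(real)
padded = toy_decode(pad_to_size(real, 8))[: len(real)]  # keep only the real frames

print(clean)
print(padded)  # differs from clean: the padding leaked into the real frames
```

Discarding the padded positions afterwards does not help, because the contamination already happened inside the mixing step; only decoding the real frames alone (the `clean` path) avoids it.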


xsploit · 2026-05-24T20:45:59Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e32945f50b


chatgpt-codex-connector · 2026-05-24T20:48:56Z

+                window_codes = torch.stack(codes_buffer, dim=0)
+                wavs, sr = self.speech_tokenizer.decode(
+                    [{"audio_codes": window_codes.to(self.talker.device)}]
                )


Preserve ref-code context in first emitted chunk

This first-emit path decodes codes_buffer directly and bypasses _add_ref_code_context(...). In voice-clone ICL mode (when ref_code_context is set), and in the common case where the first emit has fewer than decode_window_frames frames, the first chunk is produced without reference-speaker context while subsequent chunks include it. This can cause an audible speaker/prosody jump at the first chunk boundary and regresses the earlier "stable from start" behavior.
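A minimal sketch of the kind of fix the review suggests, under loud assumptions: codes are treated as flat integer frames (the real ones are multi-codebook tensors), and the decoder is assumed to emit a fixed number of samples per frame. The reference codes are prepended before the unpadded first-chunk decode, and the samples belonging to that prefix are cut afterwards. `decode_with_ref_context` and `fake_decode` are hypothetical names; this is not how `_add_ref_code_context(...)` is actually implemented.

```python
from typing import Callable, List, Sequence


def decode_with_ref_context(
    codes: Sequence[int],
    ref_codes: Sequence[int],
    decode_fn: Callable[[List[int]], List[float]],
    samples_per_frame: int,
) -> List[float]:
    """Decode `codes` with (hypothetical) reference-speaker codes prepended, then
    drop the samples belonging to that prefix so only the new chunk is returned."""
    if not ref_codes:
        # No voice-clone context: plain unpadded decode of the real frames.
        return decode_fn(list(codes))
    wav = decode_fn(list(ref_codes) + list(codes))
    # Assumption: a fixed number of output samples per input frame.
    cut = len(ref_codes) * samples_per_frame
    return wav[cut:]


def fake_decode(frames: List[int]) -> List[float]:
    """Stand-in decoder: emits 4 samples per frame and ignores context, so it only
    exercises the trimming arithmetic, not the speaker-consistency effect."""
    return [float(f) for f in frames for _ in range(4)]


chunk = decode_with_ref_context([7, 8], ref_codes=[1, 2, 3], decode_fn=fake_decode, samples_per_frame=4)
print(chunk)  # [7.0, 7.0, 7.0, 7.0, 8.0, 8.0, 8.0, 8.0]
```

With a real decoder, the samples kept for `[7, 8]` would be conditioned on the reference prefix, which is what keeps the first chunk consistent with the later ones.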


Copilot

Pull request overview

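This PR aims to fix noise at the start of TTS streaming playback. When the streaming decode window is much larger than the number of codec frames accumulated so far, the fixed-window zero-padding propagates through the decoder's attention and convolutional receptive fields and contaminates the output for the real frames. The first output chunk therefore switches to the decode() path, which does no zero-padding.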

Changes:

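On the first emit in stream_generate_pcm(), when there are fewer frames than the decode window, use speech_tokenizer.decode() to decode the real frames directly, avoiding zero-padding artifacts.
Keep the fixed-window streaming-decode optimization for subsequent emits (decode_streaming(... pad_to_size=decode_window_frames)).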
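The branch condition described in this list can be sketched as a small pure function. It mirrors the `if` in the diff (`total_frames_emitted == 0 and len(codes_buffer) < decode_window_frames`); the function name, the string return values, and the example schedule are hypothetical, and the sketch ignores the `use_optimized_decode` / `hasattr(...)` guard on the streaming branch.

```python
def select_decode_path(total_frames_emitted: int, buffered_frames: int, decode_window_frames: int) -> str:
    """Return which decoder the PR's branch would call for this emit (illustrative)."""
    if total_frames_emitted == 0 and buffered_frames < decode_window_frames:
        # First emit with a short buffer: unpadded speech_tokenizer.decode().
        return "decode"
    # Every later emit, or a first emit that already fills the window.
    return "decode_streaming"


# Hypothetical schedule: (frames emitted so far, frames in buffer) with an 8-frame window.
schedule = [(0, 4), (4, 8), (12, 8)]
print([select_decode_path(e, b, 8) for e, b in schedule])
# ['decode', 'decode_streaming', 'decode_streaming']
```

Only the very first chunk pays the cost of the unpadded decode; once any frames have been emitted, the padded fixed-window path is used even if the buffer is short.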


+            if total_frames_emitted == 0 and len(codes_buffer) < decode_window_frames:
+                # First emit: decode without zero-padding to avoid decoder artifacts.
+                # When the codec buffer is smaller than the decode window, zero-padding
+                # (codebook index 0) introduces artifacts because the neural decoder's
+                # transformer attention and convolutional receptive fields propagate
+                # the zero-code context into the real audio output, causing ~0.1-0.2s of
+                # noise at the beginning of streamed audio.  Using the regular decode()
+                # path produces clean output at the cost of a slightly higher latency for
+                # the first chunk only.
+                window_codes = torch.stack(codes_buffer, dim=0)
+                wavs, sr = self.speech_tokenizer.decode(
+                    [{"audio_codes": window_codes.to(self.talker.device)}]
                )
+                wav = wavs[0].astype(np.float32)
+                chunk = wav
            else:


+                if use_optimized_decode and hasattr(self.speech_tokenizer, 'decode_streaming'):
+                    wavs, sr = self.speech_tokenizer.decode_streaming(
+                        window.to(self.talker.device),
+                        use_optimized=True,
+                        pad_to_size=decode_window_frames,
+                    )
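When the streaming path pads the window with `pad_to_size=decode_window_frames`, the decoded waveform also contains samples generated for the padding frames, which must be dropped. The sketch below assumes, without confirmation from the PR, that padding is appended after the real frames and that each frame yields a constant number of samples; `trim_padded_output` is a hypothetical helper, not a function from the repository.

```python
from typing import List, Sequence


def trim_padded_output(wav: Sequence[float], real_frames: int, samples_per_frame: int) -> List[float]:
    """Keep only the samples produced for real frames of a right-padded window.

    Assumes right padding and a constant samples-per-frame ratio; neither is
    confirmed by the PR.
    """
    keep = real_frames * samples_per_frame
    if keep > len(wav):
        raise ValueError("waveform is shorter than the real frames imply")
    return list(wav[:keep])


# Hypothetical 8-frame window, 4 samples per frame, only 3 real frames.
window_wav = [float(i) for i in range(8 * 4)]
print(len(trim_padded_output(window_wav, real_frames=3, samples_per_frame=4)))  # 12
```

Trimming removes the samples generated for the padding itself, but, as the PR argues, it cannot undo the influence the padding already had on the samples of the real frames.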


fix(tts): Fix initial noise in streaming playback

e32945f


chatgpt-codex-connector Bot reviewed May 24, 2026


groxaxo requested a review from Copilot May 24, 2026 22:59

Copilot started reviewing on behalf of groxaxo May 24, 2026 22:59

Copilot AI reviewed May 24, 2026


groxaxo marked this pull request as draft June 4, 2026 20:56

biggsun wants to merge 1 commit into groxaxo:main from biggsun:main

4 participants
