(fix): audio gaps in tts by narsimhaReddyJuspay · Pull Request #749 · juspay/clairvoyance

narsimhaReddyJuspay · 2026-05-07T09:28:50Z

Summary by CodeRabbit

New Features
- Added audio pre-buffering for Daily WebRTC conversations to improve audio delivery consistency
- Optimized audio chunk timing configuration for smoother cadence during voice interactions
Documentation
- Added technical analysis documentation for audio handling in Daily transport mode

coderabbitai · 2026-05-07T09:29:04Z

Walkthrough

This PR mitigates audible TTS audio gaps in Breeze Buddy's Daily WebRTC mode by introducing a new audio pre-buffering processor, reducing transport chunk size to 20ms, and conditionally wiring these into the agent pipeline for daily mode only.

Changes

Daily Mode Audio Gap Fixes

Layer / File(s)	Summary
Audio Pre-Buffer Processor `app/ai/voice/agents/breeze_buddy/processors/audio_pre_buffer.py`	New `AudioPreBufferProcessor` class buffers the first N `OutputAudioRawFrame` frames per speaking turn before releasing them downstream, then switches to pass-through mode; resets and flushes on `BotStoppedSpeakingFrame`.
Processor Module Exports `app/ai/voice/agents/breeze_buddy/processors/__init__.py`	Imports and exports `AudioPreBufferProcessor` via `__all__`.
Transport Audio Chunking `app/ai/voice/agents/breeze_buddy/agent/transport.py`	Daily `DailyParams` now configured with `audio_out_10ms_chunks=2` to emit 20ms chunks for smoother WebRTC cadence.
Daily Mode Pipeline Wiring `app/ai/voice/agents/breeze_buddy/agent/pipeline.py`, `app/ai/voice/agents/breeze_buddy/agent/__init__.py`	`build_pipeline()` gains `is_daily_mode` parameter; when enabled in agent mode (not stream), inserts `AudioPreBufferProcessor` between TTS and transport output, and reorders `context_aggregator.assistant()` after transport output. Agent initialization passes `is_daily_mode=self.is_daily_mode` to `build_pipeline()`.
Root Cause Analysis & Fix Plan `docs/TTS_AUDIO_GAP_ROOT_CAUSE_ANALYSIS.md`	Documents post-rebase analysis: marks 500ms inter-context silence and SDK version issues as fixed; identifies remaining active cause (missing `without_mixer` audio pacing); specifies prioritized fixes and verification plan.
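
The buffering behaviour summarised for `AudioPreBufferProcessor` can be pictured as a small state machine. The sketch below is not the PR's code and does not use pipecat: `PreBuffer`, `push_audio`, `end_turn`, and the default of 3 frames are hypothetical names and values chosen only to illustrate "hold the first N frames, then pass through, then flush and reset when the bot stops speaking".

```python
from typing import List


class PreBuffer:
    """Hold back the first `prebuffer_frames` audio frames of a turn, then pass through."""

    def __init__(self, prebuffer_frames: int = 3) -> None:  # 3 is an assumed value
        self.prebuffer_frames = prebuffer_frames
        self._held: List[bytes] = []
        self._passthrough = False

    def push_audio(self, frame: bytes) -> List[bytes]:
        """Return the frames that should be forwarded downstream right now."""
        if self._passthrough:
            return [frame]
        self._held.append(frame)
        if len(self._held) < self.prebuffer_frames:
            return []  # still filling the buffer
        released, self._held = self._held, []
        self._passthrough = True
        return released

    def end_turn(self) -> List[bytes]:
        """Bot stopped speaking: flush any leftovers and re-arm for the next turn."""
        leftover, self._held = self._held, []
        self._passthrough = False
        return leftover


buf = PreBuffer(prebuffer_frames=3)
out = [buf.push_audio(f) for f in (b"a", b"b", b"c", b"d")]
print(out)  # [[], [], [b'a', b'b', b'c'], [b'd']]
```

The flush in `end_turn` matters for short utterances: a turn with fewer frames than the threshold would otherwise never be played.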

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

juspay/clairvoyance#481: Modifies Breeze Buddy pipeline wiring and adds new FrameProcessors.
juspay/clairvoyance#591: Edits the same build_pipeline() construction and agent wiring for different processor insertion patterns.
juspay/clairvoyance#714: Modifies transport parameter wiring in app/ai/voice/agents/breeze_buddy/agent/transport.py.

Suggested reviewers

Devansh-1218
amreetkhuntia
sharifajahanshaik

Poem

🐰 A gap in the audio, a puzzle to solve,
Pre-buffering frames so the silence can dissolve,
Twenty milliseconds, a rhythmic new beat,
Daily mode's flowing, so smooth and so sweet! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly addresses the main objective: fixing audio gaps in TTS output, which is reflected in all code changes (pipeline buffering, transport audio chunk configuration, and the root cause analysis document).
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


Copilot

Pull request overview

This PR targets audible TTS audio gaps in Breeze Buddy’s Daily (WebRTC) mode by improving the cadence and startup buffering of outbound audio, and documents the investigation/root causes.

Changes:

- Reduce Daily outbound write granularity by setting `audio_out_10ms_chunks=2` (20ms chunks).
- Add an `AudioPreBufferProcessor` and insert it (Daily agent-mode only) between TTS and `transport.output()` to pre-buffer initial audio frames per bot turn.
- Add a root-cause analysis document describing confirmed/eliminated causes and a verification plan.
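
To make the 20ms figure concrete, the arithmetic behind `audio_out_10ms_chunks` and the latency cost of pre-buffering can be written out. This is a rough sketch under stated assumptions: 16-bit PCM, a 24000 Hz mono output (the PR does not state the sample rate), and hypothetical helper names. The latency helper takes the duration of the frames being buffered as an input, since frames arriving from TTS need not be 20ms long.

```python
def chunk_duration_ms(audio_out_10ms_chunks: int) -> int:
    """Duration of one transport write, given the number of 10 ms units per chunk."""
    return 10 * audio_out_10ms_chunks


def chunk_bytes(sample_rate: int, channels: int, audio_out_10ms_chunks: int) -> int:
    """Size of one chunk in bytes, assuming 16-bit (2-byte) PCM samples."""
    samples_per_10ms = sample_rate // 100
    return samples_per_10ms * audio_out_10ms_chunks * channels * 2


def prebuffer_latency_ms(frames: int, frame_ms: int) -> int:
    """Extra start-of-turn delay from holding back `frames` frames of `frame_ms` each."""
    return frames * frame_ms


print(chunk_duration_ms(4))        # 40 (the default noted in the RCA document)
print(chunk_duration_ms(2))        # 20
print(chunk_bytes(24000, 1, 2))    # 960
print(prebuffer_latency_ms(3, 20)) # 60
```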

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.


File	Description
docs/TTS_AUDIO_GAP_ROOT_CAUSE_ANALYSIS.md	Adds RCA + proposed fixes and verification plan for Daily-mode TTS gaps.
app/ai/voice/agents/breeze_buddy/processors/audio_pre_buffer.py	Introduces a new processor that buffers the first N `OutputAudioRawFrame`s per bot turn.
app/ai/voice/agents/breeze_buddy/processors/__init__.py	Exports `AudioPreBufferProcessor` from the processors package.
app/ai/voice/agents/breeze_buddy/agent/transport.py	Configures Daily transport to emit smaller audio chunks (`audio_out_10ms_chunks=2`).
app/ai/voice/agents/breeze_buddy/agent/pipeline.py	Adds `is_daily_mode` pipeline flag and conditionally inserts the pre-buffer processor.
app/ai/voice/agents/breeze_buddy/agent/__init__.py	Passes `is_daily_mode` into `build_pipeline()` when constructing the agent pipeline.

+
+## Diagnostic Tests Run
+
+13 tests across 6 test classes in `tests/test_audio_gap_diagnosis.py`. All pass.


+| Test Class | What It Tests | Result |
+|------------|---------------|--------|
+| TestEventLoopContention | Does BB's extra tasks starve audio output? | **ELIMINATED** — max gap 1.1ms even with 80 competing tasks |
+| TestSOXRResampler | Does resampler state clearing cause discontinuities? | **ELIMINATED** — discontinuity ratio 0.19 (threshold 2.0) |
+| TestDailySDKCallbackLatency | Does `call_soon` / `write_frames` callback delay? | **ELIMINATED** — P99 0.35ms under 30 competing tasks |
+| TestBaselinePipelineJitter | Baseline frame delivery jitter | **ELIMINATED** — sub-ms jitter in ideal conditions |
+| TestInterContextSilence | 500ms silence between TTS audio contexts | **CONFIRMED** (now fixed by pipecat 1.1.0) |
+| TestAggregateSentencesImpact | aggregate_sentences creates more context boundaries | **REVISED** — not a cause post-rebase (sentence aggregation is beneficial) |
+
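
For readers who want a feel for what an event-loop contention test measures, here is a minimal standalone sketch. It is not taken from `tests/test_audio_gap_diagnosis.py`; the function names, the 2ms pacing interval, and the lateness metric are assumptions, and the numbers it prints are machine-dependent and not comparable to the figures in the table.

```python
import asyncio
import time


async def _busy(stop: asyncio.Event) -> None:
    # Competing task that yields to the event loop as often as it can.
    while not stop.is_set():
        await asyncio.sleep(0)


async def max_lateness_ms(
    competing_tasks: int, frames: int = 50, interval_s: float = 0.002
) -> float:
    """Worst lateness (ms) of a paced producer relative to its ideal schedule."""
    stop = asyncio.Event()
    tasks = [asyncio.create_task(_busy(stop)) for _ in range(competing_tasks)]
    worst = 0.0
    start = time.perf_counter()
    for i in range(1, frames + 1):
        target = start + i * interval_s
        await asyncio.sleep(max(0.0, target - time.perf_counter()))
        worst = max(worst, (time.perf_counter() - target) * 1000.0)
    stop.set()
    await asyncio.gather(*tasks)
    return worst


if __name__ == "__main__":
    result = asyncio.run(max_lateness_ms(competing_tasks=80))
    print(f"max lateness with 80 competing tasks: {result:.2f} ms")
```

Measuring lateness against an ideal schedule, rather than the raw gap between wake-ups, keeps a single slow wake-up from being counted twice.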


 async def build_pipeline(
    transport: Any,
    stt: Optional[Any],
    llm: Optional[Any],
    tts: Optional[Any],
    vad_analyzer: Optional[SileroVADAnalyzer] = None,
    configurations: Optional[ConfigurationModel] = None,
    on_user_idle_timeout: Optional[Callable[[int], Any]] = None,
    mode: Literal["agent", "stream"] = "agent",
+    is_daily_mode: bool = False,
 ) -> tuple[
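
The effect of the new `is_daily_mode` flag on stage ordering, as described in the walkthrough, can be sketched with plain strings standing in for processors. This is an illustration only: `tail_stages` and the stage labels are hypothetical, and the ordering in the non-Daily branch is an assumption inferred from the word "reorders" rather than something shown in the diff.

```python
from typing import List, Literal


def tail_stages(mode: Literal["agent", "stream"], is_daily_mode: bool) -> List[str]:
    """Return labels for the pipeline stages from TTS onward (illustrative only)."""
    if mode == "agent" and is_daily_mode:
        # Pre-buffer sits between TTS and transport output; the assistant
        # context aggregator moves after transport output.
        return ["tts", "audio_pre_buffer", "transport_output", "assistant_context"]
    # Assumed pre-existing order for every other combination.
    return ["tts", "assistant_context", "transport_output"]


print(tail_stages("agent", True))
print(tail_stages("stream", True))
```

Because the flag defaults to `False`, existing callers of `build_pipeline()` keep their current behaviour unless they opt in.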


coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/ai/voice/agents/breeze_buddy/processors/audio_pre_buffer.py`:
- Around line 20-24: Remove the extra blank line between the import block and
the class definition so Black formatting passes: ensure there are exactly two
blank lines between the import statements (including "from
pipecat.processors.frame_processor import FrameDirection, FrameProcessor") and
the "class AudioPreBufferProcessor(FrameProcessor):" declaration by deleting one
of the three blank lines.

In `@docs/TTS_AUDIO_GAP_ROOT_CAUSE_ANALYSIS.md`:
- Around line 158-165: Update the doc block to reflect the actual
implementation: replace references to the non-existent utils/audio_pacing.py
transport-wrapper with the implemented pipeline FrameProcessor in
app/ai/voice/agents/breeze_buddy/processors/audio_pre_buffer.py (mention the
FrameProcessor/AudioPreBuffer processor name and that it buffers 2–3 initial
audio chunks before forwarding to the Daily SDK rather than wrapping
write_audio_frame). Keep the rest of the text (why, risk, rollback) but change
the "File" and "Approach" lines to point to
app/ai/voice/agents/breeze_buddy/processors/audio_pre_buffer.py and describe the
implemented buffering pipeline stage.
- Around line 87-100: The fenced code block containing the audio flow should
include a language tag to silence markdownlint MD040; change the opening fence
from ``` to ```text for the block that starts with "TTS audio frames" (which
documents BaseOutputTransport.MediaSender.handle_audio_frame(), _next_frame(),
_audio_task_handler(), DailyOutputTransport.write_audio_frame(), and
CustomAudioSource.write_frames()) so the block is treated as plain text by the
linter.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 76bbbe12-a9dd-4fd0-9b7b-ca7d84911cff

📥 Commits

Reviewing files that changed from the base of the PR and between a40123c and c41dee7.

📒 Files selected for processing (6)

app/ai/voice/agents/breeze_buddy/agent/__init__.py
app/ai/voice/agents/breeze_buddy/agent/pipeline.py
app/ai/voice/agents/breeze_buddy/agent/transport.py
app/ai/voice/agents/breeze_buddy/processors/__init__.py
app/ai/voice/agents/breeze_buddy/processors/audio_pre_buffer.py
docs/TTS_AUDIO_GAP_ROOT_CAUSE_ANALYSIS.md

coderabbitai · 2026-05-07T09:34:55Z

+from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
+
+
+
+class AudioPreBufferProcessor(FrameProcessor):


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix the extra blank line to unblock CI (Black formatting failure).

There are three consecutive blank lines between the import block and the class definition (lines 21–23). Black requires exactly two, so the CI formatting check fails on this file.

🛠️ Proposed fix

 from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
 
 
-
 class AudioPreBufferProcessor(FrameProcessor):

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class AudioPreBufferProcessor(FrameProcessor):


coderabbitai · 2026-05-07T09:34:55Z

+**Audio flow**:
+```
+TTS audio frames
+  → BaseOutputTransport.MediaSender.handle_audio_frame()
+    → resample + buffer in bytearray
+    → chunk into 40ms pieces (4 x 10ms default)
+    → put into _audio_queue
+  → _next_frame() / without_mixer
+    → yield immediately (NO pacing)
+  → _audio_task_handler()
+    → DailyOutputTransport.write_audio_frame()
+      → CustomAudioSource.write_frames() [daily-python native]
+      → await completion callback
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced code block to suppress the markdownlint MD040 warning.

📝 Proposed fix

-```
+```text
 TTS audio frames
   → BaseOutputTransport.MediaSender.handle_audio_frame()

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 88-88: Fenced code blocks should have a language specified

(MD040, fenced-code-language)
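
A quick local check for the same class of problem can be approximated in a few lines. This is a simplified stand-in, not markdownlint's implementation: `untagged_fences` is a hypothetical helper that only recognises three-backtick fences and ignores tilde fences and longer fence runs.

```python
from typing import List

FENCE = "`" * 3  # built this way so the example can sit inside a fenced block


def untagged_fences(markdown: str) -> List[int]:
    """Return 1-based line numbers of opening fences that carry no language tag."""
    hits: List[int] = []
    inside = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if not inside and stripped == FENCE:
            hits.append(number)  # opening fence with nothing after the backticks
        inside = not inside
    return hits


sample = "\n".join(["intro", FENCE, "x", FENCE, FENCE + "text", "y", FENCE])
print(untagged_fences(sample))  # [2]
```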


coderabbitai · 2026-05-07T09:34:55Z

+### Fix 2: Add Pre-Buffering Before Audio Output (Priority: MEDIUM, Effort: MEDIUM)
+
+**What**: Buffer 2-3 audio chunks (40-60ms) before starting to write to Daily SDK
+**Why**: Gives the Daily SDK a head start so small delivery delays don't starve the WebRTC play cursor. The SDK has an internal buffer that absorbs jitter once it has some runway — the problem is the initial burst where the play cursor starts immediately with zero buffer.
+**File**: New file `app/ai/voice/agents/breeze_buddy/utils/audio_pacing.py`
+**Approach**: Wrap the Daily transport's `write_audio_frame` to buffer initial chunks before streaming
+**Risk**: Adds 40-60ms initial TTS latency; acceptable for voice calls
+**Rollback**: Remove the wrapper


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix 2 description is stale — references a file and approach that don't match the actual implementation.

The doc describes an unimplemented transport-wrapper approach at app/ai/voice/agents/breeze_buddy/utils/audio_pacing.py, but the PR actually ships a pipeline FrameProcessor at app/ai/voice/agents/breeze_buddy/processors/audio_pre_buffer.py. Someone reading this document later will look for utils/audio_pacing.py and find nothing.

Consider updating lines 158–165 to reflect what was actually implemented.


(fix): audio gaps in tts

c41dee7

Copilot AI review requested due to automatic review settings May 7, 2026 09:28

Copilot started reviewing on behalf of narsimhaReddyJuspay May 7, 2026 09:29 View session

Copilot AI reviewed May 7, 2026

View reviewed changes

coderabbitai Bot reviewed May 7, 2026

View reviewed changes


(fix): audio gaps in tts #749

narsimhaReddyJuspay wants to merge 1 commit into juspay:release from narsimhaReddyJuspay:fix-for-tts-gaps-in-daily
