feat(tts): add Soniox TTS provider end-to-end by swaroopvarma1 · Pull Request #748 · juspay/clairvoyance

swaroopvarma1 · 2026-05-07T09:05:35Z

Summary

Wires Soniox WebSocket TTS into the shared builder/factory pattern alongside ElevenLabs, Cartesia, and Sarvam. Templates can now select Soniox via tts_configuration.provider = "soniox".
New app/ai/voice/tts/soniox.py: thin SonioxTTSConfig + build_soniox_tts wrapping pipecat's SonioxTTSService (no subclassing — pipecat handles the WS protocol, multiplexing, keepalives), plus a small _generate_soniox_audio one-shot WS synth for greeting prep (pipecat's Soniox client is streaming-only, so the one-shot is implemented directly against the same documented WS protocol).
Adds SONIOX to TTSProvider enum and a Soniox example to the TTSConfig docstring; wires the soniox branch into get_tts_service and generate_audio; adds hardcoded soniox defaults (Adrian / tts-rt-v1-preview / en) to BB_SPEECH_PROVIDER_DEFAULTS. Reuses the existing SONIOX_API_KEY from the STT integration.

What's exposed

| `TTSConfig` field | Soniox use |
| --- | --- |
| `voice_id` | maps to Soniox `voice` (e.g. `Adrian`, plus the v3 voices listed in `SonioxTTSSpeakerV3`) |
| `model` | maps to Soniox `model` (default `tts-rt-v1-preview`) |
| `language` | parsed via `_parse_language` to a `Language` enum, then converted by pipecat to a Soniox 2-letter code (`en`, `ml`, `hi`, ...) |
| `speed` / `volume` / `emotion` / `pitch` | silently ignored; Soniox doesn't accept them |

Other Soniox-specific knobs (sample_rate 16000, audio_format pcm_s16le) are set to telephony-friendly defaults inside the builder; pipecat resamples downstream and convert_to_mulaw produces 8 kHz mu-law for Twilio/Plivo/Exotel.
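A minimal sketch of the field mapping described above, assuming a plain dict stands in for `TTSConfig`. The helper name `to_soniox_params` and the dict shapes are hypothetical and not taken from the PR; only the field names, defaults, and fixed audio settings come from the description.

```python
from typing import Any, Dict

# Defaults quoted in the PR description; the dict layout is an assumption.
SONIOX_DEFAULTS = {"voice": "Adrian", "model": "tts-rt-v1-preview", "language": "en"}


def to_soniox_params(tts_config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: map TTSConfig-style fields to Soniox parameters.

    speed / volume / emotion / pitch are never read, so they are dropped
    silently, mirroring the behaviour described in the table.
    """
    return {
        "voice": tts_config.get("voice_id") or SONIOX_DEFAULTS["voice"],
        "model": tts_config.get("model") or SONIOX_DEFAULTS["model"],
        "language": tts_config.get("language") or SONIOX_DEFAULTS["language"],
        # Telephony-friendly values the builder is described as setting itself.
        "sample_rate": 16000,
        "audio_format": "pcm_s16le",
    }


params = to_soniox_params(
    {"provider": "soniox", "voice_id": "Adrian", "language": "ml", "speed": 1.2}
)
```

Here `speed` does not appear in `params`, and `model` falls back to the hardcoded default because the template did not set it.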

Dynamic config

Picked up automatically by existing helpers:

BB_VOICE_DEFAULTS_SONIOX (Redis JSON, optional override of hardcoded defaults via BB_VOICE_PROVIDER_DEFAULTS)
BB_SONIOX_AGGREGATE_SENTENCES (Redis bool, via BB_AGGREGATE_SENTENCES("soniox"))
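
As a rough illustration of how an optional Redis JSON override could layer on top of the hardcoded defaults, here is a sketch. The function name `resolve_soniox_defaults`, the key names inside the dict, and the fallback-on-bad-JSON behaviour are assumptions for illustration; the real helper behind `BB_VOICE_PROVIDER_DEFAULTS` may differ.

```python
import json
from typing import Dict, Optional

# Hardcoded values quoted in the PR description; key names are assumed.
HARDCODED_SONIOX_DEFAULTS = {
    "voice_id": "Adrian",
    "model": "tts-rt-v1-preview",
    "language": "en",
}


def resolve_soniox_defaults(redis_json: Optional[str]) -> Dict[str, str]:
    """Hypothetical merge: Redis JSON (if present and valid) overrides defaults."""
    merged = dict(HARDCODED_SONIOX_DEFAULTS)
    if not redis_json:
        return merged
    try:
        override = json.loads(redis_json)
    except json.JSONDecodeError:
        # Assumed behaviour: an unparsable override is ignored.
        return merged
    if isinstance(override, dict):
        merged.update({k: v for k, v in override.items() if v is not None})
    return merged


no_override = resolve_soniox_defaults(None)
hindi_override = resolve_soniox_defaults('{"language": "hi"}')
bad_override = resolve_soniox_defaults("not json")
```

With a partial override such as `{"language": "hi"}`, only the language changes and the voice and model keep their hardcoded values.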

Test plan

uv run pyrefly check passes (verified locally — 0 errors)
uv run black --check . && uv run isort --check . --profile black pass
Construct a TTS service via the factory with provider=soniox and verify the resulting service is SonioxTTSService with expected voice / model / language / audio_format (verified locally)
Make an outbound call with a template that has "tts_configuration": {"provider": "soniox", "voice_id": "Adrian", "model": "tts-rt-v1-preview", "language": "en"} and confirm audio plays correctly
Test a template with a dynamic greeting (variables in static_greeting text) and verify _generate_soniox_audio returns playable audio that converts cleanly to mu-law
Try a non-English language (e.g. "language": "ml") and confirm Soniox accepts it and produces audio

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added Soniox as a supported text-to-speech provider. Users can now configure voice synthesis with customizable models, languages, and audio formats for audio generation.

Wires Soniox WebSocket TTS into the existing builder/factory pattern alongside ElevenLabs, Cartesia, and Sarvam. Templates can now select Soniox via tts_configuration.provider="soniox" with voice_id, model, and language.

- New app/ai/voice/tts/soniox.py: SonioxTTSConfig + build_soniox_tts thin wrapper over pipecat's SonioxTTSService, plus _generate_soniox_audio for greeting prep (pipecat's Soniox client is streaming-only, so the one-shot is a small WS exchange against the same protocol)
- Add SONIOX to TTSProvider enum and to TTSConfig docstring
- Wire soniox branch in get_tts_service and generate_audio
- Add hardcoded soniox defaults to BB_SPEECH_PROVIDER_DEFAULTS

Reuses the existing SONIOX_API_KEY from STT. BB_VOICE_DEFAULTS_SONIOX and BB_SONIOX_AGGREGATE_SENTENCES Redis keys are picked up automatically by the existing dynamic-config helpers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-07T09:06:33Z

Walkthrough

Adds Soniox as a new Text-to-Speech provider integrated into the Breeze Buddy voice agent. Changes span type definitions, service implementation, public exports, wiring into the audio pipeline, and default configuration.

Changes

Soniox TTS Provider Integration

| Layer / File(s) | Summary |
| --- | --- |
| Data Types and Configuration: `app/ai/voice/agents/breeze_buddy/template/types.py`, `app/ai/voice/tts/soniox.py` | Adds `SONIOX` member to `TTSProvider` enum and defines `SonioxTTSConfig` dataclass with API key, voice, model, language, sample rate, audio format, and sentence aggregation settings. TTSConfig docstring updated with Soniox example. |
| Soniox Service and Audio Generation: `app/ai/voice/tts/soniox.py` | Implements `build_soniox_tts` to construct a Soniox service with text aggregation mode mapping, and `_generate_soniox_audio` as an async WebSocket synthesizer that validates API key, parses language with EN fallback, streams base64-encoded PCM audio, and handles Soniox-reported errors. |
| Package Exports: `app/ai/voice/tts/__init__.py` | Imports and exports `SonioxTTSConfig` and `build_soniox_tts` to the public TTS package API. |
| Breeze Buddy Integration: `app/ai/voice/agents/breeze_buddy/tts/__init__.py` | Wires Soniox into `get_tts_service` (validates `SONIOX_API_KEY`, resolves aggregation and parameters, builds service) and `generate_audio` (routes to `_generate_soniox_audio` with `input_format="raw"`, converts result to mulaw). |
| Default Configuration: `app/core/config/dynamic.py` | Adds `"soniox"` entry to `BB_SPEECH_PROVIDER_DEFAULTS` with voice, model, and language defaults. |

Sequence Diagram(s)

sequenceDiagram
    participant BBClient as Breeze Buddy Client
    participant Service as get_tts_service
    participant SonioxService as SonioxTTSService
    participant Synthesis as _generate_soniox_audio
    participant WebSocket as Soniox WebSocket
    participant AudioProcessing as Audio Processing
    
    BBClient->>Service: request TTS (provider="soniox")
    Service->>Service: validate SONIOX_API_KEY
    Service->>Service: fetch aggregation settings
    Service->>SonioxService: build with config
    SonioxService-->>Service: constructed service
    Service-->>BBClient: SonioxTTSService ready
    
    BBClient->>Synthesis: generate_audio(text)
    Synthesis->>Synthesis: apply voice/model defaults
    Synthesis->>Synthesis: parse & validate language
    Synthesis->>WebSocket: send config + text JSON
    WebSocket-->>Synthesis: stream base64 audio chunks
    Synthesis->>Synthesis: decode & concatenate PCM
    WebSocket-->>Synthesis: terminated signal
    Synthesis-->>BBClient: pcm_s16le bytes
    
    BBClient->>AudioProcessing: convert_to_mulaw(pcm)
    AudioProcessing-->>BBClient: mulaw audio
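
The receive half of the exchange in the diagram can be sketched as a pure function over messages that have already arrived. This is an illustration only: `collect_pcm` is a hypothetical name, it works on a list of strings rather than a live WebSocket, and it raises `RuntimeError` where the PR raises a bare `Exception`. The message fields (`audio`, `terminated`, `error_code`, `error_message`) are the ones visible in the reviewed hunk.

```python
import base64
import json
from typing import Iterable, List


def collect_pcm(raw_messages: Iterable[str]) -> bytes:
    """Decode base64 audio chunks until a message carries terminated=true."""
    chunks: List[bytes] = []
    for raw in raw_messages:
        try:
            msg = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip frames that are not valid JSON
        if msg.get("error_code") is not None:
            raise RuntimeError(
                f"Soniox TTS error {msg['error_code']}: {msg.get('error_message', '')}"
            )
        audio_b64 = msg.get("audio")
        if audio_b64:
            chunks.append(base64.b64decode(audio_b64))
        if msg.get("terminated"):
            break  # anything after this frame is ignored
    return b"".join(chunks)


frames = [
    json.dumps({"audio": base64.b64encode(b"\x01\x00").decode()}),
    "not json",
    json.dumps({"audio": base64.b64encode(b"\x02\x00").decode(), "terminated": True}),
    json.dumps({"audio": base64.b64encode(b"\xff\xff").decode()}),
]
pcm = collect_pcm(frames)
```

Keeping the decoding logic separate from the socket like this would also make it straightforward to unit test without a network connection.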

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

juspay/clairvoyance#461: Adds another new TTS provider with similar modifications to TTS initialization and package exports.
juspay/clairvoyance#559: Modifies the same TTSProvider enum in template/types.py to add other providers.
juspay/clairvoyance#421: Foundational Breeze Buddy TTS integration that this PR extends with Soniox provider support.

Poem

🐰 A new voice joins the choir so sweet,
Soniox brings beats to Breeze's fleet,
From WebSocket streams to mulaw's call,
The rabbit hops through types and all—
Configuration, exports, and more,
Audio magic through every door! 🎵

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main objective: adding Soniox as a new TTS provider end-to-end across the codebase. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Copilot

Pull request overview

Adds Soniox as a first-class TTS provider in the Breeze Buddy TTS factory/builder flow, enabling templates to select tts_configuration.provider = "soniox" and supporting both streaming TTS (via pipecat) and one-shot greeting synthesis (direct WebSocket protocol).

Changes:

Added Soniox defaults to dynamic per-provider TTS defaults (BB_SPEECH_PROVIDER_DEFAULTS).
Introduced app/ai/voice/tts/soniox.py with SonioxTTSConfig, build_soniox_tts, and a one-shot _generate_soniox_audio WebSocket synth helper.
Wired soniox into Breeze Buddy TTS service construction and greeting audio generation; extended TTSProvider / TTSConfig docs accordingly.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file

| File | Description |
| --- | --- |
| app/core/config/dynamic.py | Adds Soniox hardcoded provider defaults for Redis-override merge. |
| app/ai/voice/tts/soniox.py | Implements Soniox TTS builder + one-shot WebSocket greeting synthesis helper. |
| app/ai/voice/tts/__init__.py | Exposes Soniox builder/config via shared TTS package exports. |
| app/ai/voice/agents/breeze_buddy/tts/__init__.py | Adds `soniox` branches to `get_tts_service` and `generate_audio`. |
| app/ai/voice/agents/breeze_buddy/template/types.py | Extends `TTSProvider` enum and documents Soniox example config. |

+    language: Optional[str] = None,
+    sample_rate: int = 16000,
+) -> bytes:
+    """One-shot synth via Soniox WebSocket for greeting prep.
+
+    Opens a single WebSocket, sends config + text + ``text_end:true``, collects
+    base64-encoded audio chunks until ``terminated``, and returns the
+    concatenated PCM bytes.
+
+    Returns 16-bit little-endian PCM mono at the requested ``sample_rate``,
+    matching ``convert_to_mulaw`` expectations for downstream telephony use.


coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

app/ai/voice/tts/soniox.py (1)

28-32: 💤 Low value

__all__ is not sorted (Ruff RUF022).

♻️ Proposed fix

 __all__ = [
     "SonioxTTSConfig",
+    "_generate_soniox_audio",
     "build_soniox_tts",
-    "_generate_soniox_audio",
 ]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/ai/voice/tts/soniox.py` around lines 28 - 32, The __all__ export list in
soniox.py is not alphabetically sorted; update the __all__ list to be sorted
lexicographically (e.g., "SonioxTTSConfig", "build_soniox_tts",
"_generate_soniox_audio" -> order them alphabetically) so it satisfies Ruff
RUF022; edit the __all__ variable declaration to reorder the strings accordingly
while leaving the exact symbol names unchanged.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a0ba975a-4f8d-4335-a31a-88b0b7a89768

📥 Commits

Reviewing files that changed from the base of the PR and between a40123c and f8f744d.

📒 Files selected for processing (5)

app/ai/voice/agents/breeze_buddy/template/types.py
app/ai/voice/agents/breeze_buddy/tts/__init__.py
app/ai/voice/tts/__init__.py
app/ai/voice/tts/soniox.py
app/core/config/dynamic.py

coderabbitai · 2026-05-07T09:14:04Z

+    elif provider == "soniox":
+        audio_data = await _generate_soniox_audio(
+            text=text,
+            voice=resolved.voice_id,
+            model=resolved.model,
+            language=resolved.language,
+        )
+        input_format = "raw"


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Language lookup inconsistency between streaming and one-shot paths.

get_tts_service uses _parse_language (key-based: Language[code.upper().replace("-", "_")]) for robustness, but generate_audio forwards resolved.language as a raw string to _generate_soniox_audio which applies value-based Language(language). These resolve identically for lowercase BCP 47 codes ("en", "hi"), but diverge for uppercase inputs ("EN", "EN_IN"): the value-based path silently falls back to Language.EN, while the streaming path would correctly map to the intended enum member.

🛠️ Proposed fix — pre-parse with _parse_language before forwarding

-    elif provider == "soniox":
-        audio_data = await _generate_soniox_audio(
-            text=text,
-            voice=resolved.voice_id,
-            model=resolved.model,
-            language=resolved.language,
-        )
-        input_format = "raw"
+    elif provider == "soniox":
+        audio_data = await _generate_soniox_audio(
+            text=text,
+            voice=resolved.voice_id,
+            model=resolved.model,
+            language=_parse_language(resolved.language, Language.EN),
+        )
+        input_format = "raw"

This also requires updating _generate_soniox_audio's language parameter type from Optional[str] to Optional[Language] to skip re-parsing internally.
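
To make the key-based versus value-based difference concrete, here is a small sketch. The `Language` enum below is a three-member stand-in written for this example, not pipecat's actual enum, and `parse_by_key` / `parse_by_value` are hypothetical names that mimic the two lookup styles described in this comment.

```python
from enum import Enum
from typing import Optional


class Language(str, Enum):
    """Stand-in enum for illustration only (not pipecat's Language)."""

    EN = "en"
    EN_IN = "en-IN"
    HI = "hi"


def parse_by_key(code: Optional[str], default: Language) -> Language:
    """Key-based lookup: normalise to the member name, e.g. 'en-in' -> EN_IN."""
    if not code:
        return default
    try:
        return Language[code.upper().replace("-", "_")]
    except KeyError:
        return default


def parse_by_value(code: Optional[str], default: Language) -> Language:
    """Value-based lookup: the string must match a member value exactly."""
    if not code:
        return default
    try:
        return Language(code)
    except ValueError:
        return default


same_for_lowercase = parse_by_key("hi", Language.EN) is parse_by_value("hi", Language.EN)
key_result = parse_by_key("EN_IN", Language.EN)
value_result = parse_by_value("EN_IN", Language.EN)
```

In this sketch `"EN_IN"` resolves to `Language.EN_IN` through the key-based path, while the value-based path finds no match and falls back to the default.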

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/ai/voice/agents/breeze_buddy/tts/__init__.py` around lines 262 - 269, The streaming vs one-shot language handling is inconsistent: update the generate_audio call site to pre-parse resolved.language using _parse_language (the same logic get_tts_service uses) before passing it to _generate_soniox_audio, and change _generate_soniox_audio's language parameter type from Optional[str] to Optional[Language] so it skips internal value-based parsing; reference get_tts_service, _parse_language, generate_audio, _generate_soniox_audio, resolved.language and the Language enum when making the change.

coderabbitai · 2026-05-07T09:14:04Z

+    logger.info(
+        f"Synthesizing greeting with Soniox (pcm_s16le {sample_rate}): {text[:50]}..."
+    )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

PII exposure risk: greeting text logged verbatim.

By the time _generate_soniox_audio is called, template variables (e.g., {{customer_name}}) are already substituted, so text[:50] can contain customer names. Per project guidelines, logging sensitive data is a major compliance risk.

🛡️ Proposed fix — log metadata only

-    logger.info(
-        f"Synthesizing greeting with Soniox (pcm_s16le {sample_rate}): {text[:50]}..."
-    )
+    logger.info(
+        f"Synthesizing greeting with Soniox (pcm_s16le {sample_rate}), "
+        f"text_length={len(text)} chars"
+    )
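
If some traceability is still wanted, one option is to log a short fingerprint alongside the length. The sketch below is an assumption-laden illustration: `describe_text_for_logs` is a hypothetical helper, the logger name is made up, and the 12-character SHA-256 prefix is an arbitrary choice.

```python
import hashlib
import logging

logger = logging.getLogger("soniox_tts_example")  # hypothetical logger name


def describe_text_for_logs(text: str) -> str:
    """Return length plus a truncated SHA-256 digest, never the text itself."""
    fingerprint = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    return f"text_length={len(text)} chars, text_sha256={fingerprint}"


greeting = "Hello Priya, your order is on the way"  # made-up sample greeting
summary = describe_text_for_logs(greeting)
logger.info("Synthesizing greeting with Soniox (pcm_s16le %d), %s", 16000, summary)
```

A hash of a short, predictable greeting can in principle be guessed by brute force, so logging the length alone, as in the proposed fix, is the more conservative choice.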

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/ai/voice/tts/soniox.py` around lines 128 - 130, The current logger.info in _generate_soniox_audio exposes substituted template text (PII); remove logging of text[:50] and instead log only non-PII metadata—e.g., sample_rate, voice/id, text length, and a redacted or hashed fingerprint if you need traceability—and update the logger.info call in _generate_soniox_audio to output those safe fields only so no customer-sensitive content is written to logs.

coderabbitai · 2026-05-07T09:14:04Z

+    async with websocket_connect(SONIOX_TTS_WS_URL) as ws:
+        await ws.send(json.dumps(config_msg))
+        await ws.send(json.dumps(text_msg))
+
+        async for raw in ws:
+            try:
+                msg = json.loads(raw)
+            except json.JSONDecodeError:
+                continue
+
+            error_code = msg.get("error_code")
+            if error_code is not None:
+                error_message = msg.get("error_message", "")
+                raise Exception(f"Soniox TTS error {error_code}: {error_message}")
+
+            audio_b64 = msg.get("audio")
+            if audio_b64:
+                audio_chunks.append(base64.b64decode(audio_b64))
+
+            if msg.get("terminated"):
+                break


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

No overall timeout on the WebSocket receive loop.

open_timeout=10 handles connection establishment, but once connected, async for raw in ws: blocks until either terminated=True arrives or the keepalive mechanism fires (~40 s at the default ping_interval=20 + ping_timeout=20). A silent Soniox-side hang stalls greeting preparation — and therefore call startup — for up to 40 seconds.

The websockets library itself recommends asyncio.timeout() (Python ≥ 3.11) for per-receive timeouts, and the project requires Python 3.11+.

⏱️ Proposed fix — add asyncio.timeout around the WS block

+import asyncio
 ...
 async def _generate_soniox_audio(
     text: str,
     voice: Optional[str] = None,
     model: Optional[str] = None,
     language: Optional[str] = None,
     sample_rate: int = 16000,
+    timeout_secs: float = 30.0,
 ) -> bytes:
     ...
-    async with websocket_connect(SONIOX_TTS_WS_URL) as ws:
-        await ws.send(json.dumps(config_msg))
-        await ws.send(json.dumps(text_msg))
-
-        async for raw in ws:
-            ...
+    try:
+        async with asyncio.timeout(timeout_secs):
+            async with websocket_connect(SONIOX_TTS_WS_URL) as ws:
+                await ws.send(json.dumps(config_msg))
+                await ws.send(json.dumps(text_msg))
+
+                async for raw in ws:
+                    ...
+    except TimeoutError:
+        raise Exception(
+            f"Soniox TTS timed out after {timeout_secs}s waiting for audio"
+        )
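
The effect of an overall deadline can be tried without a real WebSocket. The sketch below swaps in `asyncio.wait_for` for the `asyncio.timeout` context manager so it also runs on interpreters older than 3.11; the intent (bounding the whole receive loop) is the same. `silent_stream` and `collect_with_deadline` are hypothetical names, and the async generator only simulates a connection that goes quiet.

```python
import asyncio
from typing import AsyncIterator, List


async def silent_stream() -> AsyncIterator[bytes]:
    """Stand-in for a WebSocket that sends one chunk and then goes quiet."""
    yield b"\x00\x01"
    await asyncio.sleep(3600)  # simulates a provider-side hang
    yield b"unreachable"


async def collect_with_deadline(stream: AsyncIterator[bytes], timeout_secs: float) -> bytes:
    """Drain the stream, failing loudly if it takes longer than timeout_secs."""
    chunks: List[bytes] = []

    async def _drain() -> None:
        async for chunk in stream:
            chunks.append(chunk)

    try:
        await asyncio.wait_for(_drain(), timeout=timeout_secs)
    except asyncio.TimeoutError:
        raise RuntimeError(
            f"TTS timed out after {timeout_secs}s waiting for audio"
        ) from None
    return b"".join(chunks)


def run_demo() -> str:
    """Run the sketch against the hanging stream and report what happened."""
    try:
        asyncio.run(collect_with_deadline(silent_stream(), timeout_secs=0.05))
    except RuntimeError as exc:
        return str(exc)
    return "completed"


result = run_demo()
```

The caller sees a descriptive error shortly after the deadline instead of waiting for the keepalive mechanism to give up.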

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/ai/voice/tts/soniox.py` around lines 133 - 153, The WebSocket receive loop opened with websocket_connect(SONIOX_TTS_WS_URL) has no overall receive timeout; wrap the receive/processing block (the async for raw in ws: loop that decodes messages, checks error_code, collects audio_chunks, and breaks on msg.get("terminated")) in an asyncio.timeout(...) context (e.g., configurable seconds) so a silent Soniox hang raises asyncio.TimeoutError; on timeout cancel/close the ws and raise an informative exception so callers know the TTS request failed instead of hanging indefinitely. Ensure the timeout is applied after sending config_msg and text_msg and that you still handle JSONDecodeError and existing Soniox error_code logic inside the timeout.

Copilot AI review requested due to automatic review settings May 7, 2026 09:05

Copilot started reviewing on behalf of swaroopvarma1 May 7, 2026 09:06

Copilot AI reviewed May 7, 2026

coderabbitai Bot reviewed May 7, 2026

swaroopvarma1 wants to merge 1 commit into release from feat/soniox-tts
