fix: Azure OpenAI Realtime 2024-10-01-preview compatibility with Pipe…#758

Open
amreetkhuntia wants to merge 1 commit into juspay:release from amreetkhuntia:fix/azure-realtime-pipecat-compat

Conversation

@amreetkhuntia
Contributor

@amreetkhuntia amreetkhuntia commented May 12, 2026

…cat v1.1.0

Introduce AzureRealtimeLegacyLLMService shim and _TranslatingWebSocket to bridge schema differences between Azure api-version=2024-10-01-preview and Pipecat v1.1.0's OpenAI Realtime v1 wire format.

Outbound fixes (client → server):

  • Rename output_modalities → modalities in response.create events
  • Upgrade modalities ["audio"] → ["audio", "text"] (Azure rejects audio-only)
  • Strip session.type and session.audio (flat schema incompatible with nested)
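
The outbound fixes listed above can be sketched as a pure function over the event dictionary. This is a minimal illustration, not the PR's code: the function name `normalize_outbound_event` is hypothetical, and the payload shapes (a `response` object inside `response.create`, a `session` object inside `session.update`) are assumptions about the wire format.

```python
import copy
from typing import Any


def normalize_outbound_event(event: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a client event adjusted for the legacy Azure schema.

    Hypothetical helper: the event shapes used here are assumptions.
    """
    out = copy.deepcopy(event)  # never mutate the caller's event

    if out.get("type") == "response.create":
        response = out.get("response")
        if isinstance(response, dict):
            # Rename output_modalities -> modalities.
            if "output_modalities" in response:
                response["modalities"] = response.pop("output_modalities")
            # Audio-only is rejected, so upgrade it to audio + text.
            if response.get("modalities") == ["audio"]:
                response["modalities"] = ["audio", "text"]

    if out.get("type") == "session.update":
        session = out.get("session")
        if isinstance(session, dict):
            # Drop the nested-schema fields the flat schema does not accept.
            session.pop("type", None)
            session.pop("audio", None)

    return out
```

Working on a deep copy keeps the original event intact, which makes the function easy to test in isolation from the websocket.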

Inbound fixes (server → client, via _TranslatingWebSocket):

  • conversation.item.created → conversation.item.added
  • response.audio.delta → response.output_audio.delta
  • response.audio.done → response.output_audio.done
  • response.audio_transcript.delta → response.output_audio_transcript.delta
  • response.audio_transcript.done → response.output_audio_transcript.done
  • response.text.delta → response.output_text.delta
  • response.text.done → response.output_text.done
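
The inbound renames above amount to a lookup table applied to each message's `type` field. The sketch below shows one way to do that; the names `LEGACY_TO_V1_EVENT_TYPES` and `translate_inbound` are hypothetical, and it assumes each message arrives as a JSON text frame.

```python
import json

# Rename map taken from the list above (legacy Azure name -> name Pipecat expects).
LEGACY_TO_V1_EVENT_TYPES = {
    "conversation.item.created": "conversation.item.added",
    "response.audio.delta": "response.output_audio.delta",
    "response.audio.done": "response.output_audio.done",
    "response.audio_transcript.delta": "response.output_audio_transcript.delta",
    "response.audio_transcript.done": "response.output_audio_transcript.done",
    "response.text.delta": "response.output_text.delta",
    "response.text.done": "response.output_text.done",
}


def translate_inbound(raw: str) -> str:
    """Rewrite the event type of one server message, or return it unchanged."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return raw  # not JSON: forward as-is
    if not isinstance(data, dict):
        return raw
    event_type = data.get("type")
    if not isinstance(event_type, str):
        return raw
    new_type = LEGACY_TO_V1_EVENT_TYPES.get(event_type)
    if new_type is None:
        return raw  # event name is the same in both schemas
    data["type"] = new_type
    return json.dumps(data)
```

Messages whose type is not in the map are returned byte-for-byte, so only the seven renamed events pay the re-serialization cost.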

Summary by CodeRabbit

  • New Features
    • Added support for Azure OpenAI Realtime API version 2024-10-01-preview deployments.
    • Enhanced Azure Realtime services with automatic event translation and payload normalization for seamless integration with the latest Azure OpenAI endpoints.

Copilot AI review requested due to automatic review settings May 12, 2026 11:13
@coderabbitai
coderabbitai Bot commented May 12, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 86643576-89ea-4b63-9b85-5326f7aeb4d2

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR introduces AzureRealtimeLegacyLLMService, a compatibility shim for Azure OpenAI Realtime deployments running api-version=2024-10-01-preview. The shim renames inbound event types and rewrites outbound response payloads so that the legacy Azure schema matches what Pipecat v1.1.0 (OpenAI Realtime v1 wire format) expects to parse, letting these preview deployments work with the current framework schema.

Changes

Azure Realtime Legacy Compatibility Layer

| Layer / File(s) | Summary |
| --- | --- |
| Legacy Compatibility Overview and Module Exports (app/ai/voice/llm/realtime/azure_realtime.py) | Module docstring now documents Azure legacy compatibility for api-version=2024-10-01-preview with event/field rename mappings; imports are adjusted and __all__ is updated to export AzureRealtimeLegacyLLMService. |
| Inbound Message Translation Wrapper (app/ai/voice/llm/realtime/azure_realtime.py) | New _TranslatingWebSocket async-iterable wrapper intercepts inbound messages, conditionally rewrites the type field using a rename map to match Pipecat expectations, and yields translated messages while delegating other operations. |
| Legacy Service Implementation and Connection Wrapping (app/ai/voice/llm/realtime/azure_realtime.py) | New AzureRealtimeLegacyLLMService subclass overrides send_client_event to map outbound output_modalities → modalities and normalize ["audio"] to ["audio", "text"]; overrides _connect to wrap the websocket in _TranslatingWebSocket with error routing via push_error(...). |
| Builder Function Configuration (app/ai/voice/llm/realtime/azure_realtime.py) | build_azure_realtime_llm now constructs AzureRealtimeLegacyLLMService with SessionProperties(type=None, audio=None) to skip nested session audio configuration, and logs legacy compat mode. |
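
The "async-iterable wrapper that delegates other operations" pattern described for _TranslatingWebSocket can be sketched generically as follows. This is not the PR's implementation: `TranslatingIterable`, `FakeSocket`, and `demo` are hypothetical names, and the fake socket only stands in for a real websocket connection.

```python
import asyncio
from typing import Any, AsyncIterator, Callable


class TranslatingIterable:
    """Wrap an async-iterable connection and translate each inbound message."""

    def __init__(self, inner: Any, translate: Callable[[Any], Any]) -> None:
        self._inner = inner
        self._translate = translate

    def __aiter__(self) -> AsyncIterator[Any]:
        return self._translated()

    async def _translated(self) -> AsyncIterator[Any]:
        async for message in self._inner:
            yield self._translate(message)

    def __getattr__(self, name: str) -> Any:
        # Only reached for attributes missing on the wrapper itself,
        # so send(), close(), etc. pass straight through to the inner object.
        return getattr(self._inner, name)


class FakeSocket:
    """Hypothetical stand-in for a websocket connection (demo only)."""

    def __init__(self, messages: list[str]) -> None:
        self._messages = list(messages)
        self.sent: list[str] = []

    async def __aiter__(self) -> AsyncIterator[str]:
        for message in self._messages:
            yield message

    async def send(self, message: str) -> None:
        self.sent.append(message)


async def demo() -> tuple[list[str], list[str]]:
    ws = TranslatingIterable(FakeSocket(["a", "b"]), str.upper)
    await ws.send("ping")                   # delegated to FakeSocket.send
    received = [m async for m in ws]        # translated on the way in
    return received, ws.sent
```

Running `asyncio.run(demo())` exercises both paths: iteration goes through the translator, while `send` and the `sent` attribute are resolved on the wrapped object.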

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 Hops through Azure's newer halls,
Where schemas dance in different calls,
A translating shim steps in with grace,
To bridge the time, the type, the place—
Now legacy and fresh align,
In harmony, both work divine! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding compatibility for Azure OpenAI Realtime 2024-10-01-preview with Pipecat v1.1.0. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/ai/voice/llm/realtime/azure_realtime.py (1)

178-202: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject or apply config.voice; it is currently ignored.

AzureRealtimeConfig still accepts voice, and Line 192 logs it, but the constructed SessionProperties never uses it. Any caller expecting a per-session voice override will silently get the deployment default instead. Either map the value into the legacy flat session payload or fail fast when config.voice is set.

Minimal fail-fast option
 def build_azure_realtime_llm(config: AzureRealtimeConfig) -> AzureRealtimeLegacyLLMService:
+    if config.voice is not None:
+        raise ValueError(
+            "Legacy Azure realtime mode doesn't support per-session voice overrides yet."
+        )
+
     session_properties = SessionProperties(
         type=None,
         audio=None,
     )
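
The other option mentioned above, mapping the voice into the legacy flat session payload, might look roughly like the sketch below. It assumes the flat schema accepts a top-level `voice` field on the session object; the helper name `legacy_session_update` and its parameters are hypothetical and are not part of Pipecat or the PR.

```python
from typing import Any, Optional


def legacy_session_update(
    voice: Optional[str] = None,
    instructions: Optional[str] = None,
) -> dict[str, Any]:
    """Build a flat-schema session.update event (hypothetical helper).

    Assumption: the legacy flat schema takes `voice` at the top level of
    `session`, rather than nested under an `audio` object.
    """
    session: dict[str, Any] = {}
    if instructions is not None:
        session["instructions"] = instructions
    if voice is not None:
        # Only send the field when the caller asked for an override,
        # so the deployment default applies otherwise.
        session["voice"] = voice
    return {"type": "session.update", "session": session}
```

Whichever option is chosen, the point of the comment stands: a configured voice should either reach the server or raise, never be dropped silently.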
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/ai/voice/llm/realtime/azure_realtime.py` around lines 178 - 202, The
constructor ignores config.voice while creating SessionProperties, causing
silent use of deployment defaults; update the AzureRealtimeLegacyLLMService
instantiation to either (1) map config.voice into the legacy flat session schema
(populate the appropriate top-level voice field on SessionProperties or Settings
before creating AzureRealtimeLegacyLLMService) or (2) add a fail-fast check in
the factory that raises/returns an error when config.voice is set (validate
AzureRealtimeConfig.voice and log/raise), referencing AzureRealtimeConfig,
config.voice, SessionProperties, and AzureRealtimeLegacyLLMService.Settings to
locate where to apply the change.
🧹 Nitpick comments (1)
app/ai/voice/llm/realtime/azure_realtime.py (1)

83-99: ⚡ Quick win

Don't silently swallow translation failures.

Lines 97-98 currently hide any bug in the compatibility shim and just forward the raw event. That turns protocol drift into opaque downstream parse failures with no breadcrumb in logs. Please narrow the fallback to decode/type errors and log unexpected exceptions.
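
A minimal sketch of the narrowed handling suggested here, assuming messages arrive as JSON text. The names `translate_message` and `RENAMES` are hypothetical, and the map is a one-entry subset for brevity. `AttributeError` is caught alongside the two exceptions named in the comment because a JSON array has no `.get`.

```python
import json
import logging

logger = logging.getLogger(__name__)

# One-entry subset of the rename map, for illustration only.
RENAMES = {"response.audio.delta": "response.output_audio.delta"}


def translate_message(msg: str) -> str:
    """Rename the event type, forwarding unparseable messages unchanged."""
    try:
        data = json.loads(msg)
        event_type = data.get("type")
    except (json.JSONDecodeError, TypeError, AttributeError):
        # Expected failure modes: not JSON, or JSON that is not an object.
        return msg

    try:
        new_type = RENAMES.get(event_type)
        if new_type is None:
            return msg
        data["type"] = new_type
        return json.dumps(data)
    except Exception:
        # Anything here is a bug in the shim: log with context, then re-raise.
        logger.exception(
            "Unexpected error translating event type %r (raw message: %r)",
            event_type,
            msg,
        )
        raise
```

Splitting the work into two `try` blocks keeps the quiet fallback limited to parsing, while failures in the rename step leave a log entry before propagating.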

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/ai/voice/llm/realtime/azure_realtime.py` around lines 83 - 99, The
_translate coroutine currently swallows all exceptions in its try/except around
json.loads and renaming, which hides bugs; update the exception handling in
async def _translate(self) to only catch JSONDecodeError and TypeError when
parsing/inspecting the message (from json.loads and data.get), and let other
exceptions propagate after logging them via logger.exception or logger.error
with the unexpected exception and the raw msg; keep the existing fallback
behavior of yielding the original msg on parse/type errors but ensure unexpected
exceptions are logged with context (include event_type, _RENAMES, and msg) so
issues in the compatibility shim are observable.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/ai/voice/llm/realtime/azure_realtime.py`:
- Around line 41-45: The code imports websocket_connect from
websockets.asyncio.client but uses the wrong header parameter name for the
pinned websockets==11.0.3; update the call that invokes websocket_connect (where
it currently passes additional_headers) to use extra_headers instead, or
alternatively update the project dependency to websockets>=14.0 if you intend to
keep additional_headers; locate the call site by searching for
websocket_connect(...) or the function/method in azure_realtime.py that
establishes the websocket and replace additional_headers with extra_headers to
match websockets 11.x.
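
One way to tolerate both parameter names is to inspect the connect callable before passing headers. The sketch below assumes the callable declares the header parameter by name in its signature (`extra_headers` for the legacy client, `additional_headers` for the newer asyncio client); the helper `header_kwargs` is hypothetical.

```python
import inspect
from typing import Any, Callable, Mapping


def header_kwargs(
    connect: Callable[..., Any],
    headers: Mapping[str, str],
) -> dict[str, Any]:
    """Pick the header keyword that the given connect callable declares.

    Hypothetical helper: falls back to `extra_headers` when the newer
    `additional_headers` parameter is not present in the signature.
    """
    params = inspect.signature(connect).parameters
    if "additional_headers" in params:
        return {"additional_headers": headers}
    return {"extra_headers": headers}
```

A call site could then use `connect(uri, **header_kwargs(connect, headers))`. Pinning a single websockets version and using its parameter name directly, as the comment suggests, remains the simpler fix.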

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 897a4bd5-1162-4dd3-8278-4cf06a614e8f

📥 Commits

Reviewing files that changed from the base of the PR and between 108e2fa and e508dfa.

📒 Files selected for processing (1)
  • app/ai/voice/llm/realtime/azure_realtime.py

Comment thread app/ai/voice/llm/realtime/azure_realtime.py Outdated
Copilot AI left a comment

Pull request overview

Adds a compatibility shim so Clairvoyance can use Azure OpenAI Realtime deployments on api-version=2024-10-01-preview with Pipecat v1.1.0 (OpenAI Realtime v1 wire format), by translating schema/event differences on the wire.

Changes:

  • Introduces AzureRealtimeLegacyLLMService to rewrite outbound response.create payload fields (including output_modalities → modalities and forcing ["audio","text"]).
  • Wraps the inbound websocket with _TranslatingWebSocket to rename legacy Azure server event types to Pipecat/OpenAI v1 equivalents.
  • Updates the Azure realtime builder to omit session audio/type configuration to avoid legacy flat-schema incompatibilities.

Comment thread app/ai/voice/llm/realtime/azure_realtime.py Outdated
Comment thread app/ai/voice/llm/realtime/azure_realtime.py
Comment thread app/ai/voice/llm/realtime/azure_realtime.py Outdated
Comment thread app/ai/voice/llm/realtime/azure_realtime.py Outdated
@amreetkhuntia amreetkhuntia force-pushed the fix/azure-realtime-pipecat-compat branch 3 times, most recently from 12ca277 to 3061bcc Compare May 13, 2026 06:12
…cat v1.1.0

Introduce AzureRealtimeLegacyLLMService shim and _TranslatingWebSocket
to bridge schema differences between Azure api-version=2024-10-01-preview
and Pipecat v1.1.0's OpenAI Realtime v1 wire format.

Outbound fixes (client → server):
- Rename output_modalities → modalities in response.create events
- Upgrade modalities ["audio"] → ["audio", "text"] (Azure rejects audio-only)
- Strip session.type and session.audio (flat schema incompatible with nested)

Inbound fixes (server → client, via _TranslatingWebSocket):
- conversation.item.created       → conversation.item.added
- response.audio.delta            → response.output_audio.delta
- response.audio.done             → response.output_audio.done
- response.audio_transcript.delta → response.output_audio_transcript.delta
- response.audio_transcript.done  → response.output_audio_transcript.done
- response.text.delta             → response.output_text.delta
- response.text.done              → response.output_text.done
@amreetkhuntia amreetkhuntia force-pushed the fix/azure-realtime-pipecat-compat branch from 3061bcc to f020daf Compare May 13, 2026 07:08
5 participants