fix: Azure OpenAI Realtime 2024-10-01-preview compatibility with Pipe…#758
fix: Azure OpenAI Realtime 2024-10-01-preview compatibility with Pipe…#758amreetkhuntia wants to merge 1 commit into
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
WalkthroughThis PR introduces ChangesAzure Realtime Legacy Compatibility Layer
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
e508dfa to
c464865
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/ai/voice/llm/realtime/azure_realtime.py (1)
178-202:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReject or apply
config.voice; it is currently ignored.
AzureRealtimeConfigstill acceptsvoice, and Line 192 logs it, but the constructedSessionPropertiesnever uses it. Any caller expecting a per-session voice override will silently get the deployment default instead. Either map the value into the legacy flat session payload or fail fast whenconfig.voiceis set.Minimal fail-fast option
def build_azure_realtime_llm(config: AzureRealtimeConfig) -> AzureRealtimeLegacyLLMService: + if config.voice is not None: + raise ValueError( + "Legacy Azure realtime mode doesn't support per-session voice overrides yet." + ) + session_properties = SessionProperties( type=None, audio=None, )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/ai/voice/llm/realtime/azure_realtime.py` around lines 178 - 202, The constructor ignores config.voice while creating SessionProperties, causing silent use of deployment defaults; update the AzureRealtimeLegacyLLMService instantiation to either (1) map config.voice into the legacy flat session schema (populate the appropriate top-level voice field on SessionProperties or Settings before creating AzureRealtimeLegacyLLMService) or (2) add a fail-fast check in the factory that raises/returns an error when config.voice is set (validate AzureRealtimeConfig.voice and log/raise), referencing AzureRealtimeConfig, config.voice, SessionProperties, and AzureRealtimeLegacyLLMService.Settings to locate where to apply the change.
🧹 Nitpick comments (1)
app/ai/voice/llm/realtime/azure_realtime.py (1)
83-99: ⚡ Quick winDon't silently swallow translation failures.
Lines 97-98 currently hide any bug in the compatibility shim and just forward the raw event. That turns protocol drift into opaque downstream parse failures with no breadcrumb in logs. Please narrow the fallback to decode/type errors and log unexpected exceptions.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/ai/voice/llm/realtime/azure_realtime.py` around lines 83 - 99, The _translate coroutine currently swallows all exceptions in its try/except around json.loads and renaming, which hides bugs; update the exception handling in async def _translate(self) to only catch JSONDecodeError and TypeError when parsing/inspecting the message (from json.loads and data.get), and let other exceptions propagate after logging them via logger.exception or logger.error with the unexpected exception and the raw msg; keep the existing fallback behavior of yielding the original msg on parse/type errors but ensure unexpected exceptions are logged with context (include event_type, _RENAMES, and msg) so issues in the compatibility shim are observable.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/ai/voice/llm/realtime/azure_realtime.py`:
- Around line 41-45: The code imports websocket_connect from
websockets.asyncio.client but uses the wrong header parameter name for the
pinned websockets==11.0.3; update the call that invokes websocket_connect (where
it currently passes additional_headers) to use extra_headers instead, or
alternatively update the project dependency to websockets>=14.0 if you intend to
keep additional_headers; locate the call site by searching for
websocket_connect(...) or the function/method in azure_realtime.py that
establishes the websocket and replace additional_headers with extra_headers to
match websockets 11.x.
---
Outside diff comments:
In `@app/ai/voice/llm/realtime/azure_realtime.py`:
- Around line 178-202: The constructor ignores config.voice while creating
SessionProperties, causing silent use of deployment defaults; update the
AzureRealtimeLegacyLLMService instantiation to either (1) map config.voice into
the legacy flat session schema (populate the appropriate top-level voice field
on SessionProperties or Settings before creating AzureRealtimeLegacyLLMService)
or (2) add a fail-fast check in the factory that raises/returns an error when
config.voice is set (validate AzureRealtimeConfig.voice and log/raise),
referencing AzureRealtimeConfig, config.voice, SessionProperties, and
AzureRealtimeLegacyLLMService.Settings to locate where to apply the change.
---
Nitpick comments:
In `@app/ai/voice/llm/realtime/azure_realtime.py`:
- Around line 83-99: The _translate coroutine currently swallows all exceptions
in its try/except around json.loads and renaming, which hides bugs; update the
exception handling in async def _translate(self) to only catch JSONDecodeError
and TypeError when parsing/inspecting the message (from json.loads and
data.get), and let other exceptions propagate after logging them via
logger.exception or logger.error with the unexpected exception and the raw msg;
keep the existing fallback behavior of yielding the original msg on parse/type
errors but ensure unexpected exceptions are logged with context (include
event_type, _RENAMES, and msg) so issues in the compatibility shim are
observable.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 897a4bd5-1162-4dd3-8278-4cf06a614e8f
📒 Files selected for processing (1)
app/ai/voice/llm/realtime/azure_realtime.py
There was a problem hiding this comment.
Pull request overview
Adds a compatibility shim so Clairvoyance can use Azure OpenAI Realtime deployments on api-version=2024-10-01-preview with Pipecat v1.1.0 (OpenAI Realtime v1 wire format), by translating schema/event differences on the wire.
Changes:
- Introduces
AzureRealtimeLegacyLLMServiceto rewrite outboundresponse.createpayload fields (includingoutput_modalities→modalitiesand forcing["audio","text"]). - Wraps the inbound websocket with
_TranslatingWebSocketto rename legacy Azure server event types to Pipecat/OpenAI v1 equivalents. - Updates the Azure realtime builder to omit session audio/type configuration to avoid legacy flat-schema incompatibilities.
12ca277 to
3061bcc
Compare
…cat v1.1.0 Introduce AzureRealtimeLegacyLLMService shim and _TranslatingWebSocket to bridge schema differences between Azure api-version=2024-10-01-preview and Pipecat v1.1.0's OpenAI Realtime v1 wire format. Outbound fixes (client → server): - Rename output_modalities → modalities in response.create events - Upgrade modalities ["audio"] → ["audio", "text"] (Azure rejects audio-only) - Strip session.type and session.audio (flat schema incompatible with nested) Inbound fixes (server → client, via _TranslatingWebSocket): - conversation.item.created → conversation.item.added - response.audio.delta → response.output_audio.delta - response.audio.done → response.output_audio.done - response.audio_transcript.delta → response.output_audio_transcript.delta - response.audio_transcript.done → response.output_audio_transcript.done - response.text.delta → response.output_text.delta - response.text.done → response.output_text.done
3061bcc to
f020daf
Compare
…cat v1.1.0
Introduce AzureRealtimeLegacyLLMService shim and _TranslatingWebSocket to bridge schema differences between Azure api-version=2024-10-01-preview and Pipecat v1.1.0's OpenAI Realtime v1 wire format.
Outbound fixes (client → server):
Inbound fixes (server → client, via _TranslatingWebSocket):
Summary by CodeRabbit