On iOS Safari and Chrome (tested on iPhone 16, iOS 18), the first agent utterance after startCall() produces static/audio artifacts that clear exactly when the first utterance ends. All subsequent audio is clean. We have confirmed the issue is not related to microphone permission timing or AudioContext cold-start — those have been addressed. The pattern suggests an iOS audio session category issue during the first WebRTC buffer cycle. Requesting that webAudioMix: { audioContext } or equivalent iOS audio session configuration be exposed through StartCallConfig so the audio session can be pre-configured before the first utterance begins.
Environment
retell-client-js-sdk: 2.0.8
livekit-client (resolved): 2.19.2
Device: iPhone 16, iOS 18 (Safari + Chrome)
NOT reproducible on: macOS Safari, macOS Chrome, Android Chrome, Windows Chrome
What we've ruled out
Microphone permission timing — we pre-flight getUserMedia and release the probe before startCall().
AudioContext cold-start — we unlock our own context and play a same-origin silent MP3 inside the user gesture before calling startCall().
AEC warmup — we tried muting the LiveKit element and playing a loud silence track for 1.5s before the first utterance. No effect.
Mic processing constraints — we tried disabling echoCancellation, noiseSuppression, and autoGainControl on the local mic track. No effect on the static (and it caused background noise oversensitivity).
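A minimal sketch of the first two checks above, assuming they run inside the tap handler that later calls startCall(). The helper name preflightAudio and the PreflightDeps shape are hypothetical and only exist so the example runs outside a browser; in a page, the dependencies would be navigator.mediaDevices.getUserMedia and a real AudioContext. The silent MP3 step is omitted.

```typescript
// Hypothetical stand-ins for the DOM types, so the sketch runs outside a browser.
interface TrackLike {
  stop(): void;
}
interface StreamLike {
  getTracks(): TrackLike[];
}
interface AudioContextLike {
  readonly state: string;
  resume(): Promise<void>;
}
interface PreflightDeps {
  getUserMedia(constraints: { audio: boolean }): Promise<StreamLike>;
  audioContext: AudioContextLike;
}

// Intended to be called from the click/tap handler, before startCall().
async function preflightAudio(deps: PreflightDeps): Promise<void> {
  // Kick off resume() first, while the user gesture is still active.
  const resumed =
    deps.audioContext.state === "running"
      ? Promise.resolve()
      : deps.audioContext.resume();

  // Request mic access, then release the probe stream right away
  // so it does not hold the microphone when the call starts.
  const probe = await deps.getUserMedia({ audio: true });
  probe.getTracks().forEach((track) => track.stop());

  await resumed;
}
```

In a browser this could be invoked as preflightAudio({ getUserMedia: (c) => navigator.mediaDevices.getUserMedia(c), audioContext: new AudioContext() }).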
Why this looks like an audio-session category issue
The static is present only during the first agent utterance.
It clears immediately and permanently after that utterance ends.
This is consistent with iOS Safari's WebRTC audio session settling into playAndRecord mode only after the first full buffer cycle.
What we're asking for
The LiveKit Room constructor accepts webAudioMix?: boolean | { audioContext: AudioContext }, which is LiveKit's documented workaround for iOS autoplay and audio routing issues. When enabled, it routes all remote audio through a shared Web Audio graph instead of audio elements, and lets us pass our own pre-unlocked AudioContext.
Currently, RetellWebClient constructs its internal Room with a hardcoded set of options:
    new Room({
      audioCaptureDefaults: {
        autoGainControl: true,
        echoCancellation: true,
        noiseSuppression: true,
        channelCount: 1,
        deviceId: config.captureDeviceId,
        sampleRate: config.sampleRate,
      },
      audioOutput: {
        deviceId: config.playbackDeviceId,
      },
      // webAudioMix is not forwarded
    })
StartCallConfig does not expose any way to pass through room-level options. We'd like one of the following:
Preferred: Add webAudioMix?: boolean | { audioContext: AudioContext } to StartCallConfig and forward it to the Room constructor.
Alternative: Add a roomOptions?: Partial< passthrough field to StartCallConfig so callers can supply their own webAudioMix, audioContext, or future LiveKit options without waiting for SDK updates.
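To make the preferred option concrete, here is a rough sketch of how the forwarding could look. It is an assumption about the shape of the change, not the SDK's actual internals: buildRoomOptions, StartCallConfigSketch, RoomOptionsSketch, and AudioContextLike are hypothetical names, and the types are local stand-ins so the example runs without livekit-client or a browser.

```typescript
// Hypothetical stand-in for the DOM AudioContext (structurally compatible with it).
interface AudioContextLike {
  readonly state: string;
  resume(): Promise<void>;
}

type WebAudioMix = boolean | { audioContext: AudioContextLike };

// Only the StartCallConfig fields used in the Room construction shown above,
// plus the proposed webAudioMix field.
interface StartCallConfigSketch {
  captureDeviceId?: string;
  playbackDeviceId?: string;
  sampleRate?: number;
  webAudioMix?: WebAudioMix; // proposed addition
}

// Local approximation of the options object passed to new Room(...).
interface RoomOptionsSketch {
  audioCaptureDefaults: {
    autoGainControl: boolean;
    echoCancellation: boolean;
    noiseSuppression: boolean;
    channelCount: number;
    deviceId?: string;
    sampleRate?: number;
  };
  audioOutput: { deviceId?: string };
  webAudioMix?: WebAudioMix;
}

function buildRoomOptions(config: StartCallConfigSketch): RoomOptionsSketch {
  const options: RoomOptionsSketch = {
    audioCaptureDefaults: {
      autoGainControl: true,
      echoCancellation: true,
      noiseSuppression: true,
      channelCount: 1,
      deviceId: config.captureDeviceId,
      sampleRate: config.sampleRate,
    },
    audioOutput: { deviceId: config.playbackDeviceId },
  };
  // Forward only when the caller set it, so existing callers keep the current behavior.
  if (config.webAudioMix !== undefined) {
    options.webAudioMix = config.webAudioMix;
  }
  return options;
}
```

With something like this in place, a caller could pass webAudioMix: { audioContext } in the startCall() config, using a context it already unlocked inside the user gesture, and callers that omit the field would see no change.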
Happy to test a beta or provide additional diagnostics.