Add AVSynchronizer to sync audio & video playback for Avatars by dangusev · Pull Request #466 · GetStream/Vision-Agents

dangusev · 2026-04-06T11:55:25Z

What's changed

- Added an `AVSynchronizer` class to sync audio and video playback for avatars.
- It ships with two helper classes, `_SyncedVideoTrack` and `_BufferTrackingAudioTrack`, which work together to control when video frames are played.
- `AVSynchronizer` creates both the video and audio tracks internally and exposes them to avatars.
- Updated the Anam and LemonSlice avatar plugins to use `AVSynchronizer`.
- Reduced the audio chunk size to 100 ms when feeding audio to the Anam avatar.

Summary by CodeRabbit

New Features
- Centralized audio–video synchronization for smoother, timed media playback and queuing.
- Avatar publishers now route media through the synchronizer for consistent delivery.
- RTC manager can proactively clear/interrupt playback when needed.
Bug Fixes
- Skip stale realtime audio and flush media on interruptions or turn changes to avoid glitches.
Tests
- Added tests covering sync timing, ordering, flush, and stop semantics.
Chores
- Example now requires LEMONSLICE_AGENT_ID environment variable.

coderabbitai · 2026-04-06T11:55:43Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8ebab023-6acb-4c38-a044-4c0b986723ff

📥 Commits

Reviewing files that changed from the base of the PR and between 0b91522 and 1b7f236.

📒 Files selected for processing (1)

agents-core/vision_agents/core/utils/av_synchronizer.py

🚧 Files skipped from review as they are similar to previous changes (1)

agents-core/vision_agents/core/utils/av_synchronizer.py

Walkthrough

Adds AVSynchronizer to coordinate audio buffering and timed release of queued video frames; refactors Anam and LemonSlice avatar publishers to use it; adds RTC interruption flushing and epoch-staleness guards; updates LemonSlice example to require LEMONSLICE_AGENT_ID and send an initial response; includes tests for synchronization behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Core Synchronization Module `agents-core/vision_agents/core/utils/av_synchronizer.py` | New `AVSynchronizer` plus `_BufferTrackingAudioTrack` and `_SyncedVideoTrack`. Tracks audio buffer depth, queues video frames with computed release timestamps, normalizes frame dimensions, exposes `video_track`, `audio_track`, and async `write_video()`, `write_audio()`, `flush()` APIs. |
| Anam Avatar Publisher `plugins/anam/vision_agents/plugins/anam/anam_avatar_publisher.py` | Replaced direct track management with `AVSynchronizer` (`self._sync`). `publish_*_track` now returns synchronizer tracks; receiver loops call `write_video()`/`write_audio()`; added `await self._sync.flush()` before interrupts; audio resample chunk changed from `24000`→`2400`; receiver task annotations updated. |
| LemonSlice Avatar Publisher, RTC Manager & Example `plugins/lemonslice/vision_agents/plugins/lemonslice/lemonslice_avatar_publisher.py`, `plugins/lemonslice/vision_agents/plugins/lemonslice/lemonslice_rtc_manager.py`, `plugins/lemonslice/example/lemonslice_avatar_example.py` | Publisher switched to `AVSynchronizer` and returns its tracks; added epoch-staleness filtering for realtime audio, flush+interrupt on audio-done when interrupted, and TurnStartedEvent flush/interrupt; RTC manager added `interrupt()` to send `lk.clear_buffer` RPC when connected and lowered a log level; example requires `LEMONSLICE_AGENT_ID` and sends an initial `agent.simple_response()` during join. |
| Tests `tests/test_utils/test_av_synchronizer.py` | New asyncio pytest suite validating initialization, audio-driven frame delay/release, FIFO ordering, frame repetition when no newer frame, `flush()` semantics, and stop/closed behavior. |

Sequence Diagram(s)

sequenceDiagram
    participant Writer as External Writer
    participant Sync as AVSynchronizer
    participant Audio as _BufferTrackingAudioTrack
    participant Video as _SyncedVideoTrack
    participant Recv as Frame Receiver

    Writer->>Sync: write_audio(pcm)
    Sync->>Audio: append PCM (update buffered duration)

    Writer->>Sync: write_video(frame)
    Sync->>Video: enqueue frame with release_at = now + Audio.buffered

    Recv->>Video: recv()
    alt now < next.release_at
        Video->>Recv: return last released frame (repeat, update PTS)
    else
        Video->>Video: dequeue frame, advance PTS/timebase
        Video->>Recv: return released frame
    end

    Writer->>Sync: flush()
    Sync->>Audio: clear buffer
    Sync->>Video: discard pending frames
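
The timing rule in the diagram (`release_at = now + Audio.buffered`) can be sketched with a toy model. This is not the real `AVSynchronizer`: the class name `ToySynchronizer`, the injectable `clock`, and the `play_audio()` helper are assumptions made for illustration, and frames are plain Python objects rather than media frames.

```python
import time
from collections import deque


class ToySynchronizer:
    """Toy model of audio-driven video release (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock            # injectable so the timing can be tested
        self.buffered_audio_s = 0.0   # seconds of audio written but not yet played
        self._pending = deque()       # (release_at, frame) pairs in FIFO order
        self._last_frame = None

    def write_audio(self, duration_s):
        self.buffered_audio_s += duration_s

    def play_audio(self, duration_s):
        # Hypothetical stand-in for the audio track draining its buffer.
        self.buffered_audio_s = max(0.0, self.buffered_audio_s - duration_s)

    def write_video(self, frame):
        # Delay the frame by the amount of audio still waiting to be played.
        release_at = self.clock() + self.buffered_audio_s
        self._pending.append((release_at, frame))

    def recv(self):
        # Release the next frame once its time has come, else repeat the last one.
        if self._pending and self.clock() >= self._pending[0][0]:
            _, self._last_frame = self._pending.popleft()
        return self._last_frame

    def flush(self):
        # Drop queued video and buffered audio together, as in the diagram.
        self._pending.clear()
        self.buffered_audio_s = 0.0
```

With half a second of audio buffered, a frame written now is held back until the clock has advanced by that half second; until then `recv()` keeps returning the previously released frame.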

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

The blue frame is a mouth sealed with frost; I press my ear to its eyelid.
Sound piles like damp leaves in a throat that counts the beats.
I tally delayed breaths until the image uncurls, patient and thin.
A tiny ceremony: air decides, pixels obey — a slow, obedient thaw.
Hands flush the waiting, and motion, relieved, stumbles back into light.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.98% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding AVSynchronizer to synchronize audio and video playback for avatars, which aligns with the primary objective of the PR. |


coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

plugins/anam/vision_agents/plugins/anam/anam_avatar_publisher.py (1)
213-220: ⚠️ Potential issue | 🟠 Major

Flush-before-interrupt still races with the receiver loops.

_video_receiver() and _audio_receiver() keep feeding self._sync while this block awaits flush() and interrupt(). Any in-flight frames from the interrupted turn can therefore repopulate the synchronizer right after the flush, so barge-in still leaks stale avatar audio/video. Please gate the receiver side with a turn/generation token, or re-flush after the interrupt completes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/anam/vision_agents/plugins/anam/anam_avatar_publisher.py` around
lines 213 - 220, The flush-before-interrupt races because _video_receiver() and
_audio_receiver() can push frames into self._sync while on_turn_started awaits
flush() and _session.interrupt(); fix by gating receivers with a turn token or
re-flushing after interrupt: update on_turn_started (the handler) to acquire
self._send_lock, increment/set a self._current_turn_token marker before calling
await self._sync.flush() and await self._session.interrupt(), then either
(preferred) keep the lock while setting the token and perform a second await
self._sync.flush() after interrupt to ensure no new frames remain, AND modify
_video_receiver()/_audio_receiver() to check the token (e.g., only push frames
when frame.token == self._current_turn_token) or skip pushing when token
mismatches; reference on_turn_started, _video_receiver, _audio_receiver,
self._sync, self._send_lock, and _session.interrupt().

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@agents-core/vision_agents/core/utils/av_synchronizer.py`:
- Around line 112-130: Add a synchronizer-level shutdown method (e.g., async def
stop(self) / close(self)) on AVSynchronizer that cleanly stops the owned tracks
by calling the underlying track teardown APIs (e.g., await
self._video_track.stop()/flush() and await self._audio_track.stop()/close() or
the appropriate async stop/close on QueuedVideoTrack and AudioStreamTrack) and
ensure any publisher close() paths call this synchronizer.stop()/close() so the
published video/audio tracks are terminated and consumers don't see a frozen
last frame; update references in the class to include this new method and call
it from the publishers' close hooks.
- Around line 61-77: recv() currently bypasses the parent pacing and always
returns immediately, so it ignores self.fps; fix it by preserving the parent's
cadence: when there is no pending frame ready to release (i.e., after checking
self._pending and possibly updating self.last_frame), if you would otherwise
return last_frame immediately you must instead await the parent's paced recv()
to honor self.fps (e.g., call await super().recv() or otherwise reuse the
QueuedVideoTrack pacing logic) and then set pts/time_base from next_timestamp;
keep using next_timestamp(), self._pending, self.last_frame, and self.fps in the
logic so the method only falls back to repeating last_frame after waiting up to
1/self.fps.

In `@plugins/lemonslice/example/lemonslice_avatar_example.py`:
- Around line 39-41: Fail fast when LEMONSLICE_AGENT_ID is missing by validating
os.getenv("LEMONSLICE_AGENT_ID") before constructing
lemonslice.LemonSliceAvatarPublisher: retrieve the value into a variable (e.g.,
agent_id), if it's falsy raise a clear RuntimeError or ValueError with a
descriptive message, and then pass that variable into
LemonSliceAvatarPublisher(agent_id=agent_id). Also add "LEMONSLICE_AGENT_ID" to
the required env var list at the top of the file so the header documents this
dependency.
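
A minimal sketch of the fail-fast check described for the LemonSlice example, assuming a small helper named `require_env` (the helper name and the error message are illustrative, not taken from the PR):

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear error."""
    value = os.getenv(name)
    if not value:
        # Covers both "unset" and "set to an empty string".
        raise RuntimeError(f"{name} must be set to run this example")
    return value
```

In the example file, the result of `require_env("LEMONSLICE_AGENT_ID")` would then be passed as `agent_id` to `LemonSliceAvatarPublisher`, as the comment above suggests.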



ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ccae3d86-019b-4766-a4e1-1d5d4f403d09

📥 Commits

Reviewing files that changed from the base of the PR and between b94e391 and 2bc3da4.

📒 Files selected for processing (6)

agents-core/vision_agents/core/utils/av_synchronizer.py
plugins/anam/vision_agents/plugins/anam/anam_avatar_publisher.py
plugins/lemonslice/example/lemonslice_avatar_example.py
plugins/lemonslice/vision_agents/plugins/lemonslice/lemonslice_avatar_publisher.py
tests/test_utils/__init__.py
tests/test_utils/test_av_synchronizer.py


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (4)

plugins/anam/vision_agents/plugins/anam/anam_avatar_publisher.py (2)

193-194: Chunk size reduction to 100ms noted.

The change from 24000 to 2400 samples (100ms at 24kHz mono) was flagged in a past review as a separate fix. Ensure this is mentioned in the PR description for the squash commit.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@plugins/anam/vision_agents/plugins/anam/anam_avatar_publisher.py` around
lines 193 - 194, The loop using pcm.resample(...).chunks(2400) reduces the audio
chunk size to 2400 samples (100ms at 24kHz mono); update the PR/squash commit
description to explicitly mention this chunk size change from the previous 24000
sample value and note that this was previously flagged for separate fixing so
reviewers can track the intentional change to 100ms in anam_avatar_publisher.py
(reference the pcm.resample(...).chunks(2400) call).
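
The 100 ms figure follows directly from the sample count and the sample rate. A small worked check, with `chunk_duration_ms` as a hypothetical helper name:

```python
def chunk_duration_ms(samples_per_channel: int, sample_rate_hz: int) -> float:
    """Duration of an audio chunk in milliseconds."""
    return samples_per_channel * 1000 / sample_rate_hz


new_chunk_ms = chunk_duration_ms(2400, 24000)   # chunk size after this PR
old_chunk_ms = chunk_duration_ms(24000, 24000)  # chunk size before this PR
print(new_chunk_ms, old_chunk_ms)  # prints: 100.0 1000.0
```

So the change shortens each chunk sent to the avatar from one second to a tenth of a second at 24 kHz.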

167-170: Duplicate _is_stale_epoch implementation.

This method is identical to the one in LemonSliceAvatarPublisher. Consider extracting it to a shared base class or utility in a follow-up to avoid drift.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@plugins/anam/vision_agents/plugins/anam/anam_avatar_publisher.py` around
lines 167 - 170, The _is_stale_epoch method in AnamAvatarPublisher duplicates
the implementation in LemonSliceAvatarPublisher; extract this logic into a
shared utility or base class to avoid duplication and future drift: create a new
helper (e.g., is_stale_epoch(llm, epoch) or add the method to a common base
class that both AnamAvatarPublisher and LemonSliceAvatarPublisher inherit from)
and replace the current _is_stale_epoch implementation in both classes to call
the shared helper; ensure the helper uses the same behavior (return
isinstance(llm, Realtime) and epoch != llm.epoch) and update imports/usages
accordingly.
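
One possible shape for the shared helper, keeping the behavior quoted above (`isinstance(llm, Realtime) and epoch != llm.epoch`). The `Realtime` class below is a stand-in defined only so the sketch runs on its own; the real class lives in the library and its constructor is not shown in this PR.

```python
from dataclasses import dataclass


@dataclass
class Realtime:
    """Stand-in for the library's Realtime LLM class (assumption)."""

    epoch: int = 0


def is_stale_epoch(llm: object, epoch: int) -> bool:
    """True when audio tagged with `epoch` predates the Realtime LLM's current epoch."""
    # Non-Realtime LLMs carry no epoch, so their audio is never treated as stale.
    return isinstance(llm, Realtime) and epoch != llm.epoch
```

Both publishers could call this function instead of each keeping a private `_is_stale_epoch` copy.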

agents-core/vision_agents/core/utils/av_synchronizer.py (2)

82-85: flush() clears video but also flushes audio—document this coupling.

The method discards pending video frames and flushes the companion audio track. This cross-track side effect might surprise callers expecting video-only behavior. The docstring is accurate, but the method name flush on a video track could be misleading.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@agents-core/vision_agents/core/utils/av_synchronizer.py` around lines 82 -
85, The flush method on the video synchronizer (async def flush) currently
clears video frames via self._pending.clear() and also flushes the companion
audio track via await self._audio_track.flush(), which is a surprising
cross-track side effect; either document this coupling in the method
name/docstring or split behavior: rename flush to flush_video (or add a
parameter like flush_audio: bool=False) and implement separate flush_audio logic
that calls self._audio_track.flush(), or keep flush as-is but update the
docstring to explicitly state that flush clears _pending and also flushes
_audio_track so callers know about the audio-side effect.

27-32: SDK internals dependency noted.

The buffered property reads _buffer and _bytes_per_sample directly from the SDK. A past review already flagged this—good that there's a plan to move this property upstream. Consider adding a brief comment here so future SDK changes get caught.

📝 Suggested comment

     @property
     def buffered(self) -> float:
         """Return the amount of seconds of audio pending in the buffer."""
+        # NOTE: Accesses SDK internals (_buffer, _bytes_per_sample).
+        # Update if AudioStreamTrack changes.
         return len(self._buffer) / (
             self.sample_rate * self.channels * self._bytes_per_sample
         )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@agents-core/vision_agents/core/utils/av_synchronizer.py` around lines 27 -
32, The buffered property directly reads SDK internals (_buffer and
_bytes_per_sample) which couples this code to the SDK; add a concise TODO
comment above the buffered property (referencing buffered, _buffer,
_bytes_per_sample, sample_rate, channels) stating that this access is temporary,
should be moved upstream into the SDK, and that any SDK change must update this
calculation accordingly (include guidance to replace direct internals with a
public API on the SDK when available).
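
The `buffered` calculation is a byte count divided by the byte rate of the stream. A standalone version of the same arithmetic, with `buffered_seconds` as a hypothetical free function and example values chosen for illustration:

```python
def buffered_seconds(
    buffer_len_bytes: int,
    sample_rate: int,
    channels: int,
    bytes_per_sample: int,
) -> float:
    """Seconds of PCM audio represented by a buffer of the given size."""
    bytes_per_second = sample_rate * channels * bytes_per_sample
    return buffer_len_bytes / bytes_per_second


# 48 kHz stereo with 2 bytes per sample is 192000 bytes per second,
# so 96000 buffered bytes correspond to half a second of audio.
print(buffered_seconds(96000, 48000, 2, 2))  # prints: 0.5
```

If the SDK later exposes this value publicly, the property can delegate to it and the private attribute access goes away.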

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@agents-core/vision_agents/core/utils/av_synchronizer.py`:
- Around line 61-80: The recv() override currently advances timestamps and
returns last_frame immediately, so pacing does not match QueuedVideoTrack;
either implement the same rate-limiting used by QueuedVideoTrack.recv() or
update the docstring to remove the misleading claim. To fix, wrap the waiting
logic when no pending frame is ready: if _pending is empty (or when re-using
last_frame), use asyncio.wait_for or asyncio.sleep to enforce a max wait of
1/self.fps before returning (mirror QueuedVideoTrack.recv()), calling
next_timestamp() after the wait; alternatively, if you choose not to enforce
pacing, update the method docstring in _SyncedVideoTrack.recv() to state that
callers drive the frame rate and that fps does not throttle recv() here. Ensure
references to recv, next_timestamp, _pending, last_frame and fps are updated
accordingly.
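
A sketch of the first option (enforcing the frame rate inside `recv()`), under stated assumptions: `PacedRepeater` is a hypothetical class, it omits the release-time check and PTS handling of `_SyncedVideoTrack`, and it does not reproduce how `QueuedVideoTrack.recv()` paces frames. It only shows one way to limit `recv()` to roughly `fps` calls per second while repeating the last frame when nothing new is queued.

```python
import asyncio
import time
from collections import deque


class PacedRepeater:
    """Hypothetical track-like object whose recv() is throttled to `fps`."""

    def __init__(self, fps: float) -> None:
        self.fps = fps
        self.pending = deque()   # frames waiting to be delivered
        self.last_frame = None
        self._next_at = 0.0      # monotonic time of the next allowed delivery

    async def recv(self):
        # Wait until the next frame slot instead of returning immediately.
        delay = self._next_at - time.monotonic()
        if delay > 0:
            await asyncio.sleep(delay)
        # Schedule the following slot one frame interval from now.
        self._next_at = max(self._next_at, time.monotonic()) + 1.0 / self.fps
        if self.pending:
            self.last_frame = self.pending.popleft()
        return self.last_frame  # repeats the previous frame when the queue is empty
```

The alternative named in the comment, documenting that callers drive the frame rate, needs no code change beyond the docstring.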



ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0d9bc108-50fe-4525-946d-6125d2da8d4c

📥 Commits

Reviewing files that changed from the base of the PR and between bdc0eb7 and 0b91522.

📒 Files selected for processing (7)

agents-core/vision_agents/core/utils/av_synchronizer.py
plugins/anam/vision_agents/plugins/anam/anam_avatar_publisher.py
plugins/lemonslice/example/lemonslice_avatar_example.py
plugins/lemonslice/vision_agents/plugins/lemonslice/lemonslice_avatar_publisher.py
plugins/lemonslice/vision_agents/plugins/lemonslice/lemonslice_rtc_manager.py
tests/test_utils/__init__.py
tests/test_utils/test_av_synchronizer.py

🚧 Files skipped from review as they are similar to previous changes (2)

plugins/lemonslice/example/lemonslice_avatar_example.py
tests/test_utils/test_av_synchronizer.py


github-actions bot added agents-core plugins labels Apr 6, 2026

coderabbitai bot reviewed Apr 6, 2026

aliev self-requested a review April 6, 2026 13:35

aliev requested changes Apr 6, 2026

dangusev added 10 commits April 6, 2026 21:13

Add AVSynchronizer class to sync av playback in avatars

45eb15c

Use AVSynchronizer in AnamAvatarPublisher

11d597a

Use AVSynchronizer in LemonSliceAvatarPublisher & update example

6a1bc69

_SyncedVideoTrack.recv(): respect "fps" setting

1a4742d

LemonSlice example: require LEMONSLICE_AGENT_ID to be set

f2810d2

simplify AVSynchronizer

45d2395

AnamAvatarPublisher: better barge-in support with Realtime models

7ff1725

LemonSliceAvatarPublisher: add barge-in support

cf8929b

Use get_running_loop

110bcb9

AnamAvatarPublisher: Add missing try except to _video_receiver

0b91522

dangusev force-pushed the fix/avatars-av-sync branch from bdc0eb7 to 0b91522 Compare April 6, 2026 20:50

coderabbitai bot reviewed Apr 6, 2026

aliev approved these changes Apr 7, 2026

_SyncedVideoTrack: clarify docstring in recv()

1b7f236

dangusev merged commit 095f007 into main Apr 7, 2026
6 checks passed

dangusev deleted the fix/avatars-av-sync branch April 7, 2026 10:32

dangusev merged 11 commits into main from fix/avatars-av-sync
