Feature/head wobbler by RemiFabre · Pull Request #1001 · pollen-robotics/reachy_mini

RemiFabre · 2026-03-30T09:06:59Z

No description provided.

Enable the robot to move its head naturally when audio is played. Audio playback pipelines are forked via a GStreamer tee to an appsink that feeds a SwayRollRT analyser producing 6-DOF movement offsets (pitch, yaw, roll, x, y, z). These offsets are composed with the current target head pose via compose_world_offset() before IK, so wobbling layers on top of any running movement.

The feature is opt-in: `mini.enable_wobbling()` / `mini.disable_wobbling()`. The tee is always present in the pipeline (negligible overhead when idle).

New files:
- motion/speech_tapper.py: audio analysis (VAD + oscillator-based sway)
- motion/head_wobbler.py: threaded wrapper with timing control
- tests/unit_tests/test_head_wobbler.py: 29 unit tests

Modified:
- io/protocol.py: SetSpeechOffsetsCmd for SDK→daemon offset transport
- daemon/backend/abstract.py: speech offsets + IK composition
- media/audio_gstreamer.py: tee pipeline + enable/disable_wobbling API
- media/media_manager.py: forwarding enable/disable_wobbling
- reachy_mini.py: public enable/disable_wobbling API
- examples/sound_play.py: --wobbling flag

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Extend head wobbling to GstMediaServer so it works on the wireless Reachy Mini where audio is played daemon-side. Both play_sound() and incoming WebRTC audio paths now include a tee+appsink that feeds the wobbler when enabled. The wobbler calls backend.set_speech_offsets() directly (same process, no WebSocket round-trip).

New protocol command SetWobblingCmd lets clients toggle daemon-side wobbling. REST endpoints POST /media/wobbling/enable and POST /media/wobbling/disable allow WebRTC-only clients (no SDK) to control the feature via HTTP.

ReachyMini.enable_wobbling() now enables both SDK-side and daemon-side wobbling in a single call.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
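For WebRTC-only clients, toggling the feature comes down to one POST request. The sketch below only builds the request object and does not send it. The host and port are placeholders (assumptions), and the `/api` prefix follows the paths listed for the OpenAPI spec elsewhere in this PR; `wobbling_request` is a hypothetical helper name, not part of the SDK.

```python
from urllib.request import Request

# Assumption: placeholder address, adjust to wherever the daemon listens.
DAEMON_URL = "http://localhost:8000"


def wobbling_request(enabled: bool, base_url: str = DAEMON_URL) -> Request:
    """Build (but do not send) the POST request that toggles daemon-side wobbling."""
    action = "enable" if enabled else "disable"
    # An empty body is enough for the sketch; the endpoints take no payload here.
    return Request(f"{base_url}/api/media/wobbling/{action}", data=b"", method="POST")


req = wobbling_request(True)
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it to a running daemon.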

HuggingFaceDocBuilderDev · 2026-03-30T09:09:43Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


- v1: Direct envelope, no VAD: energy directly scales oscillator amplitude
- v2: Multi-band energy: low/mid/high frequencies drive pitch/yaw/roll
- v3: Onset impulse: syllable onsets trigger random decaying movements
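To make the v1 idea concrete, here is a minimal sketch of a "direct envelope" mapping: the RMS energy of a frame scales the amplitude of a sine oscillator, with no voice-activity gate in between. The function names, the 2 Hz oscillator frequency, the gain and the amplitude limit are all illustrative assumptions, not values from the speech tapper modules.

```python
import math


def rms(frame):
    """Root-mean-square energy of one frame of float samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def direct_envelope_pitch(frame, t_s, freq_hz=2.0, max_amp_rad=0.1, gain=4.0):
    """Pitch offset at time t_s: frame energy scales the oscillator amplitude."""
    # Clamp the normalised energy to 1.0 so loud frames cannot exceed max_amp_rad.
    amp = max_amp_rad * min(1.0, gain * rms(frame))
    return amp * math.sin(2.0 * math.pi * freq_hz * t_s)


silence = [0.0] * 160
tone = [0.5 * math.sin(2.0 * math.pi * 220.0 * n / 16000.0) for n in range(160)]
print(direct_envelope_pitch(silence, 0.125), direct_envelope_pitch(tone, 0.125))
```

Silence yields a zero offset, while the tone produces a bounded, non-zero one.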

protocol.py: remove stray commas in AnyCommand union that broke the | chain and caused SyntaxError on import.

audio_gstreamer.py: drop a duplicated copy of the wobbler helpers plus an orphaned _init_pipeline_playback shell that the merge had inserted between two identical helper blocks. Restore two main-side behaviors that were only present in the now-removed shell:
- appsrc 'do-timestamp' = False
- audiosink buffer-time / latency-time tuning (using the existing PLAYBACK_SINK_BUFFER_TIME_US / PLAYBACK_SINK_LATENCY_TIME_US constants), placed inside _build_audiosink_element so both the local pipeline and the WebRTC tee bin benefit.

Assisted-by: Claude:claude-opus-4-7

The HeadWobbler used to run its own thread + queue and schedule movement against time.monotonic() plus a hand-tuned 0.2s constant. Replace that with a PTS-driven scheduler:
- HeadWobbler.feed(pcm, sr, play_at_monotonic_ns) calls SwayRollRT to get per-hop sway dicts, then registers one GLib.timeout_add per hop on the existing pipeline GLib main loop. A generation counter snapshot lets stop()/reset() cancel pending callbacks without tracking source ids.
- The wobbler appsink callback in audio_gstreamer.py and media_server.py reads buf.pts, queries pipeline latency once, and converts to a time.monotonic_ns() instant (pts + base_time + sink_latency, rebased onto the monotonic clock). Falls back to now + sink_latency when PTS is unset.

Tests rewritten to mock GLib.timeout_add and assert hop count, HOP_MS spacing, generation-counter cancellation, and past-deadline drop. 32/32 pass.

Assisted-by: Claude:claude-opus-4-7
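The generation-counter pattern described in this commit can be sketched without GStreamer by injecting the timer function. In the sketch below, `HopScheduler`, the 10 ms `HOP_MS` value and the injected `timeout_add` are assumptions for illustration; in the real code the timer would be `GLib.timeout_add`, which likewise passes extra arguments through to the callback.

```python
import time

HOP_MS = 10  # assumption: illustrative hop length, not the tapper's actual value


class HopScheduler:
    """Schedules one callback per hop; stop() invalidates pending ones."""

    def __init__(self, timeout_add, apply_offsets):
        self._timeout_add = timeout_add   # e.g. GLib.timeout_add in a real pipeline
        self._apply = apply_offsets
        self._generation = 0

    def feed(self, hops, play_at_ns, now_ns=None):
        now_ns = time.monotonic_ns() if now_ns is None else now_ns
        gen = self._generation            # snapshot shared by this batch of callbacks
        scheduled = 0
        for i, hop in enumerate(hops):
            delay_ms = (play_at_ns - now_ns) // 1_000_000 + i * HOP_MS
            if delay_ms < 0:
                continue                  # past-deadline hop: drop it
            self._timeout_add(delay_ms, self._fire, gen, hop)
            scheduled += 1
        return scheduled

    def _fire(self, gen, hop):
        if gen == self._generation:       # callbacks from an older generation are ignored
            self._apply(hop)
        return False                      # one-shot: do not reschedule

    def stop(self):
        self._generation += 1             # cancels pending callbacks without source ids


pending, applied = [], []
sched = HopScheduler(lambda ms, fn, *args: pending.append((ms, fn, args)), applied.append)
sched.feed([{"pitch": 0.01}, {"pitch": 0.02}, {"pitch": 0.03}], play_at_ns=5_000_000, now_ns=0)
```

Bumping the counter in `stop()` means no timer source ids need to be stored: a callback that fires late simply notices its snapshot is stale and does nothing.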

- head_wobbler.py: replace inline branched imports in _load_sway_class with a top-level _TAPPER_VERSIONS dict keyed by the WOBBLER_VERSION env var, typed as dict[str, ModuleType] so attribute access on the loaded module is clean for mypy.
- speech_tapper_v1/v2/v3.py: drop unused FRAME_MS / _rms_dbfs imports, sort import blocks, wrap _loudness_gain returns in float() to satisfy no-any-return, add docstrings on __init__, reset, and feed.

mypy now reports 0 errors across the 116 source files; ruff is clean on src/ and tests/.

Assisted-by: Claude:claude-opus-4-7

Adds /api/media/wobbling/enable and /api/media/wobbling/disable to the published OpenAPI spec. Assisted-by: Claude:claude-opus-4-7

The wobbler appsink's hard caps were propagating upstream through the playbin's tee bin and constraining the audiosink branch. On the wireless XMOS PCM (alsasink reachymini_audio_sink), this triggered an IEC958 fallback that ALSA refused to open, so the wake-up sound (and all other play_sound output) silently failed.

Pipeline shape change: drop the shared audioconvert+audioresample in front of the tee, add per-branch audioconvert+audioresample so each leaf negotiates its own format independently. Applied to _build_audiosink_tee_bin (used by playbin) and the per-peer WebRTC playback pipeline in _setup_remote_audio_playback.

With per-branch conversion in place, the wobbler appsink can now declare channels=1 plus a fixed rate, so SwayRollRT receives canonical float32 mono PCM at a known rate. That lets us:
- delete _to_float32_mono and _resample_linear (the DSP hot path was duplicating work GStreamer already does);
- drop the int16 / resampling tests that exercised that path;
- make sample_rate a per-instance attribute on SwayRollRT (frame / hop / samples deque maxlen all derive from it);
- thread sample_rate through HeadWobbler.__init__ so the same value drives the appsink caps and the DSP: audio_gstreamer.py uses self.SAMPLE_RATE, media_server.py adds a _WOBBLER_SAMPLE_RATE constant referenced by both the caps string and HeadWobbler.

ruff / mypy clean; 24/24 wobbler tests pass.

Assisted-by: Claude:claude-opus-4-7
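The pipeline shape after this change can be written as a gst-launch-style description, with conversion elements on each branch after the tee rather than in front of it. This is only a sketch of the topology: the PR builds the bin programmatically, and `tee_description`, the element names and the `autoaudiosink` default are assumptions. The 16 kHz mono F32LE caps match the values quoted in the review summary further down.

```python
WOBBLER_SAMPLE_RATE = 16000  # mono F32LE at 16 kHz, as quoted in the review summary


def tee_description(sink="autoaudiosink", rate=WOBBLER_SAMPLE_RATE):
    """Describe a tee whose two branches each negotiate their own audio format."""
    branch = "queue ! audioconvert ! audioresample"
    wobbler_caps = f"audio/x-raw,format=F32LE,channels=1,rate={rate}"
    return (
        "tee name=t "
        f"t. ! {branch} ! {sink} "                                    # playback branch: caps left free
        f"t. ! {branch} ! {wobbler_caps} ! appsink name=wobbler_sink"  # analysis branch: pinned caps
    )


print(tee_description())
```

Because the pinned caps sit downstream of the wobbler branch's own converters, they no longer constrain what the audiosink branch negotiates.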

- play_sound previously built a local 'audiosink' element that was never attached to anything; the real sink comes from _build_audiosink_tee_bin() → _build_audiosink_element(). Delete the 26 unused lines. (Incidentally restores the id_audio_card=None fallback to autoaudiosink that the inline block was missing.)
- _on_wobbler_sample used to walk get_parent() chains on every buffer via _find_pipeline to locate the owning pipeline (either _pipeline_playback or a playbin from play_sound) for PTS/latency queries. Replace with a cached self._pipeline_for_wobbler, assigned at the two attachment sites: top of _init_pipeline_playback and in play_sound right after playbin.set_property("audio-sink", ...). Drop _find_pipeline.

Assisted-by: Claude:claude-opus-4-7

If the wobbler was active when goto_sleep is invoked, leftover speech offsets kept composing into the target pose during the goto, producing a wobbly descent into the sleep pose. Call media_server.disable_wobbling() (which stops the HeadWobbler, fires zero offsets, and cancels pending GLib timeouts via the generation counter) and defensively zero self._speech_offsets at the top of goto_sleep. Assisted-by: Claude:claude-opus-4-7

Two appsink modes for the wobbler tee, depending on the source pipeline:
- playbin-based paths (GStreamerAudio.play_sound and the daemon's GstMediaServer.play_sound) use sync=True. The wobbler appsink then emits new-sample at the buffer's PTS on the pipeline clock, which is also when the audiosink outputs that buffer. A/V sync is free; play_at_ns = time.monotonic_ns().
- Live push paths (GStreamerAudio._init_pipeline_playback driven by push_audio_sample, and media_server._setup_remote_audio_playback for WebRTC ingress) use sync=False. is-live appsrc + PTS-sync appsink deadlocks under back-pressure: the appsink's bounded max-buffers fills and drops. So the appsink delivers ASAP and we schedule play_at = now + PLAYBACK_SINK_BUFFER_TIME_US (50 ms).

The single _on_wobbler_sample handler branches on appsink.get_property("sync") to pick the right play_at formula.

Drops the now-unused _get_sink_latency_ns + _pipeline_for_wobbler (audio_gstreamer.py) and _find_pipeline + DEFAULT_SINK_LATENCY_NS (media_server.py): with sync=True no latency math is needed, and with sync=False a fixed offset works fine because pulsesink's query_latency returns min=0 anyway.

HeadWobbler.feed now clamps small negative delays to 0 (only skips hops more than one HOP_MS in the past) so sub-ms scheduling jitter doesn't drop every hop when play_at is "now".

Also folds _TAPPER_VERSIONS / _load_sway_class / _ZERO_OFFSETS into the HeadWobbler class; SpeechOffsets stays module-level (imported by media_manager/audio_gstreamer/media_server).

ruff / mypy clean; 24/24 wobbler tests pass. Verified on desktop (--wav + --live) and wireless (same).

Assisted-by: Claude:claude-opus-4-7
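The clamping rule for hop delays is easy to get subtly wrong, so here is a minimal sketch of it as a pure function. The name `hop_delay_ms` and its signature are hypothetical; only the rule itself (clamp small negative delays to 0, skip hops more than one hop length in the past) comes from the commit message.

```python
def hop_delay_ms(play_at_ns, now_ns, hop_index, hop_ms):
    """Return the timer delay for one hop, or None when the hop is too old to play."""
    delay_ms = (play_at_ns - now_ns) / 1e6 + hop_index * hop_ms
    if delay_ms < -hop_ms:
        return None                    # more than one hop in the past: skip it
    return max(0, round(delay_ms))     # sub-hop jitter is clamped to "fire now"


# play_at is "now", but the callback ran 0.3 ms late: the hop is kept, not dropped.
print(hop_delay_ms(play_at_ns=0, now_ns=300_000, hop_index=0, hop_ms=10))
```

Without the clamp, a play_at equal to the current time would make every first hop slightly negative and therefore dropped.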

New example ``examples/sound_tts.py`` that submits text to ``Qwen/Qwen3-TTS`` via ``gradio-client``, plays the returned audio on Reachy Mini, and wobbles the head in sync. Probes the audio duration via ``GstPbutils.Discoverer`` so the example works for any format the playbin decodes (wav / mp3 / …), not just wav.

The "voice design" endpoint accepts a natural-language style prompt, so ``--voice-description`` is the main knob. ``--lang`` exposes the Space's 10 supported languages (``Auto`` detects). ``--wobbler-version`` lets the demo exercise v0..v3 of the speech tapper.

- Adds ``gradio-client`` to the ``examples`` extra in pyproject.
- Adds ``docs/source/examples/sound_tts.md`` and a toctree entry under Examples.

Assisted-by: Claude:claude-opus-4-7

FabienDanieau · 2026-04-23T14:13:26Z

summary of my changes

Merged main into the branch and cleaned up the resolution (8c6bdf2f): the merge had dropped some main-side changes and duplicated a wobbler helper block in audio_gstreamer.py; fixed that plus a protocol.py syntax error in the AnyCommand union.
Refactored HeadWobbler to be PTS-driven, no thread (8ec95694): got rid of the background thread + audio_queue + MOVEMENT_LATENCY_S = 0.2 constant. Each incoming buffer now calls SwayRollRT.feed(pcm) and registers one GLib.timeout_add per hop on the existing pipeline GLib main loop. A generation counter invalidates pending timeouts cleanly on stop()/reset(). The HOP_MS spacing still comes from whichever speech-tapper module you loaded.
Simplified the speech tappers (6a4fffd2): now that the wobbler appsink caps pin the stream to mono F32LE 16 kHz and the upstream audioresample/audioconvert do the work, dropped _to_float32_mono and _resample_linear from all 4 variants. sample_rate became a per-instance attribute threaded in from the audio backend (no more module-level SR). v2's _bandpass_energy takes sample_rate explicitly.
Wireless audio fixes:
- Per-branch audioconvert + audioresample after the tee so the wobbler's mono/16 kHz caps don't drag the alsasink branch into an IEC958 fallback on XMOS — that was blocking wake-up sound entirely on wireless.
- sync=True on the wobbler appsink for playbin paths (--wav / play_sound): callback fires at the buffer's PTS on the pipeline clock = when the audiosink outputs it → A/V sync for free, no latency math.
- sync=False kept on live push paths (push_audio_sample, WebRTC ingress) because is-live + PTS-sync deadlocks under back-pressure.
Cleanups: goto_sleep now disables wobbling at entry so residual offsets don't fight the sleep pose (c125be78). Removed ~100 lines of dead play_sound scaffolding and a per-sample parent-walk replaced by a cached pipeline ref (ce84d672). Ruff + mypy clean across motion/ (e158da58). openapi.json regenerated for the new POST /api/media/wobbling/{enable,disable} endpoints (6e34964c).
TTS demo (2c66d6af): new examples/sound_tts.py that calls Qwen/Qwen3-TTS via gradio-client with a natural-language voice-style prompt, plays the result, and wobbles the head in sync. Duration is probed via GstPbutils.Discoverer so any decodable format works. Doc page + toctree entry added; gradio-client added to the examples extra. Try with: python examples/sound_tts.py --voice-description "Speak with joyful voice." --wobbler-version v2

Tests rewritten around the PTS-driven API (24 cases, GLib.timeout_add mocked). Validated end-to-end on desktop (pulsesink) and wireless (alsasink → XMOS), --wav + --live paths.

mypy was flagging the Qwen3-TTS demo's ``synthesize`` (returning the Any-typed gradio-client audio path) and ``probe_duration_s`` (Any from ``Gst.SECOND`` arithmetic) as leaking ``Any`` through functions declared ``str`` / ``float``. Wrap both return expressions in explicit ``str(...)`` / ``float(...)`` casts. Assisted-by: Claude:claude-opus-4-7

FabienDanieau · 2026-04-23T14:16:31Z

Let's choose one of the 4 speech tapper versions for now. We can improve it, or make it configurable, in another issue.

Drops the v1/v2/v3 alternatives and the WOBBLER_VERSION env-var selection mechanism that switched between them. Only v0 (the original speech_tapper module) ships in this branch; the experiments remain preserved on a separate branch for reference.
- Delete src/reachy_mini/motion/speech_tapper_v{1,2,3}.py.
- Simplify HeadWobbler: remove _TAPPER_VERSIONS dict, _load_sway_class classmethod, _sway_cls field, version logging, and the os/types imports that were only used by them. The constructor now uses speech_tapper.SwayRollRT and speech_tapper.HOP_MS directly.
- Drop --wobbler-version from examples/sound_tts.py and examples/sound_play.py (CLI flag, env-var setting, function parameter, docstring example).
- Drop the --wobbler-version block from docs/source/examples/sound_tts.md.

Public API surface unchanged: mini.enable_wobbling() never took a version argument. No app should need to be updated. The existing test suite (tests/unit_tests/test_head_wobbler.py) passes unchanged since it only exercises v0 and the public HeadWobbler interface.

Always use sync=True on the local wobbler appsink. The local pipeline has a deterministic clock and no network jitter, so PTS-based delivery gives correct A/V timing for both playbin (play_sound) and push (push_audio_sample) paths. Drops the sync parameter on _make_wobbler_appsink and the corresponding branch in _on_wobbler_sample — play_at_ns is just time.monotonic_ns(). media_server.py keeps its sync flag (WebRTC ingress still needs sync=False because incoming RTP timestamps don't sit on the per-peer playback pipeline's clock and PTS-sync would deadlock under jitter). Assisted-by: Claude:claude-opus-4-7

The cache short-circuit in WebRTCClient.play_sound compared by basename only — if a file with the same basename had been uploaded before, the SDK skipped the upload and asked the daemon to play the old copy. That conflated different files when the basename collided, e.g. successive gradio-client outputs that all land at /tmp/gradio/<hash>/audio.wav: every TTS run after the first played the first synthesised audio. The daemon's /api/media/sounds/upload endpoint already overwrites by filename (see media.py:138), so the cache check buys nothing and costs correctness. Drop it and always upload when the path resolves to a local file. The example examples/sound_tts.py now plays the freshly synthesised audio on every invocation. Assisted-by: Claude:claude-opus-4-7
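The collision is easy to reproduce in isolation. The sketch below contrasts a basename-keyed cache with the "always upload a local file" rule; `BasenameCache` and `should_upload_always` are hypothetical stand-ins for the SDK logic, and the directory hashes in the paths are made up.

```python
import os


class BasenameCache:
    """Old behaviour: skip the upload when the same basename was seen before."""

    def __init__(self):
        self._seen = set()

    def should_upload(self, path):
        name = os.path.basename(path)
        if name in self._seen:
            return False          # the daemon would replay its stale copy
        self._seen.add(name)
        return True


def should_upload_always(path):
    """New behaviour: upload whenever the path resolves to a local file."""
    return os.path.isfile(path)


cache = BasenameCache()
first = "/tmp/gradio/aaa111/audio.wav"    # made-up hash directories
second = "/tmp/gradio/bbb222/audio.wav"
print(cache.should_upload(first), cache.should_upload(second))
```

Two different files that share the name `audio.wav` are indistinguishable to the cache, which is exactly the failure described above.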

The Qwen/Qwen3-TTS Hugging Face Space is currently broken on the provider side (CONFIG_ERROR on Client init). Replace it with ResembleAI/Chatterbox-Multilingual-TTS, which exposes a stable /generate_tts_audio endpoint, supports 23 languages, and does zero-shot voice cloning from a short reference audio sample.

Knobs change accordingly:
- --voice-description (free-form Qwen prompt) → --ref-audio (URL or local path; tilde-expansion handled before handing off to gradio-client.handle_file).
- --lang switches from human-readable names ("Auto", "English"…) to ISO 639-1 codes (en, fr, ja…) matching the Space's enum.

Doc page docs/source/examples/sound_tts.md is rewritten in the same shape, with a note on the longer ~60–90 s synthesis time.

Assisted-by: Claude:claude-opus-4-7

FabienDanieau

Looks good on my side. You can even try https://huggingface.co/spaces/FabienDanieau/tts-reachymini
One known bug: when using this feature with an SDK running on a remote PC, it does not work so well. But that seems to be an edge use case.

WebRTC ingress on the daemon now drives a sync=True audio + wobbler pipeline that lives in a separate Gst.Pipeline from the webrtcsink-managed sender pipeline. For sync=True playback to be sane, both must share a single clock AND a single base_time; otherwise each buffer's PTS lands at a different absolute clock time in each pipeline and the audiosink drops or stalls.

media_server changes:
- _on_consumer_pad_added: pin the per-peer playback pipeline to _pipeline_sender's clock via use_clock(), disable auto-base-time via set_start_time(CLOCK_TIME_NONE), then PAUSED -> set_base_time -> PLAYING in that order so GStreamer cannot overwrite either.
- audiosink and wobbler appsink both sync=True now that timing is coherent. Drops the sync=False fallback in _make_wobbler_appsink, its sync parameter, and the _WOBBLER_LIVE_PLAY_OFFSET_NS constant that was only used by the now-dead sync=False branch.
- _on_playback_bus_message + _on_bus_message: handle LATENCY by recalculating on the right pipeline.
- Promote playback_pipe to self._pipeline_playback so the bus handlers can reach it. Comment out the dot-dump in _consumer_added.

Refactor that fell out of cleaning up push_audio_sample:
- AudioBase now owns GAP_RESET_NS = 200 ms, the _appsrc_pts state (initialised to -1 as "no previous buffer" sentinel, which keeps the type a plain int), and a single _compute_pts(num_samples, running_time_ns, next_pts_ns) helper returning the (pts, duration, next_pts) tuple.
- audio_gstreamer and webrtc_client_gstreamer drop their duplicate constants and per-class helpers. push_audio_sample in both backends now has the same shape: early-return on missing appsrc, _compute_pts(...), assign pts/dts/duration, push_buffer, log on non-OK return. The webrtc one stays silent on missing-appsrc (send chain not ready yet) and the local one still warns ("call start_playing first") since each behaviour is appropriate to its caller's contract.
- _playback_next_pts_ns → _appsrc_pts in audio_gstreamer for consistency with the AudioBase attribute.

LATENCY recalc also added to the conv-app's webrtc_client bus handler so the receive pipeline can react to dynamic latency changes from webrtcsrc.

Assisted-by: Claude:claude-opus-4-7
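The commit names the helper's inputs and outputs but not its body, so the following is only one plausible reading of a gap-resetting PTS helper. The reset condition, the free-function form with explicit `sample_rate` / `gap_reset_ns` parameters, and the 16 kHz default are all assumptions; only the 200 ms constant, the -1 sentinel and the (pts, duration, next_pts) return shape come from the text above.

```python
GAP_RESET_NS = 200_000_000   # 200 ms, as stated in the commit message
SAMPLE_RATE = 16000          # assumption: illustrative rate


def compute_pts(num_samples, running_time_ns, next_pts_ns,
                sample_rate=SAMPLE_RATE, gap_reset_ns=GAP_RESET_NS):
    """Return (pts, duration, next_pts) in nanoseconds for one pushed buffer."""
    duration_ns = num_samples * 1_000_000_000 // sample_rate
    # -1 means "no previous buffer"; a long pause also restarts the timeline
    # at the current running time instead of stamping buffers in the past.
    if next_pts_ns < 0 or running_time_ns - next_pts_ns > gap_reset_ns:
        pts_ns = running_time_ns
    else:
        pts_ns = next_pts_ns     # contiguous with the previous buffer
    return pts_ns, duration_ns, pts_ns + duration_ns


pts, duration, next_pts = compute_pts(160, running_time_ns=1_000, next_pts_ns=-1)
print(pts, duration, next_pts)
```

Using -1 rather than None as the sentinel keeps `next_pts_ns` a plain int, which is the typing benefit the commit mentions.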

Every GStreamer-using class in src/reachy_mini/media duplicated a near-identical bus watch (log EOS/ERROR + return False, handle LATENCY, sometimes WARNING). Five copies drifted in small ways (some used ``msg.type``, some cached it in ``t``; some had LATENCY, some did not; phrasing of log messages varied).

Adds ``handle_default_bus_message(logger, msg, pipeline)`` in gstreamer_utils.py covering EOS / ERROR / WARNING / LATENCY with uniform behaviour, and reshapes every existing handler to delegate to it. Each watch's user-data is now the owning Gst.Pipeline so the helper can call ``recalculate_latency`` on the right pipeline:
- audio_base.AudioBase._on_bus_message: one-line delegate. Was duplicating the EOS/ERROR/LATENCY block.
- audio_gstreamer.GStreamerAudio._on_bus_message: only stops the wobbler on EOS, then ``super()._on_bus_message(...)``.
- webrtc_client_gstreamer.GstWebRTCClient._on_bus_message: only filters webrtcsrc's non-fatal internal "not-negotiated" appsrc error (return True), then ``super()._on_bus_message(...)``. Also flips the bus watch's user-data from ``self._loop`` to ``self._pipeline_record`` so LATENCY can reach the helper.
- media_server.GstMediaServer._on_bus_message: one-line delegate too. Both the sender bus and the per-peer playback bus now register through this single handler, each passing its own pipeline. _on_playback_bus_message is dropped (was 14 lines duplicating the same shape, only adding peer-id prefixes that were not load-bearing).
- camera_gstreamer.GStreamerCamera._on_bus_message: keeps its "log warning, but return True so transient errors do not tear the pipeline down" behaviour for ERROR, delegates everything else. ``_handle_bus_calls`` passes ``self.pipeline`` to the watch.
- gstreamer_udp_camera.GStreamerUDPCamera._on_bus_message: pure delegate now that WARNING is in the helper. Watch passes ``self.pipeline``.

Net result: one canonical bus handler, each subclass only owns the branch that differs from the default.

Assisted-by: Claude:claude-opus-4-7
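The delegate-to-a-default pattern can be illustrated without GStreamer by using stand-in message and pipeline types. Everything in this sketch (`MsgType`, `FakeMessage`, `FakePipeline`, and both handler functions) is hypothetical scaffolding; it mirrors the behaviour described above but is not the code in gstreamer_utils.py.

```python
import logging
from dataclasses import dataclass
from enum import Enum, auto


class MsgType(Enum):
    """Stand-in for Gst.MessageType so the sketch runs without GStreamer."""
    EOS = auto()
    ERROR = auto()
    WARNING = auto()
    LATENCY = auto()
    OTHER = auto()


@dataclass
class FakeMessage:
    type: MsgType
    detail: str = ""


class FakePipeline:
    """Counts latency recalculations instead of touching a real pipeline."""

    def __init__(self):
        self.recalcs = 0

    def recalculate_latency(self):
        self.recalcs += 1


def default_bus_handler(logger, msg, pipeline):
    """Return False to remove the bus watch (EOS / ERROR), True to keep it."""
    if msg.type is MsgType.EOS:
        logger.info("end of stream")
        return False
    if msg.type is MsgType.ERROR:
        logger.error("pipeline error: %s", msg.detail)
        return False
    if msg.type is MsgType.WARNING:
        logger.warning("pipeline warning: %s", msg.detail)
    elif msg.type is MsgType.LATENCY:
        logger.debug("latency changed, recalculating")
        pipeline.recalculate_latency()
    return True


def camera_bus_handler(logger, msg, pipeline):
    """Subclass-style override: tolerate ERROR, delegate everything else."""
    if msg.type is MsgType.ERROR:
        logger.warning("transient camera error: %s", msg.detail)
        return True
    return default_bus_handler(logger, msg, pipeline)
```

Passing the owning pipeline as the watch's user-data is what lets a single helper recalculate latency on the right pipeline when several of them share one handler.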

- gstreamer_utils.handle_default_bus_message: LATENCY recalcs now log at debug level. They fire often enough during normal operation that info-level was noisy.
- media_server: drop two stale comments left over from the sync=False era of the WebRTC playback chain. The ``set_start_time`` line is self-explanatory, and the "Live WebRTC path: sync=False" comment was contradicted by the surrounding code (the wobbler appsink is sync=True now that both pipelines share clock + base_time).

Assisted-by: Claude:claude-opus-4-7

The previous PR moved the helper from ``GStreamerAudio._compute_playback_buffer_timing`` to ``AudioBase._compute_pts``, dropped the ``sample_rate`` and ``gap_reset_ns`` parameters (now ``self.SAMPLE_RATE`` and ``self.GAP_RESET_NS``), and replaced the ``None`` sentinel with ``-1``. The three test cases in ``tests/unit_tests/test_audio_gstreamer.py`` still imported the old name and signature and were failing with AttributeError. Renames the tests to match the new helper, drops the removed positional args, and passes a SimpleNamespace stub carrying the two class constants the helper reads (the old ``cast(GStreamerAudio, object())`` trick worked only because the constants were arguments). Assisted-by: Claude:claude-opus-4-7

The first user-supplied audio buffer was being partially eaten on the receiver side — the conv-app's first spoken word came through clipped. Cause: the Opus encoder + rtpopuspay + webrtcbin send chain needs a few RTP frames before it's actually streaming; during that warm-up window the leading samples of the real audio get dropped or truncated. Push a 0.5 s buffer of zeros on the first push_audio_sample call so the warm-up consumes silence instead of real speech. Subsequent pushes go through unchanged via the existing PTS-aware path. Splits the per-buffer GstBuffer + push_buffer + return-check logic out of push_audio_sample into a private _push_buffer helper so the warm-up and the user buffer share a single push path. Assisted-by: Claude:claude-opus-4-7
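The warm-up fix amounts to a one-time silence prefix ahead of the first real buffer. A minimal sketch, assuming a hypothetical `WarmupPusher` wrapper, a 16 kHz sample rate, and an injected push function standing in for the private push helper:

```python
WARMUP_S = 0.5  # length of the silent lead-in, as stated in the commit message


class WarmupPusher:
    """Pushes 0.5 s of zeros before the first real buffer, then passes audio through."""

    def __init__(self, push_buffer, sample_rate=16000):
        self._push_buffer = push_buffer     # shared push path for warm-up and user audio
        self._sample_rate = sample_rate
        self._warmed_up = False

    def push_audio_sample(self, samples):
        if not self._warmed_up:
            self._warmed_up = True
            # The encoder and RTP chain consume this silence while they spin up.
            self._push_buffer([0.0] * int(WARMUP_S * self._sample_rate))
        self._push_buffer(list(samples))


pushed = []
pusher = WarmupPusher(pushed.append)
pusher.push_audio_sample([0.1, 0.2])
pusher.push_audio_sample([0.3])
print([len(buf) for buf in pushed])
```

Routing both the silence and the user audio through the same push function keeps timestamps and error handling in one place, which is the stated reason for splitting out the helper.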

FabienDanieau and others added 2 commits March 27, 2026 17:49

RemiFabre linked an issue Mar 30, 2026 that may be closed by this pull request

Transfer head wobbling on the core side #720

Open

RemiFabre assigned FabienDanieau Mar 30, 2026

RemiFabre added the enhancement New feature or request label Mar 30, 2026

RemiFabre added this to Reachy Mini Mar 30, 2026

github-project-automation Bot moved this to Backlog in Reachy Mini Mar 30, 2026

RemiFabre and others added 12 commits April 14, 2026 16:02

temporary fix to be able to test this in simulation, allegedly this i…

b7c10b0

…s fixed on main

Merge branch 'main' into feature/head-wobbler

2354dbd

chore: regenerate openapi.json for wobbling endpoints

6e34964


RemiFabre and others added 5 commits April 29, 2026 15:01

remove buffer copy

c88ddd1

FabienDanieau marked this pull request as ready for review May 4, 2026 14:35

FabienDanieau self-requested a review May 4, 2026 14:36

FabienDanieau approved these changes May 4, 2026

View reviewed changes

FabienDanieau added 7 commits May 19, 2026 17:07

Merge branch 'main' into feature/head-wobbler

633877d

update uv.lock

b041ef6

alozowski mentioned this pull request May 21, 2026

335 head wobbler is now handled on the core side pollen-robotics/reachy_mini_conversation_app#336

Open

2 tasks

RemiFabre wants to merge 27 commits into main from feature/head-wobbler
