Feature/head wobbler#1001
Open
RemiFabre wants to merge 27 commits into
Enable the robot to move its head naturally when audio is played. Audio playback pipelines are forked via a GStreamer tee to an appsink that feeds a SwayRollRT analyser producing 6-DOF movement offsets (pitch, yaw, roll, x, y, z). These offsets are composed with the current target head pose via compose_world_offset() before IK, so wobbling layers on top of any running movement.
The feature is opt-in: `mini.enable_wobbling()` / `mini.disable_wobbling()`. The tee is always present in the pipeline (negligible overhead when idle).
New files:
- motion/speech_tapper.py: audio analysis (VAD + oscillator-based sway)
- motion/head_wobbler.py: threaded wrapper with timing control
- tests/unit_tests/test_head_wobbler.py: 29 unit tests
Modified:
- io/protocol.py: SetSpeechOffsetsCmd for SDK→daemon offset transport
- daemon/backend/abstract.py: speech offsets + IK composition
- media/audio_gstreamer.py: tee pipeline + enable/disable_wobbling API
- media/media_manager.py: forwarding enable/disable_wobbling
- reachy_mini.py: public enable/disable_wobbling API
- examples/sound_play.py: --wobbling flag
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
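A minimal sketch of what composing a 6-DOF offset with the target head pose "in the world frame" can look like. It assumes a ZYX roll/pitch/yaw convention, applies the offset rotation on the left of the target rotation (so it rotates about world axes), and adds the offset translation. `compose_world_offset_sketch` and `rpy_to_matrix` are hypothetical names for illustration, not the PR's `compose_world_offset()`.

```python
import math
from typing import Dict, List, Tuple

Mat3 = List[List[float]]


def rpy_to_matrix(roll: float, pitch: float, yaw: float) -> Mat3:
    """Rotation matrix R = Rz(yaw) @ Ry(pitch) @ Rx(roll) (ZYX convention, assumed)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def matmul3(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def compose_world_offset_sketch(
    target_rot: Mat3, target_pos: List[float], offset: Dict[str, float]
) -> Tuple[Mat3, List[float]]:
    """Layer a speech offset on top of the current target pose (hypothetical helper)."""
    off_rot = rpy_to_matrix(offset["roll"], offset["pitch"], offset["yaw"])
    # Left-multiplying applies the offset rotation about the world axes.
    new_rot = matmul3(off_rot, target_rot)
    # Translation offsets are simply added in the world frame.
    new_pos = [target_pos[0] + offset["x"], target_pos[1] + offset["y"], target_pos[2] + offset["z"]]
    return new_rot, new_pos
```

Because the offset is a separate term, a zero offset leaves whatever movement is running untouched, which is what lets wobbling layer on top of it before IK.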
Extend head wobbling to GstMediaServer so it works on the wireless Reachy Mini where audio is played daemon-side. Both play_sound() and incoming WebRTC audio paths now include a tee+appsink that feeds the wobbler when enabled. The wobbler calls backend.set_speech_offsets() directly (same process, no WebSocket round-trip). New protocol command SetWobblingCmd lets clients toggle daemon-side wobbling. REST endpoints POST /media/wobbling/enable and POST /media/wobbling/disable allow WebRTC-only clients (no SDK) to control the feature via HTTP. ReachyMini.enable_wobbling() now enables both SDK-side and daemon-side wobbling in a single call. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
v1: Direct envelope, no VAD (energy directly scales oscillator amplitude)
v2: Multi-band energy (low/mid/high frequencies drive pitch/yaw/roll)
v3: Onset impulse (syllable onsets trigger random decaying movements)
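The v1 idea (energy directly scaling an oscillator, no VAD) can be sketched in a few lines. The hop size, oscillator frequency, smoothing factor and maximum pitch below are illustrative assumptions, and `envelope_sway_sketch` is a hypothetical helper rather than the experimental speech tapper code.

```python
import math
from typing import List, Sequence


def envelope_sway_sketch(
    samples: Sequence[float],
    sample_rate: int,
    hop_ms: float = 50.0,
    freq_hz: float = 2.0,
    max_pitch_rad: float = 0.1,
    smoothing: float = 0.3,
) -> List[float]:
    """Return one pitch offset (rad) per hop: a sine whose amplitude follows the smoothed RMS."""
    hop = max(1, int(sample_rate * hop_ms / 1000))
    envelope = 0.0
    pitches = []
    for start in range(0, len(samples) - hop + 1, hop):
        chunk = samples[start:start + hop]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        envelope += smoothing * (rms - envelope)  # one-pole low-pass on the energy
        t = start / sample_rate
        # No VAD: silence gives a zero envelope, hence zero movement.
        pitches.append(max_pitch_rad * min(envelope, 1.0) * math.sin(2 * math.pi * freq_hz * t))
    return pitches
```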
protocol.py: remove stray commas in AnyCommand union that broke
the | chain and caused SyntaxError on import.
audio_gstreamer.py: drop a duplicated copy of the wobbler helpers
plus an orphaned _init_pipeline_playback shell that the merge had
inserted between two identical helper blocks. Restore two main-side
behaviors that were only present in the now-removed shell:
- appsrc 'do-timestamp' = False
- audiosink buffer-time / latency-time tuning (using the existing
PLAYBACK_SINK_BUFFER_TIME_US / PLAYBACK_SINK_LATENCY_TIME_US
constants), placed inside _build_audiosink_element so both the
local pipeline and the WebRTC tee bin benefit.
Assisted-by: Claude:claude-opus-4-7
The HeadWobbler used to run its own thread + queue and schedule movement against time.monotonic() plus a hand-tuned 0.2 s constant. Replace that with a PTS-driven scheduler:
- HeadWobbler.feed(pcm, sr, play_at_monotonic_ns) calls SwayRollRT to get per-hop sway dicts, then registers one GLib.timeout_add per hop on the existing pipeline GLib main loop. A generation counter snapshot lets stop()/reset() cancel pending callbacks without tracking source ids.
- The wobbler appsink callback in audio_gstreamer.py and media_server.py reads buf.pts, queries pipeline latency once, and converts to a time.monotonic_ns() instant (pts + base_time + sink_latency, rebased onto the monotonic clock). Falls back to now + sink_latency when PTS is unset.
Tests rewritten to mock GLib.timeout_add and assert hop count, HOP_MS spacing, generation-counter cancellation, and past-deadline drop. 32/32 pass.
Assisted-by: Claude:claude-opus-4-7
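The scheduling pattern described here (one timeout per hop plus a generation snapshot for cancellation) can be sketched without GStreamer by injecting the timer function, much as the rewritten tests mock GLib.timeout_add. `WobbleSchedulerSketch`, its constructor arguments and the `HOP_MS = 50` value are assumptions for illustration.

```python
from typing import Callable, Dict, List

HOP_MS = 50  # assumed hop size, for illustration only

Offsets = Dict[str, float]


class WobbleSchedulerSketch:
    """One one-shot timer per hop; a generation counter invalidates stale timers."""

    def __init__(
        self,
        timeout_add: Callable[[int, Callable[[], bool]], int],
        apply_offsets: Callable[[Offsets], None],
        now_ns: Callable[[], int],
    ) -> None:
        self._timeout_add = timeout_add
        self._apply_offsets = apply_offsets
        self._now_ns = now_ns
        self._generation = 0

    def feed(self, hops: List[Offsets], play_at_ns: int) -> None:
        generation = self._generation  # snapshot taken when the hops are registered
        for i, offsets in enumerate(hops):
            due_ns = play_at_ns + i * HOP_MS * 1_000_000
            delay_ms = (due_ns - self._now_ns()) // 1_000_000
            if delay_ms < 0:
                continue  # deadline already passed: drop this hop
            self._timeout_add(delay_ms, self._make_callback(generation, offsets))

    def _make_callback(self, generation: int, offsets: Offsets) -> Callable[[], bool]:
        def callback() -> bool:
            if generation == self._generation:  # skip if reset() ran since feed()
                self._apply_offsets(offsets)
            return False  # returning False makes a GLib timeout one-shot
        return callback

    def reset(self) -> None:
        self._generation += 1  # every pending callback now sees a stale generation
```

On the real main loop, `GLib.timeout_add(interval_ms, callback)` would be passed as `timeout_add`; no source ids need to be stored because stale callbacks simply do nothing.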
- head_wobbler.py: replace inline branched imports in _load_sway_class with a top-level _TAPPER_VERSIONS dict keyed by the WOBBLER_VERSION env var, typed as dict[str, ModuleType] so attribute access on the loaded module is clean for mypy.
- speech_tapper_v1/v2/v3.py: drop unused FRAME_MS / _rms_dbfs imports, sort import blocks, wrap _loudness_gain returns in float() to satisfy no-any-return, add docstrings on __init__, reset, and feed.
mypy now reports 0 errors across the 116 source files; ruff is clean on src/ and tests/.
Assisted-by: Claude:claude-opus-4-7
Adds /api/media/wobbling/enable and /api/media/wobbling/disable to the published OpenAPI spec. Assisted-by: Claude:claude-opus-4-7
The wobbler appsink's hard caps were propagating upstream through the playbin's tee bin and constraining the audiosink branch. On the wireless XMOS PCM (alsasink reachymini_audio_sink), this triggered an IEC958 fallback that ALSA refused to open, so the wake-up sound (and all other play_sound output) silently failed.
Pipeline shape change: drop the shared audioconvert+audioresample in front of the tee, add per-branch audioconvert+audioresample so each leaf negotiates its own format independently. Applied to _build_audiosink_tee_bin (used by playbin) and the per-peer WebRTC playback pipeline in _setup_remote_audio_playback.
With per-branch conversion in place, the wobbler appsink can now declare channels=1 plus a fixed rate, so SwayRollRT receives canonical float32 mono PCM at a known rate. That lets us:
- delete _to_float32_mono and _resample_linear (the DSP hot path was duplicating work GStreamer already does);
- drop the int16 / resampling tests that exercised that path;
- make sample_rate a per-instance attribute on SwayRollRT (frame / hop / samples deque maxlen all derive from it);
- thread sample_rate through HeadWobbler.__init__ so the same value drives the appsink caps and the DSP: audio_gstreamer.py uses self.SAMPLE_RATE, media_server.py adds a _WOBBLER_SAMPLE_RATE constant referenced by both the caps string and HeadWobbler.
ruff / mypy clean; 24/24 wobbler tests pass.
Assisted-by: Claude:claude-opus-4-7
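The per-branch conversion can be expressed as a gst-launch style bin description. This is a sketch only: the helper name, the `autoaudiosink` default and the F32LE caps are assumptions, and the PR's `_build_audiosink_tee_bin` may assemble the elements programmatically instead.

```python
def wobbler_tee_description_sketch(sample_rate: int, sink: str = "autoaudiosink") -> str:
    """gst-launch style description of a tee bin with independent per-branch conversion."""
    wobbler_caps = f"audio/x-raw,format=F32LE,layout=interleaved,channels=1,rate={sample_rate}"
    return (
        "tee name=t "
        # Playback branch: converts to whatever the audiosink negotiates.
        f"t. ! queue ! audioconvert ! audioresample ! {sink} "
        # Analysis branch: converts to canonical float32 mono at a fixed rate.
        "t. ! queue ! audioconvert ! audioresample ! "
        f'appsink name=wobbler caps="{wobbler_caps}" sync=true emit-signals=true'
    )
```

A description like this can be turned into a bin with `Gst.parse_bin_from_description(desc, True)`, which ghosts the tee's unlinked sink pad. Since each branch converts on its own, the appsink's fixed mono caps are absorbed by its branch's converters instead of being forced onto the audiosink branch through the tee.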
- play_sound previously built a local 'audiosink' element that was
never attached to anything — the real sink comes from
_build_audiosink_tee_bin() → _build_audiosink_element(). Delete
the 26 unused lines. (Incidentally restores the id_audio_card=None
fallback to autoaudiosink that the inline block was missing.)
- _on_wobbler_sample used to walk get_parent() chains on every
buffer via _find_pipeline to locate the owning pipeline (either
_pipeline_playback or a playbin from play_sound) for PTS/latency
queries. Replace with a cached self._pipeline_for_wobbler,
assigned at the two attachment sites: top of
_init_pipeline_playback and in play_sound right after
playbin.set_property("audio-sink", ...). Drop _find_pipeline.
Assisted-by: Claude:claude-opus-4-7
If the wobbler was active when goto_sleep is invoked, leftover speech offsets kept composing into the target pose during the goto, producing a wobbly descent into the sleep pose. Call media_server.disable_wobbling() (which stops the HeadWobbler, fires zero offsets, and cancels pending GLib timeouts via the generation counter) and defensively zero self._speech_offsets at the top of goto_sleep. Assisted-by: Claude:claude-opus-4-7
Two appsink modes for the wobbler tee, depending on the source
pipeline:
- playbin-based paths (GStreamerAudio.play_sound and the daemon's
GstMediaServer.play_sound) use sync=True. The wobbler appsink
then emits new-sample at the buffer's PTS on the pipeline clock,
which is also when the audiosink outputs that buffer. A/V sync
is free; play_at_ns = time.monotonic_ns().
- Live push paths (GStreamerAudio._init_pipeline_playback driven by
push_audio_sample, and media_server._setup_remote_audio_playback
for WebRTC ingress) use sync=False. is-live appsrc + PTS-sync
appsink deadlocks under back-pressure — the appsink's bounded
max-buffers fills and drops. So the appsink delivers ASAP and we
schedule play_at = now + PLAYBACK_SINK_BUFFER_TIME_US (50 ms).
The single _on_wobbler_sample handler branches on
appsink.get_property("sync") to pick the right play_at formula.
Drops the now-unused _get_sink_latency_ns + _pipeline_for_wobbler
(audio_gstreamer.py) and _find_pipeline + DEFAULT_SINK_LATENCY_NS
(media_server.py): with sync=True no latency math is needed, and
with sync=False a fixed offset works fine because pulsesink's
query_latency returns min=0 anyway.
HeadWobbler.feed now clamps small negative delays to 0 (only skips
hops more than one HOP_MS in the past) so sub-ms scheduling jitter
doesn't drop every hop when play_at is "now".
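The clamping rule can be written as a small pure function. `hop_delay_ms_sketch` and the 50 ms hop are assumptions; it returns None for a hop that should be skipped and otherwise a non-negative delay in milliseconds.

```python
from typing import Optional

HOP_MS = 50  # assumed hop size, for illustration only


def hop_delay_ms_sketch(due_ns: int, now_ns: int, hop_ms: int = HOP_MS) -> Optional[int]:
    """Delay before firing a hop, or None if it is more than one hop late."""
    delay_ns = due_ns - now_ns
    if delay_ns < -hop_ms * 1_000_000:
        return None  # genuinely stale: skip this hop
    return max(0, delay_ns // 1_000_000)  # sub-hop lateness (jitter): fire right away
```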
Also folds _TAPPER_VERSIONS / _load_sway_class / _ZERO_OFFSETS into
the HeadWobbler class; SpeechOffsets stays module-level (imported
by media_manager/audio_gstreamer/media_server).
ruff / mypy clean; 24/24 wobbler tests pass. Verified on desktop
(--wav + --live) and wireless (same).
Assisted-by: Claude:claude-opus-4-7
New example ``examples/sound_tts.py`` that submits text to ``Qwen/Qwen3-TTS`` via ``gradio-client``, plays the returned audio on Reachy Mini, and wobbles the head in sync. Probes the audio duration via ``GstPbutils.Discoverer`` so the example works for any format the playbin decodes (wav / mp3 / …), not just wav.
The "voice design" endpoint accepts a natural-language style prompt, so ``--voice-description`` is the main knob. ``--lang`` exposes the Space's 10 supported languages (``Auto`` auto-detects). ``--wobbler-version`` lets the demo exercise v0..v3 of the speech tapper.
- Adds ``gradio-client`` to the ``examples`` extra in pyproject.
- Adds ``docs/source/examples/sound_tts.md`` and a toctree entry under Examples.
Assisted-by: Claude:claude-opus-4-7
summary of my changes
Tests rewritten around the PTS-driven API (24 cases,
mypy was flagging the Qwen3-TTS demo's ``synthesize`` (returning the Any-typed gradio-client audio path) and ``probe_duration_s`` (Any from ``Gst.SECOND`` arithmetic) as leaking ``Any`` through functions declared ``str`` / ``float``. Wrap both return expressions in explicit ``str(...)`` / ``float(...)`` casts. Assisted-by: Claude:claude-opus-4-7
Let's choose one of the 4 speech tapper versions for now. We can improve it or make it configurable in a separate issue.
Drops the v1/v2/v3 alternatives and the WOBBLER_VERSION env-var
selection mechanism that switched between them. Only v0 (the original
speech_tapper module) ships in this branch; the experiments remain
preserved on a separate branch for reference.
- Delete src/reachy_mini/motion/speech_tapper_v{1,2,3}.py.
- Simplify HeadWobbler: remove _TAPPER_VERSIONS dict,
_load_sway_class classmethod, _sway_cls field, version logging,
and the os/types imports that were only used by them. The
constructor now uses speech_tapper.SwayRollRT and
speech_tapper.HOP_MS directly.
- Drop --wobbler-version from examples/sound_tts.py and
examples/sound_play.py (CLI flag, env-var setting, function
parameter, docstring example).
- Drop the --wobbler-version block from
docs/source/examples/sound_tts.md.
Public API surface unchanged: mini.enable_wobbling() never took a
version argument. No app should need to be updated. The existing
test suite (tests/unit_tests/test_head_wobbler.py) passes unchanged
since it only exercises v0 and the public HeadWobbler interface.
Always use sync=True on the local wobbler appsink. The local pipeline has a deterministic clock and no network jitter, so PTS-based delivery gives correct A/V timing for both playbin (play_sound) and push (push_audio_sample) paths. Drops the sync parameter on _make_wobbler_appsink and the corresponding branch in _on_wobbler_sample — play_at_ns is just time.monotonic_ns(). media_server.py keeps its sync flag (WebRTC ingress still needs sync=False because incoming RTP timestamps don't sit on the per-peer playback pipeline's clock and PTS-sync would deadlock under jitter). Assisted-by: Claude:claude-opus-4-7
The cache short-circuit in WebRTCClient.play_sound compared by basename only — if a file with the same basename had been uploaded before, the SDK skipped the upload and asked the daemon to play the old copy. That conflated different files when the basename collided, e.g. successive gradio-client outputs that all land at /tmp/gradio/<hash>/audio.wav: every TTS run after the first played the first synthesised audio. The daemon's /api/media/sounds/upload endpoint already overwrites by filename (see media.py:138), so the cache check buys nothing and costs correctness. Drop it and always upload when the path resolves to a local file. The example examples/sound_tts.py now plays the freshly synthesised audio on every invocation. Assisted-by: Claude:claude-opus-4-7
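The collision is easy to reproduce with two distinct files that share a basename. Both functions below are hypothetical stand-ins for the old and new checks in WebRTCClient.play_sound, not its actual code.

```python
import os
from typing import Set


def should_upload_old(path: str, uploaded_basenames: Set[str]) -> bool:
    # Old short-circuit: skip the upload if any file with this basename was sent before.
    return os.path.basename(path) not in uploaded_basenames


def should_upload_new(path: str) -> bool:
    # New rule: always upload a local file; the daemon overwrites by filename anyway.
    return os.path.isfile(path)
```

With two TTS outputs at `<dir>/run1/audio.wav` and `<dir>/run2/audio.wav`, the old check refuses to upload the second one, so the daemon replays the first.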
The Qwen/Qwen3-TTS Hugging Face Space is currently broken on the
provider side (CONFIG_ERROR on Client init). Replace it with
ResembleAI/Chatterbox-Multilingual-TTS, which exposes a stable
/generate_tts_audio endpoint, supports 23 languages, and does
zero-shot voice cloning from a short reference audio sample.
Knobs change accordingly:
- --voice-description (free-form Qwen prompt) → --ref-audio (URL
or local path; tilde-expansion handled before handing off to
gradio-client.handle_file).
- --lang switches from human-readable names ("Auto", "English"…)
to ISO 639-1 codes (en, fr, ja…) matching the Space's enum.
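Resolving `--ref-audio` before handing it to gradio-client might look like this. `resolve_ref_audio_sketch` is a hypothetical helper, and treating only http(s) URLs as remote is an assumption about what the example accepts.

```python
import os
from urllib.parse import urlparse


def resolve_ref_audio_sketch(ref: str) -> str:
    """Pass http(s) URLs through unchanged; expand '~' in local paths."""
    if urlparse(ref).scheme in ("http", "https"):
        return ref
    return os.path.expanduser(ref)
```

The result can then be wrapped with `gradio_client.handle_file(...)`, which accepts either a URL or a local path.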
Doc page docs/source/examples/sound_tts.md is rewritten in the
same shape, with a note on the longer ~60–90 s synthesis time.
Assisted-by: Claude:claude-opus-4-7
FabienDanieau approved these changes on May 4, 2026.
FabienDanieau (Contributor) left a comment:
Looks good on my side. You can also try https://huggingface.co/spaces/FabienDanieau/tts-reachymini
One known bug: when the feature is used with the SDK running on a remote PC, it does not work very well. But that seems to be an edge case.
WebRTC ingress on the daemon now drives a sync=True audio + wobbler
pipeline that lives in a separate Gst.Pipeline from the
webrtcsink-managed sender pipeline. For sync=True playback to be
sane, both must share a single clock AND a single base_time;
otherwise each buffer's PTS lands at a different absolute clock
time in each pipeline and the audiosink drops or stalls.
media_server changes:
- _on_consumer_pad_added: pin the per-peer playback pipeline to
_pipeline_sender's clock via use_clock(), disable auto-base-time
via set_start_time(CLOCK_TIME_NONE), then PAUSED -> set_base_time
-> PLAYING in that order so GStreamer cannot overwrite either.
- audiosink and wobbler appsink both sync=True now that timing is
coherent. Drops the sync=False fallback in _make_wobbler_appsink,
its sync parameter, and the _WOBBLER_LIVE_PLAY_OFFSET_NS constant
that was only used by the now-dead sync=False branch.
- _on_playback_bus_message + _on_bus_message: handle LATENCY by
recalculating on the right pipeline.
- Promote playback_pipe to self._pipeline_playback so the bus
handlers can reach it. Comment out the dot-dump in
_consumer_added.
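The reason a shared base_time matters can be shown with plain arithmetic: a sync=True sink renders a buffer when the pipeline clock reaches base_time + running_time (+ latency). The sketch assumes a segment starting at 0, so running_time equals PTS; the timestamps are illustrative.

```python
def render_clock_time_ns(base_time_ns: int, pts_ns: int, latency_ns: int = 0) -> int:
    # Clock time at which a sync=True sink presents the buffer, assuming running_time == PTS.
    return base_time_ns + pts_ns + latency_ns


pts = 500_000_000  # 0.5 s into the stream
sender = render_clock_time_ns(base_time_ns=1_000_000_000, pts_ns=pts)
# Playback pipeline that picked its own base_time 250 ms later when it went to PLAYING.
playback_own = render_clock_time_ns(base_time_ns=1_250_000_000, pts_ns=pts)
# Playback pipeline pinned to the sender's clock and base_time.
playback_shared = render_clock_time_ns(base_time_ns=1_000_000_000, pts_ns=pts)
```

With independent base times the same PTS maps to instants 250 ms apart, which the audiosink sees as late or early data; pinning both clock and base_time makes them coincide.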
Refactor that fell out of cleaning up push_audio_sample:
- AudioBase now owns GAP_RESET_NS = 200 ms, the _appsrc_pts state
(initialised to -1 as "no previous buffer" sentinel — keeps the
type a plain int), and a single _compute_pts(num_samples,
running_time_ns, next_pts_ns) helper returning the (pts,
duration, next_pts) tuple.
- audio_gstreamer and webrtc_client_gstreamer drop their
duplicate constants and per-class helpers. push_audio_sample in
both backends now has the same shape: early-return on missing
appsrc, _compute_pts(...), assign pts/dts/duration, push_buffer,
log on non-OK return. The webrtc one stays silent on
missing-appsrc (send chain not ready yet) and the local one
still warns ("call start_playing first") since each behaviour
is appropriate to its caller's contract.
- _playback_next_pts_ns → _appsrc_pts in audio_gstreamer for
consistency with the AudioBase attribute.
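A possible shape for the shared PTS helper, written as a free function so it runs on its own. The reset-on-gap semantics, the 16 kHz rate and the name `compute_pts_sketch` are assumptions; only the 200 ms GAP_RESET_NS, the -1 sentinel and the (pts, duration, next_pts) return tuple come from the description above.

```python
from typing import Tuple

GAP_RESET_NS = 200_000_000  # 200 ms
SAMPLE_RATE = 16_000        # assumed rate for this example


def compute_pts_sketch(
    num_samples: int,
    running_time_ns: int,
    next_pts_ns: int,
    sample_rate: int = SAMPLE_RATE,
    gap_reset_ns: int = GAP_RESET_NS,
) -> Tuple[int, int, int]:
    """Return (pts, duration, next_pts); next_pts_ns == -1 means no previous buffer."""
    duration_ns = num_samples * 1_000_000_000 // sample_rate
    if next_pts_ns < 0 or running_time_ns - next_pts_ns > gap_reset_ns:
        pts_ns = running_time_ns  # first buffer, or producer paused too long: re-anchor
    else:
        pts_ns = next_pts_ns      # contiguous stream: no gaps or overlaps between buffers
    return pts_ns, duration_ns, pts_ns + duration_ns
```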
LATENCY recalc also added to the conv-app's webrtc_client bus
handler so the receive pipeline can react to dynamic latency
changes from webrtcsrc.
Assisted-by: Claude:claude-opus-4-7
Every GStreamer-using class in src/reachy_mini/media duplicated a
near-identical bus watch (log EOS/ERROR + return False, handle
LATENCY, sometimes WARNING). Five copies drifted in small ways
(some used ``msg.type``, some cached it in ``t``; some had LATENCY,
some did not; phrasing of log messages varied).
Adds ``handle_default_bus_message(logger, msg, pipeline)`` in
gstreamer_utils.py covering EOS / ERROR / WARNING / LATENCY with
uniform behaviour, and reshapes every existing handler to delegate
to it. Each watch's user-data is now the owning Gst.Pipeline so the
helper can call ``recalculate_latency`` on the right pipeline:
- audio_base.AudioBase._on_bus_message: one-line delegate. Was
duplicating the EOS/ERROR/LATENCY block.
- audio_gstreamer.GStreamerAudio._on_bus_message: only stops the
wobbler on EOS, then ``super()._on_bus_message(...)``.
- webrtc_client_gstreamer.GstWebRTCClient._on_bus_message: only
filters webrtcsrc's non-fatal internal "not-negotiated" appsrc
error (return True), then ``super()._on_bus_message(...)``.
Also flips the bus watch's user-data from ``self._loop`` to
``self._pipeline_record`` so LATENCY can reach the helper.
- media_server.GstMediaServer._on_bus_message: one-line delegate
too. Both the sender bus and the per-peer playback bus now
register through this single handler, each passing its own
pipeline. _on_playback_bus_message is dropped (was 14 lines
duplicating the same shape, only adding peer-id prefixes that
were not load-bearing).
- camera_gstreamer.GStreamerCamera._on_bus_message: keeps its
"log warning, but return True so transient errors do not tear
the pipeline down" behaviour for ERROR, delegates everything
else. ``_handle_bus_calls`` passes ``self.pipeline`` to the
watch.
- gstreamer_udp_camera.GStreamerUDPCamera._on_bus_message: pure
delegate now that WARNING is in the helper. Watch passes
``self.pipeline``.
Net result: one canonical bus handler, each subclass only owns the
branch that differs from the default.
Assisted-by: Claude:claude-opus-4-7
- gstreamer_utils.handle_default_bus_message: LATENCY recalcs now
log at debug level. They fire often enough during normal
operation that info-level was noisy.
- media_server: drop two stale comments left over from the
sync=False era of the WebRTC playback chain. The
``set_start_time`` line is self-explanatory, and the
"Live WebRTC path: sync=False" comment was contradicted by the
surrounding code (the wobbler appsink is sync=True now that
both pipelines share clock + base_time).
Assisted-by: Claude:claude-opus-4-7
The previous PR moved the helper from ``GStreamerAudio._compute_playback_buffer_timing`` to ``AudioBase._compute_pts``, dropped the ``sample_rate`` and ``gap_reset_ns`` parameters (now ``self.SAMPLE_RATE`` and ``self.GAP_RESET_NS``), and replaced the ``None`` sentinel with ``-1``. The three test cases in ``tests/unit_tests/test_audio_gstreamer.py`` still imported the old name and signature and were failing with AttributeError. Renames the tests to match the new helper, drops the removed positional args, and passes a SimpleNamespace stub carrying the two class constants the helper reads (the old ``cast(GStreamerAudio, object())`` trick worked only because the constants were arguments). Assisted-by: Claude:claude-opus-4-7
The first user-supplied audio buffer was being partially eaten on the receiver side — the conv-app's first spoken word came through clipped. Cause: the Opus encoder + rtpopuspay + webrtcbin send chain needs a few RTP frames before it's actually streaming; during that warm-up window the leading samples of the real audio get dropped or truncated. Push a 0.5 s buffer of zeros on the first push_audio_sample call so the warm-up consumes silence instead of real speech. Subsequent pushes go through unchanged via the existing PTS-aware path. Splits the per-buffer GstBuffer + push_buffer + return-check logic out of push_audio_sample into a private _push_buffer helper so the warm-up and the user buffer share a single push path. Assisted-by: Claude:claude-opus-4-7
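The warm-up logic can be sketched as a small wrapper around a push function. `WarmupPusherSketch`, the list-of-floats buffer type and the injected `push_buffer` callable are assumptions standing in for the real GstBuffer path.

```python
from typing import Callable, List


class WarmupPusherSketch:
    WARMUP_S = 0.5  # seconds of silence sent before the first real buffer

    def __init__(self, sample_rate: int, push_buffer: Callable[[List[float]], None]) -> None:
        self._sample_rate = sample_rate
        self._push_buffer = push_buffer  # stands in for the private _push_buffer helper
        self._warmed_up = False

    def push_audio_sample(self, samples: List[float]) -> None:
        if not self._warmed_up:
            # Let the encoder / payloader warm-up eat zeros instead of real speech.
            self._push_buffer([0.0] * int(self.WARMUP_S * self._sample_rate))
            self._warmed_up = True
        self._push_buffer(samples)  # same push path for warm-up and user audio
```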