
Feature/head wobbler #1001

Open
RemiFabre wants to merge 27 commits into main from feature/head-wobbler

@RemiFabre
Member

No description provided.

FabienDanieau and others added 2 commits March 27, 2026 17:49
Enable the robot to move its head naturally when audio is played. Audio
playback pipelines are forked via a GStreamer tee to an appsink that feeds
a SwayRollRT analyser producing 6-DOF movement offsets (pitch, yaw, roll,
x, y, z). These offsets are composed with the current target head pose
via compose_world_offset() before IK, so wobbling layers on top of any
running movement.
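
A minimal sketch of what layering a world-frame offset on top of a target pose could look like. It assumes poses are 4x4 homogeneous transforms and that "world offset" means left-multiplying the target by the offset; the real compose_world_offset() may differ, and every name below is hypothetical.

```python
import math

Matrix = list[list[float]]


def matmul4(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def yaw_translation(yaw_rad: float, x: float, y: float, z: float) -> Matrix:
    """Homogeneous transform: rotation about the z axis plus a translation."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [[c, -s, 0.0, x],
            [s, c, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


def compose_world_offset_sketch(target: Matrix, offset: Matrix) -> Matrix:
    """Apply the offset in the world frame (assumption: left-multiplication)."""
    return matmul4(offset, target)


# Hypothetical values: head target 10 cm up, wobble adds 5 degrees of yaw and 5 mm of z.
target = yaw_translation(0.0, 0.0, 0.0, 0.10)
wobble = yaw_translation(math.radians(5.0), 0.0, 0.0, 0.005)
posed = compose_world_offset_sketch(target, wobble)
```

Because the offset is composed just before IK, the running movement keeps owning the target pose and the wobble never has to be undone explicitly: zero offsets give back the target unchanged.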

The feature is opt-in: `mini.enable_wobbling()` / `mini.disable_wobbling()`.
The tee is always present in the pipeline (negligible overhead when idle).

New files:
- motion/speech_tapper.py: audio analysis (VAD + oscillator-based sway)
- motion/head_wobbler.py: threaded wrapper with timing control
- tests/unit_tests/test_head_wobbler.py: 29 unit tests

Modified:
- io/protocol.py: SetSpeechOffsetsCmd for SDK→daemon offset transport
- daemon/backend/abstract.py: speech offsets + IK composition
- media/audio_gstreamer.py: tee pipeline + enable/disable_wobbling API
- media/media_manager.py: forwarding enable/disable_wobbling
- reachy_mini.py: public enable/disable_wobbling API
- examples/sound_play.py: --wobbling flag

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extend head wobbling to GstMediaServer so it works on the wireless
Reachy Mini where audio is played daemon-side. Both play_sound() and
incoming WebRTC audio paths now include a tee+appsink that feeds the
wobbler when enabled. The wobbler calls backend.set_speech_offsets()
directly (same process, no WebSocket round-trip).

New protocol command SetWobblingCmd lets clients toggle daemon-side
wobbling. REST endpoints POST /media/wobbling/enable and
POST /media/wobbling/disable allow WebRTC-only clients (no SDK) to
control the feature via HTTP.

ReachyMini.enable_wobbling() now enables both SDK-side and daemon-side
wobbling in a single call.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@RemiFabre RemiFabre linked an issue Mar 30, 2026 that may be closed by this pull request
@RemiFabre RemiFabre added the enhancement New feature or request label Mar 30, 2026
@github-project-automation github-project-automation Bot moved this to Backlog in Reachy Mini Mar 30, 2026
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

RemiFabre and others added 12 commits April 14, 2026 16:02
v1: Direct envelope, no VAD — energy directly scales oscillator amplitude
v2: Multi-band energy — low/mid/high frequencies drive pitch/yaw/roll
v3: Onset impulse — syllable onsets trigger random decaying movements

protocol.py: remove stray commas in AnyCommand union that broke
the | chain and caused SyntaxError on import.

audio_gstreamer.py: drop a duplicated copy of the wobbler helpers
plus an orphaned _init_pipeline_playback shell that the merge had
inserted between two identical helper blocks. Restore two main-side
behaviors that were only present in the now-removed shell:
  - appsrc 'do-timestamp' = False
  - audiosink buffer-time / latency-time tuning (using the existing
    PLAYBACK_SINK_BUFFER_TIME_US / PLAYBACK_SINK_LATENCY_TIME_US
    constants), placed inside _build_audiosink_element so both the
    local pipeline and the WebRTC tee bin benefit.

Assisted-by: Claude:claude-opus-4-7
The HeadWobbler used to run its own thread + queue and schedule
movement against time.monotonic() plus a hand-tuned 0.2s constant.
Replace that with a PTS-driven scheduler:

- HeadWobbler.feed(pcm, sr, play_at_monotonic_ns) calls SwayRollRT
  to get per-hop sway dicts, then registers one GLib.timeout_add
  per hop on the existing pipeline GLib main loop. A generation
  counter snapshot lets stop()/reset() cancel pending callbacks
  without tracking source ids.
- The wobbler appsink callback in audio_gstreamer.py and
  media_server.py reads buf.pts, queries pipeline latency once,
  and converts to a time.monotonic_ns() instant
  (pts + base_time + sink_latency, rebased onto the monotonic
  clock). Falls back to now + sink_latency when PTS is unset.
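
The generation-counter cancellation described above can be sketched without GLib. This is an illustration only: `GenerationScheduler` and the injected `timeout_add` callable are hypothetical stand-ins, not the HeadWobbler code; the only GLib behaviour relied on is that a timeout callback returning False is not rescheduled.

```python
from typing import Callable

TimeoutAdd = Callable[[int, Callable[[], bool]], None]


class GenerationScheduler:
    """Schedules one-shot callbacks that become no-ops after stop()."""

    def __init__(self, timeout_add: TimeoutAdd) -> None:
        self._timeout_add = timeout_add  # e.g. GLib.timeout_add in a real pipeline
        self._generation = 0
        self.applied: list[dict] = []

    def schedule(self, delay_ms: int, offsets: dict) -> None:
        generation = self._generation  # snapshot taken at registration time

        def _fire() -> bool:
            # Apply the offsets only if no stop()/reset() happened since scheduling.
            if generation == self._generation:
                self.applied.append(offsets)
            return False  # one-shot: do not reschedule

        self._timeout_add(delay_ms, _fire)

    def stop(self) -> None:
        # Bumping the counter invalidates every pending callback at once,
        # so no source ids need to be tracked.
        self._generation += 1


# Fake main loop: collect callbacks and fire them by hand.
pending: list = []
sched = GenerationScheduler(lambda delay_ms, cb: pending.append((delay_ms, cb)))
sched.schedule(0, {"yaw": 0.01})
sched.schedule(10, {"yaw": 0.02})
pending[0][1]()   # fires before stop(): applied
sched.stop()
pending[1][1]()   # fires after stop(): ignored
```

Injecting the timeout function also mirrors how the tests can mock GLib.timeout_add and inspect what was registered.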

Tests rewritten to mock GLib.timeout_add and assert hop count,
HOP_MS spacing, generation-counter cancellation, and past-deadline
drop. 32/32 pass.

Assisted-by: Claude:claude-opus-4-7
- head_wobbler.py: replace inline branched imports in
  _load_sway_class with a top-level _TAPPER_VERSIONS dict keyed by
  the WOBBLER_VERSION env var, typed as dict[str, ModuleType] so
  attribute access on the loaded module is clean for mypy.
- speech_tapper_v1/v2/v3.py: drop unused FRAME_MS / _rms_dbfs
  imports, sort import blocks, wrap _loudness_gain returns in
  float() to satisfy no-any-return, add docstrings on __init__,
  reset, and feed.

mypy now reports 0 errors across the 116 source files; ruff is
clean on src/ and tests/.

Assisted-by: Claude:claude-opus-4-7
Adds /api/media/wobbling/enable and /api/media/wobbling/disable
to the published OpenAPI spec.
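
For a client without the SDK, toggling the feature over HTTP could look like the sketch below. The host and port are assumptions (the daemon address is not given here), as is the empty request body; only the two endpoint paths come from this change.

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with the daemon's address


def wobbling_request(enable: bool, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request that toggles wobbling."""
    action = "enable" if enable else "disable"
    return urllib.request.Request(
        f"{base_url}/api/media/wobbling/{action}",
        data=b"",        # assumption: the endpoints take no payload
        method="POST",
    )


req = wobbling_request(True)
# To actually send it against a running daemon:
# urllib.request.urlopen(req)
```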

Assisted-by: Claude:claude-opus-4-7
The wobbler appsink's hard caps were propagating upstream through the
playbin's tee bin and constraining the audiosink branch. On the
wireless XMOS PCM (alsasink reachymini_audio_sink), this triggered
an IEC958 fallback that ALSA refused to open — the wake-up sound
(and all other play_sound output) silently failed.

Pipeline shape change: drop the shared audioconvert+audioresample in
front of the tee, add per-branch audioconvert+audioresample so each
leaf negotiates its own format independently. Applied to
_build_audiosink_tee_bin (used by playbin) and the per-peer WebRTC
playback pipeline in _setup_remote_audio_playback.

With per-branch conversion in place, the wobbler appsink can now
declare channels=1 plus a fixed rate, so SwayRollRT receives
canonical float32 mono PCM at a known rate. That lets us:

- delete _to_float32_mono and _resample_linear (the DSP hot path was
  duplicating work GStreamer already does);
- drop the int16 / resampling tests that exercised that path;
- make sample_rate a per-instance attribute on SwayRollRT (frame /
  hop / samples deque maxlen all derive from it);
- thread sample_rate through HeadWobbler.__init__ so the same value
  drives the appsink caps and the DSP — audio_gstreamer.py uses
  self.SAMPLE_RATE, media_server.py adds a _WOBBLER_SAMPLE_RATE
  constant referenced by both the caps string and HeadWobbler.
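
A small sketch of the "one sample rate drives both the caps and the DSP" idea. The FRAME_MS and HOP_MS values, the one-second window, and the class and function names are assumptions for illustration; the caps fields follow the usual audio/x-raw layout.

```python
from collections import deque

FRAME_MS = 20                   # assumption: analysis frame length
HOP_MS = 10                     # assumption: hop between frames
WOBBLER_SAMPLE_RATE = 16_000    # single constant shared by caps and DSP


class SwaySketch:
    """Analyser whose buffer sizes all derive from the instance sample rate."""

    def __init__(self, sample_rate: int, window_s: float = 1.0) -> None:
        self.sample_rate = sample_rate
        self.frame = sample_rate * FRAME_MS // 1000   # samples per frame
        self.hop = sample_rate * HOP_MS // 1000       # samples per hop
        self.samples: deque = deque(maxlen=int(sample_rate * window_s))


def wobbler_caps(sample_rate: int) -> str:
    """Caps string pinning the appsink branch to float32 mono at a fixed rate."""
    return f"audio/x-raw,format=F32LE,channels=1,layout=interleaved,rate={sample_rate}"


sway = SwaySketch(WOBBLER_SAMPLE_RATE)
caps = wobbler_caps(WOBBLER_SAMPLE_RATE)
```

With the rate passed in once, changing it cannot leave the caps and the analyser disagreeing about the stream format.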

ruff / mypy clean; 24/24 wobbler tests pass.

Assisted-by: Claude:claude-opus-4-7
- play_sound previously built a local 'audiosink' element that was
  never attached to anything — the real sink comes from
  _build_audiosink_tee_bin() → _build_audiosink_element(). Delete
  the 26 unused lines. (Incidentally restores the id_audio_card=None
  fallback to autoaudiosink that the inline block was missing.)
- _on_wobbler_sample used to walk get_parent() chains on every
  buffer via _find_pipeline to locate the owning pipeline (either
  _pipeline_playback or a playbin from play_sound) for PTS/latency
  queries. Replace with a cached self._pipeline_for_wobbler,
  assigned at the two attachment sites: top of
  _init_pipeline_playback and in play_sound right after
  playbin.set_property("audio-sink", ...). Drop _find_pipeline.

Assisted-by: Claude:claude-opus-4-7
If the wobbler was active when goto_sleep is invoked, leftover speech
offsets kept composing into the target pose during the goto, producing
a wobbly descent into the sleep pose. Call
media_server.disable_wobbling() (which stops the HeadWobbler, fires
zero offsets, and cancels pending GLib timeouts via the generation
counter) and defensively zero self._speech_offsets at the top of
goto_sleep.

Assisted-by: Claude:claude-opus-4-7
Two appsink modes for the wobbler tee, depending on the source
pipeline:

- playbin-based paths (GStreamerAudio.play_sound and the daemon's
  GstMediaServer.play_sound) use sync=True. The wobbler appsink
  then emits new-sample at the buffer's PTS on the pipeline clock,
  which is also when the audiosink outputs that buffer. A/V sync
  is free; play_at_ns = time.monotonic_ns().
- Live push paths (GStreamerAudio._init_pipeline_playback driven by
  push_audio_sample, and media_server._setup_remote_audio_playback
  for WebRTC ingress) use sync=False. is-live appsrc + PTS-sync
  appsink deadlocks under back-pressure — the appsink's bounded
  max-buffers fills and drops. So the appsink delivers ASAP and we
  schedule play_at = now + PLAYBACK_SINK_BUFFER_TIME_US (50 ms).

The single _on_wobbler_sample handler branches on
appsink.get_property("sync") to pick the right play_at formula.

Drops the now-unused _get_sink_latency_ns + _pipeline_for_wobbler
(audio_gstreamer.py) and _find_pipeline + DEFAULT_SINK_LATENCY_NS
(media_server.py): with sync=True no latency math is needed, and
with sync=False a fixed offset works fine because pulsesink's
query_latency returns min=0 anyway.

HeadWobbler.feed now clamps small negative delays to 0 (only skips
hops more than one HOP_MS in the past) so sub-ms scheduling jitter
doesn't drop every hop when play_at is "now".
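
The clamping rule can be illustrated with a small helper. This is a sketch under assumptions: the HOP_MS value and the function name are hypothetical, and rounding to whole milliseconds is chosen here because GLib timeouts take integer milliseconds.

```python
HOP_MS = 10  # assumption: hop spacing of the speech tapper


def hop_delays_ms(play_at_ns: int, now_ns: int, n_hops: int,
                  hop_ms: int = HOP_MS) -> list[int]:
    """Return the timeout delay for each hop that is still worth scheduling."""
    delays = []
    for i in range(n_hops):
        delay_ms = (play_at_ns - now_ns) / 1e6 + i * hop_ms
        if delay_ms < -hop_ms:
            continue  # more than one hop in the past: skip it
        # Small negative delays (scheduling jitter) are clamped to "fire now".
        delays.append(max(0, round(delay_ms)))
    return delays


# play_at is 0.4 ms in the past: nothing is dropped, the first hop fires at once.
jitter_case = hop_delays_ms(play_at_ns=0, now_ns=400_000, n_hops=3)
# play_at is 25 ms in the past: the first two hops are too old and are skipped.
late_case = hop_delays_ms(play_at_ns=0, now_ns=25_000_000, n_hops=3)
```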

Also folds _TAPPER_VERSIONS / _load_sway_class / _ZERO_OFFSETS into
the HeadWobbler class; SpeechOffsets stays module-level (imported
by media_manager/audio_gstreamer/media_server).

ruff / mypy clean; 24/24 wobbler tests pass. Verified on desktop
(--wav + --live) and wireless (same).

Assisted-by: Claude:claude-opus-4-7
New example ``examples/sound_tts.py`` that submits text to
``Qwen/Qwen3-TTS`` via ``gradio-client``, plays the returned audio
on Reachy Mini, and wobbles the head in sync. Probes the audio
duration via ``GstPbutils.Discoverer`` so the example works for
any format the playbin decodes (wav / mp3 / …), not just wav.

The "voice design" endpoint accepts a natural-language style
prompt, so ``--voice-description`` is the main knob. ``--lang``
exposes the Space's 10 supported languages (``Auto`` detects).
``--wobbler-version`` lets the demo exercise v0..v3 of the speech
tapper.

- Adds ``gradio-client`` to the ``examples`` extra in pyproject.
- Adds ``docs/source/examples/sound_tts.md`` and a toctree entry
  under Examples.

Assisted-by: Claude:claude-opus-4-7
@FabienDanieau
Contributor

Summary of my changes:

  • Merged main into the branch and cleaned up the resolution (8c6bdf2f): the merge had dropped some main-side changes and duplicated a wobbler helper block in audio_gstreamer.py; fixed that plus a protocol.py syntax error in the AnyCommand union.
  • Refactored HeadWobbler to be PTS-driven, no thread (8ec95694): got rid of the background thread + audio_queue + MOVEMENT_LATENCY_S = 0.2 constant. Each incoming buffer now calls SwayRollRT.feed(pcm) and registers one GLib.timeout_add per hop on the existing pipeline GLib main loop. A generation counter invalidates pending timeouts cleanly on stop()/reset(). The HOP_MS spacing still comes from whichever speech-tapper module you loaded.
  • Simplified the speech tappers (6a4fffd2): now that the wobbler appsink caps pin the stream to mono F32LE 16 kHz and the upstream audioresample/audioconvert do the work, dropped _to_float32_mono and _resample_linear from all 4 variants. sample_rate became a per-instance attribute threaded in from the audio backend (no more module-level SR). v2's _bandpass_energy takes sample_rate explicitly.
  • Wireless audio fixes:
    • Per-branch audioconvert + audioresample after the tee so the wobbler's mono/16 kHz caps don't drag the alsasink branch into an IEC958 fallback on XMOS — that was blocking wake-up sound entirely on wireless.
    • sync=True on the wobbler appsink for playbin paths (--wav / play_sound): callback fires at the buffer's PTS on the pipeline clock = when the audiosink outputs it → A/V sync for free, no latency math.
    • sync=False kept on live push paths (push_audio_sample, WebRTC ingress) because is-live + PTS-sync deadlocks under back-pressure.
  • Cleanups: goto_sleep now disables wobbling at entry so residual offsets don't fight the sleep pose (c125be78). Removed ~100 lines of dead play_sound scaffolding and a per-sample parent-walk replaced by a cached pipeline ref (ce84d672). Ruff + mypy clean across motion/ (e158da58). openapi.json regenerated for the new POST /api/media/wobbling/{enable,disable} endpoints (6e34964c).
  • TTS demo (2c66d6af): new examples/sound_tts.py that calls Qwen/Qwen3-TTS via gradio-client with a natural-language voice-style prompt, plays the result, and wobbles the head in sync. Duration probed via GstPbutils.Discoverer so any decodable format works. Doc page + toctree entry added; gradio-client in the examples extra. Try with: python examples/sound_tts.py --voice-description "Speak with a joyful voice." --wobbler-version v2

Tests rewritten around the PTS-driven API (24 cases, GLib.timeout_add mocked). Validated end-to-end on desktop (pulsesink) and wireless (alsasink → XMOS), --wav + --live paths.

mypy was flagging the Qwen3-TTS demo's ``synthesize`` (returning the
Any-typed gradio-client audio path) and ``probe_duration_s`` (Any
from ``Gst.SECOND`` arithmetic) as leaking ``Any`` through functions
declared ``str`` / ``float``. Wrap both return expressions in
explicit ``str(...)`` / ``float(...)`` casts.

Assisted-by: Claude:claude-opus-4-7
@FabienDanieau
Contributor

Let's choose one of the 4 speech tapper versions for now. We can improve it or make it configurable in a separate issue.

RemiFabre and others added 5 commits April 29, 2026 15:01
Drops the v1/v2/v3 alternatives and the WOBBLER_VERSION env-var
selection mechanism that switched between them. Only v0 (the original
speech_tapper module) ships in this branch; the experiments remain
preserved on a separate branch for reference.

  - Delete src/reachy_mini/motion/speech_tapper_v{1,2,3}.py.
  - Simplify HeadWobbler: remove _TAPPER_VERSIONS dict,
    _load_sway_class classmethod, _sway_cls field, version logging,
    and the os/types imports that were only used by them. The
    constructor now uses speech_tapper.SwayRollRT and
    speech_tapper.HOP_MS directly.
  - Drop --wobbler-version from examples/sound_tts.py and
    examples/sound_play.py (CLI flag, env-var setting, function
    parameter, docstring example).
  - Drop the --wobbler-version block from
    docs/source/examples/sound_tts.md.

Public API surface unchanged: mini.enable_wobbling() never took a
version argument. No app should need to be updated. The existing
test suite (tests/unit_tests/test_head_wobbler.py) passes unchanged
since it only exercises v0 and the public HeadWobbler interface.

Always use sync=True on the local wobbler appsink. The local pipeline
has a deterministic clock and no network jitter, so PTS-based delivery
gives correct A/V timing for both playbin (play_sound) and push
(push_audio_sample) paths. Drops the sync parameter on
_make_wobbler_appsink and the corresponding branch in
_on_wobbler_sample — play_at_ns is just time.monotonic_ns().

media_server.py keeps its sync flag (WebRTC ingress still needs
sync=False because incoming RTP timestamps don't sit on the per-peer
playback pipeline's clock and PTS-sync would deadlock under jitter).

Assisted-by: Claude:claude-opus-4-7
The cache short-circuit in WebRTCClient.play_sound compared by
basename only — if a file with the same basename had been uploaded
before, the SDK skipped the upload and asked the daemon to play the
old copy. That conflated different files when the basename collided,
e.g. successive gradio-client outputs that all land at
/tmp/gradio/<hash>/audio.wav: every TTS run after the first played
the first synthesised audio.

The daemon's /api/media/sounds/upload endpoint already overwrites by
filename (see media.py:138), so the cache check buys nothing and
costs correctness. Drop it and always upload when the path resolves
to a local file. The example examples/sound_tts.py now plays the
freshly synthesised audio on every invocation.
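
The collision is easy to reproduce in isolation. The sketch below is not the WebRTCClient code: the function name and the two directory names standing in for `<hash>` are hypothetical.

```python
import posixpath


def cached_by_basename(path: str, uploaded: set[str]) -> bool:
    """The flawed check: treats any file with a known basename as already uploaded."""
    return posixpath.basename(path) in uploaded


uploaded: set[str] = set()
first = "/tmp/gradio/hash_a/audio.wav"    # hypothetical first TTS output
second = "/tmp/gradio/hash_b/audio.wav"   # hypothetical second TTS output

uploaded.add(posixpath.basename(first))   # first run uploads "audio.wav"
stale_hit = cached_by_basename(second, uploaded)
# stale_hit is True: the second, different file would be skipped
# and the daemon would replay the first one.
```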

Assisted-by: Claude:claude-opus-4-7
The Qwen/Qwen3-TTS Hugging Face Space is currently broken on the
provider side (CONFIG_ERROR on Client init). Replace it with
ResembleAI/Chatterbox-Multilingual-TTS, which exposes a stable
/generate_tts_audio endpoint, supports 23 languages, and does
zero-shot voice cloning from a short reference audio sample.

Knobs change accordingly:
  - --voice-description (free-form Qwen prompt) → --ref-audio (URL
    or local path; tilde-expansion handled before handing off to
    gradio-client.handle_file).
  - --lang switches from human-readable names ("Auto", "English"…)
    to ISO 639-1 codes (en, fr, ja…) matching the Space's enum.

Doc page docs/source/examples/sound_tts.md is rewritten in the
same shape, with a note on the longer ~60–90 s synthesis time.

Assisted-by: Claude:claude-opus-4-7
@FabienDanieau FabienDanieau marked this pull request as ready for review May 4, 2026 14:35
@FabienDanieau FabienDanieau self-requested a review May 4, 2026 14:36
Contributor

@FabienDanieau FabienDanieau left a comment

Looks good on my side. You can even try https://huggingface.co/spaces/FabienDanieau/tts-reachymini
One known bug: when this feature is used with an SDK running on a remote PC, it does not work well. But that seems to be an edge case.

WebRTC ingress on the daemon now drives a sync=True audio + wobbler
pipeline that lives in a separate Gst.Pipeline from the
webrtcsink-managed sender pipeline. For sync=True playback to be
sane, both must share a single clock AND a single base_time;
otherwise each buffer's PTS lands at a different absolute clock
time in each pipeline and the audiosink drops or stalls.
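
The arithmetic behind that requirement, as a sketch. It relies on the GStreamer relation that a synced sink renders a buffer when the clock reaches base_time plus the buffer's running time (latency ignored here); the concrete numbers and names are hypothetical.

```python
SECOND = 1_000_000_000  # nanoseconds


def render_clock_time_ns(base_time_ns: int, pts_ns: int) -> int:
    """Absolute clock time at which a sync=True sink renders a buffer."""
    return base_time_ns + pts_ns


sender_base = 5 * SECOND          # hypothetical base_time of the sender pipeline
playback_auto_base = 7 * SECOND   # hypothetical base_time picked automatically later
pts = 250_000_000                 # the same buffer PTS seen by both pipelines

# Different base_time: the same PTS maps to clock times 2 s apart.
skew_ns = (render_clock_time_ns(playback_auto_base, pts)
           - render_clock_time_ns(sender_base, pts))

# Shared base_time: both pipelines agree on when the buffer is due.
aligned_ns = (render_clock_time_ns(sender_base, pts)
              - render_clock_time_ns(sender_base, pts))
```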

media_server changes:
  - _on_consumer_pad_added: pin the per-peer playback pipeline to
    _pipeline_sender's clock via use_clock(), disable auto-base-time
    via set_start_time(CLOCK_TIME_NONE), then PAUSED -> set_base_time
    -> PLAYING in that order so GStreamer cannot overwrite either.
  - audiosink and wobbler appsink both sync=True now that timing is
    coherent. Drops the sync=False fallback in _make_wobbler_appsink,
    its sync parameter, and the _WOBBLER_LIVE_PLAY_OFFSET_NS constant
    that was only used by the now-dead sync=False branch.
  - _on_playback_bus_message + _on_bus_message: handle LATENCY by
    recalculating on the right pipeline.
  - Promote playback_pipe to self._pipeline_playback so the bus
    handlers can reach it. Comment out the dot-dump in
    _consumer_added.

Refactor that fell out of cleaning up push_audio_sample:
  - AudioBase now owns GAP_RESET_NS = 200 ms, the _appsrc_pts state
    (initialised to -1 as "no previous buffer" sentinel — keeps the
    type a plain int), and a single _compute_pts(num_samples,
    running_time_ns, next_pts_ns) helper returning the (pts,
    duration, next_pts) tuple.
  - audio_gstreamer and webrtc_client_gstreamer drop their
    duplicate constants and per-class helpers. push_audio_sample in
    both backends now has the same shape: early-return on missing
    appsrc, _compute_pts(...), assign pts/dts/duration, push_buffer,
    log on non-OK return. The webrtc one stays silent on
    missing-appsrc (send chain not ready yet) and the local one
    still warns ("call start_playing first") since each behaviour
    is appropriate to its caller's contract.
  - _playback_next_pts_ns → _appsrc_pts in audio_gstreamer for
    consistency with the AudioBase attribute.
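
One plausible shape for the shared PTS helper, written as a free function. Only the signature, the 200 ms gap constant, and the -1 sentinel come from the description above; the reset rule (restart at the running time when there is no previous buffer or the gap exceeds GAP_RESET_NS) and the sample rate are assumptions.

```python
SAMPLE_RATE = 16_000          # assumption: the backend's playback rate
GAP_RESET_NS = 200_000_000    # 200 ms
NO_PREVIOUS = -1              # sentinel: no buffer pushed yet


def compute_pts(num_samples: int, running_time_ns: int,
                next_pts_ns: int) -> tuple[int, int, int]:
    """Return (pts, duration, next_pts) in nanoseconds for one buffer."""
    duration = num_samples * 1_000_000_000 // SAMPLE_RATE
    first = next_pts_ns == NO_PREVIOUS
    gap = running_time_ns - next_pts_ns
    if first or gap > GAP_RESET_NS:
        pts = running_time_ns   # start (or restart) on the pipeline clock
    else:
        pts = next_pts_ns       # contiguous with the previous buffer
    return pts, duration, pts + duration


first_buf = compute_pts(160, 1_000_000_000, NO_PREVIOUS)
contiguous = compute_pts(160, 1_005_000_000, first_buf[2])
after_pause = compute_pts(160, 2_000_000_000, contiguous[2])
```

Keeping the sentinel as -1 rather than None means the state stays a plain int, which is the typing benefit mentioned above.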

LATENCY recalc also added to the conv-app's webrtc_client bus
handler so the receive pipeline can react to dynamic latency
changes from webrtcsrc.

Assisted-by: Claude:claude-opus-4-7
Every GStreamer-using class in src/reachy_mini/media duplicated a
near-identical bus watch (log EOS/ERROR + return False, handle
LATENCY, sometimes WARNING). Five copies drifted in small ways
(some used ``msg.type``, some cached it in ``t``; some had LATENCY,
some did not; phrasing of log messages varied).

Adds ``handle_default_bus_message(logger, msg, pipeline)`` in
gstreamer_utils.py covering EOS / ERROR / WARNING / LATENCY with
uniform behaviour, and reshapes every existing handler to delegate
to it. Each watch's user-data is now the owning Gst.Pipeline so the
helper can call ``recalculate_latency`` on the right pipeline:

  - audio_base.AudioBase._on_bus_message: one-line delegate. Was
    duplicating the EOS/ERROR/LATENCY block.
  - audio_gstreamer.GStreamerAudio._on_bus_message: only stops the
    wobbler on EOS, then ``super()._on_bus_message(...)``.
  - webrtc_client_gstreamer.GstWebRTCClient._on_bus_message: only
    filters webrtcsrc's non-fatal internal "not-negotiated" appsrc
    error (return True), then ``super()._on_bus_message(...)``.
    Also flips the bus watch's user-data from ``self._loop`` to
    ``self._pipeline_record`` so LATENCY can reach the helper.
  - media_server.GstMediaServer._on_bus_message: one-line delegate
    too. Both the sender bus and the per-peer playback bus now
    register through this single handler, each passing its own
    pipeline.  _on_playback_bus_message is dropped (was 14 lines
    duplicating the same shape, only adding peer-id prefixes that
    were not load-bearing).
  - camera_gstreamer.GStreamerCamera._on_bus_message: keeps its
    "log warning, but return True so transient errors do not tear
    the pipeline down" behaviour for ERROR, delegates everything
    else. ``_handle_bus_calls`` passes ``self.pipeline`` to the
    watch.
  - gstreamer_udp_camera.GStreamerUDPCamera._on_bus_message: pure
    delegate now that WARNING is in the helper. Watch passes
    ``self.pipeline``.

Net result: one canonical bus handler, each subclass only owns the
branch that differs from the default.
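
The delegate-plus-override pattern can be sketched without GStreamer. `MsgType`, `FakeMessage`, and `FakePipeline` are stand-ins invented for this example, and the function bodies are a guess at the shape of the helper, not its actual code; the return convention (False removes the watch, True keeps it) follows the description above.

```python
import logging
from dataclasses import dataclass
from enum import Enum, auto


class MsgType(Enum):
    """Stand-in for Gst.MessageType so the sketch runs without GStreamer."""
    EOS = auto()
    ERROR = auto()
    WARNING = auto()
    LATENCY = auto()


@dataclass
class FakeMessage:
    type: MsgType
    detail: str = ""


class FakePipeline:
    """Stand-in pipeline that only counts latency recalculations."""

    def __init__(self) -> None:
        self.latency_recalcs = 0

    def recalculate_latency(self) -> None:
        self.latency_recalcs += 1


def handle_default_bus_message_sketch(logger, msg, pipeline) -> bool:
    """Return False to remove the bus watch, True to keep it."""
    if msg.type is MsgType.EOS:
        logger.info("End of stream")
        return False
    if msg.type is MsgType.ERROR:
        logger.error("Pipeline error: %s", msg.detail)
        return False
    if msg.type is MsgType.WARNING:
        logger.warning("Pipeline warning: %s", msg.detail)
    elif msg.type is MsgType.LATENCY:
        logger.debug("Recalculating latency")
        pipeline.recalculate_latency()
    return True


def camera_bus_message_sketch(logger, msg, pipeline) -> bool:
    """Camera-style override: log ERROR but keep the watch alive."""
    if msg.type is MsgType.ERROR:
        logger.warning("Transient camera error: %s", msg.detail)
        return True
    return handle_default_bus_message_sketch(logger, msg, pipeline)


log = logging.getLogger("bus_sketch")
pipe = FakePipeline()
keep_after_latency = handle_default_bus_message_sketch(log, FakeMessage(MsgType.LATENCY), pipe)
keep_after_eos = handle_default_bus_message_sketch(log, FakeMessage(MsgType.EOS), pipe)
keep_after_camera_error = camera_bus_message_sketch(log, FakeMessage(MsgType.ERROR, "glitch"), pipe)
```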

Assisted-by: Claude:claude-opus-4-7
  - gstreamer_utils.handle_default_bus_message: LATENCY recalcs now
    log at debug level. They fire often enough during normal
    operation that info-level was noisy.
  - media_server: drop two stale comments left over from the
    sync=False era of the WebRTC playback chain. The
    ``set_start_time`` line is self-explanatory, and the
    "Live WebRTC path: sync=False" comment was contradicted by the
    surrounding code (the wobbler appsink is sync=True now that
    both pipelines share clock + base_time).

Assisted-by: Claude:claude-opus-4-7
The previous PR moved the helper from
``GStreamerAudio._compute_playback_buffer_timing`` to
``AudioBase._compute_pts``, dropped the ``sample_rate`` and
``gap_reset_ns`` parameters (now ``self.SAMPLE_RATE`` and
``self.GAP_RESET_NS``), and replaced the ``None`` sentinel with
``-1``. The three test cases in
``tests/unit_tests/test_audio_gstreamer.py`` still imported the
old name and signature and were failing with AttributeError.

Renames the tests to match the new helper, drops the removed
positional args, and passes a SimpleNamespace stub carrying the two
class constants the helper reads (the old ``cast(GStreamerAudio,
object())`` trick worked only because the constants were arguments).

Assisted-by: Claude:claude-opus-4-7
The first user-supplied audio buffer was being partially eaten on
the receiver side — the conv-app's first spoken word came through
clipped. Cause: the Opus encoder + rtpopuspay + webrtcbin send
chain needs a few RTP frames before it's actually streaming;
during that warm-up window the leading samples of the real audio
get dropped or truncated.

Push a 0.5 s buffer of zeros on the first push_audio_sample call
so the warm-up consumes silence instead of real speech. Subsequent
pushes go through unchanged via the existing PTS-aware path.
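
A sketch of the warm-up idea with the push path injected as a callable. The sample rate, the float32 mono sample size, and the class name are assumptions; only the 0.5 s of zeros before the first real buffer comes from the description above.

```python
from typing import Callable

SAMPLE_RATE = 16_000    # assumption
BYTES_PER_SAMPLE = 4    # assumption: float32 mono
WARMUP_S = 0.5


class WarmupPusher:
    """Prepends a block of silence to the very first pushed buffer."""

    def __init__(self, push_buffer: Callable[[bytes], None]) -> None:
        self._push_buffer = push_buffer   # single shared push path
        self._warmed_up = False

    def push_audio_sample(self, pcm: bytes) -> None:
        if not self._warmed_up:
            # All-zero bytes decode to 0.0 in float32, i.e. digital silence.
            silence = bytes(int(SAMPLE_RATE * WARMUP_S) * BYTES_PER_SAMPLE)
            self._push_buffer(silence)
            self._warmed_up = True
        self._push_buffer(pcm)


pushed: list[bytes] = []
pusher = WarmupPusher(pushed.append)
pusher.push_audio_sample(b"\x01\x02\x03\x04")
pusher.push_audio_sample(b"\x05\x06\x07\x08")
```

Only the first call pays the warm-up cost; later calls push exactly what they were given.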

Splits the per-buffer GstBuffer + push_buffer + return-check logic
out of push_audio_sample into a private _push_buffer helper so the
warm-up and the user buffer share a single push path.

Assisted-by: Claude:claude-opus-4-7

Labels

enhancement New feature or request

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

Transfer head wobbling on the core side

3 participants