feat(protocol): add restart_daemon DataChannel command #1131

Open
tfrere wants to merge 1 commit into main from feat/restart-daemon-webrtc-cmd

tfrere commented May 19, 2026

Summary

Adds a restart_daemon command to the WebRTC DataChannel protocol so a remote peer can recover a degraded daemon (e.g. closed motor controller asserting on every command) without SSH or LAN-only HTTP access. Same effect as POST /api/daemon/restart, but reachable from a Central-routed peer using the typed transport already used for robot control.

Why

Today, when the motor controller dies mid-session ("Motor controller not initialized or already closed." repeating in journalctl), the daemon keeps reporting motor_mode: "enabled" while every command hits that assertion and silently fails. The only recovery paths are:

  • sudo systemctl restart reachy_mini over SSH, or
  • POST /api/daemon/restart from localhost.

Neither is reachable from a browser running in a Hugging Face Space (or any remote app routed through the Central signaling relay). This adds a recovery hatch over the same WebRTC channel the app is already using.

Design

Three small changes, decoupled via a callback so the backend doesn't need to import the daemon (avoids the circular daemon → backend → daemon import).

1. Protocol (io/protocol.py)

New Pydantic command:

class RestartDaemonCmd(BaseModel):
    type: Literal["restart_daemon"] = "restart_daemon"
    goto_sleep_on_stop: bool = False
    wake_up_on_start: bool = False

Defaults are conservative: robot stays where it is (no sleep before stop, no wake-up trajectory replay). Both flags can be set explicitly to opt in.

2. Backend (daemon/backend/abstract.py)

  • Backend.set_daemon_restart_handler(handler): setter wired by Daemon.start().
  • Dispatcher case in process_command:
    • Acks the caller before scheduling the restart, since stopping the daemon tears down the media pipeline and closes the data channel. Without the pre-ack, a well-behaved client cannot distinguish "request accepted" from a generic transport failure.
    • Schedules the restart on the asyncio loop and logs failures to journalctl (the caller is no longer reachable past that point).
    • Returns a clear error if the handler is not wired (e.g. unit tests instantiating the backend without a daemon).
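
The dispatcher behaviour described above can be sketched roughly as follows. This is a simplified stand-in, not the code from the PR: the Backend class here models only the restart path, process_command takes a plain dict and a hypothetical send_reply callback, and the handler is assumed to be an async callable accepting the two flags as keyword arguments.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Optional

logger = logging.getLogger("restart_daemon_sketch")

# Hypothetical aliases: the real handler and reply signatures are not shown in this PR.
RestartHandler = Callable[..., Awaitable[None]]
SendReply = Callable[[dict], None]


class Backend:
    """Stand-in for the real backend; only the restart_daemon path is modelled."""

    def __init__(self) -> None:
        self._restart_handler: Optional[RestartHandler] = None
        self.restart_task: Optional[asyncio.Task] = None

    def set_daemon_restart_handler(self, handler: RestartHandler) -> None:
        self._restart_handler = handler

    def process_command(self, cmd: dict, send_reply: SendReply) -> None:
        """Handle one restart_daemon command; must be called from the event loop."""
        handler = self._restart_handler
        if handler is None:
            send_reply({"error": "restart_daemon: no daemon lifecycle handler wired",
                        "command": "restart_daemon"})
            return

        # Ack first: the restart closes the data channel, so a later reply would be lost.
        send_reply({"status": "ok", "command": "restart_daemon", "scheduled": True})

        async def run_restart() -> None:
            try:
                await handler(
                    goto_sleep_on_stop=cmd.get("goto_sleep_on_stop", False),
                    wake_up_on_start=cmd.get("wake_up_on_start", False),
                )
            except Exception:
                # The caller is unreachable by now, so the log is the only trace.
                logger.exception("restart_daemon failed")

        # Keep a reference so the task is not garbage-collected before it runs.
        self.restart_task = asyncio.get_running_loop().create_task(run_restart())


async def demo() -> list:
    events: list = []
    backend = Backend()
    backend.process_command({"type": "restart_daemon"}, events.append)  # no handler yet

    async def fake_restart(**kwargs: Any) -> None:
        events.append(("restart", kwargs))

    backend.set_daemon_restart_handler(fake_restart)
    backend.process_command(
        {"type": "restart_daemon", "wake_up_on_start": True}, events.append
    )
    await backend.restart_task  # let the scheduled restart run
    return events


events = asyncio.run(demo())
```

In the demo, the error reply comes first (no handler wired), then the ack, and only afterwards the fake handler runs with the requested flags, which mirrors the ack-before-restart ordering.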

3. Daemon (daemon/daemon.py)

After _setup_backend():

self.backend.set_daemon_restart_handler(self.restart)

Re-wired on every start() because each start builds a new backend instance. Daemon parameters (sim mode, serial port, kinematics, audio, localhost_only, ...) are preserved automatically by Daemon.restart() from _start_params.
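
To illustrate why the hook has to be set on every start(), here is a minimal sketch under stated assumptions. The Daemon and Backend classes below are toy stand-ins, the parameter names sim and serial_port are hypothetical, and the real Daemon.restart() may differ in signature and in how it replays _start_params.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


class Backend:
    """Toy backend that only stores the restart hook."""

    def __init__(self) -> None:
        self.restart_handler: Optional[Callable[..., Awaitable[None]]] = None

    def set_daemon_restart_handler(self, handler: Callable[..., Awaitable[None]]) -> None:
        self.restart_handler = handler


class Daemon:
    """Toy daemon: each start() builds a fresh backend and re-wires the hook."""

    def __init__(self) -> None:
        self.backend: Optional[Backend] = None
        self._start_params: dict = {}

    async def start(self, wake_up_on_start: bool = False, **params: Any) -> None:
        self._start_params = params  # remembered so restart() can replay them
        self.backend = Backend()     # stands in for _setup_backend()
        self.backend.set_daemon_restart_handler(self.restart)

    async def stop(self, goto_sleep_on_stop: bool = False) -> None:
        self.backend = None          # the old backend and its hook are discarded

    async def restart(self, goto_sleep_on_stop: bool = False,
                      wake_up_on_start: bool = False) -> None:
        await self.stop(goto_sleep_on_stop=goto_sleep_on_stop)
        await self.start(wake_up_on_start=wake_up_on_start, **self._start_params)


async def demo() -> tuple:
    daemon = Daemon()
    await daemon.start(sim=True, serial_port="auto")  # hypothetical parameters
    first_backend = daemon.backend
    await first_backend.restart_handler()  # what the dispatcher would invoke
    return first_backend, daemon


first_backend, daemon = asyncio.run(demo())
```

After the restart, daemon.backend is a new object with the hook wired again, and _start_params still holds the values from the first start().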

Behaviour

  • Active WebRTC session(s): dropped (intentional: the media pipeline is torn down and rebuilt).
  • Daemon parameters: preserved from the previous start().
  • Robot pose: preserved by default (goto_sleep_on_stop=False).
  • Wake-up: not replayed by default (wake_up_on_start=False).
  • WebSocket peers (/api/ws): also dropped along with the daemon teardown - same as a POST /api/daemon/restart.

Test plan

Smoke-tested locally:

  • Pydantic round-trip: command_adapter.validate_python({"type": "restart_daemon", "wake_up_on_start": True}) parses correctly with the right defaults.
  • Dispatcher unit-tested in two paths:
    • No handler wired → caller receives {"error": "restart_daemon: no daemon lifecycle handler wired", "command": "restart_daemon"}.
    • Handler wired → caller receives {"status": "ok", "command": "restart_daemon", "scheduled": true} synchronously, then the handler is invoked with the requested kwargs on the next event-loop tick.
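
From the caller's side, the two replies above plus the "channel closed with no reply" case give three outcomes to tell apart. The sketch below shows one way a client might do that. It assumes commands and replies travel as JSON text over the data channel, and build_restart_command and classify_reply are hypothetical helper names, not part of any SDK.

```python
import json
from typing import Optional


def build_restart_command(goto_sleep_on_stop: bool = False,
                          wake_up_on_start: bool = False) -> str:
    """Serialize a restart_daemon command (JSON wire format is an assumption)."""
    return json.dumps({
        "type": "restart_daemon",
        "goto_sleep_on_stop": goto_sleep_on_stop,
        "wake_up_on_start": wake_up_on_start,
    })


def classify_reply(raw: Optional[str]) -> str:
    """Map a reply (or None if the channel closed before any reply) to an outcome."""
    if raw is None:
        # No ack arrived: indistinguishable from a generic transport failure.
        return "transport_failure"
    reply = json.loads(raw)
    if "error" in reply:
        return "rejected"
    if (reply.get("command") == "restart_daemon"
            and reply.get("status") == "ok"
            and reply.get("scheduled") is True):
        # Restart accepted; the channel is expected to close shortly after.
        return "accepted"
    return "unexpected"
```

A client that sees "accepted" can treat the subsequent channel closure as expected and start reconnecting, rather than reporting an error to the user.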

Pre-merge checklist:

  • On-robot end-to-end: trigger restart_daemon from a browser peer, confirm the daemon restarts, motor controller comes back, and a fresh WebRTC session reconnects cleanly.
  • Verify with the JS SDK (the matching reachyMini.restartDaemon() is intentionally not in this PR; should be a follow-up that lands on js/reachy-mini.js once the protocol is merged here).

Out of scope

  • JS SDK helper (reachyMini.restartDaemon()): follow-up PR.
  • Exposing this over the WebSocket transport: not needed; both transports already share process_command(). WS peers can already use POST /api/daemon/restart locally.

Made with Cursor

Recovery hatch for a degraded daemon (e.g. closed motor controller
asserting on every command) over the same typed WebRTC transport
already used for control, with no SSH or LAN-only HTTP access needed.
Same effect as POST /api/daemon/restart, but reachable from a remote
(Central-routed) peer.

The dispatcher acks the caller before scheduling the restart so the
client can distinguish "request accepted" from a transport failure
(stopping the daemon tears down the media pipeline, which closes the
data channel - the ack would otherwise never reach the caller).

Daemon parameters (sim mode, serial port, kinematics, audio,
localhost_only, ...) are preserved from the previous start(). The
restart is conservative by default: the robot is NOT put to sleep
before stopping (goto_sleep_on_stop=False), and the wake-up
trajectory is NOT replayed after (wake_up_on_start=False). Both
flags can be set to opt in.

Wiring is decoupled: Backend exposes set_daemon_restart_handler()
and Daemon.start() passes Daemon.restart in. This avoids a circular
daemon -> backend -> daemon import and lets non-daemon backends
(tests, future transports) leave the hook unset, in which case
restart_daemon returns a clear error to the caller.

Co-authored-by: Cursor <cursoragent@cursor.com>