Replies: 1 comment
sometimes remote, but I'll be doing the VPN (tailscale)
Key constraints/ingredients that matter for “peer-to-peer voice” in LVA:
- Audio runs through the sound card on PulseAudio / PipeWire-Pulse, with optional module-echo-cancel AEC.
- Each device has a device_id + per-device topics.

Here are several viable P2P voice directions (ordered from “most practical” to “most minimal”):
Option A: WebRTC (aiortc) + MQTT signaling (recommended)
What it is: Real-time full-duplex audio using WebRTC’s battle-tested jitter buffering, Opus codec, congestion control, and encryption (SRTP).
Signaling: Use your existing MQTT broker to exchange SDP offers/answers + ICE candidates.
Pros
Cons
How it fits LVA
- Add a CALLING/IN_CALL state (or keep it parallel, but it’s usually easier to treat as a state).
- While IN_CALL, you likely disable wake word (or require push-to-talk) to avoid accidental triggers.
MQTT topic sketch
- lva/{target_id}/call/invite (payload: {call_id, from_id, sdp_offer, timestamp, codec_prefs})
- lva/{from_id}/call/answer (payload: {call_id, sdp_answer})
- lva/{from_id}/call/ice and lva/{target_id}/call/ice (payload: {call_id, candidate})
- lva/{peer_id}/call/bye (payload: {call_id, reason})
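The invite/answer leg of this topic scheme can be sketched end to end. The `DummyBroker` below is a hypothetical in-memory stand-in for the MQTT broker (in the real system this would be paho-mqtt against your existing broker); the handler names and the `<offer-sdp>` placeholders are assumptions, not project code:

```python
import json

class DummyBroker:
    """In-memory stand-in for an MQTT broker: topic -> list of callbacks."""
    def __init__(self):
        self.subs = {}

    def subscribe(self, topic, cb):
        self.subs.setdefault(topic, []).append(cb)

    def publish(self, topic, payload: dict):
        for cb in self.subs.get(topic, []):
            cb(topic, json.dumps(payload))

def invite_topic(target_id):   # lva/{target_id}/call/invite
    return f"lva/{target_id}/call/invite"

def answer_topic(from_id):     # lva/{from_id}/call/answer
    return f"lva/{from_id}/call/answer"

broker = DummyBroker()
log = []

# Callee side: on invite, publish an SDP answer back to the caller.
def on_invite(topic, raw):
    msg = json.loads(raw)
    log.append(("invite", msg["call_id"]))
    broker.publish(answer_topic(msg["from_id"]),
                   {"call_id": msg["call_id"], "sdp_answer": "<answer-sdp>"})

# Caller side: record the answer so the WebRTC session can proceed.
def on_answer(topic, raw):
    log.append(("answer", json.loads(raw)["call_id"]))

broker.subscribe(invite_topic("kitchen"), on_invite)
broker.subscribe(answer_topic("office"), on_answer)

# Office rings the kitchen.
broker.publish(invite_topic("kitchen"), {
    "call_id": "c1", "from_id": "office",
    "sdp_offer": "<offer-sdp>", "timestamp": 0, "codec_prefs": ["opus"],
})
```

The ICE and bye topics follow the same publish/subscribe pattern, keyed by call_id.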
Security
- Device allowlist in config.json + optionally a shared secret/HMAC on signaling messages so random MQTT clients can’t ring every device.
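The shared-secret/HMAC idea needs nothing beyond the standard library. A sketch, assuming the secret is distributed via config.json and the signature rides in a sig field (both are assumptions about naming, not settled design):

```python
import hmac
import hashlib
import json

SECRET = b"shared-secret-from-config"  # hypothetical: loaded from config.json

def sign(payload: dict) -> dict:
    """Return a copy of the signaling payload with an HMAC-SHA256 signature."""
    body = json.dumps(payload, sort_keys=True).encode()
    signed = dict(payload)
    signed["sig"] = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return signed

def verify(message: dict) -> bool:
    """Recompute the HMAC over everything except sig and compare in constant time."""
    msg = dict(message)
    sig = msg.pop("sig", "")
    body = json.dumps(msg, sort_keys=True).encode()
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)
```

Sorting keys before hashing keeps the signature stable regardless of dict ordering; hmac.compare_digest avoids timing side channels on the comparison.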
Option B: RTP/Opus over UDP + MQTT signaling (lighter than WebRTC)
What it is: You run your own RTP session (Opus frames) between peers; MQTT still does call setup.
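Running your own RTP session means packing the fixed 12-byte RTP header yourself before each Opus frame. A minimal sketch with struct; payload type 96 (a dynamic type) for Opus and 48 kHz timestamp units (960 per 20 ms frame) follow common Opus-over-RTP practice:

```python
import struct

def rtp_header(seq: int, timestamp: int, ssrc: int, payload_type: int = 96) -> bytes:
    """Minimal RTP header: version 2, no padding/extension/CSRC, marker clear."""
    byte0 = 2 << 6                # V=2 in the top two bits
    byte1 = payload_type & 0x7F   # marker bit (MSB) left clear
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def parse_seq(packet: bytes) -> int:
    """Pull the 16-bit sequence number back out (for jitter-buffer ordering)."""
    return struct.unpack("!H", packet[2:4])[0]

# One 20 ms Opus frame -> timestamp advances by 960 at 48 kHz.
pkt = rtp_header(seq=7, timestamp=960, ssrc=0xDEADBEEF) + b"<opus frame>"
```

This is exactly the part of Option B you get for free with WebRTC: sequence numbering, timestamping, and the reorder/jitter logic that consumes them are all on you here.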
Pros
Cons
When it’s attractive
Option C: “Intercom broadcast” (multicast or fanout) for announcements
What it is: One device sends a short live stream (or recorded snippet) to many devices (kitchen announcement mode).
Pros
Cons
Fit
Option D: Use HA as rendezvous / relay (not truly P2P)
What it is: HA coordinates calls and can even relay media (or run a voice server).
Pros
Cons
My suggested path (fastest to “good”, least regrets)
Milestone 1 (simple + useful): Half-duplex intercom (push-to-talk / hold-to-talk).
Milestone 2: Upgrade to full duplex “calls” using WebRTC (Option A).
This sequencing keeps you from sinking time into homegrown jitter buffers + echo problems before you’ve proven the product feel.
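The state handling both milestones imply can stay tiny. A sketch of the CALLING/IN_CALL states mentioned under Option A, with wake word gated off outside IDLE (class and method names are assumptions, not existing LVA code):

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    CALLING = auto()   # invite sent, awaiting answer
    IN_CALL = auto()   # media flowing

class CallStateMachine:
    """Tracks call state; wake word listens only while idle."""
    def __init__(self):
        self.state = State.IDLE

    @property
    def wakeword_enabled(self) -> bool:
        return self.state is State.IDLE

    def start_call(self):
        if self.state is State.IDLE:
            self.state = State.CALLING

    def call_connected(self):
        if self.state is State.CALLING:
            self.state = State.IN_CALL

    def hang_up(self):
        self.state = State.IDLE
```

The same machine serves Milestone 1 (push-to-talk maps to a short-lived IN_CALL) and Milestone 2, so upgrading the transport doesn't rework the state logic.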
Brainstorm: UX + project integration ideas
Voice commands (routed locally, not via HA):
MQTT-controlled calls so HA dashboards or automations can trigger:
- call/invite to a specific device group

Priority rules:
- If a call arrives while RESPONDING (TTS playback), decide: auto-duck volume, interrupt, or reject the call.

LED states:
Echo strategy:
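The priority rule above (what to do when a call lands mid-TTS) collapses to one policy function that automations can also exercise. The action names and the urgent caller priority are hypothetical, just to make the decision table concrete:

```python
def on_incoming_call(current_state: str, caller_priority: str = "normal") -> str:
    """Decide how to handle an incoming call given the device's current state.

    Returns one of 'accept', 'duck', or 'reject' (hypothetical action names):
    'duck' lowers TTS volume and rings softly instead of interrupting.
    """
    if current_state == "RESPONDING":            # mid-TTS playback
        return "accept" if caller_priority == "urgent" else "duck"
    if current_state == "IN_CALL":               # already on a call
        return "reject"
    return "accept"                              # idle/listening: just ring
```

Keeping this as a pure function makes the policy trivially unit-testable and easy to expose as a config knob later.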
If you want one concrete “north star” architecture to build toward: WebRTC audio-only with aiortc, MQTT signaling, a device allowlist, and an IN_CALL state that temporarily disables wakeword/STT. That keeps everything Python-native and fits your current event-driven design.

Tell me whether your target is LAN-only intercom or sometimes remote (outside the house), and I’ll narrow the design to the best option and propose the exact new modules / config schema entries / MQTT topics to add.