Update libsrt to v1.5.5 and harden Mux SRT streaming #1087
Merged
@eyevinn/srt defaults to a 3-year-old version; this puts us on latest stable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Use the config:best-practices preset with group:allNonMajor instead of hand-rolling the non-major grouping. Keep the node engines opt-out and the bump rangeStrategy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Production has been seeing "Connection was broken" from srt_sendmsg2 mid-stream, which kills the whole party. Three changes work together to make these recoverable:
- Mux livestream: set reconnect_window to 30s. Low Latency mode defaults this to 0 (no reconnect allowed), so we have to opt back in.
- SRT caller: poll srt_bstats every 2s and log RTT, retrans/loss counters, send-rate, and buffer depth. Until now we had no ground truth on why connections drop; this gives us one.
- streamPacketFile: on "Connection was broken", tear down the broken caller and reopen against the same livestream creds, retrying the failed write. Bounded to 5 attempts per file invocation. packetCtx is preserved so PCR/PTS stay continuous across the gap. Disconnect and reconnect messages both reach the Discord debug channel via debugWarn/debugInfo.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
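As an illustration of the stats polling this commit describes, here is a minimal TypeScript sketch of a 2-second poll-and-log loop. The field names mirror libsrt's `SRT_TRACEBSTATS` struct, but whether the Node binding exposes them under exactly these names is an assumption. `SrtStatsSnapshot`, `formatStats`, `startStatsPolling`, and the injected `readStats` callback are hypothetical names for illustration, not the PR's actual code.

```typescript
// Hypothetical snapshot shape; names follow libsrt's SRT_TRACEBSTATS fields.
interface SrtStatsSnapshot {
  msRTT: number;           // smoothed round-trip time, in ms
  pktRetransTotal: number; // packets retransmitted since connect
  pktSndLossTotal: number; // sent packets reported lost since connect
  mbpsSendRate: number;    // current send rate, in Mbit/s
  pktFlightSize: number;   // packets sent but not yet acknowledged
  msSndBuf: number;        // timespan of data queued in the send buffer, in ms
}

// Render one snapshot as a single greppable log line.
function formatStats(s: SrtStatsSnapshot): string {
  return (
    `rtt=${s.msRTT.toFixed(1)}ms retrans=${s.pktRetransTotal} ` +
    `loss=${s.pktSndLossTotal} rate=${s.mbpsSendRate.toFixed(2)}Mbps ` +
    `flight=${s.pktFlightSize} sndbuf=${s.msSndBuf}ms`
  );
}

// Poll on a fixed interval and log each snapshot.
// `readStats` is injected (assumption) so the sketch does not depend on any
// particular SRT binding; it returns null when no socket is connected.
// The returned function stops the polling.
function startStatsPolling(
  readStats: () => SrtStatsSnapshot | null,
  log: (line: string) => void,
  intervalMs: number = 2000,
): () => void {
  const timer = setInterval(() => {
    const snapshot = readStats();
    if (snapshot !== null) {
      log(formatStats(snapshot));
    }
  }, intervalMs);
  return () => clearInterval(timer);
}
```

Keeping the formatter separate from the timer makes the log line easy to unit-test, and the stop function gives the caller a single place to cancel polling when the SRT socket is torn down.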
fusion2004 added a commit that referenced this pull request on May 21, 2026
Two bugs surfaced in production stats and logs after #1087 deployed:
- srtWrite retried the same Buffer after a reconnect, but @eyevinn/srt transfers the chunk's ArrayBuffer to its worker via postMessage on the first call, leaving it detached in our process. The retry threw "Cannot transfer object of unsupported type" and killed the party. Refactor srtWrite to take the persistent chunkBuf as a source and allocate a fresh allocUnsafeSlow + copy each loop iteration.
- 15-second pacing lead against a 1000ms SRTO_LATENCY meant libsrt was TLPKTDROP'ing ~50% of attempted packets from the very first stats snapshot (msSndBuf hovering at ~800ms with zero loss/retrans). Mux saw 26 seconds of gappy audio before dropping the connection. Convert PACING_LEAD_SEC=15 to PACING_LEAD_MS=900 so the lead stays comfortably under the SRT latency window.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
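The first fix above amounts to "never hand the transport the same bytes twice". A minimal TypeScript sketch of that idea follows; `freshChunk` is a hypothetical helper name, not the function in this repository. It relies on the documented behaviour that `Buffer.allocUnsafeSlow` returns an un-pooled buffer, so the copy owns its entire ArrayBuffer and nothing else is affected if the transport takes ownership of it.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical helper: copy `length` bytes starting at `offset` out of the
// long-lived source buffer into a brand-new buffer for a single write attempt.
function freshChunk(source: Buffer, offset: number, length: number): Buffer {
  // allocUnsafeSlow never slices from Node's shared allocation pool, so the
  // returned buffer is backed by a dedicated ArrayBuffer of exactly `length`
  // bytes. The memory is uninitialized, but copy() below overwrites all of it
  // as long as the requested range lies inside `source`.
  const chunk = Buffer.allocUnsafeSlow(length);
  source.copy(chunk, 0, offset, offset + length);
  return chunk;
}
```

A retry loop would call `freshChunk` at the top of every iteration and pass the result to the write call, so a write that consumed (detached) its chunk before failing cannot poison the next attempt. The persistent source is only ever read from.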
Summary
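- Update libsrt to v1.5.5 (via @eyevinn/srt)
- Use config:best-practices + group:allNonMajor instead of a hand-rolled non-major grouping
- Harden Mux SRT streaming against "srt_sendmsg2: Connection was broken" drops by reconnecting in-place, with stats logging for diagnosis
The streaming changes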
Production hit "Connection was broken" mid-party twice, which killed the stream both times. Three coordinated changes make these recoverable:
- Mux livestream: set reconnect_window: 30 (seconds). Low Latency mode defaults this to 0 (rejects any reconnect), so we have to opt back in.
- SRT caller: poll srt_bstats every 2s and log RTT, retrans/loss counters, send-rate, flight size, and send-buffer depth. The libsrt error string doesn't tell us why the connection broke; this gives us ground truth for future incidents.
- streamPacketFile: on "Connection was broken", tear down the broken caller and reopen against the same livestream creds, retrying the failed write. Bounded to 5 attempts per file invocation. packetCtx is preserved across the gap so PCR/PTS stay continuous. Disconnect/reconnect messages route through debugWarn/debugInfo so they reach the Discord debug channel.
Trade-off worth knowing: a reconnected write re-sends the chunk that failed, which can produce a continuity-counter (CC) duplicate on Mux's side if libsrt had partially delivered. Acceptable vs. a dead party; the new stats logging will tell us if it's actually a problem.
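The reconnect-and-retry behaviour described above can be sketched roughly as follows. This is an illustrative TypeScript outline under stated assumptions, not the code in this PR: the `Caller` interface, `writeWithReconnect`, `openCaller`, `makeChunk`, and the `budget` object are all hypothetical names. Only the "Connection was broken" match and the limit of 5 come from the description.

```typescript
// Hypothetical stand-in for an open SRT caller connection.
interface Caller {
  write(chunk: Uint8Array): Promise<void>;
  close(): Promise<void>;
}

// Bound from the PR description: 5 reconnect attempts per file invocation.
const MAX_RECONNECTS = 5;

function isBrokenConnection(err: unknown): boolean {
  return err instanceof Error && err.message.includes("Connection was broken");
}

// Write one chunk. On a broken connection, tear down the caller, reopen, and
// retry the same write. `budget` is shared by every write of one file, so the
// bound applies per file rather than per chunk. Returns the caller to use for
// subsequent writes (it changes after a reconnect).
async function writeWithReconnect(
  initial: Caller,
  openCaller: () => Promise<Caller>,
  makeChunk: () => Uint8Array,
  budget: { reconnectsLeft: number },
  log: (msg: string) => void = () => {},
): Promise<Caller> {
  let caller = initial;
  for (;;) {
    try {
      // Build the chunk per attempt so a failed write never reuses its bytes.
      await caller.write(makeChunk());
      return caller;
    } catch (err) {
      // Unrelated errors, or an exhausted budget, propagate to the caller.
      if (!isBrokenConnection(err) || budget.reconnectsLeft <= 0) {
        throw err;
      }
      budget.reconnectsLeft -= 1;
      log(`SRT connection broken; reconnecting (${budget.reconnectsLeft} attempts left)`);
      await caller.close().catch(() => undefined); // best-effort teardown
      caller = await openCaller(); // a failed reopen also propagates
      log("SRT reconnected; retrying failed write");
    }
  }
}
```

Because the failed chunk is sent again in full after the reconnect, this shape has exactly the trade-off noted above: if the first attempt was partially delivered, the receiver may see some packets twice.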
Test plan
- mise run lint clean
- mise run test: 178 tests passing
🤖 Generated with Claude Code