fix/Full voice functionality in Discord when deployed via Modal/using modal backend #3475
twilwa wants to merge 12 commits into `NousResearch:main`
fix/discord-voice-mode
Pull request overview
This PR fixes Discord voice-mode regressions when running Hermes via the hosted Modal gateway by preserving voice/text session continuity, relaxing mention-gating only for the active voice-linked text channel, and adding a Modal runtime wrapper (including required voice dependencies) plus bootstrap hardening.
Changes:
- Preserve Discord voice-session continuity by storing/reusing linked text-channel `SessionSource` metadata for voice input events.
- Treat the active voice-linked text channel as a free-response context (skip the mention requirement and auto-thread creation), without exempting sibling threads.
- Add a Modal hosted runtime + deploy wrapper, including config/env sanitization, base64 bootstrap handling, and installation of `ffmpeg` + `libopus0`.
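The "config/env sanitization" and "base64 bootstrap handling" items can be pictured with a small sketch. This is not the PR's code: the function names below are hypothetical and only loosely modeled on what the description says `_decode_base64_env()` and `normalize_github_token_env()` do, and the fallback secret name `GITHUB_PAT` is an invented placeholder. It shows the two behaviors the PR describes: malformed base64 or non-UTF-8 payloads are ignored instead of crashing bootstrap, and an already-present `GITHUB_TOKEN`/`GH_TOKEN` wins over alternate names.

```python
import base64
import os
from typing import Mapping, MutableMapping, Optional, Sequence


def decode_base64_env(name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the UTF-8 text of a base64-encoded env var, or None if unusable."""
    raw = env.get(name, "").strip()
    if not raw:
        return None
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(raw, validate=True).decode("utf-8")
    except ValueError:
        # binascii.Error (bad base64) and UnicodeDecodeError (non-UTF-8 bytes)
        # both subclass ValueError; skip the payload instead of crashing.
        return None


def normalize_github_token(
    env: MutableMapping[str, str],
    fallbacks: Sequence[str] = ("GITHUB_PAT",),  # hypothetical alternate name
) -> Optional[str]:
    """Prefer an existing GITHUB_TOKEN/GH_TOKEN, else copy from a fallback."""
    token = env.get("GITHUB_TOKEN") or env.get("GH_TOKEN")
    if not token:
        token = next((env[n] for n in fallbacks if env.get(n)), None)
    if token:
        # Fill in whichever canonical names are missing or empty.
        for canonical in ("GITHUB_TOKEN", "GH_TOKEN"):
            if not env.get(canonical):
                env[canonical] = token
    return token
```

Passing a plain dict instead of `os.environ` keeps both helpers easy to unit-test without touching the real process environment.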
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `gateway/run.py` | Reuses stored voice session source metadata; uses provider-aware STT defaults; avoids signal handlers outside the main thread. |
| `gateway/platforms/discord.py` | Tracks voice-linked source metadata; exempts the exact voice-linked text channel from mention-gating and auto-threading. |
| `gateway/modal_runtime.py` | Adds Modal bootstrap/sanitization helpers and a hosted gateway service + dashboard HTML renderer. |
| `scripts/modal_gateway.py` | Adds an import-safe Modal deploy wrapper that builds a single-container ASGI dashboard and installs voice deps. |
| `tests/gateway/test_voice_command.py` | Adds coverage for source metadata reuse and provider-aware STT default behavior. |
| `tests/gateway/test_discord_free_response.py` | Adds coverage for voice-linked mention-gating behavior (and ensures threads still require a mention). |
| `tests/gateway/test_stt_config.py` | Adds coverage ensuring transcription uses provider-aware defaults (no forced `model` kwarg). |
| `tests/gateway/test_modal_runtime.py` | Adds coverage for Modal runtime sanitization, bootstrap, dashboard rendering, and env normalization. |
```python
if config_text:
    loaded = yaml.safe_load(config_text) or {}
    if isinstance(loaded, dict):
        config_payload = loaded
```
yaml.safe_load(config_text) can raise yaml.YAMLError for malformed (but still base64/UTF-8 valid) payloads, which would crash Modal bootstrap and prevent the gateway from starting. Consider wrapping the YAML parse in a try/except (or using a helper that returns {} on any YAML error) and optionally recording a diagnostic in modal-bootstrap.json or logs.
```python
os.chdir(self.project_root)

try:
```
os.chdir(self.project_root) happens outside the try block, so a bad project_root (or missing directory) would terminate the gateway thread without setting _last_error or committing state. Move the chdir into the try and set _last_error on failure so the dashboard/health endpoint reflects the real startup error.
Suggested change:

```diff
-os.chdir(self.project_root)
-try:
+try:
+    os.chdir(self.project_root)
```
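To make the suggested change concrete, here is a minimal sketch of a background runner that keeps `os.chdir()` inside the `try`, so a bad `project_root` is recorded in `_last_error` and reported by a health check instead of silently killing the thread. Only `project_root` and `_last_error` come from the review comment; the class name `HostedGatewayService`, the `_start_gateway()` placeholder, and the `health()` dict shape are hypothetical.

```python
import os
import threading
from typing import Optional


class HostedGatewayService:
    """Hypothetical background runner; only the error handling is the point."""

    def __init__(self, project_root: str) -> None:
        self.project_root = project_root
        self._last_error: Optional[str] = None
        self._startup_done = threading.Event()

    def _start_gateway(self) -> None:
        """Placeholder for the real gateway start-up (assumption)."""

    def _run(self) -> None:
        try:
            # chdir lives inside the try: a missing directory is recorded
            # instead of terminating the thread without any state.
            os.chdir(self.project_root)
            self._start_gateway()
        except Exception as exc:
            self._last_error = f"{type(exc).__name__}: {exc}"
        finally:
            self._startup_done.set()

    def start(self) -> threading.Thread:
        thread = threading.Thread(target=self._run, daemon=True)
        thread.start()
        return thread

    def health(self) -> dict:
        # What a dashboard/health endpoint could report.
        return {"ok": self._last_error is None, "last_error": self._last_error}
```

With a nonexistent `project_root`, `health()` reports `ok: False` and a `FileNotFoundError` message rather than a dashboard that looks idle.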
Copilot nits have been addressed; ready for review when y'all are.
- Store source metadata on `/voice channel` join so voice input shares the same session as the linked text channel conversation
- Treat voice-linked text channels as free-response (skip @mention and auto-thread) while voice is active
- Scope the voice-linked exemption to the exact bound channel, not sibling threads
- Guard signal handler registration in `start_gateway()` for non-main threads (prevents `RuntimeError` when the gateway runs in a daemon thread)
- Clean up `_voice_sources` on `leave_voice_channel`

Salvaged from PR #3475 by twilwa (Modal runtime portions excluded).
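The voice-linking rules in this commit message can be pictured as a small per-guild map. This is an illustration under assumptions: `SessionSource` below is a simplified stand-in with invented fields, and `VoiceLinkTracker` and its methods are hypothetical. Only the `_voice_sources` name and the join/leave/exact-channel behavior come from the PR.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SessionSource:
    """Simplified stand-in for the gateway's source metadata (fields assumed)."""
    platform: str
    guild_id: str
    chat_id: str  # ID of the text channel where /voice channel was run


class VoiceLinkTracker:
    def __init__(self) -> None:
        # guild ID -> source of the text channel bound to the active voice session
        self._voice_sources: Dict[str, SessionSource] = {}

    def join_voice_channel(self, source: SessionSource) -> None:
        self._voice_sources[source.guild_id] = source

    def leave_voice_channel(self, guild_id: str) -> None:
        # Clean up so the channel goes back to normal mention-gating.
        self._voice_sources.pop(guild_id, None)

    def source_for_voice_input(self, guild_id: str) -> Optional[SessionSource]:
        # Voice transcripts reuse the linked text channel's source, so voice
        # and typed turns land in the same session.
        return self._voice_sources.get(guild_id)

    def is_voice_linked(self, guild_id: str, channel_id: str) -> bool:
        # Exact-ID match only: a thread under the linked channel has its own
        # ID, so it is not exempt and still needs an @mention.
        linked = self._voice_sources.get(guild_id)
        return linked is not None and linked.chat_id == channel_id

    def requires_mention(self, guild_id: str, channel_id: str) -> bool:
        return not self.is_voice_linked(guild_id, channel_id)
```

In this sketch the same `is_voice_linked()` check would also decide when to skip auto-thread creation, which is what avoids the `50024` error on the linked channel.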
Merged via PR #8984. Your voice session continuity fix and signal handler guard were cherry-picked onto current main with your authorship preserved. The Modal hosted runtime portions were excluded per maintainer decision. Thanks for the contribution!
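The signal handler guard can be sketched roughly as follows. This is not the merged `start_gateway()` code; `install_shutdown_handlers` is a hypothetical helper. On Unix, asyncio's `loop.add_signal_handler()` raises `RuntimeError` when called off the main thread, which is the failure the guard avoids when the gateway runs in a daemon thread.

```python
import asyncio
import signal
import threading
from typing import Callable


def install_shutdown_handlers(
    loop: asyncio.AbstractEventLoop, on_signal: Callable[[], None]
) -> bool:
    """Register SIGINT/SIGTERM handlers only on the main thread.

    Returns True if handlers were installed, False if registration was skipped.
    """
    if threading.current_thread() is not threading.main_thread():
        # e.g. the gateway running inside a daemon thread: skip registration
        # instead of letting add_signal_handler raise RuntimeError.
        return False
    try:
        for sig in (signal.SIGINT, signal.SIGTERM):
            loop.add_signal_handler(sig, on_signal)
    except NotImplementedError:
        # Some event loops (e.g. Windows' proactor loop) don't support this.
        return False
    return True
```

When registration is skipped, shutdown signals are left to whatever code owns the main thread.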
Summary
This fixes the remaining Discord voice-mode regressions on the hosted Modal gateway and hardens the hosted runtime around the follow-up review feedback.
Changes in this PR:
- … `@mention` after `/voice channel`
- … `ffmpeg` + `libopus0` in the Modal container so Discord voice playback and transcription work in production
- … `modal`/`fastapi` dependencies are unavailable

Root cause
There were two primary runtime issues behind the broken hosted behavior:
1. The Modal container was missing voice runtime dependencies. Before the fix, hosted logs showed:

   ```
   Opus codec not found — voice channel playback disabled
   discord.errors.ClientException: ffmpeg was not found
   ```

   That prevented Hermes from speaking in voice and blocked the ffmpeg-based audio conversion path used for Discord voice input.

2. Voice-linked text channels were not being treated as the active conversation context. After `/voice channel`, Hermes could join voice, but the linked Discord text channel still fell back to mention-gating and failed auto-thread creation with:

   ```
   400 Bad Request (error code: 50024): Cannot execute action on this channel type
   ```

   That split voice turns from text turns and made continuity feel broken.
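Because the missing-dependency failure only showed up in runtime logs, a start-up check along these lines could flag it earlier. This is a hypothetical helper, not part of the PR, built only on the standard library (`shutil.which`, `ctypes.util.find_library`); discord.py's own Opus loading may still succeed or fail independently of what it reports.

```python
import ctypes.util
import shutil
from typing import Dict, List


def voice_dependency_status() -> Dict[str, bool]:
    """Check that the native pieces Discord voice relies on are discoverable."""
    return {
        # ffmpeg binary on PATH (audio conversion/playback)
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # libopus shared library (voice encoding), e.g. from the libopus0 package
        "opus": ctypes.util.find_library("opus") is not None,
    }


def missing_voice_dependencies() -> List[str]:
    return sorted(name for name, ok in voice_dependency_status().items() if not ok)
```

For example, a hosted runtime could log one warning per entry of `missing_voice_dependencies()` at start-up instead of waiting for the first `/voice channel` to fail.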
The follow-up review comments also caught two hardening gaps:
What changed
- … `/voice channel` session, which keeps voice and linked text replies in the same session.
- … `ffmpeg`, `libopus0`
- … `scripts/modal_gateway.py` import cleanly even when `modal` or `fastapi` are not installed.
- `_decode_base64_env()` now safely ignores malformed base64 / non-UTF-8 payloads instead of crashing bootstrap.
- `normalize_github_token_env()` now prefers already-present `GITHUB_TOKEN` / `GH_TOKEN` values before falling back to alternate secret names.

Testing
Automated:
```sh
.venv/bin/python -m pytest tests/gateway/test_discord_free_response.py tests/gateway/test_voice_command.py tests/gateway/test_modal_runtime.py -q
```

Result: `200 passed`

Manual:
- `/voice channel` join
- … `@hermes` to continue the active conversation

Notes
The hosted status dashboard retains historical log-tail content from the persisted Modal volume, so older pre-fix `ffmpeg`/Opus errors may still appear in dashboard history. The new runtime itself starts cleanly, and the voice path was manually verified after redeploy.

What does this PR do?
see above
Related Issue
#1559 is related; this PR addresses gaps in full-duplex voice mode when running via the Modal backend runtime.
Fixes #
Type of Change
Changes Made
see above
How to Test
see above
Checklist
Code
- … `fix(scope):`, `feat(scope):`, etc.)
- … `pytest tests/ -q` and all tests pass

Documentation & Housekeeping
- … `docs/`, docstrings) — or N/A
- … `cli-config.yaml.example` if I added/changed config keys — or N/A
- … `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A

Screenshots / Logs
Logs would be best included here, but mine are a bit messy. If needed, I'll unapply/test > save logs > reapply/test > save logs and include them.