fix/Full voice functionality in Discord when deployed via Modal/using modal backend by twilwa · Pull Request #3475 · NousResearch/hermes-agent

twilwa · 2026-03-28T02:37:39Z

Summary

This fixes the remaining Discord voice-mode regressions on the hosted Modal gateway and hardens the hosted runtime in response to follow-up review feedback.

Changes in this PR:

preserve Discord voice-session continuity by reusing the linked text-channel source metadata for voice input
treat active voice-linked text channels as free-response contexts, so Hermes does not require a fresh @mention after /voice channel
restrict that voice-linked exemption to the exact bound text channel, so unrelated threads under the same parent channel do not bypass mention-gating
add the hosted Modal gateway runtime and install ffmpeg + libopus0 in the Modal container so Discord voice playback and transcription work in production
make the Modal deploy wrapper import-safe when optional modal / fastapi dependencies are unavailable
harden hosted bootstrap env handling for malformed base64 payloads and GitHub token alias normalization
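The channel-scoping rule in the summary items above (free response in the voice-linked text channel, but only that exact channel) can be sketched as two small predicates. This is a hypothetical illustration, not the code in `gateway/platforms/discord.py`: the function names, the use of plain integer channel IDs, and the omission of every other gating rule (configured free-response channels, DMs, and so on) are assumptions made for the example.

```python
from typing import Optional


def requires_mention(channel_id: int, voice_linked_channel_id: Optional[int]) -> bool:
    """Return True if a message in `channel_id` must @mention the bot.

    Hypothetical sketch: the voice-linked exemption is an exact ID match.
    A Discord thread has its own channel ID, so a thread created under the
    linked text channel does not match and still needs a mention.
    """
    if voice_linked_channel_id is None:
        # No active /voice channel session: normal mention-gating applies.
        return True
    return channel_id != voice_linked_channel_id


def should_auto_thread(channel_id: int, voice_linked_channel_id: Optional[int]) -> bool:
    """Return False for the voice-linked channel, where auto-threading is skipped."""
    return voice_linked_channel_id is None or channel_id != voice_linked_channel_id


if __name__ == "__main__":
    LINKED_TEXT_CHANNEL = 1001
    THREAD_UNDER_LINKED = 2002  # its parent is 1001, but its own ID differs
    print(requires_mention(LINKED_TEXT_CHANNEL, LINKED_TEXT_CHANNEL))  # False
    print(requires_mention(THREAD_UNDER_LINKED, LINKED_TEXT_CHANNEL))  # True
```

Comparing against the message's own channel ID rather than its parent ID is what keeps sibling threads under mention-gating.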

Root cause

There were two primary runtime issues behind the broken hosted behavior:

1. The Modal container was missing voice runtime dependencies.
   Before the fix, hosted logs showed:
   - Opus codec not found — voice channel playback disabled
   - discord.errors.ClientException: ffmpeg was not found
   That prevented Hermes from speaking in voice and blocked the ffmpeg-based audio conversion path used for Discord voice input.
2. Voice-linked text channels were not being treated as the active conversation context.
   After /voice channel, Hermes could join voice, but the linked Discord text channel still fell back to mention-gating, and auto-thread creation in that channel failed with:
   - 400 Bad Request (error code: 50024): Cannot execute action on this channel type
   That split voice turns from text turns and broke conversation continuity.
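A startup preflight along these lines would surface the first issue at boot rather than at first use. It is a hypothetical helper, not part of the PR, and uses only the standard library: `shutil.which` to look for the ffmpeg binary on PATH and `ctypes.util.find_library` to look for the Opus shared library, assuming libopus is on the system library search path the way an apt-installed `libopus0` would be.

```python
import ctypes.util
import shutil


def voice_runtime_preflight() -> dict:
    """Report whether the native voice dependencies are discoverable.

    Hypothetical check: True means the dependency was found, not that
    discord.py has actually loaded it.
    """
    return {
        # ffmpeg must be an executable on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # libopus must be resolvable as a shared library named "opus".
        "opus": ctypes.util.find_library("opus") is not None,
    }


if __name__ == "__main__":
    for name, ok in voice_runtime_preflight().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```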

The follow-up review comments also caught two hardening gaps:

importing the Modal deploy wrapper pulled in optional hosted dependencies too early for tests and non-Modal workflows
hosted env bootstrap and token normalization paths were a little too trusting of malformed inputs and alias ordering

What changed

Discord voice input now reuses the bound source metadata from the /voice channel session, which keeps voice and linked text replies in the same session.
Active voice-linked text channels are now handled as free-response channels instead of trying to create unsupported auto-threads.
The voice-linked free-response exemption now only applies to the exact linked text channel, not sibling threads under the same parent.
Added a hosted Modal gateway wrapper/runtime.
The Modal image now installs:
- ffmpeg
- libopus0
Moved the deploy-time secret-name helper out of the Modal wrapper so unit tests do not depend on importing the hosted deploy module.
Made scripts/modal_gateway.py import cleanly even when modal or fastapi are not installed.
_decode_base64_env() now safely ignores malformed base64 / non-UTF-8 payloads instead of crashing bootstrap.
normalize_github_token_env() now prefers already-present GITHUB_TOKEN / GH_TOKEN values before falling back to alternate secret names.
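The two bootstrap hardening items above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: the public names `decode_base64_env` and `normalize_github_token_env`, their signatures, and especially the fallback secret names `GITHUB_PAT` / `GH_PAT` are hypothetical, since the PR does not list its alternate secret names.

```python
import base64
import os
from typing import MutableMapping, Optional, Sequence

# Hypothetical fallback names; the PR does not say which alternate names it checks.
DEFAULT_TOKEN_ALIASES: Sequence[str] = ("GITHUB_PAT", "GH_PAT")


def decode_base64_env(name: str, env: Optional[MutableMapping[str, str]] = None) -> Optional[str]:
    """Decode a base64 env var to text, returning None instead of raising."""
    env = os.environ if env is None else env
    raw = (env.get(name) or "").strip()
    if not raw:
        return None
    try:
        return base64.b64decode(raw, validate=True).decode("utf-8")
    except ValueError:
        # binascii.Error (bad base64), UnicodeDecodeError (non-UTF-8 payload) and
        # non-ASCII input all subclass ValueError: skip the value, keep booting.
        return None


def normalize_github_token_env(
    env: Optional[MutableMapping[str, str]] = None,
    aliases: Sequence[str] = DEFAULT_TOKEN_ALIASES,
) -> Optional[str]:
    """Pick one GitHub token and expose it as both GITHUB_TOKEN and GH_TOKEN."""
    env = os.environ if env is None else env
    # Values that are already present win over alternate secret names.
    token = env.get("GITHUB_TOKEN") or env.get("GH_TOKEN")
    if not token:
        token = next((env[name] for name in aliases if env.get(name)), None)
    if token:
        for key in ("GITHUB_TOKEN", "GH_TOKEN"):
            if not env.get(key):  # never overwrite an existing non-empty value
                env[key] = token
    return token
```

Passing an explicit mapping keeps both helpers easy to unit-test without touching the real process environment.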

Testing

Automated:

.venv/bin/python -m pytest tests/gateway/test_discord_free_response.py tests/gateway/test_voice_command.py tests/gateway/test_modal_runtime.py -q

Result:

200 passed

Manual:

redeployed the Modal gateway
verified /voice channel join
verified two-way voice behavior in Discord after redeploy
verified the linked text channel no longer needed a fresh @hermes to continue the active conversation
verified the Modal deploy wrapper can still be imported cleanly without optional hosted dependencies installed

Notes

The hosted status dashboard retains historical log tail content from the persisted Modal volume, so older pre-fix ffmpeg / Opus errors may still appear in dashboard history. The new runtime itself starts cleanly and the voice path was manually verified after redeploy.

What does this PR do?

see above

Related Issue

#1559 is related; this PR addresses gaps in full-duplex voice mode when running via the Modal backend runtime.

Fixes #

Type of Change

[X] 🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
[X] 📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

see above

How to Test

see above

Checklist

Code

[X] I've read the Contributing Guide
[X] My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
[X] I searched for existing PRs to make sure this isn't a duplicate
[X] My PR contains only changes related to this fix/feature (no unrelated commits)
[X] I've run pytest tests/ -q and all tests pass
[X] I've added tests for my changes (required for bug fixes, strongly encouraged for features)
[X] I've tested on my platform:

Documentation & Housekeeping

[X] I've updated relevant documentation (README, docs/, docstrings) — or N/A
[X] I've updated cli-config.yaml.example if I added/changed config keys — or N/A
[X] I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
[X] I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
[X] I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Screenshots / Logs

Logs would be best included here, but mine are a bit messy. If needed, I'll capture before/after logs (unapply the fix, test, and save logs; then reapply, test, and save logs) and include them.

fix/discord-voice-mode

Copilot

Pull request overview

This PR fixes Discord voice-mode regressions when running Hermes via the hosted Modal gateway by preserving voice/text session continuity, relaxing mention-gating only for the active voice-linked text channel, and adding a Modal runtime wrapper (including required voice dependencies) plus bootstrap hardening.

Changes:

Preserve Discord voice-session continuity by storing/reusing linked text-channel SessionSource metadata for voice input events.
Treat the active voice-linked text channel as a free-response context (skip mention requirement + skip auto-thread creation), without exempting sibling threads.
Add a Modal hosted runtime + deploy wrapper, including config/env sanitization, base64 bootstrap handling, and installing ffmpeg + libopus0.
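The session-continuity change in the first item can be pictured as a small registry keyed by guild: store the invoking text channel's source on join, reuse it for voice input, and drop it on leave. Only `SessionSource` and `_voice_sources` are named in this PR; the fields of `SessionSource`, the `session_key` format, and the registry class and its methods are hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SessionSource:
    """Hypothetical stand-in for the gateway's SessionSource; fields are assumed."""
    platform: str
    guild_id: int
    channel_id: int  # the linked *text* channel, not the voice channel


def session_key(source: SessionSource) -> str:
    """Hypothetical session key: identical source metadata maps to one conversation."""
    return f"{source.platform}:{source.guild_id}:{source.channel_id}"


class VoiceSourceRegistry:
    """Remembers which text-channel source each guild's voice session is bound to."""

    def __init__(self) -> None:
        self._voice_sources: Dict[int, SessionSource] = {}

    def on_join(self, guild_id: int, text_source: SessionSource) -> None:
        # On /voice channel: store the invoking text channel's source metadata.
        self._voice_sources[guild_id] = text_source

    def source_for_voice_input(self, guild_id: int) -> Optional[SessionSource]:
        # Voice transcripts reuse the stored source instead of building a new one.
        return self._voice_sources.get(guild_id)

    def on_leave(self, guild_id: int) -> None:
        # Drop the binding so stale metadata is not reused after leaving.
        self._voice_sources.pop(guild_id, None)
```

Because voice input and linked-channel text messages then produce the same source, they resolve to the same session key.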

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

File	Description
`gateway/run.py`	Reuses stored voice session source metadata; uses provider-aware STT defaults; avoids signal handlers outside main thread.
`gateway/platforms/discord.py`	Tracks voice-linked source metadata; exempts the exact voice-linked text channel from mention-gating and auto-threading.
`gateway/modal_runtime.py`	Adds Modal bootstrap/sanitization helpers and a hosted gateway service + dashboard HTML renderer.
`scripts/modal_gateway.py`	Adds an import-safe Modal deploy wrapper that builds a single-container ASGI dashboard and installs voice deps.
`tests/gateway/test_voice_command.py`	Adds coverage for source metadata reuse and provider-aware STT default behavior.
`tests/gateway/test_discord_free_response.py`	Adds coverage for voice-linked mention-gating behavior (and ensures threads still require mention).
`tests/gateway/test_stt_config.py`	Adds coverage ensuring transcription uses provider-aware defaults (no forced model kwarg).
`tests/gateway/test_modal_runtime.py`	Adds coverage for Modal runtime sanitization, bootstrap, dashboard rendering, and env normalization.
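The "import-safe" deploy wrapper pattern described for `scripts/modal_gateway.py` can be sketched like this: optional imports are guarded at module level, and anything that needs them is built lazily. This is an assumption-laden sketch, not the PR's wrapper; `build_image` and `build_dashboard_app` are hypothetical names, while `modal.Image.debian_slim().apt_install(...)` and `fastapi.FastAPI` are the libraries' public entry points.

```python
from typing import Any

try:
    import modal  # optional: only needed to build/deploy the hosted image
except ImportError:
    modal = None

try:
    from fastapi import FastAPI  # optional: only needed for the dashboard
except ImportError:
    FastAPI = None

# System packages Discord voice needs: ffmpeg for audio conversion/playback,
# libopus0 for Opus encoding and decoding.
VOICE_APT_PACKAGES = ("ffmpeg", "libopus0")


def build_image() -> Any:
    """Build the container image lazily, so importing this module never needs modal."""
    if modal is None:
        raise RuntimeError("modal is not installed; install it to deploy the hosted gateway")
    return modal.Image.debian_slim().apt_install(*VOICE_APT_PACKAGES)


def build_dashboard_app() -> Any:
    """Create the ASGI app lazily, for the same reason."""
    if FastAPI is None:
        raise RuntimeError("fastapi is not installed")
    return FastAPI()
```

Tests and non-Modal workflows can then import the module and inspect constants such as `VOICE_APT_PACKAGES` without either optional dependency installed.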

Copilot · 2026-03-28T02:42:50Z

+    if config_text:
+        loaded = yaml.safe_load(config_text) or {}
+        if isinstance(loaded, dict):
+            config_payload = loaded


yaml.safe_load(config_text) can raise yaml.YAMLError for malformed (but still base64/UTF-8 valid) payloads, which would crash Modal bootstrap and prevent the gateway from starting. Consider wrapping the YAML parse in a try/except (or using a helper that returns {} on any YAML error) and optionally recording a diagnostic in modal-bootstrap.json or logs.
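One way to apply this suggestion is a helper that returns `{}` on any YAML error and logs a diagnostic. It is a sketch of the reviewer's idea, not the fix that landed in the "ignore malformed bootstrap yaml" commit; `load_config_mapping` is a hypothetical name, and the PyYAML import is guarded only so the sketch stays importable without it.

```python
import logging
from typing import Any, Dict

try:
    import yaml  # PyYAML, as used by the snippet above
except ImportError:
    yaml = None

logger = logging.getLogger(__name__)


def load_config_mapping(config_text: str) -> Dict[str, Any]:
    """Parse bootstrap YAML, falling back to {} instead of crashing startup."""
    if not config_text:
        return {}
    if yaml is None:
        logger.warning("PyYAML unavailable; ignoring bootstrap config")
        return {}
    try:
        loaded = yaml.safe_load(config_text)
    except yaml.YAMLError as exc:
        # Valid base64/UTF-8 but malformed YAML: record a diagnostic and continue.
        logger.warning("Ignoring malformed bootstrap YAML: %s", exc)
        return {}
    # Only a top-level mapping is usable as config; lists and scalars are ignored.
    return loaded if isinstance(loaded, dict) else {}
```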

Copilot · 2026-03-28T02:42:50Z

+        os.chdir(self.project_root)
+
+        try:


os.chdir(self.project_root) happens outside the try block, so a bad project_root (or missing directory) would terminate the gateway thread without setting _last_error or committing state. Move the chdir into the try and set _last_error on failure so the dashboard/health endpoint reflects the real startup error.

Suggested change

-        os.chdir(self.project_root)
-
-        try:
+        try:
+            os.chdir(self.project_root)
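A trimmed-down sketch of the suggested shape: the `chdir` sits inside the `try`, so a bad `project_root` ends up in `_last_error` instead of silently ending the thread. `GatewayService`, `started`, and `start()` are hypothetical names for illustration; only `project_root` and `_last_error` come from the review comment.

```python
import os
import threading
from typing import Optional


class GatewayService:
    """Hypothetical, trimmed-down version of the hosted gateway service."""

    def __init__(self, project_root: str) -> None:
        self.project_root = project_root
        self._last_error: Optional[str] = None
        self.started = False

    def _run(self) -> None:
        try:
            # Inside the try, a missing or invalid project_root is reported
            # instead of killing the thread. Note that chdir is process-wide.
            os.chdir(self.project_root)
            self.started = True
            # ... start the gateway event loop here ...
        except Exception as exc:
            # Exposed to the dashboard/health endpoint.
            self._last_error = f"{type(exc).__name__}: {exc}"

    def start(self) -> threading.Thread:
        thread = threading.Thread(target=self._run, name="gateway", daemon=True)
        thread.start()
        return thread
```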

twilwa · 2026-04-01T02:44:06Z

Copilot nits have been addressed; ready for review when y'all are.

- Store source metadata on /voice channel join so voice input shares the same session as the linked text channel conversation
- Treat voice-linked text channels as free-response (skip @mention and auto-thread) while voice is active
- Scope the voice-linked exemption to the exact bound channel, not sibling threads
- Guard signal handler registration in start_gateway() for non-main threads (prevents RuntimeError when gateway runs in a daemon thread)
- Clean up _voice_sources on leave_voice_channel

Salvaged from PR #3475 by twilwa (Modal runtime portions excluded).
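The signal-handler guard can be sketched as below. On CPython's Unix event loop, `loop.add_signal_handler()` raises `RuntimeError` when called off the main thread (it wraps the `ValueError` from `signal.set_wakeup_fd()`), which matches the failure described for a gateway running in a daemon thread. The helper name `install_shutdown_handlers` and its signature are hypothetical; this is not the code of `start_gateway()`.

```python
import asyncio
import signal
import threading
from typing import Callable


def install_shutdown_handlers(
    loop: asyncio.AbstractEventLoop, on_signal: Callable[[], None]
) -> bool:
    """Register SIGINT/SIGTERM handlers only when running on the main thread.

    Returns False (registering nothing) when called from a worker thread,
    e.g. when a hosted wrapper runs the gateway in a daemon thread.
    """
    if threading.current_thread() is not threading.main_thread():
        return False
    installed = True
    for sig in (signal.SIGINT, signal.SIGTERM):
        try:
            loop.add_signal_handler(sig, on_signal)
        except (NotImplementedError, RuntimeError):
            # NotImplementedError: e.g. Windows event loops lack signal support.
            installed = False
    return installed
```

In the worker-thread case the hosting process (here, the Modal wrapper) stays responsible for shutdown signals.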

teknium1 · 2026-04-13T11:49:28Z

Merged via PR #8984. Your voice session continuity fix and signal handler guard were cherry-picked onto current main with your authorship preserved. The Modal hosted runtime portions were excluded per maintainer decision. Thanks for the contribution!

twilwa added 8 commits March 27, 2026 14:48

fix(gateway): use provider-aware STT defaults for Discord voice

50e22d7

Merge origin/main into fix/discord-voice-mode

c9ad9ad

fix(discord): preserve voice channel continuity

fd41e1c

feat(modal): add hosted gateway runtime

299eeb3

test(voice): cover modal runtime and voice cleanup

df47304

fix(review): harden modal imports and voice gating

104f157

Merge origin/main into fix/discord-voice-mode

4d48796

Merge pull request #1 from twilwa/fix/discord-voice-mode

cedc276

Copilot AI review requested due to automatic review settings March 28, 2026 02:37

Copilot started reviewing on behalf of twilwa March 28, 2026 02:38

Copilot AI reviewed Mar 28, 2026

twilwa added 4 commits March 31, 2026 19:36

Merge branch 'NousResearch:main' into main

829e456

fix(modal): ignore malformed bootstrap yaml

3eacc88

fix(modal): report bad project roots

2b52824

Merge pull request #2 from twilwa/fix/discord-voice-mode

1078d96

teknium1 mentioned this pull request Apr 13, 2026

fix(discord): voice session continuity and signal handler thread safety #8984

Merged

teknium1 closed this Apr 13, 2026

twilwa wants to merge 12 commits into NousResearch:main from twilwa:main

3 participants
