
fix/Full voice functionality in Discord when deployed via Modal/using modal backend#3475

Closed
twilwa wants to merge 12 commits into NousResearch:main from twilwa:main

@twilwa (Contributor) commented Mar 28, 2026

Summary

This PR fixes the remaining Discord voice-mode regressions on the hosted Modal gateway and hardens the hosted runtime in response to follow-up review feedback.

Changes in this PR:

  • preserve Discord voice-session continuity by reusing the linked text-channel source metadata for voice input
  • treat active voice-linked text channels as free-response contexts, so Hermes does not require a fresh @mention after /voice channel
  • restrict that voice-linked exemption to the exact bound text channel, so unrelated threads under the same parent channel do not bypass mention-gating
  • add the hosted Modal gateway runtime and install ffmpeg + libopus0 in the Modal container so Discord voice playback and transcription work in production
  • make the Modal deploy wrapper import-safe when optional modal / fastapi dependencies are unavailable
  • harden hosted bootstrap env handling for malformed base64 payloads and GitHub token alias normalization
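The exact-channel exemption described in the bullets above can be sketched as follows. This is a minimal illustration, not the code in gateway/platforms/discord.py: `IncomingMessage`, `voice_linked_text_channels`, `is_voice_linked`, `should_respond`, and `should_auto_thread` are hypothetical names, and the assumption is a per-guild registry of the text channel bound by `/voice channel`.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class IncomingMessage:
    guild_id: int
    channel_id: int
    parent_id: Optional[int]  # parent text channel for threads, else None
    mentions_bot: bool


# Hypothetical registry: guild id -> text channel bound by `/voice channel`.
voice_linked_text_channels: Dict[int, int] = {}


def is_voice_linked(msg: IncomingMessage) -> bool:
    """True only for the exact text channel bound to the active voice session."""
    bound = voice_linked_text_channels.get(msg.guild_id)
    # Compare channel_id, never parent_id: a thread under the bound channel
    # has its own id and therefore keeps normal mention-gating.
    return bound is not None and msg.channel_id == bound


def should_respond(msg: IncomingMessage) -> bool:
    # Free-response in the bound channel; everywhere else an @mention is required.
    return msg.mentions_bot or is_voice_linked(msg)


def should_auto_thread(msg: IncomingMessage) -> bool:
    # Assumption: auto-threads are only created from top-level channels, and
    # the voice-linked channel replies in place instead.
    return msg.parent_id is None and not is_voice_linked(msg)
```

Matching on `channel_id` rather than `parent_id` is what keeps sibling threads under the same parent channel behind the mention gate.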

Root cause

There were two primary runtime issues behind the broken hosted behavior:

  1. The Modal container was missing voice runtime dependencies.
    Before the fix, hosted logs showed:

    • Opus codec not found — voice channel playback disabled
    • discord.errors.ClientException: ffmpeg was not found

    That prevented Hermes from speaking in voice and blocked the ffmpeg-based audio conversion path used for Discord voice input.

  2. Voice-linked text channels were not being treated as the active conversation context.
    After /voice channel, Hermes could join the voice channel, but the linked Discord text channel still fell back to mention-gating, and auto-thread creation failed with:

    • 400 Bad Request (error code: 50024): Cannot execute action on this channel type

    That split voice turns and text turns into separate sessions, so the conversation lost continuity.
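A startup preflight check would surface the first root cause before Discord reports it. The sketch below is hypothetical (`missing_voice_dependencies` is not part of this PR): it looks for an `ffmpeg` executable on `PATH` and asks `ctypes.util.find_library` for the opus shared library, which is a comparable lookup to the one discord.py's default opus loader performs on non-Windows platforms.

```python
import ctypes.util
import shutil
from typing import List


def missing_voice_dependencies() -> List[str]:
    """Hypothetical startup check for the two dependencies named in the logs above."""
    missing = []
    # Audio conversion shells out to an `ffmpeg` executable found on PATH.
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg")
    # libopus is a shared library; find_library returns None when it is absent.
    if ctypes.util.find_library("opus") is None:
        missing.append("libopus")
    return missing
```

In a container built without `ffmpeg` and `libopus0`, this would return `["ffmpeg", "libopus"]`, which could be logged once at boot instead of surfacing later as a `ClientException` mid-call.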

The follow-up review comments also caught two hardening gaps:

  • importing the Modal deploy wrapper pulled in optional hosted dependencies too early for tests and non-Modal workflows
  • hosted env bootstrap did not guard against malformed inputs, and token normalization did not prefer already-present GITHUB_TOKEN / GH_TOKEN values over alternate aliases
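A minimal sketch of the malformed-input hardening, modeled on the behavior described for `_decode_base64_env()`. The function name, signature, and the injectable `environ` mapping are assumptions for illustration; only the "ignore malformed base64 / non-UTF-8 payloads" behavior comes from this PR.

```python
import base64
import binascii
import os
from typing import Mapping, Optional


def decode_base64_env(name: str, environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the UTF-8 text encoded in environ[name], or None if absent or malformed."""
    env = os.environ if environ is None else environ
    raw = env.get(name, "").strip()
    if not raw:
        return None
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        payload = base64.b64decode(raw, validate=True)
        return payload.decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        # Malformed input is ignored so bootstrap can continue with defaults.
        return None
```

Passing `validate=True` matters here: without it, `b64decode` drops stray characters and may decode garbage instead of signalling an error.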

What changed

  • Discord voice input now reuses the bound source metadata from the /voice channel session, which keeps voice and linked text replies in the same session.
  • Active voice-linked text channels are now handled as free-response channels instead of trying to create unsupported auto-threads.
  • The voice-linked free-response exemption now only applies to the exact linked text channel, not sibling threads under the same parent.
  • Added a hosted Modal gateway wrapper/runtime.
  • The Modal image now installs:
    • ffmpeg
    • libopus0
  • Moved the deploy-time secret-name helper out of the Modal wrapper so unit tests do not depend on importing the hosted deploy module.
  • Made scripts/modal_gateway.py import cleanly even when modal or fastapi are not installed.
  • _decode_base64_env() now safely ignores malformed base64 / non-UTF-8 payloads instead of crashing bootstrap.
  • normalize_github_token_env() now prefers already-present GITHUB_TOKEN / GH_TOKEN values before falling back to alternate secret names.
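The token precedence described in the last bullet can be sketched like this. It is an illustration under stated assumptions, not the PR's `normalize_github_token_env()`: the alternate secret name `EXAMPLE_GITHUB_PAT` is hypothetical (the PR does not list the real aliases), and mirroring the chosen value into both canonical names is an assumed behavior.

```python
from typing import MutableMapping, Optional, Sequence

CANONICAL_NAMES = ("GITHUB_TOKEN", "GH_TOKEN")


def normalize_github_token(
    environ: MutableMapping[str, str],
    fallback_names: Sequence[str] = ("EXAMPLE_GITHUB_PAT",),  # hypothetical alias
) -> Optional[str]:
    """Pick one token, preferring canonical names, and fill any blank canonical name."""
    for name in (*CANONICAL_NAMES, *fallback_names):
        value = environ.get(name, "").strip()
        if value:
            break
    else:
        return None
    # Fill only canonical names that are unset or blank; never overwrite
    # a token the operator already provided.
    for canonical in CANONICAL_NAMES:
        if not environ.get(canonical, "").strip():
            environ[canonical] = value
    return value
```

Checking the canonical names first means an explicitly set `GITHUB_TOKEN` or `GH_TOKEN` always wins over an alias, regardless of the order in which secrets were mounted.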

Testing

Automated:

  • .venv/bin/python -m pytest tests/gateway/test_discord_free_response.py tests/gateway/test_voice_command.py tests/gateway/test_modal_runtime.py -q

Result:

  • 200 passed

Manual:

  • redeployed the Modal gateway
  • verified /voice channel join
  • verified two-way voice behavior in Discord after redeploy
  • verified the linked text channel no longer needed a fresh @hermes to continue the active conversation
  • verified the Modal deploy wrapper can still be imported cleanly without optional hosted dependencies installed

Notes

The hosted status dashboard retains historical log tail content from the persisted Modal volume, so older pre-fix ffmpeg / Opus errors may still appear in dashboard history. The new runtime itself starts cleanly and the voice path was manually verified after redeploy.

What does this PR do?

see above

Related Issue

Related to #1559; this PR addresses gaps in full-duplex voice mode when running via the Modal backend runtime.

Fixes #

Type of Change

  • [x] 🐛 Bug fix (non-breaking change that fixes an issue)
  • [ ] ✨ New feature (non-breaking change that adds functionality)
  • [ ] 🔒 Security fix
  • [x] 📝 Documentation update
  • [ ] ✅ Tests (adding or improving test coverage)
  • [ ] ♻️ Refactor (no behavior change)
  • [ ] 🎯 New skill (bundled or hub)

Changes Made

see above

How to Test

see above

Checklist

Code

  • [x] I've read the Contributing Guide
  • [x] My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • [x] I searched for existing PRs to make sure this isn't a duplicate
  • [x] My PR contains only changes related to this fix/feature (no unrelated commits)
  • [x] I've run pytest tests/ -q and all tests pass
  • [x] I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • [x] I've tested on my platform:

Documentation & Housekeeping

  • [x] I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • [x] I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • [x] I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • [x] I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Screenshots / Logs

Logs would be best included here, but mine are a bit messy. If needed, I'll revert the changes, test, and save logs, then reapply, test, and save logs again, and include both.

Copilot AI review requested due to automatic review settings March 28, 2026 02:37

Copilot AI left a comment


Pull request overview

This PR fixes Discord voice-mode regressions when running Hermes via the hosted Modal gateway by preserving voice/text session continuity, relaxing mention-gating only for the active voice-linked text channel, and adding a Modal runtime wrapper (including required voice dependencies) plus bootstrap hardening.

Changes:

  • Preserve Discord voice-session continuity by storing/reusing linked text-channel SessionSource metadata for voice input events.
  • Treat the active voice-linked text channel as a free-response context (skip mention requirement + skip auto-thread creation), without exempting sibling threads.
  • Add a Modal hosted runtime + deploy wrapper, including config/env sanitization, base64 bootstrap handling, and installing ffmpeg + libopus0.
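The import-safe pattern for the deploy wrapper can be sketched as below. This is a hypothetical module, not scripts/modal_gateway.py itself: `VOICE_APT_PACKAGES` and `build_gateway_image` are illustrative names. It relies on Modal's `Image.debian_slim()` and `Image.apt_install()` builders, and only touches `modal` when the package is actually installed.

```python
"""Hypothetical import-safe deploy module, sketched after scripts/modal_gateway.py."""
from typing import Any, Optional

try:
    import modal  # optional: only needed when actually deploying
except ImportError:
    modal = None

# System packages the Discord voice path needs inside the container.
VOICE_APT_PACKAGES = ("ffmpeg", "libopus0")


def build_gateway_image() -> Optional[Any]:
    """Return a Modal image with voice dependencies, or None if modal is unavailable."""
    if modal is None:
        return None
    return modal.Image.debian_slim().apt_install(*VOICE_APT_PACKAGES)
```

Deferring every `modal` access to call time is what lets unit tests and non-Modal workflows import the module without the optional hosted dependencies.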

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

  • gateway/run.py: Reuses stored voice session source metadata; uses provider-aware STT defaults; avoids signal handlers outside main thread.
  • gateway/platforms/discord.py: Tracks voice-linked source metadata; exempts the exact voice-linked text channel from mention-gating and auto-threading.
  • gateway/modal_runtime.py: Adds Modal bootstrap/sanitization helpers and a hosted gateway service + dashboard HTML renderer.
  • scripts/modal_gateway.py: Adds an import-safe Modal deploy wrapper that builds a single-container ASGI dashboard and installs voice deps.
  • tests/gateway/test_voice_command.py: Adds coverage for source metadata reuse and provider-aware STT default behavior.
  • tests/gateway/test_discord_free_response.py: Adds coverage for voice-linked mention-gating behavior (and ensures threads still require mention).
  • tests/gateway/test_stt_config.py: Adds coverage ensuring transcription uses provider-aware defaults (no forced model kwarg).
  • tests/gateway/test_modal_runtime.py: Adds coverage for Modal runtime sanitization, bootstrap, dashboard rendering, and env normalization.
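The "avoids signal handlers outside main thread" change in gateway/run.py guards against a real asyncio constraint: on Unix event loops, `loop.add_signal_handler()` raises `RuntimeError` when called from a non-main thread, and it raises `NotImplementedError` on Windows event loops. A minimal sketch of such a guard, where `install_shutdown_handlers` is a hypothetical name rather than the PR's `start_gateway()` code:

```python
import asyncio
import signal
import threading
from typing import Callable


def install_shutdown_handlers(loop: asyncio.AbstractEventLoop, on_signal: Callable[[], None]) -> bool:
    """Register SIGINT/SIGTERM handlers only when it is safe to do so.

    Returns True if handlers were installed, False if registration was skipped.
    """
    # Signal handlers can only be installed from the main thread; skip early
    # when the gateway runs in a daemon thread (e.g. under a hosted runtime).
    if threading.current_thread() is not threading.main_thread():
        return False
    try:
        for sig in (signal.SIGINT, signal.SIGTERM):
            loop.add_signal_handler(sig, on_signal)
    except (NotImplementedError, RuntimeError):
        # NotImplementedError: e.g. Windows event loops.
        return False
    return True
```

Returning a boolean lets the caller log that shutdown will rely on the host process instead of failing the whole gateway start.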


Comment thread gateway/modal_runtime.py Outdated
Comment on lines +209 to +212
if config_text:
    loaded = yaml.safe_load(config_text) or {}
    if isinstance(loaded, dict):
        config_payload = loaded

Copilot AI Mar 28, 2026


yaml.safe_load(config_text) can raise yaml.YAMLError for malformed (but still base64/UTF-8 valid) payloads, which would crash Modal bootstrap and prevent the gateway from starting. Consider wrapping the YAML parse in a try/except (or using a helper that returns {} on any YAML error) and optionally recording a diagnostic in modal-bootstrap.json or logs.

Copilot uses AI. Check for mistakes.
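One way to apply this suggestion is a small helper that wraps the parse from the snippet above. `parse_config_text` is a hypothetical name; `yaml.safe_load` and `yaml.YAMLError` are PyYAML's own API. The sketch logs a warning as the "diagnostic" rather than writing to modal-bootstrap.json, which is an assumption.

```python
import logging
from typing import Any, Dict

import yaml  # PyYAML, as used by the snippet above

logger = logging.getLogger(__name__)


def parse_config_text(config_text: str) -> Dict[str, Any]:
    """Parse a YAML config payload; return {} for empty, malformed, or non-mapping input."""
    if not config_text:
        return {}
    try:
        loaded = yaml.safe_load(config_text)
    except yaml.YAMLError as exc:
        # Valid base64/UTF-8 can still be invalid YAML; log and fall back
        # instead of aborting bootstrap.
        logger.warning("ignoring malformed config YAML: %s", exc)
        return {}
    return loaded if isinstance(loaded, dict) else {}
```

The `isinstance` check keeps the original snippet's behavior of discarding a top-level list or scalar.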
Comment thread gateway/modal_runtime.py Outdated
Comment on lines +334 to +336
os.chdir(self.project_root)

try:

Copilot AI Mar 28, 2026


os.chdir(self.project_root) happens outside the try block, so a bad project_root (or missing directory) would terminate the gateway thread without setting _last_error or committing state. Move the chdir into the try and set _last_error on failure so the dashboard/health endpoint reflects the real startup error.

Suggested change
- os.chdir(self.project_root)
- try:
+ try:
+     os.chdir(self.project_root)
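The suggested change moves `os.chdir` inside the `try` so a bad path is recorded instead of killing the thread. The sketch below illustrates that shape; `HostedGateway` and its `health()` method are hypothetical stand-ins for the service in gateway/modal_runtime.py, and the actual gateway startup is omitted.

```python
import os
import threading
from typing import Optional


class HostedGateway:
    """Hypothetical stand-in for the hosted gateway service in gateway/modal_runtime.py."""

    def __init__(self, project_root: str) -> None:
        self.project_root = project_root
        self._last_error: Optional[str] = None

    def _run(self) -> None:
        try:
            # Inside the try: a missing or invalid project_root is recorded
            # rather than silently ending the thread.
            os.chdir(self.project_root)
            # (Starting the actual gateway would follow here; omitted.)
        except Exception as exc:  # broad on purpose: any startup failure should surface
            self._last_error = f"{type(exc).__name__}: {exc}"

    def start(self) -> threading.Thread:
        thread = threading.Thread(target=self._run, daemon=True)
        thread.start()
        return thread

    def health(self) -> dict:
        # What a dashboard/health endpoint could report.
        return {"ok": self._last_error is None, "last_error": self._last_error}
```

With the `chdir` outside the `try`, the same `FileNotFoundError` would escape the thread and `health()` would keep reporting `ok`.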

@twilwa (Contributor, Author) commented Apr 1, 2026

Copilot nits have been addressed; ready for review whenever y'all are.

teknium1 pushed a commit that referenced this pull request Apr 13, 2026
- Store source metadata on /voice channel join so voice input shares the
  same session as the linked text channel conversation
- Treat voice-linked text channels as free-response (skip @mention and
  auto-thread) while voice is active
- Scope the voice-linked exemption to the exact bound channel, not
  sibling threads
- Guard signal handler registration in start_gateway() for non-main
  threads (prevents RuntimeError when gateway runs in a daemon thread)
- Clean up _voice_sources on leave_voice_channel

Salvaged from PR #3475 by twilwa (Modal runtime portions excluded).
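The session-continuity part of this commit (store on join, reuse for voice input, clean up on leave) can be sketched as a small registry. Only the `_voice_sources` attribute name is taken from the commit message; `SessionSource` here is a hypothetical subset of the real class, and `VoiceSessions` with its methods is illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SessionSource:
    """Hypothetical subset of the metadata that identifies a conversation."""
    platform: str
    guild_id: int
    channel_id: int


class VoiceSessions:
    def __init__(self) -> None:
        # guild id -> source of the text channel bound by /voice channel
        self._voice_sources: Dict[int, SessionSource] = {}

    def join(self, text_source: SessionSource) -> None:
        # Remember the linked text channel's source when voice is joined.
        self._voice_sources[text_source.guild_id] = text_source

    def source_for_voice_input(self, guild_id: int) -> Optional[SessionSource]:
        # Voice transcripts reuse the bound text source, so both land in one session.
        return self._voice_sources.get(guild_id)

    def leave(self, guild_id: int) -> None:
        # Clean up on leave so a stale source is never reused; safe to call twice.
        self._voice_sources.pop(guild_id, None)
```

Keying by guild reflects that a bot is in at most one voice channel per guild, so one bound text source per guild suffices.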
@teknium1 (Contributor)

Merged via PR #8984. Your voice session continuity fix and signal handler guard were cherry-picked onto current main with your authorship preserved. The Modal hosted runtime portions were excluded per maintainer decision. Thanks for the contribution!

@teknium1 teknium1 closed this Apr 13, 2026
sosyz pushed a commit to sosyz/hermes-agent that referenced this pull request Apr 13, 2026
uucokgis pushed a commit to uucokgis/hermes-agent that referenced this pull request Apr 13, 2026
WingedDragon pushed a commit to WingedDragon/hermes-agent that referenced this pull request Apr 16, 2026
