feat(stt): add mlx_whisper as first-class local STT provider on macOS #3498
Open
dlkakbs wants to merge 3 commits into NousResearch:main from
Conversation
- Add mlx_whisper provider to _get_provider(), with explicit config support and auto-detect on macOS when faster-whisper is absent
- Add _transcribe_mlx_whisper() using mlx_whisper.transcribe() with HF repo IDs
- Add MLX_MODEL_ALIASES dict normalizing short names (base, small, turbo, …) to mlx-community HF repo IDs; cloud model names (whisper-1 etc.) are also normalized, avoiding a local/cloud config footgun
- Auto-detect order: faster_whisper > mlx_whisper (macOS) > local_command > groq > openai
Config example:

```yaml
stt:
  provider: mlx_whisper
  mlx_whisper:
    model: base  # or small, large-v3, turbo, or a full HF repo ID
```
Closes NousResearch#3491
CHANTXU64 pushed a commit to CHANTXU64/hermes-agent that referenced this pull request on Apr 20, 2026:
…Research#3498)
- Add _HAS_MLX_WHISPER detection
- Add MLX_MODEL_ALIASES mapping (tiny/base/small/medium/large-v3/turbo)
- Add _normalize_mlx_model() function
- Add _transcribe_mlx_whisper() transcription function
- Add mlx_whisper branch in _get_provider() (explicit + auto-detect on macOS)
- Remove fake faster_whisper module hack
- Keep standard log format consistent with other providers
I guess we need to change https://github.com/dlkakbs/hermes-agent/blob/7c42a193a9c4611ec659a8ac49c50f10b3aa1fd7/tools/voice_mode.py#L741 as well?

```python
if not stt_enabled:
    details_parts.append("STT provider: DISABLED in config (stt.enabled: false)")
elif stt_provider == "local":
    details_parts.append("STT provider: OK (local faster-whisper)")
elif stt_provider == "groq":
    details_parts.append("STT provider: OK (Groq)")
elif stt_provider == "openai":
    details_parts.append("STT provider: OK (OpenAI)")
else:
    details_parts.append(
        "STT provider: MISSING (pip install faster-whisper, "
        "or set GROQ_API_KEY / VOICE_TOOLS_OPENAI_KEY)"
    )
```

The upstream seems to have changed a lot, btw.
…local_command, and mistral

check_voice_requirements() reported MISSING for any provider other than local/groq/openai. Add status lines for mlx_whisper, local_command, and mistral to match the providers now handled in transcription_tools, and extend the fallback hint to mention MISTRAL_API_KEY.

Keeps the mlx_whisper provider alongside upstream additions (mistral, xai, _safe_find_spec, _normalize_local_command_model). Auto-detect order: local > mlx_whisper (macOS) > local_command > groq > openai > mistral > xai.
What does this PR do?
Adds mlx_whisper as a first-class local STT provider for macOS / Apple Silicon users (Option A from #3491).
Previously, using MLX Whisper required a manual wrapper script mapped through HERMES_LOCAL_STT_COMMAND. This PR removes that friction: users can now set provider: mlx_whisper directly in config, use short model aliases, and get automatic provider selection on macOS when faster-whisper is not installed.
Changes:
Config example:

```yaml
stt:
  provider: mlx_whisper
  mlx_whisper:
    model: base  # tiny | base | small | medium | large-v3 | turbo
    # or a full HF repo ID: mlx-community/whisper-large-v3-mlx
```
Auto-detect order (no explicit config):
faster_whisper → mlx_whisper (macOS only) → local_command → groq → openai
Related Issue
Closes #3491
Type of Change
Changes Made
Why not Option B (generic local backend abstraction)?
Option B proposes restructuring provider: local into a two-level schema (provider: local + local.backend: mlx_whisper). This was intentionally deferred for now.
Option B remains a valid future improvement once more local backends exist and the abstraction is better motivated.
Why not Options C / D / E (transparent auto-detection)?
These approaches auto-select mlx_whisper silently without any config entry. The downside: users can't tell which backend is running, debugging transcription issues becomes harder, and behavior changes unexpectedly across environments. Explicit config (provider: mlx_whisper) with a clear auto-detect fallback on macOS is the better balance between convenience and transparency.
How to Test
Checklist