
feat(stt): add mlx_whisper as first-class local STT provider on macOS #3498

Open

dlkakbs wants to merge 3 commits into NousResearch:main from dlkakbs:feat/mlx-whisper-stt-provider

Conversation

@dlkakbs (Contributor) commented Mar 28, 2026

What does this PR do?

Adds mlx_whisper as a first-class local STT provider for macOS / Apple Silicon users (Option A from #3491).

Previously, using MLX Whisper required a manual wrapper script mapped through HERMES_LOCAL_STT_COMMAND. This PR removes that friction: users can now set provider: mlx_whisper directly in config, use short model aliases, and get automatic provider selection on macOS when faster-whisper is not installed.

Changes:

  • _HAS_MLX_WHISPER detection at import time (graceful degradation, no hard dependency)
  • MLX_MODEL_ALIASES dict normalizing short names (base, small, large-v3, turbo, …) to mlx-community HF repo IDs — cloud model names like whisper-1 are also normalized so misconfigured local setups don't silently pass an invalid model name
  • _normalize_mlx_model() — handles OpenAI/Groq model name footgun for the mlx backend
  • _transcribe_mlx_whisper() — thin wrapper around mlx_whisper.transcribe()
  • _get_provider() — explicit mlx_whisper support + auto-detect on macOS after faster-whisper
  • transcribe_audio() — wired to new provider

Config example:

    stt:
      provider: mlx_whisper
      mlx_whisper:
        model: base   # tiny | base | small | medium | large-v3 | turbo
        # or a full HF repo ID: mlx-community/whisper-large-v3-mlx

Auto-detect order (no explicit config):
faster_whisper → mlx_whisper (macOS only) → local_command → groq → openai
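
The fallback chain can be expressed as a pure function. A minimal sketch under stated assumptions: the function name and config keys are hypothetical, and the real _get_provider() presumably derives these flags from import checks and a platform.system() == "Darwin" test rather than taking them as parameters.

```python
def pick_stt_provider(has_faster_whisper: bool,
                      has_mlx_whisper: bool,
                      is_macos: bool,
                      config: dict) -> str:
    """Hypothetical pure-function version of the auto-detect order above."""
    if has_faster_whisper:
        return "faster_whisper"
    if is_macos and has_mlx_whisper:        # mlx_whisper is macOS-only
        return "mlx_whisper"
    if config.get("local_stt_command"):     # HERMES_LOCAL_STT_COMMAND path
        return "local_command"
    if config.get("GROQ_API_KEY"):
        return "groq"
    return "openai"
```

Factoring the ordering into a pure function like this makes the precedence trivially testable without installing any backend.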

Related Issue

Closes #3491

Type of Change

  • New feature (non-breaking change that adds functionality)

Changes Made

  • tools/transcription_tools.py — all changes above; no existing provider logic touched

Why not Option B (generic local backend abstraction)?

Option B proposes restructuring provider: local into a two-level schema (provider: local + local.backend: mlx_whisper). This was intentionally deferred for several reasons:

  1. Breaking config change. Existing users with provider: local would need to migrate. The current schema is already documented and in use.
  2. Scope creep. Introducing a backend abstraction layer requires reworking _get_provider(), the config schema, and all downstream dispatch logic — a significantly larger and riskier change for what is effectively one new backend.
  3. Premature abstraction. There are currently two local backends (faster_whisper, mlx_whisper). A shared abstraction layer is only warranted when there are enough backends to justify the added indirection — and when the shape of those backends is stable enough to abstract correctly.
  4. Option A is sufficient. Adding mlx_whisper as a peer provider (same level as local, groq, openai) covers all the acceptance criteria in #3491 (Feature: first-class MLX Whisper local STT support on macOS / Apple Silicon) without touching existing behavior.

Option B remains a valid future improvement once more local backends exist and the abstraction is better motivated.

Why not Options C / D / E (transparent auto-detection)?

These approaches auto-select mlx_whisper silently without any config entry. The downside: users can't tell which backend is running, debugging transcription issues becomes harder, and behavior changes unexpectedly across environments. Explicit config (provider: mlx_whisper) with a clear auto-detect fallback on macOS is the better balance between convenience and transparency.

How to Test

  1. Install mlx-whisper: pip install mlx-whisper
  2. Set stt.provider: mlx_whisper in ~/.hermes/config.yaml
  3. Send a voice message on any gateway (Telegram, Discord, etc.) — confirm transcription succeeds and logs show Transcribed … via mlx_whisper
  4. Set model: turbo — confirm it resolves to mlx-community/whisper-large-v3-turbo
  5. Set model: whisper-1 (OpenAI name footgun) — confirm it normalizes to mlx-community/whisper-base-mlx instead of failing
  6. Remove explicit provider config on macOS with faster-whisper uninstalled — confirm mlx_whisper is auto-selected

Checklist

- Add mlx_whisper provider to _get_provider() with explicit config support
  and auto-detect on macOS when faster-whisper is absent
- Add _transcribe_mlx_whisper() using mlx_whisper.transcribe() with HF repo IDs
- Add MLX_MODEL_ALIASES dict normalizing short names (base, small, turbo, …)
  to mlx-community HF repo IDs; cloud model names (whisper-1 etc.) are also
  normalized so local/cloud config footgun is avoided
- Auto-detect order: faster_whisper > mlx_whisper (macOS) > local_command > groq > openai

Config example:
  stt:
    provider: mlx_whisper
    mlx_whisper:
      model: base   # or small, large-v3, turbo, or full HF repo ID

Closes NousResearch#3491
CHANTXU64 pushed a commit to CHANTXU64/hermes-agent that referenced this pull request Apr 20, 2026
…Research#3498)

- Add _HAS_MLX_WHISPER detection
- Add MLX_MODEL_ALIASES mapping (tiny/base/small/medium/large-v3/turbo)
- Add _normalize_mlx_model() function
- Add _transcribe_mlx_whisper() transcription function
- Add mlx_whisper branch in _get_provider() (explicit + auto-detect on macOS)
- Remove fake faster_whisper module hack
- Keep standard log format consistent with other providers
@mechiland commented Apr 22, 2026

I guess https://github.com/dlkakbs/hermes-agent/blob/7c42a193a9c4611ec659a8ac49c50f10b3aa1fd7/tools/voice_mode.py#L741 needs to change as well?

    if not stt_enabled:
        details_parts.append("STT provider: DISABLED in config (stt.enabled: false)")
    elif stt_provider == "local":
        details_parts.append("STT provider: OK (local faster-whisper)")
    elif stt_provider == "groq":
        details_parts.append("STT provider: OK (Groq)")
    elif stt_provider == "openai":
        details_parts.append("STT provider: OK (OpenAI)")
    else:
        details_parts.append(
            "STT provider: MISSING (pip install faster-whisper, "
            "or set GROQ_API_KEY / VOICE_TOOLS_OPENAI_KEY)"
        )

The upstream seems to have changed a lot, by the way.

@alt-glitch added labels Apr 22, 2026: type/feature (New feature or request), P3 (Low — cosmetic, nice to have), tool/tts (Text-to-speech and transcription)
dlkakbs added 2 commits April 23, 2026 12:36
…local_command, and mistral

check_voice_requirements() reported MISSING for any provider other than
local/groq/openai. Add status lines for mlx_whisper, local_command, and
mistral to match the providers now handled in transcription_tools.
Also extend the fallback hint to mention MISTRAL_API_KEY.

Keeps mlx_whisper provider alongside upstream additions (mistral, xai,
_safe_find_spec, _normalize_local_command_model). Auto-detect order:
local > mlx_whisper (macOS) > local_command > groq > openai > mistral > xai.
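
The check_voice_requirements() fix these commits describe could look roughly like this. A hedged sketch: the real code appends to details_parts inside check_voice_requirements() rather than returning a string, and the exact wording of the mlx_whisper, local_command, and mistral status lines is an assumption modeled on the existing ones.

```python
def stt_status_line(stt_enabled: bool, stt_provider: str) -> str:
    """Hypothetical single-string version of the per-provider status lines."""
    if not stt_enabled:
        return "STT provider: DISABLED in config (stt.enabled: false)"
    ok = {
        "local": "STT provider: OK (local faster-whisper)",
        "mlx_whisper": "STT provider: OK (local mlx-whisper)",
        "local_command": "STT provider: OK (local command)",
        "groq": "STT provider: OK (Groq)",
        "openai": "STT provider: OK (OpenAI)",
        "mistral": "STT provider: OK (Mistral)",
    }
    return ok.get(
        stt_provider,
        # Fallback hint extended to mention MISTRAL_API_KEY, per the commit.
        "STT provider: MISSING (pip install faster-whisper or mlx-whisper, "
        "or set GROQ_API_KEY / VOICE_TOOLS_OPENAI_KEY / MISTRAL_API_KEY)",
    )
```

A dict lookup with a fallback keeps new providers a one-line change, avoiding the growing elif chain quoted earlier in the thread.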