
feat(stt): add mlx_whisper as first-class local STT provider on macOS #3498

Open

dlkakbs wants to merge 3 commits into NousResearch:main from dlkakbs:feat/mlx-whisper-stt-provider

Conversation

@dlkakbs (Contributor) commented Mar 28, 2026

What does this PR do?

Adds mlx_whisper as a first-class local STT provider for macOS / Apple Silicon users (Option A from #3491).

Previously, using MLX Whisper required a manual wrapper script mapped through HERMES_LOCAL_STT_COMMAND. This PR removes that friction: users can now set provider: mlx_whisper directly in config, use short model aliases, and get automatic provider selection on macOS when faster-whisper is not installed.

Changes:

  • _HAS_MLX_WHISPER detection at import time (graceful degradation, no hard dependency)
  • MLX_MODEL_ALIASES dict normalizing short names (base, small, large-v3, turbo, …) to mlx-community HF repo IDs — cloud model names like whisper-1 are also normalized so misconfigured local setups don't silently pass an invalid model name
  • _normalize_mlx_model() — handles OpenAI/Groq model name footgun for the mlx backend
  • _transcribe_mlx_whisper() — thin wrapper around mlx_whisper.transcribe()
  • _get_provider() — explicit mlx_whisper support + auto-detect on macOS after faster-whisper
  • transcribe_audio() — wired to new provider

Config example:

    stt:
      provider: mlx_whisper
      mlx_whisper:
        model: base   # tiny | base | small | medium | large-v3 | turbo
        # or a full HF repo ID: mlx-community/whisper-large-v3-mlx

Auto-detect order (no explicit config):
faster_whisper → mlx_whisper (macOS only) → local_command → groq → openai
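
The fallback chain can be expressed as a pure function. A minimal sketch under stated assumptions: the function name and config keys are hypothetical, and the real _get_provider() presumably derives these flags from import checks and a platform.system() == "Darwin" test rather than taking them as parameters.

```python
def pick_stt_provider(has_faster_whisper: bool,
                      has_mlx_whisper: bool,
                      is_macos: bool,
                      config: dict) -> str:
    """Hypothetical pure-function version of the auto-detect order above."""
    if has_faster_whisper:
        return "faster_whisper"
    if is_macos and has_mlx_whisper:        # mlx_whisper is macOS-only
        return "mlx_whisper"
    if config.get("local_stt_command"):     # HERMES_LOCAL_STT_COMMAND path
        return "local_command"
    if config.get("GROQ_API_KEY"):
        return "groq"
    return "openai"
```

Factoring the ordering into a pure function like this makes the precedence trivially testable without installing any backend.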

Related Issue

Closes #3491

Type of Change

  • New feature (non-breaking change that adds functionality)

Changes Made

  • tools/transcription_tools.py — all changes above; no existing provider logic touched

Why not Option B (generic local backend abstraction)?

Option B proposes restructuring provider: local into a two-level schema (provider: local + local.backend: mlx_whisper). This was intentionally deferred for several reasons:

  1. Breaking config change. Existing users with provider: local would need to migrate. The current schema is already documented and in use.
  2. Scope creep. Introducing a backend abstraction layer requires reworking _get_provider(), the config schema, and all downstream dispatch logic — a significantly larger and riskier change for what is effectively one new backend.
  3. Premature abstraction. There are currently two local backends (faster_whisper, mlx_whisper). A shared abstraction layer is only warranted when there are enough backends to justify the added indirection — and when the shape of those backends is stable enough to abstract correctly.
  4. Option A is sufficient. Adding mlx_whisper as a peer provider (same level as local, groq, openai) covers all the acceptance criteria in #3491 (Feature: first-class MLX Whisper local STT support on macOS / Apple Silicon) without touching existing behavior.

Option B remains a valid future improvement once more local backends exist and the abstraction is better motivated.

Why not Options C / D / E (transparent auto-detection)?

These approaches auto-select mlx_whisper silently without any config entry. The downside: users can't tell which backend is running, debugging transcription issues becomes harder, and behavior changes unexpectedly across environments. Explicit config (provider: mlx_whisper) with a clear auto-detect fallback on macOS is the better balance between convenience and transparency.

How to Test

  1. Install mlx-whisper: pip install mlx-whisper
  2. Set stt.provider: mlx_whisper in ~/.hermes/config.yaml
  3. Send a voice message on any gateway (Telegram, Discord, etc.) — confirm transcription succeeds and logs show Transcribed … via mlx_whisper
  4. Set model: turbo — confirm it resolves to mlx-community/whisper-large-v3-turbo
  5. Set model: whisper-1 (OpenAI name footgun) — confirm it normalizes to mlx-community/whisper-base-mlx instead of failing
  6. Remove explicit provider config on macOS with faster-whisper uninstalled — confirm mlx_whisper is auto-selected

Checklist

- Add mlx_whisper provider to _get_provider() with explicit config support
  and auto-detect on macOS when faster-whisper is absent
- Add _transcribe_mlx_whisper() using mlx_whisper.transcribe() with HF repo IDs
- Add MLX_MODEL_ALIASES dict normalizing short names (base, small, turbo, …)
  to mlx-community HF repo IDs; cloud model names (whisper-1 etc.) are also
  normalized so local/cloud config footgun is avoided
- Auto-detect order: faster_whisper > mlx_whisper (macOS) > local_command > groq > openai

Config example:
  stt:
    provider: mlx_whisper
    mlx_whisper:
      model: base   # or small, large-v3, turbo, or full HF repo ID

Closes NousResearch#3491
CHANTXU64 pushed a commit to CHANTXU64/hermes-agent that referenced this pull request Apr 20, 2026
…Research#3498)

- Add _HAS_MLX_WHISPER detection
- Add MLX_MODEL_ALIASES mapping (tiny/base/small/medium/large-v3/turbo)
- Add _normalize_mlx_model() function
- Add _transcribe_mlx_whisper() transcription function
- Add mlx_whisper branch in _get_provider() (explicit + auto-detect on macOS)
- Remove fake faster_whisper module hack
- Keep standard log format consistent with other providers
@mechiland commented Apr 22, 2026

I guess https://github.com/dlkakbs/hermes-agent/blob/7c42a193a9c4611ec659a8ac49c50f10b3aa1fd7/tools/voice_mode.py#L741 needs to change as well?

    if not stt_enabled:
        details_parts.append("STT provider: DISABLED in config (stt.enabled: false)")
    elif stt_provider == "local":
        details_parts.append("STT provider: OK (local faster-whisper)")
    elif stt_provider == "groq":
        details_parts.append("STT provider: OK (Groq)")
    elif stt_provider == "openai":
        details_parts.append("STT provider: OK (OpenAI)")
    else:
        details_parts.append(
            "STT provider: MISSING (pip install faster-whisper, "
            "or set GROQ_API_KEY / VOICE_TOOLS_OPENAI_KEY)"
        )

The upstream seems to have changed a lot, by the way.

@alt-glitch added labels Apr 22, 2026: type/feature (New feature or request), P3 (Low — cosmetic, nice to have), tool/tts (Text-to-speech and transcription)
dlkakbs added 2 commits April 23, 2026 12:36
…local_command, and mistral

check_voice_requirements() reported MISSING for any provider other than
local/groq/openai. Add status lines for mlx_whisper, local_command, and
mistral to match the providers now handled in transcription_tools.
Also extend the fallback hint to mention MISTRAL_API_KEY.

Keeps mlx_whisper provider alongside upstream additions (mistral, xai,
_safe_find_spec, _normalize_local_command_model). Auto-detect order:
local > mlx_whisper (macOS) > local_command > groq > openai > mistral > xai.
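
The check_voice_requirements() fix these commits describe could look roughly like this. A hedged sketch: the real code appends to details_parts inside check_voice_requirements() rather than returning a string, and the exact wording of the mlx_whisper, local_command, and mistral status lines is an assumption modeled on the existing ones.

```python
def stt_status_line(stt_enabled: bool, stt_provider: str) -> str:
    """Hypothetical single-string version of the per-provider status lines."""
    if not stt_enabled:
        return "STT provider: DISABLED in config (stt.enabled: false)"
    ok = {
        "local": "STT provider: OK (local faster-whisper)",
        "mlx_whisper": "STT provider: OK (local mlx-whisper)",
        "local_command": "STT provider: OK (local command)",
        "groq": "STT provider: OK (Groq)",
        "openai": "STT provider: OK (OpenAI)",
        "mistral": "STT provider: OK (Mistral)",
    }
    return ok.get(
        stt_provider,
        # Fallback hint extended to mention MISTRAL_API_KEY, per the commit.
        "STT provider: MISSING (pip install faster-whisper or mlx-whisper, "
        "or set GROQ_API_KEY / VOICE_TOOLS_OPENAI_KEY / MISTRAL_API_KEY)",
    )
```

A dict lookup with a fallback keeps new providers a one-line change, avoiding the growing elif chain quoted earlier in the thread.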