RyuuMeow/MaidWhisper

MaidWhisper

A system-level Select-to-Speak desktop reader for character voices powered by GPT-SoVITS.

License: GPL-3.0 · Platform: Windows 10+ · GPT-SoVITS compatible

Important: MaidWhisper is preparing for its first public beta. Features, packaging, and GPT-SoVITS integration may still change during beta testing.

Caution: An NVIDIA GPU is strongly recommended. GPT-SoVITS may start on some CPU-only systems, but synthesis is usually very slow and impractical for daily reading. For the intended real-time Select-to-Speak experience, use an NVIDIA GPU with enough VRAM and a compatible CUDA / PyTorch environment.


What Is MaidWhisper?

MaidWhisper is a Windows Select-to-Speak desktop tool for reading selected text with character-based GPT-SoVITS voices.

Instead of opening a heavy WebUI every time, you select text in any app, press a global hotkey, and use a lightweight floating panel to choose the character, language, tone, speed, volume, and playback state.

Select text -> Global hotkey -> Language detection -> Optional LLM preprocessing -> Segmentation -> GPT-SoVITS synthesis -> Playback + lyrics sync
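
The segmentation stage of the pipeline above can be illustrated with a minimal Python sketch. This is not MaidWhisper's actual implementation: the function name `split_segments`, the punctuation set, and the `max_chars` limit are all assumptions chosen for illustration.

```python
import re

# Assumed rule: a segment is a run of non-terminator characters followed by
# any sentence terminators (Latin or CJK). Hypothetical, not MaidWhisper's rule.
_SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*")


def split_segments(text: str, max_chars: int = 80) -> list[str]:
    """Split text into sentence-like segments no longer than max_chars."""
    segments = []
    for match in _SENTENCE.finditer(text):
        sentence = match.group().strip()
        # Hard-wrap very long sentences so each synthesis request stays small.
        for start in range(0, len(sentence), max_chars):
            segments.append(sentence[start:start + max_chars])
    return segments


print(split_segments("こんにちは。元気ですか？ Yes, fine."))
```

Short segments like these are what would let playback start before the whole text is synthesized, and they give a lyrics view natural units to highlight.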

Why Use It?

| Highlight | Description |
| --- | --- |
| System-level trigger | Works from browsers, readers, documents, chat apps, and most selectable text areas. |
| Floating control panel | Control character, language, tone, playback, speed, and volume without returning to a main window. |
| Easy character switching | Manage multiple characters, languages, tones, reference audio, and model files. |
| Daily-use workflow | Designed for repeated reading, not one-off manual generation. |
| No WebUI routine | GPT-SoVITS still generates the voice, while MaidWhisper handles capture, routing, segmentation, playback, and lyrics. |
| Optional AI preprocessing | Translate into the character language or rewrite text toward a configured character speaking style. |

Preview

Japanese reading quick demo: select text, trigger MaidWhisper, and read with a character voice.
AI translation quick demo: translate with AI preprocessing, then read in the selected character language.
Floating playback panel.
Lyrics panel: clickable lyrics and segment navigation.
Character model settings.
LLM settings: AI character style and translation.

Voice Demo

| Demo | Audio |
| --- | --- |
| Character voice demo 1 | DemoVoice_1.wav |
| Character voice demo 2 | DemoVoice_2.wav |
| Character voice demo 3 | DemoVoice_3.wav |
| Character voice demo 4 | DemoVoice_4.wav |
| Character voice demo 5 | DemoVoice_5.wav |

Demo audio is provided only to show the reading workflow and voice output style. Actual results depend on your GPT-SoVITS model, reference audio, synthesis speed, and playback settings.

Requirements

| Item | Requirement |
| --- | --- |
| OS | Windows 10 or later |
| GPU / VRAM | NVIDIA GPU recommended; CPU-only synthesis is usually very slow |
| GPT-SoVITS model | SoVITS .pth, GPT .ckpt, reference audio, and matching reference text |
| LLM provider | Optional; required only for AI character style or AI translation |

Quick Start

  1. Download the latest MaidWhisper-*-Setup.exe from GitHub Releases.
  2. Run the installer.
  3. If you do not already have GPT-SoVITS, enable the GPT-SoVITS installation option.
  4. Wait for the download and extraction to finish. This usually takes about 5 to 10 minutes depending on network and disk speed.
  5. Open MaidWhisper and go to Settings -> Characters.
  6. Add a character, then add a language and tone configuration.
  7. Select the .pth, .ckpt, reference audio, and reference text.
  8. Select text in another app and press the default hotkey: Alt+Shift+M.
  9. Choose the character, language, and tone from the floating panel, then press Play.

MaidWhisper starts the GPT-SoVITS API service when needed. It does not launch the training UI or the full GPT-SoVITS WebUI.
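
As a rough illustration of what a synthesis request to the API service might look like, here is a hedged Python sketch. The `ToneConfig` class and both function names are hypothetical. The JSON field names and the default URL `http://127.0.0.1:9880/tts` are assumptions based on the GPT-SoVITS `api_v2`-style `/tts` endpoint and should be checked against the GPT-SoVITS version you have installed. Loading the .pth and .ckpt weights is a separate step that this sketch does not cover.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class ToneConfig:
    """Hypothetical per-tone settings, mirroring the files chosen in Settings."""
    ref_audio_path: str  # reference audio file
    prompt_text: str     # reference text matching the audio
    prompt_lang: str     # language of the reference text


def build_tts_payload(text: str, text_lang: str, tone: ToneConfig) -> dict:
    """Build the JSON body for one segment (field names are assumed)."""
    return {
        "text": text,
        "text_lang": text_lang,
        "ref_audio_path": tone.ref_audio_path,
        "prompt_text": tone.prompt_text,
        "prompt_lang": tone.prompt_lang,
        "media_type": "wav",
    }


def synthesize(payload: dict, api_url: str = "http://127.0.0.1:9880/tts") -> bytes:
    """POST the payload and return audio bytes; needs a running API service."""
    request = urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


tone = ToneConfig("ref.wav", "sample reference text", "ja")
print(json.dumps(build_tts_payload("Hello", "en", tone), ensure_ascii=False))
```

Only `build_tts_payload` runs here; `synthesize` is defined but not called, because it requires the GPT-SoVITS API service to be listening.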

AI Character Style And Translation

LLM preprocessing is optional.

| Feature | Description |
| --- | --- |
| AI character style | Rewrites text toward the speaking style configured for the selected character. |
| AI translation | Translates input text into the selected character language. |
| Smart skip | If the detected input language already matches the target language and no style rewrite is needed, MaidWhisper skips the LLM call. |
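
The smart-skip rule can be expressed as a small decision function. The sketch below is an illustration only: the function name `needs_llm` and its parameters are assumptions, not MaidWhisper's real code.

```python
def needs_llm(
    detected_lang: str,
    target_lang: str,
    style_enabled: bool,
    translation_enabled: bool,
) -> bool:
    """Return True when an LLM call is required before synthesis."""
    # Translation matters only if it is enabled and the languages differ.
    needs_translation = translation_enabled and detected_lang != target_lang
    # A style rewrite always needs the LLM, regardless of language.
    return style_enabled or needs_translation


print(needs_llm("ja", "ja", style_enabled=False, translation_enabled=True))
print(needs_llm("en", "ja", style_enabled=False, translation_enabled=True))
```

The first call describes Japanese text read by a Japanese character with no style rewrite, so the LLM is skipped. The second describes English input for the same character, so translation is needed.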

API keys are stored in Windows Credential Manager rather than in the regular settings JSON file.

Data And Privacy

MaidWhisper is primarily local. Settings, character metadata, model paths, generated audio cache, and playback preferences are stored in the user data folder.

Text is sent to an external LLM provider only when AI character style or AI translation is enabled. If those features are disabled, MaidWhisper can read text through the local GPT-SoVITS workflow without calling an external LLM service.

GPT-SoVITS Resources

| Resource | Purpose |
| --- | --- |
| GPT-SoVITS official project | Installation, versions, and model format documentation |
| Official Windows package | Windows integrated package used by the setup flow |
| Hugging Face model search | Community models |
| ModelScope model search | Community models, especially Chinese resources |
| GPT-SoVITS Model Collection | Community model collection |

Warning: Community models may have unclear licenses or unsafe files. Verify the source, license, model version, and usage rights before using them.

FAQ

Does MaidWhisper include character models?

No. MaidWhisper does not provide or host third-party character models, voices, or reference audio. You need to prepare compatible GPT-SoVITS assets yourself.

Can I use my existing GPT-SoVITS installation?

Yes. In Settings, point MaidWhisper to your existing .bat / .cmd launcher and GPT-SoVITS API URL.

Do I need a GPU?

CPU-only usage may work in some environments, but it is usually too slow for daily reading. An NVIDIA GPU is recommended.

Does MaidWhisper replace GPT-SoVITS WebUI?

No. MaidWhisper focuses on daily reading, hotkeys, floating controls, and playback. Training, slicing, labeling, and fine-tuning still belong in the GPT-SoVITS toolchain.

Where are API keys stored?

LLM API keys are stored in Windows Credential Manager. The regular settings JSON file stores only non-sensitive provider and model settings.
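
The separation between secrets and regular settings could look roughly like the Python sketch below. The key name `api_key`, the `SENSITIVE_KEYS` set, and the `split_settings` function are hypothetical. The sketch shows only the split itself and does not write to Windows Credential Manager. In Python, the third-party `keyring` package is one way to reach that store, but the source does not say which mechanism MaidWhisper uses.

```python
import json

# Hypothetical names of settings that must never be written to the JSON file.
SENSITIVE_KEYS = {"api_key"}


def split_settings(settings: dict) -> tuple[dict, dict]:
    """Separate non-sensitive settings from secrets."""
    public = {k: v for k, v in settings.items() if k not in SENSITIVE_KEYS}
    secrets = {k: v for k, v in settings.items() if k in SENSITIVE_KEYS}
    return public, secrets


settings = {"provider": "example-provider", "model": "example-model", "api_key": "secret"}
public, secrets = split_settings(settings)

# Only the non-sensitive part is serialized for the settings file.
print(json.dumps(public))
```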

Developer Documentation

| Document | Description |
| --- | --- |
| Project Guide | Architecture overview and entry points |
| System Design | Pipeline contracts and restart strategy |
| Development Plan | Remaining milestone tasks |
| Release Checklist | Release preparation checklist |
| Release Acceptance Matrix | Manual acceptance cases |
| CI/CD and Windows Installer | Automated build and release workflow |
| 2026-06-05 Analysis | Coupling, data separation, and beta risk review |

AI Voice Generation Disclaimer

MaidWhisper is a local voice-generation workflow tool. It does not provide, host, or license third-party character models, voice assets, text content, or LLM services.

Users are responsible for ensuring they have the rights to use their models, reference audio, text, character names, and generated output. Do not use generated audio for impersonation, fraud, harassment, misinformation, or other harmful activity.

Support Development

| Method | Address |
| --- | --- |
| USDT TRC-20 | TXmghw1R9QaQSKGWEWP9ydr1xj8G1MqHrL |
| ETH ERC-20 | 0xe7b06b924cbca4b922585a4ccd665cd2c3d0e02c |

License

MaidWhisper is released under the GNU General Public License v3.0.