MaidWhisper

A system-level Select-to-Speak desktop reader for character voices powered by GPT-SoVITS.

Important

MaidWhisper is preparing for its first public beta. Features, packaging, and GPT-SoVITS integration may still change during beta testing.

Caution

An NVIDIA GPU is strongly recommended. GPT-SoVITS may start on some CPU-only systems, but synthesis is usually too slow for daily reading. For the intended real-time Select-to-Speak experience, use an NVIDIA GPU with enough VRAM and a compatible CUDA / PyTorch environment.

What Is MaidWhisper?

MaidWhisper is a Windows Select-to-Speak desktop tool for reading selected text with character-based GPT-SoVITS voices.

Instead of opening a heavy WebUI every time, you select text in any app, press a global hotkey, and use a lightweight floating panel to choose the character, language, tone, speed, volume, and playback state.

Select text -> Global hotkey -> Language detection -> Optional LLM preprocessing -> Segmentation -> GPT-SoVITS synthesis -> Playback + lyrics sync
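
To make the segmentation step of this pipeline concrete, here is a minimal Python sketch. It is not MaidWhisper's actual segmenter: the function name `segment_text`, the punctuation set, and the `max_chars` limit are all assumptions chosen for illustration. It splits text at sentence-ending punctuation and hard-wraps any sentence that is still too long, so each piece can be synthesized and highlighted on its own.

```python
import re

# Assumed sentence boundaries: Latin punctuation followed by whitespace,
# or CJK full-width punctuation (which is not followed by a space).
_BOUNDARY = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])")


def segment_text(text: str, max_chars: int = 80) -> list[str]:
    """Split text into sentence-like segments of at most max_chars characters."""
    segments = []
    for sentence in _BOUNDARY.split(text.strip()):
        sentence = sentence.strip()
        if not sentence:
            continue
        # Hard-wrap sentences that exceed the (hypothetical) length limit.
        for start in range(0, len(sentence), max_chars):
            segments.append(sentence[start:start + max_chars])
    return segments


if __name__ == "__main__":
    for piece in segment_text("Hello world. How are you? Fine!"):
        print(piece)
```

Short segments let playback start before the whole selection has been synthesized and give the lyrics view natural units to highlight. A real segmenter would likely also handle abbreviations, quotes, and line breaks.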

Why Use It?

Highlight	Description
System-level trigger	Works from browsers, readers, documents, chat apps, and most selectable text areas.
Floating control panel	Control character, language, tone, playback, speed, and volume without returning to a main window.
Easy character switching	Manage multiple characters, languages, tones, reference audio, and model files.
Daily-use workflow	Designed for repeated reading, not one-off manual generation.
No WebUI routine	GPT-SoVITS still generates the voice, while MaidWhisper handles capture, routing, segmentation, playback, and lyrics.
Optional AI preprocessing	Translate into the character language or rewrite text toward a configured character speaking style.

Preview

[Image: Select text, trigger MaidWhisper, and read with a character voice]

[Image: Translate with AI preprocessing, then read in the selected character language]

[Image: Floating playback panel]	[Image: Clickable lyrics and segment navigation]
[Image: Character model settings]	[Image: AI character style and translation]

Voice Demo

Demo	Audio
Character voice demo 1	DemoVoice_1.wav
Character voice demo 2	DemoVoice_2.wav
Character voice demo 3	DemoVoice_3.wav
Character voice demo 4	DemoVoice_4.wav
Character voice demo 5	DemoVoice_5.wav

Demo audio is provided only to show the reading workflow and voice output style. Actual results depend on your GPT-SoVITS model, reference audio, synthesis speed, and playback settings.

Requirements

Item	Requirement
OS	Windows 10 or later
GPU / VRAM	NVIDIA GPU recommended; CPU-only synthesis is usually very slow
GPT-SoVITS model	SoVITS `.pth`, GPT `.ckpt`, reference audio, and matching reference text
LLM provider	Optional; required only for AI character style or AI translation
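
The model row above lists four assets that belong together. As a rough illustration of checking them before synthesis, the Python sketch below validates one set of assets. The `ToneAssets` class and `find_problems` function are hypothetical names, not part of MaidWhisper, and the checks cover only file extensions, file existence, and a non-empty reference text.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ToneAssets:
    """Hypothetical bundle of the four assets one tone configuration needs."""
    sovits_weights: Path    # SoVITS weights, expected to end in .pth
    gpt_weights: Path       # GPT weights, expected to end in .ckpt
    reference_audio: Path   # short reference clip
    reference_text: str     # transcript matching the reference clip


def find_problems(assets: ToneAssets) -> list[str]:
    """Return human-readable problems; an empty list means the basic checks passed."""
    problems = []
    if assets.sovits_weights.suffix.lower() != ".pth":
        problems.append("SoVITS weights must be a .pth file")
    if assets.gpt_weights.suffix.lower() != ".ckpt":
        problems.append("GPT weights must be a .ckpt file")
    for label, path in (
        ("SoVITS weights", assets.sovits_weights),
        ("GPT weights", assets.gpt_weights),
        ("reference audio", assets.reference_audio),
    ):
        if not path.is_file():
            problems.append(f"{label} not found: {path}")
    if not assets.reference_text.strip():
        problems.append("reference text is empty")
    return problems
```

This cannot tell whether the reference text actually matches the audio or whether the two weight files come from the same GPT-SoVITS version; those still need to be verified by listening to the output.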

Quick Start

1. Download the latest `MaidWhisper-*-Setup.exe` from GitHub Releases.
2. Run the installer.
3. If you do not already have GPT-SoVITS, enable the GPT-SoVITS installation option.
4. Wait for the download and extraction to finish. This usually takes about 5 to 10 minutes depending on network and disk speed.
5. Open MaidWhisper and go to Settings -> Characters.
6. Add a character, then add a language and tone configuration.
7. Select the `.pth`, `.ckpt`, reference audio, and reference text.
8. Select text in another app and press the default hotkey: Alt+Shift+M.
9. Choose the character, language, and tone from the floating panel, then press Play.

MaidWhisper starts the GPT-SoVITS API service when needed. It does not launch the training UI or the full GPT-SoVITS WebUI.
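
One way to decide whether the API service needs to be started is to check whether anything is already listening at the configured API URL. The Python sketch below shows that idea only; it is not MaidWhisper's implementation. The function name `is_api_listening` is hypothetical, and the URL in the usage example is a placeholder rather than a confirmed default.

```python
import socket
from urllib.parse import urlsplit


def is_api_listening(api_url: str, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(api_url)
    host = parts.hostname or "127.0.0.1"
    # Fall back to the scheme's standard port when the URL omits one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder URL; use whatever API URL is configured in Settings.
    url = "http://127.0.0.1:9880"
    print("already running" if is_api_listening(url) else "needs to be started")
```

A successful connection only shows that some process is listening on that port. Confirming that it is really the GPT-SoVITS API would need an actual request to the service.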

AI Character Style And Translation

LLM preprocessing is optional.

Feature	Description
AI character style	Rewrites text toward the speaking style configured for the selected character.
AI translation	Translates input text into the selected character language.
Smart skip	If the detected input language already matches the target language and no style rewrite is needed, MaidWhisper skips the LLM call.
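
The smart skip rule in the table can be written as a small decision function. This Python sketch is an interpretation of the rule as described here, not code taken from MaidWhisper; the function and parameter names are hypothetical, and language codes such as `"ja"` are illustrative.

```python
def needs_llm_call(
    detected_lang: str,
    target_lang: str,
    style_rewrite_enabled: bool,
    translation_enabled: bool,
) -> bool:
    """Decide whether the optional LLM preprocessing step has any work to do."""
    # A style rewrite always needs the LLM, whatever the input language is.
    if style_rewrite_enabled:
        return True
    # Translation is only needed when the input is not already in the target language.
    if translation_enabled and detected_lang != target_lang:
        return True
    # Otherwise the text can go straight to segmentation and synthesis.
    return False


if __name__ == "__main__":
    print(needs_llm_call("ja", "ja", False, True))   # same language, no rewrite
    print(needs_llm_call("en", "ja", False, True))   # translation required
```

Skipping the call saves latency and provider cost, and it also means the text never leaves the machine in that case.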

API keys are stored in Windows Credential Manager instead of normal settings JSON.

Data And Privacy

MaidWhisper is primarily local. Settings, character metadata, model paths, generated audio cache, and playback preferences are stored in the user data folder.

Text is sent to an external LLM provider only when AI character style or AI translation is enabled. If those features are disabled, MaidWhisper can read text through the local GPT-SoVITS workflow without calling an external LLM service.

GPT-SoVITS Resources

Resource	Purpose
GPT-SoVITS official project	Installation, versions, and model format documentation
Official Windows package	Windows integrated package used by the setup flow
Hugging Face model search	Community models
ModelScope model search	Community models, especially Chinese resources
GPT-SoVITS Model Collection	Community model collection

Warning

Community models may have unclear licenses or unsafe files. Verify the source, license, model version, and usage rights before using them.

FAQ

Does MaidWhisper include character models?

No. MaidWhisper does not provide or host third-party character models, voices, or reference audio. You need to prepare compatible GPT-SoVITS assets yourself.

Can I use my existing GPT-SoVITS installation?

Yes. In Settings, point MaidWhisper to your existing `.bat` / `.cmd` launcher and to the GPT-SoVITS API URL.

Do I need a GPU?

CPU-only usage may work in some environments, but it is usually too slow for daily reading. An NVIDIA GPU is recommended.

Does MaidWhisper replace GPT-SoVITS WebUI?

No. MaidWhisper focuses on daily reading, hotkeys, floating controls, and playback. Training, slicing, labeling, and fine-tuning still belong in the GPT-SoVITS toolchain.

Where are API keys stored?

LLM API keys are stored in Windows Credential Manager. Normal settings JSON stores only non-sensitive provider and model settings.
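
The separation described in this answer can be pictured as splitting one settings dictionary into a part that is safe to write to JSON and a part that goes to the credential store. The Python sketch below is illustrative only: the key names, `SENSITIVE_KEYS`, and `split_settings` are assumptions, and the step that writes to Windows Credential Manager is deliberately left out.

```python
import json

# Hypothetical set of setting names that must never be written to plain JSON.
SENSITIVE_KEYS = {"api_key"}


def split_settings(settings: dict) -> tuple[dict, dict]:
    """Separate JSON-safe settings from secrets bound for the credential store."""
    public = {k: v for k, v in settings.items() if k not in SENSITIVE_KEYS}
    secrets = {k: v for k, v in settings.items() if k in SENSITIVE_KEYS}
    return public, secrets


if __name__ == "__main__":
    # All values below are placeholders.
    public, secrets = split_settings(
        {"provider": "example-provider", "model": "example-model", "api_key": "sk-placeholder"}
    )
    print(json.dumps(public, indent=2))   # only this part would be saved as JSON
    print(sorted(secrets))                # names of values that need secure storage
```

In a Python program, the secrets half could then be handed to a credential-store wrapper such as the third-party `keyring` package. That is one possible approach, not a statement about how MaidWhisper does it.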

Developer Documentation

Document	Description
Project Guide	Architecture overview and entry points
System Design	Pipeline contracts and restart strategy
Development Plan	Remaining milestone tasks
Release Checklist	Release preparation checklist
Release Acceptance Matrix	Manual acceptance cases
CI/CD and Windows Installer	Automated build and release workflow
2026-06-05 Analysis	Coupling, data separation, and beta risk review

AI Voice Generation Disclaimer

MaidWhisper is a local voice-generation workflow tool. It does not provide, host, or license third-party character models, voice assets, text content, or LLM services.

Users are responsible for ensuring they have the rights to use their models, reference audio, text, character names, and generated output. Do not use generated audio for impersonation, fraud, harassment, misinformation, or other harmful activity.

Support Development

Method	Address
USDT TRC-20	`TXmghw1R9QaQSKGWEWP9ydr1xj8G1MqHrL`
ETH ERC-20	`0xe7b06b924cbca4b922585a4ccd665cd2c3d0e02c`

License

MaidWhisper is released under the GNU General Public License v3.0.
