leweex95 · Feb 25, 2026
@@ -9,6 +9,15 @@ repos:
       - id: check-merge-conflict
       - id: debug-statements
 
+  - repo: https://github.com/PyCQA/autoflake
+    rev: v2.3.1
+    hooks:
+      - id: autoflake
+        args:
+          - --in-place
+          - --remove-unused-variables
+          - --remove-all-unused-imports
+
   - repo: https://github.com/pycqa/flake8
     rev: 7.0.0
     hooks:

@@ -0,0 +1,27 @@
+# Performance and Benchmarks
+
+VoiceGenHub is designed for both local CPU-only systems and GPU-accelerated environments.
+
+## Performance Comparison (Single Job)
+
+| Provider | Quality (MOS) | Startup Time | Sequential (per req) | Async (3x parallel) | Model Size | Commercial |
+|----------|---------------|--------------|---------------------|-------------------|------------|------------|
+| **Edge TTS** | 3.8/5 | 4.9s | 3.2s | 2.5s | 0MB (cloud) | ✅ Free |
+| **Kokoro** | 3.5/5 | 94s | 14.2s | 2.5s | 625MB | ✅ Apache 2.0 |
+| **Bark** | 4.2/5 | 180s | 25-40s | 8-12s | 4GB | ✅ MIT |
+| **Chatterbox** | 4.3/5 | 120s | 15-30s | 5-15s | 3.7GB | ✅ MIT |
+| **ElevenLabs** | 4.5/5* | 2s | 3-5s | 2-3s | 0MB (cloud) | ⚠️ Paid API |
+
+*ElevenLabs quality estimate based on reputation; not yet tested.*
+
+## Concurrency Analysis (Chatterbox)
+
+- **Memory Safety**: Chatterbox uses a **shared model instance** (3.6GB) across all threads — **no duplication**.
+- **Performance**: ~2.8x speedup at 4 threads on CPU. Optimal thread count: **2-4 threads**.
+- **Async Concurrency**: Safe to use 2-8 concurrent threads without OOM risk.
+
+## [View Concurrency Plot](assets/concurrency_plot.html)
+Interactive performance analysis showing speedup curves, memory usage, and timing breakdowns.
+
+---
+*For more details on Kaggle GPU benchmarks, see the remote GPU documentation.*
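
The Chatterbox concurrency notes above (one shared model instance, a small bounded number of threads) can be sketched roughly as follows. This is a minimal illustration, not VoiceGenHub's actual implementation: `StubModel`, `get_shared_model`, `synthesize_all`, and `parallel_efficiency` are hypothetical names, and the stub only echoes bytes instead of running inference.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StubModel:
    """Hypothetical stand-in for a loaded TTS model; not the VoiceGenHub API."""

    instances = 0  # counts how many times the "model" was constructed

    def __init__(self) -> None:
        StubModel.instances += 1

    def synthesize(self, text: str) -> bytes:
        # A real model would run inference here; the stub just echoes bytes.
        return text.encode("utf-8")


_model = None
_model_lock = threading.Lock()


def get_shared_model() -> StubModel:
    """Create the model once and hand the same instance to every thread."""
    global _model
    with _model_lock:
        if _model is None:
            _model = StubModel()
        return _model


def synthesize_all(texts, max_workers=4):
    """Run requests on a bounded thread pool that shares one model instance."""

    def job(text):
        return get_shared_model().synthesize(text)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(job, texts))


def parallel_efficiency(speedup: float, threads: int) -> float:
    """Fraction of ideal linear scaling achieved at a given thread count."""
    return speedup / threads
```

With the figures quoted above, `parallel_efficiency(2.8, 4)` comes out at about 0.7, which is consistent with the recommendation to stay in the 2-4 thread range on CPU.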
@@ -0,0 +1,51 @@
+# Voice Cloning and Design
+
+VoiceGenHub supports both zero-shot voice cloning (from audio samples) and voice design (from textual descriptions).
+
+## 1. Voice Cloning with [Chatterbox](https://github.com/rsxdalv/chatterbox)
+
+### Steps
+
+1.  **Generate a Reference Audio** (or use an existing sample):
+    ```bash
+    voicegenhub synthesize "Sample text for cloning." \
+        --provider kokoro \
+        --voice kokoro-am_michael \
+        --output reference.wav
+    ```
+
+2.  **Clone the Voice**:
+    ```bash
+    voicegenhub synthesize "Your text to be synthesized in the cloned voice." \
+        --provider chatterbox \
+        --audio-prompt reference.wav \
+        --output cloned_voice.wav
+    ```
+
+3.  **Adjust Emotion and Style**:
+    ```bash
+    voicegenhub synthesize "Your text." \
+        --provider chatterbox \
+        --audio-prompt reference.wav \
+        --exaggeration 0.8 \
+        --cfg-weight 0.7
+    ```
+
+### Tips for Better Quality
+-   Use clear, noise-free reference audio (5-10 seconds recommended).
+-   Chatterbox supports **multilingual cloning**: the reference audio can be in one supported language and the synthesized speech in another.
+
+## 2. Voice Design with [Qwen 3 TTS](https://github.com/QwenLM/Qwen3-TTS)
+
+*Requires `Qwen3-TTS-VoiceDesign` model for full control, available via Python API or remote GPU.*
+
+### Qwen 3 TTS Voice Design Features
+
+-   **Natural Language Instruction**: Design custom voices using descriptions.
+-   **Example Voice Descriptions**:
+    -   `"Female, 25 years old, cheerful and energetic, slightly high-pitched with playful intonation"`
+    -   `"Male, 17 years old, gaining confidence, deeper breath support, vowels tighten when nervous"`
+    -   `"Elderly male, 70 years old, wise and gentle, slightly raspy with warm timbre"`
+
+---
+*For more details on Qwen 3 TTS design modes, see the [Qwen 3 TTS documentation](https://github.com/QwenLM/Qwen3-TTS).*
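
When the cloning steps above are driven from a script, the command line can be assembled in Python. The sketch below uses only the CLI flags shown in this document (`--provider`, `--audio-prompt`, `--output`, `--exaggeration`, `--cfg-weight`). The helper names `build_clone_command` and `run_command` are hypothetical and are not part of VoiceGenHub.

```python
import shlex
import subprocess
from typing import List, Optional


def build_clone_command(
    text: str,
    audio_prompt: str,
    output: str,
    exaggeration: Optional[float] = None,
    cfg_weight: Optional[float] = None,
) -> List[str]:
    """Build the argv list for a Chatterbox voice-cloning call."""
    cmd = [
        "voicegenhub", "synthesize", text,
        "--provider", "chatterbox",
        "--audio-prompt", audio_prompt,
        "--output", output,
    ]
    # The emotion/style flags are optional, so only add them when given.
    if exaggeration is not None:
        cmd += ["--exaggeration", str(exaggeration)]
    if cfg_weight is not None:
        cmd += ["--cfg-weight", str(cfg_weight)]
    return cmd


def run_command(cmd: List[str]) -> None:
    """Echo the command, then run it; raises CalledProcessError on failure."""
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Passing an argument list instead of a single shell string means text containing quotes or spaces needs no manual escaping. Calling `run_command(build_clone_command("Your text.", "reference.wav", "cloned_voice.wav"))` assumes `voicegenhub` and the Chatterbox dependencies are installed.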
@@ -0,0 +1,66 @@
+# Installation and Requirements
+
+Detailed installation guide for various TTS providers and optional features.
+
+## Basic Installation
+
+```bash
+pip install voicegenhub
+```
+
+## Optional Provider Dependencies
+
+To use certain providers, you need to install their respective dependencies:
+
+```bash
+# Kokoro TTS (Lightweight, self-hosted)
+pip install "voicegenhub[kokoro]"
+
+# Bark TTS (High Quality, MIT)
+pip install "voicegenhub[bark]"
+
+# Chatterbox TTS (High Quality, MIT)
+pip install chatterbox-tts
+
+# Qwen 3 TTS (State-of-the-Art, Apache 2.0)
+pip install "voicegenhub[qwen]"
+
+# ElevenLabs TTS (Commercial)
+pip install elevenlabs
+```
+
+---
+
+## Dependencies
+
+### Voice Cloning Requirements (Chatterbox)
+
+For voice cloning features with Chatterbox TTS:
+
+```bash
+pip install "voicegenhub[voice-cloning]"
+```
+
+**System Requirements:**
+- **FFmpeg**: Required when `torchcodec` is installed for voice cloning.
+- **PyTorch**: Required for local model execution.
+
+**Windows Installations**: Download the "full-shared" FFmpeg build from [ffmpeg.org](https://ffmpeg.org/download.html#build-windows) and add the `bin` directory to your system PATH.
+
+---
+
+## Technical Note: CUDA and CPU Execution
+
+- VoiceGenHub automatically detects if a GPU is available.
+- For **Chatterbox** and **Bark**, if no GPU is found, the library will fall back to **CPU execution**.
+- For **Qwen 3 TTS**, **GPU acceleration** (remote or local) is recommended when running the high-quality 1.7B models.
+
+---
+
+## Windows & Python 3.13+ (Kokoro)
+
+On Windows with Python 3.13+, **Kokoro TTS** may require Microsoft Visual C++ Build Tools for compilation if pre-built wheels are not available.
+
+1.  Download [Microsoft Visual C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/).
+2.  Select the "Desktop development with C++" workload.
+3.  Restart the terminal and retry the installation.
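
The system requirements and the CUDA/CPU note above can be checked before a first run with a few lines of Python. This is a hedged sketch, not VoiceGenHub's own detection logic: `pick_device` and `check_environment` are hypothetical helpers, and the only third-party call used is PyTorch's `torch.cuda.is_available()`.

```python
import importlib.util
import shutil


def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed, so there is no local GPU path
    try:
        import torch
    except (ImportError, OSError):
        return "cpu"  # treat a broken install like a missing one
    return "cuda" if torch.cuda.is_available() else "cpu"


def check_environment() -> dict:
    """Summarize the prerequisites listed in the installation guide."""
    return {
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "device": pick_device(),
    }
```

Printing `check_environment()` gives a quick view of whether FFmpeg is on the PATH and whether local runs would use the GPU or fall back to the CPU.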
@@ -0,0 +1,52 @@
+# Kaggle Remote GPU Generation
+
+Generate high-quality Qwen3-TTS audio using remote Kaggle GPUs (P100 or T4 x 2). This is useful for running the high-quality 1.7B models when you don't have a local GPU.
+
+## Prerequisites
+
+1.  **Kaggle API Credentials**:
+    -   Go to [Kaggle Settings](https://www.kaggle.com/settings) → API → Create New Token.
+    -   Save the `kaggle.json` to `~/.kaggle/kaggle.json` (on Windows: `%USERPROFILE%\.kaggle\kaggle.json`).
+2.  **Kaggle CLI**:
+    ```bash
+    pip install kaggle
+    ```
+3.  **Kaggle Internet Access**:
+    -   Ensure your Kaggle account has phone verification completed (allows internet access in kernels).
+
+## Usage
+
+Use the `--gpu` flag with the `synthesize` command to trigger remote generation.
+
+### P100 GPU (default)
+
+```bash
+voicegenhub synthesize "Hello from the remote P100!" --gpu
+```
+
+### T4 x 2 GPU
+
+```bash
+voicegenhub synthesize "Hello from the remote T4!" --gpu --gpu-type t4
+```
+
+### Advanced Usage
+
+```bash
+voicegenhub synthesize "Chinese test." \
+    --gpu \
+    --gpu-type p100 \
+    --voice Serena \
+    --language zh \
+    --output ./remote_output/serena.wav
+```
+
+## How It Works
+
+1.  **Automation**: VoiceGenHub generates a Jupyter notebook cell-by-cell.
+2.  **Deployment**: It pushes the notebook to Kaggle using the specified accelerator (`nvidia-p100-1` or `nvidia-t4-2`).
+3.  **Execution**: On Kaggle, the notebook installs necessary dependencies (`transformers`, `qwen-tts`), loads the model onto the GPU, and generates the audio.
+4.  **Syncing**: The CLI polls for completion and automatically downloads the generated `.wav` file into a local timestamped directory (or your specified output path).
+
+---
+*Note: Remote generation takes approximately 2-4 minutes due to environment setup on Kaggle's side.*
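
The "Syncing" step above amounts to a poll-until-done loop with a timeout. The sketch below illustrates that pattern only. The status strings (`"complete"`, `"error"`), the `get_status` callable, and both function names are assumptions made for illustration and do not reflect the actual VoiceGenHub or Kaggle API. The credentials helper mirrors the `kaggle.json` location from the prerequisites.

```python
import time
from pathlib import Path
from typing import Callable


def kaggle_credentials_path() -> Path:
    """Default token location; Path.home() is %USERPROFILE% on Windows."""
    return Path.home() / ".kaggle" / "kaggle.json"


def wait_for_completion(
    get_status: Callable[[], str],
    poll_interval: float = 15.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status until it reports a final state or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in ("complete", "error"):  # hypothetical final states
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"kernel still {status!r} after {timeout} seconds")
        sleep(poll_interval)
```

Given the 2-4 minute setup time noted above, a poll interval in the range of 10-30 seconds keeps the number of status requests small without adding noticeable delay.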
@@ -0,0 +1,19 @@
+# Licensing and Commercial Usage
+
+VoiceGenHub integrates TTS providers that are distributed under a mix of free and commercial licenses.
+
+## Commercially Safe Models (summary)
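+-   **Bark** (MIT License) - Commercial use allowed; retain the copyright and license notice when redistributing the software.
+-   **Chatterbox** (MIT License) - Commercial use allowed; retain the copyright and license notice when redistributing the software.
+-   **Qwen 3 TTS** (Apache 2.0) - Commercial use allowed; retain the license text and any NOTICE file when redistributing.
+-   **Kokoro** (Apache 2.0) - Commercial use allowed; retain the license text and any NOTICE file when redistributing.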
+-   **Edge TTS** (Microsoft) - Commercial use allowed.
+-   **ElevenLabs** (Paid API) - Commercial use with valid subscription.
+
+### Provider Licenses (links)
+-   **Edge TTS (Microsoft)**: [Microsoft Terms of Use](https://www.microsoft.com/en-us/legal/terms-of-use)
+-   **Kokoro TTS**: [Apache License 2.0](https://github.com/hexgrad/kokoro/blob/main/LICENSE)
+-   **ElevenLabs TTS**: [ElevenLabs Terms of Service](https://elevenlabs.io/terms)
+-   **Bark TTS**: [MIT License](https://github.com/suno-ai/bark/blob/main/LICENSE)
+-   **Chatterbox TTS**: [MIT License](https://github.com/rsxdalv/chatterbox/blob/main/LICENSE)
+-   **Qwen 3 TTS**: [Apache License 2.0](https://github.com/QwenLM/Qwen3-TTS/blob/main/LICENSE)
@@ -0,0 +1,51 @@
+# TTS Provider Details
+
+VoiceGenHub supports multiple free and commercial TTS providers.
+
+## [Chatterbox TTS](https://github.com/rsxdalv/chatterbox) (MIT)
+Multilingual TTS with emotion control and voice cloning.
+
+### Features
+- **Model selection via voice**: Choose between standard, turbo, or multilingual models.
+- Emotion/intensity control with the `exaggeration` parameter (0.0-1.0).
+- Zero-shot voice cloning from audio samples.
+- Built-in Perth watermarking for responsible AI.
+
+### Supported Languages
+ar, da, de, el, en, es, fi, fr, he, hi, it, ja, ko, ms, nl, no, pl, pt, ru, sv, sw, tr, zh
+
+---
+
+## [Qwen 3 TTS](https://github.com/QwenLM/Qwen3-TTS) (Apache 2.0)
+State-of-the-art multilingual TTS with voice design and cloning.
+
+### Features
+- **Three generation modes**: CustomVoice, VoiceDesign, VoiceClone.
+- **10 languages**: Chinese, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish.
+- **Native speakers**: Automatic selection of native speakers per language.
+- **Ultra-low latency**: Streaming generation supported.
+
+---
+
+## [Bark TTS](https://github.com/suno-ai/bark) (MIT)
+Self-hosted high-naturalness TTS with prosody control.
+
+### Features
+- Prosody markers: `[laughs]`, `[sighs]`, `[pause]`, `[whisper]`.
+- 100+ speaker presets.
+- Sound effects generation.
+
+---
+
+## [Kokoro TTS](https://github.com/hexgrad/kokoro) (Apache 2.0)
+Self-hosted, extremely lightweight and fast.
+
+---
+
+## [Microsoft Edge TTS](https://github.com/rany2/edge-tts) (Free Cloud)
+Fast, high-quality cloud-based voices.
+
+---
+
+## [ElevenLabs TTS](https://elevenlabs.io) (Commercial)
+Premium high-quality voices (requires API key).
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "voicegenhub"
-version = "1.1.5"
+version = "2.0.0"
 description = "Simple Text-to-Speech library supporting multiple providers"
 authors = ["leweex95 <csibi.levente14@gmail.com>"]
 readme = "README.md"