🐱 Kitten TTS Studio

أداة توليد صوت بشري محلية 100% مع استنساخ الصوت Full-stack offline TTS app — Android · Web · Windows · Linux

📁 Project Structure

KittenTTS-Studio/
│
├── core/                        ← Shared engine (all platforms)
│   ├── __init__.py
│   ├── tts_engine.py            ← KittenTTS ONNX wrapper + voice cloning
│   ├── voice_cloner.py          ← Resemblyzer embedding extraction
│   ├── audio_utils.py           ← Format conversion, waveform helpers
│   └── download_models.py       ← HuggingFace model downloader
│
├── flet_app/                    ← Flet cross-platform app
│   ├── main.py                  ← Full chat UI (Android / Web / Desktop)
│   ├── requirements.txt
│   └── pubspec.yaml             ← Flutter build config (Android APK)
│
├── pyqt5_app/                   ← PyQt5 desktop app (Windows / Linux)
│   ├── main_qt.py               ← Full desktop UI with audio player
│   ├── requirements_qt.txt
│   └── build_scripts/
│       ├── build_exe.py         ← PyInstaller → .exe / Linux binary
│       ├── build_deb.sh         ← fpm → .deb package
│       └── build_appimage.sh    ← appimagetool → .AppImage
│
├── models/                      ← Downloaded ONNX models (auto-created)
│   ├── nano_int8/               ← ~25 MB fastest
│   ├── nano_fp32/               ← ~56 MB balanced
│   ├── micro/                   ← ~41 MB good quality
│   └── mini/                    ← ~80 MB best quality
│
├── custom_voices/               ← Your cloned voices (.npz files)
├── assets/                      ← Icons, fonts
├── requirements.txt             ← Unified deps (all platforms)
└── setup.py

⚡ Quick Start

1. Install dependencies

# Clone this repo or extract the zip
cd KittenTTS-Studio

# Install the core package (editable)
pip install -e ".[all]" --break-system-packages

# Or install only what you need:
pip install -e ".[flet]"   # Flet app only
pip install -e ".[qt]"     # PyQt5 app only

2. Download models

# Download all 4 models (one-time, ~200 MB total)
python core/download_models.py

# Or download a specific model only:
python core/download_models.py --model nano_int8   # fastest (~25 MB)
python core/download_models.py --model mini        # best quality (~80 MB)

# List available models and their status:
python core/download_models.py --list

3. Run the Flet app (cross-platform)

cd flet_app
python main.py

Opens as a desktop window. Use --web for browser mode:

flet run main.py --web --port 8080

4. Run the PyQt5 desktop app (Windows / Linux)

cd pyqt5_app
python main_qt.py

Test

📱 Build Android APK

Prerequisites

# Install Flutter SDK: https://flutter.dev/docs/get-started/install
# Install Flet CLI
pip install flet --break-system-packages

# Verify
flet --version
flutter doctor

Build APK

cd flet_app

# One-time: download models BEFORE building (they'll be bundled inside APK)
python ../core/download_models.py --model nano_int8   # recommended for mobile

# Copy models into assets so Flet bundles them
mkdir -p assets/models
cp -r ../models/nano_int8 assets/models/

# Build APK
flet build apk --project kitten_tts

# Output: build/apk/kitten_tts.apk

On-device model path

In the Flet app, models are loaded from:

Bundled (in APK): assets/models/ — accessible via flet.app.assets_dir
Downloaded at first run: stored in app's internal storage

Adjust MODELS_DIR in core/tts_engine.py for runtime path detection:

import flet as ft

def get_models_dir() -> Path:
    """Resolve correct models path on any platform."""
    if hasattr(ft, 'app') and ft.app.assets_dir:
        return Path(ft.app.assets_dir) / "models"
    return Path(__file__).parent.parent / "models"

🪟 Build Windows .exe

# Install PyInstaller
pip install pyinstaller --break-system-packages

# Build (from project root)
python pyqt5_app/build_scripts/build_exe.py

# For folder bundle (easier to debug):
python pyqt5_app/build_scripts/build_exe.py --onedir

# Output: dist/KittenTTS.exe  (or dist/KittenTTS/ folder)

🐧 Build Linux .deb (Debian / Ubuntu)

# Step 1: Build the PyInstaller bundle first
python pyqt5_app/build_scripts/build_exe.py --onedir

# Step 2: Install fpm (requires Ruby)
gem install fpm

# Step 3: Build .deb
bash pyqt5_app/build_scripts/build_deb.sh

# Output: dist/kittentts_1.0.0_amd64.deb
# Install:
sudo dpkg -i dist/kittentts_1.0.0_amd64.deb

🐧 Build Linux .AppImage (portable, no install needed)

# Step 1: Build PyInstaller bundle
python pyqt5_app/build_scripts/build_exe.py --onedir

# Step 2: Download appimagetool
wget -O ~/bin/appimagetool \
  "https://github.com/AppImage/AppImageKit/releases/download/continuous/appimagetool-x86_64.AppImage"
chmod +x ~/bin/appimagetool

# Step 3: Build AppImage
bash pyqt5_app/build_scripts/build_appimage.sh

# Output: dist/KittenTTS-1.0.0-x86_64.AppImage
# Run:
chmod +x dist/KittenTTS-1.0.0-x86_64.AppImage
./dist/KittenTTS-1.0.0-x86_64.AppImage

🎙 Voice Cloning — Step by Step

Method A — From the UI

Open the app → tap ⚙ Settings → Clone New Voice
Pick a WAV file (minimum 5 seconds of clear speech)
Enter a name (e.g. "صوتي العربي")
Click Extract & Save — the embedding is computed locally
The new voice appears immediately in the Voice list

Method B — Python script

from core.voice_cloner import extract_embedding, save_voice

# Extract 256-dim speaker embedding using Resemblyzer
emb = extract_embedding("my_audio.wav")

# Save to custom_voices/my_voice.npz
save_voice("my_voice", emb)

print(f"Embedding shape: {emb.shape}")  # (256,)

Method C — Generate TTS with a custom voice directly

from core.tts_engine import TTSEngine

engine = TTSEngine()
engine.load_model("nano_int8")

# Generate with a default voice
audio = engine.generate("Hello, world!", voice="Bella")

# Generate with a cloned voice (auto-loaded from custom_voices/)
audio = engine.generate("Hello, world!", voice="my_voice")

engine.save_wav(audio, "output.wav")

🧠 Architecture Overview

User text
    ↓
TTSEngine.generate()
    ├─ chunk_text()         — split long text into ≤400 char chunks
    ├─ phonemizer           — English → IPA phonemes (espeak-ng)
    ├─ TextCleaner          — phonemes → token IDs
    ├─ _get_style()         ─┬─ DEFAULT voices → voices.npz lookup
    │                        └─ CUSTOM voices  → _adapt_embedding()
    └─ ONNX session.run()   — CPU inference
         ↓
    numpy float32 audio @ 24 kHz
         ↓
    AudioUtils.save_audio() — WAV / MP3 / OGG with optional resampling

🎛 Available Models

Key	Name	Size	Quality	Speed
`nano_int8`	Nano INT8	~25 MB	★★☆☆☆	Fastest ⚡
`nano_fp32`	Nano FP32	~56 MB	★★★☆☆	Fast
`micro`	Micro	~41 MB	★★★★☆	Balanced
`mini`	Mini	~80 MB	★★★★★	Best 🏆

All models run on CPU only (onnxruntime — no GPU required).

🎤 Default Voices

Bella · Jasper · Luna · Bruno · Rosie · Hugo · Kiki · Leo

Plus any voices you clone and save in custom_voices/.

🔧 System Requirements

Platform	Min. RAM	Storage	OS
Android	2 GB	200 MB	Android 8+
Windows	4 GB	300 MB	Windows 10/11 x64
Linux	4 GB	300 MB	Ubuntu 20.04+ / Arch
Web	—	server-side	Any modern browser

Dependencies:

Python ≥ 3.10
espeak-ng (installed automatically via espeakng-loader)
ffmpeg (optional, required for MP3/OGG export via pydub)

❓ Troubleshooting

Problem	Solution
`espeak` not found	`pip install espeakng-loader` or `sudo apt install espeak-ng`
Model not found	Run `python core/download_models.py`
`resemblyzer` install fails	`pip install resemblyzer --no-build-isolation`
MP3 export fails	Install ffmpeg: `sudo apt install ffmpeg`
APK crashes on launch	Ensure `nano_int8` model is in `assets/models/nano_int8/`
PyQt5 display issues on Linux	`export QT_QPA_PLATFORM=xcb`

📄 License

MIT — KittenML/KittenTTS engine · Studio wrapper by Abdo

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🐱 Kitten TTS Studio

📁 Project Structure

⚡ Quick Start

1. Install dependencies

2. Download models

3. Run the Flet app (cross-platform)

4. Run the PyQt5 desktop app (Windows / Linux)

Test

📱 Build Android APK

Prerequisites

Build APK

On-device model path

🪟 Build Windows .exe

🐧 Build Linux .deb (Debian / Ubuntu)

🐧 Build Linux .AppImage (portable, no install needed)

🎙 Voice Cloning — Step by Step

Method A — From the UI

Method B — Python script

Method C — Generate TTS with a custom voice directly

🧠 Architecture Overview

🎛 Available Models

🎤 Default Voices

🔧 System Requirements

❓ Troubleshooting

📄 License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
assets		assets
core		core
flet_app		flet_app
pyqt5_app		pyqt5_app
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py

Folders and files

Latest commit

History

Repository files navigation

🐱 Kitten TTS Studio

📁 Project Structure

⚡ Quick Start

1. Install dependencies

2. Download models

3. Run the Flet app (cross-platform)

4. Run the PyQt5 desktop app (Windows / Linux)

Test

📱 Build Android APK

Prerequisites

Build APK

On-device model path

🪟 Build Windows .exe

🐧 Build Linux .deb (Debian / Ubuntu)

🐧 Build Linux .AppImage (portable, no install needed)

🎙 Voice Cloning — Step by Step

Method A — From the UI

Method B — Python script

Method C — Generate TTS with a custom voice directly

🧠 Architecture Overview

🎛 Available Models

🎤 Default Voices

🔧 System Requirements

❓ Troubleshooting

📄 License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages