LightTTS

⚡ Lightning-Fast Text-to-Speech Inference & Service Framework

LightTTS is a lightweight and high-performance text-to-speech (TTS) inference and service framework based on Python. It supports CosyVoice2 and CosyVoice3 models, built upon the CosyVoice architecture and LightLLM framework, with optimizations to support fast, scalable, and service-ready TTS deployment.

✨ Features

🚀 Optimized LLM Inference: The language model part of the TTS pipeline is accelerated using techniques from LightLLM and supports high-throughput batch inference.
🧩 Shared Memory Timbre Manager with LRU: Manages speaker/timbre embeddings in shared memory for fast access and minimal recomputation.
🧱 Modular Architecture (Encode–LLM–Decode): Refactored from LightLLM into three decoupled modules (Encoder, LLM, and Decoder), each running as a separate process for efficient task parallelism and scalability.
🌐 Service Ready and Easy Integration: Comes with an HTTP API for fast deployment and simple APIs for integration into other Python or web projects.
🔄 Bi-streaming Mode via WebSocket: Supports interactive bi-directional streaming over WebSocket for low-latency, real-time TTS communication.
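
To illustrate the LRU idea behind the timbre manager listed above, here is a minimal sketch. It is not LightTTS's implementation: it uses a plain in-process OrderedDict rather than shared memory, and the names LRUEmbeddingCache and fake_embedding are hypothetical stand-ins chosen for this example.

```python
from collections import OrderedDict
from typing import Callable, Hashable, List


class LRUEmbeddingCache:
    """Illustrative LRU cache for speaker embeddings (hypothetical, in-process only)."""

    def __init__(self, capacity: int, compute: Callable[[Hashable], List[float]]):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.compute = compute  # expensive function that builds an embedding
        self._store: "OrderedDict[Hashable, List[float]]" = OrderedDict()
        self.misses = 0

    def get(self, speaker_id: Hashable) -> List[float]:
        if speaker_id in self._store:
            # Cache hit: mark the entry as most recently used.
            self._store.move_to_end(speaker_id)
            return self._store[speaker_id]
        # Cache miss: compute, insert, and evict the least recently used entry if full.
        self.misses += 1
        embedding = self.compute(speaker_id)
        self._store[speaker_id] = embedding
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)
        return embedding


def fake_embedding(speaker_id: Hashable) -> List[float]:
    # Stand-in for the real (expensive) speaker embedding extraction.
    return [float(len(str(speaker_id)))] * 4


cache = LRUEmbeddingCache(capacity=2, compute=fake_embedding)
for speaker in ["alice", "bob", "alice", "carol", "bob"]:
    cache.get(speaker)
print(cache.misses)  # 4: "bob" was evicted when "carol" arrived, so it is recomputed
```

The second lookup of "alice" is a hit, which is the recomputation the cache is meant to avoid; a capacity larger than the number of active speakers would avoid the eviction of "bob" as well.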

⚡️ Get Started

Installation

(Option 1, Recommended) Run with Docker

# The easiest way to install LightTTS is to pull the official image
docker pull lighttts/light-tts:latest

# Or you can manually build the image
docker build -t light-tts:latest .

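# Run the image (replace your_local_path with a host directory; if you pulled the official image, use lighttts/light-tts:latest as the image name)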
docker run -it --gpus all -p 8080:8080 --shm-size 4g -v your_local_path:/data/ light-tts:latest /bin/bash

(Option 2) Install from Source

# Clone the repo
git clone --recursive https://github.com/ModelTC/LightTTS.git
cd LightTTS
# If cloning the submodules fails because of network issues, rerun the following commands until they succeed
# cd LightTTS
# git submodule update --init --recursive

# (Recommended) Create a new conda environment
conda create -n light-tts python=3.10
conda activate light-tts

# Install dependencies (We use the latest torch==2.9.1, but other versions are also compatible)
pip install -r requirements.txt

# If you encounter sox compatibility issues, install the system packages
# Ubuntu
sudo apt-get install sox libsox-dev
# CentOS
sudo yum install sox sox-devel

Model Download

We now support CosyVoice2 and CosyVoice3 models.

# Download models with the ModelScope SDK
from modelscope import snapshot_download
snapshot_download('FunAudioLLM/Fun-CosyVoice3-0.5B-2512', local_dir='pretrained_models/Fun-CosyVoice3-0.5B')
snapshot_download('iic/CosyVoice2-0.5B', local_dir='pretrained_models/CosyVoice2-0.5B')
snapshot_download('iic/CosyVoice-ttsfrd', local_dir='pretrained_models/CosyVoice-ttsfrd')

# For users outside mainland China: download models with the HuggingFace SDK
from huggingface_hub import snapshot_download
snapshot_download('FunAudioLLM/Fun-CosyVoice3-0.5B-2512', local_dir='pretrained_models/Fun-CosyVoice3-0.5B')
snapshot_download('FunAudioLLM/CosyVoice2-0.5B', local_dir='pretrained_models/CosyVoice2-0.5B')
snapshot_download('FunAudioLLM/CosyVoice-ttsfrd', local_dir='pretrained_models/CosyVoice-ttsfrd')

For better text normalization performance, you can optionally install the ttsfrd package and unzip its resources. This step is not required: if skipped, the system falls back to WeTextProcessing by default. The Docker image already includes the ttsfrd package, so you can skip this installation if you are using Docker.

cd pretrained_models/CosyVoice-ttsfrd/
unzip resource.zip -d .
pip install ttsfrd_dependency-0.1-py3-none-any.whl
pip install ttsfrd-0.4.2-cp310-cp310-linux_x86_64.whl

Start the Model Service

Note: It is recommended to enable the load_trt parameter for acceleration. The default flow precision is fp16 for CosyVoice2 and fp32 for CosyVoice3.

For CosyVoice2:

python -m light_tts.server.api_server --model_dir ./pretrained_models/CosyVoice2-0.5B

For CosyVoice3:

python -m light_tts.server.api_server --model_dir ./pretrained_models/Fun-CosyVoice3-0.5B

With custom data type (float32, bfloat16, or float16; default: float16):

# Use float32 for better accuracy or float16 for faster speed
python -m light_tts.server.api_server --model_dir ./pretrained_models/Fun-CosyVoice3-0.5B --data_type float32

Available Parameters:

The default values are usually the fastest and generally do not need to be adjusted. If you need to customize them, please refer to the following parameter descriptions:

load_trt: Whether to load the flow_decoder in TensorRT mode (default: True).
data_type: The data type for LLM inference (default: float16).
load_jit: Whether to load the flow_encoder in JIT mode (default: False).
max_total_token_num: LLM argument. Total number of tokens the GPU and model can support, i.e. max_batch * (input_len + output_len) (default: 64 * 1024).
max_req_total_len: LLM argument. Maximum value of req_input_len + req_output_len for a single request (default: 32768, matching max_position_embeddings).
graph_max_len_in_batch: Maximum sequence length for CUDA graph capture in the decoding stage (default: 32768).
graph_max_batch_size: Maximum batch size for CUDA graph capture in the decoding stage (default: 16).

For more parameters, see light_tts/server/api_cli.py
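
As a worked example of the max_total_token_num formula above, the following sketch rearranges it to estimate how many requests of a given size fit in the token budget. The function name max_concurrent_requests and the request lengths are hypothetical, and the real scheduler may apply additional limits, so treat the numbers as a rough guide only.

```python
def max_concurrent_requests(max_total_token_num: int, input_len: int, output_len: int) -> int:
    """Rearranges max_total_token_num = max_batch * (input_len + output_len) for max_batch."""
    per_request = input_len + output_len
    if per_request <= 0:
        raise ValueError("input_len + output_len must be positive")
    return max_total_token_num // per_request


DEFAULT_MAX_TOTAL_TOKEN_NUM = 64 * 1024  # default from the parameter list
DEFAULT_MAX_REQ_TOTAL_LEN = 32768        # default from the parameter list

# Worst case: every request uses the full max_req_total_len.
print(max_concurrent_requests(DEFAULT_MAX_TOTAL_TOKEN_NUM, DEFAULT_MAX_REQ_TOTAL_LEN, 0))  # 2

# Hypothetical shorter requests: 200 input tokens and 824 output tokens each.
print(max_concurrent_requests(DEFAULT_MAX_TOTAL_TOKEN_NUM, 200, 824))  # 64
```

With the defaults, two maximum-length requests already exhaust the budget, so raising max_total_token_num mainly matters when many long requests run at once.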

Wait for the service to initialize. The default address is http://localhost:8080.
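
If you start the service from a script, a small helper like the one below can block until the port accepts connections. This is a sketch under assumptions: wait_for_port is a hypothetical helper, not part of LightTTS, and it only checks that a TCP connection to the default address succeeds. An open port does not guarantee that the models have finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 120.0, interval_s: float = 1.0) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            time.sleep(interval_s)
    return False


# Example (default address from this README):
# if wait_for_port("localhost", 8080):
#     print("port 8080 is accepting connections")
```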

Request Examples

Once the service is running, you can interact with it through the HTTP API. We support three modes: non-streaming, streaming, and bi-streaming.

Non-streaming and Streaming: See test/test_zero_shot.py for examples; it prints metrics such as RTF (Real-Time Factor) and TTFT (Time To First Token).
Bi-streaming: Uses the WebSocket interface. See test/test_bistream.py for usage examples.

📊 Performance Benchmarks

We benchmarked LightTTS on different GPUs to show its throughput and latency characteristics in both non-streaming and streaming modes.

Model: Fun-CosyVoice3-0.5B-2512, data type: float16

NVIDIA GeForce RTX 4090D

non-stream: test/test_zs_speed.py

num_workers	cost time 50%	cost time 90%	cost time 99%	rtf 50%	rtf 90%	rtf 99%	avg rtf	total_cost_time	qps
1	0.61	1.09	1.51	0.13	0.16	0.22	0.13	33.95	1.47
2	0.8	1.24	1.71	0.15	0.22	0.25	0.16	21.46	2.33
4	1.02	1.88	2.27	0.22	0.29	0.38	0.23	15.31	3.27
8	1.76	2.36	3.48	0.33	0.49	0.62	0.36	12.18	4.1

stream: test/test_zs_stream.py

num_workers	cost time 50%	cost time 90%	cost time 99%	ttft 50%	ttft 90%	ttft 99%	rtf 50%	rtf 90%	rtf 99%	avg rtf	total_cost_time	qps
1	1.01	2.15	2.82	0.33	0.34	0.9	0.21	0.25	0.34	0.22	60.13	0.83
2	1.83	3.56	5.16	0.93	1.53	2.3	0.34	0.63	0.81	0.4	52.47	0.95
4	3.43	5.76	7.31	2.62	4.37	5.8	0.7	1.28	2.16	0.81	48.74	1.03
8	7.27	10.01	10.45	6.4	8.55	9.03	1.28	2.67	3.66	1.57	47.37	1.06

NVIDIA GeForce RTX 5090

non-stream

num_workers	cost time 50%	cost time 90%	cost time 99%	rtf 50%	rtf 90%	rtf 99%	avg rtf	total_cost_time	qps
1	0.51	0.81	1.61	0.11	0.13	0.23	0.11	28.9	1.73
2	0.64	1.1	1.48	0.13	0.16	0.26	0.13	17.54	2.85
4	0.87	1.28	1.68	0.17	0.23	0.36	0.18	11.45	4.37
8	1.32	1.86	2.14	0.25	0.4	0.6	0.29	8.97	5.57

stream

num_workers	cost time 50%	cost time 90%	cost time 99%	ttft 50%	ttft 90%	ttft 99%	rtf 50%	rtf 90%	rtf 99%	avg rtf	total_cost_time	qps
1	0.76	1.41	2.27	0.28	0.3	0.31	0.16	0.18	0.22	0.16	44.06	1.13
2	1.45	2.34	3.46	0.74	1.28	1.75	0.27	0.45	0.7	0.3	38.82	1.29
4	2.9	4.04	4.7	2.16	3.03	3.4	0.5	1.04	1.51	0.61	37.75	1.32
8	5.78	7.74	8.49	5.01	6.73	7.35	1.03	2.09	2.85	1.22	37.67	1.33

Metrics Explanation:

num_workers: Number of concurrent workers
cost time: Total request processing time in seconds (50th/90th/99th percentile)
ttft: Time to First Token in seconds (50th/90th/99th percentile)
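rtf: Real-Time Factor, i.e. processing time divided by the duration of the generated audio; lower is better, and values below 1 mean faster than real time (50th/90th/99th percentile)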
avg rtf: Average Real-Time Factor
total_cost_time: Total benchmark duration in seconds
qps: Queries Per Second
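
The metrics above can be computed along the following lines. This is a sketch under assumptions, not the code in the test scripts: the nearest-rank percentile method, the helper names, and the sample measurements are all hypothetical. The request count of 50 is inferred from the tables (qps multiplied by total_cost_time comes to roughly 50 in every row) and is not stated in the source.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (the benchmark scripts may use a different method)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF = processing time / duration of generated audio."""
    return processing_s / audio_s


def queries_per_second(num_requests: int, total_cost_time_s: float) -> float:
    """QPS = completed requests / total benchmark duration."""
    return num_requests / total_cost_time_s


# Hypothetical measurements: (processing seconds, generated audio seconds).
samples = [(0.5, 4.0), (0.6, 5.0), (1.2, 6.0), (0.9, 6.0)]
rtfs = [real_time_factor(p, a) for p, a in samples]

print(percentile(rtfs, 50))  # 0.125
print(round(queries_per_second(50, 33.95), 2))  # 1.47, matching the first RTX 4090D non-stream row
```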

License

This repository is released under the Apache-2.0 license.

Third-Party Code Attribution

This project includes code from CosyVoice (Copyright Alibaba, Inc. and its affiliates), which is also licensed under Apache-2.0. The CosyVoice code is located in the cosyvoice/ directory and has been integrated and modified as part of LightTTS. See the NOTICE file for complete attribution details.
