LatentSync

📖 Introduction

We present LatentSync, an end-to-end lip-sync method based on audio-conditioned latent diffusion models without any intermediate motion representation, diverging from previous diffusion-based lip-sync methods based on pixel-space diffusion or two-stage generation. Our framework can leverage the powerful capabilities of Stable Diffusion to directly model complex audio-visual correlations.

🔧 Setting up the Environment

Install the required packages and download the checkpoints via:

pip install -r requirements.txt
huggingface-cli download ByteDance/LatentSync-1.6 latentsync_unet.pt --local-dir checkpoints
huggingface-cli download ByteDance/LatentSync-1.6 tiny.pt --local-dir checkpoints/whisper
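
Before starting the server, it can help to confirm that both downloads landed where the commands above put them. The following is a minimal sketch, not part of the repository: it assumes the two files end up at checkpoints/latentsync_unet.pt and checkpoints/whisper/tiny.pt, as the --local-dir arguments imply, and the helper name missing_checkpoints is hypothetical.

```python
from pathlib import Path

# Paths relative to the checkpoint root, inferred from the download commands
# (assumption: huggingface-cli places each file directly under its --local-dir).
EXPECTED_CHECKPOINTS = ("latentsync_unet.pt", "whisper/tiny.pt")


def missing_checkpoints(root="checkpoints"):
    """Return the expected checkpoint files that are not present under root."""
    base = Path(root)
    return [rel for rel in EXPECTED_CHECKPOINTS if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print(f"missing checkpoints: {missing}")
    else:
        print("all checkpoints present")
```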

🚀 Inference API

To start the inference server, run the following command:

python app.py

The server starts on http://0.0.0.0:5000, that is, it listens on port 5000 on all network interfaces. To perform lip-syncing, send a POST request to the /generate endpoint with a video file and an audio file.

Request

Method: POST
Endpoint: /generate
Form Data:
- video: The video file to process.
- audio: The audio file to use for lip-syncing.
- guidance_scale (optional): The guidance scale for the diffusion model. Default is 2.0.
- inference_steps (optional): The number of inference steps. Default is 20.
- seed (optional): The random seed for the generation. Default is a random integer.

Example

curl -X POST -F "video=@/path/to/video.mp4" -F "audio=@/path/to/audio.wav" http://0.0.0.0:5000/generate > output.mp4
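
The same request can be sent from Python. This is a sketch under stated assumptions rather than an official client: it uses the third-party requests library, assumes the server is reachable at localhost on port 5000, and assumes the response body is the generated MP4, as the curl redirect to output.mp4 suggests. The function names build_form_fields and lipsync are hypothetical.

```python
from pathlib import Path

# Assumption: the server started by `python app.py` is reachable on this machine.
API_URL = "http://localhost:5000/generate"


def build_form_fields(guidance_scale=None, inference_steps=None, seed=None):
    """Collect only the optional fields the caller set.

    Fields left as None are omitted so the server applies its own defaults
    (2.0 for guidance_scale, 20 for inference_steps, a random seed).
    """
    candidates = {
        "guidance_scale": guidance_scale,
        "inference_steps": inference_steps,
        "seed": seed,
    }
    return {name: str(value) for name, value in candidates.items() if value is not None}


def lipsync(video_path, audio_path, output_path, **options):
    """POST the video and audio to /generate and save the response body to output_path."""
    import requests  # third-party; imported here so build_form_fields works without it

    fields = build_form_fields(**options)
    with open(video_path, "rb") as video, open(audio_path, "rb") as audio:
        response = requests.post(
            API_URL,
            files={"video": video, "audio": audio},
            data=fields,
            timeout=600,  # generous timeout chosen for illustration; diffusion inference can be slow
        )
    response.raise_for_status()

    output = Path(output_path)
    output.write_bytes(response.content)  # assumes the body is the generated video
    return output
```

For example, lipsync("/path/to/video.mp4", "/path/to/audio.wav", "output.mp4", seed=42) would send the same form fields as the curl command plus a fixed seed.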

🙏 Acknowledgement

Our code is built on AnimateDiff.
Some code is borrowed from MuseTalk, StyleSync, SyncNet, and Wav2Lip.

Thanks for their generous contributions to the open-source community!

📖 Citation

If you find our repo useful for your research, please consider citing our paper:

@article{li2024latentsync,
  title={LatentSync: Taming Audio-Conditioned Latent Diffusion Models for Lip Sync with SyncNet Supervision},
  author={Li, Chunyu and Zhang, Chao and Xu, Weikai and Lin, Jingyu and Xie, Jinghui and Feng, Weiguo and Peng, Bingyue and Chen, Cunjian and Xing, Weiwei},
  journal={arXiv preprint arXiv:2412.09262},
  year={2024}
}
