An AI-powered pipeline that transforms text into realistic lip-synced talking face videos using ElevenLabs Text-to-Speech and Wav2Lip.
Perfect for AI demos, virtual presenters, educational content, and speech-driven facial animation.
Transform a simple text script and a face image into a fully lip-synced video:
📝 Text Script → 🎙️ AI Speech → 👄 Lip Sync → 🎬 Final Video
Input: Text + Face Image
Output: Realistic talking face video with synchronized lips
- 🗣️ Natural Speech Synthesis - Powered by ElevenLabs TTS API
- 👄 Accurate Lip Synchronization - Using state-of-the-art Wav2Lip
- 🤖 Smart Pipeline - Auto-detects audio or script inputs
- ⚡ GPU Acceleration - CUDA support for faster processing
- 📂 Organized Workflow - Clean input/output structure
- 🚀 One-Click Execution - Run `main.py` and you're done
[Input: Script or Audio] → 🤖 main.py (Auto-Pipeline) → [Output: Lip-Synced Video]
lip-sync-video-generator/
│
├── input/ # Your input files
│ ├── script.txt # Text to convert into speech
│ └── face.jpg # Face image (front-facing)
│
├── output/ # Generated results
│ ├── audio.wav # Generated speech audio
│ └── output_video.mp4 # Final lip-synced video
│
├── Wav2Lip/ # Wav2Lip model and scripts
│ ├── checkpoints/
│ │ └── wav2lip.pth # Wav2Lip model file
│ └── face_detection/detection/sfd/
│ └── s3fd.pth # Face detection model
│
├── main.py # 🚀 Unified Pipeline (Run this!)
├── Elevenlab.py # Text-to-speech generator
├── requirements.txt # Python dependencies
├── .env # API key (you create this)
└── README.md
| Requirement | Purpose |
|---|---|
| Python 3.9+ | Core runtime |
| FFmpeg | Video processing |
| ElevenLabs API Key | Speech generation |
| NVIDIA GPU + CUDA (Optional) | Faster processing |
Install FFmpeg (Windows, via winget):

    winget install Gyan.FFmpeg

Restart your terminal after installation, then verify it:

    ffmpeg -version

Clone the repository:

    git clone https://github.com/Cl0ud-9/lip-sync-video-generator.git

Then open it in your code editor.

Create and activate a virtual environment:

    python -m venv .venv
    .\.venv\Scripts\activate

Install PyTorch, CPU build:

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

Or the GPU build (CUDA 12.1):

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Install the remaining dependencies:

    pip install -r requirements.txt

Get your API key from: https://elevenlabs.io/developers
Create a `.env` file in the root directory (or rename `.env.example` to `.env`) and add your API key:
ELEVENLABS_API_KEY=your_api_key_here
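
As a rough illustration of how a script could pick up this key, the sketch below reads a `.env` file using only the standard library. The helper names `load_env_file` and `get_api_key` are hypothetical and not taken from the project; `Elevenlab.py` may well rely on a package such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict.

    Hypothetical helper for illustration; not taken from the project code.
    """
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path=".env"):
    """Return the ElevenLabs key from the environment or the .env file."""
    key = os.environ.get("ELEVENLABS_API_KEY") or load_env_file(path).get(
        "ELEVENLABS_API_KEY"
    )
    # Treat the placeholder from the example above as "not configured".
    if not key or key == "your_api_key_here":
        raise RuntimeError("ELEVENLABS_API_KEY is missing; check your .env file")
    return key
```

Failing early with a clear message when the placeholder value is still in place tends to be easier to debug than an authentication error returned later by the API.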
Download:
https://drive.google.com/uc?id=1fQtBSYEyuai9MjBOF8j7zZ4oQ9W2N64q
Place here:
Wav2Lip/checkpoints/wav2lip.pth
Download:
https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth
Rename to:
s3fd.pth
Place here:
Wav2Lip/face_detection/detection/sfd/s3fd.pth
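
A quick way to confirm both model files ended up in the right place is a small check like the one below. It is only a sketch: the function name `missing_models` is hypothetical, and the two paths are the ones listed in this README.

```python
from pathlib import Path

# Paths taken from the project layout described in this README.
REQUIRED_MODELS = (
    "Wav2Lip/checkpoints/wav2lip.pth",
    "Wav2Lip/face_detection/detection/sfd/s3fd.pth",
)


def missing_models(project_root="."):
    """Return the required model files that are absent or empty."""
    root = Path(project_root)
    missing = []
    for relative in REQUIRED_MODELS:
        candidate = root / relative
        # An empty file usually means an interrupted download.
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(relative)
    return missing


if __name__ == "__main__":
    for path in missing_models():
        print(f"Missing model file: {path}")
```

Run it from the repository root; no output means both files were found.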
- Face: Put your image or video in `input/` (e.g., `input/face.jpg` or `input/face.mp4`).
- Audio:
  - Option A (Text Script): Put your script in `input/script.txt`.
  - Option B (Audio File): Put your audio in `input/audio.wav`.
Run the pipeline:

    python main.py

That's it! The script will automatically detect your input and generate the video.
Output: output/output_video.mp4
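
The auto-detection step could look roughly like the sketch below. This is not the code in `main.py`: the function name `detect_inputs`, the list of accepted face extensions, and the rule that an existing `audio.wav` takes precedence over `script.txt` are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of accepted face file types; the README mentions .jpg and .mp4.
FACE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".mp4"}


def detect_inputs(input_dir="input"):
    """Return (face_path, source_kind, source_path) for the pipeline.

    source_kind is "audio" when audio.wav exists, otherwise "script".
    The precedence (audio first) is an assumption of this sketch.
    """
    folder = Path(input_dir)
    faces = sorted(
        p for p in folder.glob("face.*") if p.suffix.lower() in FACE_EXTENSIONS
    )
    if not faces:
        raise FileNotFoundError(f"No face image or video found in {folder}")

    audio = folder / "audio.wav"
    script = folder / "script.txt"
    if audio.is_file():
        return faces[0], "audio", audio
    if script.is_file():
        return faces[0], "script", script
    raise FileNotFoundError(f"Need audio.wav or script.txt in {folder}")
```

With this shape, the caller only runs the text-to-speech stage when `source_kind` is `"script"` and skips straight to lip sync otherwise.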
If you want more control (like specific resize factors or specific file paths), you can run the scripts individually.
Generate speech from the script:

    python Elevenlab.py --script input/script.txt --output output/audio.wav

Run lip sync (if the audio came from the command above, pass `--audio output/audio.wav` instead):

    python Wav2Lip/inference.py --checkpoint_path Wav2Lip/checkpoints/wav2lip.pth --face input/face.jpg --audio input/audio.wav --outfile output/output_video.mp4 --resize_factor 2 --nosmooth --wav2lip_batch_size 256

Check whether PyTorch can see your GPU:

    python -c "import torch; print('CUDA available:', torch.cuda.is_available()); print('Device:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU')"

| Issue | Solution |
|---|---|
| FFmpeg not found | Restart terminal |
| CUDA not detected | Install GPU PyTorch build |
| Blurry lips | Use a better face crop |
| Model not found | Check file paths |
| Slow processing | Use GPU |
| API key error | Verify .env file |
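
Several rows in the table above can be checked in one go. The sketch below is a hypothetical diagnostic (the name `check_environment` is not part of the project) that reports the Python version, whether `ffmpeg` is on the PATH, and whether PyTorch can see a CUDA device.

```python
import importlib.util
import shutil
import sys


def check_environment():
    """Collect basic facts about the local setup as a dict."""
    report = {
        "python_ok": sys.version_info >= (3, 9),
        "ffmpeg_path": shutil.which("ffmpeg"),  # None if not on PATH
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check works without PyTorch

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

If `ffmpeg_path` is `None` right after installing FFmpeg, restarting the terminal is usually enough for the updated PATH to take effect.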
- ElevenLabs generates speech
- S3FD detects face
- Wav2Lip generates lip motion
- FFmpeg renders final video
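
The last three steps all happen inside Wav2Lip's `inference.py`, so driving them from Python mostly comes down to building a command line. The sketch below shows one way to do that with the flags used in this README; `build_wav2lip_command` and `run_lip_sync` are hypothetical names, and `main.py` may be organised differently.

```python
import subprocess
import sys


def build_wav2lip_command(face, audio, outfile,
                          checkpoint="Wav2Lip/checkpoints/wav2lip.pth",
                          resize_factor=1, nosmooth=False):
    """Assemble the argument list for Wav2Lip/inference.py."""
    command = [
        sys.executable, "Wav2Lip/inference.py",
        "--checkpoint_path", checkpoint,
        "--face", str(face),
        "--audio", str(audio),
        "--outfile", str(outfile),
        "--resize_factor", str(resize_factor),
    ]
    if nosmooth:
        command.append("--nosmooth")
    return command


def run_lip_sync(face, audio, outfile, **options):
    """Run the lip-sync stage and raise if the subprocess fails."""
    subprocess.run(
        build_wav2lip_command(face, audio, outfile, **options), check=True
    )
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.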
Licensed under the MIT License — see LICENSE.
For educational and research use only.
Ensure consent before using any person's face or voice.
Made with ❤️ for the AI community