
Deepfake Sinhala Dubbing Pipeline

Introduction

This repository contains a multi-stage pipeline that takes an input video, transcribes the English audio, translates it to Sinhala, romanizes the Sinhala text, synthesizes Sinhala speech, converts the voice with an RVC model, and finally lip-syncs the output video with Wav2Lip. The main orchestration happens in main.py, which chains together the supporting scripts in this repo.

Pipeline Overview

  1. Extract audio from the source video (transcribe_video.py).
  2. Transcribe English speech with WhisperX and store segment metadata (transcribe_video.py); see the sketch after this list.
  3. Translate the transcript to Sinhala with NLLB (en_to_sin.py).
  4. Romanize Sinhala text (sin_to_roman.py).
  5. Generate Sinhala TTS audio segments (sinhala_tts.py).
  6. Convert the voice using an RVC model (convert_voice.py).
  7. Stitch audio segments and mix them into a final track (join_audio_segments.py).
  8. Replace the source video audio (final_video.py).
  9. Lip-sync the video with Wav2Lip (main.py).
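
To make steps 1 and 2 concrete, here is a minimal sketch using ffmpeg and the WhisperX Python API. It is not the actual code in transcribe_video.py; the model size, sample rate, and file paths are assumptions.

# Sketch of steps 1-2 (illustrative; not transcribe_video.py itself)
import subprocess
import whisperx

VIDEO = "input_video.mp4"   # assumed input path
WAV = "audios/input.wav"    # assumed output path

# Step 1: extract mono 16 kHz audio (ffmpeg must be on PATH)
subprocess.run(
    ["ffmpeg", "-y", "-i", VIDEO, "-vn", "-ac", "1", "-ar", "16000", WAV],
    check=True,
)

# Step 2: transcribe English speech; "small" is an assumed model size
model = whisperx.load_model("small", device="cpu", compute_type="int8")
audio = whisperx.load_audio(WAV)
result = model.transcribe(audio, language="en")
for seg in result["segments"]:  # each segment carries start/end times and text
    print(f"[{seg['start']:.2f}-{seg['end']:.2f}] {seg['text']}")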

Setup

This project relies on external projects and model assets that are intentionally excluded from version control. Set up the following prerequisites before running the pipeline.

System Requirements

  • Python 3.10+ (CPU-only works; GPU optional for speed)
  • ffmpeg available on your PATH

Python Dependencies

Create a virtual environment for the top-level pipeline and install the key libraries used by the scripts:

python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install whisperx transformers sentencepiece accelerate sacremoses aksharamukha TTS soundfile

Note: whisperx and transformers may download large models at runtime. Refer to their docs for GPU support or custom cache locations.
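
As a quick smoke test of the translation and romanization libraries, the following mirrors roughly what en_to_sin.py and sin_to_roman.py do. The NLLB checkpoint name and the "ISO" romanization scheme are assumptions; check those scripts for the actual choices.

# Sketch of steps 3-4: English -> Sinhala -> romanized Sinhala
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from aksharamukha import transliterate

MODEL = "facebook/nllb-200-distilled-600M"  # assumed NLLB checkpoint
tok = AutoTokenizer.from_pretrained(MODEL, src_lang="eng_Latn")
nllb = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

inputs = tok("How are you today?", return_tensors="pt")
out = nllb.generate(
    **inputs,
    forced_bos_token_id=tok.convert_tokens_to_ids("sin_Sinh"),  # force Sinhala output
    max_length=64,
)
sinhala = tok.decode(out[0], skip_special_tokens=True)

# Romanize with aksharamukha; "ISO" (ISO 15919) is an assumed target scheme
roman = transliterate.process("Sinhala", "ISO", sinhala)
print(sinhala, "->", roman)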

External Projects

The pipeline expects local clones/virtual environments for the following projects:

1) RVC (Retrieval-based Voice Conversion)

main.py calls convert_voice.py using a Python interpreter located at ./rvc/bin/python. Create a virtual environment named rvc and install the RVC dependencies there. You also need the model and index assets under ./assets/.

Expected layout:

./rvc/bin/python
./assets/weights/<your_model>.pth
./assets/indices/<your_index>.index
./assets/rmvpe/...
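
main.py shells out to this interpreter rather than importing RVC directly, roughly as sketched below. The flag names passed to convert_voice.py are hypothetical; consult that script for its actual CLI.

# Sketch of the RVC invocation from main.py (flag names are hypothetical)
import subprocess

subprocess.run(
    [
        "./rvc/bin/python", "convert_voice.py",
        "--input", "sinhala_audio_segments/seg_000.wav",  # hypothetical flag and path
        "--model", "assets/weights/your_model.pth",       # hypothetical flag
        "--index", "assets/indices/your_index.index",     # hypothetical flag
    ],
    check=True,
)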

2) Wav2Lip

main.py runs Wav2Lip’s inference.py using a Python interpreter at ./wav2lip/venv-wav2lip/bin/python. Clone Wav2Lip under ./wav2lip/Wav2Lip, set up its virtual environment, and download the checkpoint.

Expected layout:

./wav2lip/Wav2Lip/inference.py
./wav2lip/Wav2Lip/checkpoints/wav2lip.pth
./wav2lip/venv-wav2lip/bin/python
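
The lip-sync step is launched the same way through Wav2Lip's own interpreter. Wav2Lip's inference.py takes --checkpoint_path, --face, --audio, and --outfile; the media paths below are placeholders.

# Sketch of the Wav2Lip call made from main.py; media paths are placeholders
import subprocess

subprocess.run(
    [
        "./wav2lip/venv-wav2lip/bin/python", "./wav2lip/Wav2Lip/inference.py",
        "--checkpoint_path", "./wav2lip/Wav2Lip/checkpoints/wav2lip.pth",
        "--face", "sinhala_video/with_new_audio.mp4",    # placeholder path
        "--audio", "joined_sinhala_audio/final.wav",     # placeholder path
        "--outfile", "sinhala_video/final_lipsync.mp4",  # placeholder path
    ],
    check=True,
)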

3) Sinhala TTS Model

./tts_model is expected to contain the trained checkpoint and config file used by sinhala_tts.py.

Expected layout:

./tts_model/checkpoint_80000.pth
./tts_model/config.json
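
sinhala_tts.py loads this checkpoint through the Coqui TTS package installed earlier. A minimal sketch, assuming the standard Synthesizer API and romanized text as input:

# Sketch of step 5 using Coqui TTS's Synthesizer (assumed usage)
from TTS.utils.synthesizer import Synthesizer

synth = Synthesizer(
    tts_checkpoint="tts_model/checkpoint_80000.pth",
    tts_config_path="tts_model/config.json",
)
wav = synth.tts("romanized sinhala text here")  # placeholder input
synth.save_wav(wav, "sinhala_audio_segments/seg_000.wav")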

Optional: Lip Enhancer

There is an optional post-processing script in lip-enhancer/ that can be run on Wav2Lip output videos if you want additional smoothing. This is not called from main.py by default.

Running the Full Pipeline

Once the dependencies and external projects are installed, run the pipeline from the repository root:

python main.py /path/to/input_video.mp4

The pipeline will generate intermediate artifacts in folders like audios/, segment_metadata/, sinhala_audio_segments/, joined_sinhala_audio/, and sinhala_video/.

Troubleshooting

  • Ensure ffmpeg is installed and available on your PATH.
  • If you see missing model errors, verify that the files in assets/, tts_model/, and wav2lip/Wav2Lip/checkpoints/ exist and match the expected filenames in the scripts.
  • The RVC and Wav2Lip environments are independent of the main pipeline environment; verify that each venv has the proper dependencies.
