IP_LAP_256 is our LangXin_V2 commercial code, used mainly during 2024. Because we now have better models, we have open-sourced it to support everyone's learning and research.
This is a project about talking faces. It is a commercial digital human project that faithfully reproduces human faces, mouths, and teeth. We train on 256x256 facial images. Because the face crop starts at the forehead, a 256x256 face is equivalent in size to the face used by wav2lip384, so the model can generate 720p, 1080p, 2K, and 4K digital human videos.

The model uses an attention mechanism that references the mouth shape of the face in the preceding and following frames to generate new mouth shapes. This lets it restore the speaker's own teeth and mouth shape. IP_LAP uses a 128x128 network structure; IP_LAP_256 uses a 256x256 network structure.

We have done the following work:
- Added video cutting code.
- Optimized the network structure and improved the clarity of face segmentation.
- Trained on a dataset of 1,000 people, 50 hours, and over 50,000 samples. The landmarks model's eval_L1_loss needs to drop to around 0.004, which takes 12-24 hours of training on a 3080. The renderer model's FID needs to drop to around 15, which takes 24-48 hours of training on a 4090.
- We have not released the best landmarks checkpoint, so you need to load the pre-trained weights as a starting point for further training. We have released the best renderer checkpoint, which you can use directly.
- You can also fine-tune the landmarks model on a 1-minute video to achieve better commercial results.
- For better inference results, shoot your footage following my demo video.
- Requirements: Python==3.7.11, torch==1.13.1, CUDA==11.3. You can also choose other versions, as long as they are compatible with each other and run.
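Before training or inference, it can help to confirm which versions are actually installed. The following is a minimal sketch and not part of this repository: the function names `describe_environment` and `report_mismatches` and the `EXPECTED` table are hypothetical, and the sketch only assumes the standard `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()` attributes.

```python
import importlib
import sys

# Versions listed in the requirements above (assumed baseline for this sketch).
# Other mutually compatible combinations may work just as well.
EXPECTED = {"python": (3, 7), "torch": "1.13.1", "cuda": "11.3"}


def describe_environment():
    """Collect the interpreter, torch, and CUDA versions that are installed."""
    info = {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "torch": None,
        "cuda": None,
        "gpu_available": False,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return info  # torch is not installed; torch fields stay empty
    info["torch"] = str(torch.__version__)
    info["cuda"] = torch.version.cuda  # None on CPU-only builds
    info["gpu_available"] = torch.cuda.is_available()
    return info


def report_mismatches(info, expected=EXPECTED):
    """Return notes describing where `info` differs from the listed versions."""
    notes = []
    major_minor = tuple(int(part) for part in info["python"].split(".")[:2])
    if major_minor != expected["python"]:
        notes.append("Python is {}, listed version is 3.7.11".format(info["python"]))
    if info["torch"] is None:
        notes.append("torch is not installed")
        return notes
    if not info["torch"].startswith(expected["torch"]):
        notes.append("torch is {}, listed version is {}".format(
            info["torch"], expected["torch"]))
    if info["cuda"] != expected["cuda"]:
        notes.append("torch was built for CUDA {}, listed version is {}".format(
            info["cuda"], expected["cuda"]))
    if not info["gpu_available"]:
        notes.append("no CUDA device is visible to torch")
    return notes


if __name__ == "__main__":
    env = describe_environment()
    print(env)
    for note in report_mismatches(env):
        print("note:", note)
```

A note from this script is not necessarily a problem, since other version combinations are acceptable as long as they work together.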
Video | Project Page | Code
Checkpoints for LangXin_V2 (IP_LAP_256): https://pan.baidu.com/s/1lzqgqO6vkFxa2-0AiS4a1A?pwd=lzzx
📊 The following images compare results after training the generator for 200,000 steps. The second-to-last image is the generated digital human image.
📊 The following images compare results after training the generator for 300,000 steps. The second-to-last image is the generated digital human image.
| Original video | Lip-synced video |
| --- | --- |
| 555555.mp4 | output_555555.mp4 |
| 666666.mp4 | output_666666.mp4 |
| 777777.mp4 | output_777777.mp4 |
| 888888.mp4 | output_888888.mp4 |
For digital human projects, we will continue to train and release higher-definition weights. The plan is as follows. Pre-trained checkpoints for wav2lip_288x288 (LangXin_V0) will be released in January 2025. Pre-trained checkpoints for wav2lip_384x384 (LangXin_V1) will be released in February 2025. Pre-trained checkpoints for IP_LAP_256 (LangXin_V2) will be released after June 2025. Pre-trained checkpoints for LangXin_V3 will be released after June 2026.
- landmark_checkpoints
- renderer_checkpoints
- Dataset processing pipeline
- Training method
- Inference
- Real-time inference
- Higher-definition commercial checkpoints
Thank you to the other three authors for their wonderful work: https://github.com/Weizhi-Zhong/IP_LAP
This repository was created by langzizhixin of Langzizhixin Technology Company on 2025.7.20 in Chengdu, China. The code and weights above may be used only for personal, research, and non-commercial purposes. In particular, for the digital human video models in this repository, please contact the models themselves for authorization if commercial use is required. If you need a higher-definition model, contact us by email at 277504483@qq.com or ajian.justdoit@gmail.com, or add our WeChat: langzizhixinkeji









