Identity-as-Presence

Official implementation of "Identity as Presence: Towards Appearance and Voice Personalized Joint Audio-Video Generation"

Identity as Presence: Towards Appearance and Voice Personalized Joint Audio-Video Generation
Yingjie Chen, Shilun Lin, Cai Xing, Qixin Yan, Wenjing Wang, Dingming Liu, Hao Liu, Chen Li, Jing LYU

💡 Abstract

Recent advances have demonstrated compelling capabilities in synthesizing real individuals into generated videos, reflecting the growing demand for identity-aware content creation. Nevertheless, an openly accessible framework enabling fine-grained control over facial appearance and voice timbre across multiple identities remains unavailable. In this work, we present a unified and scalable framework for identity-aware joint audio-video generation, enabling high-fidelity and consistent personalization. Specifically, we introduce a data curation pipeline that automatically extracts identity-bearing information with paired annotations across audio and visual modalities, covering diverse scenarios from single-subject to multi-subject interactions. We further propose a flexible and scalable identity injection mechanism for single- and multi-subject scenarios, in which both facial appearance and vocal timbre act as identity-bearing control signals. Moreover, in light of modality disparity, we design a multi-stage training strategy to accelerate convergence and enforce cross-modal coherence. Experiments demonstrate the superiority of the proposed framework.

🔥 Updates

  • (2026-03-18) The project page, demo video and technical report are released.

📑 TODO List

  • Release inference code and model weights for single-subject scenarios
  • Release inference code and model weights for multi-subject scenarios

Usage

Environment

$ pip install -r requirements.txt

Pretrained Weights

Please download the following pretrained models and place them in the ckpts directory: MMAudio, Wan2.2-TI2V-5B, and Identity-as-Presence.

After downloading, ensure all model files are placed in the ckpts directory and properly configured.
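
Before running inference, it can help to confirm that the checkpoints are where the scripts expect them. The following is a minimal sketch, not part of the repository: it assumes one subdirectory per model under ckpts, and the folder names are placeholders taken from the model names above, since the actual layout is not specified here.

```python
from pathlib import Path

# Assumed layout: one subdirectory per model under ckpts/. The README names the
# models but not their folder names, so these names are placeholders.
EXPECTED_MODELS = ("MMAudio", "Wan2.2-TI2V-5B", "Identity-as-Presence")


def missing_checkpoints(ckpt_root="ckpts", expected=EXPECTED_MODELS):
    """Return the expected model folders that are absent or empty."""
    root = Path(ckpt_root)
    missing = []
    for name in expected:
        model_dir = root / name
        # A folder counts as present only if it exists and holds at least one entry.
        if not model_dir.is_dir() or not any(model_dir.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_checkpoints()
    if absent:
        print("Missing or empty checkpoint folders:", ", ".join(absent))
    else:
        print("All expected checkpoint folders are present.")
```

If the downloaded models unpack under different folder names, adjust EXPECTED_MODELS to match.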

Inference

$ bash infer.sh

The results will be saved in the results directory.
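
To drive inference from Python and collect the outputs, a small wrapper along these lines may be useful. This is a sketch under assumptions: it only shells out to infer.sh and then lists files in results; the `.mp4` suffix filter is a guess based on the demo files, and both helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def run_inference(script="infer.sh"):
    """Run the inference script; raises CalledProcessError if it exits non-zero."""
    subprocess.run(["bash", script], check=True)


def list_results(results_dir="results", suffixes=(".mp4",)):
    """Return generated files under results_dir (searched recursively), sorted by path."""
    root = Path(results_dir)
    if not root.is_dir():
        return []
    # The suffix filter is an assumption; widen it if the script writes other formats.
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() in suffixes
    )


if __name__ == "__main__":
    run_inference()
    for path in list_results():
        print(path)
```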

🎥 Demo

Single-subject Personalized Generation

Demo videos: 1.mp4, 2.mp4, 3.mp4, 4.mp4 (two sets)

Multi-subject Personalized Generation

Demo videos: 1.mp4 (with 1-1.mp4, 1-2.mp4), 2.mp4 (with 2-1.mp4, 2-2.mp4), 3.mp4 (with 3-1.mp4, 3-2.mp4), 4.mp4 (with 4-1.mp4, 4-2.mp4)

For more details, please refer to our project page.

🔗 Citation

If you find this code useful for your research, please cite it using the following BibTeX entry.

@article{chen2026identity,
  title={Identity as Presence: Towards Appearance and Voice Personalized Joint Audio-Video Generation},
  author={Chen, Yingjie and Lin, Shilun and Xing, Cai and Binxin, Yang and Long, Zhou and Yan, Qixin and Wang, Wenjing and Liu, Dingming and Liu, Hao and Li, Chen and LYU, Jing},
  journal={arXiv preprint arXiv:2603.17889},
  url={https://chen-yingjie.github.io/projects/Identity-as-Presence/index.html},
  year={2026}
}

Acknowledgements

We would like to thank the contributors to various open-source projects for their research and exploration.
