[Homepage] [Reference Paper] [Code]
This repository provides baseline methods for the Fourth REACT Challenge.
- Please send the signed EULA (https://github.com/reactmultimodalchallenge/baseline_react2026/blob/main/EULA_MARS%20dataset.pdf) to Dr Siyang Song at s.song@exeter.ac.uk
Given the spatio-temporal behaviours expressed by a speaker over a given time period, the proposed REACT 2025 Challenge consists of the following two sub-challenges, whose theoretical underpinnings are defined and detailed in this paper.
The first task (offline) aims to develop a deep learning model that takes the entire speaker behaviour sequence as input and generates multiple appropriate and realistic (naturalistic) spatio-temporal facial reactions. Each predicted facial reaction consists of AUs, facial expressions, and valence and arousal states. Multiple facial reactions must therefore be generated for each input speaker behaviour.
The second task (online) aims to develop a deep learning model that estimates each facial reaction frame individually, rather than taking all frames into consideration. The model is expected to generate the facial reaction frames gradually, so that together they form multiple appropriate and realistic (naturalistic) spatio-temporal facial reactions. Each predicted facial reaction consists of AUs, facial expressions, and valence and arousal states. Multiple facial reactions must therefore be generated for each input speaker behaviour.
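The difference between the two tasks can be illustrated with a toy sketch. The two functions below are hypothetical stand-ins, not the baseline models, and the sketch assumes that "online" means frame t may use speaker frames 0..t only. It checks that changing future speaker frames leaves earlier online outputs untouched, while earlier offline outputs change.

```python
import numpy as np

def toy_offline(speaker: np.ndarray) -> np.ndarray:
    """Offline: every output frame may depend on the whole speaker sequence."""
    context = speaker.mean(axis=0, keepdims=True)  # summary of ALL frames
    return 0.5 * speaker + 0.5 * context

def toy_online(speaker: np.ndarray) -> np.ndarray:
    """Online: frame t is produced from speaker frames 0..t only (assumption)."""
    out = np.empty_like(speaker)
    for t in range(len(speaker)):
        context = speaker[: t + 1].mean(axis=0)    # past and current frames only
        out[t] = 0.5 * speaker[t] + 0.5 * context
    return out

rng = np.random.default_rng(0)
speaker = rng.normal(size=(8, 25))  # 8 toy frames of 25-d attributes
changed = speaker.copy()
changed[5:] += 1.0                  # alter only the future frames 5..7

# Compare the first five output frames before and after the change.
online_same = np.allclose(toy_online(speaker)[:5], toy_online(changed)[:5])
offline_same = np.allclose(toy_offline(speaker)[:5], toy_offline(changed)[:5])
print(online_same, offline_same)
```

A real online model would replace the running mean with a learned causal network; only the access pattern to the speaker sequence is the point here.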
We provide detailed instructions for setting up the environment using conda. First, create and activate a new environment:
conda create -n react python=3.10
conda activate react
Next, check your CUDA version:
nvidia-smi
Visit the official PyTorch website to get the appropriate installation command. For example:
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia
Install the following PyTorch3D dependencies:
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
For CUDA versions older than 11.7, you will need to install the CUB library.
conda install -c bottler nvidiacub
Before installing PyTorch3D, verify your CUDA version in Python:
import torch
print(torch.version.cuda)
Download the appropriate PyTorch3D package from Anaconda based on your Python, CUDA, and PyTorch versions. For example, for Python 3.10, CUDA 11.6, and PyTorch 1.12.0:
# linux-64_pytorch3d-0.7.5-py310_cu116_pyt1120.tar.bz2
conda install linux-64_pytorch3d-0.7.5-py310_cu116_pyt1120.tar.bz2
Install all remaining dependencies specified in requirements.txt:
pip install -r requirements.txt
Data
Challenge Data Description (Homepage):
We divided the datasets into training, test, and validation sets following an approximate 60%/20%/20% splitting ratio. Specifically, we split the datasets with a subject-independent strategy (i.e., the same subject never appears in both the training and test sets).
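The official split ships with the challenge data, so nothing needs to be re-split. Purely as an illustration of what "subject-independent" means, the sketch below assigns whole subjects, never individual clips, to each partition. The function name, the subject identifiers, and the seed are all hypothetical.

```python
import random

def subject_independent_split(subject_ids, ratios=(0.6, 0.2, 0.2), seed=0):
    """Assign whole subjects (never individual clips) to train/val/test."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    n_train = round(ratios[0] * len(subjects))
    n_val = round(ratios[1] * len(subjects))
    return {
        "train": set(subjects[:n_train]),
        "val": set(subjects[n_train:n_train + n_val]),
        "test": set(subjects[n_train + n_val:]),
    }

# 50 toy clips recorded from 10 hypothetical subjects.
clip_subjects = [f"subject{i % 10:02d}" for i in range(50)]
split = subject_independent_split(clip_subjects)
print({name: len(members) for name, members in split.items()})
```

Because the ratio is applied to subjects rather than clips, the clip-level proportions are only approximately 60/20/20 when subjects contribute different numbers of clips.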
- video-raw folder contains raw videos (with the resolution of 1920 * 1080)
- video-face-crop folder contains face-cropped videos (with the resolution of 384 * 384)
- facial-attributes folder contains sequences of frame-level 25-dimension facial attributes (15 AUs’ occurrences, valence and arousal intensities, and the probabilities of eight categorical facial expressions)
- coefficients folder contains sequences of 58-dimension (52-d expression, 3-d rotation, and 3-d translation) 3DMM coefficients extracted from corresponding videos
- audio folder contains wav files extracted from raw video files
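As a rough sketch of how a 25-dimension facial-attribute sequence could be split into its three groups: the column order used here (AUs, then valence and arousal, then expressions) simply follows the order of the description above and is an assumption, as is the (frames, 25) array shape. Please check the baseline dataloader for the actual layout.

```python
import numpy as np

N_AU, N_VA, N_EXPR = 15, 2, 8  # 15 + 2 + 8 = 25 dimensions

def split_facial_attributes(attrs: np.ndarray) -> dict:
    """Split a (frames, 25) array into groups; the column order is assumed."""
    if attrs.ndim != 2 or attrs.shape[1] != N_AU + N_VA + N_EXPR:
        raise ValueError(f"expected (frames, 25), got {attrs.shape}")
    return {
        "au": attrs[:, :N_AU],
        "valence_arousal": attrs[:, N_AU:N_AU + N_VA],
        "expression": attrs[:, N_AU + N_VA:],
    }

# Toy stand-in for an array loaded with np.load from a facial-attributes file.
attrs = np.zeros((750, 25), dtype=np.float32)
parts = split_facial_attributes(attrs)
print({name: part.shape for name, part in parts.items()})
```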
Appropriate real facial reactions (Ground-Truths):
- During data recording, the semantic contexts are carefully controlled through the 23 distinct sessions (session0, session1, …, session22), each of which is guided by a few pre-defined sentences posted by the speaker. This provides a consistent session-specific context across dyadic interactions between different speakers and listeners. More specifically, for the speaker behaviour expressed in a specific session, we define all facial reactions expressed by different listeners under the same session to be appropriate facial reactions (i.e., ground-truth) for responding to it.
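The ground-truth definition above can be sketched as a grouping of listener clips by session. The file names below are hypothetical, and the sketch assumes the session is the name of the folder that directly contains each clip.

```python
from collections import defaultdict
from pathlib import PurePosixPath

def appropriate_reactions(listener_files):
    """Map each session to all listener clips recorded in that session."""
    by_session = defaultdict(list)
    for f in listener_files:
        session = PurePosixPath(f).parent.name  # e.g. "session0"
        by_session[session].append(f)
    return dict(by_session)

listener_files = [
    "listener/session0/clip_a.npy",  # hypothetical file names
    "listener/session0/clip_b.npy",
    "listener/session1/clip_c.npy",
]
gt = appropriate_reactions(listener_files)
# Under this definition, any speaker clip from session0 has both
# session0 listener clips as appropriate reactions.
print(gt["session0"])
```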
Data organization (./data) is listed below:
The example of data structure.
├── val
├── test
├── train
    ├── coefficients (.npy)
    ├── video-face-crop (.mp4)
    ├── video-raw (.mp4)
        ├── speaker
            ├── session0
                ├── Camera-2024-06-21-103121-103102.mp4
                ├── ...
            ├── ...
            ├── session22
                ├── Camera-2024-07-17-104338-104241.mp4
                ├── ...
        ├── listener
            ├── session0
                ├── Camera-2024-06-21-103121-103102.mp4
                ├── ...
            ├── ...
            ├── session22
                ├── Camera-2024-07-17-104338-104241.mp4
                ├── ...
    ├── facial-attributes (.npy)
        ├── speaker
            ├── session0
                ├── Camera-2024-06-21-103121-103102.npy
                ├── ...
            ├── ...
            ├── session22
                ├── Camera-2024-07-17-104338-104241.npy
                ├── ...
        ├── listener
            ├── session0
                ├── Camera-2024-06-21-103121-103102.npy
                ├── ...
            ├── ...
            ├── session22
                ├── Camera-2024-07-17-104338-104241.npy
                ├── ...
    ├── audio (.wav)
        ├── speaker
            ├── session0
                ├── Camera-2024-06-21-103121-103102.wav
                ├── ...
            ├── ...
        ├── listener
            ├── session0
                ├── Camera-2024-06-21-103121-103102.wav
                ├── ...
            ├── ...
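The layout above suggests that a speaker clip and its listener clip share the same session folder and file name. Under that assumption (it is inferred from the example tree, not stated explicitly), a small helper such as the hypothetical `pair_clips` below could enumerate matching pairs for one modality folder. The sketch builds a throwaway tree so that it runs without the real dataset.

```python
import tempfile
from pathlib import Path

def pair_clips(modality_dir: Path, suffix: str):
    """Yield (speaker, listener) paths that share session and file name."""
    speaker_root = modality_dir / "speaker"
    listener_root = modality_dir / "listener"
    for speaker_path in sorted(speaker_root.glob(f"session*/*{suffix}")):
        listener_path = listener_root / speaker_path.relative_to(speaker_root)
        if listener_path.exists():
            yield speaker_path, listener_path

# Build a tiny temporary tree that mimics train/facial-attributes.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "train" / "facial-attributes"
    for role in ("speaker", "listener"):
        session = root / role / "session0"
        session.mkdir(parents=True)
        (session / "Camera-2024-06-21-103121-103102.npy").touch()
    pairs = [
        (spk.relative_to(root).as_posix(), lis.relative_to(root).as_posix())
        for spk, lis in pair_clips(root, ".npy")
    ]
print(pairs)
```

Speaker clips without a matching listener file are skipped silently here; a stricter variant could raise an error instead.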
External Tool Preparation
We use 3DMM coefficients to represent a 3D listener or speaker, and for further 3D-to-2D frame rendering. The baselines leverage a 3DMM model to extract 3DMM coefficients and to render 3D facial reactions.
- You should first download the 3DMM (FaceVerse version 2 model) at this page and then put it in the folder (external/FaceVerse/data/). We provide our extracted 3DMM coefficients (which are used for our baseline visualisation) at OneDrive. We also provide mean_face.npy at this OneDrive link, std_face.npy at this OneDrive link, and reference_full.npy at this OneDrive link for 3DMM coefficient data normalization. Please download them and put them in the folder (external/FaceVerse/).
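The exact normalization applied by the baselines is defined in the repository code. As a minimal sketch, assuming a plain per-dimension z-score with the provided mean and standard deviation (an assumption, including the array shapes and the small `eps` term), the transform and its inverse could look like this. Synthetic statistics stand in for mean_face.npy and std_face.npy.

```python
import numpy as np

def normalize(coeffs, mean, std, eps=1e-8):
    """Z-score normalization (assumed form, not confirmed by the source)."""
    return (coeffs - mean) / (std + eps)

def denormalize(normed, mean, std, eps=1e-8):
    """Inverse of normalize, e.g. before rendering."""
    return normed * (std + eps) + mean

rng = np.random.default_rng(0)
coeffs = rng.normal(loc=2.0, scale=3.0, size=(100, 58))  # toy 58-d 3DMM sequence
mean = coeffs.mean(axis=0, keepdims=True)                # stand-in for mean_face.npy
std = coeffs.std(axis=0, keepdims=True)                  # stand-in for std_face.npy

normed = normalize(coeffs, mean, std)
restored = denormalize(normed, mean, std)
print(np.allclose(restored, coeffs))
```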
Then, we use the 3D-to-2D tool PIRender to render the final 2D facial reaction frames.
- We re-trained PIRender, and the trained model is provided at the checkpoint. Please put it in the folder (external/PIRender/).
Finally, please download the compressed folder named pretrained_models from this link, and extract it into the project root directory.
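Before training, it may help to confirm that the downloads ended up where the steps above expect them. The sketch below only checks the folders and file names mentioned in this section; the FaceVerse model file and the PIRender checkpoint are checked at folder level because their file names are not given here. The helper name is hypothetical.

```python
from pathlib import Path

# Paths taken from the instructions above, relative to the project root.
EXPECTED = [
    "external/FaceVerse/data",
    "external/FaceVerse/mean_face.npy",
    "external/FaceVerse/std_face.npy",
    "external/FaceVerse/reference_full.npy",
    "external/PIRender",
    "pretrained_models",
]

def missing_paths(project_root="."):
    """Return the expected paths that do not exist under project_root."""
    root = Path(project_root)
    return [p for p in EXPECTED if not (root / p).exists()]

# Run from the project root; an empty list means everything was found.
print(missing_paths("."))
```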
Training
Generic online:
1. PerFRDiff + EEG
python main.py \
--config-name generic_online/motion_diffusion \
trainer.batch_size=8 \
stage=fit \
data_dir=./datasets/REACT2026/ \
trainer.model.diff_model.eeg_head.enabled=true \
trainer.generic.train_eeg_head_only=false
2. TransVAE + EEG
python main.py \
--config-name generic_online/motion_transvae \
trainer.batch_size=2 \
trainer.max_seq_len=256 \
trainer.window_size=16 \
stage=fit \
data_dir=./datasets/REACT2026/ \
trainer.train_eeg_head_only=false \
trainer.model.eeg_head.enabled=true
Personalized online:
PerFRDiff rewrite-weight + EEG
(a) Condition Input: Listener historical facial behaviours
python main.py \
--config-name personalized_online/perfrdiff_rewrite_weight \
stage=fit \
data_dir=./datasets/REACT2026/ \
trainer.generic.train_eeg=true \
trainer.generic.train_eeg_head_only=false \
trainer.main_model.args.personal_condition_mode=3dmm_only \
trainer.pretrained.diffusion_prior=<diffusion-prior-model-path/checkpoint.pth> \
trainer.pretrained.diffusion_decoder=<diffusion-decoder-model-path/checkpoint.pth> \
trainer.pretrained.eeg_head_checkpoint=<eeg-head-checkpoint-path/checkpoint.pth>
The paths diffusion-prior-model-path, diffusion-decoder-model-path, and eeg-head-checkpoint-path point to the checkpoints saved from training generic_online/motion_diffusion.
(b) Condition Input: Personality_only
python main.py \
--config-name personalized_online/perfrdiff_rewrite_weight \
stage=fit \
data_dir=./datasets/REACT2026/ \
trainer.generic.train_eeg=true \
trainer.generic.train_eeg_head_only=false \
trainer.main_model.args.personal_condition_mode=personality_only \
trainer.pretrained.diffusion_prior=<diffusion-prior-model-path/checkpoint.pth> \
trainer.pretrained.diffusion_decoder=<diffusion-decoder-model-path/checkpoint.pth> \
trainer.pretrained.eeg_head_checkpoint=<eeg-head-checkpoint-path/checkpoint.pth>
The paths diffusion-prior-model-path, diffusion-decoder-model-path, and eeg-head-checkpoint-path point to the checkpoints saved from training generic_online/motion_diffusion.
(c) Condition Input: Listener historical facial behaviours + Personality_only
python main.py \
--config-name personalized_online/perfrdiff_rewrite_weight \
stage=fit \
data_dir=./datasets/REACT2026/ \
trainer.generic.train_eeg=true \
trainer.generic.train_eeg_head_only=false \
trainer.main_model.args.personal_condition_mode=3dmm_personality \
trainer.pretrained.diffusion_prior=<diffusion-prior-model-path/checkpoint.pth> \
trainer.pretrained.diffusion_decoder=<diffusion-decoder-model-path/checkpoint.pth> \
trainer.pretrained.eeg_head_checkpoint=<eeg-head-checkpoint-path/checkpoint.pth>
The paths diffusion-prior-model-path, diffusion-decoder-model-path, and eeg-head-checkpoint-path point to the checkpoints saved from training generic_online/motion_diffusion.
Generic offline:
1. Motion Diffusion + EEG, first-stage backbone for S-PerReactor
Train this generic offline diffusion backbone before training personalized_offline/perreactor_offline. It saves the DiffusionPriorNetwork, TransformerDenoiser, and EEGPredictionHead checkpoints used by S-PerReactor.
python main.py \
--config-name generic_offline/motion_diffusion \
trainer.batch_size=4 \
stage=fit \
data_dir=./datasets/REACT2026/ \
trainer.generic.train_eeg_head_only=false \
trainer.model.diff_model.eeg_head.enabled=true
The checkpoints are saved under:
save/motion_diffusion/react_2025/offline/checkpoints/<generic-offline-run-id>/
2. TransVAE + EEG
python main.py \
--config-name generic_offline/motion_transvae \
trainer.batch_size=4 \
trainer.max_seq_len=750 \
trainer.window_size=8 \
stage=fit \
data_dir=./datasets/REACT2026/ \
trainer.train_eeg_head_only=false \
trainer.model.eeg_head.enabled=true
3. ReGNN + EEG
(a) Change into the regnn/ directory:
cd ./regnn
(b) Extract the image features using the pre-trained swin_transformer (pretrained weights are already provided in ./pretrained_models):
python feature_extraction.py
(c) Train ReGNN by running the following command:
python train.py \
--logs-dir "Gmm-logs-eeg-head" \
--data-dir ./datasets/REACT2026/ \
--enable-eeg-head \
--eeg-loss-weight 0.25 \
--lr 0.0001 \
--gamma 0.1 \
--warmup-factor 0.01 \
--milestones 9 \
--batch-size 64 \
--layers 2 \
--act "ELU" \
--seed 1 \
--train-iters 100 \
--norm \
--neighbor-pattern "all" \
--convert-type "direct" \
--loss-mid
Personalized offline:
1. S-PerReactor + EEG
S-PerReactor reuses the pretrained generic offline diffusion prior and decoder, freezes the generic backbone by default, and trains the listener personal adapter. Set trainer.perreactor.personal_condition_mode to choose the personal condition:
history_only, personality_only, or history_personality.
Set trainer.generic.train_eeg=false to train only the S-PerReactor adapter without EEG supervision.
(a) Condition Input: Listener historical emotion behaviours
python main.py \
--config-name personalized_offline/perreactor_offline \
stage=fit \
data_dir=./datasets/REACT2026/ \
trainer.generic.train_eeg=true \
trainer.generic.train_eeg_head_only=false \
trainer.perreactor.personal_condition_mode=history_only \
trainer.pretrained.diffusion_prior=save/motion_diffusion/react_2025/offline/checkpoints/<generic-offline-run-id>/DiffusionPriorNetwork/checkpoint_best.pth \
trainer.pretrained.diffusion_decoder=save/motion_diffusion/react_2025/offline/checkpoints/<generic-offline-run-id>/TransformerDenoiser/checkpoint_best.pth \
trainer.pretrained.eeg_head_checkpoint=save/motion_diffusion/react_2025/offline/checkpoints/<generic-offline-run-id>/EEGPredictionHead/checkpoint_best.pth
(b) Condition Input: Personality traits
python main.py \
--config-name personalized_offline/perreactor_offline \
stage=fit \
data_dir=./datasets/REACT2026/ \
trainer.generic.train_eeg=true \
trainer.generic.train_eeg_head_only=false \
trainer.perreactor.personal_condition_mode=personality_only \
trainer.pretrained.diffusion_prior=save/motion_diffusion/react_2025/offline/checkpoints/<generic-offline-run-id>/DiffusionPriorNetwork/checkpoint_best.pth \
trainer.pretrained.diffusion_decoder=save/motion_diffusion/react_2025/offline/checkpoints/<generic-offline-run-id>/TransformerDenoiser/checkpoint_best.pth \
trainer.pretrained.eeg_head_checkpoint=save/motion_diffusion/react_2025/offline/checkpoints/<generic-offline-run-id>/EEGPredictionHead/checkpoint_best.pth
(c) Condition Input: Listener historical emotion behaviours + Personality traits
python main.py \
--config-name personalized_offline/perreactor_offline \
stage=fit \
data_dir=./datasets/REACT2026/ \
trainer.generic.train_eeg=true \
trainer.generic.train_eeg_head_only=false \
trainer.perreactor.personal_condition_mode=history_personality \
trainer.pretrained.diffusion_prior=save/motion_diffusion/react_2025/offline/checkpoints/<generic-offline-run-id>/DiffusionPriorNetwork/checkpoint_best.pth \
trainer.pretrained.diffusion_decoder=save/motion_diffusion/react_2025/offline/checkpoints/<generic-offline-run-id>/TransformerDenoiser/checkpoint_best.pth \
trainer.pretrained.eeg_head_checkpoint=save/motion_diffusion/react_2025/offline/checkpoints/<generic-offline-run-id>/EEGPredictionHead/checkpoint_best.pth
(d) Second-stage EEG head-only training
python main.py \
--config-name personalized_offline/perreactor_offline \
stage=fit \
data_dir=./datasets/REACT2026/ \
trainer.generic.train_eeg=true \
trainer.generic.train_eeg_head_only=true \
trainer.perreactor.personal_condition_mode=<same-as-adapter-training> \
trainer.pretrained.adapter_checkpoint=<perreactor-adapter-checkpoint/checkpoint_best.pth> \
trainer.pretrained.diffusion_prior=save/motion_diffusion/react_2025/offline/checkpoints/<generic-offline-run-id>/DiffusionPriorNetwork/checkpoint_best.pth \
trainer.pretrained.diffusion_decoder=save/motion_diffusion/react_2025/offline/checkpoints/<generic-offline-run-id>/TransformerDenoiser/checkpoint_best.pth \
trainer.pretrained.eeg_head_checkpoint=save/motion_diffusion/react_2025/offline/checkpoints/<generic-offline-run-id>/EEGPredictionHead/checkpoint_best.pth
The generic offline checkpoints come from the first-stage generic_offline/motion_diffusion run above. The S-PerReactor adapter checkpoint is saved under save/perreactor_offline/<data-name>/offline/<run-id>/PerReactor/.
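The three trainer.pretrained.* overrides in the S-PerReactor commands differ only in the run ID and the module folder, so they can be generated rather than typed by hand. The helper below is a hypothetical convenience, not part of the repository; it just reproduces the path pattern used in the commands above.

```python
from pathlib import PurePosixPath

# Checkpoint root and module folders as they appear in the commands above.
CKPT_ROOT = PurePosixPath("save/motion_diffusion/react_2025/offline/checkpoints")
MODULES = {
    "trainer.pretrained.diffusion_prior": "DiffusionPriorNetwork",
    "trainer.pretrained.diffusion_decoder": "TransformerDenoiser",
    "trainer.pretrained.eeg_head_checkpoint": "EEGPredictionHead",
}

def pretrained_overrides(run_id: str) -> list[str]:
    """Build the three key=path overrides for one generic offline run."""
    return [
        f"{key}={CKPT_ROOT / run_id / module / 'checkpoint_best.pth'}"
        for key, module in MODULES.items()
    ]

# Replace the placeholder with the ID of your own generic offline run.
for override in pretrained_overrides("<generic-offline-run-id>"):
    print(override)
```

The printed lines can be appended to a `python main.py --config-name personalized_offline/perreactor_offline ...` command as they are.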
Pretrained weights
- to be released
Evaluation
For evaluation, please refer to the test function in ./trainer/motion_diffusion.py (PerFRDiff baseline) or ./trainer/motion_transvae.py (Trans-VAE baseline). The metric computations are implemented in ./framework/utils/compute_metrics.py. The validation set can be treated as the test set by loading it via the provided dataloader file. As in the baseline paper, all facial reactions from different participants within the same session are defined as ground truths.
The pretrained model weights will be released soon.
Generic online:
1. PerFRDiff + EEG
python main.py \
--config-name generic_online/motion_diffusion \
trainer.batch_size=1 \
stage=test \
data_dir=./datasets/REACT2026/ \
resume_id=<train-experiment-id> \
trainer.generic.eval_eeg=true \
trainer.model.diff_model.eeg_head.enabled=true
2. TransVAE + EEG
python main.py \
--config-name generic_online/motion_transvae \
trainer.batch_size=1 \
trainer.max_seq_len=256 \
trainer.window_size=16 \
stage=test \
data_dir=./datasets/REACT2026/ \
trainer.data_transform=zero_center \
resume_id=<train-experiment-id> \
trainer.eval_eeg=true \
trainer.eval_eeg_metrics=true \
trainer.eval_facial_metrics=true \
trainer.save_results=true \
trainer.renderer.do_render=false
Personalized online:
PerFRDiff rewrite-weight + EEG
(a) Condition Input: Listener historical facial behaviours
python main.py \
--config-name personalized_online/perfrdiff_rewrite_weight \
trainer.batch_size=1 \
stage=test \
data_dir=./datasets/REACT2026/ \
resume_id=<train-experiment-id> \
trainer.generic.eval_eeg=true \
trainer.main_model.args.personal_condition_mode=3dmm_only \
trainer.pretrained.diffusion_prior=<diffusion-prior-model-path/checkpoint.pth> \
trainer.pretrained.diffusion_decoder=<diffusion-decoder-model-path/checkpoint.pth> \
trainer.pretrained.eeg_head_checkpoint=<eeg-head-checkpoint-path/checkpoint.pth>
(b) Condition Input: Personality_only
python main.py \
--config-name personalized_online/perfrdiff_rewrite_weight \
trainer.batch_size=1 \
stage=test \
data_dir=./datasets/REACT2026/ \
resume_id=<train-experiment-id> \
trainer.generic.eval_eeg=true \
trainer.main_model.args.personal_condition_mode=personality_only \
trainer.pretrained.diffusion_prior=<diffusion-prior-model-path/checkpoint.pth> \
trainer.pretrained.diffusion_decoder=<diffusion-decoder-model-path/checkpoint.pth> \
trainer.pretrained.eeg_head_checkpoint=<eeg-head-checkpoint-path/checkpoint.pth>
(c) Condition Input: Listener historical facial behaviours + Personality_only
python main.py \
--config-name personalized_online/perfrdiff_rewrite_weight \
trainer.batch_size=1 \
stage=test \
data_dir=./datasets/REACT2026/ \
resume_id=<train-experiment-id> \
trainer.generic.eval_eeg=true \
trainer.main_model.args.personal_condition_mode=3dmm_personality \
trainer.pretrained.diffusion_prior=<diffusion-prior-model-path/checkpoint.pth> \
trainer.pretrained.diffusion_decoder=<diffusion-decoder-model-path/checkpoint.pth> \
trainer.pretrained.eeg_head_checkpoint=<eeg-head-checkpoint-path/checkpoint.pth>
Generic offline:
1. TransVAE + EEG
python main.py \
--config-name generic_offline/motion_transvae \
stage=test \
data_dir=./datasets/REACT2026/ \
trainer.batch_size=1 \
trainer.max_seq_len=750 \
trainer.window_size=8 \
trainer.data_transform=zero_center \
resume_id=<train-experiment-id> \
trainer.eval_eeg=true \
trainer.eval_eeg_metrics=true \
trainer.eval_facial_metrics=true \
trainer.save_results=true \
trainer.renderer.do_render=false
2. ReGNN + EEG
python train.py \
--test \
--logs-dir "Gmm-logs-eeg-head" \
--data-dir "./datasets/REACT2026/" \
--model-pth "./baseline_react2026-main2/regnn/Gmm-logs-eeg-head/mhp-eeg-head-last-seed1.pth" \
--enable-eeg-head \
--eval-eeg \
--metric-threads 1 \
--eval-clip-batch-size 1 \
--layers 2 \
--act "ELU" \
--seed 1 \
--norm \
--neighbor-pattern "all" \
--convert-type "direct"
Personalized offline:
1. S-PerReactor + EEG
Use the same trainer.perreactor.personal_condition_mode as the training run (history_only, personality_only, or history_personality). Set trainer.generic.eval_eeg=false to evaluate facial metrics only, or trainer.generic.eval_eeg=true to also save GT_EEG, PRED_EEG, and EEG_MASK in results.pt.
python main.py \
--config-name personalized_offline/perreactor_offline \
trainer.batch_size=1 \
stage=test \
data_dir=./datasets/REACT2026/ \
resume_id=<perreactor-train-experiment-id> \
trainer.generic.eval_eeg=true \
trainer.perreactor.personal_condition_mode=<same-as-training> \
trainer.pretrained.diffusion_prior=save/motion_diffusion/react_2025/offline/checkpoints/<generic-offline-run-id>/DiffusionPriorNetwork/checkpoint_best.pth \
trainer.pretrained.diffusion_decoder=save/motion_diffusion/react_2025/offline/checkpoints/<generic-offline-run-id>/TransformerDenoiser/checkpoint_best.pth \
trainer.pretrained.eeg_head_checkpoint=save/motion_diffusion/react_2025/offline/checkpoints/<generic-offline-run-id>/EEGPredictionHead/checkpoint_best.pth
If the resumed S-PerReactor checkpoint was trained with trainer.generic.train_eeg=true, eeg_head.* is loaded from that checkpoint. Otherwise, keep trainer.pretrained.eeg_head_checkpoint set when trainer.generic.eval_eeg=true.
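Once GT_EEG, PRED_EEG, and EEG_MASK have been saved, a quick sanity check on the EEG predictions might look like the sketch below. It is not the challenge metric (see ./framework/utils/compute_metrics.py for the official computations). It assumes the three entries are arrays of compatible shape and that a non-zero mask value marks a valid entry; both are assumptions. Toy arrays are used in place of the saved results.

```python
import numpy as np

def masked_mse(gt, pred, mask):
    """Mean squared error over entries whose mask is non-zero (assumed valid)."""
    mask = np.broadcast_to(np.asarray(mask).astype(bool), gt.shape)
    if not mask.any():
        return float("nan")
    return float(((gt - pred) ** 2)[mask].mean())

# Toy stand-ins for GT_EEG, PRED_EEG, and EEG_MASK.
gt = np.array([[1.0, 2.0], [3.0, 4.0]])
pred = np.array([[1.0, 0.0], [0.0, 4.0]])
mask = np.array([[1, 1], [0, 1]])

# The entry at row 1, column 0 is masked out and does not affect the result.
print(masked_mse(gt, pred, mask))
```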
[1] Song, Siyang, Micol Spitale, Yiming Luo, Batuhan Bal, and Hatice Gunes. "Multiple Appropriate Facial Reaction Generation in Dyadic Interaction Settings: What, Why and How?." arXiv preprint arXiv:2302.06514 (2023).
[2] Song, Siyang, Micol Spitale, Xiangyu Kong, Hengde Zhu, Cheng Luo, Cristina Palmero, German Barquero et al. "React 2025: the third multiple appropriate facial reaction generation challenge." In Proceedings of the 33rd ACM International Conference on Multimedia, pp. 13979-13984. 2025.
[3] Song, Siyang, Micol Spitale, Cheng Luo, Cristina Palmero, German Barquero, Hengde Zhu, Sergio Escalera et al. "React 2024: the second multiple appropriate facial reaction generation challenge." In 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG), pp. 1-5. IEEE, 2024.
[4] Song, Siyang, Micol Spitale, Cheng Luo, Germán Barquero, Cristina Palmero, Sergio Escalera, Michel Valstar et al. "REACT2023: The First Multiple Appropriate Facial Reaction Generation Challenge." In Proceedings of the 31st ACM International Conference on Multimedia, pp. 9620-9624. 2023.
[6] Song, Siyang, Yuxin Song, Cheng Luo, Zhiyuan Song, Selim Kuzucu, Xi Jia, Zhijiang Guo, Weicheng Xie, Linlin Shen, and Hatice Gunes. "GRATIS: Deep Learning Graph Representation with Task-specific Topology and Multi-dimensional Edge Features." arXiv preprint arXiv:2211.12482 (2022).
[7] Luo, Cheng, Siyang Song, Weicheng Xie, Linlin Shen, and Hatice Gunes. (2022, July) "Learning multi-dimensional edge feature-based au relation graph for facial action unit recognition." Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (pp. 1239-1246).
[8] Toisoul, Antoine, Jean Kossaifi, Adrian Bulat, Georgios Tzimiropoulos, and Maja Pantic. "Estimation of continuous valence and arousal levels from faces in naturalistic conditions." Nature Machine Intelligence 3, no. 1 (2021): 42-50.
[9] Eyben, Florian, Martin Wöllmer, and Björn Schuller. "Opensmile: the munich versatile and fast open-source audio feature extractor." In Proceedings of the 18th ACM international conference on Multimedia, pp. 1459-1462. 2010.
[10] Zhu, Hengde, Xiangyu Kong, Weicheng Xie, Xin Huang, Linlin Shen, Lu Liu, Hatice Gunes, and Siyang Song. "Perfrdiff: Personalised weight editing for multiple appropriate facial reaction generation." In Proceedings of the 32nd ACM International Conference on Multimedia, pp. 9495-9504. 2024.
[11] Zhu, Hengde, Xiangyu Kong, Weicheng Xie, Xin Huang, Xilin He, Lu Liu, Linlin Shen, Wei Zhang, Hatice Gunes, and Siyang Song. "PerReactor: Offline Personalised Multiple Appropriate Facial Reaction Generation." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 2, pp. 1665-1673. 2025.
[12] Song, Siyang, Zilong Shao, Shashank Jaiswal, Linlin Shen, Michel Valstar, and Hatice Gunes. "Learning Person-specific Cognition from Facial Reactions for Automatic Personality Recognition." IEEE Transactions on Affective Computing (2022).
[13] Shao, Zilong, Siyang Song, Shashank Jaiswal, Linlin Shen, Michel Valstar, and Hatice Gunes. "Personality recognition by modelling person-specific cognitive processes using graph representation." In proceedings of the 29th ACM international conference on multimedia, pp. 357-366. 2021.
[14] Huang, Yuchi, and Saad M. Khan. "Dyadgan: Generating facial expressions in dyadic interactions." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 11-18. 2017.
[15] Huang, Yuchi, and Saad Khan. "A generative approach for dynamically varying photorealistic facial expressions in human-agent interactions." In Proceedings of the 20th ACM International Conference on Multimodal Interaction, pp. 437-445. 2018.
[16] Barquero, German, Sergio Escalera, and Cristina Palmero. "Belfusion: Latent diffusion for behavior-driven human motion prediction." In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2317-2327. 2023.
[17] Zhou, Mohan, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, and Tao Mei. "Responsive listening head generation: a benchmark dataset and baseline." In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXVIII, pp. 124-142. Cham: Springer Nature Switzerland, 2022.
[18] Luo, Cheng, Siyang Song, Weicheng Xie, Micol Spitale, Zongyuan Ge, Linlin Shen, and Hatice Gunes. "Reactface: Online multiple appropriate facial reaction generation in dyadic interactions." IEEE Transactions on Visualization and Computer Graphics 31, no. 9 (2024): 6190-6207.
[19] Xu, Tong, Micol Spitale, Hao Tang, Lu Liu, Hatice Gunes, and Siyang Song. "Reversible graph neural network-based reaction distribution learning for multiple appropriate facial reactions generation." IEEE Transactions on Affective Computing (2026).
[20] Liang, Cong, Jiahe Wang, Haofan Zhang, Bing Tang, Junshan Huang, Shangfei Wang, and Xiaoping Chen. "Unifarn: Unified transformer for facial reaction generation." In Proceedings of the 31st ACM International Conference on Multimedia, pp. 9506-9510. 2023.
[21] Yu, Jun, Ji Zhao, Guochen Xie, Fengxin Chen, Ye Yu, Liang Peng, Minglei Li, and Zonghong Dai. "Leveraging the latent diffusion models for offline facial multiple appropriate reactions generation." In Proceedings of the 31st ACM International Conference on Multimedia, pp. 9561-9565. 2023.
[22] Hoque, Ximi, Adamay Mann, Gulshan Sharma, and Abhinav Dhall. "BEAMER: Behavioral Encoder to Generate Multiple Appropriate Facial Reactions." In Proceedings of the 31st ACM International Conference on Multimedia, pp. 9536-9540. 2023.
[23] Nguyen, Dang-Khanh, Prabesh Paudel, Seung-Won Kim, Ji-Eun Shin, Soo-Hyung Kim, and Hyung-Jeong Yang. "Multiple facial reaction generation using gaussian mixture of models and multimodal bottleneck transformer." In 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG), pp. 1-5. IEEE, 2024.
[24] Hu, Guanyu, Jie Wei, Siyang Song, Dimitrios Kollias, Xinyu Yang, Zhonglin Sun, and Odysseus Kaloidas. "Robust facial reactions generation: An emotion-aware framework with modality compensation." In 2024 IEEE International Joint Conference on Biometrics (IJCB), pp. 1-10. IEEE, 2024.
[25] Liu, Zhenjie, Cong Liang, Jiahe Wang, Haofan Zhang, Yadong Liu, Caichao Zhang, Jialin Gui, and Shangfei Wang. "One-to-many appropriate reaction mapping modeling with discrete latent variable." In 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG), pp. 1-5. IEEE, 2024.
[26] Dam, Quang Tien, Tri Tung Nguyen Nguyen, Dinh Tuan Tran, and Joo-Ho Lee. "Finite scalar quantization as facial tokenizer for dyadic reaction generation." In 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG), pp. 1-5. IEEE, 2024.
[27] Luo, Jiachen, Jiajun He, Shuai Shen, Lin Wang, Huy Phan, Joshua Reiss, Lin Haijun, Bjoern Schuller, Zeyu Fu, and Siyang Song. "MReactor: Offline Multiple Appropriate Facial Reaction Generation with Hierarchical Cognitive Disentanglement." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3354-3363. 2026.
[28] Xie, Weicheng, Chunlin Yan, Siyang Song, Zitong Yu, Linlin Shen, and Laizhong Cui. "Smooth Online Multiple Appropriate Facial Reaction Generation." In Proceedings of the 33rd ACM International Conference on Multimedia, pp. 5804-5813. 2025.
[29] Mao, Qirong, Qiwei Wu, Na Liu, Yakui Ding, and Lijian Gao. "Scattering-Conditioned Diffusion Models for Multiple Appropriate Facial Reaction Generation." In Proceedings of the 33rd ACM International Conference on Multimedia, pp. 13985-13991. 2025.
[30] Wang, Peng, Pujun Xue, Xiaofeng Liu, and Tongjuan Ji. "Explaining Listener Reactions: Personality-Guided Facial Response Generation with Cross-Modal Attention." In Proceedings of the 33rd ACM International Conference on Multimedia, pp. 13997-14003. 2025.
[31] Huang, Jiajian, and Zitong Yu. "Multiple Appropriate Facial Reaction Generation Based on Multi-View Transformation of Speaker Video." In Proceedings of the 33rd ACM International Conference on Multimedia, pp. 13992-13996. 2025.
[32] Nguyen, Minh-Duc, Hyung-Jeong Yang, Ngoc-Huynh Ho, Soo-Hyung Kim, Seungwon Kim, and Ji-Eun Shin. "Vector quantized diffusion models for multiple appropriate reactions generation." In 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG), pp. 1-5. IEEE, 2024.
[33] Lv, Qincheng, Xiaofeng Liu, Jie Li, Rongrong Ni, Pujun Xue, and Siyang Song. "Hierarchical multimodal decoupling-fusion framework for offline multiple appropriate facial reaction generation." In ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, 2025.
[34] Luo, Cheng, Siyang Song, Siyuan Yan, Zhen Yu, and Zongyuan Ge. "ReactDiff: Fundamental Multiple Appropriate Facial Reaction Diffusion Model." In Proceedings of the 33rd ACM International Conference on Multimedia, pp. 5607-5616. 2025.
[35] Li, Jiaming, Sheng Wang, Xin Wang, Yitao Zhu, Honglin Xiong, Zixu Zhuang, and Qian Wang. "Reactdiff: Latent diffusion for facial reaction generation." Neural Networks 189 (2025): 107596.
Thanks to the open source of the following projects: