Official implementation of "Bridging the Gap between Learning and Inference for Diffusion-Based Molecule Generation".
To create the virtual environment, use the following command.
conda env create -f env.yml
Alternatively, set it up step by step following the modified guidance in the TargetDiff installation instructions.
Python < 3.10 is required for compatibility with Vina.
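A minimal sketch for checking the interpreter before installing Vina. The helper names `vina_compatible_python` and `require_vina_compatible_python` are hypothetical; they only encode the Python < 3.10 requirement stated above and are not part of this repository.

```python
import sys
from typing import Sequence

# Per this README, Vina needs an interpreter older than Python 3.10.
MAX_EXCLUSIVE = (3, 10)


def vina_compatible_python(version_info: Sequence[int] = tuple(sys.version_info[:2])) -> bool:
    """Return True if the (major, minor) version is below MAX_EXCLUSIVE."""
    return (version_info[0], version_info[1]) < MAX_EXCLUSIVE


def require_vina_compatible_python() -> None:
    """Raise RuntimeError with a readable message if the running interpreter is too new."""
    if not vina_compatible_python():
        found = ".".join(str(part) for part in sys.version_info[:3])
        raise RuntimeError(
            f"Found Python {found}; create the environment with Python < 3.10 for Vina."
        )
```

Calling `require_vina_compatible_python()` at the top of a setup or sanity-check script fails fast instead of surfacing a Vina import error later.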
Data preparation follows TargetDiff; please refer to the TargetDiff repository for details.
pipeline.py wraps the whole training, sampling, and evaluation pipeline for both projects (TargetDiff and BindDM).
python -m pipeline <configs> <sampling_results> [train|sample|eval] [-c resume_from_checkpoint_for_training]
# python -m pipeline configs/training.yml sampling_results/reproduce  # whole pipeline
# python -m pipeline configs/sampling.yml sampling_results/reproduce sample  # start from sampling
# python -m pipeline "no matter" sampling_results/reproduce eval  # evaluation only; the config argument is a placeholder
Alternatively, run the script for each stage manually, as in TargetDiff or BindDM.
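The usage line for `pipeline` reads as two positional arguments, an optional stage, and an optional resume checkpoint. The parser below is a hypothetical mirror written only to make that interface explicit; it is not the repository's `pipeline.py`, and the argument names (`config`, `output_dir`, `stage`, `resume`) are assumptions. Omitting the stage is treated here as running the whole pipeline, matching the first example. The PDBbind sampling command further down also passes `-p`, which this sketch does not model.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented usage line (not the repo's code)."""
    parser = argparse.ArgumentParser(prog="pipeline")
    parser.add_argument("config", help="config file; a placeholder such as 'no matter' for eval")
    parser.add_argument("output_dir", help="directory for sampling results")
    parser.add_argument(
        "stage",
        nargs="?",
        default=None,
        choices=["train", "sample", "eval"],
        help="stage to run; omit it to run the whole pipeline",
    )
    parser.add_argument(
        "-c",
        dest="resume",
        default=None,
        help="checkpoint to resume training from",
    )
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (argparse falls back to sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)
```

For example, `parse(["no matter", "sampling_results/reproduce", "eval"])` yields `stage='eval'` and leaves the placeholder config untouched, while omitting the stage leaves `stage` as `None`.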
We removed {train,sample,evaluate}.py from BindDM because they are merely copies of {train,sample,evaluate}_diffusion.py in scripts/.
We also provide scripts for plotting and for computing metrics such as High Affinity and Diversity; they rely only on the metrics_-1.pt meta file generated by evaluation.
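The two metrics named above are commonly defined in TargetDiff-style evaluations per target pocket and then averaged over pockets: High Affinity is the fraction of generated molecules whose Vina score is no worse than (no higher than) the reference ligand's, and Diversity is one minus the mean pairwise Tanimoto similarity between fingerprints of molecules generated for the same pocket. The sketch below assumes those definitions and represents fingerprints as sets of on-bit indices. It does not read metrics_-1.pt, whose layout is not described here, and the repository's scripts may differ in details such as the Vina mode used; all function names are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import AbstractSet, Dict, Sequence


def high_affinity(gen_scores: Sequence[float], ref_score: float) -> float:
    """Fraction of generated molecules scoring no worse than the reference.

    Vina scores are in kcal/mol, so lower (more negative) is better.
    """
    if not gen_scores:
        raise ValueError("need at least one generated molecule")
    return sum(score <= ref_score for score in gen_scores) / len(gen_scores)


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    # Two empty fingerprints get similarity 0.0 (a convention chosen for this sketch).
    return len(a & b) / union if union else 0.0


def diversity(fingerprints: Sequence[AbstractSet[int]]) -> float:
    """1 - mean pairwise Tanimoto similarity among molecules for one pocket."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        raise ValueError("need at least two molecules")
    return 1.0 - mean(tanimoto(a, b) for a, b in pairs)


def average_over_pockets(per_pocket: Dict[str, float]) -> float:
    """Average a per-pocket metric across all pockets."""
    return mean(per_pocket.values())
```

In practice the fingerprints would come from a cheminformatics toolkit (for example RDKit Morgan fingerprints); that choice is an assumption, not something this README specifies.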
For more metrics, reshape the meta file with eval_result_reshape.py and then refer to binddm/scripts/jsd_summary.py.
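The name jsd_summary.py suggests a Jensen-Shannon divergence (JSD) summary. As a reference for that quantity, the sketch below computes the JSD between two normalized histograms, for example of one geometric property measured on generated and on reference molecules; the helper names and the binning scheme are illustrative assumptions, not the script's implementation. Note that SciPy's `scipy.spatial.distance.jensenshannon` returns the Jensen-Shannon distance, i.e. the square root of the divergence, so check which of the two a summary reports.

```python
import math
from typing import List, Sequence


def normalized_histogram(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Bin values into [edges[i], edges[i+1]) and normalize counts to sum to 1.

    Values outside [edges[0], edges[-1]) are dropped.
    """
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(counts)):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    return [count / total for count in counts]


def _kl(p: Sequence[float], m: Sequence[float], base: float) -> float:
    # Terms with p_i == 0 contribute 0 (0 * log 0 := 0); m_i > 0 wherever p_i > 0.
    return sum(pi * math.log(pi / mi, base) for pi, mi in zip(p, m) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float], base: float = 2.0) -> float:
    """JSD(P || Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M = (P + Q) / 2.

    With base 2 the result lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m, base) + 0.5 * _kl(q, m, base)
```

Both histograms must share the same bin edges; identical distributions give 0 and distributions with disjoint support give 1 (in base 2).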
These meta files and the checkpoints are released on Hugging Face (HF).
We also conduct additional experiments on PDBbind.
Download and unzip the PDBbind refined set from https://www.pdbbind-plus.org.cn/download (PDBbind v2020 id=3).
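If you prefer to unpack the downloaded archive from Python rather than with `tar -xzf`, the standard-library `tarfile` module is enough. This is a minimal sketch: the function name `safe_extract_tgz` is hypothetical, and since the data-preparation command that follows takes the .tgz path directly, check whether that script already handles extraction before using it. The path check refuses archive members that would be written outside the destination, and on interpreters that support extraction filters the `"data"` filter additionally vets links and permissions.

```python
import os
import tarfile


def safe_extract_tgz(archive_path: str, dest_dir: str) -> None:
    """Extract a gzip-compressed tar archive into dest_dir.

    Members whose resolved path would land outside dest_dir are rejected.
    """
    dest_root = os.path.realpath(dest_dir)
    os.makedirs(dest_root, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Interpreters with extraction filters: also reject unsafe links and modes.
            tar.extractall(dest_root, members=members, filter="data")
        else:
            tar.extractall(dest_root, members=members)
```

For example, `safe_extract_tgz("PDBbind_refined_2020.tgz", ".")` unpacks into the current directory; the name of the top-level folder depends on the archive itself.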
Data preparation:
python dataset_prepare.py pdbbind PDBbind_refined_2020.tgz 100
# in BindDM
cd binddm
PYTHONPATH=. python scripts/data_preparation/clean_pdbbind.py PDBbind_refined_2020 --num_workers 64
PYTHONPATH=. python scripts/data_preparation/split_pl_dataset.py --path PDBbind_refined_2020 --dst PDBbind_refined_2020_pocket10_split.pt --fixed_split PDBbind_refined_2020/split_by_name.pt
Evaluate the dataset (test-set baseline):
# in BindDM
PYTHONPATH=. python scripts/eval_testset.py PDBbind_refined_2020_test --docking_mode vina_dock
Sample and evaluate:
# bd_pdbbind.yaml is for BindDM
# gbd_pdbbind.yaml is for BindDM+Ours (DiffGap)
python pipeline.py configs/bd_pdbbind.yaml sampling_results/binddm_pdbbind sample -p PDBbind_refined_2020_test
Citation:
@misc{liu2024gapdiff,
title={Bridging the Gap between Learning and Inference for Diffusion-Based Molecule Generation},
author={Peidong Liu and Wenbo Zhang and Xue Zhe and Jiancheng Lv and Xianggen Liu},
year={2024},
eprint={2411.05472},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2411.05472},
}