DiffGap

Official implementation of "Bridging the Gap between Learning and Inference for Diffusion-Based Molecule Generation".

Installation

To create the virtual environment, run the following command:

conda env create -f env.yml

Alternatively, set up the environment step by step, following the modified guidance in the TargetDiff Installation section.

Python < 3.10 is required for compatibility with Vina.
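If you want scripts to fail fast on an unsupported interpreter, a minimal guard could look like the sketch below. The helper names `python_ok_for_vina` and `require_python_for_vina` are hypothetical and not part of this repository; the only fact taken from this README is the Python < 3.10 requirement.

```python
import sys


def python_ok_for_vina(version_info=sys.version_info):
    """Return True if the interpreter satisfies the Python < 3.10 requirement.

    Hypothetical helper: only the version bound comes from the README.
    """
    return tuple(version_info[:2]) < (3, 10)


def require_python_for_vina(version_info=sys.version_info):
    """Raise a clear error instead of failing later inside the Vina bindings."""
    if not python_ok_for_vina(version_info):
        raise RuntimeError(
            f"Python {version_info[0]}.{version_info[1]} detected; "
            "Python < 3.10 is required for Vina compatibility."
        )
```

Calling `require_python_for_vina()` at the top of an entry script turns a confusing import failure into an explicit message.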

Data

Data preparation follows TargetDiff; for more details, please refer to the TargetDiff repository.

Usage

We use pipeline.py to wrap the whole pipeline of training, sampling, and evaluation for both projects (TargetDiff and BindDM).

python -m pipeline <configs> <sampling_results> [train|sample|eval] [-c resume_from_checkpoint_for_training]
# python -m pipeline configs/training.yml sampling_results/reproduce # whole pipeline
# python -m pipeline configs/sampling.yml sampling_results/reproduce sample # pipeline starting from sampling
# python -m pipeline "no matter" sampling_results/reproduce eval # evaluation only (the config argument does not matter)
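The usage line above can be read as a small command-line interface. The sketch below re-creates it with `argparse` purely for illustration; it is not the actual `pipeline.py`. The names `build_parser`, `stages_to_run`, and `resume_from` are hypothetical, other flags (such as `-p` in the PDBbind section) are not modeled, and it assumes each stage runs through to evaluation, which is how the comments above read.

```python
import argparse

# Stage order assumed from the usage line: train -> sample -> eval.
STAGES = ["train", "sample", "eval"]


def build_parser():
    """Hypothetical parser mirroring the documented usage; the real CLI may differ."""
    parser = argparse.ArgumentParser(prog="pipeline")
    parser.add_argument("config", help="YAML config (irrelevant for eval-only runs)")
    parser.add_argument("result_dir", help="directory for sampling results")
    parser.add_argument(
        "stage", nargs="?", choices=STAGES, default="train",
        help="stage to start from; omitting it runs the whole pipeline",
    )
    parser.add_argument(
        "-c", dest="resume_from", default=None,
        help="checkpoint to resume training from",
    )
    return parser


def stages_to_run(start):
    """Stages executed when the pipeline starts at `start`."""
    return STAGES[STAGES.index(start):]
```

For example, `build_parser().parse_args(["configs/sampling.yml", "sampling_results/reproduce", "sample"])` yields a start stage of `sample`, and `stages_to_run` then returns `["sample", "eval"]`.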

Alternatively, you can run the script for each stage manually, as in TargetDiff or BindDM.

We removed {train,sample,evaluate}.py from BindDM because they are merely copies of {train,sample,evaluate}_diffusion.py in scripts.

Note that we also provide scripts for plotting and for calculating metrics such as High Affinity and Diversity; these rely only on metrics_-1.pt (the meta file) generated during evaluation.
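To make the two metrics concrete, here is a minimal sketch under common TargetDiff-style definitions: High Affinity as the fraction of generated molecules for a pocket whose Vina score is better (lower) than the reference ligand's, and Diversity as the mean pairwise (1 - Tanimoto similarity) among molecules generated for the same pocket. It does not read `metrics_-1.pt`; the inputs are plain Python lists, fingerprints are modeled as sets of on-bit indices (in practice they would come from a cheminformatics toolkit), and the strict `<` comparison is an assumption. Per-pocket values would then be averaged across pockets.

```python
from itertools import combinations
from statistics import mean


def high_affinity(gen_scores, ref_score):
    """Fraction of generated molecules whose Vina score beats the reference.

    Lower (more negative) Vina scores mean stronger predicted binding.
    """
    if not gen_scores:
        return 0.0
    return sum(s < ref_score for s in gen_scores) / len(gen_scores)


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 1.0


def diversity(fingerprints):
    """Mean pairwise (1 - Tanimoto) over molecules generated for one pocket."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return mean(1.0 - tanimoto(a, b) for a, b in pairs)
```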

For more metrics, please refer to binddm/scripts/jsd_summary.py after reshaping the meta file with eval_result_reshape.py.
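Judging by its name, jsd_summary.py reports Jensen-Shannon divergences (JSD) between distributions of generated and reference molecules, for example bond-length distributions as in TargetDiff. The sketch below shows only that core computation on binned samples. The bin edges and the base-2 logarithm (which bounds JSD to [0, 1]) are assumptions, and nothing here reads the reshaped meta file. Note that SciPy's `scipy.spatial.distance.jensenshannon` returns the Jensen-Shannon *distance*, i.e. the square root of the divergence.

```python
import bisect
import math


def histogram(values, edges):
    """Normalized histogram over sorted bin `edges`; out-of-range values are dropped."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v <= edges[-1]:
            # The last bin is closed on the right so v == edges[-1] is counted.
            i = min(bisect.bisect_right(edges, v) - 1, len(counts) - 1)
            counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def jensen_shannon_divergence(p, q):
    """JSD between two discrete distributions, in bits (range [0, 1])."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing; y > 0 whenever x > 0 since y is the mixture.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```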

These meta files and checkpoints are released on Hugging Face (HF).

PDBbind

We conduct additional experiments on PDBbind.

Download and unzip the PDBbind refined set from https://www.pdbbind-plus.org.cn/download (PDBbind v2020 id=3).

Data preparation:

python dataset_prepare.py pdbbind PDBbind_refined_2020.tgz 100

# in BindDM
cd binddm
PYTHONPATH=. python scripts/data_preparation/clean_pdbbind.py PDBbind_refined_2020 --num_workers 64
PYTHONPATH=. python scripts/data_preparation/split_pl_dataset.py --path PDBbind_refined_2020 --dst PDBbind_refined_2020_pocket10_split.pt --fixed_split PDBbind_refined_2020/split_by_name.pt
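The `--fixed_split` option points the split script at a name-based split file. As an illustration only, the sketch below assumes that `split_by_name.pt` maps subset names to lists of complex identifiers and that the output split maps the same subsets to dataset indices; the real file layouts may differ, and loading/saving the `.pt` files (e.g. with `torch.load` / `torch.save`) is omitted. The helper names are hypothetical. The disjointness check guards against train/test leakage.

```python
def split_indices_by_name(entry_names, split_by_name):
    """Map a name-based split to dataset indices.

    entry_names: identifiers, one per processed dataset entry (assumed layout).
    split_by_name: e.g. {"train": [...names...], "test": [...names...]}.
    Names absent from the processed dataset (e.g. dropped during cleaning) are skipped.
    """
    position = {name: i for i, name in enumerate(entry_names)}
    return {
        subset: [position[n] for n in names if n in position]
        for subset, names in split_by_name.items()
    }


def check_disjoint(split):
    """Raise if any index appears in more than one subset."""
    seen = set()
    for subset, indices in split.items():
        overlap = seen.intersection(indices)
        if overlap:
            raise ValueError(f"{subset} overlaps another subset at {sorted(overlap)}")
        seen.update(indices)
```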

Evaluate the dataset (test-set baseline):

# in BindDM
PYTHONPATH=. python scripts/eval_testset.py PDBbind_refined_2020_test --docking_mode vina_dock
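The test-set baseline docks the reference ligands, and such scores are typically summarized by their mean and median. A minimal sketch, assuming the docking scores have already been collected into a list and that failed dockings are recorded as `None` (both assumptions, not the format produced by eval_testset.py):

```python
from statistics import mean, median


def summarize_scores(scores):
    """Mean and median of Vina docking scores (kcal/mol; lower is better).

    Entries equal to None (assumed to mark failed dockings) are excluded.
    """
    valid = [s for s in scores if s is not None]
    if not valid:
        return {"n": 0, "mean": None, "median": None}
    return {"n": len(valid), "mean": mean(valid), "median": median(valid)}
```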

Sample and evaluate:

# bd_pdbbind.yaml is for BindDM
# gbd_pdbbind.yaml is for BindDM+Ours (DiffGap)
python pipeline.py configs/bd_pdbbind.yaml sampling_results/binddm_pdbbind sample -p PDBbind_refined_2020_test
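To run both configurations with the same settings, the command above can be generated for each config file. In this sketch the BindDM command matches the one shown exactly; the output directory for the BindDM+Ours (DiffGap) run (`sampling_results/binddm_diffgap_pdbbind`) and the helper names are hypothetical, and `-p` is passed through unchanged without assuming anything further about it. `dry_run=True` only prints the commands.

```python
import shlex
import subprocess

# Config files named in the README; the dictionary keys are hypothetical labels.
CONFIGS = {
    "binddm": "configs/bd_pdbbind.yaml",          # BindDM
    "binddm_diffgap": "configs/gbd_pdbbind.yaml",  # BindDM+Ours (DiffGap)
}


def pipeline_command(config, result_dir, test_path="PDBbind_refined_2020_test"):
    """Build the sampling command in the same form as the README example."""
    return ["python", "pipeline.py", config, result_dir, "sample", "-p", test_path]


def build_all(result_root="sampling_results"):
    """One command per config, each writing to its own (hypothetical) result dir."""
    return {
        name: pipeline_command(cfg, f"{result_root}/{name}_pdbbind")
        for name, cfg in CONFIGS.items()
    }


def run_all(dry_run=True):
    for name, cmd in build_all().items():
        print(f"{name}: {shlex.join(cmd)}")
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failing run
```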

Citation

@misc{liu2024gapdiff,
      title={Bridging the Gap between Learning and Inference for Diffusion-Based Molecule Generation}, 
      author={Peidong Liu and Wenbo Zhang and Xue Zhe and Jiancheng Lv and Xianggen Liu},
      year={2024},
      eprint={2411.05472},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2411.05472}, 
}
