SeqGenerator

Enabling Diverse Enzyme Design through Function-Oriented Sequence-driven Diffusion Model

The training and generating process of our protein sequence diffusion model.

Setup:

The code is based on PyTorch and HuggingFace transformers.

pip install -r requirements.txt

Datasets

Prepare your datasets and place them under the datasets folder. See datasets/aspartese for an example.
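Before training, it can help to confirm that a dataset folder is laid out as expected. The sketch below assumes only what the --data_dir description states, namely that the folder contains train.csv and valid.csv; the helper name `missing_dataset_files` is hypothetical and not part of this repository, and the column layout of the CSV files is not checked.

```python
from pathlib import Path

# File names taken from the --data_dir argument description.
# The helper itself is hypothetical and not part of this repository.
REQUIRED_FILES = ("train.csv", "valid.csv")


def missing_dataset_files(data_dir):
    """Return the required file names that are absent from data_dir."""
    folder = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]
```

Calling `missing_dataset_files("datasets/aspartese")` from the repository root should return an empty list if the example dataset is in place.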

Training

cd scripts
bash train.sh

Arguments explanation:

--max_len: the maximum length of the natural sequences
--min_len: the minimum length of the natural sequences
--dataset: the name of the dataset, used only as a label
--data_dir: the path to the saved dataset folder, which contains train.csv and valid.csv
--resume_checkpoint: if not none, restore this checkpoint and continue training
--model_path: the path to the pretrained ESM-2 model. Here we use "esm2_t30_150M_UR50D", which can be downloaded here and placed in "diffusion_models/esm_orig"
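To make the --min_len and --max_len bounds concrete, here is a minimal sketch of one plausible reading, in which natural sequences outside the range are excluded. This is an assumption for illustration: the function `filter_by_length` is hypothetical, the bounds are treated as inclusive, and the repository's own preprocessing may instead truncate or pad.

```python
def filter_by_length(sequences, min_len, max_len):
    """Keep sequences whose length lies in [min_len, max_len].

    Inclusive bounds are an assumption, not confirmed by the repository.
    """
    return [seq for seq in sequences if min_len <= len(seq) <= max_len]


# Toy amino-acid strings, for illustration only.
example = ["MK", "MKTA", "MKTAYIAK"]
kept = filter_by_length(example, min_len=3, max_len=5)
```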

Generating

Before generating, set model_dir to the path of the model obtained in the training stage.

cd scripts
bash run_decode.sh

Arguments explanation:

--model_dir: the folder of the model obtained in the training stage. Our trained model can be accessed here; for generating, put the model and 'training_args.json' in this folder
--seq_len_sample: sample the length of each generated sequence from the lengths of the natural sequences of this family
--max_len: the maximum length of the generated sequence
--min_len: the minimum length of the generated sequence
--seq_num: the number of sequences generated
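The interaction between --seq_len_sample, --min_len, --max_len, and --seq_num can be sketched as follows. This is an illustrative assumption about how the options might combine, not the code in sample_seq2seq.py: the function `sample_lengths` and its `seed` parameter are hypothetical, and lengths are drawn uniformly from the natural lengths that fall within the inclusive bounds.

```python
import random


def sample_lengths(natural_lengths, seq_num, min_len, max_len, seed=None):
    """Draw seq_num target lengths from the observed natural lengths.

    Only natural lengths inside [min_len, max_len] are eligible, so the
    sampled lengths follow the empirical distribution within the bounds.
    """
    rng = random.Random(seed)
    candidates = [n for n in natural_lengths if min_len <= n <= max_len]
    if not candidates:
        raise ValueError("no natural length falls within [min_len, max_len]")
    return [rng.choice(candidates) for _ in range(seq_num)]


# Toy family lengths, for illustration only.
lengths = sample_lengths([100, 250, 300, 900], seq_num=5,
                         min_len=200, max_len=400, seed=0)
```

Because duplicates in `natural_lengths` are kept, lengths that are common in the family are drawn more often.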

Acknowledgements

The code in this project is based on DiffuSeq and ESM-2. Special thanks to the original authors for their contributions to the open-source community.
