ProVADA: Conditional Generation of Protein Variants via Ensemble-Guided Test-Time Steering

This repository contains the official implementation of ProVADA (Protein Variant Adaptation), a computational method that adapts existing proteins by conditionally generating novel variants. Starting from a wild-type reference sequence, ProVADA steers the design process toward desired functional properties.

Publications & Presentations 📚

- Pre-Print
- Pacific Symposium on Biocomputing (PSB) 2026
  - Manuscript
  - Presentation Slides
- Final Paper (Coming Soon!)

What is ProVADA? 💡

At its core, ProVADA uses MADA (Mixture-Adaptation Directed Annealing), an iterative, population-based sampling algorithm, to explore protein sequence space. At each iteration, promising sequences are selected through a down-sampling and up-sampling step, partially masked, and then re-completed by a generator to produce new proposals. Each proposal is accepted or rejected based on its fitness score, which guides the population toward the desired properties.

Figure: An illustrated example of the MADA algorithm using ProteinMPNN as the generator.
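
The select, mask, re-complete, and accept/reject loop can be illustrated with a toy sketch. This is not the repository's implementation: the fitness function (fraction of lysine residues), the random-fill generator, the top-k resampling, the Metropolis-style acceptance rule, the geometric temperature schedule, and every function name below are assumptions chosen for illustration. In ProVADA itself, models such as ProteinMPNN or ESM3 act as generators, and the evaluators score real functional properties.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
MASK = "_"                          # hypothetical mask token


def toy_fitness(seq: str) -> float:
    """Hypothetical score: fraction of lysine (K) residues."""
    return seq.count("K") / len(seq)


def mask_positions(seq: str, n_mask: int, rng: random.Random) -> str:
    """Replace n_mask randomly chosen positions with the mask token."""
    idx = set(rng.sample(range(len(seq)), n_mask))
    return "".join(MASK if i in idx else c for i, c in enumerate(seq))


def fill_masks(masked: str, rng: random.Random) -> str:
    """Stand-in generator: fill each masked position with a random residue."""
    return "".join(rng.choice(ALPHABET) if c == MASK else c for c in masked)


def resample(population, scores, n_keep, rng):
    """Down-sample to the n_keep best sequences, then up-sample (with
    replacement) back to the original population size."""
    ranked = sorted(zip(scores, population), reverse=True)
    survivors = [seq for _, seq in ranked[:n_keep]]
    return [rng.choice(survivors) for _ in population]


def toy_mada(wild_type, pop_size=16, n_iters=50, n_mask=3, n_keep=4,
             t_start=0.1, t_end=0.01, seed=0):
    """Toy population-based annealing loop in the spirit of MADA."""
    rng = random.Random(seed)
    population = [wild_type] * pop_size
    for step in range(n_iters):
        # Geometric annealing schedule from t_start down to t_end (assumed).
        temp = t_start * (t_end / t_start) ** (step / max(n_iters - 1, 1))
        scores = [toy_fitness(s) for s in population]
        parents = resample(population, scores, n_keep, rng)
        next_pop = []
        for parent in parents:
            proposal = fill_masks(mask_positions(parent, n_mask, rng), rng)
            delta = toy_fitness(proposal) - toy_fitness(parent)
            # Metropolis-style rule (assumed): always accept improvements,
            # accept worse proposals less often as the temperature drops.
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                next_pop.append(proposal)
            else:
                next_pop.append(parent)
        population = next_pop
    return max(population, key=toy_fitness)


if __name__ == "__main__":
    best = toy_mada("MAGSTWLEPQMAGSTWLEPQ")
    print(best, toy_fitness(best))
```

Keeping selection, masking, generation, and acceptance in separate functions mirrors the description above and makes each stage easy to swap for a real model.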

Set up 🚧

We provide a startup script that installs all dependencies and sets up the conda environment provada-env. Use the following commands to create and activate the environment:

bash create_env.sh
conda activate provada-env
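
After activation, a short Python check can confirm that the interpreter is running inside the expected environment. This is a minimal sketch and not part of the repository: it relies on the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated, and the helper names are hypothetical.

```python
import os
from typing import Mapping, Optional

EXPECTED_ENV = "provada-env"  # environment name from the setup instructions


def active_conda_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the active conda environment, or None if unset."""
    return environ.get("CONDA_DEFAULT_ENV")


def in_expected_env(environ: Mapping[str, str] = os.environ,
                    expected: str = EXPECTED_ENV) -> bool:
    """True when the active conda environment matches the expected name."""
    return active_conda_env(environ) == expected


if __name__ == "__main__":
    if in_expected_env():
        print(f"OK: running inside {EXPECTED_ENV}")
    else:
        print(f"Active environment is {active_conda_env()!r}; "
              f"run 'conda activate {EXPECTED_ENV}' first")
```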

Example 🚀

We have provided a few example inputs in the inputs directory.

Renin Localization: inputs/renin

Nanobody Localization: inputs/nanobodies

Repository Structure 📂

provada-dev/
├── provada/                    # Main package source code
│   ├── components/            # Core components: Evaluators, Generators, Masking Strategies
│   │   ├── README.md          # Component system overview
│   │   ├── EVALUATORS.md      # Guide to creating custom scoring functions
│   │   ├── GENERATORS.md      # Guide to creating custom sequence generators
│   │   ├── MASKING.md         # Guide to creating custom masking strategies
│   │   ├── evaluator.py       # Evaluator base class and built-in evaluators
│   │   ├── generator.py       # Generator base class and built-in generators
│   │   └── masking.py         # Masking strategy base class and built-ins
│   ├── models/                # ML model wrappers (ESM3, ProteinMPNN, ESM2)
│   ├── sampler/               # Sampling algorithms (MADA, Rejection, etc.)
│   ├── sequences/             # Sequence processing and pairwise metrics
│   ├── utils/                 # Utilities (logging, multiprocessing, registry, etc.)
│   ├── base_variant.py        # Base variant class for starting protein
│   ├── paths.py               # Path configuration
│   └── README.md              # Package-level documentation
├── inputs/                    # Input files and configurations
│   ├── renin/                 # Example: renin localization experiment
│   └── nanobodies/            # Example: nanobody localization experiment
├── tests/                     # Test suite
├── results/                   # Output directory for experimental results
├── ProteinMPNN/               # Third-party ProteinMPNN integration
├── logs/                      # Application logs
├── wandb/                     # Weights & Biases experiment tracking
├── run_provada.py             # Main entry point for running experiments
├── run_multiple.py            # Run multiple experiments in parallel (multi-GPU)
└── conftest.py                # Pytest configuration

Core Components 🧩

ProVADA's modular design is built around three extensible component types:

- Evaluators: score protein sequences based on desired properties (localization, stability, etc.)
- Generators: generate new sequences by filling masked positions (ESM3, ProteinMPNN, etc.)
- Masking Strategies: adaptively select which positions to redesign (DUCB, Thompson Sampling, etc.)

See the Components README for detailed guides on creating custom components.
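
To make the division of labor concrete, the sketch below shows how the three component types might fit together in a single redesign step. Every class name, method name, and signature here is hypothetical. The actual base classes live in provada/components/ and may differ, so the component guides remain the authoritative reference. The masking strategy is a simple Beta-Bernoulli Thompson sampler, and the evaluator and generator are toy stand-ins.

```python
import random
from abc import ABC, abstractmethod
from typing import List

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
MASK = "_"  # hypothetical mask token


class Evaluator(ABC):  # hypothetical interface
    @abstractmethod
    def score(self, seq: str) -> float: ...


class Generator(ABC):  # hypothetical interface
    @abstractmethod
    def fill(self, masked: str) -> str: ...


class MaskingStrategy(ABC):  # hypothetical interface
    @abstractmethod
    def select(self, seq: str, n_mask: int) -> List[int]: ...

    @abstractmethod
    def update(self, positions: List[int], improved: bool) -> None: ...


class LysineFraction(Evaluator):
    """Toy evaluator: fraction of lysine (K) residues."""
    def score(self, seq: str) -> float:
        return seq.count("K") / len(seq)


class RandomFill(Generator):
    """Toy generator: fill masked positions with random residues."""
    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)

    def fill(self, masked: str) -> str:
        return "".join(self.rng.choice(ALPHABET) if c == MASK else c
                       for c in masked)


class ThompsonMasking(MaskingStrategy):
    """Thompson sampling with one Beta(alpha, beta) posterior per position."""
    def __init__(self, length: int, seed: int = 0):
        self.alpha = [1.0] * length  # 1 + count of improving redesigns
        self.beta = [1.0] * length   # 1 + count of non-improving redesigns
        self.rng = random.Random(seed)

    def select(self, seq: str, n_mask: int) -> List[int]:
        # Draw one sample per position and mask the n_mask largest draws.
        draws = [self.rng.betavariate(a, b)
                 for a, b in zip(self.alpha, self.beta)]
        order = sorted(range(len(seq)), key=lambda i: draws[i], reverse=True)
        return sorted(order[:n_mask])

    def update(self, positions: List[int], improved: bool) -> None:
        for i in positions:
            if improved:
                self.alpha[i] += 1.0
            else:
                self.beta[i] += 1.0


def redesign_step(seq, evaluator, generator, masker, n_mask):
    """Mask, regenerate, score, and keep the proposal only if it improves."""
    positions = masker.select(seq, n_mask)
    masked = "".join(MASK if i in positions else c for i, c in enumerate(seq))
    proposal = generator.fill(masked)
    improved = evaluator.score(proposal) > evaluator.score(seq)
    masker.update(positions, improved)
    return proposal if improved else seq
```

Because each role sits behind its own small interface in this sketch, a different scoring function or masking rule can be substituted without touching the step logic.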

Tests

Run the test suite to verify that everything works:

pytest
