GQP: Gated Query Pooling for Peptide Developability

Official implementation of GQP for sequence-only peptide developability screening.

Designing peptides that target microplastics is intrinsically multi-objective: sequence motifs that promote adsorption to hydrophobic polymers often raise developability risks, including hemolysis, non-specific adsorption, and poor aqueous solubility. In this paper, we show that accurate developability screening is achievable from sequence alone by focusing on the readout, the step that converts token-level foundation-model representations into peptide-level decisions. We introduce gated query pooling (GQP), a lightweight, backbone-agnostic evidence-selection head that learns a small set of query vectors to extract complementary signals from protein language model embeddings and gates them adaptively for each peptide. Across three developability tasks, GQP paired with sequence-only backbones reaches 91.09% accuracy for hemolysis, 86.30% for non-fouling, and 75.56% for solubility. This matches or exceeds representative sequence-only baselines and the AlphaFold-structure-augmented Multi-Peptide baseline, and GQP performs better than these baselines when labeled data are limited. Beyond predictive accuracy, attention diagnostics and controlled counterfactual substitutions yield residue-level, testable design rules that link model outputs to actionable sequence edits. Finally, combining these developability constraints with PepBD-derived affinity scores for polyethylene, polypropylene, and polyethylene terephthalate enables scalable multi-objective prioritization of microplastic-binding candidates and reveals non-fouling as a dominant feasibility bottleneck. Coarse-grained molecular dynamics triage provides complementary physical evidence that the PepBD-prioritized selections are plausible.
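The description above presents GQP only at a high level. The following is a minimal, framework-free sketch of one way a gated query pooling head could work, assuming scaled dot-product attention for each query and a softmax gate over the queries conditioned on the mean residue embedding. The parameter names (`queries`, `gate_w`, `gate_b`) and the gating form are illustrative assumptions, not the code in this repository; a real head would use learned tensors in a deep-learning framework and mask padding tokens.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gated_query_pool(tokens, queries, gate_w, gate_b):
    """Pool L token embeddings (each of length d) into one d-dim peptide vector.

    tokens  : L x d residue embeddings from a protein language model
    queries : K x d learned query vectors (hypothetical parameters)
    gate_w  : K x d gate weights; gate_b: K gate biases (hypothetical)
    Padding masks are omitted: every row of `tokens` is a real residue.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    attn, evidence = [], []
    for q in queries:
        # Each query scores every residue and pools its own evidence vector.
        a = softmax([scale * dot(q, h) for h in tokens])
        attn.append(a)
        evidence.append([sum(w * h[j] for w, h in zip(a, tokens)) for j in range(d)])
    # Per-peptide gate over the K queries, conditioned on the mean embedding.
    mean_h = [sum(h[j] for h in tokens) / len(tokens) for j in range(d)]
    gates = softmax([dot(w, mean_h) + b for w, b in zip(gate_w, gate_b)])
    # Gated mixture of the K evidence vectors gives the peptide representation.
    pooled = [sum(g * e[j] for g, e in zip(gates, evidence)) for j in range(d)]
    return pooled, gates, attn


def random_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    # K = 4 mirrors --prompt_tokens 4 below; that correspondence is assumed.
    L, d, K = 12, 8, 4
    pooled, gates, attn = gated_query_pool(
        random_matrix(L, d), random_matrix(K, d), random_matrix(K, d), [0.0] * K
    )
    print([round(g, 3) for g in gates])
```

A softmax gate makes the queries compete for each peptide; an independent sigmoid gate per query would be an equally plausible design.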

Setup

Run the following from the repository root (GQP/):

python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt

Training and Evaluation

Single-task training example (hemolysis task, hemo):

python train.py train \
  --train_json datasets/jsonl/hemo/train.jsonl \
  --val_json datasets/jsonl/hemo/val.jsonl \
  --out_dir runs/gqp_demo/hemo \
  --esm_backend hf \
  --esm_model facebook/esm2_t33_650M_UR50D \
  --pool_type prompt \
  --prompt_tokens 4 \
  --train_backbone
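This README does not document the layout of the JSONL records. A quick sanity check of a split before training can catch missing fields or severe class imbalance. The sketch below assumes one JSON object per line; the field names `sequence` and `label` are placeholders and should be replaced with the keys actually used in datasets/jsonl/<task>/*.jsonl.

```python
import json
from collections import Counter


def parse_jsonl_lines(lines, seq_key="sequence", label_key="label", source="<input>"):
    """Parse JSON Lines into (sequence, label) pairs.

    seq_key and label_key are placeholder field names (assumptions).
    """
    records = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines such as a trailing newline
        obj = json.loads(line)
        if seq_key not in obj or label_key not in obj:
            raise KeyError(f"{source}:{lineno} lacks '{seq_key}' or '{label_key}'")
        records.append((obj[seq_key], obj[label_key]))
    return records


def load_jsonl(path, seq_key="sequence", label_key="label"):
    """Read a .jsonl file from disk and parse it with parse_jsonl_lines."""
    with open(path, encoding="utf-8") as fh:
        return parse_jsonl_lines(fh, seq_key, label_key, source=str(path))


def summarize(records):
    """Count labels and report the sequence-length range of a split."""
    if not records:
        raise ValueError("empty split")
    lengths = [len(seq) for seq, _ in records]
    return {
        "n": len(records),
        "labels": dict(Counter(label for _, label in records)),
        "min_len": min(lengths),
        "max_len": max(lengths),
    }
```

For example, `summarize(load_jsonl("datasets/jsonl/hemo/train.jsonl"))` would report the split size, label counts, and length range, provided the placeholder keys match the real ones.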

Evaluation:

python train.py eval \
  --model_dir runs/gqp_demo/hemo \
  --test_json datasets/jsonl/hemo/val.jsonl
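The results above are reported as accuracy per task. As a sketch of how such a figure can be recomputed from saved predictions, the function below thresholds predicted probabilities and counts confusion-matrix cells. The probability input and the 0.5 threshold are assumptions; the output format of `train.py eval` is not documented here.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision and recall for 0/1 labels and predicted probabilities."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("y_true and y_prob must be non-empty and equally long")
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Guard against division by zero when no positives are predicted or present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Accuracy alone can look strong on an imbalanced split, so reporting precision and recall alongside it is a cheap safeguard.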

Batch run over all three tasks (hemo, sol, nf):

bash train_gqp.sh
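The contents of train_gqp.sh are not reproduced in this README. Assuming it repeats the single-task command above for each task with the same flags and a datasets/jsonl/<task>/ layout, a Python equivalent might look like the sketch below. `train_cmd` and `run_all` are hypothetical helpers, and `sys.executable` stands in for `python`; for real runs, use the shell script.

```python
import shlex
import subprocess
import sys

TASKS = ("hemo", "sol", "nf")


def train_cmd(task, run_root="runs/gqp_demo"):
    """Build the single-task training command for one task (assumed layout)."""
    data = f"datasets/jsonl/{task}"
    return [
        sys.executable, "train.py", "train",
        "--train_json", f"{data}/train.jsonl",
        "--val_json", f"{data}/val.jsonl",
        "--out_dir", f"{run_root}/{task}",
        "--esm_backend", "hf",
        "--esm_model", "facebook/esm2_t33_650M_UR50D",
        "--pool_type", "prompt",
        "--prompt_tokens", "4",
        "--train_backbone",
    ]


def run_all(tasks=TASKS, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    cmds = [train_cmd(t) for t in tasks]
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing task
    return cmds


if __name__ == "__main__":
    run_all(dry_run=True)
```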

Diagnostics

Attention diagnostics:

python scripts/diagnostics/attention/eval_attn_char_stats.py \
  --model_dir runs/gqp_demo/hemo \
  --test_json datasets/jsonl/hemo/val.jsonl \
  --out_dir outputs/attention/hemo

Main output:

frequency_weighted_attention_mass_difference.png
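This README does not define the plotted quantity. One plausible reading of "frequency-weighted attention mass difference", offered only as a hypothetical sketch: normalize each peptide's attention to total mass 1, sum the mass placed on each residue type, average within each class, and subtract the negative-class mean from the positive-class mean. The input format (0/1 labels, one attention weight per residue, e.g. averaged over queries) is an assumption, not the behavior of eval_attn_char_stats.py.

```python
from collections import defaultdict


def attention_mass_difference(peptides):
    """Per-residue-type attention mass: positive-class mean minus negative-class mean.

    peptides: iterable of (sequence, label, attn), where label is 0 or 1 and
    attn holds one non-negative weight per residue (hypothetical input format).
    """
    totals = {0: defaultdict(float), 1: defaultdict(float)}
    counts = {0: 0, 1: 0}
    for seq, label, attn in peptides:
        if label not in counts:
            raise ValueError(f"label must be 0 or 1, got {label!r}")
        if len(seq) != len(attn):
            raise ValueError("need exactly one attention weight per residue")
        z = sum(attn)
        if z <= 0:
            raise ValueError("attention weights must have positive total mass")
        counts[label] += 1
        for aa, w in zip(seq, attn):
            # Summing (not averaging per occurrence) lets frequent residues
            # accumulate more mass: the assumed "frequency weighting".
            totals[label][aa] += w / z
    if not counts[0] or not counts[1]:
        raise ValueError("need peptides from both classes")
    residues = sorted(set(totals[0]) | set(totals[1]))
    return {aa: totals[1][aa] / counts[1] - totals[0][aa] / counts[0] for aa in residues}
```

Under this reading, positive values mark residue types that draw more attention in positive-class peptides, and the values sum to zero because each class mean carries total mass 1.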

Controlled counterfactual substitutions, summarized as controlled substitution effects (CSE):

python scripts/diagnostics/counterfactual/eval_ism_cse.py \
  --model_dir runs/gqp_demo/hemo \
  --jsonl datasets/jsonl/hemo/val.jsonl \
  --out_dir outputs/cse/hemo

Main outputs:

controlled_substitution_effect_heatmap_<task>.png
residue_intervenability_barplot_<task>.png
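To illustrate the kind of quantities behind these two plots, the sketch below runs plain single-site in-silico mutagenesis. It substitutes every residue with every other amino acid and records the change in a model score. It then aggregates the mean effect per (wild-type, substitute) pair for a heatmap and the mean absolute effect per wild-type residue for a barplot. How eval_ism_cse.py applies its "controls" and defines intervenability is not stated here; `score_fn` is a hypothetical stand-in for the trained model's predicted probability.

```python
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def substitution_effects(sequences, score_fn, alphabet=AMINO_ACIDS):
    """Single-site in-silico mutagenesis over a set of peptides.

    Returns:
      heatmap[(wt, mut)]   mean score change when wt is replaced by mut
      intervenability[wt]  mean |score change| over all substitutions of wt
    """
    sums, counts = defaultdict(float), defaultdict(int)
    abs_sums, abs_counts = defaultdict(float), defaultdict(int)
    for seq in sequences:
        base = score_fn(seq)
        for i, wt in enumerate(seq):
            for mut in alphabet:
                if mut == wt:
                    continue  # skip the identity substitution
                delta = score_fn(seq[:i] + mut + seq[i + 1:]) - base
                sums[(wt, mut)] += delta
                counts[(wt, mut)] += 1
                abs_sums[wt] += abs(delta)
                abs_counts[wt] += 1
    heatmap = {key: sums[key] / counts[key] for key in sums}
    intervenability = {aa: abs_sums[aa] / abs_counts[aa] for aa in abs_sums}
    return heatmap, intervenability
```

With the 20-letter alphabet, each peptide of length L costs 19 × L model evaluations, so batching the mutants through the model is advisable on large validation sets.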

Citation

@misc{chen2026gqp_peptide_developability,
  title        = {Rethinking Peptide Developability with Sequence-Only Models: Interpretable Screening of Microplastic-Binding Peptides with Gated Query Pooling},
  author       = {Guangyao Chen and Fengqi You},
  year         = {2026},
}
