PEESEgroup/GQP

GQP: Gated Query Pooling for Peptide Developability

Official implementation for sequence-only peptide developability screening with GQP.

Designing peptides for microplastic targeting is intrinsically multi-objective. Sequence motifs that promote adsorption to hydrophobic polymers frequently raise developability risks, including hemolysis, non-specific adsorption, and poor aqueous solubility. In this paper, we show that accurate developability screening can be achieved from sequence alone by focusing on the readout that converts token-level foundation-model representations into peptide-level decisions. We introduce gated query pooling (GQP), a lightweight, backbone-agnostic evidence-selection head. GQP learns a small set of query vectors that extract complementary signals from protein language model embeddings, and it gates those signals adaptively for each peptide. Across three developability tasks, GQP paired with sequence-only backbones attains 91.09% accuracy for hemolysis, 86.30% for non-fouling, and 75.56% for solubility. These results match or exceed representative sequence-only baselines and the AlphaFold-structure-augmented Multi-Peptide baseline, and GQP delivers superior performance when labeled data are limited. Beyond predictive accuracy, attention diagnostics and controlled counterfactual substitutions yield residue-level, testable design rules that connect model outputs to actionable sequence edits. Finally, integrating these developability constraints with PepBD-derived affinity scores for polyethylene, polypropylene, and polyethylene terephthalate supports scalable multi-objective prioritization of microplastic-binding candidates. This analysis reveals non-fouling as a dominant feasibility bottleneck. Coarse-grained molecular dynamics triage provides complementary physical evidence that the PepBD-prioritized selections are plausible.
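The pooling idea described above can be illustrated with a small NumPy sketch. It is not the repository's implementation. Several details are assumptions made for illustration. First, each query attends over the unpadded residue embeddings with scaled dot-product attention. Second, the per-peptide gate is a softmax over a linear projection of the masked mean embedding. Third, the names `gated_query_pool`, `W_gate` and `b_gate` are hypothetical. In the usage lines, K = 4 mirrors `--prompt_tokens 4` from the training example, on the unverified assumption that this flag sets the number of queries.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; -inf entries receive exactly zero weight.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def gated_query_pool(H, mask, Q, W_gate, b_gate):
    """Pool residue embeddings H (L, d) into one peptide vector (d,).

    Illustrative sketch only; the gate parameterization is hypothetical.
    H      : per-residue embeddings from a protein language model, shape (L, d)
    mask   : boolean array (L,), True for real residues, False for padding
    Q      : K learnable query vectors, shape (K, d)
    W_gate : hypothetical gate projection, shape (d, K)
    b_gate : hypothetical gate bias, shape (K,)
    """
    d = H.shape[1]
    # Scaled dot-product score of every query against every residue: (K, L).
    scores = Q @ H.T / np.sqrt(d)
    # Padding positions get -inf so their attention weight becomes 0.
    scores = np.where(mask[None, :], scores, -np.inf)
    attn = softmax(scores, axis=-1)            # (K, L); each row sums to 1
    Z = attn @ H                               # (K, d); one summary per query
    # Per-peptide gate over the K query summaries (assumed form).
    mean_h = H[mask].mean(axis=0)              # (d,)
    gates = softmax(mean_h @ W_gate + b_gate)  # (K,); sums to 1
    pooled = gates @ Z                         # (d,); gated mixture of summaries
    return pooled, attn, gates


# Toy usage with random stand-ins for PLM embeddings and learned parameters.
rng = np.random.default_rng(0)
L, d, K = 12, 16, 4
H = rng.normal(size=(L, d))
mask = np.array([True] * 10 + [False] * 2)  # last two positions are padding
Q = rng.normal(size=(K, d))
W_gate = rng.normal(size=(d, K))
b_gate = np.zeros(K)
pooled, attn, gates = gated_query_pool(H, mask, Q, W_gate, b_gate)
print(pooled.shape, attn.shape, gates.shape)  # (16,) (4, 12) (4,)
```

The per-query attention rows (`attn`) are the kind of quantity the attention diagnostics below would inspect. The pooled vector would feed a task-specific classifier head.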

Figure 1
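This README does not expose the multi-objective prioritization from the abstract as a command. The following sketch shows one simple way to combine developability predictions with a PepBD-derived affinity score. It applies hard thresholds on the three predicted probabilities, ranks the survivors by affinity, and counts how many candidates fail each constraint so the main bottleneck is visible. Every part of it is an illustrative assumption, not the paper's protocol: the `Candidate` fields, the 0.5 threshold, the toy values, and the convention that a lower PepBD score means stronger predicted binding (configurable via `lower_score_is_better`).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical fields; adapt them to your own prediction and PepBD tables.
    name: str
    p_hemolytic: float   # predicted probability of hemolysis (lower is better)
    p_nonfouling: float  # predicted probability of being non-fouling
    p_soluble: float     # predicted probability of being soluble
    pepbd_score: float   # PepBD-derived affinity score for one polymer


CONSTRAINTS = ("non-hemolytic", "non-fouling", "soluble")


def satisfied(c, threshold=0.5):
    """Return the set of developability constraints a candidate meets."""
    ok = set()
    if c.p_hemolytic < threshold:
        ok.add("non-hemolytic")
    if c.p_nonfouling >= threshold:
        ok.add("non-fouling")
    if c.p_soluble >= threshold:
        ok.add("soluble")
    return ok


def prioritize(cands, threshold=0.5, lower_score_is_better=True):
    """Keep candidates meeting all constraints, ranked by affinity score.

    Also returns the number of candidates failing each constraint.
    """
    failures = {name: 0 for name in CONSTRAINTS}
    feasible = []
    for c in cands:
        ok = satisfied(c, threshold)
        for name in CONSTRAINTS:
            if name not in ok:
                failures[name] += 1
        if len(ok) == len(CONSTRAINTS):
            feasible.append(c)
    feasible.sort(key=lambda c: c.pepbd_score, reverse=not lower_score_is_better)
    return feasible, failures


# Toy candidate pool with made-up numbers.
pool = [
    Candidate("cand_A", 0.10, 0.80, 0.70, -42.0),
    Candidate("cand_B", 0.20, 0.30, 0.90, -55.0),
    Candidate("cand_C", 0.05, 0.75, 0.60, -48.0),
    Candidate("cand_D", 0.70, 0.20, 0.40, -60.0),
]
ranked, failures = prioritize(pool)
print([c.name for c in ranked])  # ['cand_C', 'cand_A']
print(failures)  # {'non-hemolytic': 1, 'non-fouling': 2, 'soluble': 1}
```

Hard thresholds keep the sketch simple. A soft alternative would multiply the three feasibility probabilities into a single score before ranking.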

Setup

Run inside GQP/:

python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt

Training and Evaluation

Single-task training example (hemo = hemolysis):

python train.py train \
  --train_json datasets/jsonl/hemo/train.jsonl \
  --val_json datasets/jsonl/hemo/val.jsonl \
  --out_dir runs/gqp_demo/hemo \
  --esm_backend hf \
  --esm_model facebook/esm2_t33_650M_UR50D \
  --pool_type prompt \
  --prompt_tokens 4 \
  --train_backbone

Evaluation (this example evaluates on the validation split):

python train.py eval \
  --model_dir runs/gqp_demo/hemo \
  --test_json datasets/jsonl/hemo/val.jsonl

Batch run over all three tasks (hemo = hemolysis, sol = solubility, nf = non-fouling):

bash train_gqp.sh
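The contents of `train_gqp.sh` are not shown here. The Python driver below is therefore a hypothetical equivalent rather than a translation of the script. It reuses the exact flags from the single-task training and evaluation examples and loops over the three tasks. It assumes that the sol and nf datasets follow the same `datasets/jsonl/<task>/` layout as hemo, and that the hemo hyperparameters suit every task. The real script may differ on both points. By default it only prints the commands; pass `--run` to execute them.

```python
import shlex
import subprocess
import sys

TASKS = ("hemo", "sol", "nf")  # hemolysis, solubility, non-fouling


def train_cmd(task, out_root="runs/gqp_demo"):
    """Single-task training command from this README, parameterized by task."""
    data = f"datasets/jsonl/{task}"  # assumed layout for every task
    return [
        sys.executable, "train.py", "train",
        "--train_json", f"{data}/train.jsonl",
        "--val_json", f"{data}/val.jsonl",
        "--out_dir", f"{out_root}/{task}",
        "--esm_backend", "hf",
        "--esm_model", "facebook/esm2_t33_650M_UR50D",
        "--pool_type", "prompt",
        "--prompt_tokens", "4",
        "--train_backbone",
    ]


def eval_cmd(task, out_root="runs/gqp_demo"):
    """Matching evaluation command (validation split, as in the README)."""
    data = f"datasets/jsonl/{task}"
    return [
        sys.executable, "train.py", "eval",
        "--model_dir", f"{out_root}/{task}",
        "--test_json", f"{data}/val.jsonl",
    ]


def run_all(execute=False):
    for task in TASKS:
        for cmd in (train_cmd(task), eval_cmd(task)):
            print(shlex.join(cmd))
            if execute:
                subprocess.run(cmd, check=True)  # stop at the first failure


if __name__ == "__main__":
    # Dry run by default; pass --run to actually launch training/evaluation.
    run_all(execute="--run" in sys.argv[1:])
```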

Diagnostics

Attention diagnostics:

python scripts/diagnostics/attention/eval_attn_char_stats.py \
  --model_dir runs/gqp_demo/hemo \
  --test_json datasets/jsonl/hemo/val.jsonl \
  --out_dir outputs/attention/hemo

Main output:

  • frequency_weighted_attention_mass_difference.png
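This README does not define how the frequency-weighted attention mass difference is computed. The sketch below implements one plausible reading, which is an assumption and not the script's confirmed definition. For each amino-acid type, it sums the per-residue attention weights that type receives within each peptide. It then averages those sums within the positive and negative classes, takes the positive-minus-negative difference, and multiplies by the residue type's overall frequency so that rare residues are down-weighted. The per-residue attention input (for example, averaged over GQP queries) and the function names are hypothetical.

```python
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def attention_mass_by_residue(seq, attn):
    """Sum one peptide's per-position attention weights by amino-acid type."""
    mass = defaultdict(float)
    for aa, w in zip(seq, attn):
        mass[aa] += float(w)
    return mass


def weighted_mass_difference(seqs, attns, labels):
    """Frequency-weighted class difference in attention mass (assumed definition).

    seqs   : peptide strings
    attns  : per-residue attention weights, one sequence of floats per peptide
    labels : 0/1 class labels (1 = positive class)
    """
    totals = {0: defaultdict(float), 1: defaultdict(float)}
    n_per_class = {0: 0, 1: 0}
    counts = defaultdict(int)
    for seq, attn, y in zip(seqs, attns, labels):
        for aa, m in attention_mass_by_residue(seq, attn).items():
            totals[y][aa] += m
        n_per_class[y] += 1
        for aa in seq:
            counts[aa] += 1
    n_residues = sum(counts.values())
    result = {}
    for aa in AMINO_ACIDS:
        if counts.get(aa, 0) == 0:
            continue  # residue type never observed
        mean_pos = totals[1][aa] / max(n_per_class[1], 1)
        mean_neg = totals[0][aa] / max(n_per_class[0], 1)
        freq = counts[aa] / n_residues
        result[aa] = freq * (mean_pos - mean_neg)
    return result


# Toy usage: one positive and one negative peptide with made-up attention.
seqs = ["KKLL", "LLEE"]
attns = [[0.4, 0.4, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]
labels = [1, 0]
print(weighted_mass_difference(seqs, attns, labels))
```

Positive values indicate residue types that attract more attention in positive-class peptides than in negative-class ones.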

Controlled counterfactual substitutions (controlled substitution effect, CSE):

python scripts/diagnostics/counterfactual/eval_ism_cse.py \
  --model_dir runs/gqp_demo/hemo \
  --jsonl datasets/jsonl/hemo/val.jsonl \
  --out_dir outputs/cse/hemo

Main outputs:

  • controlled_substitution_effect_heatmap_<task>.png
  • residue_intervenability_barplot_<task>.png
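The script name (`eval_ism_cse.py`) suggests in-silico mutagenesis (ISM). The sketch below shows that general technique under stated assumptions; it does not reproduce the script's exact procedure. The scan substitutes every position of each peptide with each of the other 19 standard amino acids and records the change in a model score. It aggregates those changes into a 20×20 mean-effect matrix indexed by (original, substituted) residue, which is the kind of table a substitution-effect heatmap would plot. It also reports a per-residue "intervenability", assumed here to be the mean absolute score change over all substitutions of that residue type. `hydrophobic_fraction` is a toy stand-in for a trained model's predicted probability.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
IDX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def substitution_effects(seqs, predict):
    """Single-residue substitution scan.

    predict : callable mapping a peptide string to a scalar score.
    Returns (mean_effect, intervenability):
      mean_effect[i, j] = mean score change for AMINO_ACIDS[i] -> AMINO_ACIDS[j]
                          (NaN where that substitution never occurred)
      intervenability   = {residue: mean |score change| over its substitutions}
    """
    total = np.zeros((20, 20))
    count = np.zeros((20, 20))
    abs_total = np.zeros(20)
    abs_count = np.zeros(20)
    for seq in seqs:
        base = predict(seq)
        for i, wt in enumerate(seq):
            if wt not in IDX:
                continue  # skip non-standard residues
            for mut in AMINO_ACIDS:
                if mut == wt:
                    continue
                mutant = seq[:i] + mut + seq[i + 1:]
                delta = predict(mutant) - base
                total[IDX[wt], IDX[mut]] += delta
                count[IDX[wt], IDX[mut]] += 1
                abs_total[IDX[wt]] += abs(delta)
                abs_count[IDX[wt]] += 1
    mean_effect = np.divide(total, count, out=np.full_like(total, np.nan), where=count > 0)
    intervenability = {
        aa: abs_total[IDX[aa]] / abs_count[IDX[aa]]
        for aa in AMINO_ACIDS
        if abs_count[IDX[aa]] > 0
    }
    return mean_effect, intervenability


HYDROPHOBIC = set("AILMFVW")


def hydrophobic_fraction(seq):
    """Toy stand-in for a trained model: fraction of hydrophobic residues."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


effects, interv = substitution_effects(["KLLKLLK", "GSGSGS"], hydrophobic_fraction)
print(sorted(interv.items(), key=lambda kv: -kv[1])[:3])
```

With a real model, `predict` would wrap the trained GQP classifier. Comparing substitutions within the same sequence context is what makes the effects "controlled".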

Citation

@misc{chen2026gqp_peptide_developability,
  title        = {Rethinking Peptide Developability with Sequence-Only Models: Interpretable Screening of Microplastic-Binding Peptides with Gated Query Pooling},
  author       = {Guangyao Chen and Fengqi You},
  year         = {2026},
}
