PMAS — Polar-Multipole Amyloid Scoring

License: Apache 2.0 (engine) · Parameter table license: see file · v0.5

Closed-form four-feature peptide / protein amyloid scorer with a Chou-Fasman / Hovmöller structural sidecar and a two-sequence co-aggregation compatibility scorer. Calibrated on WaltzDB 2.0, iAmyP, and CPAD 2.0.

Live API: pmas.coracleresearch.com · interactive docs · OpenAPI schema

Calibration headline (v0.5)

Benchmark n AUC MCC (production) F1 (production) MCC (per-benchmark Youden-J)
WaltzDB 2.0 hex 1415 0.8677 0.5970 0.7509 0.5988
iAmyP test split 277 0.8810 0.5960 0.7500 0.6299
CPAD 2.0 L=7-10 pool 242 0.8393 0.5633
CPAD 2.0 L=7 50 0.8738 0.7111 0.9412 0.7111

5-fold stratified CV on WaltzDB hex: AUC 0.8681 ± 0.019, MCC 0.5800 ± 0.039.

Production MCC/F1 are at the calibrated length-aware thresholds that ship in release/parameters.yml (L=6 → 1.686, L=7 → 1.172). Per-benchmark Youden-J columns are provided for apples-to-apples comparison against tools that quote that convention. See release/calibration_notes.md for the full evidence set including ablations, paired-bootstrap CI deltas, and competitive landscape.
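
To make the two threshold conventions concrete, the following is a minimal sketch of how a Youden-J threshold and the matching MCC can be computed from scores and binary labels. It is not the code in this repository: the helper names and the toy data are invented for illustration, and it assumes a peptide is called amyloid-positive when its score is at or above the threshold. Only the two production thresholds are taken from the text above.

```python
import math

# Length-aware production thresholds quoted above (only L=6 and L=7 are given here).
PRODUCTION_THRESHOLDS = {6: 1.686, 7: 1.172}


def confusion(scores, labels, threshold):
    """Return (tp, fp, tn, fn), calling a peptide positive when score >= threshold.

    The ">=" direction is an assumption of this sketch.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def youden_j_threshold(scores, labels):
    """Pick the observed score that maximises J = TPR - FPR (first one wins ties)."""
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        tp, fp, tn, fn = confusion(scores, labels, threshold)
        tpr = tp / (tp + fn) if (tp + fn) else 0.0
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        if tpr - fpr > best_j:
            best_threshold, best_j = threshold, tpr - fpr
    return best_threshold, best_j


# Toy data, made up for illustration only (not from any benchmark).
toy_scores = [0.5, 1.0, 1.2, 1.5, 1.8, 2.0]
toy_labels = [0, 0, 1, 0, 1, 1]

threshold, j = youden_j_threshold(toy_scores, toy_labels)
print("Youden-J threshold:", threshold, "J:", round(j, 3))
print("MCC at that threshold:", round(mcc(*confusion(toy_scores, toy_labels, threshold)), 3))
```

A Youden-J threshold is fitted on the benchmark it is evaluated on, which is why those MCC values can exceed the production ones: the production thresholds are fixed in advance per peptide length.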

Quick start

CLI

git clone https://github.com/CharlesScottBradley/pmas.git
cd pmas
pip install -r requirements.txt

# Score a sequence.
python tools/score_engine.py --sequence KLVFFAE

# Compare a wildtype with a mutation.
python tools/mutation_scorer.py \
    DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA E22Q

# Annotate the structural sidecar.
python tools/structural_context.py \
    DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA A21G --hex-delta -0.025

# Render a per-residue PMAS profile plot.
python tools/visualize.py \
    DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA \
    --mutations E22Q --output abeta42_dutch.png

# Two-sequence co-aggregation compatibility.
ABETA="DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA"
ASYN="MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVHGVATVAEKTKEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKKDQLGKNEEGAPQEGILEDMPVDPDNEAYEMPSEEGYQDYEPEA"
python tools/coaggregation_scorer.py "$ABETA" "$ASYN" \
    --label-a abeta42 --label-b alpha_synuclein
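
The mutation arguments above (E22Q, A21G) use the usual wildtype-position-substitution shorthand with 1-based residue numbering. As a rough sketch of what that notation means, and not a copy of what tools/mutation_scorer.py does internally, the hypothetical helpers below parse such a code and apply it to a sequence, rejecting codes whose wildtype residue does not match.

```python
import re

# Aβ42, the same sequence passed to the CLI examples above.
ABETA42 = "DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA"

# One uppercase letter, a 1-based position, one uppercase letter (e.g. "E22Q").
_MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(code):
    """Split a point-mutation code into (wildtype, position, substitution)."""
    match = _MUTATION_RE.match(code)
    if match is None:
        raise ValueError(f"not a point-mutation code: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_mutation(sequence, code):
    """Return the mutated sequence, checking the wildtype residue first."""
    wildtype, position, substitution = parse_mutation(code)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    found = sequence[position - 1]  # convert 1-based position to 0-based index
    if found != wildtype:
        raise ValueError(f"{code}: expected {wildtype} at {position}, found {found}")
    return sequence[:position - 1] + substitution + sequence[position:]


dutch = apply_mutation(ABETA42, "E22Q")
print(ABETA42[16:23], "->", dutch[16:23])  # residues 17-23 before and after
```

Checking the wildtype residue catches off-by-one numbering mistakes, which are easy to make when a peptide is numbered relative to its precursor protein.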

Library

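from tools.score_engine import load_parameters, score_sequence
from tools.mutation_scorer import compare_mutations
from tools.coaggregation_scorer import compare_pair

# Example inputs (the same sequences as in the CLI examples above).
seq = "DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA"   # Aβ42
seq_a = seq
seq_b = "MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVHGVATVAEKTKEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKKDQLGKNEEGAPQEGILEDMPVDPDNEAYEMPSEEGYQDYEPEA"   # α-synuclein

params = load_parameters()                       # release/parameters.yml
result = score_sequence("KLVFFAE", params)       # ScoreResult dataclass
report = compare_mutations(seq, "E22Q")          # MutationReport dataclass
coagg  = compare_pair(seq_a, seq_b)              # CoaggregationReport dataclass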

Docker

# CLI image (entrypoint dispatcher across the 5 tools).
docker build -t pmas:0.5 -f release/docker/Dockerfile .
docker run --rm pmas:0.5 score --sequence KLVFFAE

# REST API image (FastAPI + Jinja landing page).
docker build -t pmas-api:0.5 -f server/Dockerfile .
docker run --rm -p 127.0.0.1:8000:8000 pmas-api:0.5
curl http://127.0.0.1:8000/v1/health

REST API (hosted)

curl -X POST https://pmas.coracleresearch.com/v1/score \
    -H "Content-Type: application/json" \
    -d '{"sequence": "KLVFFAE"}'

See server/README.md for the full endpoint catalog plus curl examples for every tool. Interactive Swagger UI at /docs; OpenAPI schema at /openapi.json.
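
For callers who prefer Python to curl, the same request can be sketched with the standard library alone. The endpoint path and JSON body mirror the curl command above; the function names are hypothetical, and because the response schema is not described here the sketch returns the parsed JSON untouched rather than assuming any field names.

```python
import json
import urllib.request

BASE_URL = "https://pmas.coracleresearch.com"


def build_score_request(sequence, base_url=BASE_URL):
    """Build (but do not send) the POST /v1/score request shown in the curl example."""
    body = json.dumps({"sequence": sequence}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/score",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_score(sequence, base_url=BASE_URL, timeout=30):
    """Send the request and return the decoded JSON response as-is."""
    request = build_score_request(sequence, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling post_score("KLVFFAE") performs the network request; pointing base_url at http://127.0.0.1:8000 targets the local Docker image from the previous section instead.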

Repo layout

tools/                     Engine + 5 CLI tools.
server/                    FastAPI app + Jinja landing page.
tests/                     233-test suite (run as `python tests/<file>.py`).
release/
├── parameters.yml         v0.5 calibrated parameter table.
├── calibration_notes.md   Evidence for every design choice.
├── canonical_amyloids/    10 curated amyloid sequences + per-residue CSVs + 22 figures.
├── examples/              10 worked walkthroughs (8 mutation + 2 co-aggregation).
├── docker/                CLI Docker image (entrypoint subcommand dispatcher).
├── human_proteome_scores.csv   Pre-computed PMAS scores for the full UniProt human proteome (n=20,416).
└── README.md              Release-side overview.
LICENSE                    Apache 2.0 (engine + REST API + tests).
LICENSE-PARAMETERS.md      Separate non-commercial terms for the calibrated parameter table.
requirements.txt           Pinned runtime + server dependencies.

What PMAS captures vs flags

PMAS scores hex-window chemistry. Where the in-vivo mechanism is driven by local chemistry, PMAS captures it cleanly. Where the mechanism is structural, conformational, or kinetic, PMAS flags it as out-of-domain via the narrative tags from the structural_context.py sidecar rather than silently returning a misleading score.
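
As an illustration of what "hex-window" implies for a point mutation, the sketch below enumerates overlapping six-residue windows and lists those that cover a given position. It assumes a stride of one residue and 1-based numbering, neither of which is stated here, and the function names are hypothetical; it says nothing about how PMAS scores each window.

```python
# Aβ42, as used in the quick-start examples.
ABETA42 = "DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA"


def hex_windows(sequence, width=6):
    """Yield (start, window) pairs with 1-based starts and a stride of 1 (assumed)."""
    for index in range(len(sequence) - width + 1):
        yield index + 1, sequence[index:index + width]


def windows_covering(sequence, position, width=6):
    """Return the windows that contain the 1-based residue position."""
    return [
        (start, window)
        for start, window in hex_windows(sequence, width)
        if start <= position < start + width
    ]


# Windows whose score could change when residue 22 is mutated.
for start, window in windows_covering(ABETA42, 22):
    print(start, window)
```

Under these assumptions a single substitution touches at most six windows, so any change in a hex-window score is confined to the immediate sequence neighbourhood of the mutation.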

Worked examples (full reproduction in release/examples/):

| Mutation | Δ score | Narrative tag | What it tells you |
|---|---|---|---|
| Aβ42 E22Q (Dutch) | +0.0409 | hex_propensity_mild_softening | clean capture, 2 created hex |
| Aβ42 E22G (Iowa) | +0.0168 | mixed_mechanisms | capture + structural risk flag |
| Aβ42 E22K (Italian) | +0.0107 | hex_propensity_dominant | clean capture (negative ctrl) |
| Tau P301L | +0.0072 | hex_propensity_mild_softening | local capture upstream of PHF6 |
| α-syn A53T (PD) | −0.0018 | structural_mechanism_dominant | out-of-domain (helix destab.) |
| IAPP S20G (T2D) | −0.0227 | structural_mechanism_dominant | out-of-domain (loop flex) |
| Aβ42 A21G (Flemish) | −0.0250 | structural_mechanism_dominant | out-of-domain (APP processing) |
| HTT polyQ Q21/Q36/Q42 | subthr. | (no mutation; explicit boundary) | out-of-domain (side-chain stack) |

Co-aggregation:

| Pair | max sim | n_compat | Tag |
|---|---|---|---|
| Aβ42 + α-syn | 0.864 | 5 | cross_seeding_compatible |
| Aβ42 + lysozyme D67H | 0.748 | 0 | mixed_compatibility (boundary: folded-core artefact) |

The explicit applicability boundary is PMAS's central interpretive contribution. AggBERT, Cordax, Aggrescan, and TANGO show the same direction failures on A53T / S20G / A21G / polyQ but do not surface the boundary explicitly; PMAS does.

License

  • Engine code, REST API, tests, documentation: Apache 2.0. See LICENSE.
  • Calibrated parameter table (release/parameters.yml, release/human_proteome_scores.csv, the per-residue canonical amyloid CSVs): non-commercial research use under the terms in LICENSE-PARAMETERS.md. Commercial use requires a separate license.

Citation

Preprint and DOI pending. For now, please cite as:

PMAS — Polar-Multipole Amyloid Scoring v0.5. Calibrated on WaltzDB 2.0 / iAmyP test split / CPAD 2.0. https://github.com/CharlesScottBradley/pmas

References

PMAS draws on canonical biochemistry references for the structural sidecar:

  • Chou, P.Y. & Fasman, G.D. (1974). Conformational parameters for amino acids in helical, β-sheet, and random coil regions calculated from proteins. Biochemistry 13, 211-222.
  • Pauling, L. & Corey, R.B. (1951). Configurations of polypeptide chains with favored orientations around single bonds. PNAS 37, 729-740.
  • Hovmöller, S., Zhou, T. & Ohlson, T. (2002). Conformations of amino acids in protein structures. Acta Cryst. D58, 768-776.
  • Eisenberg, D., Schwarz, E., Komaromy, M. & Wall, R. (1984). Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J. Mol. Biol. 179, 125-142.

Per-walkthrough disease-specific references are documented in each markdown file under release/examples/.
