Closed-form four-feature peptide / protein amyloid scorer with a Chou-Fasman / Hovmöller structural sidecar and a two-sequence co-aggregation compatibility scorer. Calibrated on WaltzDB 2.0, iAmyP, and CPAD 2.0.
Live API: pmas.coracleresearch.com · interactive docs · OpenAPI schema
| Benchmark | n | AUC | MCC (production) | F1 (production) | MCC (per-benchmark Youden-J) |
|---|---|---|---|---|---|
| WaltzDB 2.0 hex | 1415 | 0.8677 | 0.5970 | 0.7509 | 0.5988 |
| iAmyP test split | 277 | 0.8810 | 0.5960 | 0.7500 | 0.6299 |
| CPAD 2.0 L=7-10 pool | 242 | 0.8393 | — | — | 0.5633 |
| CPAD 2.0 L=7 | 50 | 0.8738 | 0.7111 | 0.9412 | 0.7111 |
5-fold stratified CV on WaltzDB hex: AUC 0.8681 ± 0.019, MCC 0.5800 ± 0.039.
Production MCC/F1 are at the calibrated length-aware thresholds
that ship in release/parameters.yml
(L=6 → 1.686, L=7 → 1.172). Per-benchmark Youden-J columns are
provided for apples-to-apples comparison against tools that quote
that convention. See release/calibration_notes.md
for the full evidence set including ablations, paired-bootstrap CI
deltas, and competitive landscape.
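
The difference between the production and Youden-J columns can be sketched in a few lines of Python. This is a minimal illustration, not the PMAS calibration code. The two thresholds are the ones quoted above. The helper names, the assumption that a score at or above the threshold counts as a positive call, and the toy data are all hypothetical.

```python
from math import sqrt

# Length-aware thresholds as quoted above from release/parameters.yml.
LENGTH_THRESHOLDS = {6: 1.686, 7: 1.172}


def predict(score: float, length: int) -> bool:
    """Assumption: a score at or above the threshold is a positive call."""
    return score >= LENGTH_THRESHOLDS[length]


def confusion(labels, preds):
    """Return (tp, tn, fp, fn) for boolean labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    tn = sum(1 for y, p in zip(labels, preds) if not y and not p)
    fp = sum(1 for y, p in zip(labels, preds) if not y and p)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    return tp, tn, fp, fn


def mcc(labels, preds) -> float:
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    tp, tn, fp, fn = confusion(labels, preds)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def youden_threshold(labels, scores) -> float:
    """Return the cutoff that maximises J = sensitivity + specificity - 1."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp, tn, fp, fn = confusion(labels, [s >= t for s in scores])
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t


# Toy hexapeptide scores and labels, invented for illustration only.
toy_scores = [0.4, 0.9, 1.3, 1.8, 2.1, 2.5]
toy_labels = [False, False, True, True, True, True]

production_preds = [predict(s, 6) for s in toy_scores]
youden_t = youden_threshold(toy_labels, toy_scores)
youden_preds = [s >= youden_t for s in toy_scores]

print("MCC at production threshold:", round(mcc(toy_labels, production_preds), 4))
print("MCC at Youden-J threshold:  ", round(mcc(toy_labels, youden_preds), 4))
```

On a single benchmark the Youden-J cutoff is tuned to that benchmark's own scores, so its MCC is typically an optimistic figure, while the production threshold is fixed in advance. That is why the table reports both.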
git clone https://github.com/CharlesScottBradley/pmas.git
cd pmas
pip install -r requirements.txt
# Score a sequence.
python tools/score_engine.py --sequence KLVFFAE
# Compare a wildtype with a mutation.
python tools/mutation_scorer.py \
DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA E22Q
# Annotate the structural sidecar.
python tools/structural_context.py \
DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA A21G --hex-delta -0.025
# Render a per-residue PMAS profile plot.
python tools/visualize.py \
DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA \
--mutations E22Q --output abeta42_dutch.png
# Two-sequence co-aggregation compatibility.
ABETA="DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA"
ASYN="MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVHGVATVAEKTKEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKKDQLGKNEEGAPQEGILEDMPVDPDNEAYEMPSEEGYQDYEPEA"
python tools/coaggregation_scorer.py "$ABETA" "$ASYN" \
--label-a abeta42 --label-b alpha_synuclein

from tools.score_engine import load_parameters, score_sequence
from tools.mutation_scorer import compare_mutations
from tools.coaggregation_scorer import compare_pair
params = load_parameters() # release/parameters.yml
result = score_sequence("KLVFFAE", params) # ScoreResult dataclass
report = compare_mutations(seq, "E22Q") # MutationReport dataclass
coagg = compare_pair(seq_a, seq_b) # CoaggregationReport dataclass

# CLI image (entrypoint dispatcher across the 5 tools).
docker build -t pmas:0.5 -f release/docker/Dockerfile .
docker run --rm pmas:0.5 score --sequence KLVFFAE
# REST API image (FastAPI + Jinja landing page).
docker build -t pmas-api:0.5 -f server/Dockerfile .
docker run --rm -p 127.0.0.1:8000:8000 pmas-api:0.5
curl http://127.0.0.1:8000/v1/health

curl -X POST https://pmas.coracleresearch.com/v1/score \
-H "Content-Type: application/json" \
-d '{"sequence": "KLVFFAE"}'

See server/README.md for the full endpoint catalog
plus curl examples for every tool. Interactive Swagger UI at
/docs; OpenAPI schema at /openapi.json.
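
The same `/v1/score` call can be made from Python using only the standard library. This is a minimal sketch: the endpoint and request body mirror the curl example, while the helper names `build_score_request` and `score_remote` are hypothetical, and the response is assumed to be a JSON object whose fields are not specified here (see /openapi.json for the actual schema).

```python
import json
import urllib.request

BASE_URL = "https://pmas.coracleresearch.com"


def build_score_request(sequence: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for /v1/score without sending it."""
    body = json.dumps({"sequence": sequence}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/score",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score_remote(sequence: str, base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the body (assumed to be a JSON object)."""
    request = build_score_request(sequence, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (performs a network call):
#   print(score_remote("KLVFFAE"))
# Against the local Docker image:
#   print(score_remote("KLVFFAE", base_url="http://127.0.0.1:8000"))
```

Separating request construction from sending keeps the payload easy to inspect or unit-test without touching the network.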
tools/ Engine + 5 CLI tools.
server/ FastAPI app + Jinja landing page.
tests/ 233-test suite (run as `python tests/<file>.py`).
release/
├── parameters.yml v0.5 calibrated parameter table.
├── calibration_notes.md Evidence for every design choice.
├── canonical_amyloids/ 10 curated amyloid sequences + per-residue CSVs + 22 figures.
├── examples/ 10 worked walkthroughs (8 mutation + 2 co-aggregation).
├── docker/ CLI Docker image (entrypoint subcommand dispatcher).
├── human_proteome_scores.csv Pre-computed PMAS scores for the full UniProt human proteome (n=20,416).
└── README.md Release-side overview.
LICENSE Apache 2.0 (engine + REST API + tests).
LICENSE-PARAMETERS.md Separate non-commercial terms for the calibrated parameter table.
requirements.txt Pinned runtime + server dependencies.
PMAS scores hex-window chemistry. Where the in-vivo mechanism is
local-chemistry-driven, PMAS captures it cleanly. Where the
mechanism is structural / conformational / kinetic, PMAS flags it
out-of-domain via the structural_context.py sidecar narrative
tags rather than scoring it silently wrong.
Worked examples (full reproduction in release/examples/):
| Mutation | Δ score | Narrative tag | What it tells you |
|---|---|---|---|
| Aβ42 E22Q (Dutch) | +0.0409 | hex_propensity_mild_softening | clean capture, 2 created hex |
| Aβ42 E22G (Iowa) | +0.0168 | mixed_mechanisms | capture + structural risk flag |
| Aβ42 E22K (Italian) | +0.0107 | hex_propensity_dominant | clean capture (negative ctrl) |
| Tau P301L | +0.0072 | hex_propensity_mild_softening | local capture upstream of PHF6 |
| α-syn A53T (PD) | −0.0018 | structural_mechanism_dominant | out-of-domain (helix destab.) |
| IAPP S20G (T2D) | −0.0227 | structural_mechanism_dominant | out-of-domain (loop flex) |
| Aβ42 A21G (Flemish) | −0.0250 | structural_mechanism_dominant | out-of-domain (APP processing) |
| HTT polyQ Q21/Q36/Q42 | subthr. | (no mutation; explicit boundary) | out-of-domain (side-chain stack) |
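
When post-processing many mutation reports, the narrative tags in the table above can be used to decide whether a Δ score should be interpreted at all. The sketch below is one way to do that. The tag strings are taken from the table, but the grouping into three buckets and the `triage` helper are illustrative assumptions, not part of the PMAS codebase.

```python
# Tag strings as they appear in the worked-examples table. The grouping
# into buckets is an assumption based on the "What it tells you" column.
IN_DOMAIN = {"hex_propensity_dominant", "hex_propensity_mild_softening"}
FLAGGED = {"mixed_mechanisms"}
OUT_OF_DOMAIN = {"structural_mechanism_dominant"}


def triage(tag: str, delta: float) -> str:
    """Return a short reading of a (tag, delta score) pair."""
    if tag in OUT_OF_DOMAIN:
        # The sidecar says the mechanism is not local hex chemistry,
        # so the sign of the delta should not be read as a prediction.
        return "out-of-domain: do not interpret the sign of the delta"
    if tag in FLAGGED:
        return f"in-domain with structural risk flag (delta {delta:+.4f})"
    if tag in IN_DOMAIN:
        return f"in-domain (delta {delta:+.4f})"
    return "unknown tag: inspect manually"


# Rows copied from the table above.
examples = [
    ("Abeta42 E22Q", "hex_propensity_mild_softening", 0.0409),
    ("Abeta42 E22G", "mixed_mechanisms", 0.0168),
    ("Abeta42 A21G", "structural_mechanism_dominant", -0.0250),
]
for name, tag, delta in examples:
    print(f"{name}: {triage(tag, delta)}")
```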
Co-aggregation:
| Pair | max sim | n_compat | Tag |
|---|---|---|---|
| Aβ42 + α-syn | 0.864 | 5 | cross_seeding_compatible |
| Aβ42 + lysozyme D67H | 0.748 | 0 | mixed_compatibility (boundary — folded-core artefact) |
The explicit applicability boundary is PMAS's central interpretive contribution. AggBERT, Cordax, Aggrescan, and TANGO fail in the same direction on A53T / S20G / A21G / polyQ but do not surface the boundary explicitly; PMAS does.
- Engine code, REST API, tests, documentation: Apache 2.0. See LICENSE.
- Calibrated parameter table (release/parameters.yml, release/human_proteome_scores.csv, the per-residue canonical amyloid CSVs): non-commercial research use under the terms in LICENSE-PARAMETERS.md. Commercial use requires a separate license.
Preprint and DOI pending. For now, please cite as:
PMAS — Polar-Multipole Amyloid Scoring v0.5. Calibrated on WaltzDB 2.0 / iAmyP test split / CPAD 2.0. https://github.com/CharlesScottBradley/pmas
PMAS draws on canonical biochemistry references for the structural sidecar:
- Chou, P.Y. & Fasman, G.D. (1974). Conformational parameters for amino acids in helical, β-sheet, and random coil regions calculated from proteins. Biochemistry 13, 211-222.
- Pauling, L. & Corey, R.B. (1951). Configurations of polypeptide chains with favored orientations around single bonds. PNAS 37, 729-740.
- Hovmöller, S., Zhou, T. & Ohlson, T. (2002). Conformations of amino acids in protein structures. Acta Cryst. D58, 768-776.
- Eisenberg, D., Schwarz, E., Komaromy, M. & Wall, R. (1984). Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J. Mol. Biol. 179, 125-142.
Per-walkthrough disease-specific references are documented in each
markdown file under release/examples/.