A deep learning model that predicts where on a substrate molecule a P450 enzyme will perform oxidation — not just whether it will.
Status: exploratory / work in progress. This is a learning project and a research sketch, not a finished or validated model. The code, design choices, and results below are all subject to change as I experiment, make mistakes, and learn. Nothing here should be treated as benchmarked, peer-reviewed, or production-ready.
Cytochrome P450 enzymes (CYPs) are responsible for oxidizing approximately 75% of all clinically used drugs. Knowing which P450 acts on a drug is useful; knowing where on the drug molecule it acts is transformative.
The site of metabolism (SOM) — the specific atom that gets oxidized — determines:
- The identity and toxicity of the resulting metabolite
- Drug-drug interaction potential when two drugs compete at the same CYP
- Whether a metabolite retains or loses pharmacological activity
- Where to install metabolic "blocking groups" during lead optimization
RegioP450/
├── data/
│   ├── raw/                      # Downloaded source databases
│   ├── processed/                # Featurized PyG graphs + ESM embeddings
│   └── splits/                   # Train/val/test JSON splits
├── src/
│   ├── data/
│   │   ├── featurizer.py         # Molecule → graph, protein → ESM embedding
│   │   ├── dataset.py            # PyTorch Dataset / DataLoader
│   │   └── preprocessing.py      # Raw data → labeled triples
│   ├── models/
│   │   ├── protein_encoder.py    # ESM2 wrapper + projection
│   │   ├── molecule_encoder.py   # GAT-based molecular graph encoder
│   │   ├── cross_attention.py    # Atom–residue cross-attention module
│   │   └── regio_p450.py         # Full RegioP450 model
│   ├── training/
│   │   ├── trainer.py            # Training loop, checkpointing, logging
│   │   └── losses.py             # Focal loss + auxiliary losses
│   ├── evaluation/
│   │   └── metrics.py            # Top-k accuracy, AUC, isoform breakdown
│   └── utils/
│       └── utils.py              # Logging, seeding, config helpers
├── scripts/
│   ├── prepare_data.py           # Download + preprocess raw data
│   ├── train.py                  # Launch training run
│   └── predict.py                # Run inference on new pairs
├── configs/
│   └── default.yaml              # All hyperparameters
├── environment.yml
└── README.md
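
The tree lists top-k accuracy under `src/evaluation/metrics.py` without defining it. As a minimal sketch, assuming the common site-of-metabolism convention that a molecule counts as correct when at least one true SOM is among its k highest-scored atoms, the metric could look like the following. The function name, argument names, and toy scores are hypothetical and are not taken from the repository.

```python
from typing import Sequence, Set


def top_k_accuracy(
    atom_scores: Sequence[Sequence[float]],
    true_soms: Sequence[Set[int]],
    k: int = 2,
) -> float:
    """Fraction of molecules with at least one true SOM among the k top-scored atoms.

    atom_scores[m][i] is the predicted score of atom i in molecule m.
    true_soms[m] is the set of atom indices labeled as SOMs for molecule m.
    (Hypothetical signature for illustration only.)
    """
    if len(atom_scores) != len(true_soms):
        raise ValueError("atom_scores and true_soms must have the same length")
    if not atom_scores:
        raise ValueError("at least one molecule is required")

    hits = 0
    for scores, soms in zip(atom_scores, true_soms):
        # Rank atom indices from highest to lowest predicted score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        # The molecule is a hit if any of its k best-ranked atoms is a true SOM.
        if soms.intersection(ranked[:k]):
            hits += 1
    return hits / len(atom_scores)


# Toy example with made-up scores for two molecules.
toy_scores = [
    [0.1, 0.7, 0.2, 0.05],       # molecule A: atom 1 is ranked first
    [0.4, 0.3, 0.2, 0.6, 0.1],   # molecule B: atoms 3 and 0 are ranked highest
]
toy_soms = [{1}, {1, 2}]

print(top_k_accuracy(toy_scores, toy_soms, k=1))  # 0.5
print(top_k_accuracy(toy_scores, toy_soms, k=3))  # 1.0
```

Scoring per molecule rather than per atom keeps the metric from being dominated by the many non-SOM atoms in each substrate. An isoform breakdown could be obtained by calling the same function on the subset of pairs for each CYP.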