popchanovska/Regio450

RegioP450: Atom-level metabolism prediction for P450 enzymes

A deep learning model that predicts where on a substrate molecule a P450 enzyme will perform oxidation — not just whether it will.


Status: exploratory / work in progress. This is a learning project and a research sketch, not a finished or validated model. The code, design choices, and results below are all subject to change as I experiment, make mistakes, and learn. Nothing here should be treated as benchmarked, peer-reviewed, or production-ready.

Research Summary

Motivation

Cytochrome P450 enzymes (CYPs) are responsible for oxidizing approximately 75% of all clinically used drugs. Knowing which P450 acts on a drug is useful; knowing where on the drug molecule it acts is transformative.

The site of metabolism (SOM) — the specific atom that gets oxidized — determines:

  • The identity and toxicity of the resulting metabolite
  • Drug-drug interaction potential when two drugs compete at the same CYP
  • Whether a metabolite retains or loses pharmacological activity
  • Where to install metabolic "blocking groups" during lead optimization

Project structure

RegioP450/
├── data/
│   ├── raw/                    # Downloaded source databases
│   ├── processed/              # Featurized PyG graphs + ESM embeddings
│   └── splits/                 # Train/val/test JSON splits
├── src/
│   ├── data/
│   │   ├── featurizer.py       # Molecule → graph, protein → ESM embedding
│   │   ├── dataset.py          # PyTorch Dataset / DataLoader
│   │   └── preprocessing.py    # Raw data → labeled triples
│   ├── models/
│   │   ├── protein_encoder.py  # ESM2 wrapper + projection
│   │   ├── molecule_encoder.py # GAT-based molecular graph encoder
│   │   ├── cross_attention.py  # Atom–residue cross-attention module
│   │   └── regio_p450.py       # Full RegioP450 model
│   ├── training/
│   │   ├── trainer.py          # Training loop, checkpointing, logging
│   │   └── losses.py           # Focal loss + auxiliary losses
│   ├── evaluation/
│   │   └── metrics.py          # Top-k accuracy, AUC, isoform breakdown
│   └── utils/
│       └── utils.py            # Logging, seeding, config helpers
├── scripts/
│   ├── prepare_data.py         # Download + preprocess raw data
│   ├── train.py                # Launch training run
│   └── predict.py              # Run inference on new pairs
├── configs/
│   └── default.yaml            # All hyperparameters
├── environment.yml
└── README.md
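
To make two of the pieces named in the tree more concrete, here is a rough, dependency-free sketch of a per-atom focal loss (the idea behind `losses.py`) and a top-k site-of-metabolism accuracy (the idea behind `metrics.py`). This is not the code in the repository. The function names, the defaults `gamma=2.0`, `alpha=0.25`, and `k=2`, and the rule that a molecule counts as a hit when any labeled SOM atom is among the k highest-scored atoms are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Set, Tuple


def focal_loss(prob: float, label: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss for a single atom.

    prob  : predicted probability that the atom is a site of metabolism
    label : 1 if the atom is a labeled SOM, otherwise 0
    gamma, alpha : assumed defaults, not values taken from the repository
    """
    eps = 1e-12  # guard against log(0)
    p_t = prob if label == 1 else 1.0 - prob          # probability of the true class
    alpha_t = alpha if label == 1 else 1.0 - alpha    # class-balancing weight
    # (1 - p_t) ** gamma down-weights atoms the model already classifies well.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


def top_k_hit(scores: Sequence[float], som_atoms: Set[int], k: int = 2) -> bool:
    """True if any labeled SOM atom index is among the k highest-scored atoms."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return any(i in som_atoms for i in ranked[:k])


def top_k_accuracy(molecules: List[Tuple[Sequence[float], Set[int]]], k: int = 2) -> float:
    """Fraction of molecules with at least one SOM atom in the top k."""
    if not molecules:
        return 0.0
    hits = sum(top_k_hit(scores, soms, k) for scores, soms in molecules)
    return hits / len(molecules)


# Toy inputs invented for illustration only; these are not model outputs.
toy = [
    ([0.1, 0.7, 0.2, 0.9], {1}),  # true SOM is ranked second
    ([0.8, 0.1, 0.3], {0}),       # true SOM is ranked first
]
top1 = top_k_accuracy(toy, k=1)
top2 = top_k_accuracy(toy, k=2)
```

Most atoms in a molecule are not sites of metabolism, so the per-atom labels are heavily imbalanced. That imbalance is the usual reason to reach for a focal loss over plain cross-entropy, and it is also why a ranking metric like top-k per molecule tends to be more informative than per-atom accuracy.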
