Graph Convolutional Matrix Factorization #442
Draft
PascalIversen wants to merge 1 commit into
Graph-convolutional matrix factorization for drug-response prediction. Predicts the
cell-line x drug response matrix as R = U V^T (as in ordinary matrix factorization),
but the latent factors are learned end-to-end by graph convolutions over
feature-similarity or prior-knowledge graphs (k-NN gene-expression graph for cell
lines, Morgan-fingerprint Tanimoto graph for drugs). Each convolution smooths a node's
embedding over its graph neighbours; the smoothed factors are read out by the dot
product, plus per-cell/per-drug biases and an optional MLP head.
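To make the readout concrete, here is a minimal NumPy sketch of a single-graph forward pass. It is an illustration under stated assumptions, not the code in this PR: the function names, the symmetric normalisation with self-loops, the single `tanh` layer, and the random inputs are all hypothetical, and the optional MLP head is omitted.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalisation D^-1/2 (A + I) D^-1/2, with self-loops added."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(a_norm, h, w):
    """One graph convolution: average over neighbours, linear map, tanh."""
    return np.tanh(a_norm @ h @ w)


def predict(a_cell, x_cell, w_cell, a_drug, x_drug, w_drug, b_cell, b_drug):
    """Response matrix R = U V^T plus per-cell and per-drug biases."""
    u = graph_conv(normalize_adjacency(a_cell), x_cell, w_cell)  # (n_cells, d)
    v = graph_conv(normalize_adjacency(a_drug), x_drug, w_drug)  # (n_drugs, d)
    return u @ v.T + b_cell[:, None] + b_drug[None, :]


# Toy example (assumed sizes): 4 cell lines, 3 drugs, latent dimension 2.
rng = np.random.default_rng(0)
a_cell = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
a_drug = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x_cell, x_drug = rng.normal(size=(4, 5)), rng.normal(size=(3, 6))
w_cell, w_drug = rng.normal(size=(5, 2)), rng.normal(size=(6, 2))
r_hat = predict(a_cell, x_cell, w_cell, a_drug, x_drug, w_drug, np.zeros(4), np.zeros(3))
print(r_hat.shape)
```

In a trained model the weight matrices and biases would be learned from the observed responses; here they are random, so only the shapes and the data flow are meaningful.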
Four selectable models, all registered in MULTI_DRUG_MODEL_FACTORY:
* GCMF - base single-graph model.
* RGCMF - relational/multi-graph variant (multi-omics cell graphs; pathway and
bioassay drug relations) fused by a relational graph convolution.
Works best on leave-cell-out in our experiments.
* PGCMF / PRGCMF - probabilistic variants with a heteroscedastic Gaussian-NLL head
emitting calibrated per-prediction aleatoric uncertainty.
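As a rough illustration of what a heteroscedastic Gaussian-NLL head optimises, the per-sample loss can be sketched as below. This is the textbook Gaussian negative log-likelihood, assuming the head outputs a mean and a log-variance per prediction; the function name and parameterisation are hypothetical and may differ from the PR's implementation.

```python
import math


def gaussian_nll(y, mean, log_var):
    """Negative log-likelihood of y under N(mean, exp(log_var))."""
    var = math.exp(log_var)
    return 0.5 * (math.log(2.0 * math.pi) + log_var + (y - mean) ** 2 / var)


# A large residual is penalised less when the model also predicts a large variance.
confident = gaussian_nll(y=3.0, mean=0.0, log_var=0.0)
uncertain = gaussian_nll(y=3.0, mean=0.0, log_var=2.0)
print(confident > uncertain)
```

Predicting the log-variance rather than the variance keeps the variance positive without any clipping.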
Includes hyperparameters.yaml for all four, bundled gzipped drug-relation resources
for RGCMF/PRGCMF, smoke tests (train/predict/save-load round-trip), docs (model page
+ toctree + usage table), and a check-added-large-files exclude for the bundled
resources.
Lint/type clean: flake8 0, mypy 0, black/isort clean, pre-commit green.
60e0898 to 9ec282e
Graph convolutional matrix factorization for drug-response prediction. Predicts the cell-line x drug response matrix as R = U V^T (as in ordinary matrix factorization), but the latent factors are learned end-to-end by graph convolutions over feature-similarity or prior-knowledge graphs. The relational version (RGCMF) can use multiple graphs for both drugs and cell lines.
The P-versions also produce aleatoric uncertainty estimates.
The main idea is to combine learning in primal and dual space: one can take a high-dimensional feature space, transform it into a similarity graph, and exploit it without blowing up the model complexity. In addition, the graph convolution smooths each embedding with those of its graph neighbours, which I hope prevents overfitting and helps generalization.
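A small sketch of the "features to similarity graph" step, assuming drugs are represented by the sets of "on" bits of their fingerprints. The helper names and the toy fingerprints are hypothetical, and the graph construction in this PR may differ (for example in how k is chosen or how edges are symmetrised).

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_graph(items, k, sim=tanimoto):
    """Symmetric k-NN adjacency: i and j are linked if either ranks the other in its top k."""
    n = len(items)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        ranked = sorted(others, key=lambda j: sim(items[i], items[j]), reverse=True)
        for j in ranked[:k]:
            adj[i][j] = adj[j][i] = 1
    return adj


# Toy fingerprints (made up): two pairs of structurally similar drugs.
fingerprints = [{1, 2, 3}, {2, 3, 4}, {10, 11}, {10, 11, 12}]
adjacency = knn_graph(fingerprints, k=1)
print(adjacency)
```

However large the fingerprint space is, the model only sees the n x n adjacency, which is the point of moving from features to similarities.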
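The variants are:
* GCMF - base single-graph model.
* RGCMF - relational/multi-graph variant (multi-omics cell graphs; pathway and bioassay drug relations) fused by a relational graph convolution. Works best on leave-cell-out (LCO) in my experiments.
* PGCMF / PRGCMF - probabilistic variants with a heteroscedastic Gaussian-NLL head emitting calibrated per-prediction uncertainty.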
Per my experiments this is chocolate-worthy, but I will rerun it with the leaderboard scripts.
I will remove the CSVs etc. before merging; I just wanted to back them up here.