
Graph Convolutional Matrix Factorization #442

Draft
PascalIversen wants to merge 1 commit into development from feat/gcmf-geometric-mf

@PascalIversen

Collaborator

Graph convolutional matrix factorization for drug-response prediction. It predicts the cell-line x drug response matrix as R = U V^T, as in ordinary matrix factorization, but the latent factors are learned end-to-end by graph convolutions over feature-similarity or prior-knowledge graphs. The relational GCMF version can use multiple graphs for both drugs and cell lines.
The P-variants also produce aleatoric uncertainty estimates.
The main idea is to combine learning in primal and dual space: one can take a high-dimensional feature space, transform it into a similarity graph, and exploit it without blowing up the model complexity. The graph convolution also smooths each embedding with those of its neighboring nodes, which I hope prevents overfitting and helps generalization.
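
To make the idea concrete, here is a minimal NumPy sketch of the forward pass, not the code in this PR. It assumes correlation as the feature similarity, a symmetrized k-NN graph with self-loops, a single GCN-style layer with a tanh nonlinearity, and random stand-ins for the learnable embeddings and weights; the helper names `normalized_adjacency` and `graph_conv` are hypothetical.

```python
import numpy as np


def normalized_adjacency(sim, k):
    """Build a symmetric k-NN graph from a similarity matrix and return
    D^-1/2 (A + I) D^-1/2 (assumed GCN-style normalization)."""
    n = sim.shape[0]
    s = sim.astype(float).copy()
    np.fill_diagonal(s, -np.inf)            # never pick a node as its own neighbour
    nbrs = np.argsort(-s, axis=1)[:, :k]    # k most similar nodes per row
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    adj = np.maximum(adj, adj.T)            # symmetrize
    adj += np.eye(n)                        # self-loops
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(a_hat, h, w):
    """One graph convolution: average over neighbours, then a linear map and tanh."""
    return np.tanh(a_hat @ h @ w)


rng = np.random.default_rng(0)
n_cells, n_drugs, dim = 6, 4, 8

# High-dimensional features are only used to build the graphs (dual space);
# the number of model parameters depends on `dim`, not on the feature count.
x_cell = rng.normal(size=(n_cells, 50))
x_drug = rng.normal(size=(n_drugs, 30))
a_cell = normalized_adjacency(np.corrcoef(x_cell), k=2)
a_drug = normalized_adjacency(np.corrcoef(x_drug), k=2)

# Random stand-ins for the learnable node embeddings and layer weights.
e_cell = rng.normal(size=(n_cells, dim))
e_drug = rng.normal(size=(n_drugs, dim))
w_cell = rng.normal(size=(dim, dim)) / np.sqrt(dim)
w_drug = rng.normal(size=(dim, dim)) / np.sqrt(dim)

u = graph_conv(a_cell, e_cell, w_cell)      # smoothed cell-line factors
v = graph_conv(a_drug, e_drug, w_drug)      # smoothed drug factors
r_hat = u @ v.T                             # predicted response matrix R = U V^T
```

In a trained model the embeddings and weights would be fitted by gradient descent on the observed entries of R; the sketch only shows how the graphs enter the factorization.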

The variants are:

  • GCMF: base single-graph model.
  • RGCMF: relational/multi-graph variant (currently multi-omics cell graphs plus pathway and
    bioassay drug relations), fused by a relational graph convolution. Works best on LCO (leave-cell-out) in my experiments.
  • PGCMF / PRGCMF: probabilistic variants with a heteroscedastic Gaussian-NLL
    head emitting calibrated per-prediction uncertainty.
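
For readers unfamiliar with the heteroscedastic Gaussian-NLL objective used by the probabilistic variants, the following is a minimal sketch of the standard loss, assuming the head outputs a mean and a log-variance per prediction. The function name `gaussian_nll` and the log-variance parameterization are assumptions for illustration, not necessarily what the PR implements.

```python
import numpy as np


def gaussian_nll(y, mean, log_var):
    """Mean negative log-likelihood of y under N(mean, exp(log_var)),
    with a separate variance for every prediction."""
    sq_err = (y - mean) ** 2
    per_item = 0.5 * (log_var + sq_err / np.exp(log_var) + np.log(2.0 * np.pi))
    return float(np.mean(per_item))


y = np.array([1.0, 3.0])
mean = np.array([0.0, 1.0])

# Constant unit variance: the loss reduces to a scaled MSE plus a constant.
nll_const = gaussian_nll(y, mean, np.zeros(2))

# Per-prediction variance matched to the squared residual gives a lower loss,
# which is what lets the model report larger uncertainty on harder examples.
nll_matched = gaussian_nll(y, mean, np.log((y - mean) ** 2))
```

Predicting the log-variance rather than the variance keeps the variance positive without an explicit constraint.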

Per my experiments this is chocolate-worthy, but I will rerun it with the leaderboard scripts.

I will remove the CSVs etc.; I just wanted to back them up here.

@PascalIversen PascalIversen marked this pull request as draft June 23, 2026 14:09
Graph-convolutional matrix factorization for drug-response prediction. Predicts the
cell-line x drug response matrix as R = U V^T (as in ordinary matrix factorization),
but the latent factors are learned end-to-end by graph convolutions over
feature-similarity or prior-knowledge graphs (k-NN gene-expression graph for cell
lines, Morgan-fingerprint Tanimoto graph for drugs). Each convolution smooths a node's
embedding over its graph neighbours; the smoothed factors are read out by the dot
product, plus per-cell/per-drug biases and an optional MLP head.

Four selectable models, all registered in MULTI_DRUG_MODEL_FACTORY:
* GCMF   - base single-graph model.
* RGCMF  - relational/multi-graph variant (multi-omics cell graphs; pathway and
           bioassay drug relations) fused by a relational graph convolution.
           Works best on leave-cell-out in our experiments.
* PGCMF / PRGCMF - probabilistic variants with a heteroscedastic Gaussian-NLL head
           emitting calibrated per-prediction aleatoric uncertainty.

Includes hyperparameters.yaml for all four, bundled gzipped drug-relation resources
for RGCMF/PRGCMF, smoke tests (train/predict/save-load round-trip), docs (model page
+ toctree + usage table), and a check-added-large-files exclude for the bundled
resources.

Lint/type clean: flake8 0, mypy 0, black/isort clean, pre-commit green.
@PascalIversen PascalIversen force-pushed the feat/gcmf-geometric-mf branch from 60e0898 to 9ec282e Compare June 23, 2026 14:11