🧬 GLIMPS-GNN

Graph-based Liquid-biopsy Inductive Modeling for PreeclampSia

GNN Challenge: cfRNA → Placenta Inductive Prediction

This repository hosts a prediction-only challenge on maternal-fetal health modeling with graph learning. Participants run their own code outside this repository and submit predictions only; CI scores each submission against hidden labels.

🏆 Click me to join competition

Scientific Focus

Inductive graph learning across cfRNA and placental transcriptomics to detect maternal-fetal health issues.
Learn transferable representations that generalize to unseen samples and domains rather than treating each dataset independently.

Alignment with BASIRA Lab's Mission

Prioritizes robust generalization across heterogeneous datasets.
Uses compute-efficient, non-data-hungry graph learning methods that can run on standard hardware.

Inspiration from GNN Literature

Draws from studies on inductive learning, message passing, and representation transfer.
Model design follows DGL Lectures 1.1-4.6, covering:
- Graph construction from tabular data
- Node feature encoding
- Neighborhood aggregation (GraphSAGE-style inductive updates)
- Mini-batch training via neighborhood sampling
- Inductive inference on unseen nodes

Overview

Task: Binary classification (0=Control, 1=Preeclampsia)
Setting: Inductive transfer from cfRNA (train) to placenta (test)
Primary metric: F1 Score
Additional metrics: Accuracy, Precision, Recall
Public leaderboard: Auto-updated after merged submissions
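
For local validation, the four metrics listed above can be computed directly from confusion counts. The following is a minimal sketch, assuming F1 is taken on the positive class (1=Preeclampsia); the function name binary_metrics is hypothetical and the repository's scoring script may differ in details such as averaging mode.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for 0/1 labels (1 = Preeclampsia)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Guard against empty denominators (e.g. no positive predictions).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only, not competition data.
metrics = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(metrics)
```

Because the primary metric is F1 rather than accuracy, a model that predicts only the majority class scores 0 on the positive class even when its accuracy looks reasonable.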

Dataset Source and Description

Source

Public datasets from Gene Expression Omnibus (GEO, NIH)
Maternal plasma cfRNA: GSE192902
Placental RNA-seq: GSE234729

Data Splits

Training set: cfRNA samples
Test set: placenta samples (unseen during training)
Labels: binary disease status

Purpose and Integration Goal

Identify and validate cfRNA biomarkers for early prediction of preeclampsia, often before clinical symptoms appear.
Support research in maternal-fetal health and early detection of preeclampsia.
Integrate gene expression and clinical metadata to capture subtle risk patterns while handling noisy and imbalanced data for robust and equitable predictions.

🧩 Mandatory Graph Specification

This competition explicitly provides both required graph components:

Adjacency matrix A: data/public/adjacency_matrix.csv
Node feature matrix X: derived from data/public/train.csv and data/public/test.csv

Related graph files:

data/public/graph_edges.csv
data/public/node_types.csv
data/public/graph_artifacts.pt

Interpretation:

A[i, j] = 1 if there is an edge between nodes i and j, and 0 otherwise
X is node-by-feature and includes harmonized expression features and released covariates
Node alignment is by node_id; use data/public/test_nodes.csv (and node files) as the ordering reference so rows in X correspond to the same nodes indexed in A.
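
The alignment rule above can be sketched as follows: fix one node_id ordering, then index both A and X by it. This is an illustration under stated assumptions, not the repository's loader. The helper names, the undirected-edge assumption, and the toy node_ids are all hypothetical; the actual column layout of the CSV files is not described here and should be checked before adapting this.

```python
def build_adjacency(node_order, edges):
    """Dense 0/1 adjacency matrix whose rows and columns follow node_order."""
    index = {node_id: i for i, node_id in enumerate(node_order)}
    n = len(node_order)
    A = [[0] * n for _ in range(n)]
    for src, dst in edges:
        i, j = index[src], index[dst]
        A[i][j] = 1
        A[j][i] = 1  # assumption: edges are undirected
    return A


def align_features(node_order, features_by_id):
    """Return X as a list of rows in the same order as node_order."""
    missing = [node_id for node_id in node_order if node_id not in features_by_id]
    if missing:
        raise KeyError(f"no features for node_ids: {missing}")
    return [features_by_id[node_id] for node_id in node_order]


# Toy data: the feature rows arrive in a different order than the node list.
node_order = ["s1", "s2", "s3"]
edges = [("s1", "s3"), ("s2", "s3")]
features_by_id = {"s3": [0.3, 1.0], "s1": [0.1, 0.0], "s2": [0.2, 1.0]}

A = build_adjacency(node_order, edges)
X = align_features(node_order, features_by_id)
print(A)
print(X)
```

Failing loudly on a missing node_id is deliberate: a silently misaligned X would still train, but row i of X would no longer describe node i of A.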

🌍 Dataset Difficulty and Realism

The benchmark includes meaningful modeling difficulty:

🧪 Noisy and partially missing metadata
⚖️ Label imbalance pressure
🧬 High-dimensional features relative to sample size (sparsity pressure)
🔄 Cross-domain distribution shift (cfRNA -> placenta)
🕸️ Inductive generalization to unseen test nodes
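
One common way to respond to the label imbalance listed above is to weight each class inversely to its frequency in the training loss. The sketch below uses the widely used "balanced" heuristic, n_samples / (n_classes * class_count); it is a suggestion, not something the starter code is confirmed to do, and the function name is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy labels for illustration only: 6 controls, 2 preeclampsia cases.
weights = balanced_class_weights([0, 0, 0, 0, 0, 0, 1, 1])
print(weights)
```

The resulting weights can be passed to a weighted cross-entropy loss so that errors on the minority class cost more during training.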

⏱️ Computational Affordability

Full training for this competition should not exceed 3 hours on CPU.
If needed, downsize graph complexity (for example by reducing node count, edge density, or neighborhood sampling size) while preserving task integrity.
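
As one way to reduce neighborhood sampling size, each node can be limited to a fixed number of randomly chosen neighbors (a "fanout"). The sketch below assumes the graph is held as an adjacency list; the function names and the fanout value are hypothetical and not taken from the starter code.

```python
import random


def sample_neighbors(adj_list, node, fanout, rng):
    """Return at most `fanout` neighbors of `node`, sampled without replacement."""
    neighbors = adj_list.get(node, [])
    if len(neighbors) <= fanout:
        return list(neighbors)
    return rng.sample(neighbors, fanout)


def cap_degree(adj_list, fanout, seed=0):
    """Apply the fanout cap to every node.

    Note: sampling per node can make the result asymmetric, i.e. u may keep v
    as a neighbor while v drops u.
    """
    rng = random.Random(seed)
    return {node: sample_neighbors(adj_list, node, fanout, rng) for node in adj_list}


# Toy graph: node 0 is a hub with four neighbors.
adj_list = {0: [1, 2, 3, 4], 1: [0], 2: [0, 3], 3: [0, 2], 4: [0]}
capped = cap_degree(adj_list, fanout=2)
print(capped)
```

A fixed seed keeps the reduced graph reproducible across runs, which matters when comparing models under the CPU time budget.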

Dataset Construction and Preprocessing

build_dataset.ipynb and Kaggle

Objective: Ensure structural compatibility for graph construction and inductive learning by processing expression data, parsing and cleaning metadata, and fusing expression with metadata.

Advanced GNN Implementation

advanced_GNN_model.py

Objective: Implement an advanced inductive GNN for cfRNA -> placenta prediction, ensuring generalizable node representations and inductive learning.

Key Components:

Graph Construction: Build hetero-graphs using similarity and ancestry edges.
Node Feature Encoding: Integrate gene expression and metadata into node-level features.
Neighborhood Aggregation: GraphSAGE-style layers with BatchNorm and ReLU for neighbor information propagation.
Mini-Batch Training: Use neighborhood sampling for efficient training on large graphs.
Inductive Inference: Generate predictions for unseen placenta nodes without label leakage.
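
To make the neighborhood aggregation and inductive inference steps concrete, here is a minimal pure-Python sketch of one GraphSAGE-style mean-aggregation update with ReLU. It is not the code in advanced_GNN_model.py: BatchNorm, bias terms, and training are omitted, and the weight matrix and toy features are made up for illustration.

```python
def mean_vector(vectors, dim):
    """Element-wise mean of a list of vectors; zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def sage_mean_layer(features, neighbors, weight):
    """h_v' = ReLU(W . concat(h_v, mean of neighbor features)).

    features:  {node: list of `dim` floats}
    neighbors: {node: list of neighbor nodes}
    weight:    list of output rows, each of length 2 * dim
    """
    dim = len(next(iter(features.values())))
    updated = {}
    for node, h_self in features.items():
        h_neigh = mean_vector([features[u] for u in neighbors.get(node, [])], dim)
        combined = h_self + h_neigh  # list concatenation: [self | neighborhood]
        updated[node] = [
            max(0.0, sum(w * x for w, x in zip(row, combined))) for row in weight
        ]
    return updated


# Toy inputs for illustration only.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": []}
weight = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]

hidden = sage_mean_layer(features, neighbors, weight)
print(hidden)
```

The layer depends only on a node's own features, its neighbors' features, and the shared weight matrix, so the same weights can be applied to placenta nodes that were never seen during training. That is what makes the update inductive rather than transductive.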

Starter Assets

starter_code/advanced_GNN_model.py
starter_code/baseline.py
starter_code/build_adjacency_matrix.py
starter_code/build_graph_artifacts.py

Submission Policy

Submission instructions are in CONTRIBUTING.md.

Key policy:

Only one submission attempt per participant (enforced in CI)
Submission files are public, but participant predictions are encrypted at rest (predictions.csv.enc); only CI, using organizer secrets, decrypts them for scoring.

Leaderboard

Public page: https://mubarraqqq.github.io/gnn-challenge/leaderboard.html
Source CSV: leaderboard/leaderboard.csv
Rendered markdown: leaderboard.md
Tie handling: equal scores share rank
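
The tie-handling rule can be illustrated with a short ranking sketch. It assumes standard competition ranking, where the rank after a tie skips ahead (1, 1, 3); the repository's renderer may instead use dense ranking (1, 1, 2). The function and team names are hypothetical.

```python
def rank_with_ties(scores):
    """scores: {participant: score}. Returns {participant: rank}, where 1 is best."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    prev_score, prev_rank = None, 0
    for position, (name, score) in enumerate(ordered, start=1):
        # Equal scores share the rank of the first participant in the tie.
        rank = prev_rank if score == prev_score else position
        ranks[name] = rank
        prev_score, prev_rank = score, rank
    return ranks


# Made-up scores for illustration only.
ranks = rank_with_ties({"team_a": 0.81, "team_b": 0.77, "team_c": 0.81, "team_d": 0.60})
print(ranks)
```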

Maintainer Regeneration Command

Use this command to regenerate all leaderboard outputs from the canonical pipeline:

python update_leaderboard.py && python competition/render_leaderboard.py

Citation

@dataset{gnn_challenge_2026,
  title={GNN Challenge: cfRNA -> Placenta Inductive GNN for Maternal-Fetal Health Prediction},
  author={Mubaraq Onipede},
  year={2026},
  url={https://github.com/Mubarraqqq/gnn-challenge}
}

License

See LICENSE.
