✨ GNN Challenge: Liar Nodes✨

The Problem

For most neural networks, classification tasks are made individually based on the embedding of each data point. Samples that belong to the same class tend to have similar embeddings and therefore lie close to each other in the embedding space.

Graph Neural Networks (GNNs) fundamentally alter this paradigm by performing collective decision making. Rather than relying solely on the representation (embedding) of an individual node, the classification decision is influenced by the structure of the graph and by aggregated information from neighboring nodes.

💡 The Core Question

This raises an important question: what should a model do when there is a contradiction between the information provided by a node's embedding and the information coming from its neighborhood? aka adversary learning

🎮 Challenge Overview

In this challenge, the goal is to implement mechanisms that balance these two sources of information that might contradict each other. Node features are manually corrupted, while graph connectivity encodes contextual relationships that may either reinforce or contradict a node's individual features.

We apply this concept to Cancer biology! A cell's environment can give misleading cues , a malignant cell might “look normal” in isolation, or neighboring immune cells might influence its signaling,the node’s own features (gene expression) might conflict with neighborhood information (similar cells in the microenvironment), mimicking real biological “contradictions.”

Participants must adapt GNN models using techniques such as neighborhood sampling and aggregation to learn when to trust the node embedding, when to trust the neighborhood, and how to effectively combine both in order to perform node classification and detect the cancerous cells.

Critical Constraint

The provided node embeddings are fixed and cannot be modified or recomputed; participants must rely solely on graph-based aggregation and sampling strategies to resolve conflicting signals.

💭 Hint

This challenge is not about inventing a new GNN, it is about choosing the right way to listen.

You can’t tell which cells are sending corrupted signals , you must be clever about how you gather and trust information from the tumor microenvironment!

What Data You Will Use

The Cancer Single-cell Expression Map (CancerSCEM) dataset provides comprehensive single-cell RNA-seq data across multiple cancer types. Instead of raw sequencing reads, the dataset offers UMI count matrices in standard .tsv format for downstream analysis. Each dataset includes: Gene Expression Matrices per cell enabling gene descriptio/Cell-Type Annotations and Components /Functional Molecules /Cell-Cell Interaction Data . You can read more about this data source on https://ngdc.cncb.ac.cn/cancerscem/index . As for this task, we have altered samples from this data to present it for a GNN training tasks, you will be given:

🔗 A (Adjacency Matrix)

Encodes the connections and structure of your input graph.
Most connections reflect real interactions between cells, but some edges may have been altered, introducing noise.

📈 X (Node Embeddings)

Embeddings of nodes == Genes encodings
⚠️ Some nodes have been corrupted to trick you!
You cannot modify these embeddings

🎭 Mask (Corruption Indicator)

Gives you a hint about which nodes have been corrupted
Available in train.csv
NOT available in test.csv (this is what you need to test on)
The corruption mask may be used only during training to modulate aggregation, weighting, or attention — not to filter nodes or labels.

📁 Data Files

train.npz - Contains features AND corruption mask.
test.csv - Contains features only (no labels, no mask)
labels.csv - Contains labels for the full training data
edges.csv - Describes the communication between the cells Please review the baseline code to help you better navigate the data.

🚫 Constraints

You CANNOT

Remove the noisy nodes based on the mask from the training data
Alter the embedding of the nodes
Use external data

You CAN:

Use the mask to guide your aggregation strategy
Implement creative neighborhood sampling
Design custom aggregation functions
Use attention mechanisms
Combine multiple GNN layers strategically
Look up additional resources to understand the behavior of the cancerous cells.

Your Mission ✨✨

Build a GNN that can navigate the noise and make accurate predictions by:

Learning when to trust individual node features
Learning when to trust neighborhood aggregation
Build robust GNNs can detect anomalies or corrupted signals — similar to spotting a misbehaving cell in a tumor.

Evaluation Metrics

Your submissions will be evaluated on the test set using the overall classification correctness

🏆 Winning Criteria

The winner will be determined by the submission that achieves the best classification accuracy . Check the leaderboard to see other's rankings : https://noormajdoub.github.io/Challenge/ ✨✨

Submission Process

Fork this repository
Add your predictions: submissions/submission_<username>.csv (create the submissions folder and put your csv file in it)
Create Pull Request: *Base : Main

*Name Submission: <your_username>

*Compare: your fork/branch with the submission file
Merge
Leaderboard updated within 24 hours :https://noormajdoub.github.io/Challenge/

Good luck, and may the best aggregation strategy win!

Resources

Convex Adversarial Collective Classification ;MohamadAli Torkamani, Daniel Lowd Proceedings of the 30th International Conference on Machine Learning, PMLR 28(1):642-650, 2013.

Adversarial Training for Graph Neural Networks: Pitfalls, Solutions, and New Directions Lukas Gosch, Simon Geisler, Daniel Sturm, Bertrand Charpentier, Daniel Zügner, Stephan Günnemann

The Reason Why Cancer is so Hard to Beat -Kurzgesagt – In a Nutshell

A Multimodal Graph Neural Network Framework of Cancer Molecular Subtype Classification

Cancer Single-cell Expression Map

Graph Neural Networks in Cancer and Oncology Research: Emerging and Future Trends Grigoriy Gogoshin 1,*, Andrei S Rodin 1,

Name		Name	Last commit message	Last commit date
Latest commit History 216 Commits
.github		.github
data		data
docs		docs
leaderboard		leaderboard
starter_code		starter_code
.gitignore		.gitignore
Challenge Picture!.png		Challenge Picture!.png
LICENSE		LICENSE
README.md		README.md
picture.png		picture.png
render_leaderboard.py		render_leaderboard.py
scoring_script.py		scoring_script.py
update_leaderboard.py		update_leaderboard.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

✨ GNN Challenge: Liar Nodes✨

The Problem

💡 The Core Question

🎮 Challenge Overview

Critical Constraint

💭 Hint

What Data You Will Use

🔗 A (Adjacency Matrix)

📈 X (Node Embeddings)

🎭 Mask (Corruption Indicator)

📁 Data Files

🚫 Constraints

You CANNOT

You CAN:

Your Mission ✨✨

Evaluation Metrics

🏆 Winning Criteria

Submission Process

Resources

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

✨ GNN Challenge: Liar Nodes✨

The Problem

💡 The Core Question

🎮 Challenge Overview

Critical Constraint

💭 Hint

What Data You Will Use

🔗 A (Adjacency Matrix)

📈 X (Node Embeddings)

🎭 Mask (Corruption Indicator)

📁 Data Files

🚫 Constraints

You CANNOT

You CAN:

Your Mission ✨✨

Evaluation Metrics

🏆 Winning Criteria

Submission Process

Resources

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages