NodeLink Nexus

Multi-Task Graph Representation Challenge

This challenge explores how well a single graph neural network (GNN) can learn shared node representations that generalize across multiple graph tasks on the same dataset.

Participants must train a model that performs well on both node classification and link prediction, using only the provided graph data and within strict constraints.

View Live Leaderboard

1. Problem Statement

Given a citation graph where:

nodes represent research papers,
edges represent citation relationships, and
nodes have high-dimensional feature vectors,

your goal is to learn node embeddings that can simultaneously:

Classify unseen nodes into research areas
Predict unseen edges (citation links) between nodes

The challenge is intentionally designed so that optimizing one task alone is insufficient—successful solutions must learn general-purpose representations useful for multiple objectives.

2️. Dataset Description

The dataset is derived from a citation network and consists of the following files (located in data/public/):

Node Features

nodes.csv
2708 nodes
1433 features per node (x0 to x1432)
Each row corresponds to a unique node ID

Node Labels

Training labels: nodes 0–639
Test labels: nodes 1708–2707
Nodes 640–1707 are unlabeled and must not be used for node supervision

Graph Structure

Directed edges represent citations
The source cites the destination

3️. Task Definition

This is a multi-task learning challenge with two tasks:

Task 1: Node Classification

Objective:
Predict the research category of unseen nodes.

Input: node features + graph structure
Output: class label (integer) from 0 to 6

Task 2: Link Prediction

Objective:
Predict whether a citation link exists between two nodes.

Input: pair of node embeddings
Output: probability ∈ [0, 1]

Both tasks must be solved using a shared node embedding space.

4️. Evaluation Metric

Each submission is evaluated on both tasks, and a single final score is computed.

Metrics

Node Classification: Macro F1-score
Link Prediction: ROC-AUC

Final Score

Final Score = 0.5 × Node Macro-F1 + 0.5 × Link ROC-AUC
Equal weighting ensures that neither task can be ignored.

5️. Rules & Constraints

To keep the challenge fair and focused:

No external datasets or pretrained models are allowed
No manual label engineering
Any GNN architecture allowed (GCN, GraphSAGE, etc.)
Solutions should not use different embeddings for both tasks

Submissions that violate these rules may be disqualified.

6️. How to Submit

Fork this repository.
Generate predictions for all rows in the public test file:
data/public/test.csv

Create a folder in your fork:

submissions/inbox/<team_name>/<run_id>/

Inside that folder, create:
- predictions.csv
- metadata.json (optional but recommended)
Your predictions.csv must look like:
```
id,y_pred
node_1708,3
edge_12_45,0.82
```
- Node rows → y_pred must be a class label (integer)
- Edge rows → y_pred must be a probability in [0,1]
Encrypt Your Predictions (Required)

For security and fairness, you must encrypt your submission before uploading.

Install dependency (if not already installed):
```
pip install cryptography
```

Encrypt your file from the root of the repository:

python encryption/encrypt.py submissions/inbox/<team_name>/<run_id>/predictions.csv

This will generate:

predictions.csv.enc

Delete the plaintext file:
```
rm submissions/inbox/<team_name>/<run_id>/predictions.csv
```
Only the encrypted .enc file should be submitted.

Your folder should now contain:

submissions/inbox/<team_name>/<run_id>/
    predictions.csv.enc
    metadata.json

Submit via Pull Request

Commit and push only the encrypted file:

git add submissions/inbox/<team_name>/<run_id>/predictions.csv.enc
git commit -m "Submission: <team_name>"
git push

Then open a Pull Request to this repository.

Important Rules

Only one submission per GitHub user is allowed.
If you submit more than once, your PR will not update the leaderboard.
If your PR does not close automatically, it most likely means:
- You have already submitted once.
- Please check the PR comments for details.

Note: If your submission is not scored automatically, it is likely because your GitHub account is considered a first-time or new contributor. In this case, make any prior public contribution on GitHub (e.g., open a PR anywhere, even a typo fix), then re-submit. If the scoring fails, check to make sure your predictions.csv has the correct format, number of rows and columns.

7️. Leaderboard

The live leaderboard is maintained automatically:

Scores update instantly after PR submission

View Live Leaderboard

Getting Started

A simple baseline using a GraphSAGE-style model is provided in baseline.py It demonstrates:

shared node embeddings
joint optimization of node + link tasks
correct submission format

Participants are encouraged to improve upon it. Focus on improving the GNN's learnt features rather than modifying the complete model architecture. A GNN with two MLP heads for prediction as in the baseline should suffice.

Name		Name	Last commit message	Last commit date
Latest commit History 102 Commits
.github/workflows		.github/workflows
competition		competition
data		data
docs		docs
encryption		encryption
leaderboard		leaderboard
submissions		submissions
.gitignore		.gitignore
.nojekyll		.nojekyll
README.md		README.md
baseline.py		baseline.py
index.html		index.html
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NodeLink Nexus

Multi-Task Graph Representation Challenge

View Live Leaderboard

1. Problem Statement

2️. Dataset Description

Node Features

Node Labels

Graph Structure

3️. Task Definition

Task 1: Node Classification

Task 2: Link Prediction

4️. Evaluation Metric

Metrics

Final Score

5️. Rules & Constraints

6️. How to Submit

7️. Leaderboard

View Live Leaderboard

Getting Started

References

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

NodeLink Nexus

Multi-Task Graph Representation Challenge

View Live Leaderboard

1. Problem Statement

2️. Dataset Description

Node Features

Node Labels

Graph Structure

3️. Task Definition

Task 1: Node Classification

Task 2: Link Prediction

4️. Evaluation Metric

Metrics

Final Score

5️. Rules & Constraints

6️. How to Submit

7️. Leaderboard

View Live Leaderboard

Getting Started

References

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages