The Setup & Data Splitting
I would start by loading a standard dataset, such as a small citation network, using MLDatasets.jl.
The Code Goal: Use rand_edge_split(g, 0.1) to hold out 10% of the edges for testing.
The Tutorial Explanation: Explain why we split edges. (e.g., "To evaluate our model, we pretend some edges don't exist during training. Our model must learn to predict those hidden edges based on the remaining graph structure.")
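To make the idea of hiding edges concrete, here is a minimal sketch in plain Julia. It is not the GraphNeuralNetworks.jl function: `split_edges` is a hypothetical helper that works on a plain list of `(source, target)` tuples rather than on a graph object, and the toy edge list is made up for illustration.

```julia
using Random

# Hypothetical helper (not a library function): shuffle the edge list and
# move a fraction `test_frac` of the edges into a held-out test set.
function split_edges(edges::Vector{Tuple{Int,Int}}, test_frac::Float64;
                     rng::AbstractRNG = Random.default_rng())
    shuffled = shuffle(rng, edges)
    n_test = round(Int, test_frac * length(edges))
    test_edges = shuffled[1:n_test]          # hidden during training
    train_edges = shuffled[n_test+1:end]     # the graph the model gets to see
    return train_edges, test_edges
end

# Toy undirected graph with 10 edges (illustrative data only).
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 5),
         (4, 5), (4, 6), (5, 6), (1, 6), (2, 5)]
train_edges, test_edges = split_edges(edges, 0.1; rng = MersenneTwister(42))
```

The two returned lists are disjoint and together cover the original edges, which is the property the tutorial would rely on when it evaluates on the hidden part.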
Negative Sampling
I would need to generate "fake" edges.
The Code Goal: Write a small function that randomly samples pairs of nodes that are not currently connected by an edge.
The Tutorial Explanation: Clearly explain that for every "positive" edge in the training set, we need a "negative" edge (label 0) to compute our loss.
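A possible shape for that function, as a sketch using rejection sampling in plain Julia. The name `sample_negative_edges`, its arguments, and the assumption that the graph is undirected with nodes numbered `1:num_nodes` are all mine, not part of any library.

```julia
using Random

# Hypothetical helper: draw `n` distinct node pairs (u, v) with u < v that do
# not appear in `edges`. Assumes an undirected graph with nodes 1:num_nodes.
function sample_negative_edges(edges::Vector{Tuple{Int,Int}}, num_nodes::Int, n::Int;
                               rng::AbstractRNG = Random.default_rng())
    # Store each edge in canonical (min, max) order for fast membership checks.
    existing = Set(minmax(u, v) for (u, v) in edges)
    max_negatives = num_nodes * (num_nodes - 1) ÷ 2 - length(existing)
    n <= max_negatives || throw(ArgumentError("not enough non-edges to sample from"))
    negatives = Set{Tuple{Int,Int}}()
    while length(negatives) < n
        u, v = rand(rng, 1:num_nodes), rand(rng, 1:num_nodes)
        u == v && continue               # skip self-loops
        pair = minmax(u, v)
        pair in existing && continue     # skip real edges
        push!(negatives, pair)           # the Set silently drops repeats
    end
    return collect(negatives)
end

pos = [(1, 2), (2, 3), (3, 4)]
neg = sample_negative_edges(pos, 5, length(pos); rng = MersenneTwister(0))
```

One design point worth mentioning in the tutorial: `edges` should probably contain every known edge, including the held-out test edges, so that a hidden positive is never handed to the model with label 0.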
The Encoder-Decoder Model
Link prediction usually uses a two-part architecture. I will use Flux.jl for this.
The Encoder (GraphNeuralNetworks.jl): A standard 2-layer Graph Convolutional Network (GCNConv) or GraphSAGE (SAGEConv) model. It takes the node features and the training graph and outputs node embeddings.
The Decoder (Standard Julia/Flux): A function that takes the embeddings of two nodes (a source and a target) and computes their dot product. The dot product acts as a logit: the higher it is, the more confident the model is that an edge exists.
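The decoder half is small enough to sketch without any library. The block below assumes the encoder has already produced an embedding matrix `h` with one column per node; `dot_decoder` and the toy embeddings are hypothetical names and values chosen for illustration, and the encoder itself is left out.

```julia
# Hypothetical decoder: `h` is an (embedding_dim × num_nodes) matrix with one
# column per node; `src` and `dst` hold node indices. Returns one logit per pair.
function dot_decoder(h::AbstractMatrix, src::AbstractVector{Int}, dst::AbstractVector{Int})
    return vec(sum(h[:, src] .* h[:, dst]; dims = 1))
end

# Logistic function, turning a logit into a probability in (0, 1).
sigmoid(x) = 1 / (1 + exp(-x))

# Toy embeddings for 3 nodes in 2 dimensions (made-up numbers).
h = [1.0 0.0 1.0;
     0.0 1.0 1.0]
logits = dot_decoder(h, [1, 1], [2, 3])   # scores for the pairs (1,2) and (1,3)
probs = sigmoid.(logits)
```

Nodes 1 and 2 have orthogonal embeddings, so their logit is zero and the predicted probability is exactly one half; nodes 1 and 3 overlap, so their pair scores higher.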
The Loss Function & Training Loop
The Loss: Because we are predicting 1 (edge) or 0 (no edge), we use Binary Cross-Entropy with Logits (Flux.logitbinarycrossentropy).
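To show what this loss computes, here is a hand-written version in plain Julia. It is an illustrative reimplementation under the names `bce_with_logits` and `mean_bce` (both hypothetical); the tutorial itself would call the Flux function named above. The formula uses the usual numerically stable form, which avoids overflow for large logits.

```julia
# Illustrative reimplementation, not the Flux source.
# Per-example binary cross-entropy on a raw logit `z` with label `y` in {0, 1}:
#   -[y * log(σ(z)) + (1 - y) * log(1 - σ(z))]
# rewritten so that exp() is only ever called on a non-positive argument.
bce_with_logits(z, y) = max(z, 0) - z * y + log1p(exp(-abs(z)))

# Average over a batch of logits and labels.
mean_bce(logits, labels) = sum(bce_with_logits.(logits, labels)) / length(logits)

# Made-up decoder outputs: two real edges and two sampled non-edges.
pos_logits = [2.0, 0.5]
neg_logits = [-1.0, 0.3]
logits = vcat(pos_logits, neg_logits)
labels = vcat(ones(length(pos_logits)), zeros(length(neg_logits)))
loss = mean_bce(logits, labels)
```

Working on logits rather than probabilities is what lets the dot-product decoder feed the loss directly, without an explicit sigmoid in between.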
The Loop:
1. Passing the graph through the Encoder to get node embeddings.
2. Passing the positive and negative edge indices to the Decoder to get predictions.
3. Calculating the loss against the true labels (1s for real edges, 0s for fake edges).
4. Updating the GNN weights.
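The four steps can be sketched end to end without any dependencies, under one loud assumption: the GNN encoder is replaced here by a plain trainable embedding table, and the gradient is written out by hand instead of being computed by Flux. The function `train_embeddings`, its hyperparameters, and the toy edge lists are all hypothetical; a real tutorial would use the GCNConv encoder and a Flux optimiser in their place.

```julia
using Random

sigmoid(x) = 1 / (1 + exp(-x))
bce(z, y) = max(z, 0) - z * y + log1p(exp(-abs(z)))   # stable BCE on a logit

# ASSUMPTION: `h` is a free (dim × num_nodes) embedding table standing in for
# the output of a GNN encoder. Everything else follows the four loop steps.
function train_embeddings(pos, neg, num_nodes; dim = 4, epochs = 300, lr = 0.1,
                          rng = Random.default_rng())
    h = 0.1 .* randn(rng, dim, num_nodes)
    pairs = vcat(pos, neg)
    labels = vcat(ones(length(pos)), zeros(length(neg)))
    losses = Float64[]
    for epoch in 1:epochs
        # Steps 1-2: look up embeddings and decode each pair with a dot product.
        logits = [sum(h[:, u] .* h[:, v]) for (u, v) in pairs]
        # Step 3: mean binary cross-entropy against the 1/0 labels.
        push!(losses, sum(bce.(logits, labels)) / length(pairs))
        # Step 4: hand-written gradient step, using
        # d(loss)/d(logit) = sigmoid(logit) - label.
        grad = zeros(size(h))
        for (i, (u, v)) in enumerate(pairs)
            g = (sigmoid(logits[i]) - labels[i]) / length(pairs)
            grad[:, u] .+= g .* h[:, v]
            grad[:, v] .+= g .* h[:, u]
        end
        h .-= lr .* grad
    end
    return h, losses
end

pos = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6)]   # two triangles
neg = [(1, 4), (2, 5), (3, 6), (1, 5), (2, 6), (3, 4)]   # pairs across them
h, losses = train_embeddings(pos, neg, 6; rng = MersenneTwister(1))
```

Plotting `losses` against the epoch number would give the tutorial a simple way to show that training is making progress before moving on to evaluation.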
Evaluation
I would calculate the Area Under the ROC Curve (AUC-ROC) on the test edges I hid in Step 1, scored together with sampled negative edges. Would this be a suitable example?
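For the evaluation step, AUC-ROC can be read as the probability that a randomly chosen real edge gets a higher score than a randomly chosen fake one. The sketch below computes it that way by comparing every positive score with every negative score; `auc_roc` is a hypothetical helper and the scores are made up.

```julia
# Hypothetical helper: AUC-ROC as the fraction of (positive, negative) score
# pairs in which the positive is ranked higher; ties count as half a win.
function auc_roc(pos_scores::AbstractVector{<:Real}, neg_scores::AbstractVector{<:Real})
    wins = 0.0
    for p in pos_scores, n in neg_scores
        wins += p > n ? 1.0 : (p == n ? 0.5 : 0.0)
    end
    return wins / (length(pos_scores) * length(neg_scores))
end

# Made-up decoder scores for 3 held-out edges and 3 sampled non-edges.
auc = auc_roc([0.9, 0.8, 0.4], [0.5, 0.3, 0.1])
```

The pairwise comparison is quadratic in the number of test edges, which is fine for a small citation network; a sort-based version would be the natural replacement on larger graphs.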