Proposal: Add comprehensive Link Prediction tutorial to documentation. #669

@rainerrodrigues

  1. The Setup & Data Splitting
    Start by loading a standard dataset, such as a simple citation network, using MLDatasets.jl.
    The Code Goal: Use rand_edge_split(g, 0.1) to hold out 10% of the edges for testing.
    The Tutorial Explanation: Explain why we split edges. (e.g., "To evaluate our model, we pretend some edges don't exist during training. Our model must learn to predict those hidden edges based on the remaining graph structure.")
  2. Negative Sampling
    I would also need to generate "fake" (negative) edges, i.e. node pairs that are not connected in the graph.
    The Code Goal: Write a small function that randomly samples pairs of nodes that do not currently have an edge connecting them.
    The Tutorial Explanation: Clearly explain that for every "positive" edge (label 1) in the training set, we need a "negative" edge (label 0) to compute our loss.
  3. The Encoder-Decoder Model
    Link prediction usually uses a two-part architecture. I will use Flux.jl for this.
    The Encoder (GNN.jl): A standard 2-layer Graph Convolutional Network (GCNConv layers) or a pair of GraphSAGE layers. This takes the node features and the training graph, and outputs node embeddings.
    The Decoder (Standard Julia/Flux): A function that takes the embeddings of two nodes (a source and a target) and computes their dot product. A high dot product means the model predicts that an edge exists.
  4. The Loss Function & Training Loop
    The Loss: Because we are predicting 1 (edge) or 0 (no edge), we use Binary Cross-Entropy with Logits (Flux.logitbinarycrossentropy).
    The Loop:
      1. Pass the graph through the Encoder to get node embeddings.
      2. Pass the positive and negative edge indices to the Decoder to get predictions.
      3. Calculate the loss against the true labels (1s for real edges, 0s for fake edges).
      4. Update the GNN weights.
  5. Evaluation
    Calculate the Area Under the ROC Curve (AUC-ROC) on the test edges hidden in Step 1.

Would this be a suitable example?
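
To make steps 1 and 2 concrete, here is a minimal sketch in plain Julia of what holding out edges and sampling negatives could look like. It is only an illustration under some assumptions: edges are stored as `(src, dst)` tuples of an undirected graph with no self-loops, and `split_edges` and `sample_negative_edges` are hypothetical helper names, not library functions. The tutorial itself would rely on `rand_edge_split` for the split.

```julia
using Random

# Hold out a fraction of the edges for testing.
# `edges` is a vector of (src, dst) tuples (assumed representation).
function split_edges(edges, test_frac; rng = Random.default_rng())
    perm = randperm(rng, length(edges))
    n_test = round(Int, test_frac * length(edges))
    test = edges[perm[1:n_test]]
    train = edges[perm[n_test+1:end]]
    return train, test
end

# Sample `k` distinct node pairs that are NOT edges of the graph.
# Pairs are treated as undirected and self-loops are skipped.
function sample_negative_edges(edges, num_nodes, k; rng = Random.default_rng())
    canon(u, v) = u < v ? (u, v) : (v, u)
    existing = Set(canon(u, v) for (u, v) in edges)
    max_neg = num_nodes * (num_nodes - 1) ÷ 2 - length(existing)
    k <= max_neg || throw(ArgumentError("cannot sample $k negatives, only $max_neg non-edges exist"))
    negatives = Set{Tuple{Int,Int}}()
    while length(negatives) < k
        u, v = rand(rng, 1:num_nodes), rand(rng, 1:num_nodes)
        u == v && continue                 # no self-loops
        pair = canon(u, v)
        pair in existing && continue       # reject real edges
        push!(negatives, pair)
    end
    return collect(negatives)
end

# Toy graph with 6 nodes and 10 edges (made up for this example).
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (1, 6), (1, 3), (2, 5), (3, 6), (4, 6)]
train_edges, test_edges = split_edges(edges, 0.1; rng = MersenneTwister(0))
neg_edges = sample_negative_edges(edges, 6, 5)
```

Rejection sampling like this is fine for sparse graphs, where most random pairs are non-edges. Negatives are checked against the full edge list rather than only the training edges, so a held-out test edge is never handed to the model as a "fake" one.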
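
For steps 3 and 4, the decoder and the loss are small enough to write out by hand. The sketch below assumes the encoder returns an embedding matrix with one column per node. `dot_decoder` and `bce_with_logits` are hypothetical names used only here, and `bce_with_logits` is a plain-Julia stand-in that shows what a binary cross-entropy on logits computes. The tutorial would call `Flux.logitbinarycrossentropy` instead.

```julia
# Decoder: score each candidate edge by the dot product of its endpoint embeddings.
# `h` is a (feature_dim × num_nodes) matrix, one column per node (assumed layout).
function dot_decoder(h::AbstractMatrix, src::AbstractVector{<:Integer}, dst::AbstractVector{<:Integer})
    return vec(sum(h[:, src] .* h[:, dst]; dims = 1))
end

# Binary cross-entropy on raw scores (logits), averaged over all edges.
# Uses the numerically stable form  max(z, 0) - z*y + log(1 + exp(-|z|)).
function bce_with_logits(logits, labels)
    return sum(@. max(logits, 0) - logits * labels + log1p(exp(-abs(logits)))) / length(logits)
end

# Toy embeddings for 3 nodes (made up): node 1 = (1, 0), node 2 = (0, 1), node 3 = (1, 1).
h = [1.0 0.0 1.0;
     0.0 1.0 1.0]

# Two positive edges (1, 3) and (2, 3), one negative edge (1, 2).
src = [1, 2, 1]
dst = [3, 3, 2]
labels = [1.0, 1.0, 0.0]

logits = dot_decoder(h, src, dst)
loss = bce_with_logits(logits, labels)
```

Working on logits rather than probabilities means the decoder does not need a sigmoid at training time. A sigmoid is only applied when an actual probability is wanted.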
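
For step 5, AUC-ROC can be read as the probability that a randomly chosen real edge gets a higher score than a randomly chosen fake edge. The sketch below computes it directly from that definition by comparing every positive score with every negative score. `auc_roc` is a hypothetical helper written for this illustration, and the scores are made-up numbers.

```julia
# AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.
# Ties count as half a correct pair.
function auc_roc(scores, labels)
    pos = [s for (s, y) in zip(scores, labels) if y == 1]
    neg = [s for (s, y) in zip(scores, labels) if y == 0]
    (isempty(pos) || isempty(neg)) &&
        throw(ArgumentError("need at least one positive and one negative example"))
    wins = 0.0
    for p in pos, n in neg
        wins += p > n ? 1.0 : (p == n ? 0.5 : 0.0)
    end
    return wins / (length(pos) * length(neg))
end

# Decoder scores for three held-out real edges followed by three sampled fake edges.
scores = [0.9, 0.8, 0.3, 0.4, 0.85, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = auc_roc(scores, labels)
```

The pairwise loop is quadratic in the number of test edges, which is acceptable for a small citation graph. A rank-based implementation would be the better choice for larger test sets.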
