Graph representational learning: When Does More Expressivity Hurt Generalization?

Experiments

Experiments

Median-Based Labeling with Cycle Counts

We generate 3,000 random graphs. For each graph, we count the number of cycles of length 3 and 4. Then, we compare these counts to the median across all graphs in the dataset. If the count is below the median, we assign a label of 0; otherwise, we assign a label of 1.

To reproduce the experiment, run

python synthetic.py --dataset er --task sum_basis_C4 --pe basis_C4

For the explanation on the arguments, please run

python synthetic.py --help

The images below show a 2-dimensional embedding of graphs using Multidimensional Scaling (MDS) based on the pairwise Tree Mover's Distances for the first 100 Erdős–Rényi graphs.

The leftmost plot represents embeddings without positional encoding. The middle plot incorporates cycle counts of length 3, while the rightmost plot includes cycle counts of length 3 and 4. As more structural information - which is correlated to the graph labels - is added, the separation between classes becomes more distinct. This trend is reflected in model performance, which improves from 84.80% (no encoding) to 86.57% (cycle counts of length 3) and 97.93% (cycle counts of length 3 and 4).

MDS-Based Labeling via TMD Distances

We generate 500 random graphs. We compute the pairwise Tree Mover's distances on the graphs equipped with counts of cycles of length up to 5. Labels are assigned using a clustering algorithm on a 2-dimensional embedding of the dataset.

python synthetic.py --dataset er --task tmd --pe basis_C5 --num_graphs 500

Real-World Datasets

We employ some molecular dataset from TUDataset and plot the performance w.r.t. the Tree Mover's distance to the training dataset.

python real.py --dataset Mutagenicity --num_layers 3

Performance	Error bound

As reported, performance deteriorates as the testing samples get further from the training dataset. The effect is less pronounced in PROTEINS: for this dataset, methods that neglect the graph structure outperform graph-based approaches; hence, the graph structure is not crucial for the classification task.

Citation

If you found our work useful in your research, please cite our paper:

@inproceedings{
    maskey2026graph,
    title={Graph Representational Learning: When Does More Expressivity Hurt Generalization?},
    author={Sohir Maskey and Raffaele Paolino and Fabian Jogl and Gitta Kutyniok and Johannes F. Lutzeyer},
    booktitle={The Fourteenth International Conference on Learning Representations},
    year={2026},
    url={https://openreview.net/forum?id=C6vpifaZvU}
}

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
imgs		imgs
results		results
src		src
.gitignore		.gitignore
CITATION.cff		CITATION.cff
README.md		README.md
job.sh		job.sh
real.py		real.py
synthetic.py		synthetic.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Graph representational learning: When Does More Expressivity Hurt Generalization?

Experiments

Median-Based Labeling with Cycle Counts

MDS-Based Labeling via TMD Distances

Real-World Datasets

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Graph representational learning: When Does More Expressivity Hurt Generalization?

Experiments

Median-Based Labeling with Cycle Counts

MDS-Based Labeling via TMD Distances

Real-World Datasets

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages