Nikos Tsikouras¹,², Yorgos Pantis¹,², Ioannis Mitliagkas²,³, Christos Tzamos¹,²
¹National and Kapodistrian University of Athens, Greece
²Archimedes, Athena Research Center, Greece
³Mila & Université de Montréal, Canada
Understanding the dynamics of feature learning in neural networks (NNs) remains a significant challenge. Mousavi-Hosseini et al. (2023) analyze a multiple-index teacher-student setting and show that a two-layer student attains a low-rank structure in its first-layer weights when trained with stochastic gradient descent (SGD) and a strong regularizer. This structural property is known to reduce the sample complexity of generalization. Indeed, in a second step, the same authors establish algorithm-specific learning guarantees under additional assumptions. In this paper, we focus exclusively on the structure-discovery aspect and study it under weaker assumptions; specifically, we allow (a) NNs of arbitrary size and depth, (b) with all parameters trainable, (c) under any smooth loss function, (d) tiny regularization, and (e) trained by any method that attains a second-order stationary point (SOSP), e.g. perturbed gradient descent (PGD). At the core of our approach is a key derandomization lemma, which states that optimizing the function $\mathbb{E}_x[g_\theta(Wx + b)]$ converges to a point where $W = 0$, under mild conditions. The fundamental nature of this lemma directly explains structure discovery and has immediate applications in other domains, including an end-to-end approximation for MAXCUT and the computation of Johnson-Lindenstrauss embeddings.
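To make the lemma concrete, here is a minimal sketch (not the paper's code or experiments) that minimizes a Monte Carlo estimate of $\mathbb{E}_x[g(Wx + b)]$ for one arbitrary smooth choice of $g$, with a tiny $\ell_2$ regularizer as in assumption (d). The dimensions, learning rate, and $g$ itself are illustrative assumptions:

```python
import torch

torch.manual_seed(0)
d, m = 10, 5                                # input / feature dimensions (illustrative)
W = torch.randn(m, d, requires_grad=True)   # the matrix the lemma drives to 0
b = torch.randn(m, requires_grad=True)

def g(z):
    # An arbitrary smooth test function. For x ~ N(0, I),
    # E[cos(w·x + b)] = cos(b) * exp(-||w||^2 / 2), whose minimum
    # over (w, b) sits at w = 0 with cos(b) = -1.
    return torch.cos(z).sum(dim=-1).mean()

opt = torch.optim.SGD([W, b], lr=0.05)
lam = 1e-4                                  # "tiny regularization"

for step in range(3000):
    x = torch.randn(512, d)                 # fresh Gaussian batch approximates E_x
    loss = g(x @ W.T + b) + lam * (W ** 2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"||W||_F after training: {W.norm().item():.4f}")   # drifts toward 0
```

As the lemma predicts, $\|W\|_F$ shrinks toward zero while $b$ settles at a point where the objective no longer depends on the randomness of $x$, which is exactly the derandomization the lemma describes.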
If you use this work, please cite:
@inproceedings{tsikouras2024derandomization,
  title     = {A Derandomization Framework for Structure Discovery: Applications in Neural Networks and Beyond},
  author    = {Tsikouras, Nikos and Pantis, Yorgos and Mitliagkas, Ioannis and Tzamos, Christos},
  booktitle = {Proceedings of the Fourteenth Annual International Conference on Learning Representations},
  year      = {2026}
}

# StructureDiscovery Project Directory
StructureDiscovery/
│
├── NeuralNetworks/
│   ├── activations.py   # Activation functions
│   ├── model.py         # Neural network model architecture
│   ├── train.py         # Training pipeline and routines
│   ├── plot.py          # Training-curve and evaluation visualizations
│   └── main.py          # Entry point tying the modules above together
│
├── JohnsonLindenstrauss/
│   ├── model.py         # Model architecture
│   ├── train.py         # Optimization loop for the distortion model (toy sketch below)
│   ├── plot.py          # Plots of distortion and variance evolution
│   └── main.py          # Entry point tying the modules above together
│
├── MAXCUT/
│   ├── utilities.py     # Helpers: exact MAXCUT solver, graph generation, etc.
│   ├── optimizer.py     # Gradient-based MAXCUT optimization routine (toy sketch below)
│   ├── plot.py          # Plots of sigma and cut values over iterations
│   └── main.py          # Entry point tying the modules above together
│
├── LICENSE              # MIT open-source license
├── requirements.txt     # Python dependencies
└── README.md            # Project overview and instructions
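Each subproject is self-contained: install the dependencies with `pip install -r requirements.txt` and run the corresponding `main.py` (e.g. `python MAXCUT/main.py`).

To give a flavor of the two non-NN applications, here are two hedged toy sketches; neither is the repository's actual code, and the objectives, names, and hyperparameters are illustrative assumptions. First, in the spirit of `JohnsonLindenstrauss/train.py`, a projection matrix is trained by gradient descent so that all pairwise distance ratios concentrate near 1, after which the worst-case distortion is reported:

```python
import torch

torch.manual_seed(0)
n, d, k = 50, 100, 20                      # points, ambient dim, target dim (illustrative)
X = torch.randn(n, d)                      # toy data
A = (torch.randn(k, d) / k ** 0.5).requires_grad_(True)

iu = torch.triu_indices(n, n, offset=1)    # enumerate each unordered pair once
diffs = X[iu[0]] - X[iu[1]]                # (num_pairs, d) difference vectors
orig = diffs.norm(dim=1)                   # original pairwise distances

opt = torch.optim.Adam([A], lr=1e-2)
for step in range(2000):
    ratio = (diffs @ A.T).norm(dim=1) / orig   # projected / original distance
    loss = ((ratio - 1) ** 2).mean()           # drive all distortions toward 1
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    ratio = (diffs @ A.T).norm(dim=1) / orig
    print(f"max distortion: {(ratio - 1).abs().max().item():.3f}")
```

Second, in the spirit of `MAXCUT/optimizer.py` and the `sigma` traces produced by `MAXCUT/plot.py`: relax node signs to `tanh(mu / sigma)`, ascend the smoothed cut by gradient steps, and anneal `sigma` so the relaxation hardens into a ±1 assignment:

```python
import torch

torch.manual_seed(0)
n = 12
edges = [(i, j) for i in range(n) for j in range(i + 1, n)
         if torch.rand(1).item() < 0.4]    # random toy graph

def smoothed_cut(mu, sigma):
    x = torch.tanh(mu / sigma)             # soft +/-1 node labels
    return sum((1 - x[i] * x[j]) / 2 for i, j in edges)

mu = torch.randn(n, requires_grad=True)
opt = torch.optim.Adam([mu], lr=0.1)

sigma = 1.0
for step in range(300):
    loss = -smoothed_cut(mu, sigma)        # gradient ascent on the smoothed cut
    opt.zero_grad()
    loss.backward()
    opt.step()
    sigma = max(0.05, sigma * 0.99)        # anneal sigma so labels harden

x = torch.sign(mu.detach())                # round to a +/-1 assignment
cut = sum((1 - x[i] * x[j]) / 2 for i, j in edges)
print(f"cut: {int(cut.item())} of {len(edges)} edges")
```

In both cases the repository's actual routines are presumably derived from the derandomization lemma rather than from these ad hoc objectives and schedules.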
The authors would like to thank Alireza Mousavi-Hosseini for useful discussions and feedback. This work has been partially supported by project MIS 5154714 of the National Recovery and Resilience Plan Greece 2.0, funded by the European Union under the NextGenerationEU Program. Ioannis Mitliagkas acknowledges support from Archimedes, Athena Research Center, Greece, and the Canada CIFAR AI Chair program. The majority of this work was performed at Archimedes in Athens.