This repository implements a clustering neural network for constraining the dark matter self-interaction cross-section
from galaxy clusters using cosmological simulations.
The method builds an interpretable latent space in which galaxy clusters are grouped by physical similarity, enabling
robust parameter estimation in sparsely sampled parameter spaces, confidence measurements that detect
out-of-domain data, and exploration of secondary features in the data.
We apply this method to the BAHAMAS-SIDM and DARKSKIES simulations and demonstrate accurate recovery of the
self-interaction cross-section when the test data lies within the training domain and reliable rejection of foreign
datasets, offering a blueprint for robust machine learning for scientific inference.
The research paper is currently in review at Astronomy & Astrophysics and is available on the arXiv.
The code is based on the PyTorch-Network-Loader (hereafter referred
to as netloader) framework, which provides a flexible and modular way to build neural networks for scientific
applications.
The code is structured to allow easy modification and extension for different datasets and architectures.
Please refer to the documentation in the repository for more
details on how to use the framework.
- Clone the repository.
- Install the requirements: `pip install -r requirements.txt`
- Obtain data for training. The BAHAMAS-SIDM and DARKSKIES simulations can be obtained from the original authors (BAHAMAS-SIDM and DARKSKIES), or use your own data. This method was designed around predicting macroscopic parameters obtained from several image samples.
- Create a PyTorch dataset based on the netloader `BaseDataset` class.
  - If using the BAHAMAS-SIDM or DARKSKIES simulations, you can use the class `DarkDataset` from `src.utils.data` after pre-processing the data using `src.preprocessing` and adding the key 'name' with the value of the dataset name (this is just for plotting) to the generated pickled dictionary.
  - If using your own data, you can use the class `DarkDataset` from `src.utils.data` and the documentation as templates, modifying them for your own data formatting.
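As an illustration only, a custom dataset might look like the sketch below. The real class should subclass netloader's `BaseDataset`, whose exact interface is not reproduced here (refer to the netloader documentation); the class name, constructor arguments, and pickle keys are assumptions, apart from `idxs` and `extra['ids']`, which the prediction steps later in this README rely on.

```python
# Minimal, hypothetical sketch of a dataset for this pipeline; the real
# class should subclass netloader's BaseDataset instead of plain object.
import pickle
import numpy as np


class MyClusterDataset:
    """Map-style dataset over a pickled dictionary of cluster images.

    Assumes the pre-processed pickle holds 'images' (N x H x W array),
    'params' (N x P target parameters), 'ids' (N unique sample IDs),
    and 'name' (the dataset name, used for plotting).
    """

    def __init__(self, images, params, ids, name='my-simulation'):
        self.images = np.asarray(images, dtype=np.float32)
        self.params = np.asarray(params, dtype=np.float32)
        # Index array and unique sample IDs, used later to match the
        # dataset against the IDs a trained network was trained on
        self.idxs = np.arange(len(self.images))
        self.extra = {'ids': np.asarray(ids), 'name': name}

    @classmethod
    def from_pickle(cls, path):
        # Load the pre-processed pickled dictionary from disk
        with open(path, 'rb') as file:
            data = pickle.load(file)
        return cls(data['images'], data['params'], data['ids'], data['name'])

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Return an (input, target) pair for one cluster image
        return self.images[idx], self.params[idx]
```

The `from_pickle` constructor mirrors the pre-processing step above, where the pickled dictionary carries a 'name' key alongside the data arrays.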
- Configure the `config.yaml` file to your needs, including:
  - Dataset path (`global-variables` → `data-variables` → `data-dir`)
  - Model architecture (`main` → `training` → `network-name`)
  - Training parameters (`main` → `training`)
  - Output directories (`global-variables` → `output-variables`)
  - Save and load names (`main` → `training` → `network-save` and `network-load`)
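Based on the keys listed above, the structure of `config.yaml` might look roughly like the following; the exact schema is defined by the repository's own config file, and all paths, values, and keys not named above are placeholders.

```yaml
global-variables:
  data-variables:
    data-dir: /path/to/data          # where the pre-processed data lives
  output-variables:                  # output directories; sub-keys here
    plots-dir: ./plots               # are assumptions, check config.yaml
main:
  training:
    network-name: compact_clustering # model architecture to build
    network-save: 1                  # save name for trained checkpoints
    network-load: 0                  # load name for resuming training
    epochs: 150                      # example training parameter
```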
- Run the training script: `python -m src.main`
- After training, several plots will be generated in the plots directory, including loss curves, latent space PCA, and full latent space visualisations.
- The trained network can then be loaded and used for inference or further training. First make sure that both `src` and `netloader` have been initialised (through an import), then either use `torch.load('path/to/saved/network.pth')` or `load_net(save_num, states_dir, network_name)` from `netloader.networks`. If there is a pickling error because the weights-only load failed, and it is due to a class from this repository or netloader, make sure to import `src` and `netloader`; if it is due to `slice`, you can add `slice` to the safe globals using `torch.serialization.add_safe_globals([slice])` before loading the network.
- To generate predictions or to further train, first create the dataset using the class from step 4, then create the data loaders using `loader_init(dataset, batch_size=batch_size, ratios=(1,), idxs=dataset.idxs[np.isin(dataset.extra['ids'], net.idxs)])` from `netloader.data`, where `dataset.extra['ids']` are unique IDs for each sample in all the datasets and `net.idxs` are the IDs used during training. This will return the train and validation data loaders.
  - For further training, you can use `net.training((train_loader, val_loader), epochs=epochs)`.
  - For predictions, you can use `net.predict(val_loader)` to get the predictions for the validation set.
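The loading and prediction steps above can be combined into a single helper, sketched below. The netloader calls follow the usage quoted in this README; the function name, argument names, and defaults are assumptions, and all imports are done lazily so the function can be defined without netloader installed.

```python
# Hypothetical helper combining the load-and-predict workflow described
# above; only a sketch, not part of this repository.


def predict_with_saved_network(dataset, network_path, batch_size=64):
    import numpy as np
    import torch
    import src  # noqa: F401 -- registers this repository's classes
    import netloader  # noqa: F401 -- registers netloader's classes
    from netloader.data import loader_init

    # `slice` objects in the checkpoint can trip PyTorch's weights-only
    # loading, so allow-list them before loading the network
    torch.serialization.add_safe_globals([slice])
    net = torch.load(network_path)

    # Restrict the dataset to the sample IDs seen during training
    idxs = dataset.idxs[np.isin(dataset.extra['ids'], net.idxs)]
    train_loader, val_loader = loader_init(
        dataset,
        batch_size=batch_size,
        ratios=(1,),
        idxs=idxs,
    )

    # For further training instead of prediction, one would call
    # net.training((train_loader, val_loader), epochs=epochs)
    return net.predict(val_loader)
```

A call such as `predict_with_saved_network(dataset, 'path/to/saved/network.pth')` would then return predictions for the validation split.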
See `jupyter_notebooks` → `compact_clustering_example.ipynb` for an example of using the code with the simulations
used in the paper.
Alternatively, you can look at `src` → `main.py` for the full script to train the compact clustering.
The system specifications used to develop the network are:
- Operating system: Ubuntu 24.04
- CPU: AMD Ryzen 7 7700 8C/16T
- GPU: NVIDIA RTX 4080 16 GB VRAM
NVIDIA GPUs with more than 4 GB of VRAM are recommended for training the full network; the batch size can be reduced to lower VRAM usage, and the reduced network can be trained on GPUs with more than 500 MB of VRAM. If a GPU meeting the minimum VRAM requirements is not available, you can train on the CPU, but this will be significantly slower. SSDs are strongly recommended for storing the data, as this will significantly reduce data loading times.
Training the full network for 150 epochs with 10,000 samples takes around 19.7 minutes on the GPU and 7.3 hours on the
CPU.
Training the reduced network for 150 epochs with 10,000 samples takes around 6.4 minutes on the GPU and 20.3 minutes on
the CPU.
