A new Similarity Based Adapted Louvain Algorithm (SIMBA) for active module identification in p-value attributed biological networks
This repository contains the code needed to execute the SIMBA algorithm and reproduce the results presented in Singlan, N., Abou Choucha, F. & Pasquier, C. A new Similarity Based Adapted Louvain Algorithm (SIMBA) for active module identification in p-value attributed biological networks. Sci Rep 15, 11360 (2025). https://doi.org/10.1038/s41598-025-95749-6.
- Installation
- Reproduce paper results
- Usage on your own data
- Repository structure
- Data
- Requirements
- Cite
- Contact
To use this algorithm, you need Python 3.11 installed on your machine, as well as the required packages described in the Requirements section and listed in the `environment.yml` file.
To run the algorithm, clone this repository, install the required packages, and run the code using the commands described in the Reproduce paper results and Usage on your own data sections.
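Before running the experiments, a quick sanity check can confirm the interpreter version and that the required packages are importable. The sketch below is not part of the repository; the import names it checks (in particular `pyunionfind`, and `sknetwork` for scikit-network) are assumptions about the module name each package installs.

```python
import importlib.util
import sys

# Assumed import names for the packages in the Requirements section
# (scikit-learn -> sklearn, scikit-network -> sknetwork; pyunionfind is assumed).
REQUIRED_MODULES = ["scipy", "sklearn", "numpy", "pyunionfind", "sknetwork"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(required=(3, 11)):
    """Report whether the interpreter version and packages look suitable."""
    version_ok = sys.version_info[:2] == required
    missing = missing_modules(REQUIRED_MODULES)
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}:",
          "OK" if version_ok else f"expected {required[0]}.{required[1]}")
    print("Missing packages:", ", ".join(missing) if missing else "none")
    return version_ok and not missing


if __name__ == "__main__":
    check_environment()
```

If a package is reported missing, installing it from `environment.yml` (or with pip) and re-running the check should clear it.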
To reproduce the results of the paper on the 'Fully simulated datasets', you can use the following command:

`./launch_simulated.sh`

To reproduce the results of the paper on the 'Rewire datasets', you can use the following command:

`./launch_rewire.sh`

To reproduce the results of the paper on the 'Value simulated dataset', you can use the following command:

`./launch_value.sh`

To reproduce the results of the paper on the 'Real dataset', you can use the following command:

`./launch_real.sh`

To run the code on your own data, you can use the following command:
`python main.py -d ./data/your_data`

Available options are:

- `-d` or `--data`: Required. The path to the data file.
- `-gt` or `--ground_truth`: Optional. Flag indicating that the graph contains ground truth.
- `-n` or `--name`: Optional. Flag indicating that the graph contains node names.
- `-res` or `--results`: Optional. The path to the results directory. Default is `./results/`.
- `-o` or `--output`: Optional. The path to the output file. Default is `./results/output.txt`.
- `-min` or `--min`: Optional. The minimum size of the communities. Default is 5.
- `-no_filter` or `--no_filter`: Optional. Flag indicating that the filter should not be applied.
- `-c` or `--clustering`: Optional. The clustering algorithm to use. Default is `similarity`. Available options are `similarity`, `louvain` and `None`.
- `-p` or `--priority`: Optional. The priority to use in the clustering algorithm (similarity-based only). Default is `worst`. Available options are `worst` and `best`.
- `-cs` or `--community_selection`: Optional. Flag indicating that community selection should be applied.
- `-t` or `--threshold`: Optional. The threshold used in community selection. Default is 0.05.
- `-k` or `--k`: Optional. The number of communities to find (used in metrics calculation). Default is `None`.
- `-v` or `--verbose`: Optional. Flag enabling verbose mode.
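When running SIMBA on many files, it can help to build the command line programmatically. The following is a hypothetical helper (not part of the repository) that uses only the long-form flags listed above; options left as `None` or `False` are omitted so that `main.py` falls back on its documented defaults.

```python
import subprocess
import sys


def build_simba_command(data_path, ground_truth=False, names=False,
                        results_dir=None, output_file=None, min_size=None,
                        no_filter=False, clustering=None, priority=None,
                        community_selection=False, threshold=None, k=None,
                        verbose=False):
    """Return an argument list for main.py built from the documented options."""
    cmd = [sys.executable, "main.py", "--data", str(data_path)]
    # Boolean switches are only added when enabled.
    switches = [(ground_truth, "--ground_truth"), (names, "--name"),
                (no_filter, "--no_filter"),
                (community_selection, "--community_selection"),
                (verbose, "--verbose")]
    cmd += [flag for enabled, flag in switches if enabled]
    # Valued options are only added when explicitly set.
    valued = [("--results", results_dir), ("--output", output_file),
              ("--min", min_size), ("--clustering", clustering),
              ("--priority", priority), ("--threshold", threshold), ("--k", k)]
    for flag, value in valued:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def run_simba(data_path, **options):
    """Run main.py (from the repository root) and raise if it fails."""
    subprocess.run(build_simba_command(data_path, **options), check=True)
```

For example, `run_simba("./data/your_data", community_selection=True, threshold=0.01)` would apply community selection with a stricter threshold. To disable clustering, pass the string `"None"` (as listed in the options), since a Python `None` simply omits the flag.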
The repository is structured as follows:
- `data/`: contains the data used in the experiments.
- `clustering/`: contains the code of the similarity-based clustering algorithm.
- `graph/`: contains the code to represent the graph structure and the community detection algorithms.
- `utils/`: contains the code of the utility functions used in the experiments.
- `Real_Graphs_Result/`: contains the results of the real datasets experiments.
- `main.py`: contains the main code to run the experiments.
- `Supplementary_Table_S13_all_results.xlsx`: contains the results of the paper.
The data used in the experiments are available in the data/ folder. Here is the list of the available data:
- [One_Cluster_Dataset.zip](data/One_Cluster_Dataset.zip): contains 1000 graphs of 1000 nodes with 1 community of 10 nodes to find.
- [Two_Clusters_Dataset.zip](data/Two_Clusters_Dataset.zip): contains 1000 graphs of 1000 nodes with 2 communities of 10 nodes to find.
- [Three_Clusters_Dataset.zip](data/Three_Clusters_Dataset.zip): contains 1000 graphs of 1000 nodes with 3 communities of 10 nodes to find.
- [Ten_Clusters_Dataset.zip](data/Ten_Clusters_Dataset.zip): contains 1000 graphs of 1000 nodes with 10 communities of 10 nodes to find.
- [Medium_Dataset.zip](data/Medium_Dataset.zip): contains 100 graphs of 6344 nodes with 10 communities of 10 nodes to find.
- [Dataset_Rewire_on_Human_0-99_CutOff.zip](data/Dataset_Rewire_on_Human_0-99_CutOff.zip): contains 100 graphs of 6344 nodes with 10 communities of 10 nodes to find.
- [Dataset_Rewire_on_Human_0-8_CutOff.zip](data/Dataset_Rewire_on_Human_0-8_CutOff.zip): contains 100 graphs of 14219 nodes with 50 communities of 15 nodes to find.
- [Tammaro_2024_brain_STRING_v12.npz](data/Tammaro_2024_brain_STRING_v12.npz): contains the graph obtained from the article "HDAC1/2 inhibitor therapy improves multiple organ systems in aged mice" (Tammaro, 2024).
- [Tammaro_2024_heart_STRING_v12.npz](data/Tammaro_2024_heart_STRING_v12.npz): contains the graph obtained from the article "HDAC1/2 inhibitor therapy improves multiple organ systems in aged mice" (Tammaro, 2024).
- [Tammaro_2024_kidney_STRING_v12.npz](data/Tammaro_2024_kidney_STRING_v12.npz): contains the graph obtained from the article "HDAC1/2 inhibitor therapy improves multiple organ systems in aged mice" (Tammaro, 2024).
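The simulated datasets are shipped as `.zip` archives. A minimal sketch for inspecting and extracting one with the standard library follows; it assumes (not stated above) that each archive holds the individual graphs as `.npz` files, possibly inside subfolders. The function names are illustrative only.

```python
import zipfile
from pathlib import Path


def list_graphs(archive_path, suffix=".npz"):
    """Return the archive member names ending with `suffix`, sorted."""
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(name for name in archive.namelist() if name.endswith(suffix))


def extract_dataset(archive_path, target_dir):
    """Extract the whole archive into target_dir and return the .npz files found."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    return sorted(target.rglob("*.npz"))
```

For instance, `extract_dataset("data/One_Cluster_Dataset.zip", "data/one_cluster")` would return the paths that can then be passed one by one to `main.py -d`.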
Each graph is stored in a .npz file. The file contains the following keys:
- `adjacency_data`: the data array of the adjacency sparse matrix of the graph.
- `adjacency_indices`: the indices of the adjacency sparse matrix of the graph.
- `adjacency_indptr`: the indptr of the adjacency sparse matrix of the graph.
- `adjacency_shape`: the shape of the adjacency sparse matrix of the graph.
- `feature_data`: the data array of the feature sparse matrix of the graph.
- `feature_indices`: the indices of the feature sparse matrix of the graph.
- `feature_indptr`: the indptr of the feature sparse matrix of the graph.
- `feature_shape`: the shape of the feature sparse matrix of the graph.
- `labels`: Optional. The ground truth of the graph (use with the `-gt` option).
- `label_indices`: Optional. The ground truth indices of the graph (use with the `-gt` option).
- `name`: Optional. The names of the nodes (use with the `-n` option).
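The `*_data`, `*_indices`, `*_indptr` and `*_shape` keys match the components of a SciPy compressed sparse matrix. The sketch below writes and reads a graph in this layout; it is not the repository's own loader. It assumes both matrices are stored in CSR (row-compressed) order and that the feature matrix is, for instance, one column of p-values per node; the README does not state either. The optional `labels` and `label_indices` keys are not handled because their exact layout is not described.

```python
import numpy as np
import scipy.sparse as sp


def _pack(prefix, matrix):
    """Split a matrix into <prefix>_data/_indices/_indptr/_shape arrays (CSR assumed)."""
    m = sp.csr_matrix(matrix)
    return {f"{prefix}_data": m.data, f"{prefix}_indices": m.indices,
            f"{prefix}_indptr": m.indptr, f"{prefix}_shape": np.array(m.shape)}


def _unpack(archive, prefix):
    """Rebuild a CSR matrix from the arrays stored under `prefix`."""
    shape = tuple(int(x) for x in archive[f"{prefix}_shape"])
    return sp.csr_matrix((archive[f"{prefix}_data"], archive[f"{prefix}_indices"],
                          archive[f"{prefix}_indptr"]), shape=shape)


def save_graph(path, adjacency, features, names=None):
    """Write a graph to an .npz file using the keys listed above."""
    arrays = {**_pack("adjacency", adjacency), **_pack("feature", features)}
    if names is not None:
        arrays["name"] = np.asarray(names)
    np.savez(path, **arrays)


def load_graph(path):
    """Read the adjacency matrix, feature matrix and optional node names."""
    # allow_pickle in case node names were saved as an object array.
    with np.load(path, allow_pickle=True) as archive:
        graph = {"adjacency": _unpack(archive, "adjacency"),
                 "features": _unpack(archive, "feature")}
        if "name" in archive.files:
            graph["names"] = archive["name"]
    return graph
```

A file written with `save_graph("./data/your_data.npz", adjacency, pvalues, names=genes)` could then be passed to `main.py -d ./data/your_data.npz -n`. Note that entries equal to zero are not stored in a sparse matrix, so an exact p-value of 0 would not appear explicitly.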
The code is written in Python 3.11.
The following packages are required to run the code:
- scipy
- scikit-learn
- numpy
- pyunionfind
- scikit-network
If you use this code, please cite the following paper:
@article{Singlan2025,
author = {Nina Singlan and Fadi Abou Choucha and Claude Pasquier},
doi = {10.1038/s41598-025-95749-6},
issn = {2045-2322},
issue = {1},
journal = {Scientific Reports 2025 15:1},
keywords = {Bioinformatics,Gene expression analysis,Similarity-based clustering},
month = {4},
pages = {1-16},
publisher = {Nature Publishing Group},
title = {A new Similarity Based Adapted Louvain Algorithm (SIMBA) for active module identification in p-value attributed biological networks},
volume = {15},
url = {https://www.nature.com/articles/s41598-025-95749-6},
year = {2025}
}

If you have any questions, please contact me at singlan.nina@gmail.com