This repository contains the optimized implementation of the SIMBA algorithm presented in Singlan, N., Abou Choucha, F. & Pasquier, C. A new Similarity Based Adapted Louvain Algorithm (SIMBA) for active module identification in p-value attributed biological networks. Sci Rep 15, 11360 (2025). https://doi.org/10.1038/s41598-025-95749-6
This implementation is used in the SIMBAnalysis framework, available at https://huggingface.co/spaces/simba-clustering/SIMBAnalysis.
To use this algorithm, you need Python 3.11 installed on your machine, along with the required packages described in the Requirements section and listed in the environment.yml file.
To run the algorithm, clone this repository, install the required packages, and run the code with the options described in the Parameters section.
- -v or --value: mandatory. Path to the file containing the p-values. Must be a CSV or XLSX file.
- -cv or --column_values_name: mandatory. Name of the column containing the p-values.
- -cg or --column_genes_name: mandatory. Name of the column containing the gene names.

- -og or --original_graph: mandatory. Specifies the original graph to use. Possible values are STRING, BioGRID, IntAct or file.

- -f or --file: mandatory. Path to the file containing the graph. Must be a CSV, TXT or XLSX file.
- -sep or --separator: optional. Separator used in the file. Mandatory if the file is a TXT file.

- -sp or --species: optional. Specifies the species to use. Mandatory if the original graph is STRING, BioGRID or IntAct. Possible values are Homo sapiens, Mus musculus, Arabidopsis thaliana, Saccharomyces cerevisiae, Drosophila melanogaster, Danio rerio, Caenorhabditis elegans, Escherichia coli K12, Pseudomonas aeruginosa PAO1, Rattus norvegicus, Oryctolagus cuniculus or a valid taxonomy identifier.
- -d or --datadir: optional. Directory in which to store the database files. Mandatory if the original graph is STRING, BioGRID or IntAct.

- -min_neighborhood: optional. Minimum neighborhood score to consider an edge. Default is 0.0.
- -min_fusion: optional. Minimum fusion score to consider an edge. Default is 0.0.
- -min_cooccurrence: optional. Minimum co-occurrence score to consider an edge. Default is 0.0.
- -min_coexpression: optional. Minimum co-expression score to consider an edge. Default is 0.0.
- -min_experimental: optional. Minimum experimental score to consider an edge. Default is 0.0.
- -min_database: optional. Minimum database score to consider an edge. Default is 0.0.
- -min_textmining: optional. Minimum text-mining score to consider an edge. Default is 0.0.
- -min_combined_score: optional. Minimum combined score to consider an edge. Default is 0.0.

- -inter_type: optional. Type of interaction between proteins (options: physical, genetic, None). Default is None (all types).

- -min_confidence: optional. Minimum confidence score to consider an edge. Default is 0.0.

- -m or --min_nodes: optional. Minimum number of nodes in a community. Default is 5.

- -o or --output: mandatory. Path to the output file. Must be a CSV file.
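As an illustration of how the options above combine, the following sketch assembles a command line for main.py and applies the conditional requirements listed above. It is not part of the repository: the helper name build_simba_command, the example file names (pvalues.csv, modules.csv, data) and the column names (pvalue, gene) are hypothetical, and treating -f as required only when the original graph is file is an assumption.

```python
import shlex

# Graph sources that are downloaded, as listed for -og above.
DATABASES = {"STRING", "BioGRID", "IntAct"}


def build_simba_command(values, column_values, column_genes, original_graph,
                        output, species=None, datadir=None, file=None,
                        separator=None, min_nodes=None):
    """Return the argument list for main.py (hypothetical helper)."""
    cmd = ["python", "main.py",
           "-v", values, "-cv", column_values, "-cg", column_genes,
           "-og", original_graph, "-o", output]
    if original_graph in DATABASES:
        # -sp and -d are documented as mandatory for the three databases.
        if species is None or datadir is None:
            raise ValueError("-sp and -d are required for STRING, BioGRID and IntAct")
        cmd += ["-sp", species, "-d", datadir]
    elif original_graph == "file":
        # Assumption: -f is only needed when the graph is read from a file.
        if file is None:
            raise ValueError("-f is required when the original graph is 'file'")
        cmd += ["-f", file]
        # -sep is documented as mandatory for TXT graph files.
        if file.lower().endswith(".txt") and separator is None:
            raise ValueError("-sep is required for TXT graph files")
        if separator is not None:
            cmd += ["-sep", separator]
    else:
        raise ValueError(f"unknown original graph: {original_graph!r}")
    if min_nodes is not None:
        cmd += ["-m", str(min_nodes)]
    return cmd


# Example: STRING network for Homo sapiens, with placeholder file names.
example = build_simba_command("pvalues.csv", "pvalue", "gene", "STRING",
                              "modules.csv", species="Homo sapiens",
                              datadir="data")
print(shlex.join(example))
```

The printed string can be pasted into a shell from the repository root; shlex.join takes care of quoting values that contain spaces, such as the species name.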
The repository is structured as follows:
- clustering/: contains the code of the similarity-based clustering algorithm.
- download/: contains the code to download and process the biological networks.
- graph/: contains the code to represent the graph structure and the community detection algorithms.
- utils/: contains the code of the utility functions used.
- main.py: contains the main code to run the algorithm.
The code is written in Python 3.11.
The following packages are required to run the code:
- numpy
- pandas
- networkx
- openpyxl
- pyunionfind
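Before running the algorithm, it can be useful to confirm that the interpreter and packages listed above are present. The sketch below is one possible check, not something shipped with the repository: the function name check_environment is hypothetical, and it assumes each package was installed under the distribution name shown in the list.

```python
import sys
from importlib import metadata

# Distribution names taken from the requirements list above.
REQUIRED = ("numpy", "pandas", "networkx", "openpyxl", "pyunionfind")


def check_environment(required=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    report = {}
    for name in required:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


# The code targets Python 3.11; flag any other interpreter version.
if sys.version_info[:2] != (3, 11):
    print(f"Warning: running Python {sys.version_info.major}.{sys.version_info.minor}, expected 3.11")

for package, version in check_environment().items():
    print(f"{package}: {version if version is not None else 'NOT INSTALLED'}")
```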
If you use this code, please cite the following papers:
@article{Singlan2025,
author = {Nina Singlan and Fadi {Abou Choucha} and Claude Pasquier},
doi = {10.1038/s41598-025-95749-6},
issn = {2045-2322},
issue = {1},
journal = {Scientific Reports},
keywords = {Bioinformatics,Gene expression analysis,Similarity-based clustering},
month = {4},
pages = {1-16},
publisher = {Nature Publishing Group},
title = {A new Similarity Based Adapted Louvain Algorithm (SIMBA) for active module identification in p-value attributed biological networks},
volume = {15},
url = {https://www.nature.com/articles/s41598-025-95749-6},
year = {2025}
}
ADD SIMBAnalysis reference when available.
If you have any questions, please contact me at singlan.nina@gmail.com.