Time optimized SIMBA implementation for SIMBAnalysis tool

This repository contains the optimized implementation of the SIMBA algorithm presented in Singlan, N., Abou Choucha, F. & Pasquier, C. A new Similarity Based Adapted Louvain Algorithm (SIMBA) for active module identification in p-value attributed biological networks. Sci Rep 15, 11360 (2025). https://doi.org/10.1038/s41598-025-95749-6. This implementation is used in the SIMBAnalysis framework, available at https://huggingface.co/spaces/simba-clustering/SIMBAnalysis.

Installation

This algorithm requires Python 3.11. The required packages are described in the Requirements section and listed in the env.yml file.

To run the algorithm, clone this repository, install the required packages and run the code with the options described in the Parameters section.

Parameters:

p-values

-v or --value: mandatory. Path to the file containing the p-values. Must be a CSV or XLSX file.
-cv or --column_values_name: mandatory. Name of the column containing the p-values.
-cg or --column_genes_name: mandatory. Name of the column containing the gene names.
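
The three options above describe a table with one gene column and one p-value column. As a rough illustration only (this is not the repository's loader), the sketch below reads such a CSV with the standard library. The function name `read_pvalues`, the sample column names `gene` and `pvalue`, and the range check are assumptions made for the example.

```python
import csv
import io


def read_pvalues(handle, column_genes_name, column_values_name):
    """Return {gene: p-value} from a CSV with a header row (hypothetical helper)."""
    pvalues = {}
    for row in csv.DictReader(handle):
        gene = row[column_genes_name].strip()
        value = float(row[column_values_name])
        # Assumption: a p-value outside [0, 1] indicates a malformed file.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"p-value out of range for {gene}: {value}")
        pvalues[gene] = value
    return pvalues


# Small in-memory table standing in for the file passed with -v.
sample = io.StringIO("gene,pvalue\nTP53,0.001\nBRCA1,0.04\nEGFR,0.5\n")
pvalues = read_pvalues(sample, column_genes_name="gene", column_values_name="pvalue")
print(pvalues)
```

The column names passed to the function play the same role as the values given to -cg and -cv.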

Graph structure

-og or --original_graph: mandatory. Specify the original graph to be used. Possible values are STRING, BioGRID, IntAct or file.

If the original graph is a file:

-f or --file: mandatory. Path to the file containing the graph. Must be a CSV, TXT or XLSX file.
-sep or --separator: optional. Separator used in the file. Mandatory if the file is a TXT file.
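
To show why the separator matters for TXT files, here is a minimal sketch that reads an edge list with a configurable delimiter. It assumes each line holds two node names and no header row, which may differ from what the repository actually expects. `read_edges` is a hypothetical name.

```python
import csv
import io


def read_edges(handle, separator=","):
    """Return a set of undirected edges from a two-column edge list.

    Assumption: each line holds two node names; the real loader may
    expect a header row or extra columns.
    """
    edges = set()
    for row in csv.reader(handle, delimiter=separator):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        source, target = row[0].strip(), row[1].strip()
        # frozenset makes (A, B) and (B, A) the same undirected edge.
        edges.add(frozenset((source, target)))
    return edges


# Tab-separated content standing in for a TXT file passed with -f and -sep.
sample = io.StringIO("A\tB\nB\tC\nB\tA\n")
edges = read_edges(sample, separator="\t")
print(len(edges))
```

With the wrong separator each line would be read as a single column and skipped, which is why the option cannot be guessed for free-form text files.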

If the original graph is STRING, BioGRID or IntAct:

-sp or --species: mandatory if the original graph is STRING, BioGRID or IntAct. Specify the species to be used. Possible values are Homo sapiens, Mus musculus, Arabidopsis thaliana, Saccharomyces cerevisiae, Drosophila melanogaster, Danio rerio, Caenorhabditis elegans, Escherichia coli K12, Pseudomonas aeruginosa PAO1, Rattus norvegicus, Oryctolagus cuniculus or a valid taxonomy identifier.
-d or --datadir: mandatory if the original graph is STRING, BioGRID or IntAct. Directory to store the database files.
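
Several graph options are only mandatory depending on the value of --original_graph. The sketch below spells out those rules as a small checker. It is an illustration of the documented conditions, not the validation code used in main.py, and `check_graph_options` is a hypothetical name.

```python
DATABASES = {"STRING", "BioGRID", "IntAct"}


def check_graph_options(original_graph, file=None, separator=None,
                        species=None, datadir=None):
    """Return a list of error messages for missing conditional options."""
    errors = []
    if original_graph == "file":
        if file is None:
            errors.append("--file is required when --original_graph is file")
        elif file.lower().endswith(".txt") and separator is None:
            errors.append("--separator is required for TXT files")
    elif original_graph in DATABASES:
        if species is None:
            errors.append(f"--species is required for {original_graph}")
        if datadir is None:
            errors.append(f"--datadir is required for {original_graph}")
    else:
        errors.append(f"unknown --original_graph value: {original_graph}")
    return errors


# A TXT file without a separator is reported as an error.
print(check_graph_options("file", file="network.txt"))
# A database graph with species and data directory passes.
print(check_graph_options("STRING", species="Homo sapiens", datadir="data"))
```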

If the original graph is STRING:

-min_neighborhood: optional. Minimum neighborhood score to consider an edge. Default is 0.0.
-min_fusion: optional. Minimum fusion score to consider an edge. Default is 0.0.
-min_cooccurrence: optional. Minimum co-occurrence score to consider an edge. Default is 0.0.
-min_coexpression: optional. Minimum co-expression score to consider an edge. Default is 0.0.
-min_experimental: optional. Minimum experimental score to consider an edge. Default is 0.0.
-min_database: optional. Minimum database score to consider an edge. Default is 0.0.
-min_textmining: optional. Minimum text-mining score to consider an edge. Default is 0.0.
-min_combined_score: optional. Minimum combined score to consider an edge. Default is 0.0.
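
The thresholds above can be read as a per-channel filter on edges. The following sketch shows one plausible interpretation. It assumes that all thresholds must hold at once (logical AND), that a channel missing from an edge counts as 0.0, and that scores lie on a 0 to 1 scale. None of this is confirmed by the repository, and the edge data is invented for the example.

```python
def keep_edge(scores, thresholds):
    """Return True if every thresholded channel meets its minimum.

    Assumption: thresholds combine with a logical AND, and a channel
    missing from `scores` counts as 0.0.
    """
    return all(scores.get(channel, 0.0) >= minimum
               for channel, minimum in thresholds.items())


# Hypothetical edges with per-channel scores.
edges = {
    ("A", "B"): {"experimental": 0.8, "combined_score": 0.9},
    ("A", "C"): {"experimental": 0.1, "combined_score": 0.7},
    ("B", "C"): {"textmining": 0.6, "combined_score": 0.5},
}
# Equivalent in spirit to -min_experimental 0.5 -min_combined_score 0.6.
thresholds = {"experimental": 0.5, "combined_score": 0.6}
kept = [edge for edge, scores in edges.items() if keep_edge(scores, thresholds)]
print(kept)
```

With every threshold left at its default of 0.0, this filter keeps all edges, which matches the documented defaults.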

If the original graph is BioGRID:

-inter_type: optional. Specify the type of interaction between proteins (options: physical, genetic, None). Default is None (all types).

If the original graph is IntAct:

-min_confidence: optional. Minimum confidence score to consider an edge. Default is 0.0.

SIMBA parameters:

-m or --min_nodes: optional. Minimum number of nodes in a community. Default is 5.

Output file:

-o or --output: mandatory. Path to the output file. Must be a CSV file.
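
Taken together, the options documented in this section could be declared along the following lines with argparse. This is a sketch that mirrors the documentation, not a copy of main.py. Only one of the STRING thresholds is shown, and the file names `pvalues.csv`, `modules.csv` and `data`, the column names, and the `python main.py` invocation are placeholders assumed for the example.

```python
import argparse
import shlex


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-v", "--value", required=True)
    parser.add_argument("-cv", "--column_values_name", required=True)
    parser.add_argument("-cg", "--column_genes_name", required=True)
    parser.add_argument("-og", "--original_graph", required=True,
                        choices=["STRING", "BioGRID", "IntAct", "file"])
    parser.add_argument("-f", "--file")
    parser.add_argument("-sep", "--separator")
    parser.add_argument("-sp", "--species")
    parser.add_argument("-d", "--datadir")
    # The other STRING thresholds would follow the same pattern.
    parser.add_argument("-min_combined_score", type=float, default=0.0)
    parser.add_argument("-m", "--min_nodes", type=int, default=5)
    parser.add_argument("-o", "--output", required=True)
    return parser


# Placeholder arguments for a run on the STRING network.
argv = ["-v", "pvalues.csv", "-cv", "pvalue", "-cg", "gene",
        "-og", "STRING", "-sp", "Homo sapiens", "-d", "data",
        "-min_combined_score", "0.7", "-o", "modules.csv"]
args = build_parser().parse_args(argv)
print(args.original_graph, args.species, args.min_combined_score, args.min_nodes)

# The same arguments written as a shell command line.
print("python main.py " + shlex.join(argv))
```

Note that a species name containing a space, such as Homo sapiens, has to be quoted on the command line so that it is passed as a single value.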

Repository structure

The repository is structured as follows:

clustering/: contains the code of the similarity-based clustering algorithm.
download/: contains the code to download and process the biological networks.
graph/: contains the code to represent the graph structure and the community detection algorithms.
utils/: contains the code of the utility functions used.
main.py: contains the main code to run the algorithm.

Requirements

The code is written in Python 3.11.

The following packages are required to run the code:

numpy
pandas
networkx
openpyxl
pyunionfind

Cite

If you use this code, please cite the following paper:

@article{Singlan2025,
   author = {Nina Singlan and Fadi {Abou Choucha} and Claude Pasquier},
   doi = {10.1038/s41598-025-95749-6},
   issn = {2045-2322},
   issue = {1},
   journal = {Scientific Reports 2025 15:1},
   keywords = {Bioinformatics,Gene expression analysis,Similarity-based clustering},
   month = {4},
   pages = {1-16},
   publisher = {Nature Publishing Group},
   title = {A new Similarity Based Adapted Louvain Algorithm (SIMBA) for active module identification in p-value attributed biological networks},
   volume = {15},
   url = {https://www.nature.com/articles/s41598-025-95749-6},
   year = {2025}
}

A reference for SIMBAnalysis will be added once it is available.

Contact

If you have any questions, please contact me at singlan.nina@gmail.com.
