
nsgln/SIMBAnalysis

Time-optimized SIMBA implementation for the SIMBAnalysis tool

This repository contains the optimized implementation of the SIMBA algorithm used in the SIMBAnalysis framework, available at https://huggingface.co/spaces/simba-clustering/SIMBAnalysis. SIMBA was presented in Singlan, N., Abou Choucha, F. & Pasquier, C. A new Similarity Based Adapted Louvain Algorithm (SIMBA) for active module identification in p-value attributed biological networks. Sci Rep 15, 11360 (2025). https://doi.org/10.1038/s41598-025-95749-6

Installation

To use this algorithm, you need Python 3.11 and the packages listed in the Requirements section (also provided in the environment.yml file).

To run the algorithm, clone this repository, install the required packages, and run main.py with the options described in the Parameters section.

Parameters:

p-values

  • -v or --value: mandatory. Path to the file containing the p-values. Must be a CSV or XLSX file.
  • -cv or --column_values_name: mandatory. Name of the column containing the p-values.
  • -cg or --column_genes_name: mandatory. Name of the column containing the gene names.
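
As an illustration, a p-value file could be prepared with pandas as below. The column names `gene` and `pvalue` and the file name `pvalues.csv` are hypothetical; any names should work as long as they are the ones passed with -cg and -cv.

```python
import pandas as pd

# Hypothetical gene-level results: one row per gene, one p-value per gene.
pvalues = pd.DataFrame(
    {
        "gene": ["TP53", "BRCA1", "EGFR", "MYC", "PTEN"],
        "pvalue": [0.001, 0.04, 0.20, 0.0005, 0.73],
    }
)

# P-values must lie in [0, 1]; fail early rather than pass bad input on.
assert pvalues["pvalue"].between(0, 1).all(), "p-values must be in [0, 1]"

# Write a CSV without the index so that only the two named columns remain.
pvalues.to_csv("pvalues.csv", index=False)
```

This file would then be given as `-v pvalues.csv -cg gene -cv pvalue`.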

Graph structure

  • -og or --original_graph: mandatory. Specify the original graph to be used. Possible values are STRING, BioGRID, IntAct or file.

If the original graph is a file:

  • -f or --file: mandatory. Path to the file containing the graph. Must be a CSV, TXT or XLSX file.
  • -sep or --separator: optional. Separator used in the file. Mandatory if the file is a TXT file.
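
The expected layout of a custom graph file is not described here. Assuming a simple edge list with a header row and the two interacting gene names in the first two columns (an assumption, including the column names `source` and `target`), a tab-separated TXT file could be written and sanity-checked with pandas and networkx like this:

```python
import networkx as nx
import pandas as pd

# Hypothetical edge list: each row is an undirected interaction between two genes.
edges = pd.DataFrame(
    {
        "source": ["TP53", "TP53", "BRCA1", "EGFR"],
        "target": ["BRCA1", "MYC", "PTEN", "MYC"],
    }
)

# A TXT graph file needs an explicit separator; a tab is used here.
edges.to_csv("graph.txt", sep="\t", index=False)

# Reload the file with the same separator and build an undirected graph.
loaded = pd.read_csv("graph.txt", sep="\t")
graph = nx.from_pandas_edgelist(loaded, source="source", target="target")
print(graph.number_of_nodes(), graph.number_of_edges())  # 5 4
```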

If the original graph is STRING, BioGRID or IntAct:

  • -sp or --species: optional. Specify the species to be used. Mandatory if the original graph is STRING, BioGRID or IntAct. Possible values are Homo sapiens, Mus musculus, Arabidopsis thaliana, Saccharomyces cerevisiae, Drosophila melanogaster, Danio rerio, Caenorhabditis elegans, Escherichia coli K12, Pseudomonas aeruginosa PAO1, Rattus norvegicus, Oryctolagus cuniculus or a valid taxonomy identifier.
  • -d or --datadir: optional. Directory to store the database files. Mandatory if the original graph is STRING, BioGRID or IntAct.
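
Since -sp accepts either a species name or a taxonomy identifier, the two forms can be reconciled with a small lookup. The sketch below is hypothetical (the helper `resolve_species` and its table are not taken from this repository) and covers only a subset of the supported species, using their NCBI taxonomy identifiers.

```python
# Hypothetical lookup: species name -> NCBI taxonomy identifier (subset only).
TAXONOMY_IDS = {
    "Homo sapiens": 9606,
    "Mus musculus": 10090,
    "Rattus norvegicus": 10116,
    "Danio rerio": 7955,
    "Drosophila melanogaster": 7227,
    "Caenorhabditis elegans": 6239,
    "Arabidopsis thaliana": 3702,
    "Saccharomyces cerevisiae": 4932,
}


def resolve_species(species: str) -> int:
    """Return a taxonomy ID for a species name or a numeric identifier."""
    species = species.strip()
    if species.isdigit():
        # Already a taxonomy identifier, e.g. "9606".
        return int(species)
    try:
        return TAXONOMY_IDS[species]
    except KeyError:
        raise ValueError(f"Unknown species: {species!r}") from None


print(resolve_species("Homo sapiens"))  # 9606
print(resolve_species("10090"))         # 10090
```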

If the original graph is STRING:

  • -min_neighborhood: optional. Minimum neighborhood score to consider an edge. Default is 0.0.
  • -min_fusion: optional. Minimum fusion score to consider an edge. Default is 0.0.
  • -min_cooccurrence: optional. Minimum co-occurrence score to consider an edge. Default is 0.0.
  • -min_coexpression: optional. Minimum co-expression score to consider an edge. Default is 0.0.
  • -min_experimental: optional. Minimum experimental score to consider an edge. Default is 0.0.
  • -min_database: optional. Minimum database score to consider an edge. Default is 0.0.
  • -min_textmining: optional. Minimum text-mining score to consider an edge. Default is 0.0.
  • -min_combined_score: optional. Minimum combined score to consider an edge. Default is 0.0.
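
These thresholds act as filters on the STRING edges. A minimal sketch of such a filter with pandas is shown below, assuming an edge table with one column per evidence score and keeping an edge only when every thresholded score is greater than or equal to its minimum. The column names, example scores and the "greater than or equal" rule are assumptions for illustration.

```python
import pandas as pd

# Hypothetical STRING-like edge table with illustrative scores.
links = pd.DataFrame(
    {
        "protein1": ["A", "A", "B", "C"],
        "protein2": ["B", "C", "D", "D"],
        "experimental": [0, 450, 900, 120],
        "textmining": [300, 800, 50, 700],
        "combined_score": [310, 910, 905, 720],
    }
)

# Hypothetical thresholds, in the spirit of -min_experimental and -min_combined_score.
thresholds = {"experimental": 100, "combined_score": 700}


def filter_edges(table: pd.DataFrame, minimums: dict[str, float]) -> pd.DataFrame:
    """Keep rows whose score in every thresholded column is >= its minimum."""
    mask = pd.Series(True, index=table.index)
    for column, minimum in minimums.items():
        mask &= table[column] >= minimum
    return table[mask]


kept = filter_edges(links, thresholds)
print(kept[["protein1", "protein2"]].values.tolist())
# [['A', 'C'], ['B', 'D'], ['C', 'D']]
```

With every threshold left at its default of 0.0, such a filter removes no edge with a non-negative score.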

If the original graph is BioGRID:

  • -inter_type: optional. Specify the type of interaction between proteins (options: physical, genetic, None). Default is None (all types).

If the original graph is IntAct:

  • -min_confidence: optional. Minimum confidence score to consider an edge. Default is 0.0.
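
A minimal sketch of applying -min_confidence, assuming the confidence is stored as text such as "intact-miscore:0.82" in a `confidence` column (both the format and the column names are assumptions for this example):

```python
import pandas as pd

# Hypothetical IntAct-like edge table with textual confidence values.
edges = pd.DataFrame(
    {
        "gene_a": ["TP53", "EGFR", "MYC"],
        "gene_b": ["MDM2", "GRB2", "MAX"],
        "confidence": ["intact-miscore:0.35", "intact-miscore:0.82", "intact-miscore:0.61"],
    }
)

# Take the numeric part after the last colon and convert it to float.
edges["score"] = edges["confidence"].str.rsplit(":", n=1).str[-1].astype(float)

min_confidence = 0.5  # plays the role of -min_confidence
confident = edges[edges["score"] >= min_confidence]
print(confident[["gene_a", "gene_b"]].values.tolist())
# [['EGFR', 'GRB2'], ['MYC', 'MAX']]
```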

SIMBA parameters:

  • -m or --min_nodes: optional. Minimum number of nodes in a community. Default is 5.
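
Assuming that communities with fewer than min_nodes nodes are discarded from the result, the effect of -m can be sketched as follows. The helper `filter_small_communities` and the node-to-community partition format are hypothetical.

```python
from collections import defaultdict


def filter_small_communities(
    partition: dict[str, int], min_nodes: int = 5
) -> dict[int, list[str]]:
    """Group nodes by community and keep communities with at least min_nodes nodes."""
    communities: dict[int, list[str]] = defaultdict(list)
    for node, community in partition.items():
        communities[community].append(node)
    return {
        community: sorted(nodes)
        for community, nodes in communities.items()
        if len(nodes) >= min_nodes
    }


# Hypothetical partition: node -> community label.
partition = {
    "A": 0, "B": 0, "C": 0, "D": 0, "E": 0, "F": 0,
    "G": 1, "H": 1,
}
print(filter_small_communities(partition, min_nodes=5))
# {0: ['A', 'B', 'C', 'D', 'E', 'F']}
```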

Output file:

  • -o or --output: mandatory. Path to the output file. Must be a CSV file.
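
For reference, the options above can be summarized as an argparse parser. This is a hypothetical sketch that mirrors the documented flags and defaults; it is not the parser used in main.py, only a few source-specific filters are shown, and the example command line (file names, column names, threshold) is invented.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options documented in this README."""
    parser = argparse.ArgumentParser(description="Run SIMBA (illustrative sketch)")
    # p-values
    parser.add_argument("-v", "--value", required=True)
    parser.add_argument("-cv", "--column_values_name", required=True)
    parser.add_argument("-cg", "--column_genes_name", required=True)
    # Graph structure
    parser.add_argument("-og", "--original_graph", required=True,
                        choices=["STRING", "BioGRID", "IntAct", "file"])
    parser.add_argument("-f", "--file")
    parser.add_argument("-sep", "--separator")
    parser.add_argument("-sp", "--species")
    parser.add_argument("-d", "--datadir")
    # Source-specific filters (only a few shown)
    parser.add_argument("-min_combined_score", type=float, default=0.0)
    parser.add_argument("-inter_type", choices=["physical", "genetic"], default=None)
    parser.add_argument("-min_confidence", type=float, default=0.0)
    # SIMBA parameters and output
    parser.add_argument("-m", "--min_nodes", type=int, default=5)
    parser.add_argument("-o", "--output", required=True)
    return parser


def check_conditional(args: argparse.Namespace) -> None:
    """Enforce the 'mandatory if ...' rules described above."""
    if args.original_graph == "file":
        if args.file is None:
            raise SystemExit("-f/--file is required when -og is 'file'")
        if args.file.lower().endswith(".txt") and args.separator is None:
            raise SystemExit("-sep/--separator is required for TXT files")
    elif args.species is None or args.datadir is None:
        raise SystemExit("-sp and -d are required for STRING, BioGRID and IntAct")


argv = ["-v", "pvalues.csv", "-cv", "pvalue", "-cg", "gene",
        "-og", "STRING", "-sp", "Homo sapiens", "-d", "data",
        "-min_combined_score", "0.7", "-o", "modules.csv"]
args = build_parser().parse_args(argv)
check_conditional(args)
print(args.original_graph, args.species, args.min_combined_score, args.min_nodes)
# STRING Homo sapiens 0.7 5
```

On a shell command line, a species name containing a space, such as "Homo sapiens", needs to be quoted so that it is passed as a single argument.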

Repository structure

The repository is structured as follows:

  • clustering/: contains the code of the similarity-based clustering algorithm.
  • download/: contains the code to download and process the biological networks.
  • graph/: contains the code to represent the graph structure and the community detection algorithms.
  • utils/: contains the code of the utility functions used.
  • main.py: contains the main code to run the algorithm.

Requirements

The code is written in Python 3.11.

The following packages are required to run the code:

  • numpy
  • pandas
  • networkx
  • openpyxl
  • pyunionfind

Cite

If you use this code, please cite the following paper:

@article{Singlan2025,
   author = {Nina Singlan and Fadi {Abou Choucha} and Claude Pasquier},
   doi = {10.1038/s41598-025-95749-6},
   issn = {2045-2322},
   issue = {1},
   journal = {Scientific Reports 2025 15:1},
   keywords = {Bioinformatics,Gene expression analysis,Similarity-based clustering},
   month = {4},
   pages = {1-16},
   publisher = {Nature Publishing Group},
   title = {A new Similarity Based Adapted Louvain Algorithm (SIMBA) for active module identification in p-value attributed biological networks},
   volume = {15},
   url = {https://www.nature.com/articles/s41598-025-95749-6},
   year = {2025}
}
The SIMBAnalysis reference will be added when available.

Contact

If you have any questions, please contact me at singlan.nina@gmail.com
