
Running mainRDIgraphs.py

Orieus edited this page Sep 2, 2020 · 1 revision

To run mainRDIgraphs.py, you must specify a project folder. All results produced by the software are saved in the folder structure under the project folder. If the project folder does not exist, it is created.

To run the script, type:

python mainRDIgraphs.py --p PATH --source SOURCE

where

  • PATH is the path to the project folder.
  • SOURCE is the path to the folder containing the source data.
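
As a rough illustration of the command line above, the two flags could be read with argparse as sketched below. Only the flag names `--p` and `--source` come from this page; the function name `parse_args`, the help texts, and the choice to make both flags required are assumptions, not the actual code of mainRDIgraphs.py.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags shown in the command above (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Run mainRDIgraphs.py")
    # Flag names are taken from the wiki; making them required is an assumption.
    parser.add_argument("--p", type=str, required=True,
                        help="path to the project folder")
    parser.add_argument("--source", type=str, required=True,
                        help="path to the folder containing the source data")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Example paths are placeholders, not real locations.
    args = parse_args(["--p", "../projects/demo", "--source", "../data"])
    print(args.p, args.source)
```

With argparse, the value of `--p` is available as `args.p` and the value of `--source` as `args.source`.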

If the project folder does not exist, it will be created, along with the basic structure required to run any graph processing project, which consists of the following files and folders:

  • graphs/: it will contain all graphs created for the project.
  • bigraphs/: it will contain the bipartite graphs created for the project.
  • metagraph/: a graph describing the supergraph structure of the project (i.e. a graph where each node represents a graph in graphs/ and each edge represents a bipartite graph in bigraphs/).
  • output/: it will contain files with results of the graph analysis and processing: data files and figures.
  • import/: a folder where data can be located to be imported by the application. It is not being used in v1.0.
  • export/: it will be used to store some results exported to different formats.
  • parameters.yaml: the configuration file of the application.
  • metadata.pkl: a file containing information about the project state.
  • msgs.log: a log file storing messages printed by the application during execution.
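
The folder layout listed above can be reproduced with a few lines of Python, which may help when inspecting or testing a project by hand. This is a minimal sketch: the folder names come from the list, but the helper `create_project_structure` is hypothetical and does not create parameters.yaml, metadata.pkl or msgs.log, which the application manages itself.

```python
import tempfile
from pathlib import Path

# Folder names taken from the list above.
PROJECT_FOLDERS = ("graphs", "bigraphs", "metagraph", "output", "import", "export")


def create_project_structure(path2project):
    """Create the project folder and its subfolders if they do not exist.

    Hypothetical helper; returns the sorted names of the subfolders found.
    """
    root = Path(path2project)
    for name in PROJECT_FOLDERS:
        # parents=True also creates the project folder itself when missing.
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(create_project_structure(Path(tmp) / "demo_project"))
```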

1. Configuration file

The first time a project folder is selected, a default configuration file parameters.yaml is placed in the project folder. You should edit this file before starting to work with the application.

The configuration file in release v1.0 has the following options:

1.1. Parameters to access SQL and graph databases:

connections:
  SQL:
    db_selection:
      Pr: # Name of the database for projects
      Pa: # Name of the database for patents
      Pu: # Name of the database for publications
    databases:
      # For every available SQL database, complete the following data 
      # and replace [DB_NAME] by the name of each database.
      [DB_NAME]:
        category: 
        connector: 
        server: 
        user: 
        password: 
  neo4j:
    server:
    user:
    password:

1.2. Parameters to access the Halo software:

# Path to Halo software
path2halo: '../myhalo'

1.3. Parameters for algorithms:

algorithms:
  # Size of blocks for the computation of similarity graphs. 25_000 is ok for
  # computation in a standard PC. Larger values may cause large processing
  # times caused by memory swapping.
  blocksize: 25_000   

# Parameters for model validation
validate_all_models:
  spf: 1                # Sampling factor
  rescale: False        # True if graph similarities should be normalized
  n_edges_t: 100_000    # Target number of edges for all validation graphs
  g: 1                  # Set to a value other than 1 to apply a non-linear transformation to similarity values

1.4. Configuration of logging messages:

# Specify format for the log outputs
logformat:
  filename: msgs.log
  datefmt: '%m-%d %H:%M:%S'
  file_format: '%(asctime)s %(levelname)-8s %(message)s'
  file_level: INFO
  cons_level: DEBUG
  cons_format: '%(levelname)-8s %(message)s'
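
To make the role of each logformat key more concrete, the sketch below maps them onto the standard Python logging module: one handler writes to the log file and another to the console, each with its own level and format. The dictionary values are copied from the section above; the function `build_logger` and the logger name are assumptions, and the application may wire its logging differently.

```python
import logging

# Values copied from the logformat section of parameters.yaml.
logformat = {
    "filename": "msgs.log",
    "datefmt": "%m-%d %H:%M:%S",
    "file_format": "%(asctime)s %(levelname)-8s %(message)s",
    "file_level": "INFO",
    "cons_level": "DEBUG",
    "cons_format": "%(levelname)-8s %(message)s",
}


def build_logger(cfg, log_path, name="rdigraphs_demo"):
    """Build a logger with a file handler and a console handler (hypothetical)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)   # let the handlers do the filtering
    logger.handlers.clear()          # avoid duplicated handlers on repeated calls

    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setLevel(cfg["file_level"])
    file_handler.setFormatter(
        logging.Formatter(cfg["file_format"], datefmt=cfg["datefmt"]))

    console = logging.StreamHandler()
    console.setLevel(cfg["cons_level"])
    console.setFormatter(logging.Formatter(cfg["cons_format"]))

    logger.addHandler(file_handler)
    logger.addHandler(console)
    return logger
```

With the default values, DEBUG messages would appear on the console only, while messages at INFO level and above would also be written to msgs.log.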

2. Menu options.

The first time you run the software, you will see a short menu with two options:

1. Activate configuration file
0. Exit the application

Once you have edited the configuration file parameters.yaml, select option 1.

At this point, the complete set of options becomes available. The options are organized in a hierarchical structure. The whole tree of available options is shown below:

1. Create new project  
2. Load existing project    
3. Activate configuration file
4. Show SQL data sources
   4.1. Publications
   4.2. Projects
   4.3. Patents
5. Manage Neo4J database
   5.1. Show Neo4J Super Graph
       5.1.1. Overview of the whole graph databases
       5.1.2. Show information about a specific snode
       5.1.3. Show information about a specific sedge
   5.2. Reset Neo4J Graphs
       5.2.1. Reset the whole Neo4J graph databases
       5.2.2. Reset a specific Neo4J snode
       5.2.3. Reset a specific Neo4J sedge
   5.3. Export graphs to Neo4J
       5.3.1. Project
       5.3.2. Publication
       5.3.3. Patent
       5.3.4. Author
       5.3.5. Organization
   5.4. Export bigraph to Neo4J
       5.4.1. Project
       5.4.2. Publication
       5.4.3. Patent
       5.4.4. Author
       5.4.5. Organization
6. Pre-visualize supergraph
   6.1. Show supergraph structure
   6.2. Quick preview of graph
   6.3. Quick preview of bipartite graph 
7. Reset supernode
8. Reset superedge
9. Import_data
   9.1. Import co-citations graph from DB
   9.2. Import complete citations graph from SCOPUS
       9.2.1. Directed graph from citing papers to cited papers
       9.2.2. Directed graph from cited papers to citing papers
       9.2.3. Undirected graph
   9.3. Import citations subgraph from SCOPUS
       9.3.1. Directed graph from citing papers to cited papers
       9.3.2. Directed graph from cited papers to citing papers
       9.3.3. Undirected graph
   9.4. Load node attributes from SQL databases
       9.4.1. Publications
       9.4.2. Projects
       9.4.3. Patents
   9.5. Import project-researchers bipartite graph from file
10. Graph tools
   10.1. Subsample snode
       10.1.1. Replace the original snode
       10.1.2. Keep the original snode and create a new one 
   10.2. Make a subgraph with the largest community
   10.3. Remove isolated nodes
   10.4. Remove attribute from graph nodes
   10.5. Generate a synthetic graph for simple testing
11. Graph inference tools
   11.1. Cluster equivalence classes
   11.2. Equivalent Similarity graph:    from A_X to A-A
   11.3. Similarity graph:               from A_X to A-A
       11.3.1. He: 1 minus squared Hellinger's distance (JS) (sklearn-based)
       11.3.2. He2: self implementation of He (faster)
       11.3.3. l1: 1 minus l1 distance
       11.3.4. JS: Jensen-Shannon similarity (too slow)
       11.3.5. Gauss: An exponential function of the squared l2 distance
       11.3.6. He->JS: JS through He and a theoretical bound
       11.3.7. He2->JS: Same as He->JS, but using implementation He2
       11.3.8. l1->JS: JS through l1 and a theoretical bound
   11.4. Bipartite graph from attributes: from A_B to A->B
   11.5. Transductive graph:              from A-A->B to B-B
       11.5.1. First-order graph (for transduced similarity graphs)
       11.5.2. Zero-order graph (for cooperation graphs)
   11.6. Transitive graph:                from A->B->C to A->C
12. Local graph analysis
   12.1. Eigenvector Centrality
   12.2. Degree Centrality
   12.3. Betweenness centrality
   12.4. Closeness centrality
   12.5. Clustering Coefficient
   12.6. PageRank
   12.7. Katz centrality
   12.8. Absolute (i.e. unnormalized) in-degree
   12.9. Absolute (i.e. unnormalized) out_degree
13. Community detection tools
   13.1. Louvain
   13.2. Fastgreedy
   13.3. Walktrap
   13.4. Infomap
   13.5. Label Propagation
   13.6. kmeans
   13.7. Agglomerative Kmeans
   13.8. Connected components
14. Evaluate community partitions
   14.1. Coverage
   14.2. Performance
   14.3. Modularity
15. Compare two communities
   15.1. VI: Variation of information metric, Meila (2003)
   15.2. NMI: Normalized mutual information, Danon et al (2005)
   15.3. RI: Rand index, Rand (1971)
   15.4. ARI: Adjusted Rand index, Hubert and Arabie (1985)
   15.5. SJD: Split-join distance of van Dongen (2000)
   15.6. SJP: Split-join projection of van Dongen (2000)
16. Graph visualization
   16.1. Show top nodes ranked by attribute value
   16.2. Graph layout
   16.3. Visualize bipartite graph
0. Exit the application

Some options (such as those related to the processing or analysis of specific graphs) will request some more information: the graph or graphs to be processed, the attributes to be analyzed, etc.
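
As an illustration of the similarity measures in option 11.3, the He entry is described as 1 minus the squared Hellinger distance. The sketch below computes that quantity for two discrete probability vectors, assuming the common normalization H^2(p, q) = 0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2, so that the similarity lies between 0 and 1. The function name and the pure-Python implementation are illustrative only; they are not the sklearn-based or He2 implementations used by the application.

```python
import math


def hellinger_similarity(p, q):
    """Return 1 - H^2(p, q) for two discrete probability vectors.

    Assumes p and q have the same length, non-negative entries, and sum to 1.
    """
    h2 = 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return 1.0 - h2


if __name__ == "__main__":
    print(hellinger_similarity([0.5, 0.5], [0.5, 0.5]))  # identical vectors
    print(hellinger_similarity([1.0, 0.0], [0.0, 1.0]))  # disjoint supports
```

Under this normalization, identical distributions have similarity 1 and distributions with disjoint supports have similarity 0.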

3. Databases.

The application can export all graphs to a Neo4J database. You can find some instructions about Neo4J here:
