Running mainRDIgraphs.py
In order to run mainRDIgraphs.py, we must specify a project folder. All results of the software execution will be saved in the folder structure below the project folder. If the project folder does not exist, it will be created.
To run the script, type:
python mainRDIgraphs.py --p PATH --source SOURCE
where
- PATH is the path to the project folder.
- SOURCE is the path to the folder containing the source data.
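As an illustration of the command line above, the following is a minimal sketch of how the two options could be parsed with the standard argparse module. The function name parse_args and the choice to make both options required are assumptions made for this example; the actual parsing code in mainRDIgraphs.py may differ.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the --p and --source options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Sketch of a launcher for a graph processing project")
    # Assumption: both options are treated as required in this sketch.
    parser.add_argument("--p", type=Path, required=True,
                        help="path to the project folder")
    parser.add_argument("--source", type=Path, required=True,
                        help="path to the folder containing the source data")
    return parser.parse_args(argv)


# Example call with an explicit argument list instead of sys.argv.
args = parse_args(["--p", "my_project", "--source", "my_data"])
print(args.p, args.source)
```

Passing an explicit list to parse_args makes the sketch easy to try in an interactive session; with no argument it reads the options from the command line.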
If the project folder does not exist, it will be created, along with the basic structure required to run any graph processing project, which consists of the following files and folders:
- graphs/: it will contain all graphs created for the project.
- bigraphs/: it will contain the bipartite graphs created for the project.
- metagraph/: a graph describing the supergraph structure of the project (i.e. a graph where each node represents a graph in graphs/ and each edge represents a bipartite graph in bigraphs/).
- output/: it will contain files with results of the graph analysis and processing: data files and figures.
- import/: a folder where data can be located to be imported by the application. It is not being used in v1.0.
- export/: it will be used to store some results exported to different formats.
- parameters.yaml: the configuration file of the application.
- metadata.pkl: a file containing information about the project state.
- msgs.log: a log file storing messages printed by the application during execution.
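The folder layout listed above can be sketched with pathlib as follows. This is only an illustration under stated assumptions: the helper name create_project_structure is hypothetical, and the files parameters.yaml, metadata.pkl and msgs.log are left out because their contents are produced by the application itself.

```python
import tempfile
from pathlib import Path

# Subfolder names taken from the list above.
SUBFOLDERS = ("graphs", "bigraphs", "metagraph", "output", "import", "export")


def create_project_structure(project_path):
    """Create the project folder and its subfolders if they are missing."""
    root = Path(project_path)
    root.mkdir(parents=True, exist_ok=True)
    for name in SUBFOLDERS:
        (root / name).mkdir(exist_ok=True)
    return root


# Demo in a temporary directory, so nothing is left on disk afterwards.
with tempfile.TemporaryDirectory() as tmp:
    root = create_project_structure(Path(tmp) / "demo_project")
    created = sorted(p.name for p in root.iterdir())
print(created)
```

Using exist_ok=True makes the helper safe to call on a project folder that already exists, which matches the behavior described above.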
The first time a project folder is selected, a default configuration file parameters.yaml is placed in the project folder. You should edit this file before starting to work with the application.
The configuration file in release v1.0 has the following options:
connections:
  SQL:
    db_selection:
      Pr:    # Name of the database for projects
      Pa:    # Name of the database for patents
      Pu:    # Name of the database for publications
    databases:
      # For every available SQL database, complete the following data
      # and replace [DB_NAME] by the name of each database.
      [DB_NAME]:
        category:
        connector:
        server:
        user:
        password:
  neo4j:
    server:
    user:
    password:
# Path to Halo software
path2halo: '../myhalo'
algorithms:
  # Size of blocks for the computation of similarity graphs. 25_000 is ok for
  # computation in a standard PC. Larger values may cause large processing
  # times caused by memory swapping.
  blocksize: 25_000
# Parameters for model validation
validate_all_models:
  spf: 1               # Sampling factor
  rescale: False       # True if graph similarities should be normalized
  n_edges_t: 100_000   # Target number of edges for all validation graphs
  g: 1                 # Not 1 to apply a non-linear transformation to similarity values
# Specify format for the log outputs
logformat:
  filename: msgs.log
  datefmt: '%m-%d %H:%M:%S'
  file_format: '%(asctime)s %(levelname)-8s %(message)s'
  file_level: INFO
  cons_level: DEBUG
  cons_format: '%(levelname)-8s %(message)s'
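To show what the logformat options control, here is a minimal sketch that builds a file handler and a console handler from those values with the standard logging module. The function setup_logging and the logger name are hypothetical, and the way the application actually consumes these fields is an assumption; only the option names and values come from the configuration above.

```python
import logging
import tempfile
from pathlib import Path

# Values copied from the logformat section of the configuration file.
logformat = {
    "filename": "msgs.log",
    "datefmt": "%m-%d %H:%M:%S",
    "file_format": "%(asctime)s %(levelname)-8s %(message)s",
    "file_level": "INFO",
    "cons_level": "DEBUG",
    "cons_format": "%(levelname)-8s %(message)s",
}


def setup_logging(project_path, fmt, name="rdigraphs_sketch"):
    """Return a logger writing to <project_path>/<filename> and to the console."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)   # handlers apply their own level filters
    logger.handlers.clear()
    logger.propagate = False

    file_handler = logging.FileHandler(Path(project_path) / fmt["filename"])
    file_handler.setLevel(fmt["file_level"])
    file_handler.setFormatter(
        logging.Formatter(fmt["file_format"], datefmt=fmt["datefmt"]))
    logger.addHandler(file_handler)

    console = logging.StreamHandler()
    console.setLevel(fmt["cons_level"])
    console.setFormatter(logging.Formatter(fmt["cons_format"]))
    logger.addHandler(console)
    return logger


# Demo: DEBUG messages reach the console only, INFO messages reach both.
with tempfile.TemporaryDirectory() as tmp:
    log = setup_logging(tmp, logformat)
    log.debug("shown on the console only")
    log.info("shown on the console and written to msgs.log")
    for handler in log.handlers:
        handler.close()
    log_text = (Path(tmp) / "msgs.log").read_text()
print(log_text)
```

With the default values, the log file keeps messages of level INFO and above, while the console also shows DEBUG messages.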
The first time you run the software, you will see a short menu with two options:
1. Activate configuration file
0. Exit the application
Once you have edited the configuration file parameters.yaml, you can select option 1.
At this point, you will enter the complete set of options. They are organized in a hierarchical structure. The whole tree of available options is shown below:
1. Create new project
2. Load existing project
3. Activate configuration file
4. Show SQL data sources
4.1. Publications
4.2. Projects
4.3. Patents
5. Manage Neo4J database
5.1. Show Neo4J Super Graph
5.1.1. Overview of the whole graph databases
5.1.2. Show information about a specific snode
5.1.3. Show information about a specific sedge
5.2. Reset Neo4J Graphs
5.2.1. Reset the whole Neo4J graph databases
5.2.2. Reset a specific Neo4J snode
5.2.3. Reset a specific Neo4J sedge
5.3. Export graphs to Neo4J
5.3.1. Project
5.3.2. Publication
5.3.3. Patent
5.3.4. Author
5.3.5. Organization
5.4. Export bigraph to Neo4J
5.4.1. Project
5.4.2. Publication
5.4.3. Patent
5.4.4. Author
5.4.5. Organization
6. Pre-visualize supergraph
6.1. Show supergraph structure
6.2. Quick preview of graph
6.3. Quick preview of bipartite graph
7. Reset supernode
8. Reset superedge
9. Import_data
9.1. Import co-citations graph from DB
9.2. Import complete citations graph from SCOPUS
9.2.1. Directed graph from citing papers to cited papers
9.2.2. Directed graph from cited papers to citing papers
9.2.3. Undirected graph
9.3. Import citations subgraph from SCOPUS
9.3.1. Directed graph from citing papers to cited papers
9.3.2. Directed graph from cited papers to citing papers
9.3.3. Undirected graph
9.4. Load node attributes from SQL databases
9.4.1. Publications
9.4.2. Projects
9.4.3. Patents
9.5. Import project-researchers bipartite graph from file
10. Graph tools
10.1. Subsample snode
10.1.1. Replace the original snode
10.1.2. Keep the original snode and create a new one
10.2. Make a subgraph with the largest community
10.3. Remove isolated nodes
10.4. Remove attribute from graph nodes
10.5. Generate a synthetic graph for simple testing
11. Graph inference tools
11.1. Cluster equivalence classes
11.2. Equivalent Similarity graph: from A_X to A-A
11.3. Similarity graph: from A_X to A-A
11.3.1. He: 1 minus squared Hellinger's distance (JS) (sklearn-based)
11.3.2. He2: self implementation of He (faster)
11.3.3. l1: 1 minus l1 distance
11.3.4. JS: Jensen-Shannon similarity (too slow)
11.3.5. Gauss: An exponential function of the squared l2 distance
11.3.6. He->JS: JS through He and a theoretical bound
11.3.7. He2->JS: Same as He->JS, but using implementation He2
11.3.8. l1->JS: JS through l1 and a theoretical bound
11.4. Bipartite graph from attributes: from A_B to A->B
11.5. Transductive graph: from A-A->B to B-B
11.5.1. First-order graph (for transduced similarity graphs)
11.5.2. Zero-order graph (for cooperation graphs)
11.6. Transitive graph: from A->B->C to A->C
12. Local graph analysis
12.1. Eigenvector Centrality
12.2. Degree Centrality
12.3. Betweenness centrality
12.4. Closeness centrality
12.5. Clustering Coefficient
12.6. PageRank
12.7. Katz centrality
12.8. Absolute (i.e. unnormalized) in-degree
12.9. Absolute (i.e. unnormalized) out-degree
13. Community detection tools
13.1. Louvain
13.2. Fastgreedy
13.3. Walktrap
13.4. Infomap
13.5. Label Propagation
13.6. kmeans
13.7. Agglomerative Kmeans
13.8. Connected components
14. Evaluate community partitions
14.1. Coverage
14.2. Performance
14.3. Modularity
15. Compare two communities
15.1. VI: Variation of information metric, Meila (2003)
15.2. NMI: Normalized mutual information, Danon et al (2005)
15.3. RI: Rand index, Rand (1971)
15.4. ARI: Adjusted Rand index, Hubert and Arabie (1985)
15.5. SJD: Split-join distance of van Dongen (2000)
15.6. SJP: Split-join projection of van Dongen (2000)
16. Graph visualization
16.1. Show top nodes ranked by attribute value
16.2. Graph layout
16.3. Visualize bipartite graph
0. Exit the application
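As a small illustration of two of the similarity measures named under option 11.3, the sketch below computes them for a pair of discrete probability vectors in plain Python. It assumes that He is one minus the squared Hellinger distance (with the usual 1/2 normalization, which equals the Bhattacharyya coefficient) and that the JS similarity is one minus the Jensen-Shannon divergence with base-2 logarithms. The function names are hypothetical, and the exact definitions and normalizations used by the application may differ.

```python
import math


def hellinger_similarity(p, q):
    """1 minus the squared Hellinger distance: sum_i sqrt(p_i * q_i)."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def js_similarity(p, q):
    """1 minus the Jensen-Shannon divergence (base-2 logs, value in [0, 1])."""
    def kl(a, b):
        # Kullback-Leibler divergence; terms with a_i = 0 contribute 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]   # mixture distribution
    return 1.0 - 0.5 * kl(p, m) - 0.5 * kl(q, m)


# Two distributions over three categories that overlap in one category only.
p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
he_pq = hellinger_similarity(p, q)
js_pq = js_similarity(p, q)
print(he_pq, js_pq)
```

Both functions return 1 for identical distributions and 0 for distributions with disjoint supports, so their values can be used directly as edge weights of a similarity graph.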
Some options (such as those related to the processing or analysis of specific graphs) will request some more information: the graph or graphs to be processed, the attributes to be analyzed, etc.
The application can export all graphs to a Neo4J database. You can find some instructions about Neo4J here: