Files and folders
data:
- input networks in edge_lists/
- optional extra node features (node_attributes)
- protein-coding genes (GRCh37 Ensembl gene IDs) labeled according to the disease gene list (train_node_labels)
- protein-coding genes (GRCh37 Ensembl gene IDs) labeled according to the test set gene list (test_node_labels)
parameters: parametrization of the models.
src: core code of Tiresias. Includes all models, bagging, leave-one-out cross-validation (LOOCV), creation of the artifacts folder (intermediate files), and creation of the mlruns folder used for visualization (AUC of the cumulative distribution curve, ranking of unlabeled nodes, and assessment metrics such as Spearman’s coefficient).
README: instructions for Tiresias installation and use.
config.yml: file the user edits to set the relevant system resources, the input file paths, and the selected methods.
Makefile: defines all available steps of a run, i.e. the Tiresias pipeline.
environment.yml: dependencies of the run.
data/node_attributes: optional file of extra node features, either the one already provided or a user-provided one.
data/train_node_labels: file with all protein-coding genes and their corresponding label according to the disease gene list of interest. The file has 1 column, node, which holds the GRCh37 Ensembl gene IDs of the disease genes for the disease of interest.
data/test_node_labels: file in the same format as train_node_labels, used to test the generalisation of the trained models.
data/edge_lists/layer*.tsv: network examples. Each network is a tab-delimited file with 3 columns: src, dst, weight. Unweighted networks must still provide the weight column, with a weight of 1 for every edge.
parameters/features: file with the random walk and skip-gram (embedding) parameters.
parameters/models_validation: file with all model parameters for the validation step of the pipeline, i.e. the run on the disease gene list.
parameters/models_test: file with all model parameters for the optional test step of the pipeline, i.e. the run on the test set gene list.
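The edge list format described above (tab-delimited src, dst, weight, with weight 1 for unweighted networks) can be prepared with a few lines of Python. This is only a minimal sketch and not part of Tiresias: the function names are hypothetical, the input is assumed to have no header row, and whether the output needs a header row is also an assumption (controlled by `write_header`), so check it against the provided layer*.tsv examples.

```python
import csv
import io


def add_unit_weights(rows):
    """Yield [src, dst, weight] rows, filling in weight 1 where it is missing."""
    for row in rows:
        if not row:
            continue  # skip blank lines
        if len(row) == 2:
            src, dst = row
            yield [src, dst, "1"]  # unweighted edge: assign weight 1
        elif len(row) == 3:
            yield list(row)  # already weighted: keep as-is
        else:
            raise ValueError(f"expected 2 or 3 columns, got {len(row)}: {row}")


def convert(in_handle, out_handle, write_header=True):
    """Read a headerless tab-delimited edge list and write src, dst, weight rows."""
    reader = csv.reader(in_handle, delimiter="\t")
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    if write_header:  # assumption: the pipeline expects a header row
        writer.writerow(["src", "dst", "weight"])
    writer.writerows(add_unit_weights(reader))


# Placeholder node names; real files would contain GRCh37 Ensembl gene IDs.
raw = "geneA\tgeneB\ngeneB\tgeneC\n"
result = io.StringIO()
convert(io.StringIO(raw), result)
print(result.getvalue())
```

With real files, the two `io.StringIO` objects would be replaced by handles from `open(path, newline="")`.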
| Model name | Description |
|---|---|
| direct neighbors | label propagation method (no learning) |
| label_spreading | label propagation method (no learning) |
| rwr | random walk with restart (no learning) |
| rwr_m | random walk with restart for multilayer networks (no learning) |
| bagging_logistic_regression | node2vec (embeddings) coupled with logistic regression |
| bagging_logistic_regression_with_attributes | node2vec (embeddings) coupled with logistic regression for learning, with extra gene features |
| bagging_mlp | node2vec (embeddings) coupled with multilayer perceptron (MLP) for learning |
| bagging_mlp_with_attributes | node2vec (embeddings) coupled with multilayer perceptron (MLP) for learning, with extra gene features |
| bagging_gcn | graph convolutional networks |
| bagging_gcn_with_attributes | graph convolutional networks, with extra gene features |
| bagging_rgcn | relational graph convolutional networks for weighted networks |
| bagging_rgcn_with_attributes | relational graph convolutional networks for weighted networks, with extra gene features |
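To illustrate what a "no learning" method such as random walk with restart does, here is a minimal pure-Python sketch on a single weighted network. It is not the Tiresias implementation of rwr: the function name, the restart probability of 0.5, the convergence tolerance, and the choice to treat edges as undirected are all assumptions made for illustration. Seed (labeled) nodes receive the restart mass, and the remaining nodes are ranked by their steady-state score.

```python
def random_walk_with_restart(edges, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walk restarting at the seeds.

    edges: iterable of (src, dst, weight); treated as undirected (assumption).
    seeds: labeled nodes that the walker jumps back to with probability `restart`.
    """
    adj = {}
    for src, dst, weight in edges:
        adj.setdefault(src, {})[dst] = weight
        adj.setdefault(dst, {})[src] = weight
    nodes = sorted(adj)
    seeds = [s for s in seeds if s in adj]
    if not seeds:
        raise ValueError("no seed node is present in the network")

    strength = {n: sum(adj[n].values()) for n in nodes}  # weighted degree
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}  # restart term
        for u in nodes:
            if strength[u] == 0:
                continue  # a node with only zero-weight edges passes nothing on
            for v, weight in adj[u].items():
                # walk term: move from u to v in proportion to the edge weight
                nxt[v] += (1 - restart) * p[u] * weight / strength[u]
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy path network a - b - c with unit weights; "a" is the only labeled node.
toy_edges = [("a", "b", 1.0), ("b", "c", 1.0)]
scores = random_walk_with_restart(toy_edges, seeds=["a"])
ranking = sorted((n for n in scores if n != "a"), key=scores.get, reverse=True)
print(ranking)
```

Nodes closer to the seeds accumulate more probability mass, which is why the resulting scores can be used directly to rank unlabeled nodes without any training step.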