Files and folders

SD edited this page Mar 5, 2020 · 14 revisions

List of pipeline folders

data:

  • input networks in edge_lists/
  • optional extra node features (node_attributes)
  • GRCh37 Ensembl gene IDs of protein-coding genes, labeled according to the disease gene list (train_node_labels)
  • GRCh37 Ensembl gene IDs of protein-coding genes, labeled according to the test set gene list (test_node_labels)

parameters: parametrization of the models.

src: core code of Tiresias. It includes all models, bagging, and LOOCV (leave-one-out cross-validation), and creates two output folders: artifacts (intermediate files) and mlruns (visualization: AUC of the cumulative distribution curve, ranking of unlabeled nodes, and assessment metrics such as Spearman’s coefficient).

List of pipeline files

README: instructions for Tiresias installation and use.

config.yml: file the user updates with the relevant system resources, input file paths, and the selection of methods.

Makefile: defines all available steps of the Tiresias pipeline run.

environment.yml: dependencies of the run.

data/node_attributes: optional file of extra node features, either already provided or user-provided.

data/train_node_labels: file with all protein-coding genes and their corresponding label according to the disease gene list of interest. It has one column, node, containing the GRCh37 Ensembl gene IDs of the disease genes for the disease of interest.

data/test_node_labels: file similar to the above (train_node_labels) but for testing the generalisation of the trained models.
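
Before a run, the IDs in a label file can be sanity-checked against the Ensembl human gene ID pattern (ENSG followed by 11 digits). The following is a minimal sketch, not part of Tiresias: it assumes a one-column, tab-delimited file whose header line is `node`, and the helper name `read_node_ids` is hypothetical.

```python
import csv
import io
import re

# Ensembl human gene IDs: "ENSG" followed by exactly 11 digits.
ENSEMBL_GENE_ID = re.compile(r"ENSG\d{11}")


def read_node_ids(handle):
    """Split the 'node' column into (valid_ids, invalid_ids).

    Assumed layout (not confirmed by the wiki): a header line 'node',
    then one gene ID per line.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None or "node" not in reader.fieldnames:
        raise ValueError("expected a 'node' column")
    valid, invalid = [], []
    for row in reader:
        gene_id = (row["node"] or "").strip()
        if ENSEMBL_GENE_ID.fullmatch(gene_id):
            valid.append(gene_id)
        else:
            invalid.append(gene_id)
    return valid, invalid


# In-memory example; for a real file, pass open("data/train_node_labels", newline="").
sample = io.StringIO("node\nENSG00000139618\nENSG00000141510\nBRCA2\n")
valid_ids, invalid_ids = read_node_ids(sample)
print("valid:", valid_ids)
print("invalid:", invalid_ids)
```

Gene symbols such as `BRCA2` end up in the invalid list, which makes it easy to spot entries that were not converted to Ensembl IDs.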

data/edge_lists/layer*.tsv: example networks. Each network is a tab-delimited file with 3 columns: src, dst, weight. Unweighted networks must still be given weights (use 1 as the weight of each edge).

parameters/features: file with the random walk and skip-gram (embedding) parameters.
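
The edge-list format described for data/edge_lists/layer*.tsv requires a weight on every edge, so an unweighted network has to be converted first. The sketch below shows one way to do that; it assumes the input is a two-column, tab-delimited file without a header row, and the function name `add_unit_weights` is hypothetical rather than something Tiresias provides.

```python
import csv
import io


def add_unit_weights(src_handle, dst_handle):
    """Copy a two-column (src, dst) edge list, appending weight 1 to each edge.

    Returns the number of edges written.
    """
    reader = csv.reader(src_handle, delimiter="\t")
    writer = csv.writer(dst_handle, delimiter="\t", lineterminator="\n")
    n_edges = 0
    for row in reader:
        if not row:  # skip blank lines
            continue
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row}")
        writer.writerow([row[0], row[1], 1])
        n_edges += 1
    return n_edges


# In-memory example; with real files, open both with newline="".
unweighted = io.StringIO(
    "ENSG00000139618\tENSG00000141510\n"
    "ENSG00000141510\tENSG00000012048\n"
)
weighted = io.StringIO()
count = add_unit_weights(unweighted, weighted)
print(count, "edges written")
print(weighted.getvalue(), end="")
```

If the pipeline expects a header line (src, dst, weight), write it to the output before the edges; the wiki does not say whether one is required.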

parameters/models_validation: file with all model parameters for the validation step of the pipeline, that is, the run on the disease gene list.

parameters/models_test: file with all model parameters for the optional test step of the pipeline, that is, the run on the test set gene list.

List of model configurations

Model name: description
direct neighbors: label propagation method (no learning)
label_spreading: label propagation method (no learning)
rwr: random walk with restart (no learning)
rwr_m: random walk with restart for multilayer networks (no learning)
bagging_logistic_regression: node2vec (embeddings) coupled with logistic regression
bagging_logistic_regression_with_attributes: node2vec (embeddings) coupled with logistic regression for learning, with extra gene features
bagging_mlp: node2vec (embeddings) coupled with multilayer perceptron (MLP) for learning
bagging_mlp_with_attributes: node2vec (embeddings) coupled with multilayer perceptron (MLP) for learning, with extra gene features
bagging_gcn: graph convolutional networks
bagging_gcn_with_attributes: graph convolutional networks, with extra gene features
bagging_rgcn: relational graph convolutional networks for weighted networks
bagging_rgcn_with_attributes: relational graph convolutional networks for weighted networks, with extra gene features
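
To give a feel for the no-learning methods in the list, here is a generic textbook sketch of random walk with restart on a single weighted, undirected network. It is not the Tiresias implementation of rwr: the function name, the default restart probability of 0.3, and the convergence settings are all assumptions chosen for illustration, and edge weights are assumed to be positive.

```python
def random_walk_with_restart(edges, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return node -> steady-state visiting probability for a walker that
    jumps back to a seed node with probability `restart` at every step."""
    # Build a symmetric weighted adjacency map from (src, dst, weight) triples.
    adj = {}
    for src, dst, weight in edges:
        adj.setdefault(src, {})[dst] = weight
        adj.setdefault(dst, {})[src] = weight
    strength = {node: sum(nbrs.values()) for node, nbrs in adj.items()}

    seeds = set(seeds) & adj.keys()
    if not seeds:
        raise ValueError("no seed node is present in the network")

    # Restart vector: uniform over the seed (labeled) nodes.
    p0 = {node: (1.0 / len(seeds) if node in seeds else 0.0) for node in adj}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {node: restart * p0[node] for node in adj}
        for node, nbrs in adj.items():
            for nbr, weight in nbrs.items():
                # Probability leaving `node` is split in proportion to edge weight.
                nxt[nbr] += (1.0 - restart) * p[node] * weight / strength[node]
        delta = sum(abs(nxt[node] - p[node]) for node in adj)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path network a - b - c - d with "a" as the only labeled node.
toy_edges = [("a", "b", 1), ("b", "c", 1), ("c", "d", 1)]
scores = random_walk_with_restart(toy_edges, seeds=["a"])
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

On this toy path the scores fall off with distance from the seed, which is the intuition behind ranking unlabeled genes by their proximity to known disease genes.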