Identification of Novel Genetic Markers for Complex Diseases by Integrating Blood Samples from Both Patient and Healthy Donors (Tenk10k cohort-specific eQTL)

This repository contains the bioinformatics pipeline used to investigate the regulatory architecture of the immune system under the physiological stress of Coronary Artery Disease (CAD). The analysis uses SAIGE-QTL for genetic association testing in two cohorts (BioHeart and TOB), followed by Cochran's Q test and SMR (Summary-data-based Mendelian Randomization) for integrative analysis.

Main Analyses

We ran eQTL mapping with SAIGE-QTL separately in the TOB and BioHeart cohorts across 28 blood cell types, then performed heterogeneity and SMR tests on the results.

Project Structure

The pipeline is divided into four main stages: Preprocessing, Processing (SAIGE-QTL), Postprocessing, and Downstream Analysis (SMR & Heterogeneity).

1. Preprocessing

Scripts to prepare gene coordinates, filter genes based on expression thresholds, and format phenotypes/genotypes.

generate_coords.py: Generates hg38 coordinates for the analysis.
generate_filtered_gene_list.py: Filters genes within input_{cell_type} that meet the minimum expression criteria (> 0.05% of cells).
prepare_phenotypes.py: Merges gene expression from .h5ad files with covariates (PCs) into SAIGE-ready formats.
prepare_genotypes.sh: Shell script to generate chromosome-wise PLINK binary files.
submit_phenotypes_array.sh: HPC array job script to parallelize the creation of phenotype files across 22 chromosomes.
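
As an illustration of the expression filter described for generate_filtered_gene_list.py, the sketch below keeps genes detected in more than 0.05% of cells. It is a minimal example and not the repository's code: the function name `filter_genes`, the dictionary input, and the reading of "expressed" as a nonzero count are all assumptions.

```python
def filter_genes(counts_by_gene, min_fraction=0.0005):
    """Return genes expressed in more than `min_fraction` of cells.

    counts_by_gene: hypothetical mapping of gene ID -> per-cell counts.
    A cell is treated as expressing the gene when its count is > 0
    (an assumption; the real script may define this differently).
    """
    kept = []
    for gene, counts in counts_by_gene.items():
        n_cells = len(counts)
        if n_cells == 0:
            continue  # no cells observed, nothing to evaluate
        n_expressing = sum(1 for c in counts if c > 0)
        if n_expressing / n_cells > min_fraction:
            kept.append(gene)
    return sorted(kept)
```

With 10,000 cells, a gene needs a nonzero count in at least 6 cells to pass the default threshold.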

2. Processing (SAIGE-QTL)

The core association testing phase, optimized for running Steps 1 & 2 of SAIGE-QTL.

run_SAIGE.sh: Primary implementation of SAIGE-QTL (Steps 1 & 2 combined) for each specific cohort.
rerun_SAIGE.sh: Troubleshooting script for failed genes; uses the filtered gene list and provides higher RAM allocation.
missing_genes.sh: Diagnostic script to identify genes that failed initial runs due to memory constraints.
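
The check that missing_genes.sh performs can be pictured as a set difference between the genes that were submitted and the genes that produced output. The Python sketch below shows that idea only; the one-file-per-gene layout, the `.txt` suffix, and the function name `find_missing_genes` are hypothetical and may not match the actual SAIGE-QTL output naming.

```python
from pathlib import Path


def find_missing_genes(expected_genes, output_dir, suffix=".txt"):
    """List expected genes that have no non-empty result file.

    Assumes (hypothetically) one result file per gene named
    <gene><suffix> inside `output_dir`.
    """
    output_dir = Path(output_dir)
    finished = {
        path.name[: -len(suffix)]
        for path in output_dir.glob(f"*{suffix}")
        if path.stat().st_size > 0  # empty files count as failed runs
    }
    return sorted(set(expected_genes) - finished)
```

The returned list could then be passed to a rerun step with a higher memory request.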

3. Postprocessing

Consolidating raw outputs from the cluster into unified datasets.

merge_results.py: Merges raw Step 2 results into Full-Genome files for both cohorts.
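
A merge of this kind could look roughly like the following, which concatenates tab-delimited result files that share a header into one file. This is a sketch under assumptions: the file format, the paths, and the function name `merge_result_files` are illustrative and not taken from merge_results.py.

```python
import csv


def merge_result_files(paths, out_path, delimiter="\t"):
    """Concatenate delimited files with identical headers.

    Writes the header once and returns the number of data rows written.
    """
    header = None
    n_rows = 0
    with open(out_path, "w", newline="") as out_handle:
        writer = csv.writer(out_handle, delimiter=delimiter, lineterminator="\n")
        for path in paths:
            with open(path, newline="") as in_handle:
                reader = csv.reader(in_handle, delimiter=delimiter)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip completely empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"Header mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    n_rows += 1
    return n_rows
```

Raising on a header mismatch is a deliberate choice here, so that files from different runs or tool versions are not silently combined.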

4. Downstream Analysis

Heterogeneity Testing:

het_test.R: Performs Cochran’s Q test on cell-type-specific results to identify heterogeneous effects. Outputs are stored in output_heterogeneity/.
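
For intuition, Cochran's Q for a single variant-gene pair can be computed from the effect size and standard error in each cohort. The Python sketch below assumes the comparison is between the two cohorts (TOB and BioHeart) within one cell type. The repository's implementation is the R script above, and the function name here is hypothetical.

```python
import math


def cochran_q_two_cohorts(beta1, se1, beta2, se2):
    """Cochran's Q and its p-value for two effect estimates.

    Uses inverse-variance weights. With two estimates, Q follows a
    chi-square distribution with 1 degree of freedom under the null
    of no heterogeneity.
    """
    w1, w2 = 1.0 / se1**2, 1.0 / se2**2
    beta_fixed = (w1 * beta1 + w2 * beta2) / (w1 + w2)
    q = w1 * (beta1 - beta_fixed) ** 2 + w2 * (beta2 - beta_fixed) ** 2
    # Survival function of chi-square with 1 df: erfc(sqrt(q / 2)).
    p_value = math.erfc(math.sqrt(q / 2.0))
    return q, p_value
```

For two cohorts this reduces to Q = (beta1 - beta2)^2 / (se1^2 + se2^2), so a small p-value indicates that the eQTL effect differs between cohorts by more than sampling error would explain.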

SMR Analysis:

run_smr_gene_wise.py & run_SMR.sh: Integration of CAD GWAS data, gene lists, and hg38 coordinates to perform Summary-data-based Mendelian Randomization.
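
The quantity that the SMR test evaluates can be written down compactly. The sketch below computes the SMR effect estimate and approximate test statistic from eQTL and GWAS summary statistics for one top cis-eQTL. It is for illustration only: the scripts in this repository call the SMR software rather than computing this by hand, and the function and argument names are hypothetical.

```python
import math


def smr_test(beta_eqtl, se_eqtl, beta_gwas, se_gwas):
    """SMR estimate of the gene-expression effect on the trait.

    b_xy = b_gwas / b_eqtl, and
    T_SMR = z_gwas^2 * z_eqtl^2 / (z_gwas^2 + z_eqtl^2),
    which is approximately chi-square with 1 degree of freedom.
    """
    z_eqtl = beta_eqtl / se_eqtl
    z_gwas = beta_gwas / se_gwas
    b_xy = beta_gwas / beta_eqtl
    t_smr = (z_gwas**2 * z_eqtl**2) / (z_gwas**2 + z_eqtl**2)
    # Survival function of chi-square with 1 df: erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, t_smr, p_value
```

Because the statistic is bounded by the smaller of the two squared z-scores, a gene only reaches significance when both the eQTL and the GWAS signal at the instrument are strong.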

Data Note

Input data (genotypes, .h5ad objects, and raw GWAS summary statistics) are not included in this repository due to size and privacy constraints. The scripts assume a directory structure containing /inputs_genotype and /input_{cell_type}.

Contact

Angli Xue (a.xue@garvan.org.au)

Jonathan Johnson (jonathanvergis@hotmail.com)

