Polygenic Risk Scoring for ASD

My independent study examined the predictive power of polygenic risk scoring for autism, using 119 cases and 201 controls.

Download Final Report

⚠️ Project Status: This repository is under active development to ensure proper documentation and reproducibility of the entire machine learning pipeline. The analyses and results for the project can be found in the Final Report.

Repository Structure

SpectrumPRS/
│
├── data/
│   ├── hgdp_ref
│   └── study_samples.md
├── notebooks/
│   └── exploratory_PRS.ipynb
├── scripts/
│   ├── getHG19info.sh
│   ├── installDependencies.sh
│   ├── liftover.sh
│   ├── sortByBP.sh
│   └── getAutosomes.sh
├── src/
│   ├── preprocessing
│   ├── features
│   ├── models
│   └── evaluation
├── results/     # model outputs
└── README.md

data/hgdp_ref: Contains a comparative reference panel from the Human Genome Diversity Project, which was used to help estimate the admixture for the American samples.

data/study_samples.md: Contains information on the samples in the study.

notebooks/exploratory_PRS.ipynb: Before exploring the different methods for polygenic risk scoring in autism, I ran a sample experiment on the Michigan Imputation Server (v2) with a random set of HGDP samples. This shows the expected distribution of scores for a random sample, but the samples did not include phenotypes, so predictive power could not be evaluated.

scripts/: Contains bash scripts for quality control. Detailed descriptions of each QC step can be found in the directory README.

scripts/getHG19Info.sh: Moves your downloaded HG19.fa file and Hg38ToHg19.over.chain file into the scripts directory so that the liftover script can find them.

scripts/installDependencies.sh: Installs CrossMap, bcftools, and htslib on your local machine. If you are using the Great Lakes HPC, bcftools and htslib are already available, so you do not need to run this script.
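Before running the install script, it can help to see which tools are already on your PATH. The function below is a minimal sketch of such a check and is not part of the repository; `missing_tools` is a hypothetical helper name, and the executable names you pass in are assumptions (for example, CrossMap is installed as `CrossMap` or `CrossMap.py` depending on the version).

```shell
# Sketch: print the name of every requested tool that is NOT found on PATH.
# missing_tools is a hypothetical helper, not one of the repository scripts.
missing_tools() (
  for tool in "$@"; do
    # command -v succeeds only when the shell can resolve the name
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
)
```

For example, `missing_tools CrossMap bcftools bgzip tabix` prints nothing when everything is available (bgzip and tabix are the command-line programs that ship with htslib).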

scripts/liftover.sh: Lifts a VCF file over from the HG38 build to HG19 (more commonly used in genomic studies).
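For orientation, a liftover with CrossMap generally takes a chain file, the input VCF, a FASTA of the target build, and an output path. The wrapper below is a sketch under that assumption and may differ from what liftover.sh actually does; `liftover_vcf` and the `DRY_RUN` variable are hypothetical names, and the executable may be called `CrossMap.py` in older CrossMap releases.

```shell
# Sketch: wrap a CrossMap VCF liftover. Not the repository's liftover.sh.
# Arguments: chain file, input VCF, target-build FASTA, output VCF.
# With DRY_RUN=1 the command is printed instead of executed.
liftover_vcf() (
  chain="$1"; in_vcf="$2"; ref_fa="$3"; out_vcf="$4"
  # Build the command line as positional parameters so quoting is preserved
  set -- CrossMap vcf "$chain" "$in_vcf" "$ref_fa" "$out_vcf"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
)
```

Setting `DRY_RUN=1` and calling `liftover_vcf Hg38ToHg19.over.chain sample.vcf HG19.fa sample.hg19.vcf` shows the command that would run; the sample file names here are placeholders.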

scripts/sortByBP.sh: Sorts a VCF file by ascending base-pair position within each chromosome.
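To illustrate what a position sort involves, the sketch below handles an uncompressed VCF with standard text tools: header lines stay on top, and records are ordered by chromosome and then by position. It is an assumption-laden illustration rather than the contents of sortByBP.sh; `sort_vcf_by_bp` is a hypothetical name, and the `V` (version sort) key assumes GNU sort.

```shell
# Sketch: sort an uncompressed VCF by chromosome, then by position.
# Not the repository's sortByBP.sh; requires GNU sort for the V key.
sort_vcf_by_bp() (
  in_vcf="$1"
  tab="$(printf '\t')"
  # Header lines (starting with '#') keep their original order
  grep '^#' "$in_vcf"
  # Records: column 1 in natural order (chr2 before chr10), column 2 numerically
  grep -v '^#' "$in_vcf" | sort -t "$tab" -k1,1V -k2,2n
)
```

A call such as `sort_vcf_by_bp input.vcf > sorted.vcf` writes the sorted file. For compressed or very large files, `bcftools sort` is the more usual choice.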

scripts/getAutosomes.sh: Reduces the VCF file to chromosomes 1-22, excluding the sex chromosomes. This may be useful if you only want to study autosomal DNA for your project.
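As a rough illustration of autosome filtering, the sketch below keeps header lines and any record whose chromosome is 1 through 22, with or without a `chr` prefix. It works on an uncompressed VCF and is not necessarily how getAutosomes.sh is implemented; `filter_autosomes` is a hypothetical name.

```shell
# Sketch: keep only records on chromosomes 1-22 in an uncompressed VCF.
# Not the repository's getAutosomes.sh.
filter_autosomes() (
  awk -F'\t' '
    /^#/ { print; next }              # pass header lines through unchanged
    {
      chrom = $1
      sub(/^chr/, "", chrom)          # accept both "chr7" and "7"
      if (chrom ~ /^[0-9]+$/ && (chrom + 0) >= 1 && (chrom + 0) <= 22) print
    }
  ' "$1"
)
```

Run it as `filter_autosomes input.vcf > autosomes.vcf`. Note that this leaves any `##contig` header lines for the sex chromosomes in place; only the records are filtered.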

Installation/Usage

1. Clone the GitHub repository in your terminal and navigate to the SpectrumPRS directory:

git clone https://github.com/KatherineWasmer/SpectrumPRS
cd SpectrumPRS

2. To download one or more genotype files, navigate to data -> study_samples.md and click the hyperlink of your choice. Unless you are working on an HPC cluster, downloading an entire TAR file (typically > 10 GB) is highly discouraged.

Example for downloading a single file (A102092, from LaSalle):

i. Go to data -> study_samples.md -> click the hyperlinked GEO accession ID for the American data set

ii. On the NCBI webpage, click on the hyperlinked GSM ID next to your sample. This will direct you to a page for downloading SNP data.

iii. Underneath the supplementary files section, navigate to the row with the file "GSM5381820_A102092_SNP.vcf.gz". Click on the http hyperlink to download the VCF file.

3. Add downloaded files to your cloned data folder. Since even a single genotype file exceeds GitHub's file upload limit, the genotype files are not stored in the repository and you will need to add them manually. Run the following commands to move a downloaded file into the data directory.

# using the example from step 2 
cd data
mv ~/Downloads/GSM5381820_A102092_SNP.vcf.gz .
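After moving a file, a quick sanity check can catch a truncated or mislabeled download. The function below is a minimal sketch that is not part of the repository; `check_vcf_gz` is a hypothetical name. It confirms the file is non-empty, passes gzip's integrity test, and begins with the `##fileformat=VCF` line that the VCF format requires.

```shell
# Sketch: basic integrity check for a downloaded .vcf.gz file.
# check_vcf_gz is a hypothetical helper, not one of the repository scripts.
check_vcf_gz() (
  vcf_gz="$1"
  # The file must exist and be non-empty
  [ -s "$vcf_gz" ] || { echo "missing or empty: $vcf_gz" >&2; exit 1; }
  # gzip -t verifies the compressed stream without writing any output
  gzip -t "$vcf_gz" || { echo "not a valid gzip file: $vcf_gz" >&2; exit 1; }
  # A VCF file starts with a ##fileformat=VCFv... line
  first_line="$(gzip -dc "$vcf_gz" | head -n 1)"
  case "$first_line" in
    '##fileformat=VCF'*) echo "ok: $vcf_gz" ;;
    *) echo "unexpected first line: $first_line" >&2; exit 1 ;;
  esac
)
```

With the example from step 2, this would be run from the data directory as `check_vcf_gz GSM5381820_A102092_SNP.vcf.gz`.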
