comp-med/nf-dbsnp

nf-dbSNP

A Nextflow pipeline to download the latest release of dbSNP, aligned to the human genome assemblies GRCh37p13 and GRCh38p14.

Introduction

The dbSNP database contains information on registered genetic polymorphisms across the whole genome. It can be used to check the positions of SNPs across assemblies, since methods like LiftOver are reportedly not recommended for SNVs (see here). Some point out that converting SNP locations using dbSNP is also not recommended (see here). Regardless, having the database available in a searchable format cannot hurt.

Requirements

conda

Some steps require access to an installation of conda to create environments with the necessary software. See the directory conda_envs for the specifications.

R

An installation of the R programming language with the additional package polars is required. The location of the R package library must also be entered as the parameter r_lib in the file nextflow.config.

Getting Started

Enter the required parameters in nextflow.config, set up the profile for your runtime environment (e.g. your local computer or an HPC cluster), and then start the pipeline.

# The basic command
nextflow run main.nf 

# Set the R library path in `nextflow.config` or pass it as a parameter flag
nextflow run main.nf --r_lib </PATH/TO/YOUR/R/LIBRARY>

# Select a profile if necessary
nextflow run main.nf -profile cluster
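
If the pipeline is launched from a script rather than typed by hand, the commands above can be assembled programmatically. The following is a minimal sketch in Python, not part of the repository: the helper name `build_command` is hypothetical, and it uses only the `--r_lib` and `-profile` flags shown above. It prints the command instead of running it.

```python
import shlex


def build_command(r_lib=None, profile=None):
    """Assemble the argument list for `nextflow run main.nf` without executing it."""
    cmd = ["nextflow", "run", "main.nf"]
    if r_lib is not None:
        # Pipeline parameter (double dash), as in the example above.
        cmd += ["--r_lib", r_lib]
    if profile is not None:
        # Nextflow option (single dash) that selects a configuration profile.
        cmd += ["-profile", profile]
    return cmd


# The path below is a placeholder, not a real location.
cmd = build_command(r_lib="/path/to/R/library", profile="cluster")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once Nextflow is installed and the working directory contains main.nf.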

Input

The pipeline does not require input files. It needs only the name of the genome build used for the variant positions, which must be grch37_p13, grch38_p14, or both, and the chromosomes (in UCSC style) to write as output. You can change these values in the file nextflow.config.
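
Before starting a long download, it can help to sanity-check the values you intend to put in the config. The sketch below is an illustration in Python and not part of the pipeline: the function name `check_inputs` is hypothetical, and the chromosome pattern assumes that "UCSC style" means a `chr` prefix followed by 1 to 22, X or Y. Whether the pipeline accepts other names, such as a mitochondrial chromosome, is not stated here.

```python
import re

# Build names as given in this README.
VALID_BUILDS = {"grch37_p13", "grch38_p14"}

# Assumption: UCSC style is "chr" followed by 1-22, X or Y.
UCSC_CHROM = re.compile(r"chr([1-9]|1[0-9]|2[0-2]|X|Y)")


def check_inputs(builds, chromosomes):
    """Return a list of problems; an empty list means the values look fine."""
    problems = []
    if not builds:
        problems.append("no genome build selected")
    for build in builds:
        if build not in VALID_BUILDS:
            problems.append(f"unknown build: {build!r}")
    for chrom in chromosomes:
        if UCSC_CHROM.fullmatch(chrom) is None:
            problems.append(f"not a UCSC-style chromosome name: {chrom!r}")
    return problems


# "GRCh38" and "22" are deliberately wrong to show what gets reported.
for problem in check_inputs(["grch37_p13", "GRCh38"], ["chr1", "22", "chrX"]):
    print(problem)
```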

Output

Based on the parameter outDir (default is ./output/) and the selected inputs, the pipeline writes one .parquet file per chromosome for each genome build (GRCh37p13, GRCh38p14).

output/
├── grch37_p13
│   ├── dbsnp_grch37_p13_chr1.parquet
│   ├── dbsnp_grch37_p13_chr10.parquet
[...]
│   └── dbsnp_grch37_p13_chrY.parquet
└── grch38_p14
    ├── dbsnp_grch38_p14_chr1.parquet
    ├── dbsnp_grch38_p14_chr10.parquet
[...]
    └── dbsnp_grch38_p14_chrY.parquet
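
The naming scheme in the tree above can be used to check that a run produced every file you expect. The following Python sketch derives the paths from the pattern `<outDir>/<build>/dbsnp_<build>_<chromosome>.parquet`, which is inferred from the listing. The helper names `expected_outputs` and `missing_outputs` are hypothetical and not part of the pipeline.

```python
from pathlib import Path


def expected_outputs(out_dir, builds, chromosomes):
    """List the Parquet paths implied by the naming scheme, one per build and chromosome."""
    return [
        Path(out_dir) / build / f"dbsnp_{build}_{chrom}.parquet"
        for build in builds
        for chrom in chromosomes
    ]


def missing_outputs(out_dir, builds, chromosomes):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_outputs(out_dir, builds, chromosomes) if not p.is_file()]


paths = expected_outputs("output", ["grch37_p13", "grch38_p14"], ["chr1", "chrY"])
for path in paths:
    print(path.as_posix())
```

Calling `missing_outputs` with the same arguments after a run returns an empty list when all files are present.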
