nf-dbSNP

A Nextflow pipeline to download the latest release of dbSNP, aligned to the human genome assemblies GRCh37p13 and GRCh38p14.

Introduction

The dbSNP database contains information on registered genetic polymorphisms for the whole genome. It can be used to check the positions of SNPs across assemblies, since methods like LiftOver appear not to be recommended for SNVs (see here). Some point out that converting SNP locations using dbSNP is not recommended either (see here). Regardless, having the database available in a searchable format is useful.

Requirements

`conda`

Some steps require access to an installation of conda to create environments with the necessary software. See the directory conda_envs for the specifications.

`R`

An installation of the R programming language with the additional library `polars` is required. Additionally, the location of the R package library must be entered as the parameter `r_lib` in the file `nextflow.config`.
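A quick pre-flight check can save a failed run. The sketch below only reports whether the tools named above are on your `PATH`. The helper names `have_tool` and `check_polars` are made up for this example and are not part of the pipeline, and `check_polars` assumes that `Rscript` is installed alongside R.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the command "$1" is available on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: try to load polars from the R library given as "$1".
# Assumes Rscript is installed; not called automatically below.
check_polars() {
    Rscript -e "library(polars, lib.loc = '$1')" >/dev/null 2>&1
}

# Report the status of each required tool without aborting.
for tool in nextflow conda Rscript; do
    if have_tool "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

To test the R side, call `check_polars </PATH/TO/YOUR/R/LIBRARY>` with the same path you intend to use for `r_lib`.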

Getting Started

Enter the required parameters in `nextflow.config`, set up the profile for your runtime environment (e.g. your local computer or an HPC cluster), and then start the pipeline.

# The basic command
nextflow run main.nf 

# Set the R library path (`r_lib`) in `nextflow.config` or pass it as a parameter flag
nextflow run main.nf --r_lib </PATH/TO/YOUR/R/LIBRARY>

# Select a profile if necessary
nextflow run main.nf -profile cluster

Input

The pipeline does not require input files. It only needs the name of the genome build used for the positions of the variants, which must be one or both of `grch37_p13` and `grch38_p14`, and the chromosomes (in UCSC style, e.g. `chr1`) to write as output. You can change these values in the file `nextflow.config`.
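The accepted values can be checked before editing the configuration. This is a minimal sketch with made-up helper names (`is_valid_build`, `is_ucsc_chrom`). It assumes the supported chromosomes are `chr1` to `chr22`, `chrX` and `chrY`, as the output listing suggests. The pipeline itself may accept more.

```shell
#!/bin/sh
# Hypothetical helper: accept only the two build names named in this README.
is_valid_build() {
    case "$1" in
        grch37_p13|grch38_p14) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: accept UCSC-style names chr1..chr22, chrX, chrY.
# Assumption: this is the supported set; adjust if the pipeline allows others.
is_ucsc_chrom() {
    case "$1" in
        chr[1-9]|chr1[0-9]|chr2[0-2]|chrX|chrY) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: flag values that would not match the expected naming.
for chrom in chr1 10 chrX; do
    if is_ucsc_chrom "$chrom"; then
        echo "ok:      $chrom"
    else
        echo "invalid: $chrom (expected UCSC style, e.g. chr10)"
    fi
done
```

Ensembl-style names such as `10` are rejected here, since the pipeline expects the `chr` prefix.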

Output

Based on the parameter `outDir` (default is `./output/`) and the selected inputs, the pipeline writes one `.parquet` file per chromosome for each selected genome build (GRCh37p13, GRCh38p14).

output/
├── grch37_p13
│   ├── dbsnp_grch37_p13_chr1.parquet
│   ├── dbsnp_grch37_p13_chr10.parquet
[...]
│   └── dbsnp_grch37_p13_chrY.parquet
└── grch38_p14
    ├── dbsnp_grch38_p14_chr1.parquet
    ├── dbsnp_grch38_p14_chr10.parquet
[...]
    └── dbsnp_grch38_p14_chrY.parquet
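The naming scheme in the listing above makes it easy to confirm that a run finished completely. The following sketch builds the expected path for a build and chromosome and reports any file that is missing. The function names `expected_path` and `report_missing` are hypothetical and only serve this example.

```shell
#!/bin/sh
# Hypothetical helper: print the expected output path.
# $1 = output directory, $2 = build, $3 = chromosome
expected_path() {
    # ${1%/} drops one trailing slash so "./output/" and "./output" both work.
    printf '%s/%s/dbsnp_%s_%s.parquet\n' "${1%/}" "$2" "$2" "$3"
}

# Hypothetical helper: list expected files that do not exist.
# $1 = output directory, $2 = build, remaining arguments = chromosomes
report_missing() {
    out_dir=$1
    build=$2
    shift 2
    for chrom in "$@"; do
        path=$(expected_path "$out_dir" "$build" "$chrom")
        [ -f "$path" ] || echo "missing: $path"
    done
}

# Example: check three chromosomes of one build in the default output directory.
report_missing ./output/ grch37_p13 chr1 chr10 chrY
```

An empty result means every expected file for the listed chromosomes is present.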

