Tools to add genomic positions to files that contain dbSNP IDs. The pipeline downloads dbSNP from the UCSC Genome Browser (UCSC/NCBI dbSNP mirrors), filters to the required columns, splits the reference for faster lookups, and then maps IDs in parallel. It supports dbSNP releases 151, 153, and 155 (default: 155).
- R packages: data.table, optparse, parallel, here
- Command line tools: split, gzip (optional: pigz for faster decompression), aria2c (optional, multi-connection downloads)
- Optional but recommended: UCSC bigBedNamedItems utility + dbSNP BigBed (dbSnp155.bb); defaults target hg38/dbSNP155
Install the R dependencies with:

Rscript -e 'install.packages(c("data.table","optparse","parallel","here"))'

Install aria2c (optional, for faster downloads):
- Conda: conda install -c conda-forge aria2
- Homebrew (macOS): brew install aria2
- Debian/Ubuntu: sudo apt-get install -y aria2
- RHEL/CentOS: sudo yum install -y aria2
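Before running anything, it can help to confirm which of the tools listed above are actually on your PATH. The following is a minimal sketch, not part of the repository: the function names `have_tool` and `check_tools` are hypothetical, and `Rscript` is included as required only because every command in this README is run through it.

```bash
#!/usr/bin/env bash
# Sketch: report which command-line tools from the dependency list are on PATH.
# have_tool and check_tools are hypothetical helper names, not part of the pipeline.

have_tool() {
  # Succeeds if the named command can be resolved by the shell.
  command -v "$1" >/dev/null 2>&1
}

check_tools() {
  local missing=0 tool
  # Required tools (Rscript is assumed required because the scripts are run with it).
  for tool in Rscript split gzip; do
    if ! have_tool "$tool"; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  # Optional tools: only warn. bigBedNamedItems may live in ./script/ instead of PATH.
  for tool in pigz aria2c bigBedNamedItems; do
    have_tool "$tool" || echo "optional tool not found on PATH: $tool" >&2
  done
  return "$missing"
}

check_tools || echo "install the missing required tools before running the pipeline" >&2
```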
Using the UCSC BigBed file skips the 90–100 GB text download and parallel awk scan.
- Download the utility (place it in ./script/ or your PATH):
  - macOS (Apple Silicon): curl -L http://hgdownload.soe.ucsc.edu/admin/exe/macOSX.arm64/bigBedNamedItems -o ./script/bigBedNamedItems && chmod +x ./script/bigBedNamedItems
  - macOS (Intel): curl -L http://hgdownload.soe.ucsc.edu/admin/exe/macOSX.x86_64/bigBedNamedItems -o ./script/bigBedNamedItems && chmod +x ./script/bigBedNamedItems
  - Linux (x86_64): curl -L http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bigBedNamedItems -o ./script/bigBedNamedItems && chmod +x ./script/bigBedNamedItems
- Download the dbSNP BigBed for your build (defaults to ./data/dbSnp<version>_<build>.bb):
  - hg38 (default): http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp155.bb (or dbSnp153.bb / dbSnp151.bb if you set --dbsnp-version)
  - hg19: http://hgdownload.soe.ucsc.edu/gbdb/hg19/snp/dbSnp155.bb (or dbSnp153.bb / dbSnp151.bb)

  Make sure the file you download matches the --build you use. The script will auto-download to ./data if the file is missing, unless --no-bb is set. Tip: if you have bandwidth, aria2c can speed this up:

  aria2c -x8 -s8 -o dbSnp155_hg38.bb http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp155.bb
  mv dbSnp155_hg38.bb ./data/

  If you do not have aria2c, use curl:

  curl -L http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp155.bb -o ./data/dbSnp155_hg38.bb
- Run with --bb-file:
Rscript ./script/positionsFromDBSNP.r \
--input=./example/example_input.txt \
--ID=ID \
--build=hg38 \
--dbsnp-version=155 \
--bb-file=./data/dbSnp155_hg38.bb \
--outdir=./example \
--prefix=example_bb \
--data-dir=./data

The script will still download RsMergeArch.bcp for ID updates if it is not present in ./data.
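The download URL and the default local file name both follow from the build and the dbSNP version, so they can be derived rather than typed by hand. The sketch below only reproduces the naming pattern described in the steps above; `bb_url` and `bb_local_path` are hypothetical helper names, and the script's own download logic may differ.

```bash
#!/usr/bin/env bash
# Sketch: derive the UCSC BigBed URL and the default local path from build + version.
# bb_url and bb_local_path are hypothetical helpers that mirror the pattern in this README.

bb_url() {
  local build=$1 version=$2
  case "$build" in
    hg19|hg38) ;;
    *) echo "unsupported build: $build" >&2; return 1 ;;
  esac
  case "$version" in
    151|153|155) ;;
    *) echo "unsupported dbSNP version: $version" >&2; return 1 ;;
  esac
  printf 'http://hgdownload.soe.ucsc.edu/gbdb/%s/snp/dbSnp%s.bb\n' "$build" "$version"
}

bb_local_path() {
  # Third argument is the data directory; ./data is the documented default.
  local build=$1 version=$2 data_dir=${3:-./data}
  printf '%s/dbSnp%s_%s.bb\n' "$data_dir" "$version" "$build"
}

echo "$(bb_url hg38 155) -> $(bb_local_path hg38 155)"
```

Keeping the build in the local file name is what makes it hard to pair an hg19 file with --build=hg38 by accident, since the UCSC file name itself is the same for both builds.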
Use this only if you are not using the BigBed fast path. It downloads and splits the full text dumps.
Download and preprocess dbSNP once, then reuse across runs:
Rscript ./script/prepare_reference_data.R \
--build=both \
--dbsnp-version=155 \
--data-dir=./data \
--cpus=8

This fetches dbSNP 155 for hg19 and hg38 (~90–100 GB total after splitting) and the RsMerge archive, storing everything under ./data. Use --build=hg19 or --build=hg38 to limit downloads, and --split-lines to adjust chunk size.
Note: downloads automatically prefer aria2c (multi-connection) when available; otherwise they fall back to download.file.
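The same prefer-aria2c-then-fall-back behaviour can be reproduced by hand in the shell. This is a sketch under assumptions: the scripts fall back to R's download.file, whereas this example uses curl as a stand-in, and `pick_downloader` and `fetch` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: choose aria2c when it is installed, otherwise fall back to curl.
# Note: the pipeline itself falls back to R's download.file; curl is a stand-in here.

pick_downloader() {
  if command -v aria2c >/dev/null 2>&1; then
    echo aria2c
  elif command -v curl >/dev/null 2>&1; then
    echo curl
  else
    return 1
  fi
}

fetch() {
  local url=$1 dest=$2
  mkdir -p "$(dirname "$dest")"
  case "$(pick_downloader)" in
    # -x8 -s8: up to 8 connections, matching the aria2c commands in this README.
    aria2c) aria2c -x8 -s8 -d "$(dirname "$dest")" -o "$(basename "$dest")" "$url" ;;
    curl)   curl -L "$url" -o "$dest" ;;
    *)      echo "no downloader found (install aria2c or curl)" >&2; return 1 ;;
  esac
}

# Example (not run here because it downloads a large file):
# fetch http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp155.bb ./data/dbSnp155_hg38.bb
```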
Warning: the text/awk path is legacy and slow for large inputs; prefer the BigBed fast path whenever possible.
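To show why this path is slow, here is a rough sketch of the split-then-scan idea: the reference is cut into chunks and each chunk is scanned by its own awk process. It is an illustration, not the script's actual code. It assumes a tab-separated reference with the rsID in column 4 (a BED-like layout), and `lookup_ids` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: split a reference file into chunks and scan the chunks in parallel with awk.
# Assumed layout (not confirmed by the pipeline): tab-separated, rsID in column 4.

lookup_ids() {
  local ids_file=$1 ref_file=$2 lines_per_chunk=$3 cpus=$4
  local tmp
  tmp=$(mktemp -d)

  # Cut the reference into fixed-size chunks: chunk_aa, chunk_ab, ...
  split -l "$lines_per_chunk" "$ref_file" "$tmp/chunk_"

  # One awk per chunk, at most $cpus at a time. Each process writes its own
  # .hits file so that output lines from different processes cannot interleave.
  IDS_FILE=$ids_file
  export IDS_FILE
  printf '%s\n' "$tmp"/chunk_* | xargs -P "$cpus" -n 1 sh -c '
    awk -F"\t" "NR == FNR { want[\$1]; next } \$4 in want" "$IDS_FILE" "$1" > "$1.hits"
  ' _

  # Concatenate the per-chunk results in chunk order, then clean up.
  cat "$tmp"/chunk_*.hits
  rm -rf "$tmp"
}

# Example: lookup_ids ids.txt reference.tsv 1000000 8
```

Every chunk is read in full regardless of how few IDs are requested, which is why the indexed BigBed lookup is preferable when it is available.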
Rscript ./script/positionsFromDBSNP.r [options]

Key options:
- --input: path to file with dbSNP IDs (e.g., summary statistics)
- --ID: column name containing dbSNP IDs (default: ID)
- --build: genome build: hg19 or hg38
- --dbsnp-version: dbSNP release to use (151, 153, or 155; default: 155)
- --bb-file: path to dbSNP BigBed file (if set, text-based lookup is skipped; defaults to a downloaded ./data/dbSnp<version>_<build>.bb when available)
- --no-bb: disable the BigBed fast path and force text lookup
- --data-dir: directory for reference data (default: ./data)
- --outdir: output directory
- --prefix: prefix for output file name (defaults to input filename)
- --cpus: CPUs to use for parallel lookups
- --skip: skip this many lines in the input file
- --prepare-only: download reference data and exit
Rscript ./script/positionsFromDBSNP.r \
--input=./example/example_input.txt \
--ID=ID \
--build=hg38 \
--dbsnp-version=155 \
--outdir=./example \
--prefix=example \
--data-dir=./data \
--cpus=16

# Install aria2 (optional, faster downloads)
brew install aria2
# Get UCSC BigBed tool (Apple Silicon) and make executable
curl -L http://hgdownload.soe.ucsc.edu/admin/exe/macOSX.arm64/bigBedNamedItems -o ./script/bigBedNamedItems
chmod +x ./script/bigBedNamedItems
# Download dbSNP BigBed (hg38/dbSNP155)
aria2c -x8 -s8 -o dbSnp155_hg38.bb http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp155.bb
mv dbSnp155_hg38.bb ./data/
# Install R deps
Rscript -e 'install.packages(c("data.table","optparse","parallel","here"))'
# Run example
Rscript ./script/positionsFromDBSNP.r \
--input=./example/example_input.txt \
--ID=ID \
--build=hg38 \
--dbsnp-version=155 \
--bb-file=./data/dbSnp155_hg38.bb \
--outdir=./example \
--prefix=example_bb \
--data-dir=./data

# macOS (Intel): same steps, using the x86_64 build of bigBedNamedItems
brew install aria2
curl -L http://hgdownload.soe.ucsc.edu/admin/exe/macOSX.x86_64/bigBedNamedItems -o ./script/bigBedNamedItems
chmod +x ./script/bigBedNamedItems
aria2c -x8 -s8 -o dbSnp155_hg38.bb http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp155.bb
mv dbSnp155_hg38.bb ./data/
Rscript -e 'install.packages(c("data.table","optparse","parallel","here"))'
Rscript ./script/positionsFromDBSNP.r --input=./example/example_input.txt --ID=ID --build=hg38 --dbsnp-version=155 --bb-file=./data/dbSnp155_hg38.bb --outdir=./example --prefix=example_bb --data-dir=./data

# Install aria2 (Debian/Ubuntu example)
sudo apt-get update && sudo apt-get install -y aria2
# Get UCSC BigBed tool (Linux x86_64)
curl -L http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bigBedNamedItems -o ./script/bigBedNamedItems
chmod +x ./script/bigBedNamedItems
# Download dbSNP BigBed (hg38/dbSNP155)
aria2c -x8 -s8 -o dbSnp155_hg38.bb http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp155.bb
mv dbSnp155_hg38.bb ./data/
# Install R deps
Rscript -e 'install.packages(c("data.table","optparse","parallel","here"))'
# Run example
Rscript ./script/positionsFromDBSNP.r \
--input=./example/example_input.txt \
--ID=ID \
--build=hg38 \
--dbsnp-version=155 \
--bb-file=./data/dbSnp155_hg38.bb \
--outdir=./example \
--prefix=example_bb \
--data-dir=./data

- UCSC Genome Browser downloads (dbSNP tables): https://hgdownload.soe.ucsc.edu/goldenPath/
- UCSC gbdb BigBed sources for dbSNP: https://hgdownload.soe.ucsc.edu/gbdb/
- UCSC bigBedNamedItems utility: http://hgdownload.cse.ucsc.edu/admin/exe/
- NCBI dbSNP RsMerge archive: https://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/database/organism_data/