Given an NCBI Bioproject ID, this tool will:
- Gather the SRA IDs
- Prefetch all SRAs
- Chunk the SRAs into groups and submit each group to the multithreaded fasterq-dump
We highly recommend running the following command, which makes NCBI download temporary files to /scratch instead of your home directory:
echo '/repository/user/main/public/root = "/scratch"' >> $HOME/.ncbi/user-settings.mkfg
Add an environment variable pointing at the code to your ~/.bashrc. Only do this once!
echo "export dumpproject=/projectnb2/talbot-lab-data/msilver/BUfqdump/dumpproject.qsub" >> ~/.bashrc
source ~/.bashrc
Then submit a job for a Bioproject:
qsub $dumpproject PRJNA485178 $PWD/PRJNA485178 --split-files
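Because the setup commands above append with `>>`, running them a second time leaves duplicate lines in the target file. A minimal sketch of an append-only-if-missing helper follows; the function name `append_once` is hypothetical and not part of this repo.

```bash
#!/usr/bin/env bash
# append_once FILE LINE
# Appends LINE to FILE only if FILE does not already contain that exact line.
# Hypothetical helper for illustration; it is not part of this repo.
append_once() {
    local file=$1 line=$2
    mkdir -p "$(dirname "$file")"   # make sure the parent directory exists
    touch "$file"                   # make sure the file exists before grep reads it
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (safe to run more than once):
# append_once "$HOME/.ncbi/user-settings.mkfg" '/repository/user/main/public/root = "/scratch"'
# append_once "$HOME/.bashrc" 'export dumpproject=/projectnb2/talbot-lab-data/msilver/BUfqdump/dumpproject.qsub'
```

With a helper like this, re-running the setup would leave each file with a single copy of the line.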
When submitting multiple jobs, it can be convenient to name each qsub job after its Bioproject ID:
bioproject=PRJNA485178
qsub -N $bioproject $dumpproject $bioproject $PWD/$bioproject --split-files
There are two files in this repo:
dumpproject.qsub:
- Retrieves the SRA IDs from a given Bioproject
- Breaks the list of those IDs into chunks of 30
- Submits each chunk to fqdump.qsub
fqdump.qsub: Downloads and gzips fastq files for given SRA IDs.
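The chunking step described above can be sketched with `split`. This is a minimal illustration, assuming the SRA IDs sit one per line in a text file. The function name `chunk_ids`, the `chunk_` file prefix, and the `sra_ids.txt` file name are hypothetical, and the real dumpproject.qsub may do this differently.

```bash
#!/usr/bin/env bash
# chunk_ids ID_FILE OUT_DIR [CHUNK_SIZE]
# Splits a one-ID-per-line file into files of at most CHUNK_SIZE lines
# (default 30) named OUT_DIR/chunk_aa, OUT_DIR/chunk_ab, ...
# Hypothetical helper for illustration; it is not part of this repo.
chunk_ids() {
    local id_file=$1 out_dir=$2 size=${3:-30}
    mkdir -p "$out_dir"
    # -l: maximum number of lines per output file
    split -l "$size" "$id_file" "$out_dir/chunk_"
}

# Usage: each resulting file could then be handed to fqdump.qsub, e.g.
# chunk_ids sra_ids.txt chunks
# for f in chunks/chunk_*; do
#     qsub "$(dirname "$dumpproject")/fqdump.qsub" "$f" "$PWD/PRJNA485178" --split-files
# done
```

Writing each chunk to its own file matches the fqdump.qsub interface, which takes a path to a list of SRAs as its first argument.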
The inputs for each script can be viewed on the command line with -h:
[msilver4@scc1 ~]$ bash $dumpproject -h
USAGE: qsub dumpproject.qsub <NCBI BIOPROJECT ID> <OUTPUT DIRECTORY> <FASTERQ-DUMP ARGUMENTS>
- Do not use the -O option for output directory, supply output directory as second argument
- Do not use --gzip, files will be zipped automatically
NOTICE: The output directory is the second argument, so do not provide the -O option. Do not provide the --gzip option either. Refer to the fasterq-dump documentation for allowed arguments.
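Since the scripts handle the output directory and gzip compression themselves, the extra arguments can be checked before a job is submitted. A minimal sketch, assuming a hypothetical helper `check_fqdump_args` that is not part of this repo and may not match how the scripts treat these options:

```bash
#!/usr/bin/env bash
# check_fqdump_args ARG...
# Returns 0 if none of the arguments is -O, --outdir (the long form of -O
# in fasterq-dump), or --gzip. Otherwise prints a message to stderr and
# returns 1. Hypothetical helper for illustration only.
check_fqdump_args() {
    local arg
    for arg in "$@"; do
        case $arg in
            -O*|--outdir|--outdir=*)
                echo "error: give the output directory as the second argument, not $arg" >&2
                return 1 ;;
            --gzip)
                echo "error: do not pass --gzip; files are gzipped automatically" >&2
                return 1 ;;
        esac
    done
    return 0
}

# Usage:
# check_fqdump_args --split-files && qsub $dumpproject PRJNA485178 $PWD/PRJNA485178 --split-files
```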
[msilver4@scc1 ~]$ bash $(dirname $dumpproject)/fqdump.qsub -h
# USAGE: qsub fqdump.qsub <PATH TO LIST OF SRAS> <OUTPUT DIRECTORY> <fasterq-dump arguments>
# Do not use the -O option for output directory, supply output directory as second argument