A workflow for using unique molecular identifiers (UMIs) to error-correct small subunit (SSU) rRNA (e.g. 16S rRNA) gene amplicons on the Nanopore platform. This workflow is a branch of the longread_umi pipeline and has been tailored for 16S rRNA gene sequencing with newer Nanopore sequencing chemistry (>= R.10.4).
Citations
Lin, Xuan, Kate Waring, John Tyson, Ryan M. Ziels. (2023) High-accuracy meets high-throughput for microbiome profiling with near full-length 16S rRNA amplicon sequencing on the Nanopore platform. bioRxiv.
Karst, Søren M., Ryan M. Ziels, Rasmus H. Kirkegaard, Emil A. Sørensen, Daniel McDonald, Qiyun Zhu, Rob Knight, and Mads Albertsen. (2021) High-accuracy long-read amplicon sequences using unique molecular identifiers with Nanopore or PacBio sequencing. Nat Methods 18, 165–169. https://doi.org/10.1038/s41592-020-01041-y
Installation
- Install the longread_umi package.

- Determine the location of the package contents. For instance, if longread_umi was installed via conda, type:

    conda activate longread_umi
    script_path="$CONDA_PREFIX/longread_umi"
    conda deactivate

- Download the ssUMI scripts:

    git clone https://github.com/ZielsLab/ssUMI.git

- Replace the longread_umi scripts folder with the new (ssUMI) scripts folder:

    mv "$script_path/scripts" "$script_path/scripts_old"
    mv path/to/ssUMI/scripts "$script_path/"
    chmod +x "$script_path"/scripts/*

- Download VSEARCH.

- Install medaka in a virtual environment and add the new medaka model files (see https://github.com/nanoporetech/medaka for details):

    cd /path/to/medaka
    python3 -m venv medaka --prompt "medaka"
    source medaka/bin/activate
    pip install --upgrade pip
    pip install medaka
    medaka tools download_models
    deactivate

- Edit the file $script_path/scripts/dependencies.sh.

  In this line, replace the placeholder path with the path to your VSEARCH installation:

    export VSEARCH="/path/to/vsearch"

  Then, in this line, replace the placeholder path with the path to USEARCH (installed as part of longread_umi):

    export USEARCH="/path/to/usearch"

  Finally, in this line, replace the placeholder path with the path to your medaka virtual environment (i.e. keep the leading "source" and the trailing "bin/activate"):

    export MEDAKA_ENV_START="source /path/to/medaka/bin/activate"

It is highly recommended that users test their installation and the ssUMI scripts using the test data found in the test_data folder within the cloned ssUMI repository. Code for running the ssUMI pipeline on test_data.fastq is given below in Usage. The expected output is a fasta file containing 4 UMI-based consensus sequences.
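The three dependencies.sh edits described above could also be scripted rather than done by hand. The following is a minimal sketch, not part of the ssUMI scripts: it works on a temporary stand-in file containing only the three lines quoted above, and the install locations (/opt/vsearch/bin/vsearch, /opt/usearch/usearch, /opt/medaka/medaka) are hypothetical placeholders to be swapped for your own paths.

```shell
# Stand-in for $script_path/scripts/dependencies.sh, holding only the
# three lines quoted in the installation steps (assumption: the real
# file contains these lines verbatim).
deps_file="$(mktemp)"
cat > "$deps_file" <<'EOF'
export VSEARCH="/path/to/vsearch"
export USEARCH="/path/to/usearch"
export MEDAKA_ENV_START="source /path/to/medaka/bin/activate"
EOF

# Hypothetical install locations; substitute your own.
vsearch_bin="/opt/vsearch/bin/vsearch"
usearch_bin="/opt/usearch/usearch"
medaka_venv="/opt/medaka/medaka"

# Replace only the placeholder paths and keep the rest of each line.
# "|" is used as the sed delimiter because the paths contain "/".
# -i.bak edits in place and leaves a backup copy next to the file.
sed -i.bak \
  -e "s|/path/to/vsearch|${vsearch_bin}|" \
  -e "s|/path/to/usearch|${usearch_bin}|" \
  -e "s|source /path/to/medaka/bin/activate|source ${medaka_venv}/bin/activate|" \
  "$deps_file"

# Show the edited lines for a visual check.
cat "$deps_file"
```

To apply the same substitutions to a real installation, deps_file would point at $script_path/scripts/dependencies.sh instead of a temporary file; the .bak copy makes it easy to revert.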
Usage
longread_umi ssumi_rapid: runs the ssUMI pipeline for consensus polishing of UMI-tagged 16S rRNA gene amplicons in 'rapid' mode, using only (-c) rounds of Racon polishing (recommended number of rounds = 3).
longread_umi ssumi_std: runs the ssUMI pipeline for consensus polishing of UMI-tagged 16S rRNA gene amplicons in 'standard' mode, using (-c) rounds of Racon polishing (recommended value = 3), then (-p) rounds of Medaka polishing (recommended value = 2), followed by a final round of Racon polishing.
usage:
    ssumi_std   [-h] (-d file -v value -o dir -s value)
                (-e value -m value -M value -f string -F string -r string -R string)
                (-c value -p value -n value -u dir -t value -T value)

    ssumi_rapid [-h] (-d file -v value -o dir -s value)
                (-e value -m value -M value -f string -F string -r string -R string)
                (-c value -n value -u dir -t value)

where:
    -h  Show this help text.
    -d  Single file containing raw Nanopore data in fastq format.
    -v  Minimum read coverage for using UMI consensus sequences for
        variant calling.
    -o  Output directory.
    -s  Check start of read up to s bp for UMIs.
    -e  Check end of read up to e bp for UMIs.
    -m  Minimum read length.
    -M  Maximum read length.
    -f  Forward adaptor sequence.
    -F  Forward primer sequence.
    -r  Reverse adaptor sequence.
    -R  Reverse primer sequence.
    -c  Number of iterative rounds of consensus calling with Racon.
    -p  Number of iterative rounds of consensus calling with Medaka.
    -q  Medaka model used for polishing (e.g. r941_min_high, r10_min_high).
    -u  Directory with UMI binned reads.
    -t  Number of threads to use.
    -T  Number of medaka jobs to start. Threads per job is threads/jobs.
        [Default = 1].
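The note on -T states that threads per job is threads/jobs. A quick sketch of that arithmetic follows; the values are hypothetical, and the use of integer division here is an assumption, since the help text does not say how the pipeline rounds.

```shell
threads=16   # hypothetical value passed to -t
jobs=4       # hypothetical value passed to -T

# Shell arithmetic is integer-only, so any remainder is dropped
# (assumption: the pipeline divides the same way).
threads_per_job=$(( threads / jobs ))

echo "Each of ${jobs} medaka jobs gets ${threads_per_job} thread(s)"
```

By the same arithmetic, passing the same value to -t and -T, as the example invocation in this README does, gives each medaka job a single thread.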
Below is an example for V1-V9 16S rRNA gene amplicons generated with UMI-tagged 8F / 1391R primers, sequenced on an ONT R.10.4 flowcell, and basecalled with guppy v6.3.8. The raw ONT reads are given by INFILE (e.g. INFILE="path/to/raw/reads"), the desired output folder by OUTDIR (e.g. OUTDIR="path/to/output"), and the number of threads by THREADS (e.g. THREADS=16).
longread_umi ssumi_std \
  -d "${INFILE}" \
  -v 3 \
  -o "${OUTDIR}" \
  -s 200 \
  -e 200 \
  -E 0.1 \
  -m 1200 \
  -M 2000 \
  -f GTATCGTGTAGAGACTGCGTAGG \
  -F AGRGTTYGATYMTGGCTCAG \
  -r AGTGATCGAGTCAGTGCGAGTG \
  -R GACGGGCGGTGWGTRCA \
  -c 3 \
  -p 2 \
  -q r104_e81_sup_g610 \
  -t "${THREADS}" \
  -T "${THREADS}"
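Once a run finishes, one quick sanity check is to count the records in the consensus fasta file; for the bundled test data, this README gives an expected count of 4. The sketch below is only illustrative: it builds a small stand-in fasta so it can run on its own, and consensus_fasta is a hypothetical variable that would in practice point at the consensus file written to the output directory (its exact name is not specified here).

```shell
# Stand-in fasta so the sketch is self-contained; replace with the
# path to the consensus file produced by your own run.
consensus_fasta="$(mktemp)"
printf '>umi1\nACGT\n>umi2\nACGT\n>umi3\nACGT\n>umi4\nACGT\n' > "$consensus_fasta"

# Every fasta record starts with a ">" header line, so counting those
# lines counts the sequences.
n_seqs=$(grep -c '^>' "$consensus_fasta")
echo "consensus sequences: ${n_seqs}"

expected=4   # count stated in this README for the test data
if [ "$n_seqs" -eq "$expected" ]; then
  echo "sequence count matches"
else
  echo "expected ${expected} sequences, found ${n_seqs}" >&2
fi
```

For real datasets there is no fixed expected count, so only the grep line is relevant there.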