Abby Williams, University of Oxford, 2025
A pipeline for mapping cleaned sequencing reads from avian museum samples to the Zosterops lateralis pseudochromosome assembly. It takes merged paired-end reads that have already been cleaned (e.g. with nf-polish), or can optionally run a quick clean with fastp before mapping.
Workflow
Main steps are outlined below:
- [OPTIONAL] Trim reads and remove adapters using fastp
- Map reads to the Zosterops lateralis pseudochromosome assembly using bwa-mem2
- Deduplicate merged paired-end reads using DeDup (museum samples) or Picard MarkDuplicates (contemporary samples)
- Calculate mapping stats and depth using Samtools
- Assess DNA damage using mapDamage2 (museum samples only)
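The branching in the steps above (optional cleaning, and a different deduplication tool and damage assessment depending on sample type) can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the pipeline's actual Snakemake rules: the function name `plan_steps`, the `pre_cleaned` flag, and the sample-type labels `"museum"` and `"contemporary"` are all assumptions made for this example.

```python
from typing import List


def plan_steps(sample_type: str, pre_cleaned: bool = True) -> List[str]:
    """Return the ordered list of tools that would run for one sample.

    sample_type: "museum" or "contemporary" (hypothetical labels).
    pre_cleaned: True if reads were already cleaned (e.g. with nf-polish),
                 in which case the optional fastp step is skipped.
    """
    if sample_type not in ("museum", "contemporary"):
        raise ValueError(f"unknown sample type: {sample_type!r}")

    steps = []
    if not pre_cleaned:
        steps.append("fastp")  # optional trimming and adapter removal
    steps.append("bwa-mem2")  # map to the reference assembly
    # Deduplication tool depends on the sample type
    steps.append("DeDup" if sample_type == "museum" else "Picard MarkDuplicates")
    steps.append("samtools")  # mapping stats and depth
    if sample_type == "museum":
        steps.append("mapDamage2")  # DNA damage assessment, museum samples only
    return steps


if __name__ == "__main__":
    print(plan_steps("museum", pre_cleaned=False))
    print(plan_steps("contemporary"))
```

In the real pipeline these choices are presumably driven by the configuration file rather than function arguments.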
Installation and usage
Use conda/mamba to install the environment from the environment.yaml provided.
mamba create --prefix ./snakemake-env --file environment.yaml
Do the following prior to running:
- configure the pipeline by editing config/config.yaml
- add any appropriate profiles in profiles/
- edit run.sh depending on your HPC environment
Then run as appropriate for your HPC. For slurm-based schedulers, run:
sbatch run.sh
This pipeline was built with Snakemake using this workflow template.