This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline.
The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
The pipeline is built using Nextflow and processes data using the following steps:
- MultiQC - Aggregate report describing results and QC from the whole pipeline
- Trimmomatic - Quality and adapter trimming of sequencing reads
- FastQC - Quality control metrics for sequencing reads
- Stacks process_radtags - Demultiplexing and cleaning of RAD-seq data
- Stacks denovo_map - De novo assembly and genotyping of RAD-seq data
- VCFtools - Analysis and filtering of VCF files
- Pipeline information - Report metrics generated during the workflow execution
Output files
- multiqc/
  - multiqc_report.html: a standalone HTML file that can be viewed in your web browser.
  - multiqc_data/: directory containing parsed statistics from the different tools used in the pipeline.
  - multiqc_plots/: directory containing static images from the report in various formats.
MultiQC is a visualization tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory.
Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see http://multiqc.info.
Figure 1: Number of assembled loci (stacks) generated by Stacks. This shows, for instance, that "sample_104" received extremely low coverage (6.5X), "sample_105" sufficient coverage (30.3X) and "sample_106" borderline coverage (25.1X).
Figure 2: Overview of read survival rates after running Trimmomatic. A low survival rate (e.g. sample "_118_S411") is typically caused by high adapter content or low-quality sequencing runs.
Figure 3: The fraction of Stacks variants missing in each sample (F_MISS), where lower is better. This value is usually inversely correlated with sequencing depth, but can also indicate issues with the RAD-seq experiment.
Output files
- trimmomatic/
  - *.paired.trim_{1,2}.fastq.gz: Quality and adapter trimmed reads
  - *.summary: Summary of read survival rates after trimming
Trimmomatic is a widely-used tool for preprocessing high-throughput sequencing data, focusing on tasks like adapter removal and quality trimming to enhance read quality.
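The read survival rate shown in the MultiQC report can also be recomputed from a `*.summary` file. The following is a minimal sketch, assuming the summary is the paired-end file written by Trimmomatic's `-summary` option, with `Key: value` lines such as `Input Read Pairs` and `Both Surviving Reads`. The function names are hypothetical and not part of the pipeline.

```python
def parse_trimmomatic_summary(text):
    """Parse 'Key: value' lines from a Trimmomatic summary into a dict of floats."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a 'Key: value' shape
        try:
            stats[key.strip()] = float(value.strip())
        except ValueError:
            continue  # skip non-numeric values
    return stats


def survival_rate(stats):
    """Fraction of read pairs in which both mates survived trimming.

    Assumes the paired-end keys 'Input Read Pairs' and 'Both Surviving Reads'.
    """
    total = stats["Input Read Pairs"]
    return stats["Both Surviving Reads"] / total if total else 0.0
```

A call such as `survival_rate(parse_trimmomatic_summary(open(path).read()))` would give the per-sample value, which can then be compared against whatever cutoff suits the experiment.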
Output files
- fastqc/
  - *_fastqc.html: FastQC report containing quality metrics.
  - *_fastqc.zip: Zip archive containing the FastQC report, tab-delimited data file and plot images.
FastQC gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the FastQC help pages.
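To check many samples without opening each HTML report, the per-module verdicts can be read directly from the zip archives. This sketch assumes each `*_fastqc.zip` contains a `summary.txt` with tab-separated lines of the form `STATUS<TAB>module name<TAB>file name`, which is how FastQC normally writes it. The function name is hypothetical.

```python
import io
import zipfile


def fastqc_module_status(zip_source):
    """Return {module name: PASS/WARN/FAIL} from summary.txt inside a FastQC zip.

    zip_source may be a path or a binary file-like object.
    """
    status = {}
    with zipfile.ZipFile(zip_source) as archive:
        # The archive usually holds a single top-level folder; locate summary.txt in it.
        summary_name = next(
            name for name in archive.namelist() if name.endswith("summary.txt")
        )
        with archive.open(summary_name) as handle:
            for raw in io.TextIOWrapper(handle, encoding="utf-8"):
                fields = raw.rstrip("\n").split("\t")
                if len(fields) >= 2:
                    status[fields[1]] = fields[0]
    return status
```

Filtering the returned dictionary for `FAIL` entries gives a quick list of modules worth inspecting in the full report.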
Output files
- process_radtags/
  - *.{1,2}.fq.gz: Processed reads output by Stacks
  - *.process_radtags.log: A summary of read counts removed by the various filters
Stacks process_radtags is a command from the Stacks software suite, developed by the Catchen lab. The process_radtags command is designed to demultiplex and clean raw sequencing data generated from RAD-seq experiments. It performs tasks such as quality filtering, adapter removal, and barcode demultiplexing.
Output files (summary)
- denovo_stacks/
  - *.{tags,snps,alleles}.tsv.gz: Per sample based loci and allele calls (ustacks)
  - catalog.{tags,snps,alleles}.tsv.gz: A catalog or a set of consensus loci, snps and alleles (cstacks)
  - *.matches.bam: Per sample matches to the catalog (sstacks + tsv2bam)
  - populations.snps.vcf: Polymorphic sites in VCF format (populations)
  - denovo_map.log: Running log file for the whole denovo_map.pl pipeline
Stacks denovo_map.pl is a pipeline developed by the Catchen lab. It is designed for de novo assembly and genotyping of RAD-seq data, enabling the identification of loci and genetic variants without the need for a reference genome.
It processes raw sequencing reads, clusters them into loci, and performs SNP calling and genotyping across multiple samples. The script automates the execution of various Stacks modules, including ustacks, cstacks, sstacks, and populations.
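The resulting `populations.snps.vcf` is plain text and can be inspected with a few lines of code. The sketch below computes the fraction of missing genotype calls per sample. It assumes a standard VCF layout (nine fixed columns followed by one column per sample, with `GT` as the first FORMAT field) and treats any genotype containing `.` as missing. It is an illustration only, not a reimplementation of how VCFtools derives F_MISS, and the function name is hypothetical.

```python
def missing_fraction_per_sample(vcf_lines):
    """Compute the fraction of missing genotype calls per sample.

    vcf_lines is any iterable of VCF text lines, e.g. an open file handle.
    """
    samples = []
    missing = []
    n_sites = 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the 9 fixed columns
            missing = [0] * len(samples)
            continue
        n_sites += 1
        for i, call in enumerate(fields[9:]):
            genotype = call.split(":", 1)[0]  # GT is assumed to come first
            if "." in genotype:
                missing[i] += 1
    if n_sites == 0:
        return {name: 0.0 for name in samples}
    return {name: count / n_sites for name, count in zip(samples, missing)}
```

Passing an open handle, for example `missing_fraction_per_sample(open("denovo_stacks/populations.snps.vcf"))`, streams the file line by line, so memory use stays small even for large catalogs.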
Output files
- vcftools/
  - stacks_denovo_map.het: Heterozygosity per individual, inbreeding coefficient F
  - stacks_denovo_map.idepth: Mean sequence depth per individual
  - stacks_denovo_map.imiss: Variant missingness per individual
  - stacks_denovo_map.relatedness2: Relatedness statistic (based on doi:10.1093/bioinformatics/btq559)
VCFtools is a software suite for working with VCF files, a standard format for storing genetic variation data. It provides tools for filtering, summarizing, and analyzing variant data, enabling researchers to perform population genetics analyses and quality control.
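Because missingness and sequencing depth are usually related, it can help to look at the `.idepth` and `.imiss` tables side by side. The following sketch assumes the usual VCFtools column headers (`INDV` and `MEAN_DEPTH` in `.idepth`, `INDV` and `F_MISS` in `.imiss`). The function names and the default thresholds are hypothetical and should be adapted to the experiment.

```python
import csv


def read_vcftools_table(lines):
    """Read a tab-delimited VCFtools per-individual table, keyed by INDV."""
    return {row["INDV"]: row for row in csv.DictReader(lines, delimiter="\t")}


def depth_and_missingness(idepth_lines, imiss_lines):
    """Join .idepth and .imiss into {sample: (mean depth, F_MISS)}."""
    depth = read_vcftools_table(idepth_lines)
    miss = read_vcftools_table(imiss_lines)
    return {
        indv: (float(depth[indv]["MEAN_DEPTH"]), float(miss[indv]["F_MISS"]))
        for indv in depth
        if indv in miss
    }


def flag_samples(summary, min_depth=10.0, max_f_miss=0.5):
    """Return sorted sample names with low depth or high missingness.

    The default cutoffs are illustrative assumptions, not pipeline settings.
    """
    return sorted(
        indv
        for indv, (mean_depth, f_miss) in summary.items()
        if mean_depth < min_depth or f_miss > max_f_miss
    )
```

Both readers accept open file handles, so `depth_and_missingness(open(idepth_path), open(imiss_path))` works on the files in `vcftools/`.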
Output files
- pipeline_info/
  - Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
  - Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.yml. The pipeline_report* files will only be present if the --email / --email_on_fail parameters are used when running the pipeline.
  - Reformatted samplesheet files used as input to the pipeline: samplesheet.valid.csv.
  - Parameters used by the pipeline run: params.json.
Nextflow provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.
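When troubleshooting, the trace file is often the quickest place to find tasks that did not finish cleanly. The sketch below assumes `execution_trace.txt` is tab-separated with a header row that includes the `name`, `status` and `exit` columns, as in Nextflow's default trace fields. A pipeline can be configured to write different fields, in which case the column names need adjusting. The function name is hypothetical.

```python
import csv


def tasks_not_completed(trace_lines):
    """List (name, status, exit code) for tasks that were neither completed nor cached.

    trace_lines is any iterable of text lines, e.g. an open file handle.
    """
    reader = csv.DictReader(trace_lines, delimiter="\t")
    return [
        (row["name"], row["status"], row["exit"])
        for row in reader
        if row["status"] not in ("COMPLETED", "CACHED")
    ]
```

For example, `tasks_not_completed(open("pipeline_info/execution_trace.txt"))` returns an empty list for a run in which every task succeeded.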


