metashot/mag-illumina is a workflow for the assembly and binning of Illumina sequences from metagenomic samples.
- Input: single-end, paired-end (also interleaved) Illumina sequences (gzip and bzip2 compressed FASTQ also supported);
- Histogram text files (for each input sample) of base frequency, quality scores, GC content, average quality and length are generated from input reads and clean reads using bbduk;
- Adapter trimming, contaminant filtering, quality filtering/trimming and length filtering using bbduk;
- Assembly with SPAdes or MEGAHIT;
- Assembly statistics using bbtools;
- Binning with MetaBAT2;
- Optionally, assemble plasmids with metaplasmidSPAdes and verify them using viralVerify.
- Install Docker (or Singularity) and Nextflow (see Dependences);
- Start running the analysis:
nextflow run metashot/mag-illumina \
--reads '*_R{1,2}.fastq.gz' \
--outdir results

See the file nextflow.config for the complete list of parameters.
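As a minimal sketch, the optional flags mentioned in this README (--save_clean, --save_assembler_output and --run_metaplasmidspades) could be combined with the basic command as shown here. It assumes they are boolean parameters that are switched on when passed without a value; nextflow.config remains the place to confirm their defaults. The READS, OUTDIR and DRY_RUN variable names are illustrative only, and by default the script just prints the command instead of launching it.

```shell
#!/bin/sh
# Sketch only: the flags below appear in this README; treating them as
# valueless boolean switches is an assumption.
READS='*_R{1,2}.fastq.gz'   # quoted so the shell passes the glob to Nextflow unexpanded
OUTDIR='results'
DRY_RUN="${DRY_RUN:-1}"     # hypothetical switch: 1 = only print the command

# Store the command in the positional parameters so it can be inspected first.
set -- nextflow run metashot/mag-illumina \
  --reads "$READS" \
  --outdir "$OUTDIR" \
  --save_clean \
  --save_assembler_output \
  --run_metaplasmidspades

if [ "$DRY_RUN" = "1" ]; then
  # Show the arguments one per line for review.
  printf '%s\n' "$@"
else
  # Launch the pipeline with exactly the arguments listed above.
  "$@"
fi
```

Running the script with DRY_RUN=0 would start the pipeline; leaving DRY_RUN unset only lists the arguments.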
The files and directories listed below will be created in the results directory
after the pipeline has finished.
- scaffolds: scaffolds for each input sample;
- bins: genome bins;
- unbinned: unbinned contigs;
- stats_scaffolds.tsv: scaffold statistics;
- verified_plasmids: verified plasmids (if --run_metaplasmidspades is set).

- raw_reads_stats: base frequency, quality scores, GC content, average quality and length for each input sample;
- clean_reads_stats: same as above, but for the reads after the quality control;
- clean_reads: clean reads (if --save_clean is set);
- qc: adapter trimming and contaminant filtering statistics;
- metaspades, metaplasmidspades and megahit: complete assembler output for each sample (if --save_assembler_output is set);
- scaffolds_plasmids: candidate plasmids (if --run_metaplasmidspades is set);
- viralverify: viralVerify output (if --run_metaplasmidspades is set);
- metabat2: metabat2 log and the depth of coverage for each assembly.
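After a run, a quick check of the results directory can confirm that the outputs listed above were created. The following is a small sketch, not part of the pipeline: check_outputs is a hypothetical helper, and it only looks for scaffolds, bins, unbinned and stats_scaffolds.tsv, since the other entries depend on optional flags.

```shell
#!/bin/sh
# Sketch: report which of the always-produced outputs exist in a results directory.
# check_outputs is a hypothetical helper, not something shipped with the workflow.
check_outputs() {
  outdir=$1
  missing=0
  for name in scaffolds bins unbinned stats_scaffolds.tsv; do
    if [ -e "$outdir/$name" ]; then
      echo "found:   $name"
    else
      echo "missing: $name"
      missing=$((missing + 1))
    fi
  done
  # Return a non-zero status when at least one expected output is absent.
  [ "$missing" -eq 0 ]
}

# Default to the "results" directory used in the example command.
check_outputs "${1:-results}" || echo "some expected outputs are missing"
```

Passing a different directory as the first argument checks that location instead of results.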
Please refer to System requirements for the complete list of system requirement options.