Project Plan

Kaavya Venkateswaran edited this page May 19, 2022 · 8 revisions

This project attempts to reproduce the metagenomic analysis performed by Thrash et al. (2017), “Metabolic Roles of Uncultivated Bacterioplankton Lineages in the Northern Gulf of Mexico ‘Dead Zone’”.

Background Information & Sample Information

Marine regions that have seasonal to long-term low dissolved oxygen (DO) concentrations, sometimes called “dead zones,” are increasing in number and severity around the globe, with deleterious effects on ecology and economics. This study focuses on the dead zone on the continental shelf of the northern Gulf of Mexico (nGOM). Cosmopolitan bacterioplankton lineages in this region have eluded cultivation, and their metabolic roles remain unknown. The study therefore analyses the metagenomic and metatranscriptomic profiles of the dead versus live zones. The paper examined six samples from six sites with a range of oxic profiles (D1, D2, D3, E2, E2A, E4), whereas this project will investigate only the samples from sites D1 and D3. These two sites differ in dissolved oxygen concentration: D1 has a higher concentration than D3.

Basic analyses:

  • Reads preprocessing: trimming + quality check
  • Metagenome assembly
  • Binning
  • Quality check of assembly and bins
  • Functional annotation
  • Basic phylogenetic placement of bins
  • Analysis of activity of different bins

Workflow

| Week | Step | Analysis | Type | Software | Running time & compute requirements | Data in | Data out |
|---|---|---|---|---|---|---|---|
| 1 | 1 | Reads quality control | Illumina | FastQC | ~15 minutes | fastq | HTML report |
| 1 | 2 | Trimming | Illumina | Trimmomatic | ~15 minutes | fastq | fastq |
| 1 | 3 | Reads quality analyses | Trimmed reads | MultiQC | ~15 minutes | fastq | HTML report |
| 2 | 4 | Metagenome assembly | Metagenomics | MEGAHIT | ~6 hours (2 cores) | fastq | fasta (contigs) |
| 2 | 5 | Binning | Metagenomics | MetaBAT | <30 minutes (2 cores) | fasta | fasta |
| 3 | 6 | Quality check of assembly and bins | Fasta sequence analysis | CheckM, QUAST | 2 hours (2 cores) | fasta | report files |
| 3 | 7 | Mapping/Aligner | (DNA &) Bacterial RNA | BWA | ~4-6 hours (2 cores) | fasta/fastq | Alignment in SAM format |
| 4 | 8 | Functional annotation | Prokaryotic genome | Prokka | ~1 hour (2 cores) | fasta | Annotations in GFF3 format + other standards-compliant output files |
| 5 | 9 | Annotation | Prokaryotic genome | eggNOG-mapper | ~1 hour | gzip | Multiple file types |
| 6 | 10 | Basic phylogenetic placement of bins | Phylogenetic analysis | PhyloPhlAn | ~6 hours (2 cores) | genome file + multifasta (.faa) | Newick tree file + PhyloXML file |
| 6 | 11 | Differential expression | Annotation & BWA data | HTSeq | ~8 hours | eggNOG & BWA output | Gene counts |
| 6 | 12 | Microbial profiling (extra) | DNA reads | MetaPhlAn | ~5 hours | Shotgun sequences - DNA | txt |
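
The steps for weeks 1 and 2 can be sketched as command lines assembled in Python. This is a minimal sketch, not the project's actual scripts. It assumes paired-end reads and that `fastqc`, `trimmomatic`, `megahit` and `metabat2` are available on the PATH (on a cluster they may instead be loaded as modules). All file names, the adapter file and the Trimmomatic thresholds are placeholders chosen for illustration.

```python
import shlex


def fastqc_cmd(reads, outdir, threads=2):
    """FastQC on one or more FASTQ files; reports are written to outdir."""
    return ["fastqc", "-t", str(threads), "-o", str(outdir), *map(str, reads)]


def trimmomatic_pe_cmd(r1, r2, prefix, adapters, threads=2):
    """Paired-end Trimmomatic call.

    Output order required by Trimmomatic PE:
    forward paired, forward unpaired, reverse paired, reverse unpaired.
    The trimming steps below are placeholder values, not the project's settings.
    """
    outputs = [
        f"{prefix}_{mate}{kind}.fastq.gz"
        for mate in (1, 2)
        for kind in ("P", "U")
    ]
    return [
        "trimmomatic", "PE", "-threads", str(threads),
        str(r1), str(r2), *outputs,
        f"ILLUMINACLIP:{adapters}:2:30:10",
        "SLIDINGWINDOW:4:15",
        "MINLEN:36",
    ]


def megahit_cmd(r1, r2, outdir, threads=2):
    """MEGAHIT assembly; contigs end up in <outdir>/final.contigs.fa."""
    return ["megahit", "-1", str(r1), "-2", str(r2),
            "-o", str(outdir), "-t", str(threads)]


def metabat_cmd(contigs, bin_prefix):
    """MetaBAT 2 binning; each bin is written as <bin_prefix>.<n>.fa."""
    return ["metabat2", "-i", str(contigs), "-o", str(bin_prefix)]


if __name__ == "__main__":
    # Hypothetical file names for one sample; print the commands instead of running them.
    r1, r2 = "D1_1.fastq.gz", "D1_2.fastq.gz"
    for cmd in (
        fastqc_cmd([r1, r2], "fastqc_out"),
        trimmomatic_pe_cmd(r1, r2, "trimmed/D1", "adapters.fa"),
        megahit_cmd("trimmed/D1_1P.fastq.gz", "trimmed/D1_2P.fastq.gz", "assembly_D1"),
        metabat_cmd("assembly_D1/final.contigs.fa", "bins_D1/bin"),
    ):
        print(shlex.join(cmd))
```

Building each command as a list keeps it easy to inspect before submitting it in a Slurm job, and the list can be passed directly to `subprocess.run` once the paths are real.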

Possible Extra analyses:

  • Abundance of different organisms/bins
  • Refine taxonomic ID of assembled genomes
  • Metabolic pathway reconstructions for chosen bins
  • Analysis of expression data of chosen gene groups (e.g. respiratory genes, genes involved in carbohydrate metabolism)
  • Comparisons across bins (pathways, expression of certain gene groups, etc.)
  • Comparative genomics of bins
  • Ortholog gene clustering of bins
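
For the first extra analysis, one simple way to estimate the relative abundance of bins is to divide the number of reads mapped to each bin by the bin's total length and then normalise so the values sum to 1. The sketch below assumes that approach; it is not a method prescribed by this plan or by the paper. The bin names, read counts and lengths are invented for illustration.

```python
def relative_abundance(mapped_reads, bin_lengths):
    """Length-normalised relative abundance per bin.

    mapped_reads: dict of bin name -> number of reads mapped to that bin
    bin_lengths:  dict of bin name -> total bin length in base pairs
    Returns a dict of bin name -> fraction, with fractions summing to 1.
    """
    # Reads per base pair, so that large bins are not favoured by size alone.
    density = {name: mapped_reads[name] / bin_lengths[name] for name in mapped_reads}
    total = sum(density.values())
    if total == 0:
        raise ValueError("no reads mapped to any bin")
    return {name: value / total for name, value in density.items()}


if __name__ == "__main__":
    # Hypothetical counts and lengths, for illustration only.
    reads = {"bin_1": 1000, "bin_2": 3000}
    lengths = {"bin_1": 1_000_000, "bin_2": 1_000_000}
    for name, fraction in relative_abundance(reads, lengths).items():
        print(f"{name}\t{fraction:.3f}")
```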

Data Organization

The data is currently organized under the following main folders: Data, Scripts, Results & Slurm Outputs. The Data folder contains fastqc, metadata.csv, raw_data & RNA_trimmed_files (so far). The Codes folder contains the scripts for every analysis run. The Results folder holds the FastQC results obtained. Finally, the Slurm Outputs folder contains the Slurm output files, renamed according to the analysis that produced them. Analysis results were moved to, and are processed from, the nobackup folder because of space limits in the allocated directory. The Results & Data folders are excluded from version control because of file size limits on GitHub. The data organisation can be found here and here
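
The layout described above can be sketched with Python's `pathlib`. This is an illustration under assumptions: the folder names are taken from the list in this section and may not match the repository's exact spelling, and a `.gitignore` file is only one possible way to exclude Data and Results from version control.

```python
import tempfile
from pathlib import Path

# Names taken from the description above; the real repository may differ.
TOP_LEVEL = ["Data", "Scripts", "Results", "Slurm Outputs"]
DATA_SUBDIRS = ["fastqc", "raw_data", "RNA_trimmed_files"]
IGNORED = ["Data/", "Results/"]  # too large for GitHub


def create_layout(root):
    """Create the project folders under root and write a .gitignore."""
    root = Path(root)
    for name in TOP_LEVEL:
        (root / name).mkdir(parents=True, exist_ok=True)
    for name in DATA_SUBDIRS:
        (root / "Data" / name).mkdir(exist_ok=True)
    (root / "Data" / "metadata.csv").touch()
    (root / ".gitignore").write_text("\n".join(IGNORED) + "\n")
    return root


if __name__ == "__main__":
    # Build the layout in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        root = create_layout(tmp)
        for path in sorted(root.rglob("*")):
            print(path.relative_to(root))
```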

Weekly Task decomposition & Checklist

  • Week 1 - Project plan, Trimming (Trimmomatic) & Quality Analysis (FastQC, MultiQC)
  • Week 2 - Metagenome assembly (MEGAHIT), Binning (MetaBAT)
  • Week 3 - Quality check of assembly and bins (CheckM)
  • Week 4 - Functional annotation (Prokka), Annotation (eggNOG-mapper)
  • Week 5 - Mapping/Aligner (BWA), Differential expression (HTSeq)
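
The Week 5 mapping and counting steps can be sketched in the same way. This is a hedged sketch: it assumes paired-end RNA reads, a Prokka GFF whose CDS features carry an `ID` attribute, and an unstranded library (`-s no`). The strandedness setting and all file names are assumptions to check against the real data.

```python
import shlex


def bwa_index_cmd(reference):
    """Index the reference (e.g. assembled contigs or a bin) for BWA."""
    return ["bwa", "index", str(reference)]


def bwa_mem_cmd(reference, r1, r2, threads=2):
    """Align paired reads; bwa mem writes SAM to standard output."""
    return ["bwa", "mem", "-t", str(threads), str(reference), str(r1), str(r2)]


def htseq_count_cmd(sam, gff, feature_type="CDS", id_attr="ID", stranded="no"):
    """Count reads per annotated feature from a SAM file and a GFF annotation."""
    return [
        "htseq-count",
        "-f", "sam",         # input alignment format
        "-s", stranded,      # assumption: unstranded library
        "-t", feature_type,  # GFF feature type to count over
        "-i", id_attr,       # GFF attribute used as the feature identifier
        str(sam), str(gff),
    ]


if __name__ == "__main__":
    # Hypothetical file names; the commands are printed, not executed.
    ref = "bins_D1/bin.1.fa"
    print(shlex.join(bwa_index_cmd(ref)))
    print(shlex.join(bwa_mem_cmd(ref, "rna_1P.fastq.gz", "rna_2P.fastq.gz")) + " > bin1_rna.sam")
    print(shlex.join(htseq_count_cmd("bin1_rna.sam", "prokka_bin1/bin1.gff")) + " > bin1_counts.txt")
```

Both `bwa mem` and `htseq-count` write their results to standard output, so the redirections shown in the printed commands are what capture the SAM file and the count table.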
