Project Plan
This project attempts to reproduce the metagenomic analysis performed by Thrash et al. (2017): Metabolic Roles of Uncultivated Bacterioplankton Lineages in the Northern Gulf of Mexico “Dead Zone”.
Marine regions with seasonal to long-term low dissolved oxygen (DO) concentrations, sometimes called “dead zones,” are increasing in number and severity around the globe, with deleterious effects on ecology and economics. The study focuses on the dead zone on the continental shelf of the northern Gulf of Mexico (nGOM). Many of the cosmopolitan bacterioplankton lineages found there have eluded cultivation, so their metabolic roles are unknown. The study therefore analyses the metagenomic and metatranscriptomic profiles of samples from low-oxygen ("dead") and oxygenated ("live") waters. The paper examined six samples from six sites covering a range of oxygen concentrations (D1, D2, D3, E2, E2A, E4), whereas this project investigates only the samples from sites D1 and D3. These two sites differ in dissolved oxygen: D1 has a higher oxygen concentration than D3.
- Metagenome assembly
- Binning
- Quality check of assembly and bins
- Functional annotation
- Basic phylogenetic placement of bins
- Reads preprocessing: trimming + quality check
- Analysis of activity of different bins
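
The last item, activity of the different bins, could be approached by summing normalised gene counts per bin. The following is only a minimal sketch, assuming tab-separated `htseq-count` output (one `gene_id<TAB>count` pair per line, with summary lines starting with `__`). The `gene_to_bin` mapping and the function names are hypothetical; in practice the mapping would have to be derived from the annotation of each bin.

```python
def read_htseq_counts(handle):
    """Parse htseq-count style lines into {gene_id: count}."""
    counts = {}
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and summary rows such as __no_feature.
        if not line or line.startswith("__"):
            continue
        gene_id, count = line.split("\t")
        counts[gene_id] = int(count)
    return counts


def counts_per_million(counts):
    """Scale raw counts so that they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


def activity_per_bin(cpm, gene_to_bin):
    """Sum normalised counts per bin (gene_to_bin is a hypothetical mapping)."""
    totals = {}
    for gene, value in cpm.items():
        bin_id = gene_to_bin.get(gene, "unbinned")
        totals[bin_id] = totals.get(bin_id, 0.0) + value
    return totals
```

Counts per million ignores gene length, so a length-aware measure such as TPM may be preferable when comparing genes within a bin.
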
| Week | Step | Analysis | Type | Software | Running Time & computation requirements | Data in | Data out |
|---|---|---|---|---|---|---|---|
| 1 | 1 | Read quality control | Illumina | FastQC | ~15 minutes | fastq | HTML report |
| 1 | 2 | Trimming | Illumina | Trimmomatic | ~15 minutes | fastq | fastq |
| 1 | 3 | Read quality analysis | Trimmed reads | MultiQC | ~15 minutes | FastQC reports | HTML report |
| 2 | 4 | Metagenome assembly | Metagenomics | MEGAHIT | ~6 hours (2 cores) | fastq | fasta |
| 2 | 5 | Binning | Metagenomics | MetaBAT | <30 minutes (2 cores) | fasta | fasta |
| 3 | 6 | Quality check of assembly and bins | Fasta sequence analysis | CheckM, QUAST | 2 hours (2 cores) | fasta | report files |
| 3 | 7 | Mapping/alignment | (DNA &) bacterial RNA | BWA | ~4-6 hours (2 cores) | fasta/fastq | Alignment in SAM format |
| 4 | 8 | Functional annotation | Prokaryotic genome | Prokka | ~1 hour (2 cores) | fasta | Annotations in GFF3 format + other standards-compliant output files |
| 5 | 9 | Annotation | Prokaryotic genome | EggNOG mapper | ~1 hour | gzip | Multiple file types |
| 6 | 10 | Basic phylogenetic placement of bins | Phylogenetic analysis | PhyloPhlAn | ~6 hours (2 cores) | genome file + multifasta (.faa) | Newick tree file + PhyloXML file |
| 6 | 11 | Differential expression | Annotation & BWA data | HTSeq | ~8 hours | EggNOG & BWA output | Gene counts |
| 6 | 12 | Microbial profiling (extra) | DNA reads | MetaPhlAn | ~5 hours | Shotgun sequences (DNA) | txt |
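
As an illustration of how the first steps of the table chain together, the sketch below builds argument lists for FastQC, MEGAHIT and MetaBAT 2 without executing anything. It assumes paired-end reads, the `metabat2` executable, and the 2 cores listed in the table. The function name, sample name and directory names are hypothetical. Trimming is left out because the plan does not state adapter or quality settings.

```python
from pathlib import Path


def build_commands(sample, reads_r1, reads_r2, out_dir, threads=2):
    """Return QC, assembly and binning commands as argument lists.

    Nothing is run here. All paths under out_dir are illustrative only.
    """
    out = Path(out_dir)
    qc_dir = out / "fastqc"                  # FastQC expects this directory to exist
    assembly_dir = out / f"{sample}_megahit"  # MEGAHIT creates this directory itself
    contigs = assembly_dir / "final.contigs.fa"  # MEGAHIT's assembled contigs (fasta)
    bin_prefix = out / f"{sample}_bins" / "bin"  # prefix for MetaBAT 2 bin files

    return [
        ["fastqc", "-t", str(threads), "-o", str(qc_dir), reads_r1, reads_r2],
        ["megahit", "-1", reads_r1, "-2", reads_r2,
         "-t", str(threads), "-o", str(assembly_dir)],
        ["metabat2", "-i", str(contigs), "-o", str(bin_prefix),
         "-t", str(threads)],
    ]
```

Each list could then be passed to `subprocess.run(cmd, check=True)` from a Slurm job script, so that a failing step stops the pipeline before the next one starts.
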
- Abundance of different organisms/bins.
- Refine taxonomic ID of assembled genomes.
- Metabolic pathway reconstructions for chosen bins
- Analysis of expression data of chosen gene groups (e.g. respiratory genes, genes involved in carbohydrate metabolism, etc.).
- Comparisons across bins (pathways, expression of certain gene groups, etc.).
- Comparative genomics of bins
- Ortholog gene clustering of bins
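
For the first of these extra analyses, the abundance of different bins, one simple option is to normalise the number of reads mapped to each bin by the bin length and rescale so that the values sum to 1. This is only a sketch under that assumption. The function name and inputs are hypothetical, and the read counts would have to come from the mapping step.

```python
def relative_abundance(mapped_reads, bin_lengths):
    """Length-normalised relative abundance per bin.

    mapped_reads: {bin_id: number of reads mapped to the bin}
    bin_lengths:  {bin_id: total bin length in base pairs}
    """
    # Reads per base, so that large bins are not favoured by size alone.
    coverage = {b: mapped_reads[b] / bin_lengths[b] for b in mapped_reads}
    total = sum(coverage.values())
    if total == 0:
        return {b: 0.0 for b in coverage}
    return {b: cov / total for b, cov in coverage.items()}
```

Running this separately for the D1 and D3 samples would give per-site profiles that can be compared bin by bin.
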
The data is currently organized under the following main folders: Data, Scripts, Results and Slurm Outputs. The Data folder so far contains fastqc, metadata.csv, raw_data and RNA_trimmed_files. The Codes folder contains the scripts for all analyses that have been run. The Results folder holds the FastQC results obtained so far. Finally, the Slurm Outputs folder contains the Slurm output files, renamed according to the analysis that produced them. Analysis results were moved to, and are processed from, the nobackup folder because of limited space in the allocated directory. The Results and Data folders are excluded from version control because of file size limits on GitHub. The data organisation can be found here and here.
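
The layout described above could be recreated with a few lines of Python. This is a sketch only: the folder names are taken from the description and may not match the repository exactly, `create_layout` is a hypothetical helper, and the `.gitignore` entries are an assumption about how Data and Results are kept out of version control.

```python
from pathlib import Path

# Folder names as described in the text; the actual repository may differ.
LAYOUT = [
    "Data/fastqc",
    "Data/raw_data",
    "Data/RNA_trimmed_files",
    "Scripts",
    "Results",
    "Slurm Outputs",
]

# Large folders excluded from version control (assumed .gitignore entries).
IGNORED = ["Data/", "Results/"]


def create_layout(root):
    """Create the project folders under root and write a .gitignore."""
    root = Path(root)
    for rel in LAYOUT:
        (root / rel).mkdir(parents=True, exist_ok=True)
    (root / ".gitignore").write_text("\n".join(IGNORED) + "\n")
    return root
```

Because `mkdir` is called with `exist_ok=True`, the helper can be rerun on an existing project without raising errors, although it overwrites any existing `.gitignore`.
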
- Week 1 - Project plan, Trimming (Trimmomatic) & Quality Analysis (FastQC, MultiQC)
- Week 2 - Metagenome assembly (MEGAHIT), Binning (MetaBAT)
- Week 3 - Quality check of assembly and bins (CheckM)
- Week 4 - Functional annotation (Prokka), Annotation (EggNOG mapper)
- Week 5 - Mapping/Aligner (BWA), Differential expression (HTSeq)