06. Reads preprocessing
Trimming was performed with the shell script TRIMMOMATIC_SLURM.sh, which can be found in the CODE folder. The settings used were LEADING:36 and TRAILING:36. This threshold was chosen by inspecting the results of the FASTQC analysis run before TRIMMOMATIC, aiming for a balance between high base quality and high sequencing depth. After trimming, FASTQC was run again using the shell script FASTQC_POST_SLURM.sh; the results are displayed below.
How many reads have been discarded after trimming?
Before trimming, the data contained 1666667 reads; after trimming, 1664608 reads remain, meaning that 2059 reads (about 0.12%) were discarded.
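The arithmetic behind this answer can be checked with a few lines of Python. This is only a sketch using the two read counts reported above; the variable names are illustrative and do not come from the scripts in the CODE folder.

```python
# Read counts as reported by FASTQC before and after trimming.
reads_before = 1_666_667
reads_after = 1_664_608

# Number of reads dropped and the share of the original library they represent.
discarded = reads_before - reads_after
fraction_discarded = discarded / reads_before

print(f"Discarded reads: {discarded}")
print(f"Share of library discarded: {fraction_discarded:.4%}")
```

Expressing the loss as a share of the library makes it easier to judge whether the trimming was aggressive enough to affect read depth.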
How can this affect your future analyses and results?
A constant trade-off when working with sequence assembly is the one between sequence quality and read depth. It is often said "garbage in, garbage out", meaning here that poor-quality sequences are likely to be mapped incorrectly, which in turn would significantly lower the quality of all following analyses. However, when removing reads from the library, we also increase the probability of missing gene expression. This can affect the following analyses, where the expression counts may differ from the true values. One must take both these factors into consideration when working with read QC.
How is the quality of your data after trimming?
We can see that the overall quality of both read sets has increased after trimming: the forward and reverse reads now lie completely within the green region of the quality plots, indicating high Phred scores.
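To put the Phred scores into perspective, the standard definition Q = -10 * log10(P) relates a quality score Q to the probability P that the base call is wrong. The sketch below applies that formula; the function names are hypothetical helpers and are not part of FASTQC or TRIMMOMATIC.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert a base-call error probability back to a Phred quality score."""
    return -10 * math.log10(p)


# Compare a few scores, including the threshold of 36 used for trimming here.
for q in (20, 30, 36):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.6f}")
```

Under this definition, a score of 30 corresponds to one expected error per 1000 base calls, and higher scores to proportionally fewer errors.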
What do the LEADING, TRAILING and SLIDINGWINDOW options do?
LEADING sets a minimum quality threshold for the bases at the start of a read: bases are removed from the start for as long as their quality is below the threshold. The read as a whole is kept unless trimming leaves nothing of it (or less than a minimum length, if one is set). For example, as this was set to 36 for this analysis, the first base remaining in each read after TRIMMOMATIC has a Phred score of at least 36. TRAILING applies the same kind of threshold, but removes low-quality bases from the end of the read instead. SLIDINGWINDOW scans the read from the 5' end with a window whose size is specified when running the tool, averages the base quality within the window, and cuts the read at the point where this average falls below the specified quality threshold, removing the remainder of the read.
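The three options can be illustrated with a simplified Python sketch operating on a list of per-base Phred scores. This is not TRIMMOMATIC's actual implementation: the function names are hypothetical, and the handling of reads shorter than the window is a simplification chosen here for illustration.

```python
def trim_leading(quals, threshold):
    """Index of the first base whose quality is at least the threshold."""
    start = 0
    while start < len(quals) and quals[start] < threshold:
        start += 1
    return start


def trim_trailing(quals, threshold):
    """Index one past the last base whose quality is at least the threshold."""
    end = len(quals)
    while end > 0 and quals[end - 1] < threshold:
        end -= 1
    return end


def sliding_window_cut(quals, window, threshold):
    """Number of bases to keep: cut at the first window whose mean is too low."""
    if len(quals) < window:
        return len(quals)  # simplification: short reads are left untouched
    for i in range(len(quals) - window + 1):
        # Comparing sums avoids a division: mean < threshold.
        if sum(quals[i:i + window]) < threshold * window:
            return i
    return len(quals)


def trim_read(seq, quals, leading, trailing):
    """Apply LEADING then TRAILING; return None if no bases survive."""
    start = trim_leading(quals, leading)
    end = trim_trailing(quals, trailing)
    if start >= end:
        return None  # the whole read fell below the threshold and is dropped
    return seq[start:end], quals[start:end]


# Made-up example read with low-quality bases at both ends.
example = trim_read("ACGTACGTA", [20, 38, 38, 37, 40, 39, 36, 30, 12], 36, 36)
print(example)
```

In this sketch a read is only lost when every base is trimmed away, which shows why LEADING and TRAILING mostly shorten reads rather than discard them.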