Hi!
I'm wondering whether there is a manual, documentation, or a rule of thumb for setting the configuration parameters in PECAT. We recently used PECAT to correct ONT reads from a bacterium with an estimated genome size of 5.5 Mb. I'm not sure my configuration is correct, so here are my correction settings:
```
project= smarcescens
reads= smarcescens_simplex.filtered.fastq
genome_size= 5500000
threads=4
cleanup=1
grid=local
prep_min_length=3000
prep_output_coverage=80
corr_iterate_number=1
corr_block_size=4000000000
corr_filter_options=--filter0=l=5000:al=2500:alr=0.5:aal=5000:oh=1000:ohr=0.1
corr_correct_options=--score=weight:lc=10 --aligner edlib --filter1 oh=1000:ohr=0.01
corr_rd2rd_options=-x ava-ont
corr_output_coverage=80
```
After correction, I went from 16k reads to 12k reads (read N50 went from 12.5 kb to 12.9 kb).
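In case it helps to reproduce these numbers, here is a minimal sketch of how read count, N50, and approximate coverage can be computed from a FASTQ file. The function names (`read_lengths`, `n50`, `coverage`) are my own and not part of PECAT, and the parser assumes the usual four-lines-per-record FASTQ layout. Comparing the coverage before and after correction against `prep_output_coverage` and `corr_output_coverage` should show whether those limits are what is removing reads.

```python
import gzip
from typing import Iterable, List


def read_lengths(fastq_path: str) -> List[int]:
    """Return the sequence length of every record in a FASTQ file.

    Assumes four lines per record (no wrapped sequences); .gz is supported.
    """
    opener = gzip.open if fastq_path.endswith(".gz") else open
    lengths = []
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record is the sequence
                lengths.append(len(line.strip()))
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def coverage(lengths: Iterable[int], genome_size: int) -> float:
    """Total bases divided by the estimated genome size."""
    return sum(lengths) / genome_size
```

For example, `coverage(read_lengths("smarcescens_simplex.filtered.fastq"), 5_500_000)` gives the raw coverage for the input file named in the configuration above.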
When I assembled the PECAT-corrected reads (without polishing) using a different assembler, NECAT (to avoid assembler bias), I got the following statistics:
Contigs: 25
Assembly size: 5.6 Mb
Minimum length: 18 kb
Maximum length: 1.3 Mb
N50: 550 kb
These statistics are quite far from those of the assembly of the Canu-corrected reads:
Contigs: 4
Assembly size: 5.7 Mb
Minimum length: 17 kb
Maximum length: 5.5 Mb
N50: 5.5 Mb
So the assembly of the PECAT-corrected reads is much more fragmented than the assembly of the Canu-corrected reads. I'm aware that these statistics don't fully reflect assembly quality, but the PECAT-corrected reads still seem to give a less contiguous assembly than the Canu-corrected reads. That's why I'm wondering whether I've set up the configuration file incorrectly.
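For reference, the contig statistics above can be reproduced from an assembly FASTA with something like the sketch below. This is a plain-Python illustration rather than the exact tool I used, and the names `contig_lengths` and `assembly_summary` are hypothetical.

```python
from typing import Dict, List


def contig_lengths(fasta_path: str) -> List[int]:
    """Return the length of each sequence in a (possibly line-wrapped) FASTA file."""
    lengths = []
    current = None  # length of the contig being read; None before the first header
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_summary(lengths: List[int]) -> Dict[str, int]:
    """Contig count, total size, min/max contig length, and N50 for a list of lengths."""
    if not lengths:
        raise ValueError("no contigs found")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    n50 = 0
    for length in ordered:
        running += length
        if running >= half:  # first contig that pushes the running sum past half
            n50 = length
            break
    return {
        "contigs": len(ordered),
        "assembly_size": sum(ordered),
        "min_length": ordered[-1],
        "max_length": ordered[0],
        "n50": n50,
    }
```

Running `assembly_summary(contig_lengths(path))` on both assemblies gives the two sets of numbers in the same form, so they can be compared side by side.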
Here's my Canu command:
user/tools/canu-2.2/bin/canu -correct \
  -p smarcescens_canu_corrected \
  -d canu_correction_output \
  genomeSize=5.5m \
  correctedErrorRate=0.15 \
  useGrid=false \
  minReadLength=1000 \
  corThreads=4 \
  -nanopore-raw smarcescens_simplex.filtered.fastq 2>&1
And here's the NECAT configuration I used to assemble both the Canu-corrected and the PECAT-corrected reads:
PROJECT=necat_assembly
ONT_READ_LIST=
GENOME_SIZE=5500000
THREADS=4
MIN_READ_LENGTH=3000
PREP_OUTPUT_COVERAGE=40
OVLP_FAST_OPTIONS=-n 500 -z 20 -b 2000 -e 0.5 -j 0 -u 1 -a 1000
OVLP_SENSITIVE_OPTIONS=-n 500 -z 10 -e 0.5 -j 0 -u 1 -a 1000
CNS_FAST_OPTIONS=-a 2000 -x 4 -y 12 -l 1000 -e 0.5 -p 0.8 -u 0
CNS_SENSITIVE_OPTIONS=-a 2000 -x 4 -y 12 -l 1000 -e 0.5 -p 0.8 -u 0
TRIM_OVLP_OPTIONS=-n 100 -z 10 -b 2000 -e 0.5 -j 1 -u 1 -a 400
ASM_OVLP_OPTIONS=-n 100 -z 10 -b 2000 -e 0.5 -j 1 -u 0 -a 400
NUM_ITER=2
CNS_OUTPUT_COVERAGE=30
CLEANUP=1
I'd really appreciate any guidance on how to set the parameters in PECAT.
Thanks!