Configuration file rule-of-thumb

Hi! 

I'm wondering whether there is a manual, documentation, or a rule of thumb for setting the configuration when using PECAT. Recently, we used PECAT to correct ONT reads from a bacterium with an estimated genome size of 5.5 Mb. I'm not sure my configuration is correct, so I've included my correction settings below.

```
project= smarcescens
reads= smarcescens_simplex.filtered.fastq
genome_size= 5500000
threads=4
cleanup=1
grid=local

prep_min_length=3000
prep_output_coverage=80

corr_iterate_number=1
corr_block_size=4000000000
corr_filter_options=--filter0=l=5000:al=2500:alr=0.5:aal=5000:oh=1000:ohr=0.1
corr_correct_options=--score=weight:lc=10 --aligner edlib --filter1 oh=1000:ohr=0.01
corr_rd2rd_options=-x ava-ont
corr_output_coverage=80
```
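In case it helps anyone checking these settings programmatically, here is a minimal Python sketch that reads `KEY=VALUE` lines like the ones above into a dictionary. The function name `parse_cfg` is hypothetical, and skipping `#` comment lines is my assumption about the file format, not something taken from the PECAT documentation. The one real pitfall it illustrates: option strings such as `corr_filter_options` contain `=` themselves, so the split has to happen on the first `=` only.

```python
def parse_cfg(text):
    """Parse KEY=VALUE lines into a dict (hypothetical helper, not part of PECAT)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and '#' comments carry no settings.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only; values may contain further '=' signs.
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# A few lines copied from the configuration in this post.
cfg_text = """
project= smarcescens
genome_size= 5500000
prep_min_length=3000
prep_output_coverage=80
corr_filter_options=--filter0=l=5000:al=2500:alr=0.5:aal=5000:oh=1000:ohr=0.1
"""

cfg = parse_cfg(cfg_text)
for key, value in cfg.items():
    print(f"{key} -> {value}")
```

Numeric settings come back as strings, so something like `int(cfg["genome_size"])` is needed before doing arithmetic with them.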

After correction, the read count dropped from 16k to 12k (and the read N50 went from 12.5 kb to 12.9 kb).
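For reference, read counts and N50 values like the ones above can be reproduced with a short script. This is a minimal Python sketch, assuming a plain four-lines-per-record FASTQ (optionally gzipped); the function names `read_lengths`, `n50`, and `summarize` are hypothetical and not part of PECAT or any other tool.

```python
import gzip


def read_lengths(fastq_path):
    """Yield the sequence length of each record in a 4-line-per-record FASTQ."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 1:  # the sequence is the 2nd line of each record
                yield len(line.strip())


def n50(lengths):
    """Return the length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no reads at all


def summarize(fastq_path, genome_size):
    """Collect read count, total bases, N50, and approximate coverage for one file."""
    lengths = list(read_lengths(fastq_path))
    total = sum(lengths)
    return {
        "reads": len(lengths),
        "total_bases": total,
        "n50": n50(lengths),
        "coverage": total / genome_size,
    }
```

Calling `summarize("smarcescens_simplex.filtered.fastq", 5_500_000)` on the raw reads and again on the corrected reads gives comparable numbers for both sets. The `coverage` value is also worth comparing against `prep_output_coverage=80`, since, as far as I understand that setting, it only has an effect when the input coverage exceeds it.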

When I assembled the PECAT-corrected reads (without polishing) using a different assembler, NECAT (just to avoid assembler bias), I got the following statistics:
Contigs: 25
Assembly size: 5.6 Mb
Minimum length: 18 kb
Maximum length: 1.3 Mb
N50: 550 kb

These statistics are quite far from those of the assembly built from Canu-corrected reads:
Contigs: 4
Assembly size: 5.7 Mb
Minimum length: 17 kb
Maximum length: 5.5 Mb
N50: 5.5 Mb

So the assembly from the PECAT-corrected reads is highly fragmented compared to the one from the Canu-corrected reads. I'm aware that the statistics above don't entirely reflect the quality of an assembly; still, the PECAT-corrected reads don't seem to assemble as contiguously as the Canu-corrected reads. That's why I'm wondering whether I'm setting the configuration file incorrectly.
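To put the two sets of statistics quoted above on a common footing, here is a small Python sketch that derives two ratios from them: the share of the assembly held by the largest contig, and the contig N50 relative to the estimated genome size. The numbers are copied from this post; the class and method names are hypothetical and only for illustration.

```python
from dataclasses import dataclass

GENOME_SIZE = 5_500_000  # estimated genome size used throughout this post


@dataclass
class AssemblyStats:
    label: str
    contigs: int
    total_bp: int
    max_bp: int
    n50_bp: int

    def largest_fraction(self):
        """Share of the whole assembly contained in the longest contig."""
        return self.max_bp / self.total_bp

    def n50_over_genome(self, genome_size=GENOME_SIZE):
        """Contig N50 as a fraction of the estimated genome size."""
        return self.n50_bp / genome_size


# Figures as reported above (NECAT assemblies of the two corrected read sets).
pecat = AssemblyStats("PECAT-corrected", 25, 5_600_000, 1_300_000, 550_000)
canu = AssemblyStats("Canu-corrected", 4, 5_700_000, 5_500_000, 5_500_000)

for stats in (pecat, canu):
    print(
        f"{stats.label}: {stats.contigs} contigs, "
        f"largest contig = {stats.largest_fraction():.0%} of assembly, "
        f"N50 = {stats.n50_over_genome():.0%} of genome size"
    )
```

With the figures quoted here, the largest contig holds about 23% of the PECAT-based assembly versus about 96% of the Canu-based one, which is the gap I'm trying to understand.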

Here's my Canu command:
```
user/tools/canu-2.2/bin/canu -correct \
-p smarcescens_canu_corrected \
-d canu_correction_output \
genomeSize=5.5m \
correctedErrorRate=0.15 \
useGrid=false \
minReadLength=1000 \
corThreads=4 \
-nanopore-raw smarcescens_simplex.filtered.fastq 2>&1
```

And here's the NECAT configuration I used to assemble both the Canu-corrected and the PECAT-corrected reads:
```
PROJECT=necat_assembly
ONT_READ_LIST=
GENOME_SIZE=5500000
THREADS=4
MIN_READ_LENGTH=3000
PREP_OUTPUT_COVERAGE=40
OVLP_FAST_OPTIONS=-n 500 -z 20 -b 2000 -e 0.5 -j 0 -u 1 -a 1000
OVLP_SENSITIVE_OPTIONS=-n 500 -z 10 -e 0.5 -j 0 -u 1 -a 1000
CNS_FAST_OPTIONS=-a 2000 -x 4 -y 12 -l 1000 -e 0.5 -p 0.8 -u 0
CNS_SENSITIVE_OPTIONS=-a 2000 -x 4 -y 12 -l 1000 -e 0.5 -p 0.8 -u 0
TRIM_OVLP_OPTIONS=-n 100 -z 10 -b 2000 -e 0.5 -j 1 -u 1 -a 400
ASM_OVLP_OPTIONS=-n 100 -z 10 -b 2000 -e 0.5 -j 1 -u 0 -a 400
NUM_ITER=2
CNS_OUTPUT_COVERAGE=30
CLEANUP=1
```

I'd really appreciate any ideas on how to set the parameters in PECAT.

Thanks! 
