Configuration file rule-of-thumb #26

@jp-jong


Hi!

Is there a manual, documentation, or a rule of thumb for choosing the configuration settings when using PECAT? We recently used PECAT to correct ONT reads from a bacterium with an estimated genome size of 5.5 Mb. I'm not sure whether my configuration is correct, so I've included my correction settings here.

```
project= smarcescens
reads= smarcescens_simplex.filtered.fastq
genome_size= 5500000
threads=4
cleanup=1
grid=local

prep_min_length=3000
prep_output_coverage=80

corr_iterate_number=1
corr_block_size=4000000000
corr_filter_options=--filter0=l=5000:al=2500:alr=0.5:aal=5000:oh=1000:ohr=0.1
corr_correct_options=--score=weight:lc=10 --aligner edlib --filter1 oh=1000:ohr=0.01
corr_rd2rd_options=-x ava-ont
corr_output_coverage=80
```

After correction, the dataset went from 16k reads to 12k reads (read N50 went from 12.5 kb to 12.9 kb).
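Since both `prep_output_coverage` and `corr_output_coverage` are set to 80, it may help to check how much coverage the input actually provides. The sketch below is only an illustration: `fastq_lengths` and `coverage_summary` are hypothetical helper names (not part of PECAT), it assumes a plain four-line-per-record FASTQ, and it assumes coverage is simply total bases divided by genome size, which may differ from how PECAT counts internally.

```python
import gzip


def fastq_lengths(path):
    """Yield the sequence length of each record in a 4-line-per-record FASTQ."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the second line of every record holds the bases
                yield len(line.strip())


def coverage_summary(lengths, genome_size, min_length=0):
    """Summarise reads that pass a length cutoff (assumed: coverage = bases / genome size)."""
    kept = [n for n in lengths if n >= min_length]
    total = sum(kept)
    return {"reads": len(kept), "bases": total, "coverage": total / genome_size}


# Toy demonstration with made-up read lengths (not the real dataset).
demo = coverage_summary([6000, 4000, 2000], genome_size=1000, min_length=3000)
print(demo)
```

For the real data, something like `coverage_summary(fastq_lengths("smarcescens_simplex.filtered.fastq"), 5_500_000, min_length=3000)` would show whether 80x is available after the `prep_min_length=3000` cutoff.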

When I assembled the PECAT-corrected reads (without polishing) using a different assembler, NECAT (to avoid assembler bias), I got the following statistics:
Contigs: 25
Assembly size: 5.6 Mb
Minimum length: 18 kb
Maximum length: 1.3 Mb
N50: 550 kb

These statistics are quite far from those for the Canu-corrected reads:
Contigs: 4
Assembly size: 5.7 Mb
Minimum length: 17 kb
Maximum length: 5.5 Mb
N50: 5.5 Mb
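For readers comparing the two N50 values, here is a minimal sketch of how N50 is usually computed from contig lengths. The function name `n50` and the contig lengths are made up for illustration; they are not the actual contigs from either assembly. The lists only mimic the two situations: one dominant contig versus many mid-sized ones adding up to the same total.

```python
def n50(lengths):
    """Return the length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical contig lengths in kb, both summing to 5700.
dominated = [5500, 100, 60, 40]
fragmented = [1300, 900, 700, 550, 500, 450, 400, 400, 300, 200]

print("one dominant contig:", n50(dominated))
print("fragmented assembly:", n50(fragmented))
```

With one contig holding most of the bases, N50 equals that contig's length, which is consistent with the Canu result above, where the maximum length and N50 are both 5.5 Mb.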

So the assembly of the PECAT-corrected reads is much more fragmented than the assembly of the Canu-corrected reads. I'm aware that the statistics above don't fully reflect the quality of an assembly, but the PECAT-corrected reads still seem to assemble less contiguously than the Canu-corrected reads. That's why I'm wondering whether I've set up the configuration file incorrectly.

Here's my Canu command:
user/tools/canu-2.2/bin/canu -correct \
  -p smarcescens_canu_corrected \
  -d canu_correction_output \
  genomeSize=5.5m \
  correctedErrorRate=0.15 \
  useGrid=false \
  minReadLength=1000 \
  corThreads=4 \
  -nanopore-raw smarcescens_simplex.filtered.fastq 2>&1

And here's the NECAT configuration I used to assemble both the Canu-corrected and the PECAT-corrected reads:
PROJECT=necat_assembly
ONT_READ_LIST=
GENOME_SIZE=5500000
THREADS=4
MIN_READ_LENGTH=3000
PREP_OUTPUT_COVERAGE=40
OVLP_FAST_OPTIONS=-n 500 -z 20 -b 2000 -e 0.5 -j 0 -u 1 -a 1000
OVLP_SENSITIVE_OPTIONS=-n 500 -z 10 -e 0.5 -j 0 -u 1 -a 1000
CNS_FAST_OPTIONS=-a 2000 -x 4 -y 12 -l 1000 -e 0.5 -p 0.8 -u 0
CNS_SENSITIVE_OPTIONS=-a 2000 -x 4 -y 12 -l 1000 -e 0.5 -p 0.8 -u 0
TRIM_OVLP_OPTIONS=-n 100 -z 10 -b 2000 -e 0.5 -j 1 -u 1 -a 400
ASM_OVLP_OPTIONS=-n 100 -z 10 -b 2000 -e 0.5 -j 1 -u 0 -a 400
NUM_ITER=2
CNS_OUTPUT_COVERAGE=30
CLEANUP=1

I'd really appreciate it if you can give us ideas on how to set the parameters in PECAT.

Thanks!
