"Sequence record does not appear to be DNA" when calculating ANI for vRhyme bins #44

Hello, my name is Min-uk Park from Seoul National University (SNU).

I am currently working with viral genomes and used vRhyme for binning. As a result, the contigs in each bin were scaffolded together with a gap of 1500 'N's, which is the standard output format for vRhyme.

However, when I try to calculate ANI (Average Nucleotide Identity) using tools such as vclust or fastANI, I encounter the following error:

Error: The sequence record 'vRhyme_bin_233' does not appear to be DNA.

Here is an example of what my input FASTA sequence looks like. It consists of normal DNA bases separated by long stretches of 'N's:
>vRhyme_bin_233
AATGGCCCATGTCTGTCATTCGGATTTCCTCCGAAAACCCGGACCGGCTC...
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN... (1500 Ns)
...CGATTTTCTTTTACTTCCACACGAGAATCACCATATTCTCGCGATTTG...

I suspect the ANI tool is throwing this error because of the long strings of 'N's used for scaffolding.
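To test this suspicion, the composition of each bin can be tabulated first: if the only non-ACGT characters are Ns and they make up a large share of the record, the spacers are the likely trigger. The following is a minimal Python sketch, assuming an uncompressed FASTA file; the script name `check_bins.py`, the functions, and the output columns are hypothetical and not part of vRhyme, vclust, or fastANI.

```python
import re
import sys
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs; the ID is the header up to the first whitespace."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def composition(seq: str) -> Dict[str, int]:
    """Count ACGT, N, and any other characters (case-insensitive), plus the longest run of Ns."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    n = s.count("N")
    runs = [len(m.group()) for m in re.finditer(r"N+", s)]
    return {
        "length": len(s),
        "acgt": acgt,
        "n": n,
        "other": len(s) - acgt - n,
        "longest_n_run": max(runs, default=0),
    }


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python check_bins.py bins.fasta
    print("record", "length", "n", "other", "n_fraction", "longest_n_run", sep="\t")
    with open(sys.argv[1]) as handle:
        for name, seq in read_fasta(handle):
            c = composition(seq)
            frac = c["n"] / c["length"] if c["length"] else 0.0
            print(name, c["length"], c["n"], c["other"], f"{frac:.3f}", c["longest_n_run"], sep="\t")
```

If `other` is 0 for `vRhyme_bin_233` and `n_fraction` is high, the Ns are the most plausible cause. The exact rule an ANI tool uses to decide that a record "does not appear to be DNA" is an assumption here and is worth confirming in that tool's documentation or source.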

My questions are:

How should I handle vRhyme bins for ANI calculation?

Is there a specific parameter in these tools to ignore the 'N's? Or is it recommended to split the scaffolds back into individual contigs at the 'N' stretches before running the ANI analysis?

Any advice or recommended scripts to resolve this would be greatly appreciated!
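For the splitting option, a minimal Python sketch follows. It assumes that any run of at least `min_gap` Ns is a vRhyme linker (10 here, an arbitrary threshold well below the 1500-N spacer), trims stray Ns left at the ends of pieces, and writes each bin's contigs to its own FASTA file. The record names `<bin>_contig_<i>` are hypothetical; this sketch does not recover the original contig names. Shorter N runs inside a contig are kept as-is.

```python
import re
import sys
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs; the ID is the header up to the first whitespace."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_at_gaps(seq: str, min_gap: int = 10) -> List[str]:
    """Split at every run of >= min_gap Ns (either case) and trim Ns left at piece ends."""
    gap = re.compile(rf"[Nn]{{{min_gap},}}")  # e.g. "[Nn]{10,}"
    pieces = (piece.strip("Nn") for piece in gap.split(seq))
    return [piece for piece in pieces if piece]


def to_fasta(bin_name: str, contigs: List[str], width: int = 80) -> str:
    """Format contigs as FASTA records named <bin_name>_contig_<i>, wrapped at `width`."""
    out = []
    for i, contig in enumerate(contigs, start=1):
        out.append(f">{bin_name}_contig_{i}")
        out.extend(contig[j:j + width] for j in range(0, len(contig), width))
    return "\n".join(out) + "\n" if out else ""


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python split_bins.py linked_bins.fasta out_dir/
    out_dir = Path(sys.argv[2])
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(sys.argv[1]) as handle:
        for name, seq in read_fasta(handle):
            (out_dir / f"{name}.fna").write_text(to_fasta(name, split_at_gaps(seq)))
```

Writing one file per bin matters because splitting turns each bin into several records. fastANI treats each input file as one (possibly multi-contig) genome, so the per-bin files can be listed for `--ql`/`--rl`. How vclust groups multiple records into a single genome depends on how it is invoked, so that is worth checking in its documentation before running.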

Thank you.
