"Sequence record does not appear to be DNA" when calculating ANI for vRhyme bins #44

@minukpark1228

Hello, my name is Min-uk Park, from Seoul National University (SNU).

I am currently working with viral genomes and used vRhyme for binning. As a result, the binned contigs were scaffolded together using a gap of 1500 'N's, which is the standard output format for vRhyme.

However, when I try to calculate ANI (Average Nucleotide Identity) using tools such as vclust or fastANI, I encounter the following error:

[ Error: The sequence record 'vRhyme_bin_233' does not appear to be DNA.]

Here is an example of what my input FASTA sequence looks like. It consists of normal DNA bases separated by long stretches of 'N's:

>vRhyme_bin_233
AATGGCCCATGTCTGTCATTCGGATTTCCTCCGAAAACCCGGACCGGCTC...
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN... (1500 Ns)
...CGATTTTCTTTTACTTCCACACGAGAATCACCATATTCTCGCGATTTG...

I suspect the ANI tool is throwing this error because of the long strings of 'N's used for scaffolding.
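One way to test this suspicion before changing anything is to measure how much of each record is actually A/C/G/T. The sketch below is only a diagnostic, written with the Python standard library. The function names `parse_fasta` and `composition` are made up for illustration, and it does not reproduce whatever check vclust or fastANI applies internally, which is not known here.

```python
from collections import Counter


def parse_fasta(text):
    """Yield (record_id, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Keep only the first word of the header as the record ID.
            header = (line[1:].split() or [""])[0]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def composition(seq):
    """Summarise how much of a sequence is unambiguous DNA."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    acgt = sum(counts[base] for base in "ACGT")
    return {
        "length": total,
        "acgt_fraction": acgt / total if total else 0.0,
        "n_count": counts["N"],
        # Any symbol other than A, C, G, T or N (for example U or IUPAC codes).
        "other": sorted(set(counts) - set("ACGTN")),
    }


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python check_bins.py bins.fasta
    with open(sys.argv[1]) as handle:
        for record_id, seq in parse_fasta(handle.read()):
            print(record_id, composition(seq))
```

If a bin has a low `acgt_fraction` or a non-empty `other` list, that would support the idea that the gap characters (or some other non-DNA symbol) are what the tool objects to.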

My questions are:

How should I handle vRhyme bins for ANI calculation?

Is there a specific parameter in the ANI tool to ignore these 'N's? Or is it recommended to split the scaffolds back into individual contigs by breaking them at the 'N's before running the ANI analysis?
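For reference, the splitting option could look roughly like the sketch below. It is not a vRhyme or vclust utility: the function names, the `__part` suffix, and the `min_gap=10` threshold are all assumptions chosen for illustration. It breaks a scaffold at every run of at least `min_gap` 'N's, so the 1500-N spacers are removed while isolated ambiguous bases inside a contig are kept.

```python
import re


def split_scaffold(name, seq, min_gap=10, min_len=1):
    """Split one scaffold at runs of >= min_gap 'N's.

    Returns a list of (new_name, contig_sequence) pairs. The naming
    scheme '<name>__part<i>' is an arbitrary choice for this sketch.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    # Drop empty pieces that appear when the scaffold starts or ends with a gap.
    pieces = [p for p in gap.split(seq) if len(p) >= min_len]
    return [(f"{name}__part{i}", p) for i, p in enumerate(pieces, start=1)]


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text wrapped at `width` columns."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy scaffold: two short contigs joined by a 1500-N spacer.
    scaffold = "AATGGCCCATG" + "N" * 1500 + "CGATTTTCTTT"
    print(to_fasta(split_scaffold("vRhyme_bin_233", scaffold)), end="")
```

Splitting changes what is being compared: each contig becomes its own record, and whether a multi-record FASTA file is then treated as one genome or as several differs between ANI tools, so this should be checked against the documentation of the tool in use.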

Any advice or recommended scripts to resolve this would be greatly appreciated!

Thank you.
