The gff-version listed by Helixer is "3.2.1", but I haven't been able to find a specification for that version. The only well-detailed specification of GFF3 that I've ever found is the one from Sequence Ontology, for "3.1.26"...
https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md
...According to this standard, CDS lines that are meant to be part of the same sequence should have the same ID. This marks those lines as a multi-line feature. Helixer doesn't do this. Helixer marks each CDS segment with its own ID.
Is there an actual specification for 3.2.1 somewhere? If so, does it disagree with 3.1.26 in terms of how CDS are supposed to be represented?
EDIT: For reference, the current GFF from Helixer looks like...
1 Helixer gene 241717 242876 . + . ID=Zmays.B73.HPIv02_1_000003
1 Helixer mRNA 241717 242876 . + . ID=Zmays.B73.HPIv02_1_000003.1;Parent=Zmays.B73.HPIv02_1_000003
1 Helixer exon 241717 241720 . + . ID=Zmays.B73.HPIv02_1_000003.1.exon.1;Parent=Zmays.B73.HPIv02_1_000003.1
1 Helixer five_prime_UTR 241717 241717 . + . ID=Zmays.B73.HPIv02_1_000003.1.five_prime_UTR.1;Parent=Zmays.B73.HPIv02_1_000003.1
1 Helixer CDS 241718 241720 . + 0 ID=Zmays.B73.HPIv02_1_000003.1.CDS.1;Parent=Zmays.B73.HPIv02_1_000003.1
1 Helixer exon 241835 242876 . + . ID=Zmays.B73.HPIv02_1_000003.1.exon.2;Parent=Zmays.B73.HPIv02_1_000003.1
1 Helixer CDS 241835 242875 . + 0 ID=Zmays.B73.HPIv02_1_000003.1.CDS.2;Parent=Zmays.B73.HPIv02_1_000003.1
1 Helixer three_prime_UTR 242876 242876 . + . ID=Zmays.B73.HPIv02_1_000003.1.three_prime_UTR.1;Parent=Zmays.B73.HPIv02_1_000003.1
...According to the Sequence Ontology standard, it should be...
1 Helixer gene 241717 242876 . + . ID=Zmays.B73.HPIv02_1_000003
1 Helixer mRNA 241717 242876 . + . ID=Zmays.B73.HPIv02_1_000003.1;Parent=Zmays.B73.HPIv02_1_000003
1 Helixer exon 241717 241720 . + . ID=Zmays.B73.HPIv02_1_000003.1.exon.1;Parent=Zmays.B73.HPIv02_1_000003.1
1 Helixer five_prime_UTR 241717 241717 . + . ID=Zmays.B73.HPIv02_1_000003.1.five_prime_UTR.1;Parent=Zmays.B73.HPIv02_1_000003.1
1 Helixer CDS 241718 241720 . + 0 ID=Zmays.B73.HPIv02_1_000003.1.CDS;Parent=Zmays.B73.HPIv02_1_000003.1
1 Helixer exon 241835 242876 . + . ID=Zmays.B73.HPIv02_1_000003.1.exon.2;Parent=Zmays.B73.HPIv02_1_000003.1
1 Helixer CDS 241835 242875 . + 0 ID=Zmays.B73.HPIv02_1_000003.1.CDS;Parent=Zmays.B73.HPIv02_1_000003.1
1 Helixer three_prime_UTR 242876 242876 . + . ID=Zmays.B73.HPIv02_1_000003.1.three_prime_UTR.1;Parent=Zmays.B73.HPIv02_1_000003.1
I know the difference seems small, but it's the difference between a "correct" parsing of the file producing two distinct proteins and producing a single (correct) protein.
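To make that concrete, here is a minimal Python sketch of what I mean by a strict parsing. It is not taken from any particular GFF library; `parse_attributes` and `group_cds_by_id` are hypothetical helper names, and the only assumption is that rows sharing an ID are treated as segments of one feature, as the Sequence Ontology spec describes. It splits on any whitespace so it works on the rows as pasted above.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    return dict(pair.split("=", 1) for pair in column9.strip().split(";") if pair)


def group_cds_by_id(gff_lines):
    """Map each CDS ID to the list of (start, end) segments that carry it.

    Hypothetical helper: rows with the same ID are assumed to be segments
    of one multi-line feature.
    """
    features = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Split on any whitespace so tab- and space-separated rows both work;
        # maxsplit=8 keeps the attribute column in one piece.
        cols = line.split(None, 8)
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = parse_attributes(cols[8])
        features[attrs["ID"]].append((int(cols[3]), int(cols[4])))
    return dict(features)


# CDS rows as Helixer currently writes them: one ID per segment.
helixer_style = [
    "1 Helixer CDS 241718 241720 . + 0 ID=Zmays.B73.HPIv02_1_000003.1.CDS.1;Parent=Zmays.B73.HPIv02_1_000003.1",
    "1 Helixer CDS 241835 242875 . + 0 ID=Zmays.B73.HPIv02_1_000003.1.CDS.2;Parent=Zmays.B73.HPIv02_1_000003.1",
]

# The same rows with a shared ID, as in the second listing above.
shared_id_style = [
    "1 Helixer CDS 241718 241720 . + 0 ID=Zmays.B73.HPIv02_1_000003.1.CDS;Parent=Zmays.B73.HPIv02_1_000003.1",
    "1 Helixer CDS 241835 242875 . + 0 ID=Zmays.B73.HPIv02_1_000003.1.CDS;Parent=Zmays.B73.HPIv02_1_000003.1",
]

for label, rows in (("helixer", helixer_style), ("shared ID", shared_id_style)):
    grouped = group_cds_by_id(rows)
    print(label, "->", len(grouped), "CDS feature(s)")
```

With the per-segment IDs the grouping yields two separate CDS features; with the shared ID it yields one feature made of two segments, which is what a downstream tool would then translate as a single protein.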