rsem-extract-reference-transcripts fails with "Error Message: Strand is neither '+' nor '-'!" #220

@J-Moravec

I am working on rice RNA-seq data using the https://nf-co.re/rnaseq pipeline, which runs RSEM internally.

Here, RSEM fails on:

rsem-extract-reference-transcripts rsem/genome 0 GCF_034140825.1.filtered.gtf None 0 rsem/GCF_034140825.1.fna
The GTF file might be corrupted!
Stop at line : NC_011033.1    RefSeq  transcript  11024   315294  .   ?   .   gene_id "OrsajM_p01"; transcript_id "unassigned_transcript_653"; db_xref "GeneID:6450162"; exception "trans-splicing, RNA editing"; gbkey "mRNA"; gene "nad1"; locus_tag "OrsajM_p01"; transcript_biotype "mRNA";

The GTF2.2 specification that I could find does not list ? as an allowed value for the strand column, so I understand why RSEM performs this specification-based check.
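To see how many records in a GTF would trip this kind of check, a minimal Python sketch along the following lines may help. It is not RSEM's own code: the function name `find_invalid_strands` is made up for illustration, and it assumes a tab-separated, 9-column GTF with the strand in column 7 and a `transcript_id "..."` attribute in column 9.

```python
import re

# Strand values accepted by the check described above.
VALID_STRANDS = {"+", "-"}

# Assumed attribute layout: transcript_id "some_id";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]*)"')


def find_invalid_strands(lines):
    """Return (line_number, strand, transcript_id) for each record
    whose strand column is neither '+' nor '-'."""
    hits = []
    for number, line in enumerate(lines, start=1):
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a 9-column GTF record
        strand = fields[6]
        if strand not in VALID_STRANDS:
            match = TRANSCRIPT_ID.search(fields[8])
            hits.append((number, strand, match.group(1) if match else None))
    return hits
```

Passing an open file handle for the filtered GTF to `find_invalid_strands` would list every offending line number together with its strand value and transcript ID.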

The reason for the ? is that some unusual splicing is happening in this mRNA (the GTF attributes flag it as "trans-splicing, RNA editing"). This is beyond my current knowledge, but it looks like even the start codon and the stop codon are on different strands. The whole transcript is thus a patchwork of sequences from the positive and negative strands, so it cannot be assigned a single strand.

See https://www.ncbi.nlm.nih.gov/nuccore/NC_011033.1/, where similar complement(...) locations appear for about 4 different genes.

And here is a view of the feature in the GTF file (first 8 columns):

NC_011033.1	RefSeq	gene	11024	11409	.	+	.
NC_011033.1	RefSeq	gene	239890	315294	.	+	.
NC_011033.1	RefSeq	transcript	11024	315294	.	?	.
NC_011033.1	RefSeq	exon	11024	11409	.	+	.
NC_011033.1	RefSeq	exon	241499	241580	.	-	.
NC_011033.1	RefSeq	exon	239890	240081	.	-	.
NC_011033.1	RefSeq	exon	251354	251412	.	-	.
NC_011033.1	RefSeq	exon	315036	315294	.	-	.
NC_011033.1	RefSeq	CDS	11024	11409	.	+	0
NC_011033.1	RefSeq	CDS	241499	241580	.	-	1
NC_011033.1	RefSeq	CDS	239890	240081	.	-	0
NC_011033.1	RefSeq	CDS	251354	251412	.	-	0
NC_011033.1	RefSeq	CDS	315036	315291	.	-	1
NC_011033.1	RefSeq	start_codon	11024	11026	.	+	0
NC_011033.1	RefSeq	stop_codon	315036	315038	.	-	0
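The mixed-strand pattern in the records above can also be detected programmatically. The sketch below is one possible workaround rather than anything RSEM provides: it collects transcripts whose exons sit on more than one strand and drops every record belonging to them, which means those genes (here nad1) would not be quantified at all. The function names are hypothetical, and the code assumes that exon records carry a `transcript_id "..."` attribute in column 9 (not visible in the 8-column view above).

```python
import re
from collections import defaultdict

# Assumed attribute layout: transcript_id "some_id";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]*)"')


def mixed_strand_transcripts(lines):
    """Return the set of transcript IDs whose exons lie on more than one strand."""
    strands = defaultdict(set)
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            strands[match.group(1)].add(fields[6])
    return {tid for tid, seen in strands.items() if len(seen) > 1}


def drop_transcripts(lines, transcript_ids):
    """Return the lines that do not belong to any of the given transcript IDs.

    Comments and records without a matching transcript_id are kept unchanged.
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        match = TRANSCRIPT_ID.search(fields[8]) if len(fields) >= 9 else None
        if match and match.group(1) in transcript_ids:
            continue
        kept.append(line)
    return kept
```

Reading the GTF into a list of lines, calling `mixed_strand_transcripts` on it, and writing the result of `drop_transcripts` to a new file would give an annotation without the patchwork transcripts. Whether losing those transcripts is acceptable depends on the analysis.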

Since this is not an obscure organism but rice (and I had hoped that, working with a model organism for once, everything would be fine), should RSEM be able to handle this case?

Thanks,
-- Jirka
