MAPQ0 #82

@ljohansson

Dear @readmanchiu,

I was looking at a repeat in NOTCH2NLC and noticed that for shorter repeats the MAPQ is 0, while for longer repeats the MAPQ is 60.

chr1 149390802 149390841 CGG NOTCH2NLC NOTCH2NLC

One sample has two alleles, each 10 repeat units (RU) long. However, the allele sequences are not identical: one allele is (GGC)7(GGA)2(GGC)1 and the other is (GGC)6(GGA)3(GGC)1.
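To make the difference between the two alleles concrete, here is a minimal Python sketch that expands the compact notation into bases and compares the alleles unit by unit. The helper names `expand_allele` and `repeat_units` are made up for this illustration and are not part of any tool.

```python
import re


def expand_allele(notation):
    """Expand compact notation such as '(GGC)7(GGA)2(GGC)1' into a base string."""
    blocks = re.findall(r"\(([ACGT]+)\)(\d+)", notation)
    return "".join(motif * int(count) for motif, count in blocks)


def repeat_units(sequence, unit_len=3):
    """Split a sequence into consecutive units of unit_len bases."""
    return [sequence[i:i + unit_len] for i in range(0, len(sequence), unit_len)]


allele_a = expand_allele("(GGC)7(GGA)2(GGC)1")
allele_b = expand_allele("(GGC)6(GGA)3(GGC)1")

units_a = repeat_units(allele_a)
units_b = repeat_units(allele_b)

# 0-based indices of the repeat units at which the two alleles differ
differing_units = [i for i, (a, b) in enumerate(zip(units_a, units_b)) if a != b]

print(len(units_a), len(units_b), differing_units)  # prints: 10 10 [6]
```

Both alleles are 10 units (30 bp) long and differ only at the seventh unit (GGC versus GGA), so a length-based genotype cannot tell them apart.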

The VCF output shows a 'homozygous' call:
chr1 149390802 . A <CNV:TR> . PASS END=149390841;LOCUS=NOTCH2NLC;RUS_REF=GGC;SVLEN=39;RN=1;RUS=GGC;RUC=10.3;CIRUC=-3.6,0.0 GT:DP:AS 1:12:12

All reads at this location for this sample were flagged as secondary alignments and had MAPQ 0, even though some of the reads were over 20 kb long.
The same reads were also mapped to two other genes, NOTCH2NLA and NOTCH2NLB, again as secondary alignments. The primary alignment was at NOTCH2.
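For reference, this is roughly how the secondary status and MAPQ can be tallied from SAM text records without opening a genome browser. It is a sketch only: the three records below are invented toy data (positions, CIGAR strings and the MAPQ of the primary alignment are assumptions, not values from my BAM). It relies on two facts from the SAM specification: flag bit 0x100 marks a secondary alignment, and MAPQ is the fifth column.

```python
from collections import Counter

SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def classify(flag):
    """Return the alignment type encoded in a SAM flag."""
    if flag & SECONDARY:
        return "secondary"
    if flag & SUPPLEMENTARY:
        return "supplementary"
    return "primary"


def summarize(sam_lines):
    """Count records per (alignment type, MAPQ), skipping header lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2: FLAG
        mapq = int(fields[4])  # column 5: MAPQ
        counts[(classify(flag), mapq)] += 1
    return counts


# Toy records (hypothetical values), one read aligned to three places
toy_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t1000\t60\t100M\t*\t0\t0\t*\t*",
    "read1\t256\tchr1\t2000\t0\t100M\t*\t0\t0\t*\t*",
    "read1\t272\tchr1\t3000\t0\t100M\t*\t0\t0\t*\t*",
]

summary = summarize(toy_sam)
for key, n in sorted(summary.items()):
    print(key, n)
```

On real data the same tally could be run on the text output of `samtools view` for the region of interest.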

I did not notice that the reads were not mapped uniquely until I looked at the BAM file. Am I right in saying that this information is not available in the TSV or VCF files? And could it be that the repeat is actually part of one of the paralogous genes?

Is there a strategy, e.g. through comparing surrounding sequences for small differences, to assign reads to the gene of interest or one of the paralog genes?
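To illustrate the kind of strategy I have in mind, here is a minimal Python sketch that assigns a read to the paralog whose flanking sequence it matches best. Everything in it is an assumption for illustration: the paralog names, the 12 bp flank sequences and the `min_margin` parameter are invented, the flanks are assumed to be already aligned and of equal length, and a plain Hamming distance stands in for a real alignment score. It is not a description of how the tool works.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_paralog(read_flank, paralog_flanks, min_margin=1):
    """Pick the paralog with the fewest mismatches to the read's flank.

    Returns (gene, distances); gene is None when the best and second-best
    distances differ by less than min_margin, i.e. the read is ambiguous.
    """
    distances = {gene: hamming(read_flank, flank)
                 for gene, flank in paralog_flanks.items()}
    ranked = sorted(distances.items(), key=lambda item: item[1])
    best_gene, best_distance = ranked[0]
    if len(ranked) > 1 and ranked[1][1] - best_distance < min_margin:
        return None, distances
    return best_gene, distances


# Hypothetical flank sequences that differ at a few paralog-specific positions
paralog_flanks = {
    "paralog_A": "ACGTACGTACGT",
    "paralog_B": "ACGTTCGTACGA",
    "paralog_C": "ACCTACGTACGT",
}

assigned, dists = assign_paralog("ACGTTCGTACGA", paralog_flanks)
print(assigned, dists)

# A read that matches two paralogs equally well stays unassigned
ambiguous, _ = assign_paralog("ACGTACGTACGA", paralog_flanks)
print(ambiguous)
```

Reads that cannot be separated by a clear margin would remain unassigned rather than being forced onto one gene.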

Thank you.

Regards,
Lennart
