Skip to content

Loss of END_TRUNC in GRCh38 vs GRCh37 due to change in GERP cutoff from +180 to -58 #105

@daniel-wells

Description

@daniel-wells

In the GRCh37 branch the GERP cutoff is +180 but in the GRCh38 version it was changed to -58:
681c507

This means that variants that are annotated as LC END_TRUNC in LOFTEE GRCh37 change to being HC LoF in GRCh38, even if they're very close to the end of the protein.

An example variant in GRCh37 that's at the very end of the protein gets annotated as LC END_TRUNC (seems sensible):

##fileformat=VCFv4.0
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
13      32972904        .       CTA     C       .       PASS    cpra=13-32972904-CTA-C
#input_variant_cpra     Consequence     IMPACT  SYMBOL  Gene    Feature BIOTYPE EXON    HGVSc   HGVSp   CANONICAL       SIFT    PolyPhen        DOMAINS LoF     LoF_filter      LoF_flags
13-32972904-CTA-C       frameshift_variant,stop_lost    HIGH    BRCA2   ENSG00000139618 ENST00000380152 protein_coding  -       ENST00000380152.3:c.10255_10256del      ENSP00000369497.3:p.Ter3419SerfsTer18   -       -       -       -       LC      END_TRUNC       -
13-32972904-CTA-C       frameshift_variant,stop_lost    HIGH    BRCA2   ENSG00000139618 ENST00000544455 protein_coding  -       ENST00000544455.1:c.10255_10256del      ENSP00000439902.1:p.Ter3419SerfsTer18   YES     -       -       -       LC      END_TRUNC       -

But the same variant in GRCh38 is annotated as HC LoF:

##fileformat=VCFv4.0
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
13      32398767        .       CTA     C       .       PASS    cpra=13-32398767-CTA
#input_variant_cpra     Consequence     IMPACT  SYMBOL  Gene    Feature BIOTYPE EXON    HGVSc   HGVSp   CANONICAL       SIFT    PolyPhen        DOMAINS LoF     LoF_filter      LoF_flags
13-32398767-CTA-C       frameshift_variant,stop_lost    HIGH    BRCA2   ENSG00000139618 ENST00000380152 protein_coding  -       ENST00000380152.8:c.10255_10256del      ENSP00000369497.3:p.Ter3419SerfsTer18   YES     -       -       -       HC      -       PHYLOCSF_WEAK
13-32398767-CTA-C       frameshift_variant,stop_lost    HIGH    BRCA2   ENSG00000139618 ENST00000544455 protein_coding  -       ENST00000544455.6:c.10255_10256del      ENSP00000439902.1:p.Ter3419SerfsTer18   -       -       -       -       HC      -       PHYLOCSF_WEAK

I can't find exactly how either of the two GERP cutoff values were derived. The GERP scores from the two versions of the files don't look that drastically different.
image
image

Is -58 really the correct value for gerp_end_trunc_cutoff in GRCh38? How was it derived?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions