SnpSift gwasCat: duplicated entries in GWASCAT_* INFO fields #630

@leeyb9916

Describe the issue
After running SnpSift gwasCat, the GWASCAT-related INFO fields contain the same values repeated multiple times within a single record.

In general, if the GWAS catalog DB has N entries associated with a given position, the GWASCAT fields end up containing N * N values instead of N, so 4 entries produce 4 * 4 = 16 values.



For example, given the following two entries in the GWAS catalog DB for the same position:

PUBMEDID CHR_ID CHR_POS MAPPED_TRAIT
31562340 2 219055182 body height
35831902 2 219055182 body height

the SnpSift gwasCat output VCF reports the trait 2 * 2 = 4 times:

CHROM POS GWASCAT_TRAIT
2 219055182 Height,Height,Height,Height
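
Until the duplication is fixed in SnpSift itself, the repeated values can be collapsed in a post-processing step. The following is a minimal sketch of such a workaround, not SnpSift's own logic. It assumes that a duplicated field is an N-value block repeated N times (the issue does not state the actual ordering) and that values within a GWASCAT_* field are comma-separated and contain no commas themselves. The function names `collapse_square_repeat` and `collapse_gwascat_info` are hypothetical.

```python
import math


def collapse_square_repeat(values):
    """Return the first N values if `values` is an N-item block repeated
    N times (N >= 2); otherwise return `values` unchanged.

    Assumption: the duplication repeats the whole block, as in
    a,b,a,b rather than a,a,b,b.
    """
    n = math.isqrt(len(values))
    if n < 2 or n * n != len(values):
        return values
    block = values[:n]
    if values == block * n:
        return block
    return values


def collapse_gwascat_info(info):
    """Collapse duplicated values in every GWASCAT_* key of a VCF INFO string."""
    fields = []
    for field in info.split(";"):
        key, sep, value = field.partition("=")
        if sep and key.startswith("GWASCAT_"):
            value = ",".join(collapse_square_repeat(value.split(",")))
        # Flag-style fields (no "=") pass through untouched.
        fields.append(key + sep + value)
    return ";".join(fields)


# The record from the example above: two DB entries reported four times.
print(collapse_gwascat_info("GWASCAT_TRAIT=Height,Height,Height,Height"))
```

For the record above this yields `GWASCAT_TRAIT=Height,Height`, one value per catalog entry. The check is a heuristic: a correct field that happens to hold a square number of identical values (for example, four genuine entries that all report the same trait) would also be collapsed, so it should only be applied to output known to be affected.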


To Reproduce

  1. SnpEff version: 4.2 and 5.3a
  2. Genome version: hg38
  3. SnpEff full command line: java -jar Snpsift.jar gwasCat -v -db ${gwas_database} ${input_vcf} | bcftools view -Oz > ${output_vcf}
