Describe the issue
After running SnpSift gwasCat, the GWASCAT-related INFO fields contain the same values repeated multiple times within a single record.
If the GWAS catalog DB has N entries associated with a given position, each GWASCAT field ends up containing N * N values instead of N. With 4 entries, for instance, the fields contain 4 * 4 = 16 values.
For example, given the following two entries in the GWAS catalog DB for the same position:
| PUBMEDID | CHR_ID | CHR_POS | MAPPED_TRAIT |
|---|---|---|---|
| 31562340 | 2 | 219055182 | body height |
| 35831902 | 2 | 219055182 | body height |
the SnpSift gwasCat output VCF reports the trait 2 * 2 = 4 times:
| CHROM | POS | GWASCAT_TRAIT |
|---|---|---|
| 2 | 219055182 | Height,Height,Height,Height |
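
Until the duplication is fixed, the repeated values could be collapsed in post-processing. The following is only a minimal sketch of that idea, not part of SnpSift: `dedupe_info_values` is a hypothetical helper, and it assumes the field value is a plain comma-separated string. Note that it removes every repeated value, so it would also merge identical traits that legitimately come from different catalog entries.

```python
def dedupe_info_values(value: str, sep: str = ",") -> str:
    """Drop repeated items from a delimited INFO value, keeping first-seen order.

    Hypothetical helper for illustration; not a SnpSift function.
    """
    seen = set()
    unique = []
    for item in value.split(sep):
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return sep.join(unique)


# The GWASCAT_TRAIT value from the record above.
print(dedupe_info_values("Height,Height,Height,Height"))  # prints "Height"
```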
To Reproduce
- SnpEff version: 4.2 and 5.3a
- Genome version: hg38
- SnpEff full command line: `java -jar Snpsift.jar gwasCat -v -db ${gwas_database} ${input_vcf} | bcftools view -Oz > ${output_vcf}`