Hi! I noticed that atarva accepts a 0-based BED regions file as input and also writes 0-based positions to its VCF output. BED is 0-based by convention, but VCF positions should be 1-based (see the spec excerpt below). This creates off-by-one errors in our downstream analyses.
A nice summary of this common problem:
https://www.biostars.org/p/84686/
The VCF spec:
POS - position: The reference position, with the 1st base having position 1. Positions are sorted numerically, in increasing order, within each reference sequence CHROM. It is permitted to have multiple records with the same POS. Telomeres are indicated by using positions 0 or N+1, where N is the length of the corresponding chromosome or contig. (Integer, Required)
May I suggest that you produce 1-based VCFs?
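In case it helps, here is a minimal sketch of the conversion I have in mind. It assumes the BED interval is 0-based and half-open, as in the BED convention, and that the VCF record should report a 1-based, inclusive start and end. The function name `bed_to_vcf_interval` is hypothetical and is not taken from the atarva code base.

```python
def bed_to_vcf_interval(bed_start, bed_end):
    """Convert a 0-based, half-open BED interval to 1-based, inclusive coordinates.

    Hypothetical helper for illustration only; not part of atarva.
    """
    if bed_start < 0 or bed_end <= bed_start:
        raise ValueError("expected 0 <= bed_start < bed_end")
    # The start shifts by one. The end is unchanged, because an exclusive
    # 0-based end and an inclusive 1-based end name the same base.
    return bed_start + 1, bed_end


# A BED region covering the first ten bases of a contig.
pos, end = bed_to_vcf_interval(0, 10)
print(pos, end)
```

Only the start coordinate changes, so the length of the region (`end - pos + 1`) still equals `bed_end - bed_start`.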
Warm regards,
Harriet