gvl.write documents that variant input must be atomized, but it performs no check. Non-atomic input (e.g. a multi-nucleotide REF/ALT such as a 2-bp MNP, or non-atomized indels) silently corrupts haplotype-length arithmetic via the hardcoded +1 REF/ALT-overlap assumption at:
python/genvarloader/_dataset/_genotypes.py:69
python/genvarloader/_dataset/_genotypes.py:313
python/genvarloader/_dataset/_tracks.py:297
It should raise a clear ValueError (mirroring the existing multi-allelic guard at _dataset/_write.py:389) instructing the user to atomize (bcftools norm -a).
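A minimal sketch of what such a guard could look like. It is not the actual gvl.write code: the names `find_non_atomic` and `check_atomized` and the (chrom, pos, ref, alt) tuple layout are hypothetical, and it assumes "atomized" means the shorter of REF and ALT is exactly one base, which is what the +1 overlap arithmetic appears to rely on.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout for illustration: (chrom, pos, ref, alt).
Variant = Tuple[str, int, str, str]


def find_non_atomic(variants: Iterable[Variant]) -> List[Variant]:
    """Return records whose shorter allele is not exactly one base long.

    Assumption: an atomized record is a SNP (1 bp REF, 1 bp ALT) or an
    indel whose shorter allele is a single anchor base.
    """
    return [v for v in variants if min(len(v[2]), len(v[3])) != 1]


def check_atomized(variants: Iterable[Variant]) -> None:
    """Raise ValueError if any record is non-atomic (e.g. a 2-bp MNP)."""
    bad = find_non_atomic(variants)
    if bad:
        chrom, pos, ref, alt = bad[0]
        raise ValueError(
            f"Found {len(bad)} non-atomic variant(s), first at {chrom}:{pos} "
            f"(REF={ref}, ALT={alt}). Atomize the input first, e.g. with "
            "`bcftools norm -a`."
        )
```

With this sketch, SNPs and single-anchor-base indels pass silently, while a record such as REF=AC, ALT=GT raises before any haplotype-length arithmetic runs.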
Found via property-based testing (Phase 2 test overhaul): inputs are currently canonicalized with bcftools norm -a --atom-overlaps before gvl.write to sidestep this. A clean-rejection test (tests/integration/dataset/test_haps_property.py) is marked xfail pending this validation.