When the variant input contains no ALT alleles for any sample (every genotype is REF), or no variants at all, gvl.write fails in confusing ways rather than producing a clean empty/REF-only dataset or raising a clear error:
- All-REF genotypes: the genotype chunk writer skips writing genotypes/variant_idxs.npy (it only writes when genotypes are non-empty), but Haps.from_path opens that file unconditionally → FileNotFoundError at read time.
- Zero-variant VCF / empty BED: gvl.write raises polars ... NoDataError (empty CSV) when the derived region set is empty; and the upstream PGEN path (plink2 --make-pgen) exits with Error: No variants in --vcf file.
Expected: either write a valid REF-only/zero-region dataset that reopens cleanly, or raise a single clear ValueError explaining the empty input.
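
A minimal sketch of the "single clear ValueError" option, assuming the writer can count the derived regions and the non-REF calls before anything is written to disk. The function `validate_nonempty_input` and its parameters `n_regions` and `n_alt_calls` are hypothetical names for illustration, not part of the gvl API:

```python
def validate_nonempty_input(n_regions: int, n_alt_calls: int) -> None:
    """Hypothetical pre-write guard (illustration only, not gvl API).

    Raises one descriptive ValueError instead of letting an empty input
    surface later as NoDataError or FileNotFoundError.
    """
    if n_regions == 0:
        raise ValueError(
            "Empty input: the derived region set contains no regions."
        )
    if n_alt_calls == 0:
        raise ValueError(
            "Empty input: every genotype is REF, so there are no variant "
            "indices to write."
        )
```

Checking regions first means a zero-variant input is reported as such rather than as an all-REF one.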
Found via property-based testing (Phase 2). Mitigated test-side by dropping all-REF records in prep (bcftools view --min-ac 1) and skipping fully-all-REF draws, but the underlying robustness gap remains.
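
For reference, the test-side skip can be sketched as a plain predicate over generated genotypes. This assumes each call is a sequence of integer allele indices where 0 is REF and negative values mark missing alleles; the helper name `is_all_ref` and that encoding are assumptions for illustration, not taken from the test suite:

```python
from typing import Iterable, Sequence


def is_all_ref(genotypes: Iterable[Sequence[int]]) -> bool:
    """Return True when no call carries an ALT allele (index > 0).

    Assumed encoding: 0 = REF, > 0 = ALT, < 0 = missing.
    An empty collection of calls also counts as all-REF.
    """
    return not any(allele > 0 for call in genotypes for allele in call)
```

A property test would discard any draw for which this returns True, which is exactly the case the writer currently cannot handle.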