
gvl.write crashes on all-REF / zero-variant input instead of handling it cleanly #201

@d-laub


When the variant input contains no ALT alleles for any sample (every genotype is REF), or no variants at all, gvl.write fails in confusing ways rather than producing a clean empty/REF-only dataset or raising a clear error:

  1. All-REF genotypes: the genotype chunk writer only writes genotypes/variant_idxs.npy when the genotypes are non-empty, so for all-REF input the file is never created. Haps.from_path opens that file unconditionally, so reopening the dataset fails with FileNotFoundError.
  2. Zero-variant VCF / empty BED: when the derived region set is empty, gvl.write raises polars ... NoDataError (empty CSV). Upstream, the PGEN conversion path (plink2 --make-pgen) also fails, exiting with "Error: No variants in --vcf file."
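For case 1, one possible fix is to have the writer emit variant_idxs.npy unconditionally (a zero-length array for all-REF chunks) and, defensively, have the reader treat a missing file as "no ALT calls". The sketch below is hypothetical: write_variant_idxs, read_variant_idxs, the genotypes/ subdirectory handling, and the int32 dtype are illustrative assumptions, not the actual GVL writer or Haps.from_path internals.

```python
import tempfile
from pathlib import Path

import numpy as np


def write_variant_idxs(out_dir: Path, variant_idxs: np.ndarray) -> Path:
    """Hypothetical writer: always emit variant_idxs.npy, even if empty.

    Writing a zero-length array for all-REF chunks keeps the on-disk
    layout identical, so a reader can open the file unconditionally.
    """
    geno_dir = out_dir / "genotypes"
    geno_dir.mkdir(parents=True, exist_ok=True)
    path = geno_dir / "variant_idxs.npy"
    # int32 is an assumed dtype; match whatever the real writer uses.
    np.save(path, np.asarray(variant_idxs, dtype=np.int32))
    return path


def read_variant_idxs(out_dir: Path) -> np.ndarray:
    """Hypothetical defensive reader: a missing file means no ALT calls."""
    path = out_dir / "genotypes" / "variant_idxs.npy"
    if not path.exists():
        return np.empty(0, dtype=np.int32)
    return np.load(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        # Simulate an all-REF chunk: no variant indices to record.
        write_variant_idxs(out, np.array([], dtype=np.int32))
        print(read_variant_idxs(out).shape)  # (0,)
```

Either half alone removes the FileNotFoundError; doing both keeps new datasets consistent while still opening datasets written by the current code.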

Expected: either write a valid REF-only/zero-region dataset that reopens cleanly, or raise a single clear ValueError explaining the empty input.
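If the second option (a single clear ValueError) is preferred, a pre-flight check at the top of gvl.write could look roughly like this. validate_write_inputs and the genotype layout (shape (n_samples, ploidy, n_variants), 0 = REF, >0 = ALT, negative = missing) are assumptions for illustration, not existing GVL API. Running the check before reading regions would replace the polars NoDataError with one descriptive message; it cannot help with the plink2 failure, which happens before gvl.write is called.

```python
import numpy as np


def validate_write_inputs(genotypes: np.ndarray, n_regions: int) -> None:
    """Hypothetical pre-flight check: fail fast on empty or all-REF input.

    Assumes `genotypes` has shape (n_samples, ploidy, n_variants) with
    0 = REF, >0 = ALT allele index, and negative values = missing.
    """
    n_variants = genotypes.shape[-1]
    # Check variants first: with zero variants, a variant-derived region
    # set is also empty, and this message is the more informative one.
    if n_variants == 0:
        raise ValueError("variant input contains no variants")
    if n_regions == 0:
        raise ValueError(
            "no regions to write: the BED file (or the region set derived "
            "from the variants) is empty"
        )
    # Any call > 0 is an ALT allele; REF (0) and missing (< 0) do not count.
    if not np.any(genotypes > 0):
        raise ValueError(
            f"all {n_variants} variants are REF for every sample "
            "(no ALT alleles to write)"
        )
```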

Found via property-based testing (Phase 2). The tests currently work around it by dropping all-REF records during prep (bcftools view --min-ac 1) and skipping draws in which every record is all-REF, but the underlying robustness gap remains.
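For reference, the test-side workaround amounts to the following filter on a dense genotype array: a rough in-memory analogue of bcftools view --min-ac 1 (keep only sites with at least one non-REF allele). drop_all_ref_records and the (n_samples, ploidy, n_variants) layout with 0 = REF, >0 = ALT, negative = missing are hypothetical, chosen for illustration.

```python
import numpy as np


def drop_all_ref_records(genotypes: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Keep variants with at least one ALT call across all samples.

    Hypothetical helper. Returns the filtered genotypes and the indices of
    the kept variants in the original array.
    """
    # Per-variant count of called haplotypes carrying a non-REF allele;
    # missing calls (< 0) and REF calls (0) contribute nothing.
    alt_count = (genotypes > 0).sum(axis=(0, 1))
    keep = alt_count >= 1
    return genotypes[..., keep], np.flatnonzero(keep)
```

Such a filter can itself return zero variants, which is exactly case 2 above, so the fixed gvl.write still needs to handle that input rather than relying on the filter.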
