feat(cli): replace --skip-symbolic-alts with composable --no-symbolic / --no-breakend#53

Merged
d-laub merged 9 commits into main from feat/breakend-alts on Jun 5, 2026

@d-laub (Owner) commented Jun 5, 2026

Summary

  • Replace the single genoray write --skip-symbolic-alts flag with two independent, composable flags --no-symbolic and --no-breakend, so users can drop un-injectable ALT classes (symbolic alleles like <DEL>, breakends like G[chr1:500[) independently or together.
  • Add private record-level predicates _record_is_symbolic / _record_is_breakend to genoray/exprs.py that mirror the existing is_symbolic / is_breakend polars exprs (sharing _BND_PATTERN), keeping the cyvcf2 record callable and the polars index filter in parity.
  • The CLI composes the requested flags into a (record callable, polars expr) pair for the VCF path (both are set, or neither is) and into a single polars expr for PGEN.
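
A minimal sketch of how the two flags could compose at the record level, using only the standard library. Everything here is illustrative: `_BND_SKETCH` is an assumed approximation of the VCF breakend grammar (not genoray's actual `_BND_PATTERN`), `compose_keep` is a hypothetical helper, and dropping a record when any of its ALTs matches is an assumption about the semantics rather than something this PR states.

```python
import re
from typing import Callable, List, Optional, Sequence

# Assumed approximation of the VCF breakend grammar; the real _BND_PATTERN
# in genoray/exprs.py is not shown in this PR and may differ.
_BND_SKETCH = re.compile(
    r"^(?:"
    r"[ACGTNacgtn]*[\[\]][^\[\]]+:\d+[\[\]][ACGTNacgtn]*"  # mate pair: G[chr2:321[
    r"|\.[ACGTNacgtn]+"  # single breakend: .TGCA
    r"|[ACGTNacgtn]+\."  # single breakend: TGCA.
    r")$"
)


def is_symbolic(alt: str) -> bool:
    """True for symbolic alleles such as <DEL>."""
    return alt.startswith("<") and alt.endswith(">")


def is_breakend(alt: str) -> bool:
    """True for ALTs matching the sketched breakend grammar."""
    return _BND_SKETCH.match(alt) is not None


def compose_keep(
    no_symbolic: bool, no_breakend: bool
) -> Optional[Callable[[Sequence[str]], bool]]:
    """Hypothetical helper: build a keep-predicate over a record's ALT list.

    Returns None when neither flag is set, so the caller applies no filter.
    """
    dropped: List[Callable[[str], bool]] = []
    if no_symbolic:
        dropped.append(is_symbolic)
    if no_breakend:
        dropped.append(is_breakend)
    if not dropped:
        return None

    def keep(alts: Sequence[str]) -> bool:
        # Assumption: a record is dropped if any ALT is in a dropped class.
        return not any(pred(alt) for alt in alts for pred in dropped)

    return keep
```

With both flags set, `compose_keep(True, True)` rejects records carrying `<DEL>` or `G[chr1:500[`, while `compose_keep(False, True)` still lets symbolic alleles through. In the actual CLI the same composition would also have to produce the matching polars expr so the two stay in parity.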

Test Plan

  • pixi run test — 347 passed, 16 xfailed
  • ruff check + ruff format --check clean
  • genoray write --help shows --no-symbolic / --no-breakend, no --skip-symbolic-alts or --no-no-*
  • New tests cover no-flags / --no-breakend / combined --no-symbolic --no-breakend across VCF, plus --no-breakend PGEN

🤖 Generated with Claude Code

d-laub and others added 9 commits June 5, 2026 00:40
… ILEN)

Breakend (BND) ALTs in mate-pair / single-breakend notation (e.g.
`G[chr2:321[`, `.TGCA`) are a distinct VCF 4.x ALT class from symbolic
`<...>` alleles, so `is_symbolic` did not flag them. But like symbolic
alleles they are not expandable into nucleotides — the bracket/colon/
position bytes corrupt personalized DNA buffers in haplotype consumers.

Previously a breakend slipped past every guard: `symbolic_ilen` computed
the literal `len(ALT) - len(REF)`, so it masqueraded as a large insertion,
evaded `~is_imprecise`, and reached consumers like genvarloader.

- Add public `is_breakend` expression (VCF §5.4 grammar).
- `symbolic_ilen` now returns null for breakend ALTs, so they are also
  `is_imprecise` and excluded from `is_snp`/`is_indel`. Flows through both
  the VCF and PGEN index-build paths.
- Document the haplotype-safe filter `~is_symbolic & ~is_breakend` in
  SKILL.md (required for public-API changes).
- Add a breakend record to the `symbolic` test fixture and extend the
  oracle/persisted-index tests to cover the path on generated data.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
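
To illustrate the masquerade described in the commit message above, here is a small standalone sketch contrasting the literal length difference with a guarded version that returns `None` for breakends. The function names and the bracket/dot heuristic are assumptions made for illustration; they are not the `symbolic_ilen` or `is_breakend` implementations in genoray, and symbolic `<...>` alleles are deliberately left out of scope.

```python
from typing import Optional


def looks_like_breakend(alt: str) -> bool:
    # Simplified heuristic (assumption): brackets mark mate-pair breakends,
    # and a leading or trailing '.' next to bases marks a single breakend.
    if "[" in alt or "]" in alt:
        return True
    return len(alt) > 1 and (alt.startswith(".") or alt.endswith("."))


def literal_ilen(ref: str, alt: str) -> int:
    # The old behavior as described: treat the ALT string as nucleotides.
    return len(alt) - len(ref)


def guarded_ilen(ref: str, alt: str) -> Optional[int]:
    # Sketch of the new behavior: breakends have no defined indel length.
    if looks_like_breakend(alt):
        return None
    return len(alt) - len(ref)


# The breakend looks like a 10 bp insertion under the literal computation.
print(literal_ilen("G", "G[chr2:321["))  # 10
print(guarded_ilen("G", "G[chr2:321["))  # None
print(guarded_ilen("A", "ATG"))          # 2
```

A null length is what lets the downstream `is_imprecise`, `is_snp`, and `is_indel` logic treat the record as non-expandable instead of as a large insertion.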
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… predicates

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…reakend

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The setter previously mutated only _filter, allowing a mismatched
(filter, pl_filter) pair that the constructor forbids. It now accepts a
(filter, pl_filter) tuple (or None to clear both), validates the
both-or-neither invariant via a shared _check_filter_pair helper, and
invalidates the in-memory index. Getter still returns the callable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
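
The setter change described in the last commit can be sketched as follows. `ReaderSketch` and its attribute names are hypothetical stand-ins, and the polars expression is represented by an arbitrary object; only the `_check_filter_pair` name and the both-or-neither rule come from the commit message.

```python
from typing import Any, Callable, Optional, Tuple

RecordFilter = Callable[[Any], bool]
FilterPair = Tuple[Optional[RecordFilter], Any]


def _check_filter_pair(filter: Optional[RecordFilter], pl_filter: Any) -> None:
    # Both-or-neither invariant shared by the constructor and the setter.
    if (filter is None) != (pl_filter is None):
        raise ValueError("filter and pl_filter must both be set or both be None")


class ReaderSketch:
    """Hypothetical stand-in for the reader class touched by the commit."""

    def __init__(
        self, filter: Optional[RecordFilter] = None, pl_filter: Any = None
    ) -> None:
        _check_filter_pair(filter, pl_filter)
        self._filter = filter
        self._pl_filter = pl_filter
        self._index: Any = None  # placeholder for the in-memory index

    @property
    def filter(self) -> Optional[RecordFilter]:
        # The getter still returns only the record callable.
        return self._filter

    @filter.setter
    def filter(self, pair: Optional[FilterPair]) -> None:
        # Accept a (filter, pl_filter) tuple, or None to clear both.
        filter, pl_filter = (None, None) if pair is None else pair
        _check_filter_pair(filter, pl_filter)
        self._filter = filter
        self._pl_filter = pl_filter
        self._index = None  # a new filter invalidates the cached index
```

Assigning `reader.filter = (callable, expr)` updates both halves together, `reader.filter = None` clears both, and a half-set pair such as `(callable, None)` raises `ValueError` before any state changes.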
@d-laub d-laub merged commit de5df4c into main Jun 5, 2026
7 checks passed
@d-laub d-laub deleted the feat/breakend-alts branch June 5, 2026 09:19