This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Always prefix commands with rtk. If RTK has a dedicated filter for the command, it applies that filter. If not, the command passes through unchanged, so RTK is always safe to use.
Important: In command chains with &&, prefix every command in the chain with rtk:
# ❌ Wrong
git add . && git commit -m "msg" && git push
# ✅ Correct
rtk git add . && rtk git commit -m "msg" && rtk git push
rtk cargo build # Cargo build output
rtk cargo check # Cargo check output
rtk cargo clippy # Clippy warnings grouped by file (80%)
rtk tsc # TypeScript errors grouped by file/code (83%)
rtk lint # ESLint/Biome violations grouped (84%)
rtk prettier --check # Files needing format only (70%)
rtk next build # Next.js build with route metrics (87%)
rtk cargo test # Cargo test failures only (90%)
rtk vitest run # Vitest failures only (99.5%)
rtk playwright test # Playwright failures only (94%)
rtk test <cmd> # Generic test wrapper - failures only
rtk git status # Compact status
rtk git log # Compact log (works with all git flags)
rtk git diff # Compact diff (80%)
rtk git show # Compact show (80%)
rtk git add # Ultra-compact confirmations (59%)
rtk git commit # Ultra-compact confirmations (59%)
rtk git push # Ultra-compact confirmations
rtk git pull # Ultra-compact confirmations
rtk git branch # Compact branch list
rtk git fetch # Compact fetch
rtk git stash # Compact stash
rtk git worktree # Compact worktree
Note: Git passthrough works for ALL subcommands, even those not explicitly listed.
rtk gh pr view <num> # Compact PR view (87%)
rtk gh pr checks # Compact PR checks (79%)
rtk gh run list # Compact workflow runs (82%)
rtk gh issue list # Compact issue list (80%)
rtk gh api # Compact API responses (26%)
rtk pnpm list # Compact dependency tree (70%)
rtk pnpm outdated # Compact outdated packages (80%)
rtk pnpm install # Compact install output (90%)
rtk npm run <script> # Compact npm script output
rtk npx <cmd> # Compact npx command output
rtk prisma # Prisma without ASCII art (88%)
rtk ls <path> # Tree format, compact (65%)
rtk read <file> # Code reading with filtering (60%)
rtk grep <pattern> # Search grouped by file (75%)
rtk find <pattern> # Find grouped by directory (70%)
rtk err <cmd> # Filter errors only from any command
rtk log <file> # Deduplicated logs with counts
rtk json <file> # JSON structure without values
rtk deps # Dependency overview
rtk env # Environment variables compact
rtk summary <cmd> # Smart summary of command output
rtk diff # Ultra-compact diffs
rtk docker ps # Compact container list
rtk docker images # Compact image list
rtk docker logs <c> # Deduplicated logs
rtk kubectl get # Compact resource list
rtk kubectl logs # Deduplicated pod logs
rtk curl <url> # Compact HTTP responses (70%)
rtk wget <url> # Compact download output (65%)
rtk gain # View token savings statistics
rtk gain --history # View command history with savings
rtk discover # Analyze Claude Code sessions for missed RTK usage
rtk proxy <cmd> # Run command without filtering (for debugging)
rtk init # Add RTK instructions to CLAUDE.md
rtk init --global # Add RTK to ~/.claude/CLAUDE.md

| Category | Commands | Typical Savings |
|---|---|---|
| Tests | vitest, playwright, cargo test | 90-99% |
| Build | next, tsc, lint, prettier | 70-87% |
| Git | status, log, diff, add, commit | 59-80% |
| GitHub | gh pr, gh run, gh issue | 26-87% |
| Package Managers | pnpm, npm, npx | 70-90% |
| Files | ls, read, grep, find | 60-75% |
| Infrastructure | docker, kubectl | 85% |
| Network | curl, wget | 65-70% |
Overall average: 60-90% token reduction on common development operations.
GenVarLoader is a Python/Rust hybrid library for efficiently loading genomic data with genetic variation to train sequence models. It reconstructs haplotypes and re-aligns functional genomic tracks on the fly without writing personalized genomes to disk.
All commands require the pixi package manager. Use pixi run -e dev <task> for development tasks.
# Generate test data (required before first test run)
pixi run -e dev gen
# Run all tests (pytest + cargo)
pixi run -e dev test
# Run a single pytest test
pixi run -e dev pytest tests/dataset/test_dataset.py::test_name -v
# Lint
pixi run -e dev ruff check python/
pixi run -e dev typecheck
# Build docs
pixi run -e docs doc
The build system uses Maturin (Rust + Python). Rust code is compiled automatically when running tests via pixi.
- python/genvarloader/: main Python package
- src/: Rust extension (BigWig interval extraction via PyO3/bigtools)
Writing: write(bed, variants, bigwigs) → dataset_dir/
- Normalizes variants (left-align, bi-allelic, atomized)
- Extracts BigWig intervals and re-aligns them to haplotype coordinates when indels are present
- Stores metadata, sparse genotypes, and interval data
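The track re-alignment step can be illustrated with a small, self-contained sketch. This is not GenVarLoader's implementation: the function name realign_track, the (pos, ref_len, alt_len) variant layout, and the choice to fill inserted bases by repeating the anchor base's value are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical variant representation: (pos, ref_len, alt_len), where pos is the
# 0-based start of the REF allele within the region. Variants are assumed to be
# sorted, non-overlapping, and left-anchored (REF and ALT share their first base).
Variant = Tuple[int, int, int]


def realign_track(track: Sequence[float], variants: Sequence[Variant]) -> List[float]:
    """Map a reference-coordinate track onto haplotype coordinates."""
    out: List[float] = []
    cursor = 0  # next reference position that has not been copied yet
    for pos, ref_len, alt_len in variants:
        # Copy the unchanged stretch before this variant.
        out.extend(track[cursor:pos])
        shared = min(ref_len, alt_len)
        # Positions present in both alleles keep their reference values.
        out.extend(track[pos:pos + shared])
        if alt_len > ref_len:
            # Insertion: assumed fill strategy repeats the last shared value.
            out.extend([track[pos + shared - 1]] * (alt_len - ref_len))
        # Deletion: the extra reference positions are skipped.
        cursor = pos + ref_len
    # Copy whatever follows the last variant.
    out.extend(track[cursor:])
    return out
```

With a six-position track, a 2 bp insertion lengthens the output by two positions and a 2 bp deletion shortens it by two, so the re-aligned track always matches the haplotype length.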
Reading: Dataset.open(path, reference?) → RaggedDataset
- Loads metadata and region index map
- Initializes lazy readers (Haps from genotypes, or Ref from reference)
- Eager indexing, dataset[region_idx, sample_idx], triggers data loading
- _dataset/_impl.py: Dataset, RaggedDataset, ArrayDataset classes; main user API
- _dataset/_write.py: dataset writing pipeline
- _dataset/_reconstruct.py: haplotype and track reconstruction from stored data
- _dataset/_genotypes.py: genotype handling (VCF/PGEN sparse storage)
- _dataset/_tracks.py: track re-alignment to account for indels
- _variants/: variant record structures and VCF/PGEN reading
- _bigwig.py: BigWigs reader wrapping the Rust backend
- _ragged.py: ragged array utilities built on seqpro.rag.Ragged
- _types.py: Reader protocol, AnnotatedHaps, type aliases
Reader protocol (_types.py): Abstract interface for all data sources (VCF, PGEN, BigWig, FASTA). Implementors must provide read(), name, dtype, contigs, coords, chunked.
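As a rough sketch of what such an interface can look like, the block below declares a typing.Protocol with the six members listed above and a toy implementor. The member types and the read() signature are assumptions, and ConstantReader is a hypothetical class; consult _types.py for the real definitions.

```python
from typing import Any, Dict, List, Protocol, Sequence, runtime_checkable


@runtime_checkable
class Reader(Protocol):
    # Member names come from the description above; the types are assumptions.
    name: str
    dtype: Any
    contigs: Dict[str, int]
    coords: Dict[str, Any]
    chunked: bool

    def read(self, contig: str, starts: Sequence[int], ends: Sequence[int]) -> List[Any]:
        """Hypothetical signature: return data for each [start, end) interval."""
        ...


class ConstantReader:
    """Toy implementor that returns a constant value for every position."""

    def __init__(self, name: str, contigs: Dict[str, int], value: float = 0.0) -> None:
        self.name = name
        self.dtype = float
        self.contigs = contigs  # contig name -> length
        self.coords = {}        # no extra coordinates in this toy example
        self.chunked = False
        self._value = value

    def read(self, contig: str, starts: Sequence[int], ends: Sequence[int]) -> List[Any]:
        if contig not in self.contigs:
            raise KeyError(contig)
        # One list of values per requested interval.
        return [[self._value] * (end - start) for start, end in zip(starts, ends)]
```

Because the protocol is structural, ConstantReader does not need to inherit from Reader; providing the members is enough for isinstance(reader, Reader) to pass.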
AnnotatedHaps: Haplotype sequences with parallel arrays for variant indices and reference coordinates. Dtype is S1 (single byte per nucleotide).
Ragged arrays: Variable-length data throughout. RaggedIntervals, RaggedSeqs, RaggedTracks, RaggedAnnotatedHaps all wrap seqpro.rag.Ragged. Use .to_padded() to materialize into dense arrays.
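The idea behind ragged storage and padding can be shown with a stdlib-only toy. ToyRagged is a hypothetical class, not seqpro.rag.Ragged, and the pad_value parameter is an assumption; the real .to_padded() may take different arguments and returns arrays rather than lists.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ToyRagged:
    """Variable-length rows stored as one flat buffer plus row offsets."""

    data: List[int]
    offsets: List[int]  # len(offsets) == number of rows + 1

    @classmethod
    def from_rows(cls, rows: Sequence[Sequence[int]]) -> "ToyRagged":
        data: List[int] = []
        offsets = [0]
        for row in rows:
            data.extend(row)
            offsets.append(len(data))
        return cls(data, offsets)

    def to_padded(self, pad_value: int = 0) -> List[List[int]]:
        """Materialize a dense, rectangular list of lists, padded on the right."""
        bounds = list(zip(self.offsets, self.offsets[1:]))
        width = max((end - start for start, end in bounds), default=0)
        return [
            self.data[start:end] + [pad_value] * (width - (end - start))
            for start, end in bounds
        ]
```

Padding is deferred until a dense array is needed, so rows of very different lengths do not waste memory while the data stays in ragged form.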
Dataset (frozen dataclass): Lazy view over stored data. Subsetting via subset_to() returns a new lazy view; eager access is dataset[region, sample]. The return type varies based on whether a reference genome and tracks are present.
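The lazy-view pattern described here can be sketched with a frozen dataclass. ToyDataset, its load callable, and the keyword arguments of subset_to() are hypothetical; the real Dataset has more fields and its subset_to() signature may differ.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ToyDataset:
    """Lazy view: holds indices and a loader, but no loaded data."""

    load: Callable[[int, int], str]  # hypothetical loader: (region, sample) -> data
    regions: Tuple[int, ...]
    samples: Tuple[int, ...]

    def subset_to(
        self,
        regions: Optional[Sequence[int]] = None,
        samples: Optional[Sequence[int]] = None,
    ) -> "ToyDataset":
        # Returns a new view; nothing is read from storage here.
        return replace(
            self,
            regions=self.regions if regions is None else tuple(regions),
            samples=self.samples if samples is None else tuple(samples),
        )

    def __getitem__(self, idx: Tuple[int, int]) -> str:
        # Eager access: translate view-relative indices, then load.
        region_idx, sample_idx = idx
        return self.load(self.regions[region_idx], self.samples[sample_idx])
```

Freezing the dataclass means a subset never mutates its parent, so several views over the same stored data can coexist safely.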
dataset_dir/
├── metadata.json # sample names, contigs, ploidy, max_jitter
├── input_regions.arrow # BED regions + region index map
├── genotypes/ # sparse genotype storage (if variants provided)
└── intervals/ # track data (or annot_intervals/ with annotation)
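A quick way to sanity-check a directory against this layout is sketched below. describe_dataset_dir is a hypothetical helper, not part of GenVarLoader; it only checks that the files and directories named above exist and does not parse their contents.

```python
from pathlib import Path
from typing import Dict


def describe_dataset_dir(path: str) -> Dict[str, bool]:
    """Report which components of the layout are present under `path`."""
    root = Path(path)
    return {
        "metadata": (root / "metadata.json").is_file(),
        "input_regions": (root / "input_regions.arrow").is_file(),
        # Present only if variants were provided at write time.
        "genotypes": (root / "genotypes").is_dir(),
        # Track data lives in intervals/ or annot_intervals/.
        "tracks": (root / "intervals").is_dir() or (root / "annot_intervals").is_dir(),
    }
```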
skills/genvarloader/SKILL.md is an AI-agent reference for gvl's public Python API. Any PR that changes the public API must also update this skill. Public API = anything exported in python/genvarloader/__init__.py __all__, plus the docstrings, signatures, and defaults of gvl.write, Dataset.open, and every Dataset.with_* method.
In scope:
- New, removed, or renamed public symbols
- Changed signatures, defaults, or accepted literal values (e.g. new with_seqs kind)
- New output modes, insertion-fill strategies, or splice/site-only behavior
- Changed bcftools/plink2 preprocessing requirements
- Changed on-disk format that affects how users open datasets
When a change ships, update the relevant section of the skill and re-check the "Common gotchas" and "Where to look next" pointer table. The skill is published to https://www.skills.sh/ as mcvickerlab/GenVarLoader (installable via npx skills add mcvickerlab/GenVarLoader); keep it accurate against main.
- Pixi environments: Use -e dev for development, -e docs for documentation, -e py310/py311/py312/py313 for Python version testing. Platform is linux-64.
- Ruff config: E501 (line length) is ignored.
- Pyrefly: Configured permissively; type annotations follow patterns in _types.py.
- Conventional commits: Project uses commitizen for versioning.
- Test markers: @pytest.mark.slow for slow tests (excluded by default).