pappewaio/plink2_score_dosage_verification

PLINK2 Dosage Scoring Test Framework

Verifies whether PLINK2 uses dosage information or hard calls when scoring from PGEN files.

🚀 Quick Start

# 1. Setup environment
./setup_environment.sh

# 2. Activate environment  
mamba activate plink2_score_dosage_verification

# 3. Run test
./test_plink2_dosage.sh

# 4. View results
firefox plink2_dosage_test/html_report/unified_report.html

📊 What It Does

Creates synthetic genetic data with known dosage values, runs PLINK2 scoring, and determines whether PLINK2 uses:

  • Dosage information (continuous 0-2 values)
  • Hard calls (discrete 0,1,2 genotypes)
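The two interpretations give different scores whenever a dosage is not a whole number, which is what makes the test possible. The following is a minimal sketch, not the repository's own code: `score_both` is a hypothetical helper, the dosage/weight pairs are made up, and the hard call is modelled as simple rounding to the nearest genotype. PLINK2's own hard-call conversion is controlled by `--hard-call-threshold` and can mark uncertain calls as missing, so treat the rounding here as a simplification.

```shell
# Compare a dosage-based score with a hard-call-based score for one sample.
# Input on stdin: one "dosage weight" pair per line (illustrative values).
score_both() {
  awk '
    {
      dosage_sum += $1 * $2              # use the continuous dosage as-is
      hard_sum   += int($1 + 0.5) * $2   # round to the nearest genotype 0, 1 or 2
    }
    END { printf "dosage=%.2f hardcall=%.2f\n", dosage_sum, hard_sum }
  '
}

printf '%s\n' "0.49 1" "0.51 2" "1.49 1" "1.51 2" | score_both
# prints: dosage=6.02 hardcall=7.00
```

If the score PLINK2 reports matches the first number, it is using dosages; if it matches the second, it is using hard calls.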

Key Discovery: PLINK2 scores with dosages when the VCF file is imported with the dosage=DS modifier of --vcf.

🎯 Key Results

With proper dosage import (plink2 --vcf file.vcf dosage=DS --make-pgen):

  • Perfect correlation (r = 1.000) with dosage-based expected scores
  • PGEN and BED inputs give different scores, as expected: BED stores only hard calls, so dosage information is lost
  • All test conditions confirm dosage usage
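The correlation check can be reproduced outside the report. The sketch below assumes the expected and observed scores have been pasted into two whitespace-separated columns; `pearson_r` is a hypothetical helper and the repository's R scripts may compute this differently.

```shell
# Pearson correlation between two columns read from stdin.
# Column 1: expected score, column 2: score reported by PLINK2 (assumed layout).
pearson_r() {
  awk '
    { n++; sx += $1; sy += $2; sxx += $1 * $1; syy += $2 * $2; sxy += $1 * $2 }
    END {
      num = n * sxy - sx * sy
      den = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
      if (den == 0) { print "NA"; exit }   # undefined for empty or constant input
      printf "%.3f\n", num / den
    }
  '
}

printf '%s\n' "1.0 2.0" "2.0 4.0" "3.0 6.0" | pearson_r
# prints: 1.000
```

A value of 1.000 against the dosage-based expectation, together with a lower value against the hard-call expectation, is the pattern the results above describe.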

📋 Requirements

  • mamba (recommended) or conda
  • Internet connection for package installation

🔧 Command Options

./test_plink2_dosage.sh [OPTIONS]

# Common usage
./test_plink2_dosage.sh              # Standard test
./test_plink2_dosage.sh --all        # All test conditions
./test_plink2_dosage.sh --no-plots   # Faster execution
./test_plink2_dosage.sh -s 200 -v 100 --all  # Custom size + all tests

📁 Output

  • Comprehensive report: plink2_dosage_test/html_report/unified_report.html
  • Analysis summary: plink2_dosage_test/comprehensive_analysis.txt
  • Individual test reports: plink2_dosage_test/*/html_report/

🔬 Test Conditions

  • Standard: Dosages drawn from a Beta(2,2) distribution
  • Edge cases: Dosages near rounding boundaries (0.49, 0.51, 1.49, 1.51)
  • Alternative scoring: Different PLINK2 command variations
  • Large scale: 5000 samples, 500 variants (optional)
  • Distributions: Uniform, bimodal, and normal dosage distributions (optional)
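As an illustration of the edge-case condition, the sketch below writes a tiny VCF whose DS values sit just either side of the rounding boundaries. It is an assumption about what such input could look like, not the framework's generator: `make_edge_case_vcf`, the sample names S1 to S4, and the variant IDs are all hypothetical.

```shell
# Emit a minimal VCF with GT and DS fields for four samples.
# Every variant carries the boundary dosages 0.49, 0.51, 1.49 and 1.51.
make_edge_case_vcf() {
  n_variants="${1:-3}"   # number of variant rows to write (default 3)
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=1>\n'
  printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n'
  printf '##FORMAT=<ID=DS,Number=1,Type=Float,Description="ALT allele dosage">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4\n'
  awk -v n="$n_variants" 'BEGIN {
    OFS = "\t"
    for (i = 1; i <= n; i++)
      print 1, i * 1000, "var" i, "A", "G", ".", "PASS", ".", "GT:DS",
            "0/0:0.49", "0/1:0.51", "0/1:1.49", "1/1:1.51"
  }'
}

make_edge_case_vcf 2 > edge_cases.vcf
```

The GT values are the rounded versions of each DS value, so a dosage-aware score and a hard-call score computed from this file will disagree for any non-zero weight.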

🎯 Critical Finding

PLINK2 requires explicit dosage import: without the dosage=DS modifier, PLINK2 falls back to hard-call behavior even when the VCF file contains dosage information.

Correct commands:

  • Import: plink2 --vcf file.vcf dosage=DS --make-pgen
  • Export: plink2 --pfile data --export vcf vcf-dosage=DS
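Put together, importing with dosages and then scoring might look like the sketch below. The wrapper `import_and_score` and the file names `input.vcf`, `weights.tsv`, and `demo` are hypothetical; the weights file is assumed to have a header line with the variant ID, effect allele, and weight in columns 1 to 3. The exact `--score` options used by test_plink2_dosage.sh may differ.

```shell
# Import a VCF with its DS field as dosage, then score the resulting PGEN.
# Arguments: <vcf> <weights file> <output prefix>
import_and_score() {
  vcf="$1"; weights="$2"; prefix="$3"

  # Without dosage=DS only the hard-call GT field would be imported.
  plink2 --vcf "$vcf" dosage=DS --make-pgen --out "$prefix" || return 1

  # Columns 1, 2, 3 = variant ID, effect allele, weight; 'header' skips line 1.
  # cols=+scoresums adds score sums alongside the default averages.
  plink2 --pfile "$prefix" --score "$weights" 1 2 3 header cols=+scoresums \
    --out "${prefix}_score"
}

# Run only when plink2 and the example input are actually present.
if command -v plink2 >/dev/null 2>&1 && [ -f input.vcf ]; then
  import_and_score input.vcf weights.tsv demo
fi
```

The scores should then be in `demo_score.sscore`, ready to compare against dosage-based and hard-call-based expectations.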

📚 Documentation

  • details.md - Detailed methodology, troubleshooting, technical specifications
  • Test reports - Generated HTML reports with comprehensive analysis
  • Source code - Well-documented R scripts in bin/ directory
