Summary
Evaluate GENOMICON-Seq as a potential simulation module for generating biologically realistic PCR duplicates through cycle-by-cycle amplification modeling. This would address the sampling-based duplicate limitation in Wessim (see #51).
Background
GENOMICON-Seq is a modern exome simulation tool that explicitly models both the capture step and PCR amplification in a unified pipeline:
| Feature | Wessim (current) | GENOMICON-Seq |
|---|---|---|
| Capture bias | ✅ Probe hybridization | ✅ Probe capture efficiency |
| PCR duplicates | ❌ Sampling-based | ✅ Cycle-by-cycle amplification |
| Error model | External (GemSim/ART) | Integrated per-cycle errors |
| Duplicate families | Binary (2 copies) | Realistic clonal expansion |
How GENOMICON-Seq works
- Probe capture simulation: Models which fragments are "pulled down" based on hybridization efficiency
- PCR amplification: Uses an inverted sigmoid function to model amplification efficiency per cycle
- Error introduction: Adds noise during PCR cycles, creating realistic duplicate families with shared errors
This produces PCR duplicate families where one source molecule → N copies with correlated errors, matching wet-lab behavior.
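To make the idea of correlated errors within a duplicate family concrete, here is a minimal sketch of cycle-by-cycle amplification with an inverted sigmoid efficiency curve. It is not GENOMICON-Seq's implementation: the function names, the `midpoint` and `steepness` parameters, and the `error_rate` value are all assumptions chosen for illustration, and the tool's actual efficiency function may be parameterized differently.

```python
import math
import random


def cycle_efficiency(cycle, midpoint=15.0, steepness=0.5):
    """Inverted sigmoid: close to 1 in early cycles, falling toward 0 later.

    midpoint and steepness are illustrative assumptions, not tool defaults.
    """
    return 1.0 / (1.0 + math.exp(steepness * (cycle - midpoint)))


def amplify(n_cycles, error_rate=0.01, seed=0):
    """Amplify one source molecule for n_cycles.

    Each molecule is represented by the frozenset of error IDs it carries,
    so copies made from an erroneous template inherit that template's errors.
    """
    rng = random.Random(seed)
    molecules = [frozenset()]  # the source molecule carries no PCR errors
    next_error_id = 0
    for cycle in range(1, n_cycles + 1):
        eff = cycle_efficiency(cycle)
        new_copies = []
        for mol in molecules:
            if rng.random() < eff:  # this molecule is copied in this cycle
                copy = mol
                if rng.random() < error_rate:  # new error in the fresh copy
                    copy = mol | {next_error_id}
                    next_error_id += 1
                new_copies.append(copy)
        molecules.extend(new_copies)
    return molecules


family = amplify(n_cycles=10, error_rate=0.05, seed=42)
print(len(family), "copies from one source molecule")
```

Because an error introduced in an early cycle is inherited by every later copy of that molecule, reads drawn from the same family share errors, which is the behavior that sampling-based duplication cannot reproduce.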
Availability
Investigation Tasks
Phase 1: Feasibility assessment
Phase 2: Integration evaluation
Phase 3: Duplicate quality comparison
# Compare duplicate characteristics
# GENOMICON-Seq output (MarkDuplicates also requires an output BAM via O=)
picard MarkDuplicates I=genomicon_output.bam O=genomicon_marked.bam M=genomicon_metrics.txt
# Wessim output (from #51)
picard MarkDuplicates I=wessim_output.bam O=wessim_marked.bam M=wessim_metrics.txt
# Compare family size distributions
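One possible way to compare family size distributions is sketched below. It assumes the duplicate families have already been extracted from each BAM as a list of per-family read counts (for example by grouping reads on alignment start, strand, and mate position). The function names and the toy input lists are illustrative only and do not represent output from either simulator.

```python
from collections import Counter
from statistics import mean


def family_size_histogram(family_sizes):
    """Return {family_size: number_of_families}, sorted by family size."""
    counts = Counter(family_sizes)
    return dict(sorted(counts.items()))


def summarize_families(family_sizes):
    """Summarize a list of duplicate family sizes (reads per source molecule)."""
    total_reads = sum(family_sizes)
    n_families = len(family_sizes)
    return {
        "families": n_families,
        "reads": total_reads,
        # One read per family is the original; the rest count as duplicates.
        "duplicate_rate": (total_reads - n_families) / total_reads,
        "mean_family_size": mean(family_sizes),
        "max_family_size": max(family_sizes),
    }


# Toy inputs for illustration only (not real simulator output).
example_a = [1, 1, 2, 2, 1, 2]
example_b = [1, 1, 5, 1, 3, 1]

for name, sizes in (("example_a", example_a), ("example_b", example_b)):
    print(name, family_size_histogram(sizes), summarize_families(sizes))
```

Two runs can have similar overall duplicate rates while differing sharply in maximum family size, so the histogram is more informative than the single duplication percentage.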
Integration Options
Option A: Add as new simulator backend
muconeup run --simulator genomicon-seq --coverage 100x
- Pros: Preserves backward compatibility, user choice
- Cons: Maintenance burden of multiple backends
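A rough sketch of how a `--simulator` switch could be wired while keeping Wessim as the default is shown below. Everything here is hypothetical: the backend names, the `build_parser` and `parse_coverage` helpers, and the argument layout are assumptions for discussion and do not describe the existing muconeup CLI.

```python
import argparse

# Hypothetical backend names; "wessim" stays the default for compatibility.
SIMULATORS = ("wessim", "genomicon-seq")


def build_parser():
    """Build an illustrative parser for `muconeup run`."""
    parser = argparse.ArgumentParser(prog="muconeup")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run")
    run.add_argument("--simulator", choices=SIMULATORS, default="wessim")
    run.add_argument("--coverage", default="100x")
    return parser


def parse_coverage(text):
    """Turn '100x' or '100' into an integer depth."""
    return int(text.lower().rstrip("x"))


args = build_parser().parse_args(
    ["run", "--simulator", "genomicon-seq", "--coverage", "100x"]
)
print(args.simulator, parse_coverage(args.coverage))
```

Restricting `--simulator` to a fixed set of choices means an unsupported backend fails at argument parsing rather than partway through a run.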
Option B: Replace Wessim for Illumina exome
- Pros: Simplified codebase, better defaults
- Cons: Breaking change, may affect reproducibility of existing results
Option C: Hybrid approach
- Use GENOMICON-Seq for capture + PCR simulation
- Pipe fragments to ReSeq for Illumina error profiles
- Pros: Best of both worlds (realistic duplicates + validated error model)
- Cons: Pipeline complexity
Technical Considerations
Input requirements
- Reference genome (we have: hg38/hg19)
- Probe BED file (need: kit-specific files for common platforms)
- Target regions (we have: MUC1 VNTR coordinates)
Output format
- Verify GENOMICON-Seq outputs standard BAM/FASTQ
- Check read naming conventions for duplicate tracking
- Ensure compatibility with downstream callers (VNtyper, adVNTR)
Success Criteria
Related Issues
References