Working with non-model organism means you don't have known SNPs and prepared interval list in GATK bundle. Alternatively I did:
- Optional Base Quality Score Calibration(BQSR) (working...)
- Apply hard filters to call sets.
- Split genome into scaffolds as intervals.
How to run:
conda env create -f environment.yamland install bamcov ...- prepare metadata.tsv.
- modify config.yaml if needed.
snakemake
To run on JCU HPC with PBSpro:
snakemake -p --cluster-config jcu_hpc.json --cluster "qsub -j oe -l walltime={cluster.time} -l select=1:ncpus={cluster.ncpus}:mem={cluster.mem}" --jobs 100 --latency-wait 5
Clean everything:
snakemake clean
Reference:
https://snakemake.readthedocs.io/en/stable/ https://github.com/gatk-workflows/gatk4-germline-snps-indels https://github.com/snakemake-workflows/dna-seq-gatk-variant-calling https://zhuanlan.zhihu.com/p/33891718 https://software.broadinstitute.org/gatk/documentation/article?id=11097 https://gatkforums.broadinstitute.org/gatk/discussion/12443/genomicsdbimport-run-slowly-with-multiple-samples
