Last update: May 23, 2024.
Pre-print: link on bioRxiv.
- An account on Google Cloud Platform (GCP).
- For each whole genome, use the CloudASM pipeline to output the CpG context files, flag CpGs with the nearest SNP when possible, and flag each sequencing read as REF or ALT (parental chromosome) when possible. Store the results in a BigQuery dataset.
- Configure the GCP variables in config.yaml accordingly.
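
As an illustration of how values in a configuration file can become shell variables, here is a minimal Python sketch. It assumes config.yaml is a flat list of `KEY: value` pairs; the key names (`PROJECT_ID`, `REGION`, `DATASET`) are hypothetical and are not taken from the repository. The parser handles only that flat case (a `#` inside a value would be treated as a comment), so a real YAML library such as PyYAML is the safer choice for anything nested.

```python
import shlex


def parse_flat_yaml(text):
    """Parse flat `KEY: value` lines only; blank lines and # comments are skipped."""
    config = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        # Remove surrounding whitespace and simple quoting from the value.
        config[key.strip()] = value.strip().strip("'\"")
    return config


def to_exports(config):
    """Turn each key/value pair into a shell-safe `export` statement."""
    return [f"export {key}={shlex.quote(value)}" for key, value in config.items()]


# Hypothetical configuration, for illustration only.
EXAMPLE = """
PROJECT_ID: my-gcp-project
REGION: us-central1
DATASET: my_dataset
"""

if __name__ == "__main__":
    for statement in to_exports(parse_flat_yaml(EXAMPLE)):
        print(statement)
```

The printed lines could be evaluated in a bash session (for example with `eval "$(python script.py)"`, where `script.py` is a placeholder name) to make the values available to later commands.
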
All the steps are in main.sh, which can be executed. The main steps are the following:
- Create a Python- and bash-based image in GCP's Artifact Registry using the docker folder in the repository.
- Define environment variables in bash using config.yaml.
- Prepare the reference genome in genomic regions of 250 nucleotides.
- Evaluate ASM using CloudASM in the samples (here, ENCODE).
- Machine Learning. See our pre-print for a detailed description of the steps.
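
To make the step that prepares the reference genome in 250-nucleotide regions more concrete, the sketch below tiles each chromosome into consecutive windows. It is an illustration under stated assumptions, not the code in main.sh: the windows are assumed to be non-overlapping, coordinates are 0-based and half-open (BED-style), the last window of a chromosome is truncated at its end, and the chromosome names and lengths are toy values.

```python
def make_windows(chrom_sizes, window=250):
    """Yield (chrom, start, end) half-open windows that tile each chromosome."""
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, window):
            # The final window is shortened so it never runs past the chromosome end.
            yield chrom, start, min(start + window, size)


if __name__ == "__main__":
    # Toy chromosome lengths, not real genome sizes.
    toy_sizes = {"chrA": 600, "chrB": 250}
    for chrom, start, end in make_windows(toy_sizes):
        print(f"{chrom}\t{start}\t{end}")
```

Each window can then serve as the unit over which per-region signals are aggregated.
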