A simple Snakemake workflow to process paired-end CUT&Tag libraries and produce spike-in scaled bigWig files.
This is a very simple workflow that performs the following steps:
- Align with
bowtie2to reference and spikein reference genomes. - Deduplicate with
Picard. - Filter regions from an exclude loci set (BED file) using
bedtools intersect. - Collect insert size metrics of these alignments with
Picard. - Filter for uniquely mapping reads (MQ over certain threshold - default > 20) using
samtools view. - Count alignments both total and proper pairs, making two output tables using
samtools flagstat. - Calculate spikein factors (see scaling below).
- Make scaled bigWig files using spikein factors using
deepTools bamCoverage. - Call peaks with
MACS3on each sample (MACS3 has a default non-background method now for ATAC seq that should work with CUT&Tag better than the ChIP-seq specific one).
This workflow produces bigWig files where the sample reference is scaled to RPGC 1x genome coverage using deepTools.
In order to achieve this, the spike-in factor is corrected to account for this. deepTools is called
with --effectiveGenomeSize set accordingly to the target genome, and a --scaleFactor parameter that is calculated
as the ratio: target_reads / spikein_reads, normalized to the reference ratio
ref_target_reads / ref_spikein_reads (proper pair counts, after deduplication and exclude regions, to
avoid competition for possibly shared repetitive regions).
Note that for the reference this second factor always equals to 1, hence, reference library has always global mean of 1. The rest of the libraries are relative to it.
- Create a conda environment using the
environment.ymlfile provided:
mamba env create -n cutntag -f environment.yml
- Activate the environment:
conda activate cutntag
- Install
macs3via pip:
pip install macs3
- Clone this repository:
git clone https://github.com/cnluzon/cutntag.git
- Edit the
genomes.yamlfile in it with the necessary values. - Run snakemake defining target, spikein and reflib parameters:
snakemake -p -s /path/to/cutntag/Snakemake --rerun-incomplete --config target=mm39 spikein=dm6 reflib=df_library
Note that target, spikein and reflib parameters can also be defined in the cutntag.yaml file, and that their values
must match one of the references in the same file. For an example of those files, check the
It is possible to run this simple pipeline only producing RPGC scaled bigwigs for the targets, if one has samples with no spike-ins so no scaling possible. In this case, one runs:
snakemake -p -s /path/to/cutntag/Snakemake no_spikein --rerun-incomplete --config target=mm39
The workflow will run on all the FASTQ file pairs provided in the fastq directory in the running directory.
Additionally, it needs a cutntag.yaml file that contains paths to the reference genome files, bowtie indexes and exclude BED files.
It takes as parameters target, spikein which should match a genome in the genomes.yaml file, and reflib as the
prefix of the fastq file pair that serves as spikein reference. For instance, for tf_library_R1.fastq.gz, tf_library_R2.fastq.gz,
the value would be tf_library (without the _R1.fastq.gz).