This repository contains R scripts that streamline the process of generating Hi-C aggregate pileup plots using Coolpuppy from Open2C (https://github.com/open2c/coolpuppy). The scripts automate running coolpup.py, processing its output, and visualizing the results with ggplot2, allowing for easy generation of multiple plots in a single workflow.
Requirements:
- Coolpuppy (https://github.com/open2c/coolpuppy); check its documentation for its own requirements
- R (>= 4.4)
library(ggplot2)
library(stringr)
library(reshape2)
library(rhdf5)
library(dplyr)
library(RColorBrewer)
Hi-C file: The scripts take mcool files as input, so you can choose the resolution to use.
BED file: A standard 3-, 4-, or 6-column tab-separated BED file, or a tab-separated paired BED file (columns: seqnames1, start1, end1, seqnames2, start2, end2).
- Make sure the chromosome names follow the same convention as in the cooler file (e.g. "chr1" vs. "1").
- If using a paired BED file, set ft_format = "bedpe" in the input parameters (see below).
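A mismatch in chromosome naming is easy to catch before running anything. The following is a minimal sketch, not part of functions.r: `check_chrom_names` is a hypothetical helper, and the cooler chromosome names are supplied by hand here. With rhdf5 they could probably be read from the mcool file (assuming the usual `resolutions/<res>/chroms/name` layout), but that is an assumption to verify against your own files.

```r
# Hypothetical helper (not in functions.r): return the BED chromosome names
# that do not occur among the chromosome names of the cooler.
check_chrom_names <- function(bed_path, cooler_chroms) {
  bed <- read.table(bed_path, sep = "\t", header = FALSE,
                    stringsAsFactors = FALSE, colClasses = "character")
  bed_chroms <- unique(bed[[1]])
  setdiff(bed_chroms, cooler_chroms)
}

# Toy BED file with one entry that uses "3" instead of "chr3"
tmp_bed <- tempfile(fileext = ".bed")
writeLines(c("chr1\t1000\t2000", "chr2\t5000\t6000", "3\t100\t200"), tmp_bed)

# Assumed chromosome names of the cooler; with rhdf5 one might try
# h5read(mcool_path, "resolutions/10000/chroms/name") (layout is an assumption)
cooler_chroms <- c("chr1", "chr2", "chr3")

missing_chroms <- check_chrom_names(tmp_bed, cooler_chroms)
print(missing_chroms)
```

An empty result means every chromosome in the BED file is also present in the cooler.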
There are three functions available in the script functions.r:
- Local_pileup: Creates local aggregate pileups, where each region is paired with itself (i.e. region1:region1, region2:region2, region3:region3, etc.)
- Intra_contacts: Creates aggregate pileups of all possible cis contacts (pairs of regions on the same chromosome)
- Inter_contacts: Creates aggregate pileups of all possible trans contacts (pairs of regions on different chromosomes)
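To make the difference between the three modes concrete, here is a small base R sketch that enumerates the region pairs each mode would aggregate over. It only illustrates the pairing logic (self pairs, pairs on the same chromosome, pairs on different chromosomes) and is not the code in functions.r; the toy `regions` table and all variable names are made up for the example.

```r
# Toy regions: two on chr1, one on chr2 (illustrative values only)
regions <- data.frame(
  id    = c("region1", "region2", "region3"),
  chrom = c("chr1", "chr1", "chr2"),
  stringsAsFactors = FALSE
)

# Local: every region paired with itself
local_pairs <- data.frame(a = regions$id, b = regions$id)

# All unordered pairs of distinct regions, one pair per row
idx <- t(combn(nrow(regions), 2))
same_chrom <- regions$chrom[idx[, 1]] == regions$chrom[idx[, 2]]

# Cis: both regions on the same chromosome
cis_pairs <- data.frame(a = regions$id[idx[same_chrom, 1]],
                        b = regions$id[idx[same_chrom, 2]])

# Trans: regions on different chromosomes
trans_pairs <- data.frame(a = regions$id[idx[!same_chrom, 1]],
                          b = regions$id[idx[!same_chrom, 2]])

print(local_pairs)
print(cis_pairs)
print(trans_pairs)
```

With these three toy regions there are three local pairs, one cis pair (region1:region2) and two trans pairs.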
## PATH TO coolpup.py (THE PARAMETER IS NAMED path_to_cooltools)
path_to_cooltools = "/path/to/coolpup.py"
## PATH TO HIC COOLERS (MCOOL FILES) --> comma separated vector for many files
cooler = c(
"/path/to/coolfiles/cooler_sample1.mcool",
"/path/to/coolfiles/cooler_sample2.mcool",
"/path/to/coolfiles/cooler_sample3.mcool"
)
## BED FILES TO USE --> comma separated vector for many files
bed = c(
"/path/to/bedfiles/peaks_WT.bed",
"/path/to/bedfiles/peaks_KO.bed"
)
## BED FILE FORMAT ("bed" OR "bedpe")
ft_format = "bed"
## FLANKING WINDOW IN BASEPAIRS
flank = 500000
## RESOLUTION OF HIC COOLER FILE TO USE IN BASEPAIRS
res = 10000
## OUTPUT FOLDER PATH
outdir="/path/to/outdir/"
## WEIGHT NAME (e.g. "KR"; IF NONE, WRITE "")
weight = ""
## NUMBER OF PROCESSORS TO USE
proc = 24
## LOGICAL: WHETHER TO SUBSAMPLE THE BED FILES TO REDUCE PROCESSING TIME
sample_peaks = TRUE
## IF sample_peaks = TRUE, THE NUMBER OF PEAKS TO SAMPLE
sample_number = 1000
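Subsampling matters most for the all-against-all modes, because the number of region pairs grows quadratically: 1000 peaks already give choose(1000, 2) = 499500 possible pairs. The sketch below shows one way such subsampling could work. `subsample_peaks` and its `seed` argument are hypothetical and are not taken from functions.r, which may sample differently.

```r
# Hypothetical helper: randomly keep at most `sample_number` rows of a peak table.
# The `seed` argument is an addition for reproducibility in this example.
subsample_peaks <- function(peaks, sample_peaks = TRUE,
                            sample_number = 1000, seed = 1) {
  # Nothing to do if sampling is off or the table is already small enough
  if (!sample_peaks || nrow(peaks) <= sample_number) {
    return(peaks)
  }
  set.seed(seed)
  keep <- sort(sample(seq_len(nrow(peaks)), sample_number))
  peaks[keep, , drop = FALSE]
}

# Toy peak table with 5000 regions on chr1
starts <- seq(0, by = 10000, length.out = 5000)
peaks <- data.frame(chrom = "chr1", start = starts, end = starts + 1000)

sub <- subsample_peaks(peaks, sample_peaks = TRUE, sample_number = 1000)
print(nrow(sub))
print(choose(nrow(sub), 2))
```

Sorting the sampled row indices keeps the retained peaks in their original order.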
Local pileups:
library(ggplot2)
library(stringr)
library(reshape2)
library(rhdf5)
library(dplyr)
source("functions.r")
p1 <- Local_pileup(
path_to_cooltools = "/path/to/coolpup.py",
cooler = c(
"/path/to/coolfiles/cooler_sample1.mcool",
"/path/to/coolfiles/cooler_sample2.mcool",
"/path/to/coolfiles/cooler_sample3.mcool"
),
bed = c(
"/path/to/bedfiles/peaks_WT.bed",
"/path/to/bedfiles/peaks_KO.bed"
),
ft_format = "bed" ,
flank = 500000 ,
res = 10000,
outdir="/path/to/outdir/",
weight = "",
proc = 24
)
p1
Intra-contact pileups:
library(ggplot2)
library(stringr)
library(reshape2)
library(rhdf5)
library(dplyr)
source("functions.r")
p1 <- Intra_contacts(
path_to_cooltools = "/path/to/coolpup.py",
cooler = c(
"/path/to/coolfiles/cooler_sample1.mcool",
"/path/to/coolfiles/cooler_sample2.mcool",
"/path/to/coolfiles/cooler_sample3.mcool"
),
bed = c(
"/path/to/bedfiles/peaks_WT.bed",
"/path/to/bedfiles/peaks_KO.bed"
),
ft_format = "bed" ,
flank = 500000 ,
res = 10000,
outdir="/path/to/outdir/",
weight = "",
proc = 24,
sample_peaks = TRUE,
sample_number = 1000
)
p1
Inter-contact pileups:
library(ggplot2)
library(stringr)
library(reshape2)
library(rhdf5)
library(dplyr)
source("functions.r")
p1 <- Inter_contacts(
path_to_cooltools = "/path/to/coolpup.py",
cooler = c(
"/path/to/coolfiles/cooler_sample1.mcool",
"/path/to/coolfiles/cooler_sample2.mcool",
"/path/to/coolfiles/cooler_sample3.mcool"
),
bed = c(
"/path/to/bedfiles/peaks_WT.bed",
"/path/to/bedfiles/peaks_KO.bed"
),
ft_format = "bed" ,
flank = 500000 ,
res = 10000,
outdir="/path/to/outdir/",
weight = "",
proc = 24,
sample_peaks = TRUE,
sample_number = 500
)
p1