InterDomain is a Python package for detecting metadomains in Hi-C contact matrices. It supports both intra-chromosomal and inter-chromosomal metadomain calling, allowing researchers to identify significant genomic interactions within and between chromosomes.
- Python: Version 3.6 or higher
- Dependencies: The following Python packages are required:
- numpy
- pandas
- cooler
- scipy
- matplotlib
Clone the Repository:
git clone https://github.com/yourusername/InterDomain.git
cd InterDomain
pip install .
This short tutorial walks through calling metadomains on a Treg dataset and plotting the top results.
- Download the Treg .mcool file from https://drive.google.com/drive/folders/1bsxRWz-5rAUNe-3OctKurxgIj7hCoTOs?usp=sharing and note its path on your machine. Reference a specific resolution using a Cooler URI, e.g. path/to/Treg_all.mcool::/resolutions/50000.
- Run intra-chromosomal metadomain calling
call_metadomains_intra path/to/Treg_all.mcool::/resolutions/50000 \
--label Treg \
--n_workers 8 \
--save_intermediates
- Run inter-chromosomal metadomain calling
call_metadomains_inter path/to/Treg_all.mcool::/resolutions/50000 \
--label Treg \
--n_workers 8 \
--save_intermediates
- Plot the top metadomains
- Set --type to intra or inter, depending on which results you want to visualize.
interdomain_plot \
--top_n 20 \
--output_dir bedfile_output/ \
--type inter
Notes:
- Adjust --n_workers to match your available CPU cores.
- If you used a .cool file instead of .mcool, pass its path directly (no resolution suffix).
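If you run both callers often, it can help to script them. Below is a minimal sketch that assembles the tutorial's commands in Python and runs them with subprocess. The helpers `build_call_command` and `run_both` are hypothetical, not part of InterDomain, and they only use flags documented in this README.

```python
import shlex
import subprocess
from typing import List


def build_call_command(kind: str, uri: str, label: str = "test",
                       n_workers: int = 1, save_intermediates: bool = False,
                       output_dir: str = "bedfile_output/") -> List[str]:
    """Assemble a call_metadomains_intra/inter command (hypothetical helper)."""
    if kind not in ("intra", "inter"):
        raise ValueError("kind must be 'intra' or 'inter'")
    cmd = [
        "call_metadomains_" + kind, uri,
        "--label", label,
        "--n_workers", str(n_workers),
        "--output_dir", output_dir,
    ]
    if save_intermediates:
        cmd.append("--save_intermediates")  # boolean flag, takes no value
    return cmd


def run_both(uri: str, **options) -> None:
    """Run intra- and then inter-chromosomal calling on the same file."""
    for kind in ("intra", "inter"):
        cmd = build_call_command(kind, uri, **options)
        print(" ".join(shlex.quote(part) for part in cmd))  # echo the shell form
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


# Example (requires InterDomain to be installed):
# run_both("path/to/Treg_all.mcool::/resolutions/50000",
#          label="Treg", n_workers=8, save_intermediates=True)
```

Passing arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.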
- To perform intra-chromosomal metadomain calling, use the call_metadomains_intra command:
- call_metadomains_intra path/to/your_file.cool
- To perform inter-chromosomal metadomain calling, use the call_metadomains_inter command:
- call_metadomains_inter path/to/your_file.cool
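Both commands take either a plain .cool path or a .mcool path with a resolution suffix. A small sketch of that rule follows; `cooler_uri` is a hypothetical helper, and it relies on the standard multi-resolution layout in which each resolution lives under /resolutions/<bin size>.

```python
from typing import Optional


def cooler_uri(path: str, resolution: Optional[int] = None) -> str:
    """Build the input argument for call_metadomains_intra/inter (hypothetical helper).

    .mcool files hold several resolutions, so they need a URI suffix;
    single-resolution .cool files are passed as a bare path.
    """
    if path.endswith(".mcool"):
        if resolution is None:
            raise ValueError(".mcool files need a resolution, e.g. 50000")
        return "{}::/resolutions/{}".format(path, resolution)
    if resolution is not None:
        raise ValueError(".cool files hold one resolution; pass the path alone")
    return path


print(cooler_uri("path/to/Treg_all.mcool", 50000))
# path/to/Treg_all.mcool::/resolutions/50000
```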
The intra- and inter-chromosomal commands take the same arguments; only some hyperparameter defaults differ, and --cutoff applies to intra-chromosomal calling only.
These arguments control how the program is run:
- --n_workers: Number of worker processes to use (default: 1; max: number of chromosomes).
- --label: Label for the output files (default: 'test').
- --save_intermediates: Save intermediate matrices for plotting and inspecting results (default: False). This can take up a lot of disk space.
- --output_dir: Directory to save output files (default: 'bedfile_output/').
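Since --n_workers is capped at the number of chromosomes, extra cores beyond that count go unused. A minimal sketch for picking a value, assuming you already know the chromosome count (with cooler installed, `len(cooler.Cooler(uri).chromnames)` gives it); `choose_n_workers` is a hypothetical helper, not part of InterDomain.

```python
import os


def choose_n_workers(n_chromosomes: int) -> int:
    """Pick a --n_workers value (hypothetical helper).

    Uses all available CPU cores, but never more workers than chromosomes,
    and always at least one.
    """
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(cores, n_chromosomes))


print(choose_n_workers(21))
```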
The main hyperparameters, which control how metadomains are called, are:
- --filter_width: Size of the inside filter (intra default: 1; inter default: 3). Larger values detect larger metadomains and more significant interactions; use a larger value for sparser datasets.
- --sigma: Sigma value for smoothing when --useSigma is set (default: 2). Smoothing is recommended for inter-chromosomal matrices; increase sigma if the smoothed obs/exp contact map still looks noisy, which is more likely with sparser datasets.
- --prominence: Prominence threshold for peak detection (default: 4). Lower values call more metadomains; higher values make calling more stringent.
- --pco: P-value cutoff for calling metadomains (default: 5 for inter-chromosomal matrices; 20 for intra-chromosomal matrices). Lower values call more metadomains. The defaults aim for a reasonable balance between sensitivity and specificity.
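To build intuition for what --prominence measures, here is the standard topographic definition of peak prominence (the one scipy.signal.find_peaks uses) applied to a 1D signal in plain Python. This is an illustrative sketch, not InterDomain's code: the README does not say which signal peaks are detected on or how the threshold is applied internally, and `prominence` and `prominent_peaks` are hypothetical names.

```python
from typing import List, Sequence


def prominence(x: Sequence[float], i: int) -> float:
    """Topographic prominence of the peak at index i.

    Walk left and right from the peak until reaching the edge or a strictly
    higher sample, tracking the lowest value on each side. The higher of the
    two minima is the reference level; prominence is the peak's height above it.
    """
    peak = x[i]
    left_min = peak
    j = i - 1
    while j >= 0 and x[j] <= peak:
        left_min = min(left_min, x[j])
        j -= 1
    right_min = peak
    j = i + 1
    while j < len(x) and x[j] <= peak:
        right_min = min(right_min, x[j])
        j += 1
    return peak - max(left_min, right_min)


def prominent_peaks(x: Sequence[float], min_prominence: float) -> List[int]:
    """Indices of strict local maxima whose prominence is >= min_prominence."""
    peaks = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1]]
    return [i for i in peaks if prominence(x, i) >= min_prominence]


signal = [0, 5, 1, 8, 2, 3, 0]
print(prominent_peaks(signal, 4))  # -> [1, 3]
print(prominent_peaks(signal, 1))  # -> [1, 3, 5]
```

The small bump at index 5 rises only 1 above its surroundings, so a threshold of 4 drops it while a threshold of 1 keeps it, mirroring how lower --prominence values call more metadomains.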
The following parameters can also be adjusted, but most datasets should not need it:
- --filter_n: Size of the outside filter (default: 15). This should already be a good outside filter for most datasets.
- --useSigma: Use sigma for smoothing (intra default: False; inter default: True; recommended for sparser matrices and interchromosomal matrices).
- --cutoff: intra only; determines the minimum size of a metadomain (default: 2,000,000). This should already be a good cutoff for most datasets.
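Assuming --cutoff is measured in base pairs (the README does not state the unit), converting it to bins shows what it means at a given resolution. `min_metadomain_bins` is a hypothetical helper.

```python
import math


def min_metadomain_bins(cutoff_bp: int = 2_000_000, resolution_bp: int = 50_000) -> int:
    """Number of bins spanned by --cutoff at a given bin size (assumes base pairs)."""
    return math.ceil(cutoff_bp / resolution_bp)


for res in (10_000, 50_000, 100_000):
    print(res, min_metadomain_bins(resolution_bp=res))
# 10000 200
# 50000 40
# 100000 20
```

At the tutorial's 50 kb resolution, the default cutoff would correspond to 40 bins under this assumption.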
Results are saved in the output directory (--output_dir; default: 'bedfile_output/') as .tsv files containing the metadomain coordinates and other relevant information. Intra- and inter-chromosomal results are saved in separate subdirectories (/intra/ and /inter/).
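For downstream analysis you may want all results from one run in memory. A minimal sketch using only the standard library, assuming each .tsv has a header row; the column names are whatever InterDomain writes, so none are hard-coded, and `load_results` is a hypothetical helper. With pandas (already a dependency), `pandas.read_csv(path, sep="\t")` does the same per file.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_results(output_dir: str = "bedfile_output/", kind: str = "intra") -> List[Dict[str, str]]:
    """Read every .tsv under <output_dir>/<kind>/ into a list of row dicts."""
    rows = []
    for tsv in sorted(Path(output_dir, kind).glob("*.tsv")):
        with tsv.open(newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                row["source_file"] = tsv.name  # remember which file each row came from
                rows.append(row)
    return rows


# Example: rows = load_results("bedfile_output/", "inter")
```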
Setting --save_intermediates also saves intermediate results, which makes it easier to visualize the intermediate steps of the metadomain calling process.
Intermediate files are as follows:
- Observed and expected matrices
- Balanced matrices
- Smoothed matrices
- Peak detection results
- -LogP values
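The --pco defaults (5 and 20) and the -LogP intermediate suggest the cutoff is applied to -log10(p) rather than to the raw p-value. That reading, including the base-10 logarithm, is an assumption; under it, the conversion works like this (all function names are hypothetical):

```python
import math


def neg_log10(p_value: float) -> float:
    """-log10(p); a p-value that underflowed to 0 maps to +inf."""
    return math.inf if p_value <= 0 else -math.log10(p_value)


def passes_pco(p_value: float, pco: float) -> bool:
    """True if p is at least as significant as the cutoff (assumed -log10 scale)."""
    return neg_log10(p_value) >= pco


def pco_to_p(pco: float) -> float:
    """Raw p-value threshold implied by a -log10 cutoff."""
    return 10.0 ** -pco


print(passes_pco(1e-6, 5), passes_pco(1e-4, 5))  # True False
```

Under this reading, the inter-chromosomal default of 5 corresponds to p <= 1e-5 and the intra-chromosomal default of 20 to p <= 1e-20, and lowering --pco relaxes the threshold, consistent with lower values calling more metadomains.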
After metadomain calling finishes, you can visualize the most significant metadomains with the interdomain_plot command. This command requires:
- --top_n: the number of metadomains to plot
- --output_dir: the output directory you used when calling metadomains
- --type: whether to plot intra-chromosomal (intra) or inter-chromosomal (inter) metadomains
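interdomain_plot picks the top metadomains itself. If you want to preview a top-N list from the .tsv rows before plotting, the sketch below ranks rows by a numeric column. The README does not name the output columns, so the ranking column is a required argument, and `top_n` is a hypothetical helper.

```python
import heapq
from typing import Dict, List


def top_n(rows: List[Dict[str, str]], n: int, score_column: str) -> List[Dict[str, str]]:
    """Return the n rows with the largest numeric value in score_column."""
    return heapq.nlargest(n, rows, key=lambda row: float(row[score_column]))


rows = [{"id": "a", "score": "3.5"}, {"id": "b", "score": "12"}, {"id": "c", "score": "7"}]
print([r["id"] for r in top_n(rows, 2, "score")])  # ['b', 'c']
```

Calling `top_n(rows, 20, "<your score column>")` mirrors --top_n 20, assuming the chosen column increases with significance.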