pritykinlab/InterDomain

InterDomain

InterDomain is a Python package for detecting metadomains in Hi-C contact matrices. It supports both intra-chromosomal and inter-chromosomal metadomain calling, allowing researchers to identify significant genomic interactions within and between chromosomes.


Installation

Requirements

  • Python: Version 3.6 or higher
  • Dependencies: The following Python packages are required:
    • numpy
    • pandas
    • cooler
    • scipy
    • matplotlib

Installation Steps

Clone the repository and install the package:

git clone https://github.com/pritykinlab/InterDomain.git
cd InterDomain
pip install .

Tutorial

This short tutorial walks through calling metadomains on a Treg dataset and plotting the top results.

  1. Download the Treg .mcool file.
  2. Run intra-chromosomal metadomain calling:
call_metadomains_intra path/to/Treg_all.mcool::/resolutions/50000 \
  --label Treg \
  --n_workers 8 \
  --save_intermediates
  3. Run inter-chromosomal metadomain calling:
call_metadomains_inter path/to/Treg_all.mcool::/resolutions/50000 \
  --label Treg \
  --n_workers 8 \
  --save_intermediates
  4. Plot the top metadomains. Set --type to intra or inter depending on which results you want to visualize:
interdomain_plot \
  --top_n 20 \
  --output_dir bedfile_output/ \
  --type inter

Notes:

  • Adjust --n_workers to match your available CPU cores.
  • If you use a single-resolution .cool file instead of an .mcool file, pass its path directly (without the ::/resolutions/... suffix).
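
The path convention in the notes above can be captured in a small helper. This is only a sketch: `cooler_uri` is a hypothetical name and not part of InterDomain, and it assumes the file type can be told apart by its extension.

```python
from pathlib import Path
from typing import Optional


def cooler_uri(path: str, resolution: Optional[int] = None) -> str:
    """Build the path argument for call_metadomains_intra / call_metadomains_inter.

    Hypothetical helper: .mcool files get a ::/resolutions/<bin size> suffix,
    while single-resolution .cool files are passed through unchanged.
    """
    if Path(path).suffix == ".mcool":
        if resolution is None:
            raise ValueError("an .mcool file needs a resolution, e.g. 50000")
        return f"{path}::/resolutions/{resolution}"
    return path


print(cooler_uri("path/to/Treg_all.mcool", 50000))
print(cooler_uri("path/to/your_file.cool"))
```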

Usage

Intra- and Inter-chromosomal Metadomain Calling

  • To perform intra-chromosomal metadomain calling, use the call_metadomains_intra command:
    • call_metadomains_intra path/to/your_file.cool
  • To perform inter-chromosomal metadomain calling, use the call_metadomains_inter command:
    • call_metadomains_inter path/to/your_file.cool

The two commands share most of their arguments and differ mainly in their default hyperparameter values.

The following arguments control how the program is run:

  • --n_workers: Number of worker processes to use (default: 1; max: number of chromosomes).
  • --label: Label for the output files (default: 'test').
  • --save_intermediates: Save intermediate matrices for plotting and inspection of results (default: False). Note that these files can take up a lot of disk space.
  • --output_dir: Directory to save output files (default: 'bedfile_output/').
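
Since --n_workers is capped at the number of chromosomes, a reasonable value is the smaller of the CPU count and the chromosome count. The sketch below illustrates this; `pick_n_workers` and `count_chromosomes` are hypothetical helpers, not InterDomain functions. Reading chromosome names uses the `chromnames` attribute of the cooler library's `Cooler` object.

```python
import os


def pick_n_workers(n_chromosomes, available_cpus=None):
    """Return a worker count no larger than the CPU count or chromosome count."""
    cpus = available_cpus or os.cpu_count() or 1
    return max(1, min(cpus, n_chromosomes))


def count_chromosomes(uri):
    """Count chromosomes in a cooler file (needs the cooler package)."""
    import cooler  # imported lazily so the helper above works without it

    return len(cooler.Cooler(uri).chromnames)


# With 8 cores and a 20-chromosome genome, use 8 workers.
print(pick_n_workers(n_chromosomes=20, available_cpus=8))
```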

The main hyperparameters, which control how metadomains are called:

  • --filter_width: Size of the inside filter (intra default: 1; inter default: 3). Increasing this detects larger metadomains and more significant interactions; larger values are recommended for sparser datasets.
  • --sigma: Sigma value for smoothing, used only if --useSigma is set (default: 2). Smoothing is recommended for inter-chromosomal matrices; increase sigma if the smoothed obs/exp contact map does not look smooth, especially on sparser datasets.
  • --prominence: Prominence threshold for peak detection (default: 4). Lower values detect more metadomains; higher values make calling more stringent.
  • --pco: P-value cutoff for calling metadomains (inter default: 5; intra default: 20). Lower values detect more metadomains. The defaults should give a reasonable balance between sensitivity and specificity.
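
To build intuition for the --prominence threshold, here is a toy one-dimensional illustration of peak prominence in pure Python. It is not InterDomain's implementation (the package works on contact matrices, and its internals are not described here). `peak_prominences` and `call_peaks` are hypothetical names, and the signal is made up. It only shows why a lower threshold keeps more peaks.

```python
def peak_prominences(values):
    """Map each strict local maximum to its prominence.

    Prominence is the peak height minus the higher of the two lowest points
    reached when walking left and right until a taller value or the edge.
    """
    result = {}
    n = len(values)
    for i in range(1, n - 1):
        if not (values[i] > values[i - 1] and values[i] > values[i + 1]):
            continue
        left_min = values[i]
        j = i - 1
        while j >= 0 and values[j] <= values[i]:
            left_min = min(left_min, values[j])
            j -= 1
        right_min = values[i]
        j = i + 1
        while j < n and values[j] <= values[i]:
            right_min = min(right_min, values[j])
            j += 1
        result[i] = values[i] - max(left_min, right_min)
    return result


def call_peaks(values, prominence):
    """Return indices of peaks whose prominence meets the threshold."""
    return [i for i, p in peak_prominences(values).items() if p >= prominence]


signal = [0, 1, 0, 6, 2, 5, 0, 9, 0]  # made-up toy signal
print(call_peaks(signal, prominence=4))  # [3, 7]
print(call_peaks(signal, prominence=2))  # [3, 5, 7]
```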

The following parameters can also be changed, but should not need to be for most datasets:

  • --filter_n: Size of the outside filter (default: 15). The default should already work well for most datasets.
  • --useSigma: Apply sigma smoothing (intra default: False; inter default: True). Recommended for sparser matrices and inter-chromosomal matrices.
  • --cutoff: Intra only. Determines the minimum size of a metadomain (default: 2,000,000). The default should already work well for most datasets.
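
The intra and inter defaults listed above can be kept side by side in a small table, which makes it easy to pass only the values you change. The sketch below transcribes those defaults. `DEFAULTS` and `build_command` are hypothetical and not part of InterDomain. It assumes valued options are passed as `--name value` and that --useSigma is a plain switch, which this README does not confirm.

```python
# Defaults as documented in this README (not read from the package itself).
DEFAULTS = {
    "intra": {"filter_width": 1, "sigma": 2, "prominence": 4, "pco": 20,
              "filter_n": 15, "useSigma": False, "cutoff": 2_000_000},
    "inter": {"filter_width": 3, "sigma": 2, "prominence": 4, "pco": 5,
              "filter_n": 15, "useSigma": True},
}


def build_command(mode, uri, **overrides):
    """Return an argv list containing only options that differ from the defaults."""
    defaults = DEFAULTS[mode]
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise ValueError(f"not a documented {mode} option: {sorted(unknown)}")
    argv = [f"call_metadomains_{mode}", uri]
    for name, value in overrides.items():
        if value == defaults[name]:
            continue  # same as the documented default, nothing to pass
        if isinstance(value, bool):
            if not value:
                # How to turn off a default-on switch is not documented here.
                raise ValueError(f"cannot express --{name} off as a switch")
            argv.append(f"--{name}")
        else:
            argv += [f"--{name}", str(value)]
    return argv


print(build_command("intra", "path/to/your_file.cool", useSigma=True, sigma=3))
```

The resulting list can be handed to `subprocess.run(..., check=True)` once the package is installed.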

Outputs

Results will be saved in the specified directory (--output_dir; default: 'bedfile_output/'). The output files will contain the metadomain coordinates and other relevant information, and are saved as .tsv files. Intra and inter-chromosomal results will be saved in separate directories (/intra/ and /inter/).

Intermediate results can be saved by passing --save_intermediates, which allows better visualization of the intermediate steps in the metadomain calling process.

The intermediate files include:

  • Observed and expected matrices
  • Balanced matrices
  • Smoothed matrices
  • Peak detection results
  • -LogP values
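
The .tsv result files described above can be loaded with the standard library. This is a minimal sketch: `load_results` is a hypothetical helper, and it assumes each file starts with a header row. The actual file and column names are not listed in this README, so none are hard-coded.

```python
import csv
from pathlib import Path


def load_results(output_dir, kind):
    """Read every .tsv under <output_dir>/<kind>/ into a list of row dicts."""
    if kind not in ("intra", "inter"):
        raise ValueError("kind must be 'intra' or 'inter'")
    directory = Path(output_dir, kind)
    if not directory.is_dir():
        return {}
    tables = {}
    for tsv in sorted(directory.glob("*.tsv")):
        with tsv.open(newline="") as handle:
            # Assumes the first line holds the column names.
            tables[tsv.name] = list(csv.DictReader(handle, delimiter="\t"))
    return tables


for name, rows in load_results("bedfile_output/", "inter").items():
    print(name, len(rows), "rows")
```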

Plotting Metadomains

After running the metadomain calling, you can visualize the most significant metadomains using the interdomain_plot command, which takes the following arguments:

  • --top_n: the number of metadomains to plot
  • --output_dir: the output directory used when calling metadomains on your dataset
  • --type: whether to plot intra-chromosomal (intra) or inter-chromosomal (inter) metadomains
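
To plot both result types in one go, the command can be driven from Python. In this sketch `plot_command` is a hypothetical wrapper that uses only the three arguments listed above. The command runs only if interdomain_plot is found on the PATH.

```python
import shutil
import subprocess


def plot_command(kind, top_n=20, output_dir="bedfile_output/"):
    """Return the interdomain_plot argv for one result type."""
    if kind not in ("intra", "inter"):
        raise ValueError("kind must be 'intra' or 'inter'")
    if top_n < 1:
        raise ValueError("top_n must be at least 1")
    return ["interdomain_plot", "--top_n", str(top_n),
            "--output_dir", output_dir, "--type", kind]


for kind in ("intra", "inter"):
    cmd = plot_command(kind)
    if shutil.which(cmd[0]) is None:
        # Package not installed in this environment: just show the command.
        print("would run:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
```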
