This issue briefly describes the current state of dot-calling in cooltools.
Intro
We have been trying to re-implement the HiCCUPS dot-calling algorithm under the cooltools umbrella for some time now. It is still under active development, and right now the code is scattered across forks and branches.
master
The initial progress that we made with dot-calling, namely the convolution-based calculation of the locally adjusted expected (donut, lowleft, vertical, horizontal), is reflected in the master branch of this repository. The post-processing steps in master are closest to the so-called BH-FDR version of dot-calling in the original HiCCUPS paper (Rao et al. 2014), in the sense that we do not do lambda-chunking to perform multiple hypothesis testing. Moreover, this implementation simply ends with a dump of the pre-calculated adjusted expected for the different kernels (donut, lowleft, vertical, horizontal), and the post-processing is in bad shape. Thus it isn't very usable for now, at least not for the final dot-calling.
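To make the "locally adjusted expected" idea concrete, here is a minimal pure-Python sketch. It is not the cooltools code: the function names, the 0/1 kernel shapes, and the row/column orientation are assumptions made for illustration. For each kernel, the expected value at a pixel is rescaled by the ratio of observed to expected signal summed over the kernel footprint.

```python
def make_kernel(w, p, ktype="donut"):
    """Build a (2w+1) x (2w+1) 0/1 kernel as a list of lists.

    Assumed conventions (illustrative, not taken from cooltools):
    y is the row offset and x the column offset from the central pixel,
    and the central (2p+1) x (2p+1) square is always excluded.
    """
    kernel = []
    for y in range(-w, w + 1):
        row = []
        for x in range(-w, w + 1):
            inner = abs(x) <= p and abs(y) <= p
            if ktype == "donut":
                keep = not inner and x != 0 and y != 0  # also drop the central cross
            elif ktype == "horizontal":
                keep = not inner and abs(y) <= 1        # 3-pixel-wide horizontal strip
            elif ktype == "vertical":
                keep = not inner and abs(x) <= 1        # 3-pixel-wide vertical strip
            elif ktype == "lowleft":
                keep = not inner and x < 0 and y > 0    # quadrant facing the diagonal
            else:
                raise ValueError(f"unknown kernel type: {ktype}")
            row.append(1 if keep else 0)
        kernel.append(row)
    return kernel


def kernel_sum(matrix, kernel, i, j):
    """Sum matrix values under the kernel footprint centred at (i, j).

    The caller must keep (i, j) at least w pixels away from the matrix edge.
    """
    w = len(kernel) // 2
    total = 0.0
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            total += kernel[dy + w][dx + w] * matrix[i + dy][j + dx]
    return total


def locally_adjusted_expected(obs, exp, kernel, i, j):
    """Rescale exp[i][j] by the local observed/expected ratio in the footprint."""
    return kernel_sum(obs, kernel, i, j) / kernel_sum(exp, kernel, i, j) * exp[i][j]
```

Applying `kernel_sum` to every pixel at once is what a 2D convolution of the observed and expected matrices with the kernel computes; the explicit loops above only spell out the arithmetic for a single pixel.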
new-dotfinder
The lambda-chunking procedure is implemented in the new-dotfinder branch of the dekkerlab fork of cooltools, which is pip-installable:
pip install git+https://github.com/dekkerlab/cooltools@new-dotfinder
pip install -e git+https://github.com/dekkerlab/cooltools@new-dotfinder#egg=cooltools
The second command allows you to modify the source code, whereas the first one simply installs the package. You would want to modify the source code if, for instance, the enrichment thresholds need to be changed, as those are not implemented as CLI options just yet. A typical run would be:
cooltools call_dots -n {cores} \
    -o {signif_dots} -v --fdr 0.1 \
    --max-nans-tolerated 7 \
    --max-loci-separation 20000000 \
    --dots-clustering-radius 21000 \
    --tile-size 10000000 \
    {input_cooler} {input_expected}
This produces a list of dots that passed multiple hypothesis testing (the lambda-chunking step itself) but have not been post-processed, i.e. clustered or filtered by enrichment. The post-processed list of dots shows up in the same folder as signif_dots, but with the prefix final_ (we'll fix this ugliness later on, of course). The call_dots CLI determines the resolution and picks the correct kernel parameters w and p accordingly (all defaults are kept the same as in HiCCUPS).
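For readers unfamiliar with lambda-chunking, the following is a rough pure-Python sketch of the idea, not the code from this branch. Pixels are grouped by their locally adjusted expected into geometrically spaced bins, and a Benjamini-Hochberg test is run separately within each bin. The bin spacing of 2^(1/3), the use of the upper bin edge as the Poisson rate, and all function names are assumptions made for illustration.

```python
import math
from bisect import bisect_left
from collections import defaultdict

BASE = 2 ** (1 / 3)  # assumed geometric spacing of the lambda-chunk edges


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), by direct summation.

    Good enough for a sketch; very small tail probabilities lose precision.
    """
    term = math.exp(-lam)
    cdf = 0.0
    for x in range(k):
        cdf += term
        term *= lam / (x + 1)
    return max(0.0, 1.0 - cdf)


def bh_reject(pvals, fdr):
    """Benjamini-Hochberg: return a list of booleans, True where H0 is rejected."""
    m = len(pvals)
    order = sorted(range(m), key=lambda idx: pvals[idx])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= fdr * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


def call_significant(pixels, fdr=0.1, n_chunks=40):
    """pixels: list of (observed_count, locally_adjusted_expected) pairs."""
    edges = [BASE ** k for k in range(n_chunks)]  # upper edge of each chunk
    chunks = defaultdict(list)
    for pos, (_, la_exp) in enumerate(pixels):
        # Values above the last edge are clamped into the last chunk;
        # choose n_chunks large enough that this does not happen in practice.
        k = min(bisect_left(edges, la_exp), n_chunks - 1)
        chunks[k].append(pos)

    significant = [False] * len(pixels)
    for k, members in chunks.items():
        # Conservative choice: test every pixel against the chunk's upper edge.
        pvals = [poisson_sf(pixels[pos][0], edges[k]) for pos in members]
        for pos, rejected in zip(members, bh_reject(pvals, fdr)):
            significant[pos] = rejected
    return significant
```

Because each chunk is tested on its own, the many low-expected pixels do not dilute the significance threshold applied to pixels with a high expected value, which is the main difference from a single BH-FDR pass over all pixels.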
This dot-caller, albeit very close to the HiCCUPS implementation, deviates from it in some regards. Some of the most important differences:
- fixed kernel size for every pixel (in HiCCUPS, "donuts" are shrunk near the diagonal and enlarged as needed, based on the value of lowleft)
- clustering is slightly different: we use the off-the-shelf Birch algorithm, whereas HiCCUPS implements a special greedy algorithm for that; the results are very close anyway.
- a really minor detail in the way we treat pixels near bad rows/columns (ones filled with NaNs after balancing): HiCCUPS disregards pixels that are within ~5 pixels of bad rows/columns; instead, we simply check the number of NaNs in a kernel footprint against --max-nans-tolerated. Given the resolution and the size of the kernels, one can work out which pixels would be discarded.
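The NaN check from the last item can be sketched as follows. This is an illustration only: the function names are hypothetical, and it assumes the footprint is the full (2w+1) x (2w+1) square around the pixel, which may not match how the branch actually counts NaNs.

```python
import math


def count_nans_in_footprint(matrix, i, j, w):
    """Count NaNs in the (2w+1) x (2w+1) square centred at (i, j).

    The caller must keep (i, j) at least w pixels away from the matrix edge.
    """
    count = 0
    for r in range(i - w, i + w + 1):
        for c in range(j - w, j + w + 1):
            if math.isnan(matrix[r][c]):
                count += 1
    return count


def passes_nan_filter(matrix, i, j, w, max_nans_tolerated):
    """Keep the pixel only if its footprint holds at most max_nans_tolerated NaNs."""
    return count_nans_in_footprint(matrix, i, j, w) <= max_nans_tolerated
```

Under this assumption, with w = 5 a single bad row crossing the footprint contributes 11 NaNs, so a threshold of 7 discards every pixel within 5 pixels of that row; with a narrower kernel the same threshold is more permissive.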
... I might add more details here later on, and edit/elaborate more on this here ...
shrink-donut-dotfinder
This branch builds on top of new-dotfinder by dealing with discrepancy #1: dynamic kernel resizing. As the name suggests, we implemented only the near-diagonal kernel shrinkage here. It is arguably the most important aspect of the dynamic kernels: its absence was preventing us from calling dots really close to the diagonal and was driving the deviation between cooltools dot-calling and HiCCUPS. There are no additional parameters for the user to control in this case; everything is done the same way as in HiCCUPS. This kernel shrinkage is data-independent, as opposed to the kernel enlargement, which is based on the value of the lowleft kernel for each pixel tested.
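One way to picture data-independent shrinkage is sketched below. The rule is an assumption derived from geometry, not a transcription of HiCCUPS or of this branch: the half-width w is reduced until the kernel footprint no longer touches the main diagonal, but is never allowed to drop below p + 1, so the kernel does not become empty. The function name is hypothetical.

```python
def shrunk_half_width(i, j, w, p):
    """Return an effective kernel half-width for pixel (i, j).

    Assumed rule (illustrative only): the lower-left corner of the footprint,
    (i + w, j - w), lies d - 2*w bins above the diagonal, where d = |j - i|.
    Requiring d - 2*w >= 1 gives w <= (d - 1) // 2. The result is floored at
    p + 1 so that the kernel stays larger than the excluded inner square.
    """
    d = abs(j - i)
    w_fit = (d - 1) // 2
    return max(min(w, w_fit), p + 1)


# With w=5, p=2: far from the diagonal the kernel keeps its full size,
# and it shrinks step by step as the pixel approaches the diagonal.
for d in (20, 9, 7, 4):
    print(d, shrunk_half_width(100, 100 + d, w=5, p=2))
```

The rule depends only on the pixel coordinates and on w and p, which is why no extra user-facing parameter is needed for it.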
This branch also eliminates another difference between HiCCUPS and cooltools, which is related to the way lambda-chunking is implemented and is too technical to describe here. To give some numbers: for the Rao et al. 2014 GM... primary dataset we are getting ~7700 dots vs ~8050 by HiCCUPS, with an overlap of ~7600.