
pairsamtools subsampling [new tool, enhancement] #66

@sergpolly

I think we would benefit from a simple pairsamtools subsample tool (or a subsampling option for pairsamtools select).

The rationale is to enable some "rigorous" statistics for certain analyses: significance estimation, bootstrapping, and permutation testing. For example, to measure a "subtle" difference in compartment strength between two experiments with 10 mln and 12 mln pairs, one could subsample both down to 5 mln pairs several times, compute the compartment strength for each subsample, and compare the resulting distributions. Another example is subsampling and mixing mitotic and G1 pairs to check whether some experimental effects could be explained by such a simple mixture.
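A minimal sketch of the repeated-downsampling procedure described above, assuming all pairs fit in memory as a Python list. The names subsample_replicates and cis_fraction are hypothetical (not pairsamtools API), and the cis fraction is only a placeholder for a real compartment-strength score.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def subsample_replicates(
    pairs: Sequence[T],
    depth: int,
    n_replicates: int,
    statistic: Callable[[List[T]], float],
    seed: int = 0,
) -> List[float]:
    """Draw n_replicates subsamples of exactly `depth` pairs (without
    replacement) and return `statistic` evaluated on each subsample."""
    if not 0 < depth <= len(pairs):
        raise ValueError("depth must be between 1 and the number of pairs")
    rng = random.Random(seed)  # seeded so replicates are reproducible
    return [statistic(rng.sample(pairs, depth)) for _ in range(n_replicates)]


# Hypothetical stand-in statistic: fraction of cis pairs. In practice this
# would be e.g. a compartment-strength score computed from a binned map.
def cis_fraction(sample: List[Tuple[str, str]]) -> float:
    return sum(c1 == c2 for c1, c2 in sample) / len(sample)


# Toy data: two "experiments" of different depth, downsampled to a common depth.
exp_a = [("chr1", "chr1")] * 700 + [("chr1", "chr2")] * 300
exp_b = [("chr1", "chr1")] * 800 + [("chr1", "chr2")] * 400
dist_a = subsample_replicates(exp_a, 500, 20, cis_fraction, seed=1)
dist_b = subsample_replicates(exp_b, 500, 20, cis_fraction, seed=2)
```

The two lists dist_a and dist_b are the per-replicate distributions to compare. Holding tens of millions of pairs in a Python list is only viable for toy inputs, which is part of why the streaming questions below matter.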

Technical notes/questions:

  • Is the only way to subsample in a single (streaming-like) pass to know the total number of pairs (per chromosome, etc.) a priori?
  • There might be a need for more sophisticated sampling schemes: distance-dependent weights, chromosome-dependent weights, cis/trans, etc. (without duplicating what is already available in select).
  • Is there any other way to do a streaming-like subsample? Do we need to care about it being streaming at all?
  • Would a pairix index help speed up subsampling? Should we rely on it?
  • Does subsample fit into select, or does it deserve to be a separate tool?
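On the first question: a one-pass subsample does not strictly require knowing the total count in advance. The sketch below shows two standard streaming approaches, assuming one pair per line and header lines starting with "#"; the function names are hypothetical and this is not pairsamtools code. Bernoulli sampling keeps each pair with a fixed probability (constant memory, but the output size is random), while reservoir sampling ("Algorithm R") returns exactly k pairs in one pass at the cost of holding k pairs in memory. Knowing the total N up front would additionally allow exact-size sampling in constant memory (selection sampling).

```python
import random
from typing import Iterable, Iterator, List, Tuple


def bernoulli_subsample(
    lines: Iterable[str], fraction: float, seed: int = 0
) -> Iterator[str]:
    """Keep each pair independently with probability `fraction`.

    Fully streaming with O(1) memory, but the output size is random
    (Binomial(N, fraction)) rather than an exact number of pairs.
    Header lines (assumed to start with '#') are always passed through.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)
    for line in lines:
        if line.startswith("#") or rng.random() < fraction:
            yield line


def reservoir_subsample(lines: Iterable[str], k: int, seed: int = 0) -> List[str]:
    """Draw exactly k pairs uniformly at random (or all pairs, if there are
    fewer than k) in a single pass, without knowing the total count in
    advance (reservoir sampling, "Algorithm R"). Holds k pairs in memory.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    header: List[str] = []
    reservoir: List[Tuple[int, str]] = []  # (input position, line)
    n_seen = 0
    for line in lines:
        if line.startswith("#"):
            header.append(line)
            continue
        n_seen += 1
        if len(reservoir) < k:
            reservoir.append((n_seen, line))
        else:
            # Keep the new pair with probability k / n_seen, evicting a random slot.
            j = rng.randrange(n_seen)
            if j < k:
                reservoir[j] = (n_seen, line)
    # Restore input order so that a sorted input stays sorted.
    reservoir.sort()
    return header + [line for _, line in reservoir]
```

Either function can consume an open file object line by line. The weighted schemes from the second bullet (distance-, chromosome- or cis/trans-dependent) would roughly amount to replacing the constant fraction in the Bernoulli variant with a per-pair keep probability.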
