Hi, thanks for ChromatinHD, we’ve found it very useful for our analyses.
We think we found an issue in from_multiple_fragments_tsv. When loading 12 samples, some ended up with ~50x fewer fragments per cell than expected, even though the raw files are all similar in size.
We traced it to line 294 of fragments.py, where cell_to_cell_ix_batches is built using obs[batch_column].unique(). Since unique() returns values in order of first appearance (not sorted), and this list gets zipped with the file-ordered fragments_tabix_batches, the barcode lookups can get paired with the wrong files when the obs row order doesn't match batch number order (e.g. after integration).
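A minimal sketch of the ordering behaviour on toy data. The `batch` column name, the barcodes, and the file names are made up for illustration and are not taken from our dataset or from fragments.py:

```python
import pandas as pd

# Toy obs table whose rows are NOT ordered by batch (e.g. after integration).
obs = pd.DataFrame(
    {"batch": [2, 0, 1, 0, 2, 1]},
    index=["c1", "c2", "c3", "c4", "c5", "c6"],
)

# Hypothetical file list, ordered by batch number (file i belongs to batch i).
fragments_files = ["sample0.tsv.gz", "sample1.tsv.gz", "sample2.tsv.gz"]

# unique() keeps order of first appearance, so this is [2, 0, 1], not [0, 1, 2].
batches_seen = [int(b) for b in obs["batch"].unique()]

# Zipping with the file-ordered list pairs batch 2 with sample0, and so on.
pairing = list(zip(batches_seen, fragments_files))

# Collect every pair where the batch is matched to a file that is not its own.
mismatched = [(b, f) for b, f in pairing if f != fragments_files[b]]
print(pairing)
print(mismatched)
```

In this toy case every batch ends up paired with the wrong file, so only barcodes that happen to occur in both samples would be matched, which would be consistent with the low fragment counts we saw.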
Would replacing obs[batch_column].unique() with range(len(fragments_files)) fix this? Happy to provide more details if helpful.
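To make the suggestion concrete, here is a rough sketch of the idea on the same kind of toy data. It does not reproduce the actual code in fragments.py: the `batch` and `ix` columns, the barcodes, and the file names are hypothetical, and it assumes the batch column holds integer indices 0..n-1 in the same order as the files:

```python
import pandas as pd

# Toy obs table with rows in a different order than the batch numbers.
obs = pd.DataFrame(
    {"batch": [2, 0, 1, 0, 2, 1]},
    index=["c1", "c2", "c3", "c4", "c5", "c6"],
)
obs["ix"] = range(len(obs))  # hypothetical cell index column

# Hypothetical file list, ordered by batch number.
fragments_files = ["sample0.tsv.gz", "sample1.tsv.gz", "sample2.tsv.gz"]

# Build one barcode -> cell index lookup per file, iterating in file order
# instead of in the order returned by unique().
cell_to_cell_ix_batches = [
    obs.loc[obs["batch"] == batch, "ix"].to_dict()
    for batch in range(len(fragments_files))
]

# Each lookup now lines up with the file at the same position.
for fragments_file, lookup in zip(fragments_files, cell_to_cell_ix_batches):
    print(fragments_file, lookup)
```

If the batch column could hold labels rather than 0-based indices, an explicit label-to-file mapping would probably be needed instead of `range`.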