Possible batch ordering issue in from_multiple_fragments_tsv #16

Hi, thanks for ChromatinHD, we’ve found it very useful for our analyses.

We think we found an issue in `from_multiple_fragments_tsv`. When loading 12 samples, some ended up with ~50x fewer fragments per cell than expected, even though the raw files are all similar in size.

We traced it to line 294 of `fragments.py`, where `cell_to_cell_ix_batches` is built using `obs[batch_column].unique()`. `unique()` returns values in order of first appearance (not sorted), and the resulting list gets zipped with the file-ordered `fragments_tabix_batches`. When the obs row order doesn't match batch number order (e.g. after integration), the barcode lookups can therefore get paired with the wrong files.
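Here is a minimal sketch of the mismatch, independent of ChromatinHD. The `obs` table, the barcodes, and the file names are made up for illustration, and we assume the batch column holds integer batch numbers that index into the file list:

```python
import pandas as pd

# Hypothetical obs table whose rows are NOT in batch-number order
# (e.g. after integration). Barcodes and batch values are made up.
obs = pd.DataFrame(
    {"batch": [2, 0, 1, 2, 0]},
    index=["AAAC-1", "CCGT-1", "GGTA-1", "TTAG-1", "ACGT-1"],
)

# Hypothetical file list, ordered by batch number.
fragments_files = ["batch0.tsv.gz", "batch1.tsv.gz", "batch2.tsv.gz"]

# unique() keeps order of first appearance; it does not sort.
batches_seen = [int(b) for b in obs["batch"].unique()]

# Zipping the first-appearance order with the file order mispairs them.
pairing = list(zip(batches_seen, fragments_files))
mismatched = [(b, f) for b, f in pairing if f != fragments_files[b]]

print(batches_seen)
print(mismatched)
```

In this toy example every batch ends up paired with a file from a different batch, so most barcodes would not be found in the file they are looked up in. That would be consistent with the low fragment counts we see.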

Would replacing `obs[batch_column].unique()` with `range(len(fragments_files))` fix this? Happy to provide more details if helpful.
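To make the suggestion concrete, here is a rough sketch of building the per-batch lookups in file order. `barcode_lookups` is a hypothetical helper written for this issue, not ChromatinHD's actual code, and it assumes the batch column contains integers `0..n-1` that match the positions in `fragments_files`:

```python
import pandas as pd

# Same made-up data as a toy example: obs rows are not in batch-number order.
obs = pd.DataFrame(
    {"batch": [2, 0, 1, 2, 0]},
    index=["AAAC-1", "CCGT-1", "GGTA-1", "TTAG-1", "ACGT-1"],
)
fragments_files = ["batch0.tsv.gz", "batch1.tsv.gz", "batch2.tsv.gz"]


def barcode_lookups(obs, batch_column, batch_order):
    """Hypothetical helper: one {barcode: cell index} dict per batch,
    returned in the order given by batch_order."""
    cell_ix = pd.Series(range(len(obs)), index=obs.index)
    return [
        cell_ix[obs[batch_column] == batch].to_dict()
        for batch in batch_order
    ]


# Iterate over file positions instead of obs[batch_column].unique(),
# so lookups[i] always belongs to fragments_files[i].
lookups = barcode_lookups(obs, "batch", range(len(fragments_files)))

for path, lookup in zip(fragments_files, lookups):
    print(path, lookup)
```

This only works if the batch values really are the file indices. If they are strings or categoricals, an explicit batch-to-file mapping would probably be needed instead.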
