Possible batch ordering issue in from_multiple_fragments_tsv #16

@ymahmoud

Hi, thanks for ChromatinHD, we’ve found it very useful for our analyses.

We think we found an issue in from_multiple_fragments_tsv. When loading 12 samples, some ended up with ~50x fewer fragments per cell than expected, even though the raw files are all similar in size.

We traced it to line 294 of fragments.py, where cell_to_cell_ix_batches is built using obs[batch_column].unique(). unique() returns values in order of first appearance, not sorted, and the resulting list is zipped with fragments_tabix_batches, which follows file order. So when the obs row order doesn't match batch number order (e.g. after integration), the barcode lookups can get paired with the wrong files.
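To illustrate the ordering problem, here is a small self-contained sketch. It does not use ChromatinHD: the batch labels and file names are made up, and dict.fromkeys stands in for obs[batch_column].unique(), since both keep values in order of first appearance.

```python
# Hypothetical toy data: one batch label per cell, in obs row order.
# After integration, rows are no longer grouped by batch number.
obs_batches = [2, 0, 1, 0, 2, 1]

# Hypothetical file names, listed in batch order (file i belongs to batch i).
fragments_files = ["sample0.tsv.gz", "sample1.tsv.gz", "sample2.tsv.gz"]

# Stand-in for obs[batch_column].unique(): first-appearance order, not sorted.
unique_batches = list(dict.fromkeys(obs_batches))

# Zipping with the file-ordered list pairs each batch label with whatever
# file happens to sit at the same position.
pairing = list(zip(unique_batches, fragments_files))

# Keep the pairs where the batch label does not match the file position.
mismatched = [(batch, f) for i, (batch, f) in enumerate(pairing) if batch != i]

print(unique_batches)
print(mismatched)
```

With this row order, every batch label ends up next to a file from a different sample, so most barcodes would not be found in the file they are looked up in.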

Would replacing obs[batch_column].unique() with range(len(fragments_files)) fix this? Happy to provide more details if helpful.
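A minimal sketch of what we have in mind, again outside ChromatinHD. The function build_cell_lookups and its arguments are hypothetical, and it assumes the batch labels are integers 0..n-1 that match positions in fragments_files. If the labels were strings or arbitrary numbers, a different mapping would be needed.

```python
def build_cell_lookups(obs_rows, n_files):
    """Build one barcode -> cell index lookup per file, in file order.

    obs_rows: list of (barcode, batch_index) tuples in obs row order.
    n_files:  number of fragment files; batch_index is assumed to be the
              position of the cell's file in the file list.
    """
    lookups = []
    # Iterate over file positions instead of first-appearance batch order,
    # so lookups[i] always corresponds to file i.
    for batch in range(n_files):
        lookup = {
            barcode: cell_ix
            for cell_ix, (barcode, b) in enumerate(obs_rows)
            if b == batch
        }
        lookups.append(lookup)
    return lookups


# Hypothetical obs rows, shuffled so that batch 2 appears first.
obs_rows = [("AAA-1", 2), ("CCC-1", 0), ("GGG-1", 1), ("TTT-1", 0)]
lookups = build_cell_lookups(obs_rows, n_files=3)
print(lookups)
```

Here lookups[0] holds the batch 0 barcodes regardless of where those cells sit in obs, so zipping with the file-ordered list stays aligned.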
