pickles break while "multiprocess"-ing dots #50

@sergpolly

More of a docs note/reminder than an issue.

  1. While trying to split call-dots into modular steps, I returned to the scoring_step by @nvictus in call-dots.
    Testing on "big" data yielded a somewhat familiar multiprocessing/pickle error:
multiprocess.pool.MaybeEncodingError: Error sending result: '[ 
...
[dump of ~500-1,000 pd.DataFrame-s, ~500,000 rows by 30 columns each ]
...
Reason: 'error("'i' format requires -2147483648 <= number <= 2147483647",)'
  2. For the same input parameters it breaks here: https://github.com/mirnylab/cooltools/blob/441a84ab6c1efd3bcd29de6cfd6ee78551873478/cooltools/cli/call_dots.py#L302 , but not here: https://github.com/mirnylab/cooltools/blob/441a84ab6c1efd3bcd29de6cfd6ee78551873478/cooltools/cli/call_dots.py#L404 , as the objects returned there are slices of histograms and are far smaller than 500,000 x 30.
  3. @Phlya observed the same or a similar issue even for https://github.com/mirnylab/cooltools/blob/441a84ab6c1efd3bcd29de6cfd6ee78551873478/cooltools/cli/call_dots.py#L404 or https://github.com/mirnylab/cooltools/blob/441a84ab6c1efd3bcd29de6cfd6ee78551873478/cooltools/cli/call_dots.py#L482 while running a "modern" call-dots instance that didn't use @nvictus's scoring_step. I could not find a corresponding issue anywhere.
  4. Apparently pickle is calculating the total number of elements: it looks like it multiplies columns*rows by the number of dataframes, otherwise the math does not work out (>=2147483647). Is that indeed the case, @nvictus @mimakaev @golobor? What if it were a bunch of strings of total length >2 bln?! https://stackoverflow.com/questions/47776486/python-struct-error-i-format-requires-2147483648-number-2147483647 indeed says something about calculating elements in each object ...
  5. I'll work around this BS for testing purposes, but we should address it eventually: dask? Or at least multipro-something that uses dill, @nvictus?
  6. Also, it looks like this was finally fixed: bpo-17560: Too small type for struct.pack/unpack in mutliprocessing.Connection python/cpython#10305 ?! Can anyone more knowledgeable confirm, @nvictus @mimakaev @golobor?
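
On the question of what exactly is being counted: as far as I can tell, the limit applies to the byte length of the pickled result rather than to the number of elements. Before the bpo-17560 fix, the connection code packs the message length into a signed 32-bit header with `struct.pack("!i", n)`, which is where the `'i' format requires ...` text comes from. Below is a minimal sketch of that check plus one possible workaround (returning results in batches that each stay under the limit). The helper names `fits_in_int32_header`, `pickled_size` and `split_into_batches` are hypothetical and not part of cooltools or multiprocess, and summing per-item pickle sizes is only an approximation of the size of the pickled batch.

```python
import pickle
import struct

# Largest value a signed 32-bit "!i" length header can carry.
INT32_MAX = 2**31 - 1


def fits_in_int32_header(n_bytes):
    """Return True if n_bytes can be packed with struct's "!i" format."""
    try:
        struct.pack("!i", n_bytes)
        return True
    except struct.error:
        # Same error class that surfaces inside MaybeEncodingError.
        return False


def pickled_size(obj):
    """Byte length of obj once pickled, i.e. what a worker has to send back."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def split_into_batches(items, max_bytes=INT32_MAX):
    """Greedily group items so each batch's summed pickled size is <= max_bytes.

    Hypothetical helper: a single item larger than max_bytes still ends up
    alone in its own batch and would need a different strategy (e.g. writing
    it to disk and returning the path instead).
    """
    batches, current, current_size = [], [], 0
    for item in items:
        size = pickled_size(item)
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(item)
        current_size += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    print(fits_in_int32_header(INT32_MAX))      # at the limit
    print(fits_in_int32_header(INT32_MAX + 1))  # one byte over the limit
    # Toy payloads standing in for the large DataFrames; small limit for demo.
    chunks = [b"x" * 100 for _ in range(5)]
    for batch in split_into_batches(chunks, max_bytes=250):
        print(len(batch), sum(pickled_size(c) for c in batch))
```

In practice one would call `pickled_size` on a single representative result (one DataFrame) to estimate whether a worker's full return value gets anywhere near 2 GiB, and have each worker return several smaller results instead of one huge list.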
