It would be interesting to have imap_unordered work on partial results from a generator task function.
For instance:
import pandas as pd

def read_file_by_chunks(large_file):
    # max_chunk_size is assumed to be defined elsewhere
    for batch in pd.read_json(large_file, orient="records", lines=True, chunksize=max_chunk_size):
        # do some processing
        yield batch
It should then be possible to run a worker pool over a list of paths:
with WorkerPool(n_jobs=n_workers, start_method="spawn") as pool:
    yield from pool.imap_unordered(read_file_by_chunks, all_paths)
A side effect of this idea is that it would also require a bounded queue: imap would need to cap the number of partial task results it accumulates, and block the producers once that cap is reached.
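To make the bounded-queue behaviour concrete, here is a minimal sketch of the idea using only the standard library. It uses threads rather than processes and is not how the worker pool is implemented; the helper name `imap_unordered_generators` and the parameters `n_workers` and `max_pending` are hypothetical, chosen for illustration. Each worker iterates over the generator returned by the task function and pushes every partial result into a `queue.Queue` with a `maxsize`, so `put()` blocks the producer whenever the consumer falls behind.

```python
import queue
import threading

_DONE = object()  # sentinel: one worker has no more tasks to run


def imap_unordered_generators(gen_func, tasks, n_workers=2, max_pending=4):
    """Yield partial results of gen_func(task) as soon as they are produced.

    Hypothetical helper for illustration only (thread-based, stdlib only).
    """
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)

    # Bounded queue: at most max_pending partial results wait for the consumer.
    results = queue.Queue(maxsize=max_pending)

    def worker():
        try:
            while True:
                try:
                    task = task_queue.get_nowait()
                except queue.Empty:
                    break
                for partial in gen_func(task):
                    results.put(partial)  # blocks while the queue is full
        finally:
            # Always signal completion so the consumer cannot hang.
            # Note: this sketch does not propagate worker exceptions.
            results.put(_DONE)

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(n_workers)]
    for thread in threads:
        thread.start()

    finished = 0
    while finished < n_workers:
        item = results.get()
        if item is _DONE:
            finished += 1
        else:
            yield item
```

Calling `imap_unordered_generators(read_file_by_chunks, all_paths)` would then yield batches from all files in whatever order they become available, while never holding more than `max_pending` unconsumed batches in the queue. A process-based version would face the same design question, with the extra cost of pickling each partial result between processes.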