Using conv2d from PyTorch with parallel_save hangs. This was first seen in CI for PR #42. That PR did not cause it: I had fixed the tests, where the "parallel" test case for conv2d was calling the save method instead of parallel_save, and that fix exposed a bug in the original implementation.
It's worth noting that the tests pass on macOS but fail on Linux. If we interrupt the hung run we see the following message:
This process (pid=366448) is multi-threaded, use of fork() may lead to deadlocks in the child.
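This is consistent with the hang that we see. My assumption is that PyTorch is using its own thread-level parallelism under the hood, and forking a multi-threaded process is a known hazard: the child only gets a copy of the thread that called fork(), so any lock held by another thread at that moment is never released in the child. This would also fit the platform difference, since Python's multiprocessing has historically defaulted to fork on Linux and to spawn on macOS.
I'll revert the "fix" to the tests in PR #42, and this issue then tracks solving the underlying bug somehow. Options: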
- Change the spawn mode for multiprocessing?
- Use PyTorch multiprocessing?
- Detect if conv2d is used in an expression, and if so revert to non-parallel save/sum?
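
As a rough illustration of the first option, here is a minimal sketch that uses only the standard library. It assumes parallel_save is built on a multiprocessing pool, which I have not confirmed here. The names `square`, `get_spawn_context` and `parallel_map` are hypothetical stand-ins, not part of this codebase, and whether this actually removes the hang with conv2d is untested.

```python
import multiprocessing as mp


def square(x):
    # Hypothetical stand-in for the per-chunk work. It has to be a
    # top-level function so that spawn can pickle it by reference.
    return x * x


def get_spawn_context():
    # A context object scopes the start method to our own pools,
    # without changing the process-wide default for other libraries.
    return mp.get_context("spawn")


def parallel_map(func, items, processes=2):
    # Workers are started as fresh interpreters rather than fork()ed
    # copies, so they do not inherit locks held by the parent's threads.
    ctx = get_spawn_context()
    with ctx.Pool(processes=processes) as pool:
        return pool.map(func, items)
```

With spawn the main module is re-imported in each worker, so a script calling `parallel_map(square, [1, 2, 3])` should do so under an `if __name__ == "__main__":` guard. The trade-offs are slower worker startup and a requirement that everything sent to the workers is picklable.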