[Question] Threading error after last training epoch #201

@NMVRodrigues

Hi,
I'm having an issue where, after the last training epoch ends and the validation set should be evaluated, a threading error originating in batchgenerators occurs.
I'm using the provided Docker image with the predefined env_det_num_threads=6 and OMP_NUM_THREADS=1. (I also tried building a container with env_det_num_threads=1 to see whether the problem was related to this flag, but it persisted.)
To check whether the problem came from our dataset, I also tried the example dataset 000, and the exact same problem happened.

Below is a set of traces of the errors that arise. It is always a thread error/exception. From the traces, it looks like the batchgenerators class is having trouble closing some threads?

Any help would be greatly appreciated :)
Best regards,
Nuno

Exception in thread Thread-5:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 92, in results_loop
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the print"
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
Exception ignored in: <function MultiThreadedAugmenter.__del__ at 0x7f3efd1a5ca0>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 294, in __del__
    self._finish()
  File "/opt/conda/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 276, in _finish
    self._queues[i].close()
  File "/opt/conda/lib/python3.8/multiprocessing/queues.py", line 137, in close
    self._reader.close()
  File "/opt/conda/lib/python3.8/multiprocessing/connection.py", line 177, in close
    self._close()
  File "/opt/conda/lib/python3.8/multiprocessing/connection.py", line 361, in _close
    _close(self._handle)
OSError: [Errno 9] Bad file descriptor
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.8/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 92, in results_loop
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the print"
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
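
The RuntimeError above only says that a background worker died; it does not say why. To narrow this down, a small diagnostic like the sketch below could be run inside the container before training. It prints the thread-related environment variables as the Python process actually sees them and the free space under /dev/shm. Two things here are assumptions rather than facts from the traces: `det_num_threads` is a guessed runtime counterpart of the `env_det_num_threads` setting, and the /dev/shm check only matters if the workers are dying from a lack of shared memory, which nothing above confirms.

```python
import os
import shutil

# "env_det_num_threads" and "OMP_NUM_THREADS" are the names used in this issue.
# "det_num_threads" is an ASSUMED runtime name for the same setting.
THREAD_VARS = ("env_det_num_threads", "det_num_threads", "OMP_NUM_THREADS")


def thread_settings(environ=None, names=THREAD_VARS):
    """Return the thread-related variables as seen by this process."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, "<unset>") for name in names}


def free_mib(path="/dev/shm"):
    """Free space under `path` in MiB, or None if the path does not exist."""
    if not os.path.exists(path):
        return None
    return shutil.disk_usage(path).free // (1024 * 1024)


if __name__ == "__main__":
    for name, value in thread_settings().items():
        print(f"{name}={value}")
    print(f"/dev/shm free MiB: {free_mib()}")
```

If a variable shows up as `<unset>`, the value passed when building the image is not reaching the training process. A very small /dev/shm would be worth ruling out as well, since multiprocessing workers in a container can be killed without leaving a Python traceback.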
