This repository was archived by the owner on May 28, 2024. It is now read-only.
Description
ERROR LOG (first epoch)
[1210 18:09:10 @param.py:158] [HyperParamSetter] At global_step=0, learning_rate is set to 0.001000
[1210 18:09:11 @prof.py:294] [HostMemoryTracker] Free RAM in before_train() is 238.12 GB.
[1210 18:09:11 @stac_helper.py:83] ----------------------------------------------------------------------------------------------------
[1210 18:09:11 @stac_helper.py:84] Model save path: result/VOC2007/instances_trainval
[1210 18:09:11 @stac_helper.py:85] ----------------------------------------------------------------------------------------------------
[1210 18:09:11 @eval.py:313] [EvalCallback] Will evaluate every 20 epochs
[1210 18:09:28 @base.py:273] Start Epoch 1 ...
0%| |0/500[00:00<?,?it/s]2021-12-10 18:09:43.544891: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2021-12-10 18:10:23.596973: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
0%| |0/500[02:46<?,?it/s]
2021-12-10 18:12:16.766932: W tensorflow/core/kernels/queue_base.cc:277] _0_QueueInput/input_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
File "/mnt/lustre/liangzhixuan/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/mnt/lustre/liangzhixuan/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/mnt/lustre/liangzhixuan/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.DeadlineExceededError: Timed out waiting for notification
Environment Information:
sys.platform linux
Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) [GCC 7.3.0]
Tensorpack v0.9.8-61-g4ac2e22b-dirty
Numpy 1.16.4
TensorFlow 1.14.0/v1.14.0-rc1-22-gaf24dc91b5
TF Compiler Version 4.8.5
TF CUDA support True
TF MKL support False
TF XLA support False
Nvidia Driver /usr/lib64/libnvidia-ml.so.460.73.01
CUDA /mnt/lustre/share/cuda-10.0/lib64/libcudart.so.10.0.130
CUDNN /mnt/lustre/share/cuda-10.0/lib64/libcudnn.so.7.4.1
NCCL
CUDA_VISIBLE_DEVICES 1,2,3,4
GPU 0,1,2,3,4,5,6,7 Tesla V100-SXM2-32GB
Free RAM 344.40/376.39 GB
CPU Count 48
cv2 4.1.1
msgpack 1.0.3
python-prctl False
Reactions are currently unavailable
ERROR LOG (first epoch)
[1210 18:09:10 @param.py:158] [HyperParamSetter] At global_step=0, learning_rate is set to 0.001000
[1210 18:09:11 @prof.py:294] [HostMemoryTracker] Free RAM in before_train() is 238.12 GB.
[1210 18:09:11 @stac_helper.py:83] ----------------------------------------------------------------------------------------------------
[1210 18:09:11 @stac_helper.py:84] Model save path: result/VOC2007/instances_trainval
[1210 18:09:11 @stac_helper.py:85] ----------------------------------------------------------------------------------------------------
[1210 18:09:11 @eval.py:313] [EvalCallback] Will evaluate every 20 epochs
[1210 18:09:28 @base.py:273] Start Epoch 1 ...
0%| |0/500[00:00<?,?it/s]2021-12-10 18:09:43.544891: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2021-12-10 18:10:23.596973: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
0%| |0/500[02:46<?,?it/s]
2021-12-10 18:12:16.766932: W tensorflow/core/kernels/queue_base.cc:277] _0_QueueInput/input_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
File "/mnt/lustre/liangzhixuan/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/mnt/lustre/liangzhixuan/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/mnt/lustre/liangzhixuan/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.DeadlineExceededError: Timed out waiting for notification
Environment Information:
sys.platform linux
Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) [GCC 7.3.0]
Tensorpack v0.9.8-61-g4ac2e22b-dirty
Numpy 1.16.4
TensorFlow 1.14.0/v1.14.0-rc1-22-gaf24dc91b5
TF Compiler Version 4.8.5
TF CUDA support True
TF MKL support False
TF XLA support False
Nvidia Driver /usr/lib64/libnvidia-ml.so.460.73.01
CUDA /mnt/lustre/share/cuda-10.0/lib64/libcudart.so.10.0.130
CUDNN /mnt/lustre/share/cuda-10.0/lib64/libcudnn.so.7.4.1
NCCL
CUDA_VISIBLE_DEVICES 1,2,3,4
GPU 0,1,2,3,4,5,6,7 Tesla V100-SXM2-32GB
Free RAM 344.40/376.39 GB
CPU Count 48
cv2 4.1.1
msgpack 1.0.3
python-prctl False