This repository has been archived by the owner on Mar 17, 2021. It is now read-only.

problems to run niftynet #446

Open
yuniorcf opened this issue Oct 1, 2019 · 7 comments

Comments

@yuniorcf

yuniorcf commented Oct 1, 2019

Hello,
I am trying to test NiftyNet for the first time, but I cannot get it to run.
I set up the installation following this guide (source code repository): https://niftynet.readthedocs.io/en/dev/installation.html
I have successfully downloaded the model. However, when I execute the command "python net_segment.py inference -c ~/niftynet/extensions/dense_vnet_abdominal_ct/config.ini" I get the following errors:
....
-> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:niftynet: Initialising Dataset from 1 subjects...
2019-10-01 13:53:56.311601: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-01 13:53:56.312103: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.725
pciBusID: 0000:01:00.0
2019-10-01 13:53:56.312156: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-10-01 13:53:56.312167: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-10-01 13:53:56.312177: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-10-01 13:53:56.312186: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-10-01 13:53:56.312195: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-10-01 13:53:56.312205: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-10-01 13:53:56.312215: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-10-01 13:53:56.312256: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-01 13:53:56.312735: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-01 13:53:56.313199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-10-01 13:53:56.313229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-01 13:53:56.313233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-10-01 13:53:56.313240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-10-01 13:53:56.313345: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-01 13:53:56.313816: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-01 13:53:56.314271: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6821 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:niftynet: Restoring parameters from /home/yunior/niftynet/models/dense_vnet_abdominal_ct/models/model.ckpt-3000
2019-10-01 13:53:56.630423: W tensorflow/core/common_runtime/colocation_graph.cc:1016] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
IteratorGetNext: CPU GPU XLA_CPU XLA_GPU
OneShotIterator: CPU
IteratorToStringHandle: CPU GPU XLA_CPU XLA_GPU

Colocation members, user-requested devices, and framework assigned devices, if any:
worker_0/validation/OneShotIterator (OneShotIterator) /device:GPU:0
worker_0/validation/IteratorToStringHandle (IteratorToStringHandle) /device:GPU:0
worker_0/validation/IteratorGetNext (IteratorGetNext) /device:GPU:0

2019-10-01 13:53:57.360882: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-10-01 13:53:57.991115: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-10-01 13:53:57.998596: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-10-01 13:53:58.001047: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-10-01 13:53:58.001075: W ./tensorflow/stream_executor/stream.h:1995] attempting to perform DNN operation using StreamExecutor without DNN support
INFO:niftynet: cleaning up...
INFO:niftynet: stopping sampling threads
......
My configuration is as follows:
CPU configuration:
Intel i7 (8 cores) and 64 GB RAM
GPU configuration:
GeForce RTX 2070, 8 GB, 2304 cores

In addition, I have installed the GPU version of TensorFlow to use the GPU for calculations.
I imagine the errors are related to memory issues on the GPU. I wonder whether there is a way to use the CPU's memory as well.

Could you please give me some feedback? Note that I am not an expert in Python.

Thanks in advance

@danieltudosiu
Contributor

As per TensorFlow issue #24496, this seems to be a TensorFlow problem.

Could you please try running this TensorFlow example and let us know whether the same error appears there?

@yuniorcf
Author

yuniorcf commented Oct 2, 2019

Thank you very much for the reply.
I have tried this and got no errors at all. The script made its predictions, and these are the final messages:
...
Test accuracy: 0.8813
(28, 28)
(1, 28, 28)
[[3.5098125e-04 1.3001217e-15 9.9916017e-01 4.8920496e-11 4.2484555e-04
5.2356322e-12 6.4001571e-05 5.9205704e-17 5.7315066e-11 2.8146843e-15]]

I have an additional comment that might help to figure out the problem with NiftyNet.
I faced problems with tf at the beginning. I have version 1.14.0 of tf, and apparently NiftyNet has trouble with this version. As a simple solution, the program suggested using tf.compat.v1.Session in several scripts of the software. Therefore I used:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

instead of
import tensorflow as tf

After that, the errors related to the TensorFlow session were fixed.
Could this change be the source of the current problem?

Thank you in advance

@yuniorcf
Author

yuniorcf commented Oct 4, 2019

Hi, I think I have made some progress.
I have upgraded the NVIDIA drivers and the CUDA toolkit. At least I do not see the previous errors anymore.
Now I have nvidia-418, cuda-10.1 and tf 1.14.
However, I have a new error (see below):
......
Traceback (most recent call last):
  File "net_segment.py", line 5, in <module>
    from niftynet import main
  File "/home/yunior/NiftyNet/niftynet/__init__.py", line 62, in <module>
    import niftynet.utilities.user_parameters_parser as user_parameters_parser
  File "/home/yunior/NiftyNet/niftynet/utilities/user_parameters_parser.py", line 22, in <module>
    from niftynet.utilities.user_parameters_default import
  File "/home/yunior/NiftyNet/niftynet/utilities/user_parameters_default.py", line 10, in <module>
    from niftynet.engine.image_window_dataset import SMALLER_FINAL_BATCH_MODE
  File "/home/yunior/NiftyNet/niftynet/engine/image_window_dataset.py", line 18, in <module>
    from niftynet.layer.base_layer import Layer
  File "/home/yunior/NiftyNet/niftynet/layer/base_layer.py", line 11, in <module>
    from niftynet.engine.application_variables import RESTORABLE
  File "/home/yunior/NiftyNet/niftynet/engine/application_variables.py", line 10, in <module>
    from tensorflow.contrib.framework import list_variables
  File "/home/yunior/.conda/envs/my_env/lib/python3.7/site-packages/tensorflow/contrib/__init__.py", line 37, in <module>
    from tensorflow.contrib import cudnn_rnn
  File "/home/yunior/.conda/envs/my_env/lib/python3.7/site-packages/tensorflow/contrib/cudnn_rnn/__init__.py", line 38, in <module>
    from tensorflow.contrib.cudnn_rnn.python.layers import *
  File "/home/yunior/.conda/envs/my_env/lib/python3.7/site-packages/tensorflow/contrib/cudnn_rnn/python/layers/__init__.py", line 23, in <module>
    from tensorflow.contrib.cudnn_rnn.python.layers.cudnn_rnn import *
  File "/home/yunior/.conda/envs/my_env/lib/python3.7/site-packages/tensorflow/contrib/cudnn_rnn/python/layers/cudnn_rnn.py", line 20, in <module>
    from tensorflow.contrib.cudnn_rnn.python.ops import cudnn_rnn_ops
  File "/home/yunior/.conda/envs/my_env/lib/python3.7/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 22, in <module>
    from tensorflow.contrib.rnn.python.ops import lstm_ops
  File "/home/yunior/.conda/envs/my_env/lib/python3.7/site-packages/tensorflow/contrib/rnn/__init__.py", line 91, in <module>
    from tensorflow.contrib.rnn.python.ops.lstm_ops import *
  File "/home/yunior/.conda/envs/my_env/lib/python3.7/site-packages/tensorflow/contrib/rnn/python/ops/lstm_ops.py", line 298, in <module>
    @ops.RegisterGradient("BlockLSTM")
  File "/home/yunior/.conda/envs/my_env/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 2489, in __call__
    _gradient_registry.register(f, self._op_type)
  File "/home/yunior/.conda/envs/my_env/lib/python3.7/site-packages/tensorflow_core/python/framework/registry.py", line 61, in register
    (self._name, name, function_name, filename, line_number))
KeyError: "Registering two gradient with name 'BlockLSTM'! (Previous registration was in register /home/yunior/.conda/envs/my_env/lib/python3.7/site-packages/tensorflow_core/python/framework/registry.py:66)"

Please, could anybody suggest a tentative solution?
Thanks
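
The traceback above imports from both site-packages/tensorflow/contrib and site-packages/tensorflow_core, which may mean that more than one TensorFlow distribution is installed in the same environment. That is only a guess and is not confirmed anywhere in this thread. A minimal sketch for listing what is installed, assuming Python 3.8+ for importlib.metadata (the environment above is Python 3.7, where the importlib_metadata backport would be needed); the function names are hypothetical:

```python
from importlib import metadata  # assumption: Python 3.8+ (use the importlib_metadata backport on 3.7)


def tensorflow_like(pairs):
    """Return the sorted (name, version) pairs whose name starts with 'tensorflow'."""
    return sorted(
        (name, version)
        for name, version in pairs
        if name and name.lower().startswith("tensorflow")
    )


def installed_tensorflow_packages():
    """List every TensorFlow-related distribution visible to this interpreter."""
    pairs = (
        (dist.metadata["Name"], dist.version)
        for dist in metadata.distributions()
    )
    return tensorflow_like(pairs)


if __name__ == "__main__":
    # Several entries with different versions would hint at a mixed install.
    for name, version in installed_tensorflow_packages():
        print(name, version)
```

If the list shows, for example, both tensorflow and tensorflow-gpu at different versions, uninstalling all of them and reinstalling a single one in a fresh environment is a reasonable thing to try.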

@yuniorcf
Author

yuniorcf commented Oct 9, 2019

Hi guys, I really need NiftyNet running on my PC. However, after more than a week I am still not able to do it. Could somebody give me some feedback, please?
I have been trying to run the example posted here with no success. I have tried several configurations of NVIDIA drivers, CUDA versions, cuDNN and TensorFlow, but with no progress at all.
I currently have Ubuntu 18.04, Nvidia 4.18, CUDA 10.0, cuDNN 7.3.0. I see the following messages in the terminal when I execute the program.

NiftyNet version 0.5.0+185.gb5f3ba1e.dirty
[CUSTOM]
-- num_classes: 9
-- output_prob: False
-- label_normalisation: False
-- softmax: True
-- min_sampling_ratio: 0
-- compulsory_labels: (0, 1)
-- rand_samples: 0
-- min_numb_labels: 1
-- proba_connect: True
-- evaluation_units: foreground
-- do_mixup: False
-- mixup_alpha: 0.2
-- mix_match: False
-- weight: ()
-- inferred: ()
-- sampler: ()
-- label: ('label',)
-- image: ('ct',)
-- name: net_segment
[CONFIG_FILE]
-- path: /home/yunior/niftynet/extensions/dense_vnet_abdominal_ct/config.ini
[CT]
-- csv_file:
-- path_to_search: ./data/dense_vnet_abdominal_ct/
-- filename_contains: ('CT',)
-- filename_not_contains: ()
-- filename_removefromid:
-- interp_order: 1
-- loader: None
-- pixdim: ()
-- axcodes: ('A', 'R', 'S')
-- spatial_window_size: (144, 144, 144)
[LABEL]
-- csv_file:
-- path_to_search: ./data/dense_vnet_abdominal_ct/
-- filename_contains: ('Label',)
-- filename_not_contains: ()
-- filename_removefromid:
-- interp_order: 0
-- loader: None
-- pixdim: ()
-- axcodes: ('A', 'R', 'S')
-- spatial_window_size: (144, 144, 144)
[SYSTEM]
-- cuda_devices: 0
-- num_threads: 1
-- num_gpus: 1
-- model_dir: /home/yunior/niftynet/models/dense_vnet_abdominal_ct
-- dataset_split_file: ./dataset_split.csv
-- event_handler: ('model_saver', 'model_restorer', 'sampler_threading', 'apply_gradients', 'output_interpreter', 'console_logger', 'tensorboard_logger', 'performance_logger')
-- iteration_generator: iteration_generator
-- queue_length: 36
-- action: inference
[NETWORK]
-- name: dense_vnet
-- activation_function: relu
-- batch_size: 1
-- smaller_final_batch_mode: pad
-- decay: 0.0
-- reg_type: L2
-- volume_padding_size: (0, 0, 0)
-- volume_padding_mode: minimum
-- volume_padding_to_size: (0,)
-- window_sampling: resize
-- force_output_identity_resizing: False
-- queue_length: 5
-- multimod_foreground_type: and
-- histogram_ref_file: ./histogram_ref_file.txt
-- norm_type: percentile
-- cutoff: (0.01, 0.99)
-- foreground_type: otsu_plus
-- normalisation: False
-- rgb_normalisation: False
-- whitening: False
-- normalise_foreground_only: False
-- weight_initializer: he_normal
-- bias_initializer: zeros
-- keep_prob: 1.0
-- weight_initializer_args: {}
-- bias_initializer_args: {}
[TRAINING]
-- optimiser: adam
-- sample_per_volume: 1
-- rotation_angle: ()
-- rotation_angle_x: ()
-- rotation_angle_y: ()
-- rotation_angle_z: ()
-- scaling_percentage: ()
-- isotropic_scaling: False
-- antialiasing: True
-- bias_field_range: ()
-- bf_order: 3
-- random_flipping_axes: -1
-- do_elastic_deformation: False
-- num_ctrl_points: 4
-- deformation_sigma: 15
-- proportion_to_deform: 0.5
-- lr: 0.001
-- loss_type: dense_vnet_abdominal_ct.dice_hinge.dice
-- starting_iter: 0
-- save_every_n: 1000
-- tensorboard_every_n: 20
-- max_iter: 3001
-- max_checkpoints: 100
-- validation_every_n: -1
-- validation_max_iter: 1
-- exclude_fraction_for_validation: 0.0
-- exclude_fraction_for_inference: 0.0
-- vars_to_restore:
-- vars_to_freeze:
-- patience: 100
-- early_stopping_mode: mean
[INFERENCE]
-- spatial_window_size: (144, 144, 144)
-- inference_iter: 3000
-- dataset_to_infer:
-- save_seg_dir: ./segmentation_output/
-- output_postfix: _niftynet_out
-- output_interp_order: 0
-- border: (0, 0, 0)
-- fill_constant: 0.0
INFO:niftynet: set CUDA_VISIBLE_DEVICES to 0
INFO:niftynet: starting segmentation application
INFO:niftynet: csv_file = not found, writing to "/home/yunior/niftynet/models/dense_vnet_abdominal_ct/ct.csv" instead.
INFO:niftynet: [ct] search file folders, writing csv file /home/yunior/niftynet/models/dense_vnet_abdominal_ct/ct.csv
INFO:niftynet: csv_file = not found, writing to "/home/yunior/niftynet/models/dense_vnet_abdominal_ct/label.csv" instead.
INFO:niftynet: [label] search file folders, writing csv file /home/yunior/niftynet/models/dense_vnet_abdominal_ct/label.csv
INFO:niftynet:

Number of subjects 1, input section names: ['subject_id', 'ct', 'label']
-- using all subjects (without data partitioning).

INFO:niftynet: Image reader: loading 1 subjects from sections ('ct',) as input [image]
2019-10-09 13:38:12.946391: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-09 13:38:12.949613: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2019-10-09 13:38:12.949963: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55fa8e60d9d0 executing computations on platform Host. Devices:
2019-10-09 13:38:12.949975: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2019-10-09 13:38:13.047755: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-09 13:38:13.048308: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55fa8d4b5a40 executing computations on platform CUDA. Devices:
2019-10-09 13:38:13.048320: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce RTX 2070, Compute Capability 7.5
2019-10-09 13:38:13.048406: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.725
pciBusID: 0000:01:00.0
totalMemory: 7.76GiB freeMemory: 7.21GiB
2019-10-09 13:38:13.048414: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-09 13:38:13.049017: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-09 13:38:13.049024: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-10-09 13:38:13.049027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-10-09 13:38:13.049085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 7012 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:niftynet: reading size of preprocessed images
INFO:niftynet: initialised resize sampler {'image': (1, 144, 144, 144, 1, 1), 'image_location': (1, 7)}
INFO:niftynet: using DenseVNet
2019-10-09 13:38:13.056568: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-09 13:38:13.056584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-09 13:38:13.056588: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-10-09 13:38:13.056591: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-10-09 13:38:13.056641: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 7012 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:niftynet: Initialising Dataset from 1 subjects...
2019-10-09 13:38:14.395612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-09 13:38:14.395650: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-09 13:38:14.395654: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-10-09 13:38:14.395657: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-10-09 13:38:14.395743: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7012 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:niftynet: Restoring parameters from /home/yunior/niftynet/models/dense_vnet_abdominal_ct/models/model.ckpt-3000
2019-10-09 13:38:15.997092: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2019-10-09 13:38:15.998516: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-10-09 13:38:16.000078: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-10-09 13:38:16.000099: W ./tensorflow/stream_executor/stream.h:2099] attempting to perform DNN operation using StreamExecutor without DNN support

INFO:niftynet: cleaning up...
INFO:niftynet: stopping sampling threads
Traceback (most recent call last):
  File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node worker_0/DenseVNet/conv_bn/conv_/conv}}]]
[[{{node worker_0/post_processing/ExpandDims}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "net_segment.py", line 8, in <module>
    sys.exit(main())
  File "/home/yunior/NiftyNet/niftynet/__init__.py", line 148, in main
    app_driver.run(app_driver.app)
  File "/home/yunior/NiftyNet/niftynet/engine/application_driver.py", line 206, in run
    loop_status=loop_status)
  File "/home/yunior/NiftyNet/niftynet/engine/application_driver.py", line 332, in loop
    ApplicationDriver.loop_step(application, iter_msg)
  File "/home/yunior/NiftyNet/niftynet/engine/application_driver.py", line 364, in loop_step
    feed_dict=iteration_message.data_feed_dict)
  File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node worker_0/DenseVNet/conv_bn/conv_/conv (defined at /home/yunior/NiftyNet/niftynet/layer/convolution.py:100) ]]
[[node worker_0/post_processing/ExpandDims (defined at /home/yunior/NiftyNet/niftynet/layer/post_processing.py:36) ]]

Caused by op 'worker_0/DenseVNet/conv_bn/conv_/conv', defined at:
  File "net_segment.py", line 8, in <module>
    sys.exit(main())
  File "/home/yunior/NiftyNet/niftynet/__init__.py", line 148, in main
    app_driver.run(app_driver.app)
  File "/home/yunior/NiftyNet/niftynet/engine/application_driver.py", line 190, in run
    is_training_action=self.is_training_action)
  File "/home/yunior/NiftyNet/niftynet/engine/application_driver.py", line 271, in create_graph
    outputs_collector, gradients_collector)
  File "/home/yunior/NiftyNet/niftynet/application/segmentation_application.py", line 458, in connect_data_and_network
    net_out = self.net(image, **net_args)
  File "/home/yunior/NiftyNet/niftynet/layer/base_layer.py", line 35, in __call__
    return self._op(*args, **kwargs)
  File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/ops/template.py", line 360, in __call__
    return self._call_func(args, kwargs)
  File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/ops/template.py", line 311, in _call_func
    result = self._func(*args, **kwargs)
  File "/home/yunior/NiftyNet/niftynet/network/dense_vnet.py", line 233, in layer_op
    input_tensor, is_training=is_training)
  File "/home/yunior/NiftyNet/niftynet/layer/base_layer.py", line 35, in __call__
    return self._op(*args, **kwargs)
  File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/ops/template.py", line 360, in __call__
    return self._call_func(args, kwargs)
  File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/ops/template.py", line 311, in _call_func
    result = self._func(*args, **kwargs)
  File "/home/yunior/NiftyNet/niftynet/layer/convolution.py", line 254, in layer_op
    output_tensor = activation(conv_layer(input_tensor))
  File "/home/yunior/NiftyNet/niftynet/layer/base_layer.py", line 35, in __call__
    return self._op(*args, **kwargs)
  File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/ops/template.py", line 360, in __call__
    return self._call_func(args, kwargs)
  File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/ops/template.py", line 311, in _call_func
    result = self._func(*args, **kwargs)
  File "/home/yunior/NiftyNet/niftynet/layer/convolution.py", line 100, in layer_op
    name='conv')
  File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py", line 851, in convolution
    return op(input, filter)
  File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py", line 966, in __call__
    return self.conv_op(inp, filter)
  File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py", line 591, in __call__
    return self.call(inp, filter)
  File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py", line 208, in __call__
    name=self.name)
  File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1440, in conv3d
    dilations=dilations, name=name)
  File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node worker_0/DenseVNet/conv_bn/conv_/conv (defined at /home/yunior/NiftyNet/niftynet/layer/convolution.py:100) ]]
[[node worker_0/post_processing/ExpandDims (defined at /home/yunior/NiftyNet/niftynet/layer/post_processing.py:36) ]]

I would like to add that when I run the program without GPU support, the software works, but slowly.

Thank you in advance
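
Since the comment above lists CUDA 10.0 and cuDNN 7.3.0, it may be worth comparing the stack against the versions the prebuilt TensorFlow wheels were built with. The sketch below is only an illustration: the table is an assumption taken from memory of TensorFlow's "tested build configurations" page and should be checked against the official documentation, and the function names are hypothetical.

```python
# Assumption: values recalled from TensorFlow's "tested build configurations"
# table for the prebuilt GPU wheels. Verify them before relying on this.
TESTED_BUILDS = {
    "1.13": {"cuda": (10, 0), "cudnn": (7, 4)},
    "1.14": {"cuda": (10, 0), "cudnn": (7, 4)},
}


def parse_version(text):
    """Turn a dotted version string such as '7.3.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def check_stack(tf_version, cuda_version, cudnn_version):
    """Return a list of human-readable mismatches (empty if none are found)."""
    key = ".".join(tf_version.split(".")[:2])
    expected = TESTED_BUILDS.get(key)
    if expected is None:
        return ["no table entry for TensorFlow " + key]

    problems = []
    cuda = parse_version(cuda_version)[:2]
    cudnn = parse_version(cudnn_version)[:2]
    if cuda != expected["cuda"]:
        problems.append(
            "CUDA %d.%d installed, wheel built with %d.%d" % (cuda + expected["cuda"])
        )
    if cudnn < expected["cudnn"]:
        problems.append(
            "cuDNN %d.%d installed, wheel built with %d.%d" % (cudnn + expected["cudnn"])
        )
    return problems


if __name__ == "__main__":
    # Versions from the comment above; the TensorFlow version is a guess.
    for line in check_stack("1.13.1", "10.0", "7.3.0"):
        print(line)
```

A mismatch reported here does not prove it is the cause of the cuDNN handle error; it only narrows down what to try next.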

@danieltudosiu
Contributor

I have never encountered your problem. Also, it seems to be more TensorFlow- and CUDA-related than NiftyNet-related, which is also supported by the fact that it works on CPU but not on GPU.

Could you please replace the following function in util_common.py:

       def tf_config():
           """
           tensorflow system configurations
           """
           config = tf.ConfigProto()
           config.log_device_placement = False
           config.allow_soft_placement = True
           return config

with

       def tf_config():
           """
           tensorflow system configurations
           """
           config = tf.ConfigProto()
           config.log_device_placement = False
           config.allow_soft_placement = True
           # allocate GPU memory on demand instead of reserving it all at start-up
           config.gpu_options.allow_growth = True
           return config
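
An alternative that avoids editing NiftyNet's source is the TF_FORCE_GPU_ALLOW_GROWTH environment variable, which TensorFlow reads to enable the same on-demand allocation. I am not certain that every 1.x release used in this thread honours it, so treat this as a sketch under that assumption; the helper name is hypothetical.

```python
import os


def request_gpu_memory_growth(env=None):
    """Set TF_FORCE_GPU_ALLOW_GROWTH in the given mapping (default: os.environ).

    Assumption: the installed TensorFlow release reads this variable.
    It has to be set before TensorFlow initialises the GPU.
    """
    env = os.environ if env is None else env
    env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return env


if __name__ == "__main__":
    request_gpu_memory_growth()
    print(os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
```

The same effect can be had by exporting the variable in the shell before starting net_segment.py.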

@talmazov

talmazov commented Dec 8, 2019

I have NiftyNet 0.6, CUDA 10.0, tensorflow-gpu 1.13.2 and numpy 1.16,
using a GeForce RTX 2060 with 6 GB VRAM and NVIDIA driver 440.33.01.
TensorFlow tries to allocate 5 GB.
spatial_window_size = (64, 64, 512) with the dense_vnet network

I've tried config.gpu_options.allow_growth = True, but it doesn't seem to work.
I get the same "Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR"

Any solution so far?
I am not sure whether legacy drivers would work better; maybe the v390 NVIDIA driver is compatible?
I wonder if this memcpy and cuDNN internal error is related to the newer drivers/cards.
I bought a GTX 1080 Ti with 11 GB RAM and will see whether that one works with NiftyNet.
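
For a rough sense of scale, the memory taken by a single float32 volume of a given spatial_window_size can be worked out by hand. The sketch below does only that arithmetic; the real footprint of dense_vnet depends on its feature-map counts and on TensorFlow's allocator, neither of which is modelled here, and the function names are hypothetical.

```python
from functools import reduce
from operator import mul


def window_bytes(spatial_window_size, channels=1, bytes_per_value=4):
    """Bytes needed for one window with the given channel count (float32 by default)."""
    voxels = reduce(mul, spatial_window_size, 1)
    return voxels * channels * bytes_per_value


def to_mib(num_bytes):
    """Convert bytes to mebibytes."""
    return num_bytes / 2 ** 20


if __name__ == "__main__":
    # Window sizes mentioned in this thread.
    for size in [(64, 64, 512), (144, 144, 144)]:
        print(size, round(to_mib(window_bytes(size)), 2), "MiB per channel")
```

Multiplying the per-channel figure by the number of feature maps kept alive at once gives a lower bound on activation memory, which helps when judging whether a smaller window is worth trying.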

@yuniorcf
Author

yuniorcf commented Dec 10, 2019 via email
