
Cuda 3.0? #25

Closed
infojunkie opened this issue Nov 9, 2015 · 101 comments

Comments

@infojunkie

Are there plans to support Cuda compute capability 3.0?

@zheng-xq
Contributor

zheng-xq commented Nov 9, 2015

Officially, Cuda compute capability 3.5 and 5.2 are supported. You can try to enable other compute capabilities by modifying the build script:

https://github.com/tensorflow/tensorflow/blob/master/third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc#L236
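For reference, a minimal sketch of the kind of edit involved, assuming the wrapper script (which is Python) hardcodes a list of capabilities and expands it into nvcc -gencode flags; the variable names here are illustrative, not necessarily the script's actual ones:

supported_cuda_compute_capabilities = ["3.0", "3.5", "5.2"]  # "3.0" added

nvcc_gencode_flags = []
for capability in supported_cuda_compute_capabilities:
    arch = capability.replace(".", "")  # e.g. "3.0" -> "30"
    # One entry for native machine code (sm_XX) and one for PTX
    # (compute_XX), so binaries run natively on sm_30 parts and stay
    # forward-compatible with newer GPUs.
    nvcc_gencode_flags.append(
        "-gencode=arch=compute_%s,code=sm_%s" % (arch, arch))
    nvcc_gencode_flags.append(
        "-gencode=arch=compute_%s,code=compute_%s" % (arch, arch))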

@infojunkie
Author

Thanks! Will try it and report here.

@zheng-xq
Contributor

zheng-xq commented Nov 9, 2015

This is not officially supported yet. But if you want to enable Cuda 3.0 locally, here are the additional places to change:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/common_runtime/gpu/gpu_device.cc#L610
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/common_runtime/gpu/gpu_device.cc#L629
These are the places where the smaller GPU device is ignored.

The official support will eventually come in a different form, where we make sure the fix works across all the different computational environments.

@keveman added the cuda label Nov 9, 2015
@infojunkie
Author

I made the changes to the lines above, and was able to compile and run the basic example on the Getting Started page: http://tensorflow.org/get_started/os_setup.md#try_your_first_tensorflow_program - it did not complain about the GPU, but it didn't report using the GPU either.

How can I help with next steps?

@zheng-xq
Contributor

infojunkie@, could you post your steps and upload the log?

If you were following this example:

bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu

If you see the following line, the GPU logical device is being created:

Creating TensorFlow device (/gpu:0) -> (device: ..., name: ..., pci bus id: ...)

If you want to be absolutely sure the GPU was used, set CUDA_PROFILE=1 to enable the Cuda profiler. If the Cuda profiler logs were generated, that is a sure sign the GPU was used.

http://docs.nvidia.com/cuda/profiler-users-guide/#command-line-profiler-control
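Another lightweight check from Python is TensorFlow's device placement logging, which prints each op's assigned device to stderr; a small sketch using the session API of this era:

import tensorflow as tf

# log_device_placement=True makes TensorFlow print the device chosen
# for each operation (e.g. /gpu:0) when the graph runs.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 1.0], [0.0, 1.0]])
    c = tf.matmul(a, b)  # should land on /gpu:0 if the GPU is usable
    print(sess.run(c))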

@infojunkie
Author

I got the following log:

I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 8
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:888] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:88] Found device 0 with properties: 
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.967
pciBusID 0000:02:00.0
Total memory: 2.00GiB
Free memory: 896.49MiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:112] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:122] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:643] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:643] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:643] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:643] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:643] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:643] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:47] Setting region size to 730324992
I tensorflow/core/common_runtime/gpu/gpu_device.cc:643] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:643] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:643] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:643] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 8

I guess it means the GPU was found and used. I can try the CUDA profiler if you think it's useful.

@udibr

udibr commented Nov 10, 2015

Please prioritize this issue. It is blocking GPU usage on both OSX and AWS's K520, and for many people these are the only environments available.
Thanks!

@graphific

Not the nicest fix, but just comment out the Cuda compute version check at gpu_device.cc lines 610 to 616 and recompile, and Amazon g2 GPU acceleration seems to work fine:

example

@infojunkie
Author

For reference, here's my very primitive patch to work with Cuda 3.0: https://gist.github.com/infojunkie/cb6d1a4e8bf674c6e38e

@markusdr

@infojunkie I applied your fix, but I got lots of NaNs in the computation output:

$ bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu
000006/000003 lambda =     -nan x = [0.000000 0.000000] y = [0.000000 0.000000]
000004/000003 lambda = 2.000027 x = [79795.101562 -39896.468750] y = [159592.375000 -79795.101562]
000005/000006 lambda = 2.000054 x = [39896.468750 -19947.152344] y = [79795.101562 -39896.468750]
000001/000007 lambda =     -nan x = [0.000000 0.000000] y = [0.000000 0.000000]
000002/000003 lambda =     -nan x = [0.000000 0.000000] y = [0.000000 0.000000]
000009/000008 lambda =     -nan x = [0.000000 0.000000] y = [0.000000 0.000000]
000004/000004 lambda =     -nan x = [0.000000 0.000000] y = [0.000000 0.000000]
000001/000005 lambda =     -nan x = [0.000000 0.000000] y = [0.000000 0.000000]
000006/000007 lambda =     -nan x = [0.000000 0.000000] y = [0.000000 0.000000]
000003/000006 lambda =     -nan x = [0.000000 0.000000] y = [0.000000 0.000000]
000006/000006 lambda =     -nan x = [0.000000 0.000000] y = [0.000000 0.000000]

@zheng-xq
Contributor

@markusdr, this is very strange. Could you post the complete steps you used to build the binary?

What GPU and OS are you running with? Are you using Cuda 7.0 and Cudnn 6.5 V2?

@avostryakov

Just +1 to fix this problem on AWS as soon as possible. We don't have any other GPU cards for our research.

@allanzelener

Hi, not sure if this is a separate issue, but I'm trying to build with a CUDA 3.0 GPU (GeForce 660 Ti) and am getting many errors with --config=cuda. See the attached file below. It seems unrelated to the recommended changes above. I've noticed that it tries to compile a temporary compute_52.cpp1.ii file, which would be the wrong version for my GPU.

I'm on Ubuntu 15.10. I modified the host_config.h in the Cuda includes to remove the version check on gcc. I'm using Cuda 7.0 and cuDNN 6.5 v2 as recommended, although I have newer versions installed as well.

cuda_build_fail.txt

@markusdr

Yes, I was using Cuda 7.0 and Cudnn 6.5 on an EC2 g2.2xlarge instance with this AMI:
cuda_7 - ami-12fd8178
ubuntu 14.04, gcc 4.8, cuda 7.0, atlas, and opencv.
To build, I followed the instructions on tensorflow.org.

@vsrikarunyan

It looks like we are seeing an API incompatibility between Compute Capability v3 and Compute Capability v3.5; after applying infojunkie's patch, I stumbled onto this issue:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:643] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Quadro K2100M, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 8
F tensorflow/stream_executor/cuda/cuda_blas.cc:229] Check failed: f != nullptr could not find cublasCreate_v2 in cuBLAS DSO; dlerror: bazel-bin/tensorflow/cc/tutorials_example_trainer: undefined symbol: cublasCreate_v2

I'm running Ubuntu 15.04, gcc 4.9.2, CUDA Toolkit 7.5, cuDNN 6.5.

+1 for having Compute Capability v3 Support

@graphific

Is cublas installed? And where does it link to?

ls -lah /usr/local/cuda/lib64/libcublas.so
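If the symlink looks right but the loader still resolves a mismatched library, a quick ctypes probe can confirm whether the libcublas that dlopen finds actually exports cublasCreate_v2 (a sketch, assuming the CUDA lib directory is on the loader path):

import ctypes

# dlopen whatever libcublas the dynamic loader resolves, then try to
# dlsym the v2 entry point that the error message says is missing.
cublas = ctypes.CDLL("libcublas.so")
print("cublasCreate_v2 found:", hasattr(cublas, "cublasCreate_v2"))

If this prints False, the loader is picking up a stale or mismatched cuBLAS, and LD_LIBRARY_PATH ordering is the usual suspect.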

@zheng-xq
Contributor

@allanzelener, what OS and GCC versions do you have? Your errors seem to come from incompatible C++ compilers.

It is recommended to use Ubuntu 14.04 and GCC 4.8 with TensorFlow.

@zheng-xq
Contributor

@vsrikarunyan, it is better to use CUDA Toolkit 7.0, as recommended. You can install an older CUDA Toolkit alongside your newer toolkit. Just point TensorFlow's "configure" (and maybe LD_LIBRARY_PATH) to CUDA 7.0 when you run TensorFlow.

@zheng-xq
Contributor

@avostryakov, @infojunkie's early patch should work on AWS.

https://gist.github.com/infojunkie/cb6d1a4e8bf674c6e38e

An official patch is working its way through the pipeline. It will expose a configuration option to let you choose your compute target, but underneath it makes similar changes. I've tried it on AWS g2, and found that things would only work after I completely uninstalled the NVIDIA driver and reinstalled the latest GPU driver from NVIDIA.

Once again, the recommended setup on AWS at this point is: Ubuntu 14.04, GCC 4.8, CUDA Toolkit 7.0, and CUDNN 6.5. The last two can be installed without affecting your existing installations of other versions. Also, the officially recommended versions for the last two might change soon.

@jbencook

I applied the same patch on a g2.2xlarge instance and got the same result as @markusdr... a bunch of NaNs.

@allanzelener

@zheng-xq Yes, I'm on Ubuntu 15.10 and I was using GCC 5.2.1. The issue was the compiler. I couldn't figure out how to change the compiler with bazel, but simply installing gcc-4.8 and using update-alternatives to change the symlinks in /usr/bin seems to have worked. (More info: http://askubuntu.com/questions/26498/choose-gcc-and-g-version). Thanks for the help; I'll report back if I experience any further issues.

@nbenhaim

I did get this to work on a g2.2xlarge instance: the training example ran, and I verified that the GPU was active using the nvidia-smi tool. But when running MNIST's convolutional.py, it ran out of memory. I suspect this is just the batch size and the fact that the AWS GPUs don't have a lot of memory, but I wanted to throw that out there to make sure it sounds correct. To clarify, I ran the following; it ran for about 15 minutes and then ran out of memory.

python tensorflow/models/image/mnist/convolutional.py
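The usual fixes for this are lowering BATCH_SIZE in convolutional.py or bounding how much device memory TensorFlow grabs up front. A sketch of the latter, assuming a build of this era that supports tf.GPUOptions; the fraction is illustrative:

import tensorflow as tf

# Cap TensorFlow's GPU memory arena at ~60% of the device instead of
# letting it reserve nearly all of a small GPU's memory at startup.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.6)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

Note that capping the arena only helps if the allocator was over-reserving; if the model genuinely needs more memory than the card has, reducing the batch size is the fix.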

@anjishnu

@nbenhaim, just what did you have to do to get it to work?

@zheng-xq
Contributor

@markusdr, @jbencook, the NaNs are quite troubling. I ran the same thing myself and didn't have any problem.

If you are using the recommended software setup (Ubuntu 14.04, GCC 4.8, Cuda 7.0 and Cudnn 6.5), then my next guess is the Cuda driver. Could you uninstall and reinstall the latest Cuda driver?

This is the sequence I tried on AWS, your mileage may vary:

sudo apt-get remove --purge "nvidia*"
wget http://us.download.nvidia.com/XFree86/Linux-x86_64/352.55/NVIDIA-Linux-x86_64-352.55.run
chmod +x NVIDIA-Linux-x86_64-352.55.run
sudo ./NVIDIA-Linux-x86_64-352.55.run --accept-license --no-x-check --no-recursion

@jbencook

Thanks for following up @zheng-xq - I'll give that a shot today.

@mjwillson

Another +1 for supporting pre-3.5 GPUs, as someone else whose only realistic option for training on real data is AWS GPU instances.

Even for local testing, turns out my (recent, developer) laptop's GPU doesn't support 3.5 :-(

tarasglek pushed commits to tarasglek/tensorflow that referenced this issue Jun 20, 2017
@wingdi

wingdi commented Aug 23, 2017

I have the same problem: "Ignoring gpu device (device: 0, name: GeForce GT 635M, pci bus id) with Cuda compute capability 2.1. The minimum required Cuda capability is 3.0." @smtabatabaie @martinwicke @alphaJatin, help!

@martinwicke
Member

Compute capability 2.1 is too low to run TensorFlow. You'll need a newer (or more powerful) graphics card to run TensorFlow on a GPU.
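For anyone unsure what TensorFlow detects, one quick check from Python is the device_lib helper (a sketch; in many versions the GPU entry's description includes the pci bus id and compute capability):

from tensorflow.python.client import device_lib

# Lists every device TensorFlow can see; GPU entries carry a
# physical_device_desc string with the device's details.
for device in device_lib.list_local_devices():
    print(device.name, device.physical_device_desc)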

@mengxingxinqing

The URL in the answer to this question is invalid. Can you update it?

@gunan
Contributor

gunan commented Aug 8, 2018

For nightly pip packages, the recommended way to install is to use the pip install tf-nightly command.
ci.tensorflow.org is deprecated.

eggonlea pushed a commit to eggonlea/tensorflow that referenced this issue Mar 12, 2019
cjolivier01 pushed a commit to Cerebras/tensorflow that referenced this issue Dec 6, 2019
keithm-xmos referenced this issue in xmos/tensorflow Feb 1, 2021