Cuda 3.0? #25
Officially, CUDA compute capabilities 3.5 and 5.2 are supported. You can try to enable other compute capabilities by modifying the build script:
Thanks! Will try it and report here.
This is not officially supported yet. But if you want to enable CUDA 3.0 locally, here are the additional places to change: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/common_runtime/gpu/gpu_device.cc#L610 The official support will eventually come in a different form, where we make sure the fix works across all the different computational environments.
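For illustration, here is a hedged sketch of the kind of one-line edit being described. The constant's exact spelling and line number vary between checkouts, so the commands below operate on a stand-in excerpt rather than the real gpu_device.cc; verify the pattern against your own tree before running sed on the actual file:

```shell
# Demonstrate the edit on a stand-in excerpt. The real file is
# tensorflow/core/common_runtime/gpu/gpu_device.cc, and the constant's
# spelling there may differ -- this line is an assumption for illustration.
echo 'CudaVersion min_supported_capability(3, 5);' > gpu_device_excerpt.cc

# Relax the minimum supported compute capability from 3.5 to 3.0.
sed -i 's/(3, 5)/(3, 0)/' gpu_device_excerpt.cc
cat gpu_device_excerpt.cc
```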
I made the changes to the lines above and was able to compile and run the basic example on the Getting Started page: http://tensorflow.org/get_started/os_setup.md#try_your_first_tensorflow_program - it did not complain about the GPU, but it didn't report using the GPU either. How can I help with next steps?
@infojunkie, could you post your steps and upload the log? If you were following this example:

bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer

then seeing the following line means the GPU logical device is being created:

Creating TensorFlow device (/gpu:0) -> (device: ..., name: ..., pci bus id: ...)

If you want to be absolutely sure the GPU was used, set CUDA_PROFILE=1 and enable the CUDA profiler. If the CUDA profiler logs were generated, that is a sure sign the GPU was used. http://docs.nvidia.com/cuda/profiler-users-guide/#command-line-profiler-control
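The profiler check described above amounts to setting two environment variables before running the binary; a minimal sketch (the log file name is an arbitrary choice, and the commented commands assume the trainer was built with the bazel invocation above):

```shell
# Enable the legacy CUDA command-line profiler.
export CUDA_PROFILE=1
export CUDA_PROFILE_LOG=cuda_profile.log

# Then run the trainer and look for the profiler log, e.g.:
#   ./bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu
#   ls -l cuda_profile*
# If the log file appears, the GPU was really exercised.
echo "CUDA_PROFILE=$CUDA_PROFILE"
```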
I got the following log:
I guess it means the GPU was found and used. I can try the CUDA profiler if you think it's useful.
Please prioritize this issue. It is blocking GPU usage on both OS X and AWS's K520, and for many people these are the only environments available.
For reference, here's my very primitive patch to work with CUDA 3.0: https://gist.github.com/infojunkie/cb6d1a4e8bf674c6e38e
@infojunkie I applied your fix, but I got lots of nan's in the computation output:
@markusdr, this is very strange. Could you post the complete steps you used to build the binary? Also, what GPU and OS are you running with? Are you using CUDA 7.0 and cuDNN 6.5 v2?
Just a +1 to fix this problem on AWS as soon as possible. We don't have any other GPU cards for our research.
Hi, not sure if this is a separate issue, but I'm trying to build with a CUDA 3.0 GPU (GeForce 660 Ti) and am getting many errors with --config=cuda. See the attached file below. It seems unrelated to the recommended changes above. I've noticed that it tries to compile a temporary compute_52.cpp1.ii file, which would be the wrong version for my GPU. I'm on Ubuntu 15.10. I modified host_config.h in the CUDA includes to remove the version check on gcc. I'm using CUDA 7.0 and cuDNN 6.5 v2 as recommended, although I have newer versions installed as well.
Yes, I was using CUDA 7.0 and cuDNN 6.5 on an EC2 g2.2xlarge instance with this AMI:
It looks like we are seeing an API incompatibility between compute capability 3.0 and compute capability 3.5. After applying @infojunkie's patch, I stumbled onto this issue:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:643] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Quadro K2100M, pci bus id: 0000:01:00.0)

I run on Ubuntu 15.04, gcc 4.9.2, CUDA Toolkit 7.5, cuDNN 6.5. +1 for having compute capability 3.0 support.
Is cuBLAS installed? And where does it link to?
@allanzelener, what OS and GCC versions do you have? Your errors seem to come from incompatible C++ compilers. It is recommended to use Ubuntu 14.04 and GCC 4.8 with TensorFlow.
@vsrikarunyan, it is better to use CUDA Toolkit 7.0, as recommended. You can install an older CUDA Toolkit alongside your newer one. Just point TensorFlow's "configure" and maybe LD_LIBRARY_PATH to CUDA 7.0 when you run TensorFlow.
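A minimal sketch of what that looks like at runtime, assuming CUDA 7.0 lives at the default /usr/local/cuda-7.0 prefix (adjust the path for your install):

```shell
# Put the CUDA 7.0 libraries first on the dynamic loader path so
# TensorFlow resolves them ahead of any newer toolkit on the machine.
export CUDA_HOME=/usr/local/cuda-7.0
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
echo "$LD_LIBRARY_PATH"
```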
@avostryakov, @infojunkie's early patch should work on AWS. https://gist.github.com/infojunkie/cb6d1a4e8bf674c6e38e An official patch is working its way through the pipeline. It will expose a configuration option to let you choose your compute target, but underneath it makes similar changes. I've tried it on AWS g2 and found that things would only work after I completely uninstalled the NVIDIA driver and reinstalled the latest GPU driver from NVIDIA. Once again, the recommended setting on AWS at this point is the following.
I applied the same patch on a g2.2xlarge instance and got the same result as @markusdr... a bunch of nan's.
@zheng-xq Yes, I'm on Ubuntu 15.10 and I was using GCC 5.2.1. The issue was the compiler. I couldn't figure out how to change the compiler with bazel, but simply installing gcc-4.8 and using update-alternatives to change the symlinks in /usr/bin seems to have worked. (More info: http://askubuntu.com/questions/26498/choose-gcc-and-g-version). Thanks for the help; I'll report back if I experience any further issues.
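For anyone hitting the same thing, the update-alternatives approach from the linked askubuntu answer looks roughly like this (requires sudo; package names assume Ubuntu):

```shell
# Install GCC 4.8 alongside the newer system compiler.
sudo apt-get install -y gcc-4.8 g++-4.8

# Register gcc-4.8/g++-4.8 as alternatives with priority 50, so
# /usr/bin/gcc and /usr/bin/g++ point at the 4.8 binaries.
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 50
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 50

# Confirm the switch took effect.
gcc --version
```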
I did get this to work on a g2.2xlarge instance: the training example ran, and I verified that the GPU was active using the nvidia-smi tool. But when running MNIST's convolutional.py, it ran out of memory. I suspect this just has to do with the batch size and the fact that the AWS GPUs don't have a lot of memory, but I wanted to throw that out there to make sure it sounds correct. To clarify, I ran the following, and it ran for about 15 minutes and then ran out of memory: python tensorflow/models/image/mnist/convolutional.py
@nbenhaim, just what did you have to do to get it to work?
@markusdr, @jbencook, the NaNs are quite troubling. I ran the same thing myself and didn't have any problem. If you use the recommended software setting (Ubuntu 14.04, GCC 4.8, CUDA 7.0 and cuDNN 6.5), then my next guess is the CUDA driver. Could you uninstall and reinstall the latest CUDA driver? This is the sequence I tried on AWS; your mileage may vary: sudo apt-get remove --purge "nvidia*"
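The rest of the sequence was truncated in the comment above; only the first command below is quoted from it, and the remainder is a hedged reconstruction (the runfile name is illustrative; download the current driver from NVIDIA's site first):

```shell
# Purge all existing NVIDIA driver packages (quoted from the comment above).
sudo apt-get remove --purge "nvidia*"

# Hedged reconstruction of the remaining steps: reboot so the old kernel
# module is unloaded, then install a driver runfile downloaded from NVIDIA.
# The version number below is illustrative only.
#   sudo reboot
#   sudo sh ./NVIDIA-Linux-x86_64-352.63.run
```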
Thanks for following up @zheng-xq - I'll give that a shot today. |
Another +1 for supporting pre-3.5 GPUs, as someone else whose only realistic option for training on real data is AWS GPU instances. Even for local testing, turns out my (recent, developer) laptop's GPU doesn't support 3.5 :-( |
Fixing issues tensorflow#23 and tensorflow#25
I have the same problem: "Ignoring gpu device (device: 0, name: GeForce GT 635M, pci bus id) with Cuda compute capability 2.1. The minimum required Cuda capability is 3.0." @smtabatabaie @martinwicke @alphaJatin, help!
Compute capability 2.1 is too low to run TensorFlow. You'll need a newer (or more powerful) graphics card to run TensorFlow on a GPU. |
The URL in the answer to this question is invalid. Can you update it?
For nightly pip packages, the recommended way to install is to use
Are there plans to support Cuda compute capability 3.0?