ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory #17629

Closed
cubetastic33 opened this issue Mar 11, 2018 · 41 comments
Assignees
Labels
subtype: ubuntu/linux Ubuntu/Linux Build/Installation Issues type:build/install Build and install issues

Comments

@cubetastic33

cubetastic33 commented Mar 11, 2018

OS Platform and Distribution:
Linux Ubuntu 17.10

TensorFlow installed using pip
TensorFlow version: 1.6, with GPU support
Python Version: 3.6.4
CUDA version: 9.1
GPU model and memory: NVidia GEForce 940MX 2GB
command to reproduce:
~$ python3

import tensorflow as tf
(basically run any tensorflow program to reproduce)

Problem:
Whenever you run a tensorflow program, you get a huge error log, but the main problem is this:
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
This happens because TensorFlow wants CUDA 9.0, but I have CUDA 9.1. Installing CUDA 9.0 fixes it, but I have a few requests. Since several people have hit this problem (see #15604, #15817), I think TensorFlow could be updated to work with CUDA 9.1 (I think this issue only affects Ubuntu), or the following could be done:
Update the TensorFlow documentation to say that you specifically need CUDA 9.0 for TensorFlow 1.6, CUDA 8.0 for TensorFlow 1.4, and so on.
Also include this in the list of errors at https://www.tensorflow.org/install/install_linux#common_installation_problems.

Edit: If a Pull Request is required to update the documentation, I am fine with doing that.
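
The ImportError comes from the system's dynamic loader rather than from Python, so the same check can be made without TensorFlow. A minimal sketch, assuming a Linux system: `can_load` is a hypothetical helper name, and `ctypes.CDLL` is the standard-library call that asks the loader to open a shared object by name.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the shared library `soname`."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Same failure mode as "cannot open shared object file".
        return False


if __name__ == "__main__":
    # libcublas.so.9.0 is the name from the error message in this issue.
    for name in ("libcublas.so.9.0", "libcublas.so.9.1"):
        print(name, "loadable" if can_load(name) else "NOT loadable")
```

If the 9.1 name loads and the 9.0 name does not, that matches the situation described in this issue.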

@mldm4

mldm4 commented Mar 11, 2018

I proposed a solution that worked for me in #15604

@cubetastic33
Author

@mldm4 I have also replied to you at #15604. The issue is not really that I couldn't get it to work; I know what the problem is. I just need this to be fixed, or mentioned more specifically in the TensorFlow docs (it already is, just not that clearly) and included in the common installation problems.

@ghost

ghost commented Apr 11, 2018

I faced the same problem with the same configuration. When I installed CUDA 9.0, the issue was solved. It seems tensorflow-gpu specifically uses CUDA 9.0.

@jamesredwards

this has been a persistent problem since i first installed tensorflow. it seems totally unable to reference the latest cuda library and instead insists on a specific x.version.

i last managed to correct this with CUDA v8.0 by renaming the file libXXX.so.N to the version it was looking for.

@mikewlange

https://github.com/mikewlange/tensorflow-gpu-install-ubuntu-16.04

was the only way I could install TensorFlow GPU and CUDA 9.1. Trust me - I was about to throw my computer out the window, and that's what did it.

I installed the full anaconda package.
(tensorflow) mike@mike:~$ python
Python 2.7.14 |Anaconda, Inc.| (default, Mar 27 2018, 17:29:31)
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.

import tensorflow as tf
hello = tf.constant('What a pain in the ARSE!')
sess = tf.Session()
print(sess.run(hello))
What a pain in the ARSE!

(tensorflow) mike@mike:~$ nvidia-smi
Sun Apr 22 23:00:55 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30 Driver Version: 390.30 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 00000000:03:00.0 On | N/A |
| 4% 53C P8 9W / 180W | 7762MiB / 8116MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 975 G /usr/lib/xorg/Xorg 480MiB |
| 0 1818 G compiz 274MiB |
| 0 2751 C python 6995MiB |
+-----------------------------------------------------------------------------+

(tensorflow) mike@mike:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

I use Keras as it's so damn simple. But out of protest I may only use Theano as the backend and not TF.
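
The `nvcc -V` output in the comment above is what shows which toolkit is installed (9.1 here). A small sketch of reading the release number out of that text so it can be compared with the version TensorFlow expects; `parse_nvcc_release` is a hypothetical helper, and the sample string is the line quoted above.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract the CUDA release (e.g. '9.1') from `nvcc -V` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


sample = "Cuda compilation tools, release 9.1, V9.1.85"
print(parse_nvcc_release(sample))  # 9.1
```

Note that `nvcc` only reports the toolkit that is first on `PATH`; other toolkit versions may be installed side by side.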

@sebma

sebma commented Apr 25, 2018

@mldm4 Hi, do you mean downgrading CUDA to v9.0?

Could you please be more specific and link directly to the comment with the proposed solution, to save people time?

It would be great!

@mikewlange

mikewlange commented Apr 25, 2018 via email

@sebma

sebma commented Apr 26, 2018

@mikewlange Thanks 😃

@mikewlange

mikewlange commented Apr 26, 2018 via email

@davidblumntcgeo

jamesredwards said: "this has been a persistent problem since i first installed tensorflow. it seems totally unable to reference the latest cuda library and instead insists on a specific x.version.

i last managed to correct this with CUDA v8.0 by renaming the file libXXX.so.N to the version it was looking for."

@jamesredwards, or anyone else who is circumventing this error by renaming files, could you clarify how I'd do this for libcublas.so.9.0? I'm running CUDA v8.0 on Ubuntu 16.04, and tensorflow-gpu is erroring out with "ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory".

In the /usr/local/cuda/lib64/ directory I see libcublas.so, libcublas.so.8.0, libcublas.so.8.0.61 and libcublas.so.8.0.88. I can use sudo mv to rename these files. Just renaming libcublas.so.8.0 to libcublas.so.9.0 does not fix this error. Do I have to do something to the other files as well?

Is tensorflow looking for libcublas.so.9.0 in a different location than /usr/local/cuda/lib64/?

Thanks for any help!
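
On the "where is it looking" question: on Linux the loader consults the directories in `LD_LIBRARY_PATH` along with the system library cache and default directories, so a file in `/usr/local/cuda/lib64/` is only found if that directory is on one of those lists. A minimal sketch for listing which libcublas files exist in a set of directories and what they resolve to; `find_library_files` is a hypothetical helper, and the directory list is an assumption based on the paths named above.

```python
import glob
import os


def find_library_files(directories, stem="libcublas.so"):
    """List files named `stem*` in each directory, with symlink targets resolved."""
    found = []
    for directory in directories:
        for path in sorted(glob.glob(os.path.join(directory, stem + "*"))):
            found.append((path, os.path.realpath(path)))
    return found


if __name__ == "__main__":
    # Directories from LD_LIBRARY_PATH plus the CUDA location named above.
    search = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(":") if d]
    search.append("/usr/local/cuda/lib64")
    for path, target in find_library_files(search):
        print(path, "->", target)
```

Renaming an 8.0 file to a 9.0 name only changes the name; the library inside is still the CUDA 8.0 build, so it should not be expected to satisfy a binary built against 9.0.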

@dashsd

dashsd commented May 12, 2018

This issue is related to Google's protobuf-compiler due to which tensorflow fails to find the shared object file, in this instance, libcublas.so.9.0. Even switching from CUDA 9.1 to 9.0 didn't help as tensorflow was still unable to locate the file.

Building the latest version of protobuf (3.5.0) from source didn't help either. What worked for me was to install the system-wide protobuf compiler through apt install protobuf-compiler on Ubuntu 16.04. And, install the python version through pip3 install protobuf. I am using CUDA 9.0 as 9.1 is not yet compatible with tensorflow's pre-built binary.

You can check the system-wide protobuf version with protoc --version; it is 2.6.1 on 16.04. The Python protobuf package version is 3.5.2.post1. Hope this helps. I had a similar issue using earlier versions of tensorflow and CUDA 8, and had documented this troubleshooting procedure. Using the same procedure, I am able to use tensorflow 1.8.0 too.

@sliute

sliute commented May 20, 2018

What worked for me was the process described at https://medium.com/@taylordenouden/installing-tensorflow-gpu-on-ubuntu-18-04-89a142325138 plus the protobuf bit @dashsd provided above. $PATH and $LD_LIBRARY_PATH all use '/usr/local/cuda-9.0'. The latter contains two separate entries, as described in #16750: one for /usr/local/cuda-9.0/extras/CUPTI/lib64 and another for /usr/local/cuda-9.0/lib64.

I use Ubuntu 18.04 on a Dell XPS 15" with NVIDIA GeForce GTX 1050 (GP107M), driver version 390.48. Tensorflow now runs on CUDA 9.0 and CUDNN 7.0.5.
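
The two `LD_LIBRARY_PATH` entries described above are easy to get half right. A small sketch that reports which of them are missing from a colon-separated path value; `missing_entries` and `REQUIRED` are hypothetical names, and the two directories are the ones quoted in the comment above.

```python
import os

# The two entries named in the comment above (CUDA 9.0 install prefix).
REQUIRED = [
    "/usr/local/cuda-9.0/lib64",
    "/usr/local/cuda-9.0/extras/CUPTI/lib64",
]


def missing_entries(ld_library_path, required=REQUIRED):
    """Return the required directories absent from a colon-separated path."""
    present = {os.path.normpath(p) for p in ld_library_path.split(":") if p}
    return [r for r in required if os.path.normpath(r) not in present]


if __name__ == "__main__":
    print(missing_entries(os.environ.get("LD_LIBRARY_PATH", "")))
```

An empty list means both directories are already on the path in the current shell.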

@agilebean

Hi @shivaniag > it would be great if you can reassign this issue if you have other priorities at the moment.
This issue affects many users and is therefore critical.
Thank you for your caring consideration!

@fay111101

I have the same bug with CUDA 9.2 and cuDNN 7.1.4

@mesargent

Same bug in cuda-9.2 for me as well

@agilebean

Hi @shivaniag you would really help the community if you re-assign this issue...
To All:
Who would be willing to solve this issue?

@dashsd

dashsd commented Jul 23, 2018

TF 1.9 is also built against cuda 9.0. So, it's not going to work with cuda 9.2 unless it is built from source. I had a hard time trying to do it. With all unsuccessful attempts, I switched back to 9.0 for now. However, on Antergos, it's available pre-built with 9.2. It was more convenient to me rather than building it from source. Sorry, there's not much help I can provide regarding this. Some poor souls have successfully built it. Here's one of the links: https://github.com/fo40225/tensorflow-windows-wheel
Hope this helps!!!

@Parnia

Parnia commented Jul 28, 2018

I have the same problem. CUDA 9.2, cuDNN7.1

@angeload

angeload commented Aug 4, 2018

Same problem here. CUDA 9.2, cuDNN 7.1.4

@T0T4R4

T0T4R4 commented Aug 7, 2018

You have to re-compile tensorflow. The builds are not compatible with this combination yet.

We have compiled the latest master branch with
NVIDIA Drivers 396.37
CUDA 9.2,
cuDNN 7.1,
bazel-0.16.0
NCCL v2.2.13,

Package available here (temporarily)
https://drive.google.com/open?id=11E7hBBeAi79xPe7EYfYJZgOZDhIwmrBB

@cgdsss

cgdsss commented Aug 8, 2018

re-compile tensorflow.
@angeload @Parnia @agilebean @fay111101 @mesargent
install bazel:

sudo apt-get install openjdk-8-jdk
echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
sudo apt-get install curl
curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
sudo apt-get update && sudo apt-get install bazel

clone tf code:

git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
git pull
git checkout r1.9 

./configure

Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: enter
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: enter
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: enter
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: enter
Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: enter
Do you wish to build TensorFlow with XLA JIT support? [y/N]: enter
Do you wish to build TensorFlow with GDR support? [y/N]: enter
Do you wish to build TensorFlow with VERBS support? [y/N]: enter
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: enter
Do you wish to build TensorFlow with CUDA support? [y/N]: Y
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.2
Please specify the location where CUDA 9.2 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-9.2
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.1.4
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-9.2]: /usr/local/cuda-9.2
Do you wish to build TensorFlow with TensorRT support? [y/N]: enter
Please specify the NCCL version you want to use. [Leave empty to default to NCCL 1.3]: enter
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1]: enter
Do you want to use clang as CUDA compiler? [y/N]: enter
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: enter
Do you wish to build TensorFlow with MPI support? [y/N]: enter
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: -march=native
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:enter

build:

bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package tensorflow_pkg
cd tensorflow_pkg
sudo pip install tensorflow*.whl

testing:

import tensorflow as tf   
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

This should print Hello, TensorFlow! (shown as b'Hello, TensorFlow!' under Python 3).
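
The `./configure` step above asks for the exact cuDNN version. One way to read it off an existing install is from the version macros in `cudnn.h`. A sketch under stated assumptions: cuDNN 7 headers define `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL`; the header location (for example under the `include/` directory of the CUDA prefix given to configure) is an assumption, and `parse_cudnn_version` is a hypothetical helper.

```python
import re


def parse_cudnn_version(header_text):
    """Return 'MAJOR.MINOR.PATCH' from the text of cudnn.h, or None if not found."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"^\s*#define\s+" + macro + r"\s+(\d+)", header_text, re.M)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


sample = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 4
"""
print(parse_cudnn_version(sample))  # 7.1.4
```

To use it on a real install, read the header file into a string and pass that text to the function.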

@T0T4R4

T0T4R4 commented Aug 10, 2018

Yeah but what a pain.. I mean cuda 9.2 and cudnn 7.1 have been there for quite a while already...
Would be nice to have a ready package...

@dashsd

dashsd commented Aug 15, 2018

Tensorflow 1.10 released and it still requires cuda 9.0!!!!!! Can someone confirm this?

ImportError: Could not find 'cudart64_90.dll'. TensorFlow requires that this DLL be installed in a directory that is named in your %PATH% environment variable. Download and install CUDA 9.0 from this URL: https://developer.nvidia.com/cuda-90-download-archive

@dhruvhacks

Please refer to the supported combinations of CUDA, cuDNN and TensorFlow.

This error is mostly caused by incorrect version combinations of the NVIDIA driver, CUDA, cuDNN and tensorflow-gpu.
image
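
The version pairing can also be written down as a lookup. A sketch that only encodes the TensorFlow/CUDA pairs stated by commenters in this thread; it is not an official compatibility table, and `EXPECTED_CUDA` and `expected_libcublas` are hypothetical names.

```python
# TensorFlow release -> CUDA toolkit the prebuilt GPU wheel expects.
# Only pairs stated in this thread are listed; this is not an official table.
EXPECTED_CUDA = {
    "1.4": "8.0",
    "1.5": "9.0",
    "1.6": "9.0",
    "1.9": "9.0",
    "1.10": "9.0",
}


def expected_libcublas(tf_version):
    """Return the libcublas soname the wheel would look for, or None if unknown."""
    major_minor = ".".join(tf_version.split(".")[:2])
    cuda = EXPECTED_CUDA.get(major_minor)
    return None if cuda is None else "libcublas.so." + cuda


print(expected_libcublas("1.6.0"))   # libcublas.so.9.0
print(expected_libcublas("1.10.0"))  # libcublas.so.9.0
```

A `None` result just means the thread does not say which CUDA version that release was built against.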

@VictorGallagher

Yesterday I received this message.
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
This install is about 2 months old and had been under daily use. I had just completed a training run and ran the script again and received the message. Could someone point me in the right direction, to resolve this issue ? Also I am running this in a virtual environment.
Ubuntu 18.04
Python 3.6.5
tensorflow-gpu==1.10.0
protobuf [required: >=3.4.0, installed: 3.6.0]
gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)
nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 9.0, V9.0.176
NVIDIA-SMI 396.51 Driver Version: 396.51

@mikewlange

mikewlange commented Oct 7, 2018

Try this Victor

I snagged most of these words from Stack Overflow. I've run into something similar.

  1. The directory setup is wonky in your VM.
    First check LD_LIBRARY_PATH and make sure it's pointing to the correct location.

  2. Check the permissions of the folder /usr/local/cuda-9.0; they're probably wrong. Kinda common.

This is how someone who hit item 2 above handled it upon discovery:

I could not even cd to the folder. I changed the owner of cuda-9.0 from root to my-user-name, and after that Python was able to find the missing library.

  3. If you still can't jump to hyperspace, try these things.

This is almost always a missing path in your LD_LIBRARY_PATH. We know.. check it again and proceed.

Find libcublas.so.9.0 on your system (start looking under /usr/local).

If you don't find it, then install the CUDA 9.0 Toolkit https://developer.nvidia.com/cuda-90-download-archive after you read the rest..

NOTE:

With TF 1.5 you want the 9.0 toolkit and not 9.1. Plenty of people who land here installed 9.1; don't.

If you have it, then update your LD_LIBRARY_PATH to point to the appropriate lib directory.

If you've done either of those and are now getting a similar looking error for a cudnn related library, then repeat that process for the CUDNN library.

https://developer.nvidia.com/cudnn

I think the latest version works. Tensorflow depends on both CUDA toolkit and the CuDNN library extension.

Note that you can install all of this in userspace too (sudo is typical, but not required).
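
The checks in this comment (does the directory exist, can the current user enter it, is the file there) can be run in one pass. A minimal sketch, assuming the CUDA 9.0 library directory is `/usr/local/cuda-9.0/lib64`; `diagnose` is a hypothetical helper and the status strings are only illustrative.

```python
import os


def diagnose(lib_dir, soname="libcublas.so.9.0"):
    """Classify why `soname` might not be found in `lib_dir`."""
    if not os.path.isdir(lib_dir):
        return "directory missing"
    if not os.access(lib_dir, os.R_OK | os.X_OK):
        # Matches the case above where the user could not even cd into the folder.
        return "directory not accessible by current user"
    if not os.path.exists(os.path.join(lib_dir, soname)):
        return "library file missing"
    return "ok"


if __name__ == "__main__":
    print(diagnose("/usr/local/cuda-9.0/lib64"))
```

An "ok" result with the import still failing points back at `LD_LIBRARY_PATH` not containing that directory.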

@trananhdung17

trananhdung17 commented Oct 12, 2018

I got the same error message when importing tensorflow, and I have now solved it.
Try
apt-get install cuda-cublas-[cuda-version]
where cuda-version is 9-0, 9-1, etc.
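
The package name uses a dash where the CUDA version uses a dot. A tiny sketch of that mapping, assuming the `cuda-cublas-<major>-<minor>` naming given in the comment above and that NVIDIA's apt repository is already configured; `cublas_package` is a hypothetical helper.

```python
def cublas_package(cuda_version):
    """Build the apt package name, e.g. '9.0' -> 'cuda-cublas-9-0'."""
    major, minor = cuda_version.split(".")[:2]
    return "cuda-cublas-{}-{}".format(major, minor)


print("sudo apt-get install " + cublas_package("9.0"))
# sudo apt-get install cuda-cublas-9-0
```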

@VictorGallagher

mikewlange, thanks for pointing me in the right direction. Your resources worked. It seems that "wonky" is the right term to describe this problem with the pip virtual environment.

@mikewlange

Thanks. Just a word to others trying to do all this. It's obvious from watching this thread: if you don't know what you're doing from the get-go, i.e., you're not an AI engineer who knows specifically why they need CUDA, take heed. CUDA is a mess with TensorFlow (and TF is a mess on its own).

You're wasting time here if you just want to check out CUDA. Think about the ML/AI problem you're trying to solve and work accordingly. Don't just install this monster if you have no idea how to use it.

If you want to just learn, use docker for the love of god. There are a million images that can work. Try Kaggle’s AI image. Ready to go with no nonsense like this. Masochists you all are. Lol.

@mechanicalAI

mechanicalAI commented Nov 11, 2018

Well, if you don't want my help. @elithrar please come up with a better solution?

@mechanicalAI

And for the love of god. If you don't know why you need TF, stop using it!!! https://medium.com/@julsimon/apache-mxnet-support-in-keras-83de7dec46e5 Convert your TF model to MXNet: https://github.com/Microsoft/MMdnn and then get to work!!!!

@mechanicalAI

mechanicalAI commented Nov 11, 2018

ALL READ: -

https://github.com/awslabs/keras-apache-mxnet is your TF replacement. You won't look back.... trust me. Been at this for 24 years (software engineering in general) and have built anything you've seen on the net. And guess what, without TF. Worked at hedge funds to build prop trading tools and algos, search and rescue 100% autonomous drones, iOS slot machines, surgical tools, etc.

All without TF. you are correct no CUDA needed either.

Honestly, anything you need to do, you can do in PyTorch. Or,

  1. Python for data munging
  2. https://pytorch.org/docs/stable/index.html to build your model

OR - easier...

  1. Convert your TF models to MXNet -
  2. Find your converter: https://github.com/mechanicalAI/deep-learning-model-convertor https://github.com/Microsoft/MMdnn#conversion

OR, here is what sane people do who have to do this for a living....

  1. Data pro? Munge that data like no one's business?
  2. Keras!!! OR https://github.com/mechanicalAI/autokeras or toss the GPU and learn

1. Find the model yourself - example:
image

OR -

or https://github.com/mikewlange/KETTLE
or https://github.com/mechanicalAI/tpot OR your best bet
https://github.com/mechanicalAI/h2o4gpu <- drop-in replacement for scikit-learn with GPU support - https://github.com/mechanicalAI/h2o4gpu#requirements

Good luck and don't let yourself get stuck for more than 1:30 min. Draw a line..

@kimsan0622

kimsan0622 commented Nov 20, 2018

Unfortunately, as of 2018-11-20, TensorFlow and PyTorch support CUDA 9.0 and CUDA 9.2 respectively,
and CUDA 9.0 supports Ubuntu 16 and 17. So for now, if you want to use TensorFlow or PyTorch, you should install Ubuntu 16.04 or 17,
or compile TensorFlow from source on your machine.

Let's try this.
OS = ubuntu 16 or 17
CUDA = 9.0
cudnn >=7.2
tensorflow, pytorch
etc.

The following images show TensorFlow GPU support and PyTorch GPU support, respectively.

2018-11-20 10 13 31

2018-11-20 10 14 16

@cbasavaraj

Hi all,
I use PyTorch as much as possible, but for a particular project where i need to export a (Keras) model to tensorflowjs, I'm forced to use tf. The only solution which has worked well for me has been to build from source, after installing CUDA from the Ubuntu multiverse, as described here:

https://medium.com/@asmello/how-to-install-tensorflow-cuda-9-1-into-ubuntu-18-04-b645e769f01d

Bonne chance!

@KostasBitsakos

February 2019, still no concrete solution.

@Emile0205

Emile0205 commented Feb 9, 2019

This issue doesn't seem to be getting enough attention. Why is tensorflow searching for libcublas.so.9.0 when CUDA 10.0 is installed? I installed tensorflow and ran some programs, but it seemed most of the GPU memory wasn't being allocated to tensorflow, so I rebooted my PC and then I received this error.

@pauldadzie

@Emile0205 I agree!
I tried using TF 1.12.0 with CUDA 10.0, and it seemed to work fine. Then I changed to TF-gpu 1.12.0 and I get this error. Why is this still an issue almost a year after the thread was created?
I read in another thread that this problem has been around since TF 1.5!

@jgmeyerucsd

jgmeyerucsd commented Feb 15, 2019

I had this problem trying to use TensorFlow within a conda environment. TensorFlow worked fine with the standard install in my base Python 3.6, but not in the conda environment that I use for Python.
import tensorflow as tf
I got the same error.

Here is what worked for me. I checked that the file actually existed in /usr/local/cuda-9.0/lib64 and found libcublas.so.9.0 there.
Then I added the directory to LD_LIBRARY_PATH in my ~/.bashrc:
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-9.0/lib64/' >> ~/.bashrc
Restarted my terminal and built the conda environment:
conda create -n tf-gpu python=3.6 tensorflow-gpu
source activate tf-gpu
conda install ipykernel
python -m ipykernel install --user --name tf-gpu --display-name "tf-gpu"
source deactivate
jupyter notebook

and then made a new notebook that uses the tf-gpu kernel, and:
import tensorflow as tf
h = tf.constant('hellow')
sess = tf.Session()
print(sess.run(h))

Worked
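
Appending with `echo ... >> ~/.bashrc` adds another copy of the line every time it is run. A sketch of an idempotent version of that step, assuming the same CUDA 9.0 library directory as above; `ensure_line` and `EXPORT_LINE` are hypothetical names, and the demo writes to a scratch file rather than a real `~/.bashrc`.

```python
import os
import tempfile

EXPORT_LINE = 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda-9.0/lib64"'


def ensure_line(path, line=EXPORT_LINE):
    """Append `line` to the file at `path` unless it is already present.

    Returns True if the file was modified.
    """
    content = ""
    if os.path.exists(path):
        with open(path) as handle:
            content = handle.read()
    if line in content.splitlines():
        return False
    with open(path, "a") as handle:
        if content and not content.endswith("\n"):
            handle.write("\n")  # keep the new line separate from the last one
        handle.write(line + "\n")
    return True


if __name__ == "__main__":
    demo = os.path.join(tempfile.mkdtemp(), "bashrc")
    print(ensure_line(demo))  # True
    print(ensure_line(demo))  # False
```

For a real shell profile, pass `os.path.expanduser("~/.bashrc")` as `path` and open a new terminal afterwards.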

@Evarin

Evarin commented Feb 20, 2019

I had the same problem (with CUDA 9.2), and somehow got it solved ~magically using conda install tensorflow-gpu (after uninstalling it with pip), rather than pip install tensorflow-gpu. It does not sound like a real fix, but it may help :)

@chankim

chankim commented Feb 27, 2019

Hi, folks! Why all this fuss? Read below!
If you just downgrade from cuda 9.2 to cuda 9.0 (and the driver too), you will have other problems when installing new software packages. The right solution is just to add the cuda 9.0 toolkit leaving the cuda 9.2 intact.

After you have CUDA 9.2 and the corresponding newest driver, you can just add the CUDA 9.0 toolkit using the .run file installer. While installing CUDA 9.0 alongside it, say 'no' when you are asked 'do you want to install the driver?'; the driver will remain intact and only the CUDA 9.0 toolkit will be installed. CUDA 9.0 works fine with the driver for CUDA 9.2 (drivers are backward compatible). When you want to use tensorflow, just change the symbolic link /usr/local/cuda to point to /usr/local/cuda-9.0 instead of /usr/local/cuda-9.2.
If you are on Ubuntu 16.04 or higher, you don't have to worry about the nouveau driver, nomodeset, or nvidia-xconfig during the .run file install. Just run the .run file and it installs cleanly. (Install all the patches too.)

See https://devtalk.nvidia.com/default/topic/493290/multiple-cuda-versions-can-they-coexist-/
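
Switching the `/usr/local/cuda` link between toolkit versions can be done without a moment where the link is missing. A minimal sketch, assuming both toolkits are installed side by side as described above; `point_symlink` is a hypothetical helper, and the demo uses a scratch directory because changing the real link needs root.

```python
import os
import tempfile


def point_symlink(link_path, target):
    """Make `link_path` a symlink to `target`, replacing any old link in one step."""
    temp_link = link_path + ".new"
    if os.path.lexists(temp_link):
        os.remove(temp_link)
    os.symlink(target, temp_link)
    os.replace(temp_link, link_path)  # rename over the old link


if __name__ == "__main__":
    # Scratch layout standing in for /usr/local/cuda-9.0 and /usr/local/cuda-9.2.
    root = tempfile.mkdtemp()
    for version in ("9.0", "9.2"):
        os.mkdir(os.path.join(root, "cuda-" + version))
    link = os.path.join(root, "cuda")
    point_symlink(link, os.path.join(root, "cuda-9.2"))
    point_symlink(link, os.path.join(root, "cuda-9.0"))
    print(os.readlink(link))
```

This only works when `link_path` is already a symlink or does not exist; if `/usr/local/cuda` is a real directory, the rename will fail rather than overwrite it.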

@jvishnuvardhan jvishnuvardhan added subtype: ubuntu/linux Ubuntu/Linux Build/Installation Issues type:build/install Build and install issues labels Jun 7, 2019
@jvishnuvardhan
Contributor

Automatically closing this out since I understand it to be resolved, but please let me know if I'm mistaken. Thanks!
