
GPU support #52

Open
kormoczi opened this issue May 31, 2021 · 13 comments
Assignees
Labels
question Further information is requested

Comments

kormoczi commented May 31, 2021

Hi,
I am using the Marian models for translation.
It works fine, but I assume it runs only on the CPU.
(I am using the following code:
pipe_translate = nlu.load('hu.translate_to.en')
translate = pipe_translate.predict("Sziasztok, mi a helyzet?")
The predict step takes about 5 seconds, and I have an A100 GPU; I don't think it should take this long...)
I can't figure out how to use the GPU, or how to check whether it is being used...
(print(tf.test.gpu_device_name()) shows that the GPU is there...)
Where can I find some documentation/info about this?
I had some issues with the CUDA and Java installation, but right now these look fine...

Thanks

C-K-Loan (Member) commented May 31, 2021

Hi @kormoczi,

you can call nlu.load('any model', gpu=True), which will enable GPU mode for NLU.

Make sure you enable GPU mode in the very first call to NLU, otherwise it will not be enabled.
Keep in mind that the translator models are quite big and thus slow, even in GPU mode, but upgrades are planned.
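To answer the "how to check if it uses the GPU" part: one quick sanity check, independent of NLU, is to ask the driver directly (a minimal sketch assuming the NVIDIA driver utilities are installed; `nvidia-smi -L` lists the visible devices):

```python
import shutil
import subprocess

def gpu_visible() -> bool:
    """Return True if nvidia-smi is on PATH and reports at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # NVIDIA driver tooling not installed, or not on PATH
    try:
        # "nvidia-smi -L" prints one line per device, e.g. "GPU 0: A100-PCIE-40GB (...)"
        out = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return out.returncode == 0 and "GPU" in out.stdout

print(gpu_visible())
```

Running plain `nvidia-smi` in another terminal while a prediction is in flight also shows whether the GPU is actually being utilized, not just visible.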

C-K-Loan added the question label May 31, 2021
C-K-Loan self-assigned this May 31, 2021
kormoczi (Author) commented Jun 7, 2021

After some struggles with CUDA/Python/Ubuntu versions, I finally think the basic system is fine; I could run some basic tests on the GPU.
But with NLU I still have problems.
The loading of the CUDA libraries looks fine, but then I receive multiple errors; the first one is this:

2021-06-07 12:46:56.905136: E external/org_tensorflow/tensorflow/core/common_runtime/session.cc:91] Failed to create session: Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid
2021-06-07 12:46:56.905182: E external/org_tensorflow/tensorflow/c/c_api.cc:2184] Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid
[ | ]21/06/07 12:46:56 ERROR Instrumentation: org.tensorflow.exceptions.TensorFlowException: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid
at org.tensorflow.internal.c_api.AbstractTF_Status.throwExceptionIfNotOK(AbstractTF_Status.java:101)
at org.tensorflow.Session.allocate(Session.java:576)
at org.tensorflow.Session.<init>(Session.java:97)
at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.read(TensorflowWrapper.scala:317)
at com.johnsnowlabs.ml.tensorflow.ReadTensorflowModel.readTensorflowModel(TensorflowSerializeModel.scala:127)
at com.johnsnowlabs.ml.tensorflow.ReadTensorflowModel.readTensorflowModel$(TensorflowSerializeModel.scala:103)
at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLModel$.readTensorflowModel(SentenceDetectorDLModel.scala:338)
at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.ReadsSentenceDetectorDLGraph.readSentenceDetectorDLGraph(SentenceDetectorDLModel.scala:320)
at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.ReadsSentenceDetectorDLGraph.readSentenceDetectorDLGraph$(SentenceDetectorDLModel.scala:318)
at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLModel$.readSentenceDetectorDLGraph(SentenceDetectorDLModel.scala:338)
at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.ReadsSentenceDetectorDLGraph.$anonfun$$init$$1(SentenceDetectorDLModel.scala:324)
at com.johnsnowlabs.nlp.annotators.sentence_detector_dl.ReadsSentenceDetectorDLGraph.$anonfun$$init$$1$adapted(SentenceDetectorDLModel.scala:324)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1(ParamsAndFeaturesReadable.scala:31)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1$adapted(ParamsAndFeaturesReadable.scala:30)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.onRead(ParamsAndFeaturesReadable.scala:30)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1(ParamsAndFeaturesReadable.scala:41)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1$adapted(ParamsAndFeaturesReadable.scala:41)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:19)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:8)
at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$5(Pipeline.scala:277)
at org.apache.spark.ml.MLEvents.withLoadInstanceEvent(events.scala:162)
at org.apache.spark.ml.MLEvents.withLoadInstanceEvent$(events.scala:157)
at org.apache.spark.ml.util.Instrumentation.withLoadInstanceEvent(Instrumentation.scala:42)
at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$4(Pipeline.scala:277)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$load$3(Pipeline.scala:274)
at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
at org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:268)
at org.apache.spark.ml.PipelineModel$PipelineModelReader.$anonfun$load$7(Pipeline.scala:356)
at org.apache.spark.ml.MLEvents.withLoadInstanceEvent(events.scala:162)
at org.apache.spark.ml.MLEvents.withLoadInstanceEvent$(events.scala:157)
at org.apache.spark.ml.util.Instrumentation.withLoadInstanceEvent(Instrumentation.scala:42)
at org.apache.spark.ml.PipelineModel$PipelineModelReader.$anonfun$load$6(Pipeline.scala:355)
at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:355)
at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:349)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadPipeline(ResourceDownloader.scala:395)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadPipeline(ResourceDownloader.scala:389)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.downloadPipeline(ResourceDownloader.scala:499)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadPipeline(ResourceDownloader.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)

Do you have any idea or suggestion?
Thanks!

C-K-Loan (Member) commented Jun 7, 2021

Thank you for sharing, taking a closer look

C-K-Loan reopened this Jun 7, 2021
C-K-Loan (Member) commented Jun 8, 2021

Hi @kormoczi,
can you share the specific Python code you ran that causes this issue?

kormoczi (Author) commented Jun 9, 2021

Hi @C-K-Loan
Sure, this is the Python code:

import nlu
input("Step #1 - nlu.load - Press Enter to proceed...")
pipe_translate_hu_en = nlu.load('hu.translate_to.en', gpu=True)
input("Step #2 - pipe_translate_hu_en.predict - Press Enter to proceed...")
translate_output = pipe_translate_hu_en.predict("Sziasztok, mi a helyzet?")
print(translate_output)
input("Step #3 - delete pipe - Press Enter to proceed...")
del pipe_translate_hu_en

The error comes up already during nlu.load.

I am not sure, but this error message ("Status: device kernel image is invalid") looks similar to an issue I had recently with another project. That was a PyTorch-based project, and I had to match the CUDA and torch versions and install the torch build with the appropriate CUDA support.
I think the tensorflow-gpu version should likewise match the CUDA version, but I am not sure how to check this within NLU. I can see this jar file in the cache: com.johnsnowlabs.nlo_tensorflow-gpu_2.12-0.2.2.jar, but the version number looks a little strange to me... (I have tried different CUDA versions, but so far without success...)

Thanks for your help!

C-K-Loan (Member) commented Jun 9, 2021

Hi @kormoczi,
can you share which CUDA version you are running?
You should have CUDA 11.2.
I just tested on Google Colab and it works fine: https://colab.research.google.com/drive/1woxdvCSk7u_yhXrhCNx37L4OVC2tO47i?usp=sharing (if you click Runtime at the top, you can switch to a GPU runtime and test it).

Let me know if you still have trouble after installing CUDA 11.2.

kormoczi (Author) commented Jun 9, 2021

Hi @C-K-Loan,
Until now I have tested with CUDA 11.2.2 and CUDA 10.1.
The problem with CUDA 11.2.2 (and 11.2 as well) is that I get a lot of errors about missing libraries (like libcudart.so.10.1, libcudnn.so.7, etc.), which is why I thought I should use CUDA 10.1.
By the way, I am trying to install nlu with the following versions: openjdk-8-jre, pyspark==3.0.1, nlu==3.0.2.
I can see that the install (and the versions) in the Colab notebook are different; I will try to reproduce those settings on my machine...
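One way to narrow down which of these libraries is actually resolvable on the machine is to try loading them directly through the dynamic linker (a minimal standard-library sketch; the library names are the ones from the error messages above):

```python
import ctypes

def can_load(libname: str) -> bool:
    """Return True if the dynamic linker can resolve and load the shared library."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False

# The libraries TensorFlow 2.3.x complains about when they are missing:
for lib in ("libcudart.so.10.1", "libcudnn.so.7"):
    print(lib, "->", can_load(lib))
```

If one of these prints False, the corresponding CUDA/cuDNN version is not on the linker's search path, regardless of what nvcc or nvidia-smi report.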

maziyarpanahi (Member) commented Jun 9, 2021

Since NLU is based on Spark NLP, the GPU requirements are:

  • Spark NLP 3.x is based on TensorFlow 2.3.1, so the GPU requirements are:
    -- CUDA 10.1
    -- cuDNN 7.x
  • Spark NLP 3.1 is based on TensorFlow 2.4.1, so the GPU requirements are:
    -- CUDA 11
    -- cuDNN 8.0.2

Since the latest NLU is on Spark NLP 3.x, you should go with the first option. Make sure you follow the TensorFlow instructions for installing/setting up the GPU correctly, especially the LD_LIBRARY_PATH env variable.
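A small helper can confirm that some directory on LD_LIBRARY_PATH actually contains the library TensorFlow needs (a sketch; libcudart.so.10.1 is the file name the TensorFlow 2.3.x build looks for, as the logs below show):

```python
import os

def dirs_containing(libname: str, path_var: str = "LD_LIBRARY_PATH") -> list:
    """Return the directories listed in path_var that contain the given file."""
    hits = []
    for d in os.environ.get(path_var, "").split(os.pathsep):
        if d and os.path.isfile(os.path.join(d, libname)):
            hits.append(d)
    return hits

# An empty list means the loader will not find the CUDA 10.1 runtime.
print(dirs_containing("libcudart.so.10.1"))
```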

PS: As for why Google Colab with CUDA 11.x works with Spark NLP 3.x: they simply have all the CUDA 10.1, 10.2, and 11.x dynamic libraries available in the path, so Spark NLP finds them regardless of the default CUDA version. You should be able to see something like this:

external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-05-14 16:23:59.532205: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-05-14 16:23:59.534978: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-05-14 16:23:59.535568: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-05-14 16:23:59.538122: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-05-14 16:23:59.539839: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-05-14 16:23:59.545371: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-05-14 16:23:59.545456: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-14 16:23:59.546220: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-14 16:23:59.546892: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-05-14 16:23:59.546933: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-05-14 16:24:00.207163: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-14 16:24:00.207208: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2021-05-14 16:24:00.207215: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2021-05-14 16:24:00.207432: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-14 16:24:00.208113: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-14 16:24:00.208840: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-14 16:24:00.209546: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14766 MB memory) -> physical GPU (device: 0, name: NVIDIA Tesla V100-SXM2-16GB, pci bus id: 0000:00:04.0, compute capability: 7.0)

Also, this is a nice thread to read when GPU setup becomes tricky: https://spark-nlp.slack.com/archives/CA118BWRM/p1620933399356800

kormoczi (Author) commented Jun 9, 2021

Sorry, I am a little bit confused right now.
@C-K-Loan said I should have CUDA 11.2; @maziyarpanahi, you said I should have CUDA 10.1.
Anyhow, I have tried both...
With CUDA 11.2, even the libraries did not load.
With CUDA 10.1, it looks like the libraries did load, but then I receive the error I mentioned before ("Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid").
Maybe the problem is that I am not explicitly installing/setting up TensorFlow? It is installed by nlu itself during the first run...

kormoczi (Author) commented Jun 9, 2021

@maziyarpanahi
By the way, how can I request access to the thread you mentioned?
When I click the link you provided, I just get an error: "doesn’t have an account on this workspace".
Thanks!

C-K-Loan (Member) commented Jun 9, 2021

@kormoczi my bad, what @maziyarpanahi suggested is correct.
The current version of NLU is based on Spark NLP 3.x, which means you need CUDA 10.1 and cuDNN 7.x.

This most likely looks like a TensorFlow installation issue.

Maybe try verifying that TensorFlow has access to the GPU: https://stackoverflow.com/questions/38009682/how-to-tell-if-tensorflow-is-using-gpu-acceleration-from-inside-python-shell

The thread posted by @maziyarpanahi is visible once you join the Slack workspace: https://join.slack.com/t/spark-nlp/shared_invite/zt-lutct9gm-kuUazcyFKhuGY3_0AMkxqA. It is our community, with over 2000 people helping each other.

Hope this helps

kormoczi (Author) commented Jun 9, 2021

@C-K-Loan
Thanks, I could join the Slack channel.

I have double-checked the TensorFlow install (as described in the link you provided), using the following Python script:

import tensorflow as tf
input("Step #1 - Verify that Tensorflow loads - Press Enter to proceed...")
print(tf.__version__)
input("Step #2 - Verify that GPU is seen by Tensorflow - Press Enter to proceed...")
print('Num GPUs Available: ', len(tf.config.experimental.list_physical_devices('GPU')))
input("Step #3 - Verify that GPU is seen by Tensorflow - Press Enter to proceed...")
print('Num GPUs Available: ', len(tf.config.list_physical_devices('GPU')))

And it looks ok, this is the output:

2021-06-09 11:44:03.086382: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Step #1 - Verify that Tensorflow loads - Press Enter to proceed...
2.3.1
Step #2 - Verify that GPU is seen by Tensorflow - Press Enter to proceed...
2021-06-09 11:44:06.328576: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-06-09 11:44:06.387802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:86:00.0 name: A100-PCIE-40GB computeCapability: 8.0
coreClock: 1.41GHz coreCount: 108 deviceMemorySize: 39.59GiB deviceMemoryBandwidth: 1.41TiB/s
2021-06-09 11:44:06.387848: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-06-09 11:44:06.390328: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-06-09 11:44:06.392879: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-06-09 11:44:06.393314: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-06-09 11:44:06.395953: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-06-09 11:44:06.397420: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-06-09 11:44:06.403439: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-06-09 11:44:06.406652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
Num GPUs Available:  1
Step #3 - Verify that GPU is seen by Tensorflow - Press Enter to proceed...
Num GPUs Available:  1

But nlu still does not work; this script:

import nlu
input("Step #1 nlu.load - Press Enter to proceed...")
pipe_translate_hu_en = nlu.load('hu.translate_to.en', gpu=True)
input("Step #2 pipe.predict - Press Enter to proceed...")
translate_output = pipe_translate_hu_en.predict("Sziasztok, mi a helyzet?")
print(translate_output)
input("Step #3 del pipe - Press Enter to proceed...")
del pipe_translate_hu_en

gives the error mentioned at the beginning of this thread (the error occurs during nlu.load).

But I think these are not the same TensorFlow installs; the first one is installed with pip, the second one is installed by nlu as a jar package...
These are the version numbers of the main jar packages installed by nlu:

  • spark-nlp-gpu_2.12/3.0.3/spark-nlp-gpu_2.12-3.0.3.jar
  • tensorflow-gpu_2.12/0.2.2/tensorflow-gpu_2.12-0.2.2.jar

Do these versions look ok? Or is there a way to use other versions here?

By the way, I have now installed nlu based on the colab_setup.sh script.

The value of LD_LIBRARY_PATH was "/usr/local/nvidia/lib:/usr/local/nvidia/lib64"; I have replaced it with "/usr/local/cuda/lib64", but no change either (there is no directory named /usr/local/nvidia).
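To see exactly which Spark NLP / TensorFlow jars ended up in the local cache, one can scan for them recursively (a sketch; ~/.ivy2/jars is an assumption based on Spark's default Ivy cache location, so adjust the root if your cache lives elsewhere):

```python
import pathlib

def find_jars(root: str, pattern: str = "*tensorflow-gpu*.jar") -> list:
    """Recursively collect jar file paths matching pattern under root."""
    base = pathlib.Path(root).expanduser()
    if not base.is_dir():
        return []
    return sorted(str(p) for p in base.rglob(pattern))

for jar in find_jars("~/.ivy2/jars") + find_jars("~/.ivy2/jars", "*spark-nlp-gpu*.jar"):
    print(jar)
```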

C-K-Loan (Member) commented Jul 17, 2021

Hi @kormoczi
sorry for the late reply.

Could you test a couple of other TensorFlow-based models and see if this error occurs? I.e., please try:

nlu.load('bert', gpu=True)
nlu.load('elmo', gpu=True)
nlu.load('xlnet', gpu=True)

Please let me know if you get the same errors, or if this only happens with translate.

This could be related to the SentenceDetectorDLModel, which appears in the original stack trace.

Alternatively, please try the following:

pipe_translate = nlu.load('hu.translate_to.en')
print(pipe_translate.components)  # inspect the pipeline components
pipe_translate.components.remove(pipe_translate.components[1])  # drop the SentenceDetectorDL
pipe_translate.predict('Hello world', output_level='document')

This will remove the SentenceDetectorDL, which is causing the error in the pipeline for you.

3 participants