
no kernel image is available for execution on the device #806

Open
liu1352183717 opened this issue Apr 24, 2024 · 10 comments

@liu1352183717

[ctranslate2] [thread 10436] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.
Detected language 'zh' with probability 1.000000
Traceback (most recent call last):
File "C:\Users\Administrator\Desktop\pythonProject\main.py", line 23, in <module>
for segment in segments:
File "C:\Users\Administrator\.conda\envs\faster_whisper\lib\site-packages\faster_whisper\transcribe.py", line 1106, in restore_speech_timestamps
for segment in segments:
File "C:\Users\Administrator\.conda\envs\faster_whisper\lib\site-packages\faster_whisper\transcribe.py", line 511, in generate_segments
encoder_output = self.encode(segment)
File "C:\Users\Administrator\.conda\envs\faster_whisper\lib\site-packages\faster_whisper\transcribe.py", line 762, in encode
return self.model.encode(features, to_cpu=to_cpu)
RuntimeError: parallel_for failed: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device
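
For reference, a minimal diagnostic sketch (assuming the pip-installed ctranslate2 from this environment is importable) that reports what the installed CTranslate2 build supports on this GPU:

# Hedged diagnostic sketch: check the installed CTranslate2 build against the local GPU.
# Assumes ctranslate2 is the pip package installed alongside faster-whisper.
import ctranslate2

print("CTranslate2 version:", ctranslate2.__version__)
print("CUDA devices visible:", ctranslate2.get_cuda_device_count())
# If this raises or only lists CPU-friendly types, the wheel's CUDA kernels
# likely do not cover this GPU's compute capability.
print("Supported CUDA compute types:", ctranslate2.get_supported_compute_types("cuda"))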

@Purfview
Contributor

Purfview commented Apr 24, 2024

What is your GPU model?

@liu1352183717
Author

GeForce GTX 950

@LeonVeganMan

Hi,

Same problem here with a Tesla M40, CUDA 12.4.

Thanks.
Ciao.
L.

@laraws

laraws commented May 10, 2024

Same issue; my GPU is a GTX 950M.

@seclog

seclog commented May 12, 2024

I had the same problem and solved it. The version of CTranslate2 bundled with faster-whisper is too new and no longer supports older CUDA GPUs, such as compute capability 5.0 cards. Either replace the ctranslate2.dll in the faster-whisper installation directory with the one from CTranslate2 3.24, or download the CTranslate2 4.2 source, compile a DLL for your native CUDA architecture, and use it to replace ctranslate2.dll in the faster-whisper installation directory.
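
A quick way to confirm which CTranslate2 build is actually in use before swapping any files (a sketch, assuming it was installed via pip alongside faster-whisper):

# Sketch, not an official fix: report the CTranslate2 version in this environment.
# Per this thread, 4.x builds no longer ship kernels for compute capability 5.0 cards,
# so downgrading (e.g. pip install ctranslate2==3.24.0) is one workaround to try.
import ctranslate2
print(ctranslate2.__version__)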

@laraws

laraws commented May 13, 2024

Thanks for your reply. I tried it, but it didn't work.

@seclog

seclog commented May 16, 2024

The GTX 950M has compute capability 5.0, so you have the same class of hardware as me. Attached is my own ctranslate2.dll, compiled on Windows 11 against cuDNN 9; you can try replacing the ctranslate2.dll in the faster-whisper script directory with it. Make sure you install a recent graphics driver, CUDA Toolkit 12, and cuDNN 9 (which supports CUDA 12), otherwise the other DLL dependencies will fail to load. This DLL may not work on other cards because it was built only for compute capability 5.0, so the safest option is still to fall back to the CTranslate2 3.24 DLL.
ctranslate2_cuda5.0.zip

(screenshots attached: fastwhisper2024-05-17 044350, fastwhisper2024-05-17 2044350)

@laraws

laraws commented May 17, 2024

Thank you for your reply. My system is Ubuntu; can I still use the "ctranslate2.dll"?

@risacher

risacher commented May 25, 2024

@laraws No, a .dll is a Windows shared library and will not work on Ubuntu. I have also been trying to get faster-whisper to work on my Ubuntu 22.04 server with a GTX 750 Ti (which is compute capability 5.0, like others in this thread.) As noted above, the version of CTranslate2 bundled with faster-whisper does not support this, so I recompiled it from source.

This was slightly tricky as I wanted to be able to run on both CPU and GPU. CPU support requires Intel MKL, and the 22.04 package for that is missing a pkg-config file.

So, for reference of anyone else trying this, I configured CTranslate2 with: cmake .. -DWITH_MKL=ON -DWITH_CUDA=ON -DWITH_CUDNN=ON -DMKL_ROOT=/usr -DMKL_INCLUDE_DIR=/usr/include/mkl. In order for cmake to auto-detect the compute-capability, cmake must be run on the machine with the GPU and the NVIDIA driver and driver API must be installed and match. If you are compiling on a different machine you should be able to add -DCMAKE_CUDA_ARCHITECTURES=50 to force it to use compute capability 5.0.

Be sure to add /usr/local/lib to $LD_LIBRARY_PATH and /usr/local/bin to $PATH, and delete the ctranslate2.so files installed in ~/.local/lib/python3.10/site-packages/. (I think they were in site-packages/ctranslate2.libs/ or something like that). I think to compile CTranslate2 I also had to have CUDA installed and nvcc must be in the PATH, so I needed to add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib64 to LD_LIBRARY_PATH.
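
For what it's worth, a small sanity check (my own sketch, assuming the rebuilt library is installed under /usr/local/lib) to confirm the stale bundled copy is no longer the one Python picks up:

# Hypothetical sanity check: confirm which ctranslate2 package Python loads
# and that LD_LIBRARY_PATH points at the rebuilt shared library.
import os
import ctranslate2

print("ctranslate2 package location:", ctranslate2.__file__)
print("LD_LIBRARY_PATH:", os.environ.get("LD_LIBRARY_PATH", "(not set)"))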

All that said, when I tried to run the sample code on the GPU with a 30-second sample, it segfaulted, so I'd be interested if anyone else gets it to work. It works on the CPU, but too slow for my application.

@risacher

I also note that CTranslate2 apparently thinks the GTX 750 Ti only really supports float32, which I determined by running:

import ctranslate2
t = ctranslate2.get_supported_compute_types("cuda")
print(f"get_supported_compute_types(\"cuda\"): {t}")

As a result I requantized the model to float32 like this:

ct2-transformers-converter --model openai/whisper-small.en --output_dir f32-whisper-small.en --copy_files tokenizer.json --quantization float32
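
If it helps anyone, a minimal sketch (assuming the converted directory above is passed to faster-whisper as a local model path) of loading that float32 model:

# Sketch: load the locally converted float32 model and force float32 on the GPU.
from faster_whisper import WhisperModel

model = WhisperModel("f32-whisper-small.en", device="cuda", compute_type="float32")
segments, info = model.transcribe("sample.wav")  # "sample.wav" is a placeholder audio file
for segment in segments:
    print(segment.start, segment.end, segment.text)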
