
Add LLamaSharp.Backend.Vulkan #3 #517

Open
wants to merge 4 commits into master
Conversation

moozoo64

This PR adds the llama.cpp Vulkan backend.

@martindevans
Collaborator

martindevans commented Feb 18, 2024

Does this supersede #514?

Just in case you're not aware, you can always amend a PR by pushing to the branch; you don't need to re-open new ones :)

Edit: Just saw your comment on the other PR, never mind!

@martindevans
Collaborator

Note to self: the failing tests are expected because the binaries are missing from this PR. Base the next binary-update PR on this branch.

@moozoo64
Author

Just some general comments:
I have only tested on Windows with an AMD Radeon VII, so I'd ask that others test this PR in order to validate it; I only know that it works for me and my simple use case.
I've not tested under Linux, but I believe it has a high chance of working.

On a system with CUDA, the CUDA library should precede the Vulkan library in the load order and hence be used.
In theory, Vulkan could be made to work on macOS using Vulkan-to-Metal translation (e.g. MoltenVK), but I don't see the point.

The llama.cpp Vulkan backend fully loads the model into VRAM; I believe the CLBlast backend doesn't, and neither does Kompute.
In my testing, the Vulkan backend is also much faster than CLBlast.
Benchmarks on the llama.cpp site suggest the Vulkan backend is about 60% to 90% as fast as the hipBLAS backend; however, hipBLAS supports a much more limited set of graphics cards.

With
private bool _useVulkan = true;
the presence of Vulkan will be checked via GetVulkanVersion().
This uses the command "vulkaninfo --summary", which on Windows with an AMD graphics card is installed along with the driver.
Ditto with Nvidia cards; I don't know about Intel, but I assume so.
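
For illustration, here is a minimal sketch of what such a check might look like (hypothetical names; the PR's actual GetVulkanVersion() implementation may differ). It just runs "vulkaninfo --summary" and scans the output for a device entry:

```csharp
using System;
using System.Diagnostics;

// Hypothetical sketch, not the PR's actual code: detect a Vulkan device
// by running "vulkaninfo --summary" and scanning its output.
internal static class VulkanCheck
{
    public static bool HasVulkanDevice()
    {
        try
        {
            var psi = new ProcessStartInfo("vulkaninfo", "--summary")
            {
                RedirectStandardOutput = true,
                RedirectStandardError = true,
                UseShellExecute = false,
                CreateNoWindow = true,
            };

            using var proc = Process.Start(psi);
            if (proc is null)
                return false;

            string output = proc.StandardOutput.ReadToEnd();
            proc.WaitForExit();

            // The summary lists each physical device with a "deviceType" line
            // (e.g. "deviceType = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU").
            // A substring match like this is exactly the fragility discussed
            // below: it breaks if the output format ever changes.
            return proc.ExitCode == 0 && output.Contains("deviceType");
        }
        catch (Exception)
        {
            // vulkaninfo missing or not runnable -> treat as "no Vulkan".
            return false;
        }
    }
}
```

A caller could then gate backend selection on this, falling back to the CPU or CLBlast backend when it returns false.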

On Linux you might need to run "sudo apt install vulkan-tools". I needed to do this under WSL on my Nvidia GPU system but not on my AMD GPU system.
That said, I've not tested the backend under WSL, and I don't believe Vulkan works properly there.

The safest option is _useVulkan = false, removing the check via GetVulkanVersion(); however, that means the Vulkan backend won't automatically be used on systems that have it.
I don't fully parse the "vulkaninfo --summary" output, only enough to tell that a Vulkan device is present.
Obviously this would break if the format of "vulkaninfo --summary" were ever changed, and it would be complicated to work out both that a Vulkan device is present and that it could actually run llama.cpp.
The only other options to using "vulkaninfo --summary" are:

  1. Have LLamaSharp use the Vulkan SDK directly, which I don't think is desirable.
  2. Create a small stub program, linked against the Vulkan SDK, that simply returns VulkanOK (a sketch along these lines follows below).
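
For what it's worth, a variant of option 2 could also be done in-process: instead of shipping a separate stub binary, P/Invoke the Vulkan loader directly. The sketch below is only an illustration and not part of this PR; it assumes the loader is named vulkan-1.dll on Windows (on Linux it is libvulkan.so.1, which would need a resolver hook), and note that a present loader does not by itself guarantee a usable GPU:

```csharp
using System;
using System.Runtime.InteropServices;

// Illustrative only: query the Vulkan loader's instance version via
// vkEnumerateInstanceVersion (available since Vulkan 1.1). A loader being
// present does not guarantee a usable device; a full check would also
// need vkCreateInstance + vkEnumeratePhysicalDevices.
internal static class VulkanLoaderProbe
{
    // "vulkan-1" resolves to vulkan-1.dll on Windows; on Linux the library
    // is libvulkan.so.1, so a NativeLibrary.SetDllImportResolver hook
    // (or a second DllImport) would be needed there.
    [DllImport("vulkan-1")]
    private static extern int vkEnumerateInstanceVersion(out uint apiVersion);

    public static bool TryGetVulkanVersion(out Version version)
    {
        version = new Version(0, 0);
        try
        {
            if (vkEnumerateInstanceVersion(out uint packed) != 0) // VK_SUCCESS == 0
                return false;

            // VK_API_VERSION packing: variant:3 | major:7 | minor:10 | patch:12
            version = new Version(
                (int)((packed >> 22) & 0x7F),
                (int)((packed >> 12) & 0x3FF),
                (int)(packed & 0xFFF));
            return true;
        }
        catch (DllNotFoundException)
        {
            return false; // no Vulkan loader installed
        }
        catch (EntryPointNotFoundException)
        {
            return false; // loader predates Vulkan 1.1
        }
    }
}
```

Unlike parsing vulkaninfo output, this fails cleanly (DllNotFoundException) when no loader is installed, but a full device check would still need to enumerate physical devices.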

@martindevans
Collaborator

@moozoo64 Sorry for the delay with this PR. I'm just starting to look at the next round of binary updates now.

To make that easier, would you mind creating a new PR with just the changes to the GitHub build action? That way I can merge it in without breaking CI, and we can merge the rest of the changes in a separate PR later once the binaries are in place.

@AsakusaRinne
Collaborator

@moozoo64 Hi, would you like to continue the work and finish this PR? If you have any problems resolving the conflicts, please feel free to ask me for help. :)
