[Enhancement] Add the Ability to Use NVIDIA for Docker #43

Open
ZaxLofful opened this issue Mar 24, 2023 · 41 comments · May be fixed by #944

Comments

@ZaxLofful

Allow running large language models on graphics cards with large VRAM.

https://github.com/NVIDIA/nvidia-docker

@alph4b3th

Yes please! The model is very slow on an AMD EPYC CPU.
[screenshot]

@fishscene

fishscene commented May 17, 2023

I too would like to see this capability.
Another person here also looking for this: https://github.com/nsarrazin/serge/discussions/97

Looks like it is possible with this project: https://github.com/ggerganov/llama.cpp
There appears to be some kind of CPU/GPU sharing: https://www.reddit.com/r/LocalLLaMA/comments/13gok03/llamacpp_now_officially_supports_gpu_acceleration/

@SurvivaLlama

I am willing to test GPU / NVIDIA / Docker.

@lucienn3

Hey guys,

Please reach out to me if you want to test it.
I have an i9-13900KF, 64 GB DDR5 RAM, and an RTX4090. I would love to help with testing :)

@gaby
Member

gaby commented Jul 1, 2023

This is on the TODO list, but we have to split the Dockerfiles first. GPU support will make the image huge, so the plan is to publish separate images for GPU and non-GPU support.

@realies

realies commented Jul 9, 2023

@gaby, any idea when these could be expected? Happy to test too.

@Jonpro03

Jonpro03 commented Aug 8, 2023

I got it running. Feel free to borrow: https://gist.github.com/Jonpro03/604430a3e64735a0a9df6b7e385d15be

@gaby
Member

gaby commented Aug 8, 2023

> I got it running. Feel free to borrow: https://gist.github.com/Jonpro03/604430a3e64735a0a9df6b7e385d15be

Ideally, at runtime the image should be "-runtime", not "-devel". My plan is to make a compose file that uses the Dockerfile for the final stage.
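
A minimal sketch of what such a compose file might look like; the Dockerfile path, the `release` stage name, and the port mapping are illustrative assumptions, not Serge's actual layout:

```yaml
# Hypothetical sketch: build only the final stage (based on a "-runtime"
# CUDA image) and reserve all NVIDIA GPUs for the container.
services:
  serge:
    build:
      context: .
      dockerfile: docker/Dockerfile.gpu   # assumed path
      target: release                     # assumed final-stage name
    ports:
      - "8008:8008"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

The `deploy.resources.reservations.devices` block is Docker Compose's standard way to expose NVIDIA GPUs to a service; it requires the nvidia-container-toolkit on the host.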

@gaby
Member

gaby commented Aug 8, 2023

> @gaby, any idea when these could be expected? Happy to test too.

This week

@Jonpro03

Jonpro03 commented Aug 9, 2023

> I got it running. Feel free to borrow: https://gist.github.com/Jonpro03/604430a3e64735a0a9df6b7e385d15be
>
> Ideally, at runtime the image should be "-runtime", not "-devel". My plan is to make a compose file that uses the Dockerfile for the final stage.

Groovy. Can confirm it works with the runtime image too.

@gaby
Member

gaby commented Aug 9, 2023

@Jonpro03 llama-cpp-python compiles correctly even using the "-runtime" tag?

@gaby
Member

gaby commented Aug 9, 2023

I see that you are adding CMake flags, not sure what those do, haha

@Jonpro03

Jonpro03 commented Aug 9, 2023

> I see that you are adding CMake flags, not sure what those do, haha

Possibly a me problem, since I'm running older CPUs (Xeon E5-2640s), but these flags were necessary to get it to compile with pip.
`-DLLAMA_CUBLAS=ON -DLLAMA_NATIVE=ON` should be all that's necessary, but it wasn't in my case (I suspect a llama-cpp bug): ggerganov/llama.cpp#1982
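
For reference, a minimal sketch of the build-from-source install those flags imply (the exact flag set needed varies by hardware, as discussed above):

```sh
# Build llama-cpp-python from source with cuBLAS (GPU) support;
# LLAMA_NATIVE=ON tunes the build for the host CPU. Older CPUs may need
# extra flags, per the linked llama.cpp issue.
CMAKE_ARGS="-DLLAMA_CUBLAS=ON -DLLAMA_NATIVE=ON" pip install llama-cpp-python
```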

@gaby
Member

gaby commented Aug 9, 2023

> I see that you are adding CMake flags, not sure what those do, haha
>
> Possibly a me problem, since I'm running older CPUs (Xeon E5-2640s), but these flags were necessary to get it to compile with pip.
> `-DLLAMA_CUBLAS=ON -DLLAMA_NATIVE=ON` should be all that's necessary, but it wasn't in my case (I suspect a llama-cpp bug): ggerganov/llama.cpp#1982

Thanks for the info!

@Jonpro03

Jonpro03 commented Aug 9, 2023

> @Jonpro03 llama-cpp-python compiles correctly even using the "-runtime" tag?

Oops, I spoke too soon. It only works with the devel tag.

@gaby
Member

gaby commented Aug 9, 2023

> @Jonpro03 llama-cpp-python compiles correctly even using the "-runtime" tag?
>
> Oops, I spoke too soon. It only works with the devel tag.

Yeah, that's what I figured. The devel tag is needed to compile the llama-cpp-python wheel. After that the runtime tag can be used.

My main problem has been that the devel tag is like 6.5 GB.
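
A minimal multi-stage sketch of that idea, with hypothetical CUDA image tags (the versions, package list, and paths are illustrative, not Serge's actual Dockerfile):

```dockerfile
# Stage 1: build the llama-cpp-python wheel in the large "-devel" image,
# which ships the CUDA compiler and headers needed for compilation.
FROM nvidia/cuda:12.2.0-devel-ubuntu22.04 AS build
RUN apt-get update && apt-get install -y python3-pip build-essential cmake
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=ON" pip wheel llama-cpp-python -w /wheels

# Stage 2: install the prebuilt wheel into the much smaller "-runtime"
# image, which only ships the CUDA runtime libraries.
FROM nvidia/cuda:12.2.0-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y python3-pip
COPY --from=build /wheels /wheels
RUN pip install /wheels/*.whl
```

This keeps the shipped image on the "-runtime" base while still compiling against the "-devel" toolchain.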

@visorcraft

visorcraft commented Sep 28, 2023

We're also willing to test NVIDIA support. We have 2x RTX 3090 with NVLink set up on Pop!_OS and are eager to see NVIDIA support in Serge!

@anethema

anethema commented Oct 1, 2023

> @Jonpro03 llama-cpp-python compiles correctly even using the "-runtime" tag?
>
> Oops, I spoke too soon. It only works with the devel tag.
>
> Yeah, that's what I figured. The devel tag is needed to compile the llama-cpp-python wheel. After that the runtime tag can be used.
>
> My main problem has been that the devel tag is like 6.5 GB.

Any progress? I would love to be able to run a local LLM on Unraid with GPU acceleration.

@gaby
Member

gaby commented Oct 1, 2023

> @Jonpro03 llama-cpp-python compiles correctly even using the "-runtime" tag?
>
> Oops, I spoke too soon. It only works with the devel tag.
>
> Yeah, that's what I figured. The devel tag is needed to compile the llama-cpp-python wheel. After that the runtime tag can be used.
>
> My main problem has been that the devel tag is like 6.5 GB.
>
> Any progress? I would love to be able to run a local LLM on Unraid with GPU acceleration.

I'm on vacation, will get this done in like 2 weeks. :-)

@creed2415

creed2415 commented Nov 3, 2023

[screenshot: WeChat Work]

I just found that the GPU was not used when I started a chat (30 GPU layers). Should I install CUDA inside Docker?

Below are my proposed modifications.

Step 1: set the `LLAMA_CUBLAS=1` environment variable:

```dockerfile
# Set ENV
ENV NODE_ENV='production'
ENV TZ=Etc/UTC
ENV LLAMA_CUBLAS=1
WORKDIR /usr/src/app
```

Step 2: replace the installation method for `llama-cpp-python` in the `/scripts/deploy.sh` file:

```sh
# Install python bindings
UNAME_M=$(dpkg --print-architecture) CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python || {
	echo 'Failed to install llama-cpp-python'
	exit 1
}
```

Step 3: docker build

Step 4: install nvidia-container-toolkit

Step 5: start Docker (a sketch of steps 4 and 5 follows below)
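
A minimal sketch of steps 4 and 5 on a Debian/Ubuntu host, following NVIDIA's documented setup; the image name and port are hypothetical:

```sh
# Step 4: install the NVIDIA Container Toolkit (assumes NVIDIA's apt
# repository has already been added, per their install guide).
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker and restart the daemon.
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Step 5: start the container with GPU access.
docker run --gpus all -p 8008:8008 my-serge-gpu-image   # hypothetical image name
```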

@SurvivaLlama

@creed2415 Did you get the GPU to work?

@syonfox

syonfox commented Nov 7, 2023

Cool stuff. Does anyone have numbers on the inference speed on GPU vs. CPU using this?

@creed2415

> @creed2415 Did you get the GPU to work?

Yes! After I installed CUDA inside Docker, it worked with the GPU.
[screenshot: WeChat Work]

@TechnoKittyKatyusha

TechnoKittyKatyusha commented Nov 13, 2023

@syonfox

Well, the GPU Creed showed has 3,840 CUDA cores, compared to the 24 cores/32 threads a current consumer Intel CPU provides.
Sooooooo, I would assume, depending on Creed's CPU model and number of CPUs, that it's a moderate to huge performance boost.
I didn't manage to get it to run using my GPU, so I'm jealous XD

@anethema

@creed2415 Did you do this in Unraid by chance? I'm not sure how to actually make those edits to the package, since it comes from the Unraid Community Applications.

@gaby Any tips on how to make that happen? Or is that something you still plan to do?

@gaby
Member

gaby commented Nov 14, 2023

@anethema I have it ready, but I need to merge and test #866 first :-)

@anethema

> @anethema I have it ready, but I need to merge and test #866 first :-)

Awesome! Exciting to see.

@creed2415

@gaby If possible, I would like to submit my solution (Linux for NVIDIA) to the repository.

@gaby
Member

gaby commented Nov 14, 2023

@creed2415 Serge has to work for both GPU and non-GPU setups, so the images have to be separate, etc. It's a bit more complicated than just adding the NVIDIA toolkit. If you've got a solution that works for everyone, sure.

@creed2415

@gaby Well, here are the adjustments I made. Just take a look and let me know if they are useful.
https://github.com/creed2415/serge/tree/gpu-support

@mkeshav

mkeshav commented Dec 2, 2023

Any news on this, please?

@gaby gaby self-assigned this Dec 4, 2023
@gaby
Member

gaby commented Dec 4, 2023

@mkeshav This week :-)

@syonfox

syonfox commented Dec 4, 2023

Who knows the best NVIDIA GPU IaaS provider? XD Looking forward to it!

P.S. Maybe hardware really is worth it! But I need medium scale.

This was referenced Dec 4, 2023
@anethema

> @mkeshav This week :-)

Any luck with this?

@gaby
Member

gaby commented Feb 14, 2024

> @mkeshav This week :-)
>
> Any luck with this?

Still in progress

@iamzoltan

Can I help?

@TheQuickestFox

@gaby Looks like #944 passed checks, can I get excited? Thanks so much for the amazing work on the rest of this, by the way; it's by far the most user-friendly AI interface for local LLMs on Unraid.

@gaby
Member

gaby commented Apr 16, 2024

@TheQuickestFox Not yet, I'm helping the llama-cpp-python folks test their new wheels to make this more straightforward.

@JuniperChris929

Any ETA on this? And also on the latest IPv6 fixes? The last release is from February. :(

@gaby
Member

gaby commented Apr 30, 2024

@JuniperChris929 Within the next 2 weeks. The main library used by Serge now has built-in support for CUDA Python wheels, which will streamline our approach to supporting multiple platforms.

Even though the last release was in February, there are over 100 commits between then and now.
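
For context, llama-cpp-python documents prebuilt CUDA wheels that can be installed from an extra package index instead of compiling from source. A sketch assuming CUDA 12.1; check the llama-cpp-python README for the index matching your CUDA version:

```sh
# Install a prebuilt CUDA wheel rather than building with CMake locally.
pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
```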

@CodeMazeSolver

Are there any updates regarding Serge using GPU acceleration?
