[Enhancement] Add the Ability to Use NVIDIA for Docker #43

Open
ZaxLofful opened this issue Mar 24, 2023 · 41 comments · May be fixed by #944

Comments

@ZaxLofful

Allow running large language models on graphics cards with large VRAM.

https://github.com/NVIDIA/nvidia-docker

@alph4b3th

Yes please! The model is very slow on an AMD EPYC CPU.
[screenshot]

@fishscene

fishscene commented May 17, 2023

I too would like to see this capability.
Another person here also looking for this: https://github.com/nsarrazin/serge/discussions/97

Looks like it is possible with this project: https://github.com/ggerganov/llama.cpp
There appears to be some kind of CPU/GPU sharing: https://www.reddit.com/r/LocalLLaMA/comments/13gok03/llamacpp_now_officially_supports_gpu_acceleration/

@SurvivaLlama

I am willing to test GPU / NVIDIA / Docker.

@lucienn3

Hey guys,

Please reach out to me if you want to test it.
I have an i9-13900KF, 64 GB DDR5 RAM, and an RTX4090. I would love to help with testing :)

@gaby
Member

gaby commented Jul 1, 2023

This is on the TODO list, but we have to split the Dockerfiles first. GPU support will make the image huge, so the plan is to publish separate images for GPU and non-GPU support.

@realies

realies commented Jul 9, 2023

@gaby, any idea when these could be expected? Happy to test too.

@Jonpro03

Jonpro03 commented Aug 8, 2023

I got it running. Feel free to borrow: https://gist.github.com/Jonpro03/604430a3e64735a0a9df6b7e385d15be

@gaby
Member

gaby commented Aug 8, 2023

> I got it running. Feel free to borrow: https://gist.github.com/Jonpro03/604430a3e64735a0a9df6b7e385d15be

Ideally, at runtime the image should be "-runtime", not "-devel". My plan is to make a compose file that uses the Dockerfile for the final stage.
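
A minimal sketch of what such a compose file might look like; the Dockerfile path, the `release` stage name, and the port mapping are illustrative assumptions, not Serge's actual layout:

```yaml
# Hypothetical sketch: build only the final stage (based on a "-runtime"
# CUDA image) and reserve all NVIDIA GPUs for the container.
services:
  serge:
    build:
      context: .
      dockerfile: docker/Dockerfile.gpu   # assumed path
      target: release                     # assumed final-stage name
    ports:
      - "8008:8008"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

The `deploy.resources.reservations.devices` block is Docker Compose's standard way to expose NVIDIA GPUs to a service; it requires the nvidia-container-toolkit on the host.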

@gaby
Member

gaby commented Aug 8, 2023

> @gaby, any idea when these could be expected? Happy to test too.

This week

@Jonpro03

Jonpro03 commented Aug 9, 2023

> I got it running. Feel free to borrow: https://gist.github.com/Jonpro03/604430a3e64735a0a9df6b7e385d15be
>
> Ideally, at runtime the image should be "-runtime", not "-devel". My plan is to make a compose file that uses the Dockerfile for the final stage.

Groovy. Can confirm it works with the runtime image too.

@gaby
Member

gaby commented Aug 9, 2023

@Jonpro03 llama-cpp-python compiles correctly even using the "-runtime" tag?

@gaby
Member

gaby commented Aug 9, 2023

I see that you are adding CMake flags, not sure what those do, haha

@Jonpro03

Jonpro03 commented Aug 9, 2023

> I see that you are adding CMake flags, not sure what those do, haha

Possibly a me problem, since I'm running older CPUs (Xeon E5-2640s), but these flags were necessary to get it to compile with pip.
`-DLLAMA_CUBLAS=ON -DLLAMA_NATIVE=ON` should be all that's necessary, but it wasn't in my case (I suspect a llama-cpp bug): ggerganov/llama.cpp#1982
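
For reference, a minimal sketch of the build-from-source install those flags imply (the exact flag set needed varies by hardware, as discussed above):

```sh
# Build llama-cpp-python from source with cuBLAS (GPU) support;
# LLAMA_NATIVE=ON tunes the build for the host CPU. Older CPUs may need
# extra flags, per the linked llama.cpp issue.
CMAKE_ARGS="-DLLAMA_CUBLAS=ON -DLLAMA_NATIVE=ON" pip install llama-cpp-python
```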

@gaby
Member

gaby commented Aug 9, 2023

> I see that you are adding CMake flags, not sure what those do, haha
>
> Possibly a me problem, since I'm running older CPUs (Xeon E5-2640s), but these flags were necessary to get it to compile with pip.
> `-DLLAMA_CUBLAS=ON -DLLAMA_NATIVE=ON` should be all that's necessary, but it wasn't in my case (I suspect a llama-cpp bug): ggerganov/llama.cpp#1982

Thanks for the info!

@Jonpro03

Jonpro03 commented Aug 9, 2023

> @Jonpro03 llama-cpp-python compiles correctly even using the "-runtime" tag?

Oops, I spoke too soon. It only works with the devel tag.

@gaby
Member

gaby commented Aug 9, 2023

> @Jonpro03 llama-cpp-python compiles correctly even using the "-runtime" tag?
>
> Oops, I spoke too soon. It only works with the devel tag.

Yeah, that's what I figured. The devel tag is needed to compile the llama-cpp-python wheel. After that the runtime tag can be used.

My main problem has been that the devel tag is like 6.5 GB.
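
A minimal multi-stage sketch of that idea, with hypothetical CUDA image tags (the versions, package list, and paths are illustrative, not Serge's actual Dockerfile):

```dockerfile
# Stage 1: build the llama-cpp-python wheel in the large "-devel" image,
# which ships the CUDA compiler and headers needed for compilation.
FROM nvidia/cuda:12.2.0-devel-ubuntu22.04 AS build
RUN apt-get update && apt-get install -y python3-pip build-essential cmake
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=ON" pip wheel llama-cpp-python -w /wheels

# Stage 2: install the prebuilt wheel into the much smaller "-runtime"
# image, which only ships the CUDA runtime libraries.
FROM nvidia/cuda:12.2.0-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y python3-pip
COPY --from=build /wheels /wheels
RUN pip install /wheels/*.whl
```

This keeps the shipped image on the "-runtime" base while still compiling against the "-devel" toolchain.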

@visorcraft

visorcraft commented Sep 28, 2023

We're also willing to test NVIDIA support. We have 2x RTX 3090 with NVLink set up on Pop!_OS and are eager to see NVIDIA support in Serge!

@anethema

anethema commented Oct 1, 2023

> @Jonpro03 llama-cpp-python compiles correctly even using the "-runtime" tag?
>
> Oops, I spoke too soon. It only works with the devel tag.
>
> Yeah, that's what I figured. The devel tag is needed to compile the llama-cpp-python wheel. After that the runtime tag can be used.
>
> My main problem has been that the devel tag is like 6.5 GB.

Any progress? I would love to be able to run a local LLM on Unraid with GPU acceleration.

@gaby
Member

gaby commented Oct 1, 2023

> @Jonpro03 llama-cpp-python compiles correctly even using the "-runtime" tag?
>
> Oops, I spoke too soon. It only works with the devel tag.
>
> Yeah, that's what I figured. The devel tag is needed to compile the llama-cpp-python wheel. After that the runtime tag can be used.
>
> My main problem has been that the devel tag is like 6.5 GB.
>
> Any progress? I would love to be able to run a local LLM on Unraid with GPU acceleration.

I'm on vacation, will get this done in like 2 weeks. :-)

@creed2415

creed2415 commented Nov 3, 2023

[screenshot: WeChat Work]

I just found that the GPU was not used when I started a chat (30 GPU layers). Should I install CUDA inside Docker?

Below are my proposed modifications.

Step 1: set the `LLAMA_CUBLAS=1` environment variable:

```dockerfile
# Set ENV
ENV NODE_ENV='production'
ENV TZ=Etc/UTC
ENV LLAMA_CUBLAS=1
WORKDIR /usr/src/app
```

Step 2: replace the installation method for `llama-cpp-python` in the `/scripts/deploy.sh` file:

```sh
# Install python bindings
UNAME_M=$(dpkg --print-architecture) CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python || {
	echo 'Failed to install llama-cpp-python'
	exit 1
}
```

Step 3: docker build

Step 4: install nvidia-container-toolkit

Step 5: start Docker (a sketch of steps 4 and 5 follows below)
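
A minimal sketch of steps 4 and 5 on a Debian/Ubuntu host, following NVIDIA's documented setup; the image name and port are hypothetical:

```sh
# Step 4: install the NVIDIA Container Toolkit (assumes NVIDIA's apt
# repository has already been added, per their install guide).
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker and restart the daemon.
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Step 5: start the container with GPU access.
docker run --gpus all -p 8008:8008 my-serge-gpu-image   # hypothetical image name
```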

@SurvivaLlama

@creed2415 Did you get the GPU to work?

@syonfox

syonfox commented Nov 7, 2023

Cool stuff. Does anyone have numbers on the inference speed on GPU vs. CPU using this?

@creed2415

> @creed2415 Did you get the GPU to work?

Yes! After I installed CUDA inside Docker, it worked with the GPU.
[screenshot: WeChat Work]

@TechnoKittyKatyusha

TechnoKittyKatyusha commented Nov 13, 2023

@syonfox

Well, the GPU Creed showed has 3,840 CUDA cores, compared to the 24 cores/32 threads a current consumer Intel CPU provides.
Sooooooo, I would assume, depending on Creed's CPU model and number of CPUs, that it's a moderate to huge performance boost.
I didn't manage to get it to run using my GPU, so I'm jealous XD

@anethema

@creed2415 Did you do this in Unraid by chance? I'm not sure how to actually make those edits to the package, since it comes from the Unraid Community Applications.

@gaby Any tips on how to make that happen? Or is that something you still plan to do?

@gaby
Member

gaby commented Nov 14, 2023

@anethema I have it ready, but I need to merge and test #866 first :-)

@anethema

> @anethema I have it ready, but I need to merge and test #866 first :-)

Awesome! Exciting to see.

@creed2415

@gaby If possible, I would like to submit my solution (Linux for NVIDIA) to the repository.

@gaby
Member

gaby commented Nov 14, 2023

@creed2415 Serge has to work for both GPU and non-GPU setups, so the images have to be separate, etc. It's a bit more complicated than just adding the NVIDIA toolkit. If you've got a solution that works for everyone, sure.

@creed2415

@gaby Well, here are the adjustments I made. Just take a look and let me know if they are useful.
https://github.com/creed2415/serge/tree/gpu-support

@mkeshav

mkeshav commented Dec 2, 2023

Any news on this, please?

@gaby gaby self-assigned this Dec 4, 2023
@gaby
Member

gaby commented Dec 4, 2023

@mkeshav This week :-)

@syonfox

syonfox commented Dec 4, 2023

Who knows the best NVIDIA GPU IaaS provider? XD Looking forward to it!

P.S. Maybe hardware really is worth it! But I need medium scale.

This was referenced Dec 4, 2023
@anethema

> @mkeshav This week :-)

Any luck with this?

@gaby
Member

gaby commented Feb 14, 2024

> @mkeshav This week :-)
>
> Any luck with this?

Still in progress

@iamzoltan

Can I help?

@TheQuickestFox

@gaby Looks like #944 passed checks, can I get excited? Thanks so much for the amazing work on the rest of this, by the way; it's by far the most user-friendly AI interface for local LLMs on Unraid.

@gaby
Member

gaby commented Apr 16, 2024

@TheQuickestFox Not yet, I'm helping the llama-cpp-python folks test their new wheels to make this more straightforward.

@JuniperChris929

Any ETA on this? And also on the latest IPv6 fixes? The last release is from February. :(

@gaby
Member

gaby commented Apr 30, 2024

@JuniperChris929 Within the next 2 weeks. The main library used by Serge now has built-in support for CUDA Python wheels, which will streamline our approach to supporting multiple platforms.

Even though the last release was in February, there are over 100 commits between then and now.
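
For context, llama-cpp-python documents prebuilt CUDA wheels that can be installed from an extra package index instead of compiling from source. A sketch assuming CUDA 12.1; check the llama-cpp-python README for the index matching your CUDA version:

```sh
# Install a prebuilt CUDA wheel rather than building with CMake locally.
pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
```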

@CodeMazeSolver

Are there any updates regarding Serge using GPU acceleration?
