Replies: 1 comment
-
Did not see this question... created issue #439 for that... But before making a PR, I need to know the best way to activate it.
-
Hi, thank you for developing llamafile; it's such a wonderful tool.

For some time now, llama.cpp on Linux has had support for the unified memory architecture (UMA) of AMD APUs, which lets the CPU and the integrated GPU share main memory. This requires compiling llama.cpp with the `-DLLAMA_HIP_UMA=on` setting. I'm trying to compile llamafile with this additional setting for llama.cpp, but I'm having some problems. Could you point me in the right direction?
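For context, my rough understanding of what this option does (paraphrased from the llama.cpp HIP backend, so the exact names here are my assumption rather than a verbatim copy) is that `LLAMA_HIP_UMA` ends up defining a `GGML_HIP_UMA` preprocessor macro, which switches device buffer allocation from a plain `hipMalloc` to managed memory that the APU can back with regular system RAM:

```c
// Sketch of the allocation switch I believe GGML_HIP_UMA enables
// (paraphrased, not literal llama.cpp code).
#include <hip/hip_runtime.h>

static hipError_t device_malloc_sketch(void **ptr, size_t size, int device) {
#if defined(GGML_HIP_UMA)
    // managed memory: can live in ordinary system RAM on an APU
    hipError_t err = hipMallocManaged(ptr, size, hipMemAttachGlobal);
    if (err == hipSuccess) {
        // coarse-grain advice avoids fine-grained coherence overhead on APUs
        hipMemAdvise(*ptr, size, hipMemAdviseSetCoarseGrain, device);
    }
    return err;
#else
    (void) device;
    // plain VRAM allocation, limited to the 512 MB carve-out on my system
    return hipMalloc(ptr, size);
#endif
}
```

If that reading is right, the define only matters at the point where the GGML HIP code itself gets compiled.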
I'm using Ubuntu 22.04 with an AMD 5600G APU and ROCm 6.1. When I compile llama.cpp I use the `make LLAMA_HIPBLAS=1 LLAMA_HIP_UMA=1 AMDGPU_TARGETS=gfx900` command. Then I can load any model that fits into my RAM, even though my system reports only 512 MB of VRAM. I also have the `HSA_OVERRIDE_GFX_VERSION=9.0.0` and `HSA_ENABLE_SDMA=0` environment variables set in my `.profile` file.

I tried adding the `-DLLAMA_HIPBLAS=on -DLLAMA_HIP_UMA=on -DAMDGPU_TARGETS=gfx900` flags in the `llamafile-0.8/llamafile/cuda.c` file, just under `"-DGGML_USE_HIPBLAS",` in the `static bool compile_amd_unix(const char *dso, const char *src, const char *tmpdso)` method (roughly as in the sketch below), and in the `llamafile-0.8/llamafile/rocm.sh` file, just under the `-DGGML_USE_HIPBLAS \` line. Then I compiled llamafile.
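The `cuda.c` change looked roughly like this; the surrounding flag list is abbreviated and reconstructed from memory, so only the three added lines reflect my actual edit:

```c
// Abbreviated sketch of the compiler flag list inside compile_amd_unix() in
// llamafile-0.8/llamafile/cuda.c; everything except the three "added" lines
// is illustrative, not the real contents of the file.
#include <stdbool.h>

static bool compile_amd_unix(const char *dso, const char *src, const char *tmpdso) {
    const char *argv[] = {
        "hipcc",
        "-DGGML_USE_HIPBLAS",
        "-DLLAMA_HIPBLAS=on",        // added
        "-DLLAMA_HIP_UMA=on",        // added
        "-DAMDGPU_TARGETS=gfx900",   // added
        /* ...remaining flags, src, "-o", tmpdso... */
        NULL,
    };
    (void) argv; (void) dso; (void) src; (void) tmpdso;  // sketch only
    return true;  // the real function goes on to invoke the compiler with these arguments
}
```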
Unfortunately, when I launched the same model that I use with llama.cpp, the `ggml_backend_cuda_buffer_type_alloc_buffer: allocating 5169.86 MiB on device 0: cudaMalloc failed: out of memory` error occurred. When I run llama.cpp compiled with UMA support, everything works fine.
Should I add `-DLLAMA_HIPBLAS=on -DLLAMA_HIP_UMA=on -DAMDGPU_TARGETS=gfx900` for llama.cpp somewhere else? I would be very grateful for help.