CHG - Enhanced GPU Support and Dependency Updates on docker-compose.yml #13
Conversation
- **Fixed GPU allocation in Ollama**: the `gpus=all` flag used with `docker run` has no effect under `docker compose`, so `nvidia-smi` inside the ollama container would still only detect the number of GPUs set in `count:`. Using `count: all` instead allows dynamic detection and utilization of all available GPUs in the ollama container (see the sketch below).
- **Bumped Ollama to v0.1.33**: offers improved multi-GPU workload optimization and enhanced support for llama3 and mixtral.
- **Bumped Cat Core to v1.6.1**: brings in the latest features and bug fixes from the Cat.
- **Bumped Qdrant to the latest patch, v1.7.4**: I'm currently verifying the compatibility of Qdrant v1.9.1 with Cheshire vectorDB calls.
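For reference, a minimal sketch of the Compose-level GPU reservation this PR describes; the service name, image repository/tag layout, and all omitted settings are assumptions rather than the exact contents of this repository's docker-compose.yml:

```yaml
services:
  ollama:
    # Bumped in this PR for improved multi-GPU workload optimization
    image: ollama/ollama:0.1.33
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              # "count: all" exposes every available GPU to the container,
              # instead of the fixed number a numeric count would pin
              count: all
              capabilities: [gpu]
```

Unlike the `--gpus=all` flag of `docker run`, which `docker compose` does not honor, the `deploy.resources.reservations.devices` block above is the Compose-native way to request GPUs, and `count: all` avoids hard-coding a GPU count. GPU visibility can then be checked with `docker compose exec ollama nvidia-smi` (assuming the service is named `ollama`).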
Thanks @MRColorR!
On Discord there are some support requests regarding Ollama with 1.6.1 of the Cat; do you have the same problem?
OK, I'm reading on Discord right now. After reading the reports, I'm confident about the following:
Thanks @valentimarco for the heads-up :)
I have conducted tests on a clean local-cat repo clone and found the following:
Details:
Model in use: llama3:70b-instruct-q2_K
Hope this can help the Cheshire Cat project :) Thank you for your time and consideration.
OK, 1.6.1 is broken, but we are going to fix it soon. For the moment I'm going to wait for the new release and then merge the PR.
@MRColorR kudos for how systematically you tested.