
CHG - Enhanced GPU Support and Dependency Updates on docker-compose.yml #13

Merged: 4 commits merged into cheshire-cat-ai:main on Jun 1, 2024

Conversation

MRColorR (Contributor) commented May 7, 2024

  • **Fixed GPU allocation in Ollama**: the `gpus=all` flag used with `docker run` did not affect the behavior of `docker compose`; `nvidia-smi` inside the ollama container would still detect only the number of GPUs set in `count:`.
    • Now, using `count: all` allows dynamic detection and utilization of all available GPUs in the ollama container (a sketch of the resulting compose fragment follows this list).
  • **Bumped Ollama to v0.1.33**: it offers improved multi-GPU workload optimization and enhanced support for llama3 and mixtral.
  • **Bumped Cat Core to v1.6.1**: just to bring in the latest features and bug fixes from the Cat.
  • **Bumped Qdrant to the latest patch, v1.7.4**: I'm currently verifying the compatibility of Qdrant v1.9.1 with the Cheshire vectorDB calls.

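For reference, here is a minimal sketch of what the relevant `docker-compose.yml` fragment could look like with this change; the service name and image repository are shown for illustration and are not copied verbatim from the local-cat file:

```yaml
services:
  ollama:                        # service name assumed for illustration
    image: ollama/ollama:0.1.33  # bumped Ollama tag
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all         # was a fixed number; "all" exposes every host GPU
              capabilities: [gpu]
```

With a reservation like this in place, `nvidia-smi` run inside the container should list every GPU on the host instead of only the previously hard-coded count.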
pieroit (Member) commented May 7, 2024

Thanks @MRColorR !
@valentimarco do you have time to review? Otherwise I'll check it out

valentimarco (Collaborator) commented
On Discord there are some support requests regarding Ollama with 1.6.1 of the Cat; do you have the same problem?

MRColorR (Contributor, Author) commented May 7, 2024

> On Discord there are some support requests regarding Ollama with 1.6.1 of the Cat; do you have the same problem?

OK, I'm reading the reports on Discord right now. After reading them I'm confident about the following:

  • For now I can confirm the multi-GPU part, the `count: all` edit.
  • Regarding the other edits, for now I reverted the ones that can cause issues. I'm starting fresh on a different local-cat repo clone so I can double-check and confirm whether ollama 0.1.32 and ollama 0.1.33 work well with the Cat or not.

Thanks @valentimarco for the heads-up :)

MRColorR (Contributor, Author) commented May 7, 2024

I have conducted tests on a clean local-cat repo clone and found the following:

Combinations tested: Qdrant 1.7.4 and 1.9.1, Ollama 0.1.28 and 0.1.33, each against Cat_Core 1.5.1, 1.5.3, and 1.6.1 (results are summarized in the details below).

details:

  • cat_core:
    • logs show a deprecation warning about LLMSingleActionAgent being deprecated in LangChain 0.1.0
    • no errors until 1.6.1
      • core 1.6.1 log shows some errors:
        • with the new ollama 0.1.33 the error is:
          File "/app/cat/factory/ollama_utils.py", line 121, in _acreate_stream_patch
            optional_detail = await response.json().get("error")
          AttributeError: 'coroutine' object has no attribute 'get'
        • with the old ollama 0.1.28 the error is:
          ERROR cat.looking_glass.stray_cat.StrayCat.run::410
          NameError("name 'OllamaEndpointNotFoundError' is not defined")
          File "/app/cat/factory/ollama_utils.py", line 117, in _acreate_stream_patch
            raise OllamaEndpointNotFoundError( # noqa: F821
          NameError: name 'OllamaEndpointNotFoundError' is not defined
        • rolling back from the 1.6.1 version does not solve the problem in my case and the error remains.
  • qdrant
    • no errors or warnings in logs
    • tried a PDF ingestion; it was ingested without errors and the info was then used in the Cat's response
  • ollama
    • no errors or warnings in logs
    • `count: all` edit already in place; `nvidia-smi` inside the ollama container reports all the GPUs, as expected

Model in use: llama3:70b-instruct-q2_K
Embedder: intfloat/multilingual-e5-large

MRColorR (Contributor, Author) commented May 7, 2024

  • So for now the latest commit appears to be the most up-to-date version achievable. However, I'm not entirely confident about its stability, so I would greatly appreciate it if you could spare a moment to double-check it.
  • As for the `count: all` edit, it appears to be working well in all configurations. This is likely because it pertains more to Docker than to the application code itself (see the before/after sketch right after this list).
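As a hedged illustration of why this is purely a Docker-level change, here is a before/after fragment of the device reservation; the old fixed value of 1 is an assumption, the previous file may have pinned a different number:

```yaml
# Before (assumed): a fixed number of GPUs is reserved, so nvidia-smi
# inside the container sees exactly that many devices.
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1            # hypothetical previous value
          capabilities: [gpu]
---
# After: Compose reserves every GPU the host exposes; no change in the Cat's code is needed.
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
```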

Hope this can help the cheshirecat project :) Thank you for your time and consideration.

valentimarco (Collaborator) commented
OK, 1.6.1 is broken, but we are going to fix it soon. For the moment I'll wait for the new release and then merge the PR.
Thank you again for this PR!

pieroit (Member) commented May 7, 2024

@MRColorR kudos for how systematically you tested
@valentimarco thanks for checking, 1.6.2 will work ;)

@valentimarco valentimarco merged commit 87cc962 into cheshire-cat-ai:main Jun 1, 2024