
CHG - Enhanced GPU Support and Dependency Updates on docker-compose.yml #13

Merged: 4 commits merged into cheshire-cat-ai:main on Jun 1, 2024

Conversation

MRColorR (Contributor) commented May 7, 2024

  • **Fixed GPU allocation in Ollama**: the `gpus=all` flag used with `docker run` did not affect the behavior of `docker compose`; `nvidia-smi` inside the ollama container would still detect only the number of GPUs set in `count:`.
    • Now, using `count: all` allows dynamic detection and utilization of all available GPUs in the ollama container (a sketch of the resulting compose fragment follows this list).
  • **Bumped Ollama to v0.1.33**: it offers improved multi-GPU workload optimization and enhanced support for llama3 and mixtral.
  • **Bumped Cat Core to v1.6.1**: just to bring in the latest features and bug fixes from the Cat.
  • **Bumped Qdrant to the latest patch, v1.7.4**: I'm currently verifying the compatibility of Qdrant v1.9.1 with the Cheshire vectorDB calls.

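For reference, here is a minimal sketch of what the relevant `docker-compose.yml` fragment could look like with this change; the service name and image repository are shown for illustration and are not copied verbatim from the local-cat file:

```yaml
services:
  ollama:                        # service name assumed for illustration
    image: ollama/ollama:0.1.33  # bumped Ollama tag
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all         # was a fixed number; "all" exposes every host GPU
              capabilities: [gpu]
```

With a reservation like this in place, `nvidia-smi` run inside the container should list every GPU on the host instead of only the previously hard-coded count.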
pieroit (Member) commented May 7, 2024

Thanks @MRColorR !
@valentimarco do you have time to review? Otherwise I'll check it out

valentimarco (Collaborator) commented
On Discord there are some support requests regarding Ollama with 1.6.1 of the Cat; do you have the same problem?

MRColorR (Contributor, Author) commented May 7, 2024

> On Discord there are some support requests regarding Ollama with 1.6.1 of the Cat; do you have the same problem?

OK, I'm reading the reports on Discord right now. After reading them I'm confident about the following:

  • For now I can confirm the multi-GPU part, the `count: all` edit.
  • Regarding the other edits, for now I reverted the ones that can cause issues. I'm starting fresh on a different local-cat repo clone so I can double-check and confirm whether ollama 0.1.32 and ollama 0.1.33 work well with the Cat or not.

Thanks @valentimarco for the heads-up :)

MRColorR (Contributor, Author) commented May 7, 2024

I have conducted tests on a clean local-cat repo clone and found the following:

Combinations tested: Qdrant 1.7.4 and 1.9.1, Ollama 0.1.28 and 0.1.33, each against Cat_Core 1.5.1, 1.5.3, and 1.6.1 (results are summarized in the details below).

details:

  • cat_core:
    • logs show a deprecation warning about LLMSingleActionAgent being deprecated in LangChain 0.1.0
    • no errors until 1.6.1
      • core 1.6.1 log shows some errors:
        • with the new ollama 0.1.33 the error is:
          File "/app/cat/factory/ollama_utils.py", line 121, in _acreate_stream_patch
            optional_detail = await response.json().get("error")
          AttributeError: 'coroutine' object has no attribute 'get'
        • with the old ollama 0.1.28 the error is:
          ERROR cat.looking_glass.stray_cat.StrayCat.run::410
          NameError("name 'OllamaEndpointNotFoundError' is not defined")
          File "/app/cat/factory/ollama_utils.py", line 117, in _acreate_stream_patch
            raise OllamaEndpointNotFoundError( # noqa: F821
          NameError: name 'OllamaEndpointNotFoundError' is not defined
        • rolling back from the 1.6.1 version does not solve the problem in my case and the error remains.
  • qdrant
    • no errors or warnings in logs
    • tried a PDF ingestion; it was ingested without errors and the info was then used in the Cat's response
  • ollama
    • no errors or warnings in logs
    • `count: all` edit already in place; `nvidia-smi` inside the ollama container reports all the GPUs, as expected

Model in use: llama3:70b-instruct-q2_K
Embedder: intfloat/multilingual-e5-large

MRColorR (Contributor, Author) commented May 7, 2024

  • So for now the latest commit appears to be the most up-to-date version achievable. However, I'm not entirely confident about its stability, so I would greatly appreciate it if you could spare a moment to double-check it.
  • As for the `count: all` edit, it appears to be working well in all configurations. This is likely because it pertains more to Docker than to the application code itself (see the before/after sketch right after this list).
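As a hedged illustration of why this is purely a Docker-level change, here is a before/after fragment of the device reservation; the old fixed value of 1 is an assumption, the previous file may have pinned a different number:

```yaml
# Before (assumed): a fixed number of GPUs is reserved, so nvidia-smi
# inside the container sees exactly that many devices.
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1            # hypothetical previous value
          capabilities: [gpu]
---
# After: Compose reserves every GPU the host exposes; no change in the Cat's code is needed.
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
```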

Hope this can help the cheshirecat project :) Thank you for your time and consideration.

valentimarco (Collaborator) commented
OK, 1.6.1 is broken, but we are going to fix it soon. For the moment I'll wait for the new release and then merge the PR.
Thank you again for this PR!

pieroit (Member) commented May 7, 2024

@MRColorR kudos for how systematically you tested
@valentimarco thanks for checking, 1.6.2 will work ;)

@valentimarco valentimarco merged commit 87cc962 into cheshire-cat-ai:main Jun 1, 2024