How can we make model calls faster #4493

Open
userandpass opened this issue May 17, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@userandpass

userandpass commented May 17, 2024

What is the issue?

I used Docker to run multiple Ollama containers and distributed requests across them with nginx, but this was much slower than calling the deployed model directly.
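For anyone trying to reproduce this, a minimal sketch of the comparison described above, assuming the nginx load balancer listens on port 8080 and one Ollama container is reachable directly on the default port 11434 (the hostnames, ports, and model name are assumptions, not taken from the issue):

```python
import time
import requests

# Assumed endpoints: one Ollama container reached directly, and the same
# request routed through the nginx load balancer. Adjust to your setup.
DIRECT_URL = "http://localhost:11434/api/generate"
PROXIED_URL = "http://localhost:8080/api/generate"

payload = {
    "model": "llama3",                  # assumed: any model already pulled on the containers
    "prompt": "Say hello in one word.",
    "stream": False,
}

def timed_call(url: str) -> float:
    """Send one generate request and return the wall-clock latency in seconds."""
    start = time.perf_counter()
    resp = requests.post(url, json=payload, timeout=600)
    resp.raise_for_status()
    return time.perf_counter() - start

print(f"direct:  {timed_call(DIRECT_URL):.2f}s")
print(f"proxied: {timed_call(PROXIED_URL):.2f}s")
```

One common cause of the gap is that round-robin balancing sends requests to containers that have not yet loaded the model into VRAM, so each cold container pays the full model load time on top of inference.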

OS

Linux

GPU

Nvidia

CPU

No response

Ollama version

0.1.34

@userandpass userandpass added the bug Something isn't working label May 17, 2024
@userandpass userandpass changed the title from "How to solve the problem of automatically reloading the model on the card if it is called after a certain period of time" to "How can we make model calls faster" May 17, 2024
@userandpass
Author

Even after I added the "keep_alive": "24h" parameter, when I ran nvidia-smi a while later there was no Ollama process on the GPU; the model had been unloaded, so I had to call the API again to load it back onto the card.
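For reference, a minimal sketch of passing keep_alive per request (the /api/generate endpoint and the keep_alive field are part of the Ollama API; the host and model name here are assumptions). Note that keep_alive only applies to the specific Ollama instance that served the request, so with several containers behind nginx each one keeps or unloads its copy of the model independently:

```python
import requests

# Assumed host; with nginx in front, only the container that actually
# receives this request will keep the model resident in VRAM.
OLLAMA_URL = "http://localhost:11434/api/generate"

resp = requests.post(
    OLLAMA_URL,
    json={
        "model": "llama3",       # assumed model name
        "prompt": "warm-up",
        "stream": False,
        "keep_alive": "24h",     # keep the model loaded for 24 hours after this request
        # "keep_alive": -1,      # alternatively: keep the model loaded indefinitely
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["response"])
```

Newer Ollama releases also support a server-wide default via the OLLAMA_KEEP_ALIVE environment variable set on each container, which avoids having to attach the parameter to every request; whether that is available in 0.1.34 would need checking against the release notes.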
