v0.1.35

github-actions released this 10 May 15:15 · 95 commits to main since this release · 86f9b58

New models

  • Llama 3 ChatQA: A model from NVIDIA based on Llama 3 that excels at conversational question answering (QA) and retrieval-augmented generation (RAG).

What's Changed

  • Quantization: ollama create can now quantize models when importing them using the --quantize or -q flag:
ollama create -f Modelfile --quantize q4_0 mymodel

Note

--quantize works when importing float16 or float32 models:

  • From a binary GGUF file (e.g. FROM ./model.gguf)
  • From a library model (e.g. FROM llama3:8b-instruct-fp16)
  • Fixed an issue where inference subprocesses wouldn't be cleaned up on shutdown
  • Fixed a series of out-of-memory errors when loading models on multi-GPU systems
  • Ctrl+J characters will now properly add newlines in ollama run
  • Fixed issues when running ollama show for vision models
  • OPTIONS requests to the Ollama API will no longer result in errors
  • Fixed an issue where partially downloaded files wouldn't be cleaned up
  • Added a new done_reason field to responses, describing why generation stopped
  • Ollama will now more accurately estimate how much memory is available on multi-GPU systems, especially when running different models one after another
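The new done_reason field lets API clients tell a natural stop apart from a truncated response. A minimal sketch, assuming a /api/generate-style JSON response where done_reason is "stop" at a stop token and "length" when the token limit was hit (the helper name is illustrative, not part of the API):

```python
import json

def finished_naturally(response: dict) -> bool:
    """Return True if generation ended at a stop token rather than
    being cut off by the token limit.

    Assumes the response dict follows the /api/generate JSON shape,
    with done_reason set to "stop" or "length" on the final chunk.
    """
    return response.get("done", False) and response.get("done_reason") == "stop"

# Example final chunk from a streaming /api/generate call (fields abridged):
chunk = json.loads('{"model": "llama3", "response": "", "done": true, "done_reason": "stop"}')
print(finished_naturally(chunk))  # True
```

A client might use this to decide whether to re-issue the request with a larger num_predict when the response was cut short.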

New Contributors

Full Changelog: v0.1.34...v0.1.35