Releases: ollama/ollama

v0.1.37

12 May 01:59
41ba301

What's Changed

  • Fixed issue where models with uppercase characters in the name would not show with ollama list
  • Fixed usage string for ollama create
  • Fixed finish_reason being "" instead of null in the OpenAI-compatible chat API (see the example below)
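
To check the corrected behavior, here is a minimal sketch against the OpenAI-compatible endpoint, assuming a local server on the default port 11434 and a pulled llama3 model (a placeholder name); finish_reason should now be null while streaming and a string such as "stop" once generation finishes, rather than "":

curl http://localhost:11434/v1/chat/completions -d '{
  "model": "llama3",
  "messages": [{"role": "user", "content": "Hello!"}]
}'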

Full Changelog: v0.1.36...v0.1.37

v0.1.36

11 May 06:37

What's Changed

  • Fixed exit status 0xc0000005 error with AMD graphics cards on Windows
  • Fixed rare out-of-memory errors when loading a model to run on the CPU

Full Changelog: v0.1.35...v0.1.36

v0.1.35

10 May 15:15
86f9b58

New models

  • Llama 3 ChatQA: A model from NVIDIA based on Llama 3 that excels at conversational question answering (QA) and retrieval-augmented generation (RAG).

What's Changed

  • Quantization: ollama create can now quantize models when importing them using the --quantize or -q flag:
ollama create -f Modelfile --quantize q4_0 mymodel

Note

--quantize works when importing float16 or float32 models:

  • From a binary GGUF file (e.g. FROM ./model.gguf)
  • From a library model (e.g. FROM llama3:8b-instruct-fp16)
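
For example, a minimal sketch that imports a float16 GGUF file and quantizes it to 4-bit in one step (model.gguf and mymodel are placeholder names):

# Modelfile pointing at a local float16 GGUF file
echo 'FROM ./model.gguf' > Modelfile

# Import and quantize while creating the model
ollama create -f Modelfile --quantize q4_0 mymodel
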
  • Fixed issue where inference subprocesses wouldn't be cleaned up on shutdown.
  • Fixed a series of out-of-memory errors when loading models on multi-GPU systems
  • Ctrl+J characters will now properly add newlines in ollama run
  • Fixed issues when running ollama show for vision models
  • OPTIONS requests to the Ollama API will no longer result in errors
  • Fixed issue where partially downloaded files wouldn't be cleaned up
  • Added a new done_reason field in responses describing why generation stopped (see the example after this list)
  • Ollama will now more accurately estimate how much memory is available on multi-GPU systems, especially when running different models one after another
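
A minimal sketch of observing the new done_reason field, assuming a local server and a pulled llama3 model (a placeholder name): the final JSON object in the /api/generate stream should include done_reason, for example "stop" when generation ended naturally:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'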

Full Changelog: v0.1.34...v0.1.35

v0.1.34

07 May 05:13
adeb40e

Ollama goes on an adventure to hunt down bugs

New models

  • Llava Llama 3: A new high-performing LLaVA model fine-tuned from Llama 3 Instruct.
  • Llava Phi 3: A new small LLaVA model fine-tuned from Phi 3.
  • StarCoder2 15B Instruct: A new instruct fine-tune of the StarCoder2 model
  • CodeGemma 1.1: A new release of the CodeGemma model.
  • StableLM2 12B: A new 12B version of the StableLM 2 model from Stability AI
  • Moondream 2: updated with improved runtime parameters for better responses

What's Changed

  • Fixed issues with LLaVA models where they would respond incorrectly after the first request
  • Fixed out of memory errors when running large models such as Llama 3 70B
  • Fixed various issues with Nvidia GPU discovery on Linux and Windows
  • Fixed a series of Modelfile errors when running ollama create
  • Fixed no slots available error that occurred when cancelling a request and then sending follow up requests
  • Improved AMD GPU detection on Fedora
  • Improved reliability when using the experimental OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS flags
  • ollama serve will now shut down quickly, even if a model is loading

Full Changelog: v0.1.33...v0.1.34

v0.1.33

28 Apr 17:51
9164b01

Llama 3

New models

  • Llama 3: a new model by Meta, and the most capable openly available LLM to date
  • Phi 3 Mini: a new 3.8B-parameter, lightweight, state-of-the-art open model by Microsoft.
  • Moondream: a small vision language model designed to run efficiently on edge devices.
  • Llama 3 Gradient 1048K: A Llama 3 fine-tune by Gradient to support up to a 1M token context window.
  • Dolphin Llama 3: The uncensored Dolphin model, trained by Eric Hartford and based on Llama 3 with a variety of instruction, conversational, and coding skills.
  • Qwen 110B: The first Qwen model over 100B parameters in size with outstanding performance in evaluations

What's Changed

  • Fixed issues where the model would not terminate, causing the API to hang.
  • Fixed a series of out of memory errors on Apple Silicon Macs
  • Fixed out of memory errors when running Mixtral architecture models

Experimental concurrency features

New concurrency features are coming soon to Ollama. They are available as an experimental preview in this release:

  • OLLAMA_NUM_PARALLEL: Handle multiple requests simultaneously for a single model
  • OLLAMA_MAX_LOADED_MODELS: Load multiple models simultaneously

To enable these features, set the environment variables when starting ollama serve:

OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve
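
With the server started as above, a rough sketch of exercising OLLAMA_NUM_PARALLEL from a second terminal (llama3 is a placeholder model name); both requests can now be processed concurrently instead of the second one queueing:

curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Hi"}' &
curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Hello"}' &
wait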

Full Changelog: v0.1.32...v0.1.33

v0.1.32

10 Apr 23:01
fb9580d

New models

  • WizardLM 2: State of the art large language model from Microsoft AI with improved performance on complex chat, multilingual, reasoning and agent use cases.
    • wizardlm2:8x22b: large 8x22B model based on Mixtral 8x22B
    • wizardlm2:7b: fast, high-performing model based on Mistral 7B
  • Snowflake Arctic Embed: A suite of text embedding models by Snowflake, optimized for performance.
  • Command R+: a powerful, scalable large language model purpose-built for RAG use cases
  • DBRX: A large 132B open, general-purpose LLM created by Databricks.
  • Mixtral 8x22B: the new leading Mixture of Experts (MoE) base model by Mistral AI.

What's Changed

  • Ollama will now better utilize available VRAM, leading to fewer out-of-memory errors and better GPU utilization
  • When running larger models that don't fit into VRAM on macOS, Ollama will now split the model between GPU and CPU to maximize performance.
  • Fixed several issues where Ollama would hang upon encountering an error
  • Fixed issue where using quotes in OLLAMA_ORIGINS would cause an error (see the example below)
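
For reference, OLLAMA_ORIGINS takes a comma-separated list of allowed origins for cross-origin requests; a minimal sketch, with placeholder origins:

OLLAMA_ORIGINS=http://localhost:3000,https://example.com ollama serve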

Full Changelog: v0.1.31...v0.1.32

v0.1.31

05 Apr 16:09

Embeddings

Ollama supports embedding models. Bring your existing documents or other data and combine them with text prompts to build RAG (retrieval-augmented generation) apps using the Ollama REST API, Python, or JavaScript libraries.
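
A minimal sketch of generating an embedding over the REST API, assuming a local server and a pulled nomic-embed-text model (the prompt text is arbitrary); the response is a JSON object containing an embedding array:

curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "Llamas are members of the camelid family"
}'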

New models

  • Qwen 1.5 32B: A new 32B multilingual model competitive with larger models such as Mixtral
  • StarlingLM Beta: A 7B model that ranks highly on popular benchmarks, released under a permissive Apache 2.0 license.
  • DolphinCoder StarCoder 7B: A 7B uncensored variant of the Dolphin model family that excels at coding, based on StarCoder2 7B.
  • StableLM 1.6 Chat: A new version of StableLM 1.6 tuned for instruction following

What's Changed

  • Fixed issue where Ollama would hang when using certain unicode characters in the prompt such as emojis

Full Changelog: v0.1.30...v0.1.31

v0.1.30

26 Mar 18:19
756c257

Ollama now supports Cohere's Command R model

New models

  • Command R: a large language model optimized for conversational interaction and long-context tasks.
  • mxbai-embed-large: A new state-of-the-art large embedding model

What's Changed

  • Fixed various issues with ollama run on Windows
    • History now works when pressing the up and down arrow keys
    • Right and left arrow keys will now move the cursor appropriately
    • Pasting multi-line strings will now work on Windows
  • Fixed issue where mounting or sharing files between Linux and Windows (e.g. via WSL or Docker) would cause errors due to having : in the filename.
  • Improved support for AMD MI300 and MI300X Accelerators
  • Improved cleanup of temporary files resulting in better space utilization

Important change

For filesystem compatibility, Ollama has changed model data filenames to use - instead of :. This change will be applied automatically. If downgrading from 0.1.30 to 0.1.29 or lower (on Linux or macOS only), run:

find ~/.ollama/models/blobs -type f -exec bash -c 'mv "$0" "${0//-/:}"' {} \;
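
Here "${0//-/:}" is bash parameter expansion that replaces every - in each blob filename with :, restoring the pre-0.1.30 naming.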

Full Changelog: v0.1.29...v0.1.30

v0.1.29

10 Mar 02:24
e87c780

AMD Preview

Ollama now supports AMD graphics cards in preview on Windows and Linux. All of Ollama's features can now be accelerated by AMD graphics cards, and support is included by default in Ollama for Linux, Windows, and Docker.

Supported cards and accelerators

Family            Supported cards and accelerators
AMD Radeon RX     7900 XTX, 7900 XT, 7900 GRE, 7800 XT, 7700 XT, 7600 XT, 7600,
                  6950 XT, 6900 XTX, 6900 XT, 6800 XT, 6800, Vega 64, Vega 56
AMD Radeon PRO    W7900, W7800, W7700, W7600, W7500, W6900X, W6800X Duo, W6800X,
                  W6800, V620, V420, V340, V320, Vega II Duo, Vega II, VII, SSG
AMD Instinct      MI300X, MI300A, MI300, MI250X, MI250, MI210, MI200, MI100,
                  MI60, MI50

What's Changed

  • ollama <command> -h will now show documentation for supported environment variables
  • Fixed issue where generating embeddings with nomic-embed-text, all-minilm or other embedding models would hang on Linux
  • Experimental support for importing Safetensors models using the FROM <directory with safetensors model> command in the Modelfile (see the sketch after this list)
  • Fixed issues where Ollama would hang when using JSON mode.
  • Fixed issue where ollama run would error when piping output to tee and other tools
  • Fixed an issue where memory would not be released when running vision models
  • Ollama will no longer show an error message when piping to stdin on Windows
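
For the Safetensors import mentioned above, a minimal sketch; the directory path and model name are placeholders, and the feature is experimental:

# Modelfile pointing at a directory containing a Safetensors model
echo 'FROM ./path/to/safetensors/model' > Modelfile

ollama create mymodel -f Modelfile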

Full Changelog: v0.1.28...v0.1.29

v0.1.28

01 Mar 06:41
21347e1

New models

  • StarCoder2: the next generation of transparently trained open code LLMs that comes in three sizes: 3B, 7B and 15B parameters.
  • DolphinCoder: a chat model based on StarCoder2 15B that excels at writing code.

What's Changed

  • Vision models such as llava should now respond better to text prompts
  • Improved support for llava 1.6 models
  • Fixed issue where switching between models repeatedly would cause Ollama to hang
  • Installing Ollama on Windows no longer requires a minimum of 4GB disk space
  • Ollama on macOS will now more reliably determine available VRAM
  • Fixed issue where running Ollama in podman would not detect Nvidia GPUs
  • Ollama will now correctly return an empty embedding when calling /api/embeddings with an empty prompt instead of hanging (see the example below)
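
A quick sketch of the corrected behavior, assuming a pulled all-minilm model (a placeholder choice): an empty prompt should now return immediately with an empty embedding array rather than hanging:

curl http://localhost:11434/api/embeddings -d '{"model": "all-minilm", "prompt": ""}'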

Full Changelog: v0.1.27...v0.1.28