Releases: ollama/ollama
v0.1.29
AMD Preview
Ollama now supports AMD graphics cards in preview on Windows and Linux. All of Ollama's features can now be accelerated by AMD graphics cards, and support is included by default in Ollama for Linux, Windows, and Docker.
Supported cards and accelerators
| Family | Supported cards and accelerators |
| --- | --- |
| AMD Radeon RX | 7900 XTX, 7900 XT, 7900 GRE, 7800 XT, 7700 XT, 7600 XT, 7600, 6950 XT, 6900 XTX, 6900 XT, 6800 XT, 6800, Vega 64, Vega 56 |
| AMD Radeon PRO | W7900, W7800, W7700, W7600, W7500, W6900X, W6800X Duo, W6800X, W6800, V620, V420, V340, V320, Vega II Duo, Vega II, VII, SSG |
| AMD Instinct | MI300X, MI300A, MI300, MI250X, MI250, MI210, MI200, MI100, MI60, MI50 |
What's Changed
- `ollama <command> -h` will now show documentation for supported environment variables
- Fixed issue where generating embeddings with `nomic-embed-text`, `all-minilm` or other embedding models would hang on Linux
- Experimental support for importing Safetensors models using the `FROM <directory with safetensors model>` command in the Modelfile (see the sketch after this list)
- Fixed issue where Ollama would hang when using JSON mode
- Fixed issue where `ollama run` would error when piping output to `tee` and other tools
- Fixed an issue where memory would not be released when running vision models
- Ollama will no longer show an error message when piping to stdin on Windows
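As a sketch of the experimental Safetensors import, a Modelfile's `FROM` can point at a local directory instead of a model name (the directory and model name here are hypothetical):

```
# Modelfile — FROM references a hypothetical local directory
# containing the model's Safetensors weights
FROM ./mistral-7b
```

Building it would then be `ollama create my-mistral -f Modelfile`.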
New Contributors
- @tgraupmann made their first contribution in #2582
- @andersrex made their first contribution in #2909
- @leonid20000 made their first contribution in #2440
- @hishope made their first contribution in #2973
- @mrdjohnson made their first contribution in #2759
- @mofanke made their first contribution in #3077
- @racerole made their first contribution in #3073
- @Chris-AS1 made their first contribution in #3094
Full Changelog: v0.1.28...v0.1.29
v0.1.28
New models
- StarCoder2: the next generation of transparently trained open code LLMs, available in three sizes: 3B, 7B, and 15B parameters.
- DolphinCoder: a chat model based on StarCoder2 15B that excels at writing code.
What's Changed
- Vision models such as `llava` should now respond better to text prompts
- Improved support for `llava` 1.6 models
- Fixed issue where switching between models repeatedly would cause Ollama to hang
- Installing Ollama on Windows no longer requires a minimum of 4GB of disk space
- Ollama on macOS will now more reliably determine available VRAM
- Fixed issue where running Ollama in `podman` would not detect Nvidia GPUs
- Ollama will correctly return an empty embedding when calling `/api/embeddings` with an empty `prompt`, instead of hanging
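As an illustration of the embeddings fix, calling `/api/embeddings` with an empty `prompt` now returns promptly with an empty embedding (a sketch; assumes `nomic-embed-text` has already been pulled):

```
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": ""
}'
```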
New Contributors
- @Bin-Huang made their first contribution in #1706
- @elthommy made their first contribution in #2737
- @peanut256 made their first contribution in #2354
- @tylinux made their first contribution in #2827
- @fred-bf made their first contribution in #2780
- @bmwiedemann made their first contribution in #2836
Full Changelog: v0.1.27...v0.1.28
v0.1.27
Gemma
Gemma is a new, top-performing family of lightweight open models built by Google, available in `2b` and `7b` parameter sizes:

- `ollama run gemma:2b`
- `ollama run gemma:7b` (default)
What's Changed
- Performance improvements (up to 2x) when running Gemma models
- Fixed performance issues on Windows without GPU acceleration. Systems with AVX and AVX2 instruction sets should be 2-4x faster.
- Reduced likelihood of false positive Windows Defender alerts.
New Contributors
- @joshyan1 made their first contribution in #2657
- @pfrankov made their first contribution in #2138
- @adminazhar made their first contribution in #2686
- @b-tocs made their first contribution in #2510
- @Yuan-ManX made their first contribution in #2249
- @langchain4j made their first contribution in #1690
- @logancyang made their first contribution in #1918
Full Changelog: v0.1.26...v0.1.27
v0.1.26
What's Changed
- Support for `bert` and `nomic-bert` embedding models
- Fixed issue where system prompt and prompt template would not be updated when loading a new model
- Quotes will now be trimmed around the value of `OLLAMA_HOST` on Windows (see the sketch after this list)
- Fixed duplicate button issue on the Windows taskbar menu
- Fixed issue where the system prompt would be overridden when using the `/api/chat` endpoint
- Hardened AMD driver lookup logic
- Fixed issue where two versions of Ollama on Windows would run at the same time
- Fixed issue where memory would not be released after a model is unloaded with modern CUDA-enabled GPUs
- Fixed issue where AVX2 was required for GPU acceleration on Windows
- Fixed issue where `/bye` or `/exit` would not work with trailing spaces or characters after them
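For instance, a value stored with surrounding quotes now works (a PowerShell sketch; the address is illustrative):

```
# The stored value carries literal quotes; Ollama now trims them on startup
$env:OLLAMA_HOST = '"127.0.0.1:11434"'
ollama serve
```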
New Contributors
- @tristanbob made their first contribution in #2545
- @justinh-rahb made their first contribution in #2563
- @gerazov made their first contribution in #2188
- @eddumelendez made their first contribution in #2164
- @lulzshadowwalker made their first contribution in #2381
- @jakobhoeg made their first contribution in #2466
- @jdetroyes made their first contribution in #1673
- @djcopley made their first contribution in #1767
- @pythops made their first contribution in #2329
- @ttsugriy made their first contribution in #2511
- @medoror made their first contribution in #2180
- @nikeshparajuli made their first contribution in #1775
- @n4ze3m made their first contribution in #2447
Full Changelog: v0.1.25...v0.1.26
v0.1.25
Windows Preview
Ollama is now available on Windows in preview. Download it here. Ollama on Windows makes it possible to pull, run and create large language models in a new native Windows experience. It includes built-in GPU acceleration, access to the full model library, and the Ollama API including OpenAI compatibility.
What's Changed
- Ollama on Windows is now available in preview
- Fixed an issue where requests would hang after being repeated several times
- Ollama will now correctly error when provided an unsupported image format
- Fixed issue where `ollama serve` wouldn't immediately quit when receiving a termination signal
- Fixed issues with prompt templating for the `/api/chat` endpoint, such as where Ollama would omit the second system prompt in a series of messages
- Fixed issue where providing an empty list of messages would return a non-empty response instead of loading the model (see the sketch after this list)
- Setting a negative `keep_alive` value (e.g. `-1`) will now correctly keep the model loaded indefinitely
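One consequence of the empty-messages fix: a `/api/chat` request with no messages now simply loads the model into memory (a sketch; assumes `llama2` has been pulled):

```
curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": []
}'
```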
New Contributors
Full Changelog: v0.1.24...v0.1.25
v0.1.24
OpenAI Compatibility
This release adds initial compatibility support for the OpenAI Chat Completions API.
Usage with cURL
```
curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llama2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
```
New Models
- Qwen 1.5: Qwen 1.5 is a new family of large language models by Alibaba Cloud spanning from 0.5B to 72B parameters.
What's Changed
- Fixed issue where requests to `/api/chat` would hang when providing empty `user` messages repeatedly
- Fixed issue on macOS where Ollama would return a missing library error after being open for a long period of time
New Contributors
Full Changelog: v0.1.23...v0.1.24
v0.1.23
New vision models
The LLaVA model family on Ollama has been updated to version 1.6 and now includes a new 34b version:

- `ollama run llava`: a new 7B LLaVA model based on Mistral
- `ollama run llava:13b`: the 13B LLaVA model
- `ollama run llava:34b`: the 34B LLaVA model, one of the most powerful open-source vision models available
These new models share several improvements:
- More permissive licenses: LLaVA 1.6 models are distributed via the Apache 2.0 license or the LLaMA 2 Community License.
- Higher image resolution: support for up to 4x more pixels, allowing the model to grasp more details.
- Improved text recognition and reasoning capabilities: these models are trained on additional document, chart and diagram data sets.
`keep_alive` parameter: control how long models stay loaded

When making API requests, the new `keep_alive` parameter can be used to control how long a model stays loaded in memory:
```
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Why is the sky blue?",
  "keep_alive": "30s"
}'
```
- If set to a positive duration (e.g. `20m`, `1hr` or `30`), the model will stay loaded for the provided duration
- If set to a negative duration (e.g. `-1`), the model will stay loaded indefinitely (see the sketch after this list)
- If set to `0`, the model will be unloaded immediately once finished
- If not set, the model will stay loaded for 5 minutes by default
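For example, a request that keeps the model in memory indefinitely might look like this (a sketch; `mistral` stands in for any pulled model):

```
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Why is the sky blue?",
  "keep_alive": -1
}'
```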
Support for more Nvidia GPUs
| Family | Supported cards |
| --- | --- |
| GeForce GTX | TITAN X, 980 Ti, 980, 970, 960, 950, 750 Ti, 750 |
| GeForce GTX | 980M, 970M, 965M, 960M, 950M, 860M, 850M |
| GeForce | 940M, 930M, 910M, 840M, 830M |
| Quadro | M6000, M5500M, M5000, M2200, M1200, M620, M520 |
| Tesla | M60, M40 |
| NVS | 810 |
What's Changed
- New `keep_alive` API parameter to control how long models stay loaded
- Image paths can now be provided to `ollama run` when running multimodal models (see the sketch after this list)
- Fixed issue where downloading models via `ollama pull` would slow down to 99%
- Fixed error when running Ollama with Nvidia GPUs and CPUs without AVX instructions
- Support for additional Nvidia GPUs (compute capability 5)
- Fixed issue where the system prompt would be repeated in subsequent messages
- `ollama serve` will now print the prompt when `OLLAMA_DEBUG=1` is set
- Fixed issue where exceeding the context size would cause erroneous responses in `ollama run` and the `/api/chat` API
- `ollama run` will now allow sending messages without images to multimodal models
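As a sketch of the new image-path support, a local file can be referenced directly in the prompt (`./photo.jpg` is a hypothetical file):

```
ollama run llava "Describe this image: ./photo.jpg"
```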
New Contributors
- @jaglinux made their first contribution in #2224
- @textspur made their first contribution in #2252
- @rjmacarthy made their first contribution in #1950
- @hugo53 made their first contribution in #1957
- @RussellCanfield made their first contribution in #2313
Full Changelog: v0.1.22...v0.1.23
v0.1.22
New models
- Stable LM 2: A state-of-the-art 1.6B small language model.
What's Changed
- Fixed issue with Nvidia GPU detection that would cause Ollama to error instead of falling back to CPU
- Fixed issue where AMD integrated GPUs caused an error
Full Changelog: v0.1.21...v0.1.22
v0.1.21
New models
- Qwen: Qwen is a series of large language models by Alibaba Cloud spanning from 1.8B to 72B parameters.
- DuckDB-NSQL: A text-to-SQL LLM for DuckDB.
- Stable Code: A new code completion model on par with Code Llama 7B and similar models.
- Nous Hermes 2 Mixtral: The Nous Hermes 2 model from Nous Research, now trained over Mixtral.
Saving and loading models and messages
Models can now be saved and loaded with `/save <model>` and `/load <model>` when using `ollama run`. This saves or loads the conversation and any model changes made with `/set parameter`, `/set system` and more as a new model with the provided name.
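A short session sketch (the model and saved name are illustrative):

```
ollama run llama2
>>> /set system You are a concise assistant.
>>> /save concise-llama
>>> /bye
ollama run concise-llama
```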
`MESSAGE` Modelfile command

Messages can now be specified in a `Modelfile` ahead of time using the `MESSAGE` command:
```
# example Modelfile
FROM llama2
SYSTEM You are a friendly assistant that only answers with 'yes' or 'no'
MESSAGE user Is Toronto in Canada?
MESSAGE assistant yes
MESSAGE user Is Sacramento in Canada?
MESSAGE assistant no
MESSAGE user Is Ontario in Canada?
MESSAGE assistant yes
```
After creating this model, running it will restore the message history. This is useful for techniques such as chain-of-thought prompting:
```
ollama create -f Modelfile yesno
ollama run yesno
>>> Is Toronto in Canada?
yes
>>> Is Sacramento in Canada?
no
>>> Is Ontario in Canada?
yes
>>> Is Havana in Canada?
no
```
Python and JavaScript libraries
The first versions of the Python and JavaScript libraries for Ollama are now available.
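Both are published as `ollama` on their respective registries (assuming pip and npm as the installers):

```
pip install ollama
npm install ollama
```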
Intel & AMD CPU improvements
Ollama now supports CPUs without AVX. This means Ollama will run on older CPUs and in environments (such as virtual machines, Rosetta, and GitHub Actions runners) that don't support AVX instructions. On newer CPUs that support AVX2, Ollama gets a small performance boost, running models about 10% faster.
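To check which of these instruction sets a Linux CPU exposes, the kernel's CPU flags can be inspected (a sketch; macOS and Windows report CPU features differently):

```
# Prints "avx" and/or "avx2" if the CPU supports them
grep -Ewo 'avx2?' /proc/cpuinfo | sort -u
```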
What's Changed
- Support for a much broader set of CPUs, including CPUs without AVX instruction set support
- If a GPU detection error is hit when attempting to run a model, Ollama will fall back to the CPU
- Fixed issue where generating responses with the same prompt would hang after around 20 requests
- New `MESSAGE` Modelfile command to set the conversation history when building a model
- Ollama will now use AVX2 for faster performance if available
- Improved detection of Nvidia GPUs, especially in WSL
- Fixed issue where models with LoRA layers may not load
- Fixed incorrect error that would occur when retrying network connections in `ollama pull` and `ollama push`
- Fixed issue where `/show parameter` would round decimal numbers
- Fixed issue where requests would hang upon hitting the context window limit
New Contributors
- @fpreiss made their first contribution in #1921
- @eavanvalkenburg made their first contribution in #1931
- @0atman made their first contribution in #1924
- @sachinsachdeva made their first contribution in #2021
- @Arrendy made their first contribution in #2016
- @purificant made their first contribution in #1958
- @lainedfles made their first contribution in #1999
Full Changelog: v0.1.20...v0.1.21
v0.1.20
New models
- MegaDolphin: A new 120B version of the Dolphin model.
- OpenChat: Updated to the latest version, `3.5-0106`.
- Dolphin Mistral: Updated to the latest DPO Laser version, which achieves higher scores with more robust outputs.
What's Changed
- Fixed additional cases where Ollama would fail with `out of memory` CUDA errors
- Multi-GPU machines will now correctly allocate memory across all GPUs
- Fixed issue where Nvidia GPUs would not be detected by Ollama
Full Changelog: v0.1.19...v0.1.20