Releases: huggingface/text-generation-inference

v2.0.4

24 May 10:55

Main changes

What's Changed

New Contributors

Full Changelog: v2.0.3...v2.0.4

v2.0.3

16 May 05:05
40213c9

Important changes

What's Changed

New Contributors

Full Changelog: v2.0.2...v2.0.3

v2.0.2

01 May 07:22
6073ece

TL;DR

  • New models (Idefics2, Phi-3)
  • Cleaner VLM support in the OpenAI-compatible layer
  • Upgraded to PyTorch 2.3.0

What's Changed

New Contributors

Full Changelog: v2.0.1...v2.0.2

v2.0.1

18 Apr 15:22

What's Changed

  • feat: improve tools to include name and add tests by @drbh in #1693
  • Update response type for /v1/chat/completions and /v1/completions by @Wauplin in #1747
  • accept list as prompt for OpenAI API by @drbh in #1702
  • fix ROCm Docker image

Full Changelog: v2.0.0...v2.0.1

v2.0.0

12 Apr 16:44
c38a7d7

TGI is back to Apache 2.0!

Highlights

  • The license was reverted to Apache 2.0.
  • CUDA graphs are now used by default. They improve latency substantially on high-end nodes.
  • LLaVA-NeXT was added. It is the second multimodal model available on TGI, after Idefics.
  • Cohere Command R+ support. TGI is the fastest open-source backend for Command R+.
  • FP8 support.
  • The vocabulary is now shared across all Medusa heads, greatly improving latency and memory use.

Try out Command R+ with Medusa heads on 4xA100s with:

model=text-generation-inference/commandrplus-medusa
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:2.0 --model-id $model --speculate 3 --num-shard 4
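Once the container above has finished loading the model, it can be queried over HTTP. The following is a minimal sketch, assuming the server is reachable on localhost port 8080 (as mapped by `-p 8080:80` above) and using TGI's `/generate` route. The helper names `build_payload` and `query_tgi`, the `TGI_URL` variable, and the prompt are illustrative and not part of the release.

```shell
# Assumption: the container started above is listening on localhost:8080.
TGI_URL="${TGI_URL:-http://127.0.0.1:8080}"

# Build the JSON body for the /generate route.
# Note: the prompt is inserted as-is, so it must not contain characters
# that need JSON escaping (quotes, backslashes, newlines).
build_payload() {
  prompt="$1"
  max_new_tokens="$2"
  printf '{"inputs":"%s","parameters":{"max_new_tokens":%s}}' "$prompt" "$max_new_tokens"
}

# Send the request. The second argument (token budget) defaults to 20.
query_tgi() {
  curl -s "$TGI_URL/generate" \
    -X POST \
    -H 'Content-Type: application/json' \
    -d "$(build_payload "$1" "${2:-20}")"
}

# Example (hypothetical prompt; requires the running server):
# query_tgi "What is deep learning?" 20
```

The functions are only defined here, not called, so sourcing the snippet has no side effects. For prompts containing quotes or newlines, a JSON-aware tool would be a safer way to build the request body.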

What's Changed

New Contributors

Full Changelog: v1.4.5...v2.0.0

v1.4.5

29 Mar 18:18
4ee0a0c

Highlights

  • DBRX support #1685. See #1679 on how to prompt the model.

What's Changed

Full Changelog: v1.4.4...v1.4.5

v1.4.4

22 Mar 17:45
6c4496a

Highlights

  • CohereForAI/c4ai-command-r-v01 model support

What's Changed

New Contributors

Full Changelog: v1.4.3...v1.4.4

v1.4.3

28 Feb 15:14
e6bb3ff

Highlights

  • Add support for Starcoder 2
  • Add support for Qwen2

What's Changed

Full Changelog: v1.4.2...v1.4.3

v1.4.2

21 Feb 13:52
9c1cb81

Highlights

  • Add support for Google Gemma models

What's Changed

Full Changelog: v1.4.1...v1.4.2

v1.4.1

16 Feb 16:53
4139054

Highlights

What's Changed

New Contributors

Full Changelog: v1.4.0...v1.4.1