Releases: marella/ctransformers

0.2.27

10 Sep 15:20

Changes

  • Skip evaluating tokens that were already evaluated in the past. This can significantly speed up prompt processing in chat applications that prepend previous messages to the prompt.
  • Deprecate the LLM.reset() method. Use the high-level API instead (see the sketch after this list).
  • Add support for batching and beam search to the 🤗 model.
  • Remove the universal binary option when building for AVX2 and AVX on macOS.
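
A minimal sketch of the behavior behind the first two bullets, assuming the marella/gpt-2-ggml example model from the project README: chat prompts that repeat the previous conversation share a prefix with already-evaluated tokens, which the library can now skip, so manual LLM.reset() calls are no longer needed.

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")

# First turn: the prompt is evaluated from scratch.
print(llm("USER: Hi!\nASSISTANT:", max_new_tokens=64))

# Second turn: the prompt repeats the previous conversation, so the
# already-evaluated prefix is skipped and only the new tokens are processed.
prompt = "USER: Hi!\nASSISTANT: Hello!\nUSER: How are you?\nASSISTANT:"
print(llm(prompt, max_new_tokens=64))
```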

0.2.26

30 Aug 21:43

Changes

  • Add support for 🤗 Transformers (see the sketch below)
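
A minimal sketch of the 🤗 Transformers integration, assuming the marella/gpt-2-ggml example model from the project README: hf=True wraps the GGML model in a Transformers-compatible interface.

```python
from transformers import pipeline

from ctransformers import AutoModelForCausalLM, AutoTokenizer

# Wrap the GGML model so it can be used with 🤗 Transformers APIs.
model = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", hf=True)
tokenizer = AutoTokenizer.from_pretrained(model)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("AI is going to", max_new_tokens=32))
```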

0.2.25

29 Aug 00:31

Changes

  • Add support for GGUF v2
  • Add CUDA support for Falcon GGUF models
  • Add ROCm support
  • Add a low-level API for add_bos_token and bos_token_id (see the sketch after this list)
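
A sketch of the new low-level tokenizer accessors; the repo name is illustrative, and the exact shape (a bos_token_id property and an add_bos_token argument to tokenize()) is an assumption based on the bullet above.

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GGUF",  # illustrative GGUF v2 repo
    model_type="llama",
)

print(llm.bos_token_id)  # id of the beginning-of-sequence token

# Tokenize with and without a BOS token prepended.
print(llm.tokenize("Hello, world!", add_bos_token=True))
print(llm.tokenize("Hello, world!", add_bos_token=False))
```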

0.2.24

24 Aug 23:38

Changes

  • Add GGUF format support for Llama and Falcon models
  • Add support for Code Llama models (see the loading sketch after this list)
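
A loading sketch for a Code Llama model in the new GGUF format; the repo and file names are illustrative.

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/CodeLlama-7B-GGUF",           # illustrative Hub repo
    model_file="codellama-7b.q4_K_M.gguf",  # illustrative GGUF file
    model_type="llama",                     # Code Llama uses the llama type
)

print(llm("def fibonacci(n):", max_new_tokens=64))
```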

0.2.23

20 Aug 19:20

Changes

  • Add mmap and mlock parameters for LLaMA and Falcon models
  • Add a revision option for loading models from the Hugging Face Hub (see the sketch after this list)
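
A sketch combining the new options, with an illustrative LLaMA repo; mmap and mlock are passed like other config parameters, and revision picks a branch, tag, or commit on the Hub.

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GGML",  # illustrative Hub repo
    model_type="llama",
    revision="main",  # branch, tag, or commit hash on the Hub
    mmap=True,        # memory-map the model file
    mlock=False,      # set True to lock the model in RAM
)
```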

0.2.22

12 Aug 15:22

Changes

  • Add experimental CUDA support for StarCoder and StarChat models
  • Add gpt_bigcode as a model type for StarCoder and StarChat models (see the sketch after this list)
  • Fix loading GPTQ models from a local path
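
A sketch using the new model type together with the experimental CUDA offload; the repo name is illustrative.

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/starcoder-GGML",  # illustrative Hub repo
    model_type="gpt_bigcode",   # new alias for StarCoder/StarChat models
    gpu_layers=20,              # experimental: offload 20 layers to the GPU
)
```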

0.2.21

07 Aug 19:00

Changes

  • Simplify CUDA installation by using precompiled runtime libraries from NVIDIA

0.2.20

05 Aug 18:50

Changes

  • Add experimental CUDA support for MPT models

0.2.19

04 Aug 22:32

Changes

  • Add Metal support for LLaMA 2 70B models
  • Update llama.cpp

0.2.18

02 Aug 20:09

Changes

  • Add experimental support for GPTQ models using ExLlama (see the loading sketch below)
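
A minimal loading sketch for the experimental GPTQ path; the repo name is illustrative, and the model_type="gptq" hint is an assumption for the case where the model name or path does not already contain "gptq".

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",  # illustrative GPTQ repo on the Hub
    model_type="gptq",           # assumed hint when "gptq" is not in the name
)

print(llm("AI is going to", max_new_tokens=32))
```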