
Intel® auto-round v0.1 Release

08 Mar 08:11 · 514aa49

Overview

AutoRound introduces an innovative weight-only quantization algorithm designed specifically for low-bit LLM inference, achieving near-lossless compression for a range of popular models, including gemma-7B, Mistral-7B, Mixtral-8x7B-v0.1, Mixtral-8x7B-Instruct-v0.1, Phi-2, LLaMA2, and more at W4G128 (4-bit weights with group size 128). AutoRound consistently outperforms established methods in the majority of scenarios at W4G128, W4G-1, W3G128, and W2G128.
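As a quick illustration, the sketch below shows the typical tuning flow. It is a minimal sketch based on the project README; exact argument names in v0.1 may differ slightly, so consult the repository for the authoritative API.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# W4G128: 4-bit weights with quantization group size 128
bits, group_size = 4, 128
autoround = AutoRound(model, tokenizer, bits=bits, group_size=group_size)
autoround.quantize()                          # tune the rounding values
autoround.save_quantized("./tmp_autoround")   # write the quantized model
```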

Key Features

  • Wide Model Support: AutoRound caters to a diverse range of model families; about 20 model families have been verified.
  • Export Flexibility: Effortlessly export quantized models to the ITREX [1] and AutoGPTQ [2] formats for seamless deployment on Intel CPU and Nvidia GPU platforms, respectively (a usage sketch follows this list).
  • Device Compatibility: Tuning is supported on Intel CPUs, Intel Gaudi2, and Nvidia GPUs.
  • Dataset Flexibility: AutoRound supports calibration with the pile-10k and MBPP datasets and is easily extensible to additional datasets.
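The following sketch illustrates the calibration and export options above. The `dataset` and `format` keyword names, and the format string values, are assumptions made for illustration; check the repository documentation for the exact v0.1 spellings.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Calibrate on a chosen dataset; the keyword name "dataset" is an assumption for v0.1.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128,
                      dataset="NeelNanda/pile-10k")
autoround.quantize()

# Export to the two supported backends; the "format" keyword and its values
# are likewise assumptions for illustration.
autoround.save_quantized("./out_itrex", format="itrex")      # Intel CPU via ITREX
autoround.save_quantized("./out_gptq", format="auto_gptq")   # Nvidia GPU via AutoGPTQ
```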

Examples

  • Explore the language-modeling and code-generation examples in the repository to get started with AutoRound.

Additional Benefits

  • Pre-Quantized Models: Access a variety of pre-quantized models on Hugging Face for immediate integration into your projects, with more models under review and coming soon (a loading sketch follows this list).
  • Comprehensive Accuracy Data: Extensive accuracy data is provided to simplify deployment decisions.
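For example, a model exported in the AutoGPTQ format can be loaded with the standard AutoGPTQ API. The repository id below is a placeholder, not a real checkpoint; substitute one of the pre-quantized models published on Hugging Face.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Placeholder repo id; replace with a real AutoRound checkpoint from Hugging Face.
repo_id = "<org>/<model>-w4g128-auto-round"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoGPTQForCausalLM.from_quantized(repo_id, device="cuda:0")

inputs = tokenizer("There is a girl who likes adventure,",
                   return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```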

Known Issues

  • baichuan-inc/Baichuan2-13B-Chat currently has issues; support is planned for an upcoming release.

References

[1] ITREX: https://github.com/intel/intel-extension-for-transformers

[2] AutoGPTQ: https://github.com/AutoGPTQ/AutoGPTQ