
Intel® auto-round v0.1 Release

08 Mar 08:11 · 514aa49

Overview

AutoRound introduces an innovative weight-only quantization algorithm designed specifically for low-bit LLM inference, achieving near-lossless compression for a range of popular models, including gemma-7B, Mistral-7B, Mixtral-8x7B-v0.1, Mixtral-8x7B-Instruct-v0.1, Phi-2, LLaMA2, and more at W4G128 (4-bit weights with group size 128). AutoRound consistently outperforms established methods in the majority of scenarios at W4G128, W4G-1, W3G128, and W2G128.
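As a quick illustration, the sketch below shows the typical tuning flow. It is a minimal sketch based on the project README; exact argument names in v0.1 may differ slightly, so consult the repository for the authoritative API.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# W4G128: 4-bit weights with quantization group size 128
bits, group_size = 4, 128
autoround = AutoRound(model, tokenizer, bits=bits, group_size=group_size)
autoround.quantize()                          # tune the rounding values
autoround.save_quantized("./tmp_autoround")   # write the quantized model
```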

Key Features

  • Wide Model Support: AutoRound caters to a diverse range of model families; about 20 model families have been verified.
  • Export Flexibility: Effortlessly export quantized models to the ITREX [1] and AutoGPTQ [2] formats for seamless deployment on Intel CPU and Nvidia GPU platforms, respectively (a usage sketch follows this list).
  • Device Compatibility: Tuning is supported on Intel CPUs, Intel Gaudi2, and Nvidia GPUs.
  • Dataset Flexibility: AutoRound supports calibration with the pile-10k and MBPP datasets and is easily extensible to additional datasets.
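The following sketch illustrates the calibration and export options above. The `dataset` and `format` keyword names, and the format string values, are assumptions made for illustration; check the repository documentation for the exact v0.1 spellings.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Calibrate on a chosen dataset; the keyword name "dataset" is an assumption for v0.1.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128,
                      dataset="NeelNanda/pile-10k")
autoround.quantize()

# Export to the two supported backends; the "format" keyword and its values
# are likewise assumptions for illustration.
autoround.save_quantized("./out_itrex", format="itrex")      # Intel CPU via ITREX
autoround.save_quantized("./out_gptq", format="auto_gptq")   # Nvidia GPU via AutoGPTQ
```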

Examples

  • Explore the language-modeling and code-generation examples in the repository to get started with AutoRound.

Additional Benefits

  • Pre-Quantized Models: Access a variety of pre-quantized models on Hugging Face for immediate integration into your projects, with more models under review and coming soon (a loading sketch follows this list).
  • Comprehensive Accuracy Data: Extensive accuracy data is provided to simplify deployment decisions.
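For example, a model exported in the AutoGPTQ format can be loaded with the standard AutoGPTQ API. The repository id below is a placeholder, not a real checkpoint; substitute one of the pre-quantized models published on Hugging Face.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Placeholder repo id; replace with a real AutoRound checkpoint from Hugging Face.
repo_id = "<org>/<model>-w4g128-auto-round"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoGPTQForCausalLM.from_quantized(repo_id, device="cuda:0")

inputs = tokenizer("There is a girl who likes adventure,",
                   return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```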

Known Issues

  • baichuan-inc/Baichuan2-13B-Chat currently has issues; support is planned for an upcoming release.

References

[1] ITREX: https://github.com/intel/intel-extension-for-transformers

[2] AutoGPTQ: https://github.com/AutoGPTQ/AutoGPTQ