Quantization support #163

Open
generalsvr opened this issue Oct 16, 2023 · 8 comments

@generalsvr

How do I use 8-bit quantized models? Can I run GGML/GGUF models?

@hiworldwzj
Collaborator

8-bit weight-only quantization is only supported for LLaMA right now.

@generalsvr
Author

Any examples?

@hiworldwzj
Collaborator

parser.add_argument("--mode", type=str, default=[], nargs='+',
                    help="Model mode: [int8kv] [int8weight | int4weight]")

@XHPlus
Contributor

XHPlus commented Oct 19, 2023

As for the model file format, we have not tested GGML/GGUF so far. What is your motivation for using these formats?

@JustinLin610

Will GPTQ be supported?

@suhjohn

suhjohn commented Nov 14, 2023

@XHPlus There are a lot of open-source quantized models on Hugging Face, largely driven by https://huggingface.co/TheBloke. Many people in the open-source community run those quantized models on TGI / vLLM.

@adi

adi commented Feb 8, 2024

parser.add_argument("--mode", type=str, default=[], nargs='+',
                    help="Model mode: [int8kv] [int8weight | int4weight]")

Using this option with Llama2-13B gives this error:

_get_exception_class.<locals>.Derived: 'LlamaTransformerLayerWeightQuantized' object has no attribute 'quantize_weight'

I tried --mode int8kv int4weight.

Any suggestions on how to fix this?

@VfBfoerst

@XHPlus In some cases quantization is the only way to run bigger models on smaller GPUs, e.g. Mixtral. With vLLM, I can run Mixtral quantized within 48 GB of VRAM; the unquantized model would need up to 100 GB of VRAM, I guess.
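
For a rough sanity check of those numbers, here is a back-of-the-envelope sketch (it assumes Mixtral 8x7B's roughly 46.7B total parameters and counts weights only, so KV cache and activation memory come on top):

params = 46.7e9  # approximate total parameter count of Mixtral 8x7B

def weight_gib(bits_per_param):
    # Approximate weight memory in GiB at the given precision (weights only).
    return params * bits_per_param / 8 / 1024**3

print(f"fp16: {weight_gib(16):.0f} GiB")  # ~87 GiB
print(f"int8: {weight_gib(8):.0f} GiB")   # ~43 GiB
print(f"int4: {weight_gib(4):.0f} GiB")   # ~22 GiB

That is consistent with 4-bit weights (plus runtime overhead) fitting in 48 GB while fp16 weights do not.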
