GPTQ & 4bit #180

Open
olihough86 opened this issue Apr 22, 2023 · 3 comments

@olihough86

My apologies if this is a really stupid question... but

Is there scope here to add the ability to load 4-bit models, such as vicuna-13B-1.1-GPTQ-4bit-128g? Even 4-bit 30B LLaMA models will squeeze into 24GB of VRAM. I know this can all be done in other web UI projects, but having an OpenAI-like API, as this project provides, would be amazing.
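
For illustration, loading such a GPTQ checkpoint standalone looks roughly like this. This is a minimal sketch, not this project's API: it assumes the third-party auto-gptq package, and the model path is TheBloke's Hugging Face mirror of the checkpoint named above.

```python
# Minimal sketch: load a pre-quantized 4-bit GPTQ checkpoint with the
# third-party auto-gptq package (an assumption, not this project's API).
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Illustrative model path; substitute any GPTQ-quantized repo.
model_id = "TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device="cuda:0",       # a 4-bit 13B model fits well within 24GB VRAM
    use_safetensors=True,  # some repos also require model_basename=...
)

# Quick smoke test.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```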

@olihough86 (Author)

I'm a moron and didn't check the closed issues...

@fardeon added the duplicate label on Apr 24, 2023
@AntouanK commented Jun 8, 2023

@olihough86 How did you get it to work?

@djmaze commented Jun 13, 2023

GPTQ doesn't seem to be supported yet, only QLoRA. This issue should be reopened.
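
For reference, the 4-bit path that does work goes through bitsandbytes, which transformers loads natively. A minimal sketch of that QLoRA-style route (the model id is illustrative; this quantizes full-precision weights at load time rather than reading a pre-quantized GPTQ checkpoint):

```python
# Minimal sketch: QLoRA-style 4-bit loading via bitsandbytes
# (requires the bitsandbytes package and transformers >= 4.30).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "huggyllama/llama-13b"  # illustrative; any fp16 causal LM repo

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4 bits at load time
    bnb_4bit_quant_type="nf4",             # NF4 data type from the QLoRA paper
    bnb_4bit_compute_dtype=torch.float16,  # run matmuls in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available devices
)
```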

@peakji reopened this on Jun 14, 2023