GPTQ & 4bit #180

Open
olihough86 opened this issue Apr 22, 2023 · 3 comments

@olihough86

My apologies if this is a really stupid question... but

Is there scope here to add the ability to load 4-bit models, such as vicuna-13B-1.1-GPTQ-4bit-128g? Even 4-bit 30B LLaMA models will squeeze into 24GB of VRAM. I know this can all be done in other web UI projects, but having an OpenAI-like API, as this project provides, would be amazing.
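
For illustration, loading such a GPTQ checkpoint standalone looks roughly like this. This is a minimal sketch, not this project's API: it assumes the third-party auto-gptq package, and the model path is TheBloke's Hugging Face mirror of the checkpoint named above.

```python
# Minimal sketch: load a pre-quantized 4-bit GPTQ checkpoint with the
# third-party auto-gptq package (an assumption, not this project's API).
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Illustrative model path; substitute any GPTQ-quantized repo.
model_id = "TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device="cuda:0",       # a 4-bit 13B model fits well within 24GB VRAM
    use_safetensors=True,  # some repos also require model_basename=...
)

# Quick smoke test.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```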

@olihough86 (Author)

I'm a moron and didn't check the closed issues...

@fardeon added the duplicate label on Apr 24, 2023
@AntouanK commented Jun 8, 2023

@olihough86 How did you get it to work?

@djmaze commented Jun 13, 2023

GPTQ doesn't seem to be supported yet, only QLoRA. This issue should be reopened.
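
For reference, the 4-bit path that does work goes through bitsandbytes, which transformers loads natively. A minimal sketch of that QLoRA-style route (the model id is illustrative; this quantizes full-precision weights at load time rather than reading a pre-quantized GPTQ checkpoint):

```python
# Minimal sketch: QLoRA-style 4-bit loading via bitsandbytes
# (requires the bitsandbytes package and transformers >= 4.30).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "huggyllama/llama-13b"  # illustrative; any fp16 causal LM repo

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4 bits at load time
    bnb_4bit_quant_type="nf4",             # NF4 data type from the QLoRA paper
    bnb_4bit_compute_dtype=torch.float16,  # run matmuls in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available devices
)
```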

@peakji reopened this on Jun 14, 2023