Weight int4 quantization, but actually it is int16 #162

Open

dongxuemin666 opened this issue Mar 19, 2024 · 4 comments

Comments

@dongxuemin666

Hi, I used weight int4 quantization, but when I run inference I find that the weight is actually int16. Is my pipeline wrong?
[screenshot "屏幕截图 2024-03-19 112200.png" failed to upload]

@dongxuemin666
Author

The 屏幕截图 2024-03-19 112200 image seems to be broken, please see this one: [screenshot attached]

@dongxuemin666
Author

Below is the script I used for quantization:

python -m awq.entry --model_path $MODEL \
    --w_bit 4 --q_group_size 128 \
    --run_awq --dump_awq awq/llava_w4/llava-v1.6-vicuna-7b-w4-g128.pt

python -m awq.entry --model_path $MODEL \
    --w_bit 4 --q_group_size 128 \
    --load_awq awq/llava_w4/llava-v1.6-vicuna-7b-w4-g128.pt \
    --q_backend real --dump_quant awq/llava_w4/llava-v1.6-vicuna-7b-w4-g128-awq.pt
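
[Editor's note: one way to see what --dump_quant actually wrote is to load the checkpoint and print each tensor's dtype. This is a minimal sketch of mine, not from this thread, and it assumes the dump is an ordinary torch.save state dict.]

import torch

# Load the dumped checkpoint on CPU and list each tensor's dtype and shape.
state_dict = torch.load(
    "awq/llava_w4/llava-v1.6-vicuna-7b-w4-g128-awq.pt", map_location="cpu"
)
for name, tensor in state_dict.items():
    if torch.is_tensor(tensor):
        print(f"{name}: dtype={tensor.dtype}, shape={tuple(tensor.shape)}")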

@dongxuemin666
Author

This is what I found: the weight is fake int4; in the actual calculation it is int16.

@ponytaill

> This is what I found: the weight is fake int4; in the actual calculation it is int16.

If it's convenient for you, could you explain it?
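
[Editor's note: the thread does not resolve this, but one plausible explanation is bit-packing: low-bit weights are commonly stored several-to-a-container in a wider integer tensor, so the tensor dtype reads as int16 even though each element holds four logical int4 values. The toy sketch below is my own illustration of that layout, not code from the repo, and whether llm-awq's real backend uses exactly this packing is an assumption on my part.]

import torch

# Eight toy 4-bit values (0..15), held in an int16 tensor for packing.
w4 = torch.randint(0, 16, (8,), dtype=torch.int16)

# Pack four 4-bit values into each int16 element: the container dtype is
# int16, but the logical precision of each packed value is still 4 bits.
packed = w4[0::4] | (w4[1::4] << 4) | (w4[2::4] << 8) | (w4[3::4] << 12)
print(packed.dtype)  # torch.int16, yet each element carries four int4 weights

# Unpack the nibbles to recover the original 4-bit values.
unpacked = torch.stack(
    [(packed >> shift) & 0xF for shift in (0, 4, 8, 12)], dim=1
).flatten()
assert torch.equal(unpacked, w4)

The point is only that tensor.dtype reports the storage container, not the logical bit width of each weight.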
