Hi, I quantized the weights to int4, but when I run inference I find that the weights are actually int16. Is my pipeline wrong?
Below is the script I use for quantization:
python -m awq.entry --model_path $MODEL --w_bit 4 --q_group_size 128 --run_awq --dump_awq awq/llava_w4/llava-v1.6-vicuna-7b-w4-g128.pt
python -m awq.entry --model_path $MODEL --w_bit 4 --q_group_size 128 --load_awq awq/llava_w4/llava-v1.6-vicuna-7b-w4-g128.pt --q_backend real --dump_quant awq/llava_w4/llava-v1.6-vicuna-7b-w4-g128-awq.pt
With this I get fake-int4 weights: during the actual computation they are int16.
If it's convenient for you, could you explain this?
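One possible explanation for the int16 dtype (a hedged sketch, not necessarily llm-awq's exact storage layout): PyTorch and NumPy have no native 4-bit tensor dtype, so real-quantized backends typically pack several 4-bit values into one wider integer word. A tensor of packed int4 weights can therefore report dtype int16 (or int32) even though each logical weight still occupies only 4 bits. The round trip below illustrates the idea with four 4-bit values per int16 word:

```python
import numpy as np

def pack_int4_to_int16(vals):
    """Pack groups of four unsigned 4-bit values (0..15) into one int16 word.
    Illustrative only -- a real backend defines its own packing layout."""
    vals = np.asarray(vals, dtype=np.uint16)
    assert vals.size % 4 == 0 and int(vals.max()) <= 0xF
    g = vals.reshape(-1, 4)
    packed = g[:, 0] | (g[:, 1] << 4) | (g[:, 2] << 8) | (g[:, 3] << 12)
    # Same-width cast keeps the bit pattern; the dtype an inspector sees is int16.
    return packed.astype(np.int16)

def unpack_int16_to_int4(packed):
    """Recover the four 4-bit values from each int16 word."""
    words = np.asarray(packed).astype(np.uint16)  # bit-preserving cast
    nibbles = np.stack([(words >> s) & 0xF for s in (0, 4, 8, 12)], axis=1)
    return nibbles.reshape(-1).astype(np.uint8)

w4 = np.array([1, 15, 0, 7, 3, 2, 9, 12], dtype=np.uint8)  # logical int4 weights
packed = pack_int4_to_int16(w4)
print(packed.dtype)                                   # int16
assert np.array_equal(unpack_int16_to_int4(packed), w4)
```

If your inference path instead shows int16/fp16 values that are *not* packed, the model may be running pseudo-quantized (fake-quant) weights rather than the real-quantized checkpoint dumped with `--q_backend real`.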