WIP: fix compat with latest autogptq and use meta region to store auto-round properties #87
base: main
Conversation
@wenhuach21 It appears that lm-head quantization is incompatible with the autogptq backend. If confirmed incompatible, I will push a commit to disable this for autogptq export by raising an error.
Yes, there are some issues with the autogptq backend, while the Intel GPU backend can support this by setting the deployment device to "xpu". AutoGPTQ should have some way to support lm-head quantization, I think, though I haven't studied it yet.
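A minimal sketch of what the proposed guard could look like (the function and flag names here are hypothetical, not the PR's actual code):

```python
# Hypothetical guard: disable autogptq export when lm-head quantization
# is enabled, as proposed above. Names are illustrative only.
def check_autogptq_export(quantize_lm_head: bool) -> None:
    if quantize_lm_head:
        raise NotImplementedError(
            "lm-head quantization is not supported by the autogptq export; "
            "deploy with the Intel GPU backend (device='xpu') instead."
        )
```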
…gptq is the packer. separation for future compat
@wenhuach21 May I have your permission to move this dual license in this file? Another question: why does only this file have a special dual/partial MIT license?
Hi,
My bad. I thought autogptq was Apache too. =) FYI, this PR is tested and ready to go when combined with autogptq PR #640. I will remove the WIP tag once that base PR is merged.
Thank you so much for your hard work. I have a question regarding your recent pull request (AutoGPTQ/AutoGPTQ#640). It appears there's an accuracy issue for sym=False in AutoGPTQ. With the format conversion from v2 to v1 in this PR, does the v1 version still preserve the accuracy issue? Besides, as we plan to add Marlin support later on, are there any specific considerations we should be mindful of?
There was a lot of discussion and testing to see whether the v2 -> v1 conversion with sym=False causes accuracy degradation. Based on all the test results I have gathered using ppl and human-eval, I believe the accuracy degradation of sym=False saved to v1 versus v2 format is negligible. In fact, the autogptq PR currently has v1 as the default save checkpoint_format because of the real-world ppl results, plus v1 has full compatibility with third-party inference libraries like vllm/sglang. However, it still needs to be reviewed.
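For readers new to the format discussion, here is a minimal, hypothetical sketch of what the v2 -> v1 `qzeros` conversion involves, assuming the common packing of 32/bits zero-points per int32 word; this is not the actual AutoGPTQ implementation. The v1 format stores zero-points offset by -1 (legacy kernels add the 1 back at runtime), which is where the sym=False underflow concern comes from:

```python
import torch

def convert_qzeros_v2_to_v1(qzeros: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Illustrative sketch: convert v2 (true zero-points) to v1 (offset by -1)."""
    pack_factor = 32 // bits           # e.g. 8 zero-points per int32 for 4-bit
    mask = (1 << bits) - 1
    shifts = torch.arange(0, 32, bits, dtype=torch.int32)
    # Unpack each int32 word into `pack_factor` individual zero-points.
    unpacked = (qzeros.unsqueeze(-1) >> shifts) & mask
    # v1 stores zero-point minus 1. For sym=False, a zero-point of 0 wraps
    # around here -- the underflow discussed in this thread.
    unpacked = (unpacked - 1) & mask
    # Repack into int32 words.
    repacked = torch.zeros_like(qzeros)
    for i in range(pack_factor):
        repacked |= unpacked[..., i] << (i * bits)
    return repacked
```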
@wenhuach21 Here is the marlin kernel compat code used by autogptq: https://github.com/AutoGPTQ/AutoGPTQ/blob/main/auto_gptq/utils/marlin_utils.py#L115

```python
# Adapted from https://github.com/rib-2/marlin/tree/conversion
def _validate_marlin_compatibility(cfg: BaseQuantizeConfig):
    if not MARLIN_AVAILABLE:
        return f"AutoGPTQ is not compiled with the Marlin kernel, with the following error: {MARLIN_EXCEPTION}"
    if cfg.bits != 4:
        return f"The quantized model uses a bitwidth different than 4 (found {cfg.bits})"
    if cfg.group_size != 128 and cfg.group_size != -1:
        return f"The quantized model uses a group size that is not 128 or -1 (found {cfg.group_size})"
    if not cfg.sym:
        return "The quantized model uses asymmetric quantization"
    if cfg.desc_act:
        return "The quantized model uses act-order (also called desc-act) scheme"
    if cfg.quant_method == QUANT_METHOD.AWQ:
        return "awq_gemm format is currently not compatible with marlin"
    return None
```
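For context, a caller would typically gate the Marlin load path on this validator; a hypothetical usage sketch (not verbatim from autogptq):

```python
# Hypothetical caller: refuse to load with the Marlin kernel if the
# config is incompatible; `cfg` is a BaseQuantizeConfig as above.
err = _validate_marlin_compatibility(cfg)
if err is not None:
    raise ValueError(f"Model cannot use the Marlin kernel: {err}")
```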
@wenhuach21 Here is the latest result regarding the v2 -> v1 sym=False accuracy issue. We ran ppl/human-eval tests on models ranging from TinyLlama 1.1B, Llama3-8B, and Yi-9B to command-r-v01 (30b?), and it's all good news for sym=False and the v2 -> v1 conversion on vllm, where the underflows are not autocorrected. It turns out to matter very little, if at all.
remove costly operations (ad3a7bb)
Update: We will keep this PR updated with main, but it is currently not mergeable since it depends on autogptq PRs being merged first. Chicken-and-egg problem. If this continues to drag on, I will fork autogptq so this can move forward.
Reason for PR:
- `meta_set_quantizer(name, version)` api
- `meta_set` api
- Pending merge/changes to AutoGPTQ/AutoGPTQ#640
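A rough sketch of the meta-region idea named above: quantizer identity and other auto-round properties live under a dedicated `meta` key of the quantization config rather than at its top level. The dict layout and exact signatures here are assumptions, not the PR's actual code.

```python
# Illustrative only: store tool provenance under a "meta" sub-dict of
# the quantization config. Layout and signatures are assumptions.
def meta_set(cfg: dict, key: str, value) -> None:
    cfg.setdefault("meta", {})[key] = value

def meta_set_quantizer(cfg: dict, name: str, version: str) -> None:
    # e.g. meta_set_quantizer(quant_cfg, "auto-round", "0.2.0")  # values illustrative
    meta_set(cfg, "quantizer", f"{name}:{version}")
```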