WIP: fix compat with latest autogptq and use meta region to store auto-round properties #87

Draft · wants to merge 44 commits into main

Conversation

@Qubitium (Author) commented Apr 24, 2024

Reason for PR:

  1. Fix compat with the latest autogptq.
  2. Store the autoround fingerprint/version using the meta_set_quantizer(name, version) api.
  3. Store autoround-specific parameters, unrelated to actual autogptq inference/quantization, in the meta region via the meta_set api (see the sketch after this list).
  4. Add tqdm progress to quantization so users get a good estimate of iter/s and remaining time.
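
A minimal sketch of what items 2 and 3 could look like in practice, assuming a plain dict-backed meta region; apart from the meta_set_quantizer/meta_set names mentioned above, the helper bodies, field names, and example values below are illustrative assumptions, not the actual autogptq API:

    # Illustrative sketch only: a dict-backed "meta" region inside the quantize
    # config, as described in this PR. Exact signatures/fields are assumptions.
    meta = {}

    def meta_set(key: str, value):
        # Store an auto-round-specific property that autogptq inference ignores.
        meta[key] = value

    def meta_set_quantizer(name: str, version: str):
        # Record which tool produced the checkpoint (fingerprint/version).
        meta_set("quantizer", f"{name}:{version}")

    meta_set_quantizer("intel/auto-round", "0.1")   # hypothetical version string
    meta_set("iters", 200)                          # auto-round-only knobs
    meta_set("enable_minmax_tuning", True)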

Pending merge/changes to AutoGPTQ/AutoGPTQ#640

  • Tested quantization with autogptq inference with both sym=True and sym=False.

@Qubitium changed the title from "WIP: fix compat with latest autogptq and use meta region to store autoround specific properties" to "WIP: fix compat with latest autogptq and use meta region to store autoround properties" on Apr 25, 2024.
@Qubitium changed the title from "WIP: fix compat with latest autogptq and use meta region to store autoround properties" to "WIP: fix compat with latest autogptq and use meta region to store auto-round properties" on Apr 25, 2024.
@Qubitium (Author)

@wenhuach21 It appears that quant_lm_head is not compatible with autogptq. Can you confirm? We are getting nonsensical output when this is enabled.

If confirmed incompatible, I will push a commit to disable this for autogptq export by raising an error.
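
A minimal sketch of what that guard could look like, assuming a check at export time; the variable names and error message below are illustrative, not the actual commit:

    # Illustrative only: refuse autogptq export when lm-head quantization is on,
    # since the autogptq backend currently produces nonsensical output for it.
    if export_format == "auto_gptq" and quant_lm_head:
        raise ValueError(
            "quant_lm_head is not currently supported by the autogptq backend; "
            "disable it or export to a different format."
        )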

@wenhuach21 (Contributor) commented Apr 25, 2024

> @wenhuach21 It appears that quant_lm_head is not compatible with autogptq. Can you confirm? We are getting nonsensical output when this is enabled.
>
> If confirmed incompatible, I will push a commit to disable this for autogptq export by raising an error.

Yes, there are some issues with the autogptq backend, while the Intel GPU backend can support this by setting the deployment device to "xpu".

I think AutoGPTQ should have some way to support lm-head quantization, though I haven't studied it yet.

@Qubitium (Author) commented Apr 25, 2024

@wenhuach21 May I have your permission to remove this dual license from this file? Or, another question: why does only this file have a special dual/partial MIT license?

# MIT License
#
# Copyright (c) 2023 潘其威(William)
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

@wenhuach21 (Contributor)

> @WeiweiZhang1 May I have your permission to remove this dual license from this file? Or, another question: why does only this file have a special dual/partial MIT license?
>
> [MIT license text quoted above]

Hi,
Since much of the code in this file is derived from autogptq, we replicate the license from autogptq to honor the original author. Therefore, we'd better not remove this license.

@Qubitium (Author)

> Hi, since much of the code in this file is derived from autogptq, we replicate the license from autogptq to honor the original author. Therefore, we'd better not remove this license.

My bad. I thought autogptq was Apache too. =)

FYI, this PR is tested and ready to go when combined with autogptq PR #640. I will remove the WIP tag once that base PR is merged.

@wenhuach21 (Contributor) commented Apr 25, 2024

> Hi, since much of the code in this file is derived from autogptq, we replicate the license from autogptq to honor the original author. Therefore, we'd better not remove this license.
>
> My bad. I thought autogptq was Apache too. =)
>
> FYI, this PR is tested and ready to go when combined with autogptq PR #640. I will remove the WIP tag once that base PR is merged.

Thank you so much for your hard work. I have a question regarding your recent pull request (AutoGPTQ/AutoGPTQ#640). It appears there is an accuracy issue for sym=False in AutoGPTQ. With the format conversion from v2 to v1 in this PR, does the v1 format still have the accuracy issue? Also, as we plan to add support for Marlin later on, are there any specific considerations we should be mindful of?

@Qubitium (Author) commented Apr 25, 2024

> With the format conversion from v2 to v1 in this PR, does the v1 format still have the accuracy issue? Also, as we plan to add support for Marlin later on, are there any specific considerations we should be mindful of?

There was a lot of discussion, and tests were run to check whether v2 -> v1 conversion with sym=False causes accuracy degradation. Based on all the test results I have gathered using ppl and HumanEval, I believe the accuracy degradation of sym=False saved to v1 versus v2 format is negligible. In fact, the autogptq PR currently has v1 as the default save checkpoint_format because of the real-world ppl results, plus v1 has full compatibility with third-party inference libraries like vllm/sglang. However, it still needs to be reviewed.
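
For readers following along, a rough sketch of the kind of conversion being discussed, assuming the legacy (v1) gptq layout stores zero points with a -1 offset that kernels add back at runtime while v2 stores them unshifted; the function name and exact handling are illustrative assumptions, not the PR's actual conversion code:

    import numpy as np

    # Illustrative sketch only: converting v2 zero points back to the legacy v1
    # layout subtracts 1, which can underflow for sym=False when a zero point
    # is already 0.
    def v2_to_v1_zeros(zeros_v2: np.ndarray, bits: int = 4) -> np.ndarray:
        zeros_v1 = zeros_v2.astype(np.int32) - 1
        # Entries that go negative wrap around once packed into unsigned ints;
        # this is the source of the (empirically negligible) accuracy concern.
        return np.where(zeros_v1 < 0, (1 << bits) - 1, zeros_v1)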

@Qubitium (Author)

@wenhuach21 Here is the Marlin kernel compatibility check code used by autogptq:

https://github.com/AutoGPTQ/AutoGPTQ/blob/main/auto_gptq/utils/marlin_utils.py#L115

# Adapted from https://github.com/rib-2/marlin/tree/conversion
def _validate_marlin_compatibility(cfg: BaseQuantizeConfig):
    if not MARLIN_AVAILABLE:
        return f"AutoGPTQ is not compiled with the Marlin kernel, with the following error: {MARLIN_EXCEPTION}"
    if cfg.bits != 4:
        return f"The quantized model uses a bitwidth different than 4 (found {cfg.bits})"
    if cfg.group_size != 128 and cfg.group_size != -1:
        return "The quantized model uses a group size that is not 128 or -1 (found quantization_config.group_size)"
    if not cfg.sym:
        return "The quantized model uses asymmetric quantization"
    if cfg.desc_act:
        return "The quantized model uses act-order (also called desc-act) scheme"
    if cfg.quant_method == QUANT_METHOD.AWQ:
        return "awq_gemm format is currently not compatible with marlin"
    return None
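
For context, a minimal sketch of how a caller (e.g. an export path) might consume this validator; the surrounding variable names are illustrative, not autogptq's actual call site:

    # Illustrative usage only: refuse a Marlin export when the validator
    # returns a reason string instead of None.
    unsupported_reason = _validate_marlin_compatibility(quantize_config)
    if unsupported_reason is not None:
        raise ValueError(f"Model cannot use the Marlin kernel: {unsupported_reason}")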

@Qubitium (Author) commented Apr 25, 2024

@wenhuach21 Here is the latest result regarding the v2 to v1 sym=False accuracy issue.

We ran tests/ppl/HumanEval on models ranging from tinyllama 1.1b to llama3-8b, yi9b, and command-r-v01 (30b?), and it is all good news for sym=False and v2 -> v1 conversion on vllm, where the underflows are not autocorrected. It turns out this matters very little, if at all.

AutoGPTQ/AutoGPTQ#640 (comment)

@Qubitium (Author)

Update: We will keep this PR updated with main, but it is currently not mergeable since it depends on autogptq PRs being merged first. Chicken-and-egg problem. If this continues to drag on, I will fork autogptq so this can move forward.
