WIP: fix compat with latest autogptq and use meta region to store auto-round properties #87
base: main
Conversation
@wenhuach21 It appears that lm-head quantization is incompatible with the autogptq backend. If confirmed incompatible, I will push a commit to disable this for autogptq export by raising an error.
Yes, there are some issues with the autogptq backend, while the Intel GPU backend can support this by setting the deployment device to "xpu". AutoGPTQ should have some way to support lm-head quantization, I think, though I haven't studied it yet.
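A minimal sketch of what the proposed guard could look like (the function and flag names here are hypothetical, not the PR's actual code):

```python
# Hypothetical guard: disable autogptq export when lm-head quantization
# is enabled, as proposed above. Names are illustrative only.
def check_autogptq_export(quantize_lm_head: bool) -> None:
    if quantize_lm_head:
        raise NotImplementedError(
            "lm-head quantization is not supported by the autogptq export; "
            "deploy with the Intel GPU backend (device='xpu') instead."
        )
```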
…gptq is the packer. separation for future compat
@wenhuach21 May I have your permission to move this dual license in this file? Another question: why does only this file have a special dual/partial MIT license?
Hi,
My bad. I thought autogptq was Apache too. =) FYI, this PR is tested and ready to go when combined with autogptq PR #640. I will remove the WIP tag once that base PR is merged.
Thank you so much for your hard work. I have a question regarding your recent pull request (AutoGPTQ/AutoGPTQ#640). It appears there's an accuracy issue for sym=False in AutoGPTQ. With the format conversion from v2 to v1 in this PR, does the v1 version still preserve the accuracy issue? Besides, as we plan to add Marlin support later on, are there any specific considerations we should be mindful of?
There was a lot of discussion and testing to see whether the v2 -> v1 conversion with sym=False causes accuracy degradation. Based on all the test results I have gathered using ppl and human-eval, I believe the accuracy degradation of sym=False saved to v1 versus v2 format is negligible. In fact, the autogptq PR currently has v1 as the default save checkpoint_format because of the real-world ppl results, plus v1 has full compatibility with third-party inference libraries like vllm/sglang. However, it still needs to be reviewed.
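For readers new to the format discussion, here is a minimal, hypothetical sketch of what the v2 -> v1 `qzeros` conversion involves, assuming the common packing of 32/bits zero-points per int32 word; this is not the actual AutoGPTQ implementation. The v1 format stores zero-points offset by -1 (legacy kernels add the 1 back at runtime), which is where the sym=False underflow concern comes from:

```python
import torch

def convert_qzeros_v2_to_v1(qzeros: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Illustrative sketch: convert v2 (true zero-points) to v1 (offset by -1)."""
    pack_factor = 32 // bits           # e.g. 8 zero-points per int32 for 4-bit
    mask = (1 << bits) - 1
    shifts = torch.arange(0, 32, bits, dtype=torch.int32)
    # Unpack each int32 word into `pack_factor` individual zero-points.
    unpacked = (qzeros.unsqueeze(-1) >> shifts) & mask
    # v1 stores zero-point minus 1. For sym=False, a zero-point of 0 wraps
    # around here -- the underflow discussed in this thread.
    unpacked = (unpacked - 1) & mask
    # Repack into int32 words.
    repacked = torch.zeros_like(qzeros)
    for i in range(pack_factor):
        repacked |= unpacked[..., i] << (i * bits)
    return repacked
```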
@wenhuach21 Here is the marlin kernel compat code used by autogptq: https://github.com/AutoGPTQ/AutoGPTQ/blob/main/auto_gptq/utils/marlin_utils.py#L115

```python
# Adapted from https://github.com/rib-2/marlin/tree/conversion
def _validate_marlin_compatibility(cfg: BaseQuantizeConfig):
    if not MARLIN_AVAILABLE:
        return f"AutoGPTQ is not compiled with the Marlin kernel, with the following error: {MARLIN_EXCEPTION}"
    if cfg.bits != 4:
        return f"The quantized model uses a bitwidth different than 4 (found {cfg.bits})"
    if cfg.group_size != 128 and cfg.group_size != -1:
        return f"The quantized model uses a group size that is not 128 or -1 (found {cfg.group_size})"
    if not cfg.sym:
        return "The quantized model uses asymmetric quantization"
    if cfg.desc_act:
        return "The quantized model uses act-order (also called desc-act) scheme"
    if cfg.quant_method == QUANT_METHOD.AWQ:
        return "awq_gemm format is currently not compatible with marlin"
    return None
```
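For context, a caller would typically gate the Marlin load path on this validator; a hypothetical usage sketch (not verbatim from autogptq):

```python
# Hypothetical caller: refuse to load with the Marlin kernel if the
# config is incompatible; `cfg` is a BaseQuantizeConfig as above.
err = _validate_marlin_compatibility(cfg)
if err is not None:
    raise ValueError(f"Model cannot use the Marlin kernel: {err}")
```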
@wenhuach21 Here is the latest result regarding the v2 -> v1 sym=False accuracy issue. We ran ppl/human-eval tests on models ranging from TinyLlama 1.1B, Llama3-8B, and Yi-9B to command-r-v01 (30b?), and it's all good news for sym=False and the v2 -> v1 conversion on vllm, where the underflows are not autocorrected. It turns out to matter very little, if at all.
remove costly operations (ad3a7bb)
Update: We will keep this PR updated with main, but it is currently not mergeable since it depends on autogptq PRs being merged first. Chicken-and-egg problem. If this continues to drag on, I will fork autogptq so this can move forward.
Reason for PR:
- `meta_set_quantizer(name, version)` api
- `meta_set` api
- Pending merge/changes to AutoGPTQ/AutoGPTQ#640
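A rough sketch of the meta-region idea named above: quantizer identity and other auto-round properties live under a dedicated `meta` key of the quantization config rather than at its top level. The dict layout and exact signatures here are assumptions, not the PR's actual code.

```python
# Illustrative only: store tool provenance under a "meta" sub-dict of
# the quantization config. Layout and signatures are assumptions.
def meta_set(cfg: dict, key: str, value) -> None:
    cfg.setdefault("meta", {})[key] = value

def meta_set_quantizer(cfg: dict, name: str, version: str) -> None:
    # e.g. meta_set_quantizer(quant_cfg, "auto-round", "0.2.0")  # values illustrative
    meta_set(cfg, "quantizer", f"{name}:{version}")
```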