Extend SmoothQuant support (exclude nodes, fuse into layernorm) #1357
Conversation
@chensuyue @mengniwang95 @PenghuiCheng @xin3he happy to get a review on this one!
Thank you @chensuyue, will have a look!
@@ -145,6 +146,7 @@ def transform(
         calib_iter=100,
         quantize_config=None,
         auto_alpha_args={"alpha_min": 0.3, "alpha_max": 0.7, "alpha_step": 0.05, "attn_method": "min"},
+        nodes_to_exclude: Optional[List[str]] = None,
How is the `nodes_to_exclude` setting passed in from the user-facing API of neural-compressor?
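For context, a `nodes_to_exclude` option of this kind typically just filters candidate node names before smoothing. A minimal sketch of that filtering step, with a hypothetical helper name and example node names (not the PR's actual code):

```python
from typing import List, Optional


def filter_smoothable_nodes(
    node_names: List[str], nodes_to_exclude: Optional[List[str]] = None
) -> List[str]:
    """Keep only the node names still eligible for smoothing (illustrative helper)."""
    exclude = set(nodes_to_exclude or [])
    return [name for name in node_names if name not in exclude]


names = ["/attn/qkv/MatMul", "/attn/out_proj/MatMul", "/mlp/fc1/MatMul"]
print(filter_smoothable_nodes(names, nodes_to_exclude=["/attn/out_proj/MatMul"]))
# → ['/attn/qkv/MatMul', '/mlp/fc1/MatMul']
```

The ONNX Runtime quantizer exposes the same idea as an exact-name exclusion list, which is the "ORT quantizer fashion" the PR description refers to.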
@@ -330,6 +336,37 @@ def conv(node, scale):  # pragma: no cover
             )
             return True

+        def mul_add(node, scale):  # pragma: no cover
+            node_parent = self.model.get_parent(node, 0)
+            if not len(self.model.get_parents(node)) == 1 or node_parent.op_type != "Mul":
The comment below says "Add has itself a MatMul before", but here the check is `node_parent.op_type != "Mul"` — could you align the two?
Further, could you add a UT to make sure it works correctly?
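To illustrate the condition under review, here is a self-contained sketch of a unit test for the parent check. `FakeNode` and `can_fuse_mul_add` are hypothetical stand-ins for the neural-compressor node/model objects; only the fields the condition touches are modeled:

```python
class FakeNode:
    """Minimal stand-in for an ONNX node wrapper (illustrative only)."""

    def __init__(self, op_type: str):
        self.op_type = op_type


def can_fuse_mul_add(parents) -> bool:
    # Mirrors the guard in the diff: the node must have exactly one
    # parent, and that parent must be a Mul, otherwise mul_add bails out.
    return len(parents) == 1 and parents[0].op_type == "Mul"


assert can_fuse_mul_add([FakeNode("Mul")]) is True
assert can_fuse_mul_add([FakeNode("MatMul")]) is False
assert can_fuse_mul_add([FakeNode("Mul"), FakeNode("Add")]) is False
print("checks pass")
```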
@fxmarty would you like to follow up on this PR?

We will have a code freeze on 11/22; if this PR can be merged before that date, it can be packaged into the v2.4 release.

@fxmarty will you fix the PR?

@chensuyue Sorry, I did not get time to fix it; I won't be able to before the release, unfortunately.

Closing the PR for now since it has been pending for a long time; feel free to reopen when you have time to handle the issue.
Type of Change
As per title.
Description
This PR includes two new features for SmoothQuant (that I was too lazy to split into two PRs):
- Fusing `MatMul -> Add -> MatMul` into `MatMul -> Add` (that is typically the case for layernorm), as in https://github.com/mit-han-lab/smoothquant/blob/78badc0d975506de9fe44b2fe79d9a35d0fd4914/smoothquant/smooth.py#L46
- Adding a `nodes_to_exclude` argument, following the ORT quantizer & Optimum quantizer fashion, to allow excluding some nodes from being smoothed. This follows mit-han-lab/smoothquant#15 (comment): the paper says to smooth all linear layers, but the code seems to smooth only the qkv projection in attention and the first fc in the ffn (`out_proj` & `fc2` should not be smoothed to reproduce the paper results).
How has this PR been tested?
Locally. I did not test that outputs from the fusion match, but I just reused the code from the `mul` method. Let me know if I should add tests.
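For reference, the arithmetic behind the layernorm fusion can be checked numerically: folding `1/scale` into the layernorm affine initializers (the `Mul` gamma and `Add` beta) and `scale` into the downstream `MatMul` weight leaves the output unchanged. A numpy sketch under assumed shapes (all names here are illustrative, not the PR's variables):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))       # normalized activations entering the affine part
mul_w = rng.standard_normal(8)        # layernorm gamma (Mul initializer)
add_b = rng.standard_normal(8)        # layernorm beta (Add initializer)
W = rng.standard_normal((8, 8))       # downstream MatMul weight
scale = rng.uniform(0.5, 2.0, 8)      # per-channel smoothing scale

# Reference: no smoothing applied at all.
y_ref = (x * mul_w + add_b) @ W

# Fused smoothing: divide gamma/beta by scale, multiply W's input rows by scale.
# The 1/scale and scale factors cancel, so the output must match y_ref.
y_fused = (x * (mul_w / scale) + (add_b / scale)) @ (scale[:, None] * W)

assert np.allclose(y_ref, y_fused)
print("fusion preserves output")
```

A parity check like this is roughly what a unit test for the fused path would assert, as requested in the review above.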