Extend SmoothQuant support (exclude nodes, fuse into layernorm) #1357
Conversation
@chensuyue @mengniwang95 @PenghuiCheng @xin3he happy to get a review on this one!
Thank you @chensuyue, will have a look!
@@ -145,6 +146,7 @@ def transform(
         calib_iter=100,
         quantize_config=None,
         auto_alpha_args={"alpha_min": 0.3, "alpha_max": 0.7, "alpha_step": 0.05, "attn_method": "min"},
+        nodes_to_exclude: Optional[List[str]] = None,
How is the `nodes_to_exclude` setting passed in from the user-facing API of neural-compressor?
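For context, a `nodes_to_exclude` option of this kind typically just filters candidate node names before smoothing. A minimal sketch of that filtering step, with a hypothetical helper name and example node names (not the PR's actual code):

```python
from typing import List, Optional


def filter_smoothable_nodes(
    node_names: List[str], nodes_to_exclude: Optional[List[str]] = None
) -> List[str]:
    """Keep only the node names still eligible for smoothing (illustrative helper)."""
    exclude = set(nodes_to_exclude or [])
    return [name for name in node_names if name not in exclude]


names = ["/attn/qkv/MatMul", "/attn/out_proj/MatMul", "/mlp/fc1/MatMul"]
print(filter_smoothable_nodes(names, nodes_to_exclude=["/attn/out_proj/MatMul"]))
# → ['/attn/qkv/MatMul', '/mlp/fc1/MatMul']
```

The ONNX Runtime quantizer exposes the same idea as an exact-name exclusion list, which is the "ORT quantizer fashion" the PR description refers to.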
@@ -330,6 +336,37 @@ def conv(node, scale):  # pragma: no cover
             )
             return True

+        def mul_add(node, scale):  # pragma: no cover
+            node_parent = self.model.get_parent(node, 0)
+            if not len(self.model.get_parents(node)) == 1 or node_parent.op_type != "Mul":
The comment below says "Add has itself a MatMul before", but here the check is `node_parent.op_type != "Mul"` — could you align the two?
Further, could you add a UT to make sure it works correctly?
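To illustrate the condition under review, here is a self-contained sketch of a unit test for the parent check. `FakeNode` and `can_fuse_mul_add` are hypothetical stand-ins for the neural-compressor node/model objects; only the fields the condition touches are modeled:

```python
class FakeNode:
    """Minimal stand-in for an ONNX node wrapper (illustrative only)."""

    def __init__(self, op_type: str):
        self.op_type = op_type


def can_fuse_mul_add(parents) -> bool:
    # Mirrors the guard in the diff: the node must have exactly one
    # parent, and that parent must be a Mul, otherwise mul_add bails out.
    return len(parents) == 1 and parents[0].op_type == "Mul"


assert can_fuse_mul_add([FakeNode("Mul")]) is True
assert can_fuse_mul_add([FakeNode("MatMul")]) is False
assert can_fuse_mul_add([FakeNode("Mul"), FakeNode("Add")]) is False
print("checks pass")
```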
@fxmarty would you like to follow up on this PR?

We will have a code freeze on 11/22; if this PR can be merged before that date, it can be packaged into the v2.4 release.

@fxmarty will you fix the PR?

@chensuyue Sorry, I did not get time to fix it; I won't be able to before the release, unfortunately.

Closing the PR for now since it has been pending for a long time; feel free to reopen when you have time to handle the issue.
Type of Change
As per title.
Description
This PR includes two new features for SmoothQuant (that I was too lazy to split into two PRs):
- Fusing `MatMul -> Add -> MatMul` into `MatMul -> Add` (that is typically the case for layernorm), as in https://github.com/mit-han-lab/smoothquant/blob/78badc0d975506de9fe44b2fe79d9a35d0fd4914/smoothquant/smooth.py#L46
- Adding a `nodes_to_exclude` argument, following the ORT quantizer & Optimum quantizer fashion, to allow excluding some nodes from being smoothed. This follows mit-han-lab/smoothquant#15 (comment): the paper says to smooth all linear layers, but the code seems to smooth only the qkv projection in attention and the first fc in the ffn (`out_proj` & `fc2` should not be smoothed to reproduce the paper results).
How has this PR been tested?
Locally. I did not test that outputs from the fusion match, but I just reused the code from the `mul` method. Let me know if I should add tests.
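For reference, the arithmetic behind the layernorm fusion can be checked numerically: folding `1/scale` into the layernorm affine initializers (the `Mul` gamma and `Add` beta) and `scale` into the downstream `MatMul` weight leaves the output unchanged. A numpy sketch under assumed shapes (all names here are illustrative, not the PR's variables):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))       # normalized activations entering the affine part
mul_w = rng.standard_normal(8)        # layernorm gamma (Mul initializer)
add_b = rng.standard_normal(8)        # layernorm beta (Add initializer)
W = rng.standard_normal((8, 8))       # downstream MatMul weight
scale = rng.uniform(0.5, 2.0, 8)      # per-channel smoothing scale

# Reference: no smoothing applied at all.
y_ref = (x * mul_w + add_b) @ W

# Fused smoothing: divide gamma/beta by scale, multiply W's input rows by scale.
# The 1/scale and scale factors cancel, so the output must match y_ref.
y_fused = (x * (mul_w / scale) + (add_b / scale)) @ (scale[:, None] * W)

assert np.allclose(y_ref, y_fused)
print("fusion preserves output")
```

A parity check like this is roughly what a unit test for the fused path would assert, as requested in the review above.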