
Added quantization utils to allow extending FP16 CoreML models to FP32 #637

Open

wants to merge 1 commit into main
Conversation


tianrui commented Feb 18, 2020

Extend the parameters of an FP16 MLModel to FP32 by typecasting the weights in numpy and updating the corresponding definitions in the model's graph protobuf.
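A minimal sketch of the idea, not the actual code in this PR: the FP16 bytes in each layer's `WeightParams.float16Value` are reinterpreted with numpy and written back as FP32 `floatValue` entries. Field names follow the Core ML `NeuralNetwork` protobuf; the traversal below only handles `innerProduct` layers and assumes a plain (non-pipeline) spec.

```python
import numpy as np
import coremltools


def fp16_spec_to_fp32(spec):
    """Cast FP16 weights to FP32 in place (illustrative sketch only)."""
    for layer in spec.neuralNetwork.layers:
        # Only innerProduct layers are shown; a full pass would cover
        # every layer type that carries WeightParams.
        if layer.WhichOneof("layer") == "innerProduct":
            wp = layer.innerProduct.weights
            if wp.float16Value:
                fp32 = np.frombuffer(wp.float16Value, dtype=np.float16).astype(np.float32)
                wp.floatValue.extend(fp32.tolist())
                wp.float16Value = b""  # clear the FP16 payload


model = coremltools.models.MLModel("BERTSQUADFP16.mlmodel")
spec = model.get_spec()
fp16_spec_to_fp32(spec)
fp32_model = coremltools.models.MLModel(spec)
```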

tianrui requested a review from 1duo February 25, 2020 20:36
1duo requested a review from aseemw February 26, 2020 17:16
Collaborator

1duo commented Feb 26, 2020

Thanks for the changes @tianrui! Can you comment on the use cases for FP16 -> FP32?

Collaborator

aseemw commented Feb 26, 2020

@tianrui I think there is already a mode that does this:

"dequantization": _QUANTIZATION_MODE_DEQUANTIZE,

This mode hasn't been documented but is currently being used by unit tests that test the weight quantization feature.
Can you verify whether that gives the same result as the changes in this PR?
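For reference, invoking that mode would look roughly like the sketch below. This is inferred from the thread: `quantize_weights` is the coremltools entry point and `"dequantization"` is the mode named above, but whether `nbits` is ignored in this mode is an assumption.

```python
import coremltools
from coremltools.models.neural_network.quantization_utils import quantize_weights

fp16_model = coremltools.models.MLModel("BERTSQUADFP16.mlmodel")

# "dequantization" maps to _QUANTIZATION_MODE_DEQUANTIZE internally;
# nbits=32 is assumed to be ignored (or redundant) for this mode.
fp32_model = quantize_weights(fp16_model, nbits=32,
                              quantization_mode="dequantization")
```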

Author

tianrui commented Feb 28, 2020

> Thanks for the changes @tianrui! Can you comment on the use cases for FP16 -> FP32?

@1duo I was working on demoing the Core ML BERT model from Apple, which can be optimized with MPS, but only with FP32 parameters at the moment. I did notice that the dequantization mode exists, and I will verify it against my PR.

Author

tianrui commented Feb 28, 2020

When trying to dequantize using _dequantize_nn_spec() from quantization_utils.py with the spec extracted from model.get_spec(), I hit an AttributeError: layers. Is there another way to dequantize the model that I'm not aware of? I've verified that the performance of my dequantized model is the same as that of the FP16 model downloaded from https://docs-assets.developer.apple.com/coreml/models/Text/QuestionAnswering/BERT_SQUAD/BERTSQUADFP16.mlmodel.
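One possible explanation, offered as an assumption rather than something confirmed in this thread: `_dequantize_nn_spec()` appears to expect the nested `neuralNetwork` message (which owns the `layers` field), while `model.get_spec()` returns the top-level `Model` message, so passing the latter directly would raise `AttributeError: layers`.

```python
import coremltools

model = coremltools.models.MLModel("BERTSQUADFP16.mlmodel")
spec = model.get_spec()       # top-level Model message: has no .layers
nn_spec = spec.neuralNetwork  # nested NeuralNetwork message: has .layers
print(len(nn_spec.layers))    # number of layers in the network
```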

Collaborator

aseemw commented Feb 28, 2020

> When trying to dequantize using _dequantize_nn_spec() from quantization_utils.py with the spec extracted from model.get_spec(), I hit an AttributeError: layers. Is there another way to dequantize the model that I'm not aware of? I've verified that the performance of my dequantized model is the same as that of the FP16 model downloaded from https://docs-assets.developer.apple.com/coreml/models/Text/QuestionAnswering/BERT_SQUAD/BERTSQUADFP16.mlmodel.

Did you also try using quantize_weights(quantization_mode="dequantization") instead of _dequantize_nn_spec()?

Author

tianrui commented Mar 2, 2020

Hi @aseemw, I tried the function you suggested, but this mode fails when dequantizing an embedding layer in BERT: it calls _dequantize_wp(), which assumes there is a LUT where one doesn't exist, so the call to _dequantize_lut() fails. The FP16 weight parameter holds the byte array of weights in its float16Value field, while its rawValue and floatValue fields are empty. Do you have any suggestions for verifying the feature further?
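A quick way to inspect the state described above (a hedged sketch; field names follow the Core ML `WeightParams` protobuf, and `spec` is assumed to be the FP16 BERT spec from the earlier comment):

```python
# Inspect the weight fields of an embedding layer in the FP16 spec
# (the layer lookup here is illustrative, not the model's exact layer).
for layer in spec.neuralNetwork.layers:
    which = layer.WhichOneof("layer")
    if which in ("embedding", "embeddingND"):
        wp = getattr(layer, which).weights
        print(len(wp.float16Value))         # non-empty byte array of FP16 weights
        print(len(wp.floatValue))           # 0: no FP32 values
        print(len(wp.rawValue))             # 0: no raw/quantized bytes
        print(wp.HasField("quantization"))  # False: no LUT params present
        break
```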

Collaborator

aseemw left a comment


Instead of adding a new API, fix the quantization_mode="dequantization" path in the existing API (quantize_weights(quantization_mode="dequantization")).

Collaborator

aseemw commented Mar 2, 2020

@tianrui There seems to be a bug where the weights are treated as a LUT when they are not. Can you look into fixing that? On which line does that error surface? (Maybe the check for whether the quantization type is linear or LUT is missing.)
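A hedged sketch of the kind of guard being suggested. The helper names _dequantize_wp and _dequantize_lut come from this thread; _dequantize_linear, the exact signature, and the protobuf field checks are assumptions, not the actual quantization_utils code:

```python
import numpy as np


def _dequantize_wp(wp):
    """Return FP32 weights, dispatching on how the WeightParams are stored."""
    if wp.HasField("quantization"):
        q = wp.quantization
        # Only take the LUT path when a lookup table actually exists.
        if q.HasField("lookupTableQuantization"):
            return _dequantize_lut(wp)
        if q.HasField("linearQuantization"):
            return _dequantize_linear(wp)
    if wp.float16Value:
        # FP16 weights: a plain cast, no LUT involved.
        return np.frombuffer(wp.float16Value, dtype=np.float16).astype(np.float32)
    return np.array(wp.floatValue, dtype=np.float32)
```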

Birch-san pushed a commit to Birch-san/coremltools that referenced this pull request Nov 27, 2022
* Fix the LMS pytorch regression

* Copy over the changes from apple#637

* Copy over the changes from apple#637

* Fix betas test