
Added quantization utils to allow extending FP16 CoreML models to FP32 #637

Open

wants to merge 1 commit into main
Conversation


tianrui commented Feb 18, 2020

Extend the parameters of an FP16 MLModel to FP32 by typecasting the weights in numpy and updating the corresponding definitions in the model's graph protobuf.
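A minimal sketch of the idea, not the actual code in this PR: the FP16 bytes in each layer's `WeightParams.float16Value` are reinterpreted with numpy and written back as FP32 `floatValue` entries. Field names follow the Core ML `NeuralNetwork` protobuf; the traversal below only handles `innerProduct` layers and assumes a plain (non-pipeline) spec.

```python
import numpy as np
import coremltools


def fp16_spec_to_fp32(spec):
    """Cast FP16 weights to FP32 in place (illustrative sketch only)."""
    for layer in spec.neuralNetwork.layers:
        # Only innerProduct layers are shown; a full pass would cover
        # every layer type that carries WeightParams.
        if layer.WhichOneof("layer") == "innerProduct":
            wp = layer.innerProduct.weights
            if wp.float16Value:
                fp32 = np.frombuffer(wp.float16Value, dtype=np.float16).astype(np.float32)
                wp.floatValue.extend(fp32.tolist())
                wp.float16Value = b""  # clear the FP16 payload


model = coremltools.models.MLModel("BERTSQUADFP16.mlmodel")
spec = model.get_spec()
fp16_spec_to_fp32(spec)
fp32_model = coremltools.models.MLModel(spec)
```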

tianrui requested a review from 1duo February 25, 2020 20:36
1duo requested a review from aseemw February 26, 2020 17:16
Collaborator

1duo commented Feb 26, 2020

Thanks for the changes @tianrui! Can you comment on the use cases for FP16 -> FP32?

Collaborator

aseemw commented Feb 26, 2020

@tianrui I think there is already a mode that does this:

"dequantization": _QUANTIZATION_MODE_DEQUANTIZE,

This mode hasn't been documented but is currently being used by unit tests that test the weight quantization feature.
Can you verify whether that gives the same result as the changes in this PR?
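For reference, invoking that mode would look roughly like the sketch below. This is inferred from the thread: `quantize_weights` is the coremltools entry point and `"dequantization"` is the mode named above, but whether `nbits` is ignored in this mode is an assumption.

```python
import coremltools
from coremltools.models.neural_network.quantization_utils import quantize_weights

fp16_model = coremltools.models.MLModel("BERTSQUADFP16.mlmodel")

# "dequantization" maps to _QUANTIZATION_MODE_DEQUANTIZE internally;
# nbits=32 is assumed to be ignored (or redundant) for this mode.
fp32_model = quantize_weights(fp16_model, nbits=32,
                              quantization_mode="dequantization")
```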

Author

tianrui commented Feb 28, 2020

> Thanks for the changes @tianrui! Can you comment on the use cases for FP16 -> FP32?

@1duo I was working on demoing the Core ML BERT model from Apple, which can be optimized with MPS, but only with FP32 parameters at the moment. I did notice that the dequantization mode exists, and I will verify it against my PR.

Author

tianrui commented Feb 28, 2020

When trying to dequantize using _dequantize_nn_spec() from quantization_utils.py with the spec extracted from model.get_spec(), I hit an AttributeError: layers. Is there another way to dequantize the model that I'm not aware of? I've verified that the performance of my dequantized model is the same as that of the FP16 model downloaded from https://docs-assets.developer.apple.com/coreml/models/Text/QuestionAnswering/BERT_SQUAD/BERTSQUADFP16.mlmodel.
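One possible explanation, offered as an assumption rather than something confirmed in this thread: `_dequantize_nn_spec()` appears to expect the nested `neuralNetwork` message (which owns the `layers` field), while `model.get_spec()` returns the top-level `Model` message, so passing the latter directly would raise `AttributeError: layers`.

```python
import coremltools

model = coremltools.models.MLModel("BERTSQUADFP16.mlmodel")
spec = model.get_spec()       # top-level Model message: has no .layers
nn_spec = spec.neuralNetwork  # nested NeuralNetwork message: has .layers
print(len(nn_spec.layers))    # number of layers in the network
```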

Collaborator

aseemw commented Feb 28, 2020

> When trying to dequantize using _dequantize_nn_spec() from quantization_utils.py with the spec extracted from model.get_spec(), I hit an AttributeError: layers. Is there another way to dequantize the model that I'm not aware of? I've verified that the performance of my dequantized model is the same as that of the FP16 model downloaded from https://docs-assets.developer.apple.com/coreml/models/Text/QuestionAnswering/BERT_SQUAD/BERTSQUADFP16.mlmodel.

Did you also try using quantize_weights(quantization_mode="dequantization") instead of _dequantize_nn_spec()?

Author

tianrui commented Mar 2, 2020

Hi @aseemw, I tried the function you suggested, but this mode fails when dequantizing an embedding layer in BERT: it calls _dequantize_wp(), which assumes there is a LUT where one doesn't exist, so the call to _dequantize_lut() fails. The FP16 weight parameter holds the byte array of weights in its float16Value field, while its rawValue and floatValue fields are empty. Do you have any suggestions for verifying the feature further?
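A quick way to inspect the state described above (a hedged sketch; field names follow the Core ML `WeightParams` protobuf, and `spec` is assumed to be the FP16 BERT spec from the earlier comment):

```python
# Inspect the weight fields of an embedding layer in the FP16 spec
# (the layer lookup here is illustrative, not the model's exact layer).
for layer in spec.neuralNetwork.layers:
    which = layer.WhichOneof("layer")
    if which in ("embedding", "embeddingND"):
        wp = getattr(layer, which).weights
        print(len(wp.float16Value))         # non-empty byte array of FP16 weights
        print(len(wp.floatValue))           # 0: no FP32 values
        print(len(wp.rawValue))             # 0: no raw/quantized bytes
        print(wp.HasField("quantization"))  # False: no LUT params present
        break
```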

Collaborator

aseemw left a comment


Instead of adding a new API, fix the quantization_mode="dequantization" path in the existing API (quantize_weights(quantization_mode="dequantization")).

Collaborator

aseemw commented Mar 2, 2020

@tianrui There seems to be a bug where the weights are treated as a LUT when they are not. Can you look into fixing that? On which line does that error surface? (Maybe the check for whether the quantization type is linear or LUT is missing.)
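A hedged sketch of the kind of guard being suggested. The helper names _dequantize_wp and _dequantize_lut come from this thread; _dequantize_linear, the exact signature, and the protobuf field checks are assumptions, not the actual quantization_utils code:

```python
import numpy as np


def _dequantize_wp(wp):
    """Return FP32 weights, dispatching on how the WeightParams are stored."""
    if wp.HasField("quantization"):
        q = wp.quantization
        # Only take the LUT path when a lookup table actually exists.
        if q.HasField("lookupTableQuantization"):
            return _dequantize_lut(wp)
        if q.HasField("linearQuantization"):
            return _dequantize_linear(wp)
    if wp.float16Value:
        # FP16 weights: a plain cast, no LUT involved.
        return np.frombuffer(wp.float16Value, dtype=np.float16).astype(np.float32)
    return np.array(wp.floatValue, dtype=np.float32)
```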

Birch-san pushed a commit to Birch-san/coremltools that referenced this pull request Nov 27, 2022
* Fix the LMS pytorch regression

* Copy over the changes from apple#637

* Copy over the changes from apple#637

* Fix betas test