08 Jul 12:52

lapid92

3c1ff38

Release 2.4.2 Latest

What's Changed

API changes

Introduced the output_names argument in pytorch_export_model. This optional parameter specifies a list of output node names for export compatibility and applies only when using PytorchExportSerializationFormat.ONNX.
Usage example: To set the model output names, use the output_names argument:

mct.exporter.pytorch_export_model(
    model=quantized_exportable_model,
    save_model_path=onnx_file_path,
    repr_dataset=representative_data_gen,
    output_names=['model_output'])

Now the output name (i.e., the name of the output layer in the exported ONNX model) will be model_output, instead of the default name output, which would be assigned if output_names is not specified.

02 Jul 13:00

yuvalavr24

bd_2.4.2

3c1ff38

Black-Duck Reports 2.4.2 Pre-release

bd_2.4.2

Generating docs

23 Jun 10:37

ofirgo

v2.4.1

e8a1707

Release 2.4.1

What's Changed

Improved activation memory estimation for Mixed Precision

We’ve refined how MCT estimates activation memory to ensure more accurate resource planning for quantized models, critical for edge deployment.

Quantization-Preserving and Fused Operators: Quantization-preserving and fused operators are now included in activation memory estimation, thereby addressing potential memory underestimation.
- Note: Quantization preserving for a layer with multiple inputs is disabled.
Shift Negative Activation Correction Fix: Fixed an issue that occurred when SNC was used during mixed-precision quantization of activations. SNC uses a 16-bit quantization layer, which was mistakenly counted in the activation memory estimation for the mixed-precision solution even though it is part of a fused operation.
Finding activation cuts for the Max Cut computation is now deterministic, for consistent results.

Improved layers' sensitivity evaluation for Mixed Precision

Isolated Bitwidth Testing: When evaluating a specific bitwidth candidate, other layers now stay in float precision (previously set to maximal bitwidth). This isolates the impact of the tested layer, providing clearer insights into its effect on model accuracy.
Sensitivity Normalization: The new metric_normalization parameter in MixedPrecisionQuantizationConfig lets you optionally normalize sensitivity metrics by either the maximal or minimal bitwidth candidate of the layer. The default behaviour remains as before (non-normalized).
New Exponential Weighting Method: A new weighting method MpDistanceWeighting.EXP was added, based on the exponent of negative distances between the quantized and the float models, with exp_distance_weighting_sigma parameter in MixedPrecisionQuantizationConfig controlling the normalization of the distances prior to applying the exponent.
Custom Sensitivity Metrics: A new custom_metric_fn parameter in MixedPrecisionQuantizationConfig allows you to define your own sensitivity metric function. It takes a model (configured with a candidate bitwidth) and returns a float scalar.
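
As a rough illustration of the custom metric contract described above (a callable that receives a model configured with a candidate bitwidth and returns a float scalar), here is a minimal sketch. Everything in it except the parameter name custom_metric_fn is an assumption: make_custom_metric, the toy models, and the choice of mean squared error are hypothetical stand-ins, and no MCT call is made.

```python
import math
from typing import Callable, Sequence

# Hypothetical stand-in: a "model" here is any callable mapping a float to a float.
Model = Callable[[float], float]


def make_custom_metric(inputs: Sequence[float], float_model: Model) -> Callable[[Model], float]:
    """Build a metric that scores a candidate model against float reference outputs."""
    reference = [float_model(x) for x in inputs]

    def custom_metric_fn(model: Model) -> float:
        # Mean squared error between candidate outputs and the float reference.
        outputs = [model(x) for x in inputs]
        squared_errors = [(o - r) ** 2 for o, r in zip(outputs, reference)]
        return float(sum(squared_errors) / len(squared_errors))

    return custom_metric_fn


def float_model(x: float) -> float:
    return 0.5 * x


def coarse_model(x: float) -> float:
    # Toy "quantized" model: rounds the float output to the nearest integer.
    return float(math.floor(0.5 * x + 0.5))


metric = make_custom_metric([1.0, 2.0, 3.0], float_model)
score = metric(coarse_model)
```

A function with this shape would then be supplied through the custom_metric_fn parameter of MixedPrecisionQuantizationConfig; consult the MCT API documentation for the exact model type it receives.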

New Version of TPC Schema - v2

We’ve upgraded the Target Platform Capabilities (TPC) schema to v2, expanding support for new operations and model configurations.

New OperatorSetNames: EXP, COS, SIN: These additions enable quantization of models with exponential, cosine, and sine operations.
Quantization-Preserving Layers: A new boolean flag, insert_preserving_quantizers in TargetPlatformCapabilities, lets you add quantization-preserving activation holder layers to the final quantized model. Note that this is supported only in PyTorch.

Introduced Activation Threshold Search Using Hessian-Weighted MSE (HMSE)

A new method leverages a weighted activation histogram, with weights derived from Hessian values, to improve activation threshold search. This can be configured via QuantizationErrorMethod.
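
To make the idea of a Hessian-weighted error concrete, the sketch below searches a list of candidate thresholds for the one with the lowest weighted MSE. It is a simplified illustration under stated assumptions, not MCT's implementation: it weights individual samples instead of histogram bins, the weights are supplied by hand instead of being derived from Hessian values, and the names quantize, weighted_mse, and search_threshold are hypothetical.

```python
from typing import Sequence


def quantize(x: float, threshold: float, n_bits: int) -> float:
    """Symmetric uniform quantization of x, clipped to the range set by threshold."""
    levels = 2 ** (n_bits - 1)
    step = threshold / levels
    q = round(x / step)
    q = max(-levels, min(levels - 1, q))  # clip to the representable integer range
    return q * step


def weighted_mse(values: Sequence[float], weights: Sequence[float],
                 threshold: float, n_bits: int) -> float:
    """Quantization MSE where each sample's error is scaled by its weight."""
    total_weight = sum(weights)
    error = sum(w * (v - quantize(v, threshold, n_bits)) ** 2
                for v, w in zip(values, weights))
    return error / total_weight


def search_threshold(values: Sequence[float], weights: Sequence[float],
                     candidates: Sequence[float], n_bits: int = 8) -> float:
    """Return the candidate threshold with the lowest weighted MSE."""
    return min(candidates, key=lambda t: weighted_mse(values, weights, t, n_bits))


values = [0.1, 0.2, 0.3, 4.0]      # one large outlier
candidates = [0.5, 8.0]
uniform_choice = search_threshold(values, [1.0, 1.0, 1.0, 1.0], candidates, n_bits=3)
weighted_choice = search_threshold(values, [1.0, 1.0, 1.0, 1e-4], candidates, n_bits=3)
```

With uniform weights the outlier dominates the error and the wide threshold wins; when the outlier carries a very small weight, the narrow threshold that preserves the small values is selected instead.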

Weights Quantization Configuration Updates

We’ve added flexibility to weights quantization to give you more control over compression.

Positional Weights Quantization: Added support for a positional weights quantization config, for weights that are used in functional layers.
Manual Bitwidth Option: You can now specify a manual bitwidth for weights, overriding automatic settings for precise control.

Support for edge-mdt-cl Custom Layers

Added integration with the edge-mdt-cl package, enabling custom layers optimized for edge deployment. Check the edge-mdt-cl repo for layer documentation.

Extending Supported Versions

Added support for PyTorch 2.6 and for NumPy 2.

Breaking changes

Support Discontinued for Old Framework Versions: Support has been discontinued for TensorFlow 2.12, 2.13 and PyTorch 2.2.
Layers from sony-custom-layers No Longer Supported:
The sony-custom-layers package is deprecated and replaced by edge-mdt-cl. Update your code to use edge-mdt-cl layers. Refer to the new package’s documentation for more information.

Additional Changes and Bug Fixes

Improved Error Reporting:
We’ve enhanced error messages to help you diagnose and resolve issues faster, especially in complex PyTorch models.
- Disconnected Input Nodes: Better detection and clearer reporting for PyTorch models with disconnected input nodes, which often occur when the forward pass includes optional or unused inputs (#1360).
- PyTorch .to Misuse: Improved error messages for incorrect use of the .to method in PyTorch, providing more context to simplify debugging (#1382).
Fix for Reused Nodes: Addressed a bug where reused nodes were incorrectly added before original nodes, which could disrupt model structure (#1418).
ONNX Export Enhancements:
- Weights Sharing Support: Added support for weights sharing in exported models, reducing model size and eliminating redundancy for more efficient storage and inference (#1402).
- ONNX Opset 20: Now uses ONNX opset 20 for PyTorch versions > 2.4 for better compatibility with ONNX tools and runtimes.
- Custom Output Names: Enabled the ability to specify output names, giving you more control for integrating exported models with other pipelines.
- Positional Weights Quantization in ONNX Fake-Quant Mode: We’ve fixed an issue that prevented the export of ONNX models in fake-quantized mode when the model included positional weights in functional layers.
- Multi-Input Fix: Fixed an export issue for fake-quantized models with multiple inputs.
Dynamic Output Size Support: Added support for dynamic output sizes in nn.ConvTranspose2d (#1381).
Debug Option to Bypass MCT Facade: Introduced a debug mode to bypass the MCT facade, allowing you to quickly determine whether an issue originates from your model or from MCT itself (#1410).
Upgrade to Pydantic 2: Upgraded to Pydantic 2 for improved data validation (#1426).

Tutorials

Add PyTorch Tutorial for Activation Z-Score Threshold: We’ve added a new PyTorch tutorial to guide you through using activation z-score thresholds for the quantization of PyTorch models. Try it on Google Colab!

18 Jun 15:58

reuvenperetz

bd_2.4.1

e8a1707

Black-Duck Reports 2.4.1 Pre-release

bd_2.4.1

Update version to 2.4.1

22 Jan 10:17

reuvenperetz

bd_2.4.0

a6593bd

Black-Duck Reports 2.4.0 Pre-release

bd_2.4.0

Fix bug in export using FQ ONNX in replacing activation holder with m…

12 Feb 11:31

reuvenperetz

v2.3.0

33c45ff

Release 2.3.0

What's Changed

Major Changes

Target Platform Capabilities (TPC) Changes

TPC Schema

Introduced a new Schema (version v1) mechanism to establish the language for building a target platform capabilities description.
- The schema defines the TargetPlatformCapabilities class, which can be built to describe the platform capabilities.
- The OperatorSetNames enum provides a closed set of operator set names, which allows setting quantization configuration options for commonly used operators.
- Using a custom operator set name is also available.
- All schema classes are using pydantic BaseModel for enhanced validation and schema flexibility.
  - MCT has a new dependency in "pydantic < 2.0".
In addition, a new versioning system was introduced, using minor and patch versions.

Naming Refactor

Creating the schema mechanism was followed by renaming several classes:
- TargetPlatformModel → TargetPlatformCapabilities
- TargetPlatformCapabilities → FrameworkQuantizationCapabilities
- OperatorSetConcat → OperatorSetGroup
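
Because TargetPlatformCapabilities appears both as a new name and as an old name in the list above, applying the renames one after another would rename some identifiers twice. The sketch below shows one way to apply all three in a single pass. The helper migrate_identifiers is hypothetical and not part of MCT, and it only rewrites identifiers; it does not account for the file hierarchy changes mentioned elsewhere in these notes.

```python
import re

# Old name -> new name, taken from the rename list above.
RENAMES = {
    "TargetPlatformModel": "TargetPlatformCapabilities",
    "TargetPlatformCapabilities": "FrameworkQuantizationCapabilities",
    "OperatorSetConcat": "OperatorSetGroup",
}

# One alternation over whole identifiers, so every match is replaced exactly once.
_PATTERN = re.compile(r"\b(" + "|".join(re.escape(name) for name in RENAMES) + r")\b")


def migrate_identifiers(source: str) -> str:
    """Rewrite old class names to new ones in a single pass over the text."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], source)


old_code = "a = TargetPlatformModel(); b = TargetPlatformCapabilities(); c = OperatorSetConcat()"
new_code = migrate_identifiers(old_code)
```

A single regular-expression pass avoids the chained rename that two sequential str.replace calls would produce for TargetPlatformModel.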

Attach TPC to Framework

A new module named AttachTpcToFramework handles the conversion from a framework-independent TargetPlatformCapabilities description to a framework-specific FrameworkQuantizationCapabilities that maps each framework's operator to its possible quantization configurations.
Available for TensorFlow and PyTorch via AttachTpcToKeras and AttachTpcToPytorch, respectively.

API changes

All MCT APIs now expect a target_platform_capabilities object (TargetPlatformCapabilities), which contains the framework-independent platform capabilities description.
This is a change from the previous behaviour, which expected an initialized framework-specific object.
Note: the default behavior of MCT's APIs has not changed! Calling an API function without passing a TPC object, or passing an object obtained using the following API: get_target_platform_capabilities(<FW_NAME>, DEFAULT_TP_MODEL), uses the same default TPC as in the previous release.
- Regardless, users that accessed TPC-related classes not via the published API may encounter breaking changes due to class renaming and files hierarchy changes.

Tighter activation memory estimation via Max-Cut (Experimental)

Replace Max-Tensor with Max-Cut as the activation memory estimation method in the mixed precision algorithm.
The Max-Cut metric considers the execution schedule of the model's operators for a more precise estimation of activation memory (#1295).
Note: this is an estimate of memory usage; the actual memory used at runtime may differ.
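
The difference between the two estimates can be illustrated on a toy graph. The sketch below is a simplified model under stated assumptions, not MCT's algorithm: it takes a fixed execution schedule, treats each operator as producing one tensor, and counts a tensor as live from the step that produces it until its last consumer runs. The function names and the example graph are hypothetical.

```python
from typing import Dict, List


def max_tensor(sizes: Dict[str, int]) -> int:
    """Max-Tensor style estimate: the size of the single largest activation tensor."""
    return max(sizes.values())


def peak_live_memory(schedule: List[str], sizes: Dict[str, int],
                     consumers: Dict[str, List[str]]) -> int:
    """Schedule-aware estimate: the largest total size of tensors live at any step."""
    position = {op: i for i, op in enumerate(schedule)}
    # A tensor stays live from its producing step through its last consumer's step.
    last_use = {
        op: max([position[c] for c in consumers.get(op, [])] + [position[op]])
        for op in schedule
    }
    peak = 0
    for step in range(len(schedule)):
        live = sum(sizes[op] for op in schedule if position[op] <= step <= last_use[op])
        peak = max(peak, live)
    return peak


# Toy residual block: the input is consumed by conv1 and again by the final add.
schedule = ["in", "conv1", "conv2", "add"]
sizes = {"in": 100, "conv1": 400, "conv2": 100, "add": 100}
consumers = {"in": ["conv1", "add"], "conv1": ["conv2"], "conv2": ["add"], "add": []}

largest = max_tensor(sizes)
peak = peak_live_memory(schedule, sizes, consumers)
```

In this example the largest single tensor has size 400, while the schedule-aware estimate is 600, because the skip connection keeps the input alive while conv1 and conv2 outputs coexist.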
16-bit Activation Quantization (experimental)
- The new activation memory estimation allows flexible usage of the mixed precision algorithm to enable 16-bit activation quantization (dependent on a TPC that supports 16-bit quantization for different operators).
- 16-bit quantization can be enabled either via Manual Bit-width selection API or automatically, by executing mixed precision with a proper activation or total memory constraint.
- Note that when running mixed precision with activation memory constraint to enable 16-bit allocation, shift negative correction should be disabled.

Improved GPTQ algorithm via Sample Layer Attention (SLA):

Enabled SLA by default in both Keras and PyTorch (#1287, #1260)
Added gradual activation quantization support for enhanced results when quantizing activations (#1244, #1237)
Implemented Rademacher distribution for Hessian estimation (#1250)
For more details, please visit our paper.
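
For readers unfamiliar with Rademacher-based estimation, the sketch below shows the general Hutchinson-style trace estimator, in which random vectors with entries of +1 or -1 probe a matrix through matrix-vector products. This illustrates the general technique only; the notes above do not describe how MCT applies it, and the explicit matrices and function names here are assumptions chosen to keep the example self-contained.

```python
import random
from typing import Callable, List

Vector = List[float]


def rademacher_vector(n: int, rng: random.Random) -> Vector:
    """Draw n independent entries, each +1 or -1 with equal probability."""
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def hutchinson_trace(matvec: Callable[[Vector], Vector], n: int,
                     num_samples: int, seed: int = 0) -> float:
    """Estimate trace(A) as the average of v^T A v over Rademacher vectors v."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        v = rademacher_vector(n, rng)
        av = matvec(v)
        total += sum(vi * avi for vi, avi in zip(v, av))
    return total / num_samples


def make_matvec(matrix: List[Vector]) -> Callable[[Vector], Vector]:
    """Wrap an explicit matrix as a matrix-vector product function."""
    return lambda v: [sum(a * x for a, x in zip(row, v)) for row in matrix]


diagonal = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 5.0]]
diag_estimate = hutchinson_trace(make_matvec(diagonal), n=3, num_samples=4)

dense = [[2.0, 1.0], [1.0, 3.0]]
dense_estimate = hutchinson_trace(make_matvec(dense), n=2, num_samples=200)
```

For a diagonal matrix every Rademacher sample returns the exact trace, since each squared entry equals 1; for a general matrix the off-diagonal terms average out as the number of samples grows.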

Resource Utilization (RU) calculation:

Use max cut activation method for activation and total resource utilization computation.
Compute the total target from weights and activations utilization instead of using it as a separate metric.
Weights memory computation now includes all quantized weights in the model, instead of considering only kernel attributes. This may change the results of existing execution of mixed precision scenarios.
Note that the ResourceUtilization API did not change.

Minor Changes

Added Activation Bias Correction feature to potentially enhance quantization results of vision transformers (#1256)
Added substitution to decompose MatMul operation into baseline components in PyTorch (#1313)
Added substitution to decompose the scaled dot product attention operator in PyTorch (#1229)
Converted core configuration classes to dataclasses for simpler usage and strict behavior verification (CoreConfig, QuantizationConfig, etc.) (#1203)
Trainable Infrastructure changes:
- Moved STE/LSQ activation quantizers from QAT to trainable infrastructure.
- Renamed Trainable QAT quantizer to Weight Trainable quantizer (#1240)
Added support for PyTorch 2.4, PyTorch 2.5, and Python 3.12

Bug Fixes

Fix activation gradient backpropagating in GPTQ for PyTorch models. It now uses STE Activation Trainable quantizers with frozen quantization parameters instead of Activation Inferable quantizers, which did not propagate gradients. (#1197)
Fix ONNX export when PyTorch models have multiple inputs/outputs (#1223)
Fixed the issue of duplicating reused layers in PyTorch models (#1217)
Fixed HMSE being overridden by MSE after resource utilization computation (#1253)
Resolved duplicate QCOs error handling (#1282, #1149)
Fixed tf.nn.{conv2d,convolution} substitution to handle attributes with default values that were not passed explicitly (#1275)
Fixed handling errors in PyTorch graphs by managing nodes with missing outputs and ensuring robust extraction of output shapes (#1186)

New Contributors

Welcome @ambitious-octopus and @itai-berman for their first contributions! #1186, #1266

Contributors

ambitious-octopus and itai-berman

11 Nov 08:35

ofirgo

v2.2.2

163e1af

Release 2.2.2

Removed TPCs - Breaking Change

This patch release removes IMX500 TPCs versions v2 and v3, for both Keras and PyTorch.
The following features are no longer available out-of-the-box via the provided IMX500 TPC by MCT:
- Quantized model metadata
- Activation 16-bit quantization
- Constant weights quantization

Full Changelog: v2.2.1...v2.2.2

29 Oct 09:56

Idan-BenAmi

v2.2.1

423a02e

Release 2.2.1

Bug Fixes and Other Changes:

A necessary modification for YOLOv8 quantization, derived from #1186.

29 Oct 07:53

Idan-BenAmi

v2.1.1

ca431cf

Release 2.1.1

Bug Fixes and Other Changes:

A necessary modification for YOLOv8 quantization, derived from #1186.

25 Aug 11:00

ofirgo

v2.2.0

bb24123

Release 2.2.0

What's Changed

This release includes breaking changes in the Target Platform Capabilities module (TPC). If you use a custom TPC, be sure to review the Breaking changes section.

General changes

Quantization enhancements:
- Improved Hessian information computation runtime: speeds up GPTQ, HMSE and Mixed Precision with Hessian-based loss.
  - The get_keras_gptq_config and get_pytorch_gptq_config functions now accept a hessian_batch_size argument to control the batch size used in the Hessian computation for GPTQ.
- Data Generation Upgrade: Improved Speed, Performance and Coverage.
  - Add SmoothAugmentationImagePipeline – an image pipeline implementation that includes gaussian smoothing and random cropping and clipping.
  - Improved performance with float16 support in PyTorch.
  - Introduced ReduceLROnPlateauWithReset scheduler – a learning rate scheduler that reduces the learning rate when a metric has stopped improving and allows resetting the learning rate to its initial value after a specified number of bad epochs.
- Shift negative correction for activations:
  - Update shift negative for GELU activation operator.
  - Enable shift negative correction by default in QuantizationConfig in CoreConfig.
Introduce new Explainable Quantization (Xquant) tool (experimental):
- Generate a report (viewable in TensorBoard) to troubleshoot performance issues, with histograms and similarity metrics to compare float and quantized models.
- An Xquant tutorial is available in PyTorch and Keras.
Introduced TPC IMX500.v3 (experimental):
- Support constants quantization. Constants in Add, Sub, Mul & Div operators are quantized to 8 bits using per-axis Power of Two quantization. The axis is chosen per constant according to the minimum quantization error.
- IMX500 TPC now supports 16-bit activation quantization for the following operators: Add, Sub, Mul, Concat & Stack.
- Support assigning allowed input precision options to each operator, that is, the precision representation of the input activation tensor of the operator.
- Default TPC remains IMX500.v1.
- For selecting IMX500.v3 in keras:
  - tpc_v3 = mct.get_target_platform_capabilities("tensorflow", 'imx500', target_platform_version="v3")
  - mct.ptq.keras_post_training_quantization(model, representative_data_gen, target_platform_capabilities=tpc_v3)
- For selecting IMX500.v3 in pytorch:
  - tpc_v3 = mct.get_target_platform_capabilities("pytorch", 'imx500', target_platform_version="v3")
  - mct.ptq.pytorch_post_training_quantization(model, representative_data_gen, target_platform_capabilities=tpc_v3)
Introduced BitWidthConfig API:
- Allow manual adjustment of activation bit-widths for specific model layers through a new class under CoreConfig.
- Usage example of manual selection of 16bit activations available at PyTorch object detection YOLOv8n tutorial.
Tutorials:
- MCT tutorial notebooks updates:
  - Added new tutorials for IMX500:
    - instance segmentation YOLOv8n and a pose estimation YOLOv8n quantization in PyTorch, including an optional Gradient-Based PTQ step for optimized performance.
    - A torchvision model quantization for IMX500.
  - Added new classification models to MCT’s IMX500-Notebooks.
- Added new MCT features tutorials: Xquant tutorial in PyTorch and Keras. In addition, a new tutorial for GPTQ in PyTorch has been added.
- Update PyTorch object detection YOLOv8n tutorial with 16 bits manual configuration.

Breaking changes

To configure OpQuantizationConfig in the TPC, two additional arguments have been added:
- Signedness specifies the signedness of the quantization method (signed or unsigned quantization).
- supported_input_activation_n_bits sets the number of bits that operator accepts as input.

Bug fixes:

Fixed a bug in PyTorch model reader of reshape operator #1086.
Fixed a bug in GPTQ with bias learning for cases where a convolutional layer has None as its bias #1109.
Fixed an issue in mixed precision when running only weights or only activation compression: if layers with multiple candidates of the other type (activation/weights) existed, the search would fail or produce incorrect results. A new filtering procedure now runs before mixed precision to filter out unnecessary candidates #1162.

New Contributors

Welcome @DaniAffCH, @irenaby, @yardeny-sony for their first contribution! PR #1094, PR #1118, PR #1163

Full Changelog: v2.1.0...v2.2.0

Contributors

DaniAffCH, irenaby, and yarden-yagil-sony

Releases: SonySemiconductorSolutions/mct-model-optimization
