
Unexpected exception _Map_base::at in TensorRT 8.6.3 when running INT8 calibration on GPU RTX 4090 #3837

Closed
bernardrb opened this issue May 2, 2024 · 4 comments

@bernardrb

Description

[05/02/2024-13:11:13] [TRT] [I] Starting Calibration.
[05/02/2024-13:11:13] [TRT] [I]   Post Processing Calibration data in 9.32e-07 seconds.
[05/02/2024-13:11:13] [TRT] [V] Assigning tensor scales: /image_encoder/backbone/stages.4/op_list.1/context_module/main/Concat_output_0 using /image_encoder/backbone/stages.4/op_list.1/context_module/main/Concat_output_0 [
[05/02/2024-13:11:13] [TRT] [E] 1: Unexpected exception _Map_base::at

We are trying to quantize a specific layer-type for an engine:
{trt.LayerType.CONVOLUTION : trt.DataType.INT8}

In log.txt, we show that we identify layers of convolution type and set their precision, like this:

self.config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)

for i in range(self.network.num_layers):
    layer = self.network.get_layer(i)
    if layer.type in calibration.layers.keys():
        logger.debug(f"Found layer in calibration: {layer.type}, with type {layer.precision}")
        logger.debug(f"Setting layer {layer.name} to {calibration.layers[layer.type]}")
        layer.precision = calibration.layers[layer.type]

Our previous mixed-precision strategy, which assigned precision to "blocks" of the network, did not run into this error; it only appears when we assign mixed precision based on layer type.

Environment

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0 Off |                  Off |
| 40%   32C    P8              7W /  450W |      11MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1469      G   /usr/lib/xorg/Xorg                              4MiB |
+-----------------------------------------------------------------------------------------+

Baremetal or Container (if so, version): tensorrt-24.0.3py

Relevant Files

log.txt

@lix19937

lix19937 commented May 4, 2024

[05/02/2024-13:11:13] [TRT] [I] Post Processing Calibration data in 9.32e-07 seconds.

Your PTQ takes so little time?
How much calibration data did you prepare?
And does the calibration data enter the get_batch API?

@bernardrb
Author

No, the PTQ calibration usually takes around 40s. Here is an example log:

2024-05-02_13-42-10.txt

Calibration data consists of 4500 images with a calibration batch size of 500.

In this case, we simply want to benchmark our model when quantizing only one layer type at a time, in our case convolution and point-wise layers. We identified these layer types by profiling the layers with trtexec and then analysing the latency with trt-engine-explorer.
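
Roughly, the profiling step looks like the following sketch (the ONNX path and output file names are placeholders, not the actual files from this issue); the exported JSONs are what trt-engine-explorer loads:

# Hypothetical sketch: build and profile an engine with trtexec, exporting the
# per-layer graph and timing JSONs consumed by trt-engine-explorer.
# "model.onnx" and the output file names are placeholders.
import subprocess

subprocess.run([
    "trtexec",
    "--onnx=model.onnx",
    "--int8", "--fp16",
    "--profilingVerbosity=detailed",
    "--exportLayerInfo=engine.graph.json",
    "--exportProfile=engine.profile.json",
    "--saveEngine=model.engine",
], check=True)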

I've linked a Google Drive folder https://drive.google.com/drive/folders/1MJAP7NDO7zzRJlUJFexpTcxKVWT9tnuP?usp=drive_link with the relevant files.

@zerollzeng
Collaborator

I think you can calibrate without setting the layer precision: just generate scales for all layers and get the calibration cache, then use the calibration cache and fall back some layers to FP32/FP16.

Or use QAT, where you can control the layer precision explicitly with Q/DQ pairs.
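
A minimal sketch of the first suggestion, assuming the builder config, network and calibrator objects from your build script (the fallback_names substrings are just an example, not from this thread):

# Sketch only: calibrate all layers once to fill calibration.cache, then on a
# later build reuse the cache and force selected layers back to FP16.
config.set_flag(trt.BuilderFlag.INT8)
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
config.int8_calibrator = calibrator  # reads/writes calibration.cache

fallback_names = {"norm", "head"}  # example substrings of layers to keep in FP16
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if any(name in layer.name for name in fallback_names):
        layer.precision = trt.DataType.HALF
        for j in range(layer.num_outputs):
            layer.set_output_type(j, trt.DataType.HALF)

With OBEY_PRECISION_CONSTRAINTS the build fails if a constraint cannot be satisfied, whereas PREFER_PRECISION_CONSTRAINTS only warns and falls back.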

@zerollzeng zerollzeng self-assigned this May 8, 2024
@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label May 8, 2024
@bernardrb
Author

bernardrb commented May 8, 2024

I think you can calibrate without setting the layer precision: just generate scales for all layers and get the calibration cache, then use the calibration cache and fall back some layers to FP32/FP16.

I'm not sure I'm following. Our current approach is a modified version of the sample script: https://github.com/NVIDIA/TensorRT/blob/release/10.0/samples/python/efficientdet/build_engine.py.

If I want to control the precision of layers based on their type, can I use implicit quantization while setting layer precision as done in my code?

def set_mixed_precision(self, calibration: DictConfig):
    self.config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)

    for i in range(self.network.num_layers):
        layer = self.network.get_layer(i)

        if layer.precision == trt.DataType.INT32 or any(
            layer.get_output_type(j) == trt.DataType.INT32
            for j in range(layer.num_outputs)
        ):
            logger.info("Skipping layer {} with INT32 data type".format(layer.name))
        elif "norm" in layer.name:
            layer.precision = trt.DataType.FLOAT
            logger.info("Output layer {} set to PREFER FP32 data type".format(layer.name))
        elif layer.type in calibration.layers.keys():
            logger.debug(f"Found layer in calibration: {layer.type}, with type {layer.precision}")
            logger.debug(f"Setting layer {layer.name} to {calibration.layers[layer.type]}")
            layer.precision = calibration.layers[layer.type]
            for j in range(layer.num_outputs):
                layer.set_output_type(j, calibration.layers[layer.type])
        else:
            if any(sub_name in layer.name for sub_name in calibration.int8):
                layer.precision = trt.DataType.INT8
                for j in range(layer.num_outputs):
                    layer.set_output_type(j, trt.DataType.INT8)
                logger.info("Layer {} set to PREFER INT8 data type".format(layer.name))

            elif any(sub_name in layer.name for sub_name in calibration.fp16):
                layer.precision = trt.DataType.HALF
                logger.info("Layer {} set to PREFER FP16 data type".format(layer.name))
                for j in range(layer.num_outputs):
                    layer.set_output_type(j, trt.DataType.HALF)

            elif any(sub_name in layer.name for sub_name in calibration.fp32):
                layer.precision = trt.DataType.FLOAT
                logger.info("Layer {} set to PREFER FP32 data type".format(layer.name))
