
Unexpected exception _Map_base::at in TensorRT 8.6.3 when running INT8 calibration on GPU RTX 4090 #3837

Closed
bernardrb opened this issue May 2, 2024 · 4 comments

@bernardrb

Description

[05/02/2024-13:11:13] [TRT] [I] Starting Calibration.
[05/02/2024-13:11:13] [TRT] [I]   Post Processing Calibration data in 9.32e-07 seconds.
[05/02/2024-13:11:13] [TRT] [V] Assigning tensor scales: /image_encoder/backbone/stages.4/op_list.1/context_module/main/Concat_output_0 using /image_encoder/backbone/stages.4/op_list.1/context_module/main/Concat_output_0 [
[05/02/2024-13:11:13] [TRT] [E] 1: Unexpected exception _Map_base::at

We are trying to quantize a specific layer-type for an engine:
{trt.LayerType.CONVOLUTION : trt.DataType.INT8}

In log.txt, we show that we identify layers of convolution type and set their precision, like this:

self.config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)

for i in range(self.network.num_layers):
    layer = self.network.get_layer(i)
    if layer.type in calibration.layers.keys():
        logger.debug(f"Found layer in calibration: {layer.type}, with type {layer.precision}")
        logger.debug(f"Setting layer {layer.name} to {calibration.layers[layer.type]}")
        layer.precision = calibration.layers[layer.type]

Our previous mixed-precision strategy, which assigned precision to "blocks" of the network, did not run into this error; it only appears when we assign mixed precision based on layer type.

Environment

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0 Off |                  Off |
| 40%   32C    P8              7W /  450W |      11MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1469      G   /usr/lib/xorg/Xorg                              4MiB |
+-----------------------------------------------------------------------------------------+

Baremetal or Container (if so, version): tensorrt-24.0.3py

Relevant Files

log.txt

@lix19937

lix19937 commented May 4, 2024

[05/02/2024-13:11:13] [TRT] [I] Post Processing Calibration data in 9.32e-07 seconds.

Your PTQ takes so little time?
How much calibration data did you prepare?
And does the calibration data enter the get_batch API?

@bernardrb
Author

No, the PTQ calibration usually takes around 40s. Here is an example log:

2024-05-02_13-42-10.txt

Calibration data consists of 4500 images with a calibration batch size of 500.

In this case, we simply want to benchmark our model when quantizing only one layer type at a time, in our case convolution and point-wise layers. We identified these layer types by profiling the layers with trtexec and then analysing the latency with trt-engine-explorer.
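
Roughly, the profiling step looks like the following sketch (the ONNX path and output file names are placeholders, not the actual files from this issue); the exported JSONs are what trt-engine-explorer loads:

# Hypothetical sketch: build and profile an engine with trtexec, exporting the
# per-layer graph and timing JSONs consumed by trt-engine-explorer.
# "model.onnx" and the output file names are placeholders.
import subprocess

subprocess.run([
    "trtexec",
    "--onnx=model.onnx",
    "--int8", "--fp16",
    "--profilingVerbosity=detailed",
    "--exportLayerInfo=engine.graph.json",
    "--exportProfile=engine.profile.json",
    "--saveEngine=model.engine",
], check=True)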

I've linked a Google Drive folder https://drive.google.com/drive/folders/1MJAP7NDO7zzRJlUJFexpTcxKVWT9tnuP?usp=drive_link with the relevant files.

@zerollzeng
Collaborator

I think you can calibrate without setting the layer precision: just generate scales for all layers and get the calibration cache, then use the calibration cache and fall back some layers to FP32/FP16.

Or use QAT, where you can control the layer precision explicitly with Q/DQ pairs.
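
A minimal sketch of the first suggestion, assuming the builder config, network and calibrator objects from your build script (the fallback_names substrings are just an example, not from this thread):

# Sketch only: calibrate all layers once to fill calibration.cache, then on a
# later build reuse the cache and force selected layers back to FP16.
config.set_flag(trt.BuilderFlag.INT8)
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
config.int8_calibrator = calibrator  # reads/writes calibration.cache

fallback_names = {"norm", "head"}  # example substrings of layers to keep in FP16
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if any(name in layer.name for name in fallback_names):
        layer.precision = trt.DataType.HALF
        for j in range(layer.num_outputs):
            layer.set_output_type(j, trt.DataType.HALF)

With OBEY_PRECISION_CONSTRAINTS the build fails if a constraint cannot be satisfied, whereas PREFER_PRECISION_CONSTRAINTS only warns and falls back.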

@zerollzeng zerollzeng self-assigned this May 8, 2024
@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label May 8, 2024
@bernardrb
Author

bernardrb commented May 8, 2024

I think you can calibrate without setting the layer precision: just generate scales for all layers and get the calibration cache, then use the calibration cache and fall back some layers to FP32/FP16.

I'm not sure I'm following. Our current approach is a modified version of the sample script: https://github.com/NVIDIA/TensorRT/blob/release/10.0/samples/python/efficientdet/build_engine.py.

If I want to control the precision of layers based on their type, can I use implicit quantization while setting layer precision as done in my code?

def set_mixed_precision(self, calibration: DictConfig):
    self.config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)

    for i in range(self.network.num_layers):
        layer = self.network.get_layer(i)

        if layer.precision == trt.DataType.INT32 or any(
            layer.get_output_type(j) == trt.DataType.INT32
            for j in range(layer.num_outputs)
        ):
            logger.info("Skipping layer {} with INT32 data type".format(layer.name))
        elif "norm" in layer.name:
            layer.precision = trt.DataType.FLOAT
            logger.info("Output layer {} set to PREFER FP32 data type".format(layer.name))
        elif layer.type in calibration.layers.keys():
            logger.debug(f"Found layer in calibration: {layer.type}, with type {layer.precision}")
            logger.debug(f"Setting layer {layer.name} to {calibration.layers[layer.type]}")
            layer.precision = calibration.layers[layer.type]
            for j in range(layer.num_outputs):
                layer.set_output_type(j, calibration.layers[layer.type])
        else:
            if any(sub_name in layer.name for sub_name in calibration.int8):
                layer.precision = trt.DataType.INT8
                for j in range(layer.num_outputs):
                    layer.set_output_type(j, trt.DataType.INT8)
                logger.info("Layer {} set to PREFER INT8 data type".format(layer.name))

            elif any(sub_name in layer.name for sub_name in calibration.fp16):
                layer.precision = trt.DataType.HALF
                logger.info("Layer {} set to PREFER FP16 data type".format(layer.name))
                for j in range(layer.num_outputs):
                    layer.set_output_type(j, trt.DataType.HALF)

            elif any(sub_name in layer.name for sub_name in calibration.fp32):
                layer.precision = trt.DataType.FLOAT
                logger.info("Layer {} set to PREFER FP32 data type".format(layer.name))
