
🐛 [Bug] Example notebook qat-ptq-workflow yields a calibration failure #2777

Open · choosehappy opened this issue Apr 25, 2024 · 0 comments
Labels: bug (Something isn't working)

Bug Description

All preceding cells run as expected, but cell 28 of this example notebook:

https://github.com/pytorch/TensorRT/blob/main/notebooks/qat-ptq-workflow.ipynb

yields this error:


W0425 19:29:52.724189 140194752341824 _compile.py:108] Input graph is a Torchscript module but the ir provided is default (dynamo). Please set ir=torchscript to suppress the warning. Compiling the module with ir=torchscript
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32 or Bool.
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: [standardEngineBuilder.cpp::initCalibrationParams::1718] Error Code 4: Internal Error (Calibration failure occurred with no scaling factors detected. This could be due to no int8 calibrator or insufficient custom scales for network layers. Please see int8 sample to setup calibration correctly.)
----------
RuntimeError                              Traceback (most recent call last)
Cell In[28], line 6
      2 qat_model = torch.jit.load("mobilenetv2_qat.jit.pt").eval()
      3 compile_spec = {"inputs": [torch_tensorrt.Input([64, 3, 224, 224])],
      4                 "enabled_precisions": torch.int8
      5                }
----> 6 trt_mod = torch_tensorrt.compile(qat_model, **compile_spec)

File /usr/local/lib/python3.10/dist-packages/torch_tensorrt/_compile.py:185, in compile(module, ir, inputs, enabled_precisions, **kwargs)
    183         ts_mod = torch.jit.script(module)
    184     assert _non_fx_input_interface(input_list)
--> 185     compiled_ts_module: torch.jit.ScriptModule = torchscript_compile(
    186         ts_mod,
    187         inputs=input_list,
    188         enabled_precisions=enabled_precisions_set,
    189         **kwargs,
    190     )
    191     return compiled_ts_module
    192 elif target_ir == _IRType.fx:

File /usr/local/lib/python3.10/dist-packages/torch_tensorrt/ts/_compiler.py:151, in compile(module, inputs, input_signature, device, disable_tf32, sparse_weights, enabled_precisions, refit, debug, capability, num_avg_timing_iters, workspace_size, dla_sram_size, dla_local_dram_size, dla_global_dram_size, calibrator, truncate_long_and_double, require_full_compilation, min_block_size, torch_executed_ops, torch_executed_modules, allow_shape_tensors)
    124     raise ValueError(
    125         f"require_full_compilation is enabled however the list of modules and ops to run in torch is not empty. Found: torch_executed_ops: {torch_executed_ops}, torch_executed_modules: {torch_executed_modules}"
    126     )
    128 spec = {
    129     "inputs": input_list,
    130     "input_signature": input_signature,
   (...)
    148     "allow_shape_tensors": allow_shape_tensors,
    149 }
--> 151 compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
    152 compiled_module: torch.jit.ScriptModule = torch.jit._recursive.wrap_cpp_module(
    153     compiled_cpp_mod
    154 )
    155 return compiled_module

RuntimeError: [Error thrown at core/conversion/conversionctx/ConversionCtx.cpp:169] Building serialized network failed in TensorRT
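
For convenience, here is cell 28 as a standalone script (a sketch assuming mobilenetv2_qat.jit.pt was produced by the earlier QAT cells of the notebook; ir="ts" is my addition, only to make the TorchScript frontend explicit as the first warning suggests):

import torch
import torch_tensorrt

# Load the TorchScript QAT model exported by the earlier notebook cells
qat_model = torch.jit.load("mobilenetv2_qat.jit.pt").eval()

compile_spec = {
    "inputs": [torch_tensorrt.Input([64, 3, 224, 224])],
    "enabled_precisions": {torch.int8},  # the notebook passes a bare dtype; a set is equivalent
    "ir": "ts",  # select the TorchScript frontend explicitly (per the warning above)
}

# Fails with the calibration error shown above
trt_mod = torch_tensorrt.compile(qat_model, **compile_spec)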


To Reproduce

Steps to reproduce the behavior:

  1. docker run -it --gpus all nvcr.io/nvidia/pytorch:24.03-py3 bash
  2. jupyter notebook
  3. Sequentially run the cells in qat-ptq-workflow.ipynb (the steps are collected in the sketch after this list)
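
Collected as one shell sketch (the -p port mapping and the Jupyter flags are my additions for reaching the notebook server from the host, not part of the original steps):

docker run -it --gpus all -p 8888:8888 nvcr.io/nvidia/pytorch:24.03-py3 bash
# inside the container:
jupyter notebook --ip 0.0.0.0 --allow-root
# then open notebooks/qat-ptq-workflow.ipynb and run its cells top to bottom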

Environment

Build information about Torch-TensorRT can be found by turning on debug messages (a sketch for enabling them follows the environment list below)

  • Torch-TensorRT Version (e.g. 1.0.0): 2.3.0a0
  • PyTorch Version (e.g. 1.0): 2.3.0a0+40ec155e58.nv24.03
  • CPU Architecture: 11th Gen Intel(R) Core(TM) i9-11950H @ 2.60GHz
  • OS (e.g., Linux): WSL2-Ubuntu 22.04
  • How you installed PyTorch (conda, pip, libtorch, source): Docker
  • Python version: 3.10.12
  • CUDA version: 12.2
  • GPU models and configuration: NVIDIA GeForce RTX 3080
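
To turn on those debug messages, a minimal sketch using the logging context manager shipped with the torch_tensorrt Python package (reusing qat_model and compile_spec from the failing cell):

import torch_tensorrt

# Emit DEBUG-level messages, including build information, during compilation
with torch_tensorrt.logging.debug():
    trt_mod = torch_tensorrt.compile(qat_model, **compile_spec)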
choosehappy added the bug label on Apr 25, 2024