[Bug] Why ONNX with RTMO takes so long? #3010

Daanfb opened this issue Apr 4, 2024 · 0 comments

Prerequisite

Environment

04/04 12:28:41 - mmengine - INFO -

04/04 12:28:41 - mmengine - INFO - Environmental information
04/04 12:28:45 - mmengine - INFO - sys.platform: win32
04/04 12:28:45 - mmengine - INFO - Python: 3.8.19 (default, Mar 20 2024, 19:55:45) [MSC v.1916 64 bit (AMD64)]
04/04 12:28:45 - mmengine - INFO - CUDA available: True
04/04 12:28:45 - mmengine - INFO - MUSA available: False
04/04 12:28:45 - mmengine - INFO - numpy_random_seed: 2147483648
04/04 12:28:45 - mmengine - INFO - GPU 0: NVIDIA GeForce RTX 2060
04/04 12:28:45 - mmengine - INFO - CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6
04/04 12:28:45 - mmengine - INFO - NVCC: Cuda compilation tools, release 11.6, V11.6.55
04/04 12:28:45 - mmengine - INFO - MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.39.33523 for x64
04/04 12:28:45 - mmengine - INFO - GCC: n/a
04/04 12:28:45 - mmengine - INFO - PyTorch: 2.2.1+cu118
04/04 12:28:45 - mmengine - INFO - PyTorch compiling details: PyTorch built with:

  • C++ Version: 201703
  • MSVC 192930151
  • Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v3.3.2 (Git Hash 2dc95a2ad0841e29db8b22fbccaf3e5da7992b01)
  • OpenMP 2019
  • LAPACK is enabled (usually provided by MKL)
  • CPU capability usage: AVX2
  • CUDA Runtime 11.8
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_37,code=compute_37
  • CuDNN 8.7
  • Magma 2.5.4
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /Zc:__cplusplus /bigobj /FS /utf-8 -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.2.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,

04/04 12:28:45 - mmengine - INFO - TorchVision: 0.17.1+cu118
04/04 12:28:45 - mmengine - INFO - OpenCV: 4.8.0
04/04 12:28:45 - mmengine - INFO - MMEngine: 0.10.3
04/04 12:28:45 - mmengine - INFO - MMCV: 2.1.0
04/04 12:28:45 - mmengine - INFO - MMCV Compiler: MSVC 193933523
04/04 12:28:45 - mmengine - INFO - MMCV CUDA Compiler: 11.6
04/04 12:28:45 - mmengine - INFO - MMDeploy: 1.3.1+bc75c9d
04/04 12:28:45 - mmengine - INFO -

04/04 12:28:45 - mmengine - INFO - Backend information
04/04 12:28:46 - mmengine - INFO - tensorrt: 8.6.1
04/04 12:28:46 - mmengine - INFO - tensorrt custom ops: NotAvailable
04/04 12:28:47 - mmengine - INFO - ONNXRuntime: None
04/04 12:28:47 - mmengine - INFO - ONNXRuntime-gpu: 1.16.0
04/04 12:28:47 - mmengine - INFO - ONNXRuntime custom ops: NotAvailable
04/04 12:28:47 - mmengine - INFO - pplnn: None
04/04 12:28:47 - mmengine - INFO - ncnn: None
04/04 12:28:47 - mmengine - INFO - snpe: None
04/04 12:28:47 - mmengine - INFO - openvino: None
04/04 12:28:47 - mmengine - INFO - torchscript: 2.2.1+cu118
04/04 12:28:47 - mmengine - INFO - torchscript custom ops: NotAvailable
04/04 12:28:47 - mmengine - INFO - rknn-toolkit: None
04/04 12:28:47 - mmengine - INFO - rknn-toolkit2: None
04/04 12:28:47 - mmengine - INFO - ascend: None
04/04 12:28:47 - mmengine - INFO - coreml: None
04/04 12:28:47 - mmengine - INFO - tvm: None
04/04 12:28:47 - mmengine - INFO - vacc: None
04/04 12:28:47 - mmengine - INFO -

04/04 12:28:47 - mmengine - INFO - Codebase information
04/04 12:28:47 - mmengine - INFO - mmdet: 3.2.0
04/04 12:28:47 - mmengine - INFO - mmseg: None
04/04 12:28:47 - mmengine - INFO - mmpretrain: 1.2.0
04/04 12:28:47 - mmengine - INFO - mmocr: None
04/04 12:28:47 - mmengine - INFO - mmagic: None
04/04 12:28:47 - mmengine - INFO - mmdet3d: None
04/04 12:28:47 - mmengine - INFO - mmpose: 1.3.1
04/04 12:28:47 - mmengine - INFO - mmrotate: None
04/04 12:28:47 - mmengine - INFO - mmaction: None
04/04 12:28:47 - mmengine - INFO - mmrazor: None
04/04 12:28:47 - mmengine - INFO - mmyolo: None

Reproduces the problem - code sample

from mmdeploy.apis.utils import build_task_processor
from mmdeploy.utils import get_input_shape, load_config
import torch
import time

class ModelOnnx:

    def __init__(self, deploy_cfg, model_cfg, device, backend_model):
        # read deploy_cfg and model_cfg
        deploy_cfg, model_cfg = load_config(deploy_cfg, model_cfg)

        # build task and backend model
        self.task_processor = build_task_processor(model_cfg, deploy_cfg, device)
        self.model = self.task_processor.build_backend_model(backend_model)

        self.input_shape = get_input_shape(deploy_cfg)

    def process_one_image(self, image):

        start = time.time()

        start_input = time.time()
        model_inputs, _ = self.task_processor.create_input(image, self.input_shape)
        end_input = time.time()

        print(f'Input preparation time: {((end_input - start_input)*1000):.2f} ms')
        # do model inference
        with torch.no_grad():
            result = self.model.test_step(model_inputs)

        end = time.time()

        print(f'Inference time: {((end - start)*1000):.2f} ms')

        # visualize results
        self.task_processor.visualize(
            image=image,
            model=self.model,
            result=result[0],
            window_name='visualize',
            output_file=f'{image}_output.png')
        
if __name__ == "__main__":
    deploy_cfg = 'mmdeploy/configs/mmpose/pose-detection_rtmo_onnxruntime_dynamic.py'
    model_cfg = 'mmpose/configs/body_2d_keypoint/rtmo/body7/rtmo-m_16xb16-600e_body7-640x640.py'
    device = 'cuda'
    backend_model = ['rtmo-m_body7_onnx/end2end.onnx']
    image = 'image.jpg'

    model_onnx = ModelOnnx(deploy_cfg, model_cfg, device, backend_model)
    model_onnx.process_one_image(image)
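A timing caveat in the script above: `start` is taken before `create_input`, so the printed "Inference time" also includes input preparation. More importantly, it times a single call, and the first `test_step` on a freshly built backend model commonly includes one-off costs (CUDA kernel warmup, cuDNN convolution algorithm search in ONNX Runtime), which can dwarf steady-state latency. A minimal, self-contained sketch that separates the cold first call from warm calls; `run_once` is a stand-in for the model call:

```python
import time
import statistics

def benchmark(run_once, iters=20):
    """Time run_once(): report the first (cold) call separately from
    the median of the remaining warm calls, both in milliseconds."""
    t0 = time.perf_counter()
    run_once()
    cold_ms = (time.perf_counter() - t0) * 1000

    warm = []
    for _ in range(iters):
        t = time.perf_counter()
        run_once()
        warm.append((time.perf_counter() - t) * 1000)
    return cold_ms, statistics.median(warm)

# Stand-in workload; in the script above you would pass
# lambda: model.test_step(model_inputs) instead.
cold_ms, warm_ms = benchmark(lambda: sum(i * i for i in range(10000)))
print(f'First call: {cold_ms:.2f} ms, median warm call: {warm_ms:.2f} ms')
```

If the warm-call median is close to the PyTorch numbers while only the first call is slow, the problem is warmup rather than per-frame inference.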

Reproduces the problem - command or script

Same script as above.

Reproduces the problem - error message

I don't get any error message, but inference takes too long. I have a laptop with an RTX 2060.

Input preparation time: 39.98 ms
Inference time: 29846.40 ms

With the PyTorch RTMO model, the whole process takes only about 40 ms.

Additional information

I created this script to run an RTMO ONNX model, but it takes too much time, so I must be doing something wrong.
After running the script I get the following:

Input preparation time: 39.98 ms
Inference time: 29846.40 ms
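One thing worth verifying: the environment log above reports `ONNXRuntime: None`, `ONNXRuntime-gpu: 1.16.0`, and `ONNXRuntime custom ops: NotAvailable`. If the installed CUDA/cuDNN libraries do not match what onnxruntime-gpu expects, ONNX Runtime silently falls back to the CPU execution provider, which could explain second-scale inference on a model that runs in tens of milliseconds on the GPU. A minimal sketch of the check; the provider names are the real ONNX Runtime identifiers, but the helper itself is hypothetical:

```python
def cuda_provider_active(available_providers):
    """Return True if ONNX Runtime can run on the GPU.
    'CUDAExecutionProvider' is the provider name registered by
    onnxruntime-gpu; when it is absent, sessions silently fall
    back to 'CPUExecutionProvider'."""
    return 'CUDAExecutionProvider' in available_providers

# With onnxruntime installed, you would pass
# onnxruntime.get_available_providers() here.
print(cuda_provider_active(['CUDAExecutionProvider', 'CPUExecutionProvider']))  # True
print(cuda_provider_active(['CPUExecutionProvider']))  # False
```

Running `onnxruntime.get_available_providers()` in the same environment (or inspecting `session.get_providers()` on the created session) shows which provider is actually in use.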
