
Include torch-tensorrt in dev container #2789

Open · wants to merge 12 commits into master
Conversation

@sachanub (Collaborator) commented Nov 14, 2023

Description


The objective of this PR is to install tensorrt and torch-tensorrt in the TorchServe dev container. The following changes have been made:

  • Added a line to install tensorrt and torch-tensorrt when $CUDA_VERSION is non-empty.
  • Created a new regression test for the Torch TensorRT use case. The test verifies:
    • Successful creation of the ResNet-50 model artifact compiled with Torch TensorRT using torch-model-archiver.
    • Successful loading and unloading of the ResNet-50 model compiled with Torch TensorRT.
    • Successful inference against the ResNet-50 model compiled with Torch TensorRT.
  • Minor fixes in the build_image.sh script:
    • Changed the default CUDA version to 11.8, since the default GPU base image is nvidia/cuda:11.8.0-base-ubuntu20.04.
    • Changed the default base image for CUDA 11.6 to nvidia/cuda:11.6.2-cudnn8-runtime-ubuntu20.04; the previous image, nvidia/cuda:11.6.0-cudnn8-runtime-ubuntu20.04, is no longer available on Docker Hub.
    • Changed the default base image for CUDA 11.3 to nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu20.04; the previous image, nvidia/cuda:11.3.0-cudnn8-runtime-ubuntu20.04, is no longer available on Docker Hub.
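The conditional install from the first bullet might look roughly like the sketch below. This is illustrative, not the exact line from the PR: the helper function and messages are hypothetical, and only the package names and the $CUDA_VERSION check come from the description above.

```shell
# Sketch: pick extra pip packages only when $CUDA_VERSION is non-empty.
# Package names follow the PR description; the function wrapper is illustrative.
tensorrt_packages() {
    if [ -n "$CUDA_VERSION" ]; then
        # GPU build: add TensorRT and the Torch-TensorRT bridge
        echo "tensorrt torch-tensorrt"
    fi
    # CPU build: prints nothing, so nothing extra gets installed
}

# A Dockerfile install step could then run, e.g.:
#   pip install $(tensorrt_packages)
CUDA_VERSION=12.1
echo "extra pip packages: $(tensorrt_packages)"
```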

UPDATE:

  • Added a dev CI stage to the Dockerfile so that regression tests can run in the Docker regression test workflow.
  • Updated regression_tests_docker.yml to build the dev CI image (IPEX for CPU, CUDA 12.1 for GPU) and run the regression tests.
  • Modified the build_image.sh script to support a dev-ci option.
  • Updated the Torch TensorRT example to work with CUDA 11.8, CUDA 12.1, and torch-tensorrt==2.1.0.
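The load/infer/unload regression test described above maps onto TorchServe's standard management (port 8081) and inference (port 8080) REST APIs. Below is a hedged sketch of that flow; the endpoint paths are TorchServe's documented defaults, while the archive name res50-trt.mar and model name res50-trt are illustrative, not necessarily what the test uses.

```shell
# TorchServe default API endpoints
MANAGEMENT_API="http://localhost:8081"
INFERENCE_API="http://localhost:8080"

# Build the URL that registers (loads) a .mar from the model store
register_url() { echo "${MANAGEMENT_API}/models?url=$1&initial_workers=1"; }
# Build the URL that runs inference against a registered model
predict_url() { echo "${INFERENCE_API}/predictions/$1"; }
# Build the URL that unregisters (unloads) a model (issued with HTTP DELETE)
unregister_url() { echo "${MANAGEMENT_API}/models/$1"; }

# The test flow would then be, in order:
#   curl -X POST   "$(register_url res50-trt.mar)"      # load
#   curl "$(predict_url res50-trt)" -T kitten.jpg       # infer
#   curl -X DELETE "$(unregister_url res50-trt)"        # unload
```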


Type of change


  • New feature (non-breaking change which adds functionality)

Feature/Issue validation/testing


Ran regression tests in the CPU, CUDA 11.8, and CUDA 12.1 dev CI images.

@agunapal (Collaborator) left a comment


@sachanub We need to think about how we want to test this.
Currently, this test won't get executed anywhere, right?
Can you please create a nightly workflow to test the "dev" image?

@sachanub (Collaborator, Author) replied:

Hi @agunapal, I have modified the nightly Docker regression test workflow to run tests for the dev container (IPEX and CUDA 12.1). I have included the test logs in the PR description.

@agunapal (Collaborator)

Hi @sachanub, I'm not sure we should be combining this with the prod workflow. Thinking in terms of integration with other libraries, maybe we want to run this once a week.
