
Include torch-tensorrt in dev container #2789

Open · wants to merge 12 commits into master
Conversation

@sachanub (Collaborator) commented Nov 14, 2023

Description


The objective of this PR is to install tensorrt and torch-tensorrt in the TorchServe dev container. The following changes have been made:

  • Added a line to install tensorrt and torch-tensorrt when $CUDA_VERSION is non-empty.
  • Created a new regression test for the Torch TensorRT use case. The test verifies:
    • Successful creation of the ResNet-50 model artifact compiled with Torch TensorRT using torch-model-archiver.
    • Successful loading and unloading of the ResNet-50 model compiled with Torch TensorRT.
    • Successful inference against the ResNet-50 model compiled with Torch TensorRT.
  • Minor fixes in the build_image.sh script:
    • Changed the default CUDA version to 11.8, since the default GPU base image is nvidia/cuda:11.8.0-base-ubuntu20.04.
    • Changed the default base image for CUDA 11.6 to nvidia/cuda:11.6.2-cudnn8-runtime-ubuntu20.04; the previous image, nvidia/cuda:11.6.0-cudnn8-runtime-ubuntu20.04, is no longer available on Docker Hub.
    • Changed the default base image for CUDA 11.3 to nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu20.04; the previous image, nvidia/cuda:11.3.0-cudnn8-runtime-ubuntu20.04, is no longer available on Docker Hub.
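The conditional install from the first bullet might look roughly like the sketch below. This is illustrative, not the exact line from the PR: the helper function and messages are hypothetical, and only the package names and the $CUDA_VERSION check come from the description above.

```shell
# Sketch: pick extra pip packages only when $CUDA_VERSION is non-empty.
# Package names follow the PR description; the function wrapper is illustrative.
tensorrt_packages() {
    if [ -n "$CUDA_VERSION" ]; then
        # GPU build: add TensorRT and the Torch-TensorRT bridge
        echo "tensorrt torch-tensorrt"
    fi
    # CPU build: prints nothing, so nothing extra gets installed
}

# A Dockerfile install step could then run, e.g.:
#   pip install $(tensorrt_packages)
CUDA_VERSION=12.1
echo "extra pip packages: $(tensorrt_packages)"
```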

UPDATE:

  • Added a dev CI stage to the Dockerfile so that regression tests can run in the Docker regression test workflow.
  • Updated regression_tests_docker.yml to build the dev CI image (IPEX for CPU, CUDA 12.1 for GPU) and run the regression tests.
  • Modified the build_image.sh script to support a dev-ci option.
  • Updated the Torch TensorRT example to work with CUDA 11.8, CUDA 12.1, and torch-tensorrt==2.1.0.
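The load/infer/unload regression test described above maps onto TorchServe's standard management (port 8081) and inference (port 8080) REST APIs. Below is a hedged sketch of that flow; the endpoint paths are TorchServe's documented defaults, while the archive name res50-trt.mar and model name res50-trt are illustrative, not necessarily what the test uses.

```shell
# TorchServe default API endpoints
MANAGEMENT_API="http://localhost:8081"
INFERENCE_API="http://localhost:8080"

# Build the URL that registers (loads) a .mar from the model store
register_url() { echo "${MANAGEMENT_API}/models?url=$1&initial_workers=1"; }
# Build the URL that runs inference against a registered model
predict_url() { echo "${INFERENCE_API}/predictions/$1"; }
# Build the URL that unregisters (unloads) a model (issued with HTTP DELETE)
unregister_url() { echo "${MANAGEMENT_API}/models/$1"; }

# The test flow would then be, in order:
#   curl -X POST   "$(register_url res50-trt.mar)"      # load
#   curl "$(predict_url res50-trt)" -T kitten.jpg       # infer
#   curl -X DELETE "$(unregister_url res50-trt)"        # unload
```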


Type of change


  • New feature (non-breaking change which adds functionality)

Feature/Issue validation/testing


Ran regression tests in the CPU, CUDA 11.8, and CUDA 12.1 dev CI images.

@agunapal (Collaborator) left a comment


@sachanub We need to think about how we want to test this.
Currently, this test won't get executed anywhere, right?
Can you please create a nightly workflow to test the "dev" image?

@sachanub (Collaborator, Author) replied:

Hi @agunapal, I have modified the nightly Docker regression test workflow to run tests for the dev container (IPEX and CUDA 12.1). I have included the test logs in the PR description.

@agunapal (Collaborator)

Hi @sachanub, I'm not sure we should be combining this with the prod workflow. Thinking in terms of integration with other libraries, maybe we want to run this once a week.
