feat: Implement FP8 functionality #2763

peri044 · 2024-04-18T23:14:58Z

Description

This PR adds FP8 and BF16 datatype support. It also implements a converter for FP8 quantized ops.

Type of change

Please delete options that are not relevant and/or add your own.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Checklist:

My code follows the style guidelines of this project (You can use the linters)
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas and hacks
I have made corresponding changes to the documentation
I have added tests to verify my fix or my feature
New and existing unit tests pass locally with my changes
I have added the relevant labels to my PR so that relevant reviewers are notified

chore: updates to trt api
chore: trt 10 fixes
chore: more fixes

author Dheeraj Peri <peri.dheeraj@gmail.com> 1711393059 -0700
committer Dheeraj Peri <peri.dheeraj@gmail.com> 1711393072 -0700

chore: minor updates
chore: Fix save failures
chore: minor fixes
chore: remove duplicate bert test case
chore: remove comments
chore: add load api
chore: minor updates
chore: minor updates
chore: minor updates
chore: more updates

zewenli98 · 2024-05-17T21:12:08Z

@peri044 I remember we already removed the cuDNN dependency on release/2.3, but it is still imported here:

TensorRT/py/torch_tensorrt/__init__.py

Line 9 in 3f6999d

__cudnn_version__,

This causes an error when I import torch-trt:

>>> import torch_tensorrt
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/scratch.zewenl_sw/docker_workspace/TRT10/TensorRT/py/torch_tensorrt/__init__.py", line 7, in <module>
    from torch_tensorrt._version import (  # noqa: F401
ImportError: cannot import name '__cudnn_version__' from 'torch_tensorrt._version' (/home/scratch.zewenl_sw/docker_workspace/TRT10/TensorRT/py/torch_tensorrt/_version.py)

Can you take a look?
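
For readers following along, here is a minimal sketch of why the traceback above appears: a `from module import name` statement raises ImportError as soon as one listed name is missing from the module. The module `fake_trt_version` and its attribute values are hypothetical stand-ins, not the real `torch_tensorrt._version`.

```python
import sys
import types

# Hypothetical stand-in for a version module that no longer defines
# __cudnn_version__; the attribute values below are placeholders.
fake_version = types.ModuleType("fake_trt_version")
fake_version.__version__ = "0.0.0"
fake_version.__cuda_version__ = "0.0"
sys.modules["fake_trt_version"] = fake_version


def import_cudnn_version():
    """Attempt the stale import and return the error text if it fails."""
    try:
        from fake_trt_version import __cudnn_version__  # name no longer defined
    except ImportError as exc:
        return f"ImportError: {exc}"
    return __cudnn_version__


result = import_cudnn_version()
print(result)
```

Dropping the stale name from the import list in `__init__.py` is the kind of change that would make the import succeed again.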

peri044 · 2024-05-17T21:49:38Z

fixed it now

zewenli98 · 2024-05-17T21:51:22Z

Cool, thanks! Did you also implement a unit test for torch.ops.trt.quantize_fp8.default?

examples/int8/training/vgg16/main.py

zewenli98 · 2024-05-21T23:34:21Z

@peri044 Thanks for the comments. I have refactored based on your suggestions.

examples/dynamo/vgg16_fp8_ptq.py

gs-olive

Overall looks good! Added a few comments

py/torch_tensorrt/dynamo/conversion/impl/quantize.py

gs-olive · 2024-05-23T16:37:54Z

py/torch_tensorrt/dynamo/lowering/_remove_sym_nodes.py

        node
        for node in gm.graph.nodes
        if (
            node.op == "placeholder"
            and isinstance(node.type, type)
            and issubclass(node.type, torch.SymInt)
+            and not node.users


Do SymInt inputs always have 0 users?

No, they usually have users. However, the torch.compile workflow adds one SymInt node with no users when we call torch._dynamo.mark_dynamic, and this pass removes that specific node.
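
The filter discussed above can be sketched without PyTorch. `SymInt` and `Node` below are hypothetical stand-ins for `torch.SymInt` and `torch.fx.Node`, modelling only the attributes the condition reads; the helper name is also an assumption, not the name used in the pass.

```python
from dataclasses import dataclass, field


class SymInt:
    """Hypothetical stand-in for torch.SymInt."""


@dataclass(eq=False)  # eq=False keeps identity hashing so nodes can be dict keys
class Node:
    """Hypothetical stand-in for torch.fx.Node with only the fields used here."""
    name: str
    op: str
    type: object = None
    users: dict = field(default_factory=dict)


def unused_symint_placeholders(nodes):
    """Select SymInt placeholders that no other node consumes."""
    return [
        node
        for node in nodes
        if node.op == "placeholder"
        and isinstance(node.type, type)
        and issubclass(node.type, SymInt)
        and not node.users
    ]


x = Node("x", "placeholder")                  # ordinary tensor input
s0 = Node("s0", "placeholder", type=SymInt)   # SymInt input that is used
s1 = Node("s1", "placeholder", type=SymInt)   # SymInt input with no users
add = Node("add", "call_function")
s0.users[add] = None                          # mark s0 as consumed by add

removed = [n.name for n in unused_symint_placeholders([x, s0, s1, add])]
print(removed)
```

With the extra `not node.users` condition, only `s1` is selected, so SymInt inputs that are still consumed stay in the graph.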

py/torch_tensorrt/dynamo/lowering/passes/remove_detach.py

narendasan · 2024-05-23T17:05:49Z

py/torch_tensorrt/_enums.py

-    # bf16 = auto()
+
+    f8 = auto()
+    bf16 = auto()


Make sure this is aligned with main; otherwise the enum values would change from version to version.

This will be added when we merge #2845, right?
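
A small sketch of the concern raised above: `auto()` numbers members in declaration order, so inserting a member ahead of an existing one shifts that member's value. The two enums below are hypothetical and much shorter than the real `dtype` enum; only the ordering effect is illustrated.

```python
from enum import Enum, auto


class DtypeBefore(Enum):
    """Hypothetical ordering without f8."""
    i8 = auto()
    f16 = auto()
    f32 = auto()
    bf16 = auto()


class DtypeAfter(Enum):
    """Same members with f8 inserted ahead of bf16."""
    i8 = auto()
    f16 = auto()
    f32 = auto()
    f8 = auto()
    bf16 = auto()


# Collect members whose numeric value differs between the two definitions.
shifted = {
    name: (DtypeBefore[name].value, DtypeAfter[name].value)
    for name in DtypeBefore.__members__
    if DtypeBefore[name].value != DtypeAfter[name].value
}
print(shifted)
```

Anything that stores or compares the raw integer value would see `bf16` change between the two definitions, which is why keeping the member order identical to main matters.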

narendasan · 2024-05-23T17:06:45Z

py/torch_tensorrt/dynamo/_defaults.py

@@ -27,7 +27,7 @@
 REQUIRE_FULL_COMPILATION = False
 DRYRUN = False
 HARDWARE_COMPATIBLE = False
-SUPPORTED_KERNEL_PRECISIONS = {dtype.f32, dtype.f16, dtype.i8}
+SUPPORTED_KERNEL_PRECISIONS = {dtype.f32, dtype.f16, dtype.i8, dtype.f8}


bf16 is missing here. Can you just cherry-pick this from main?

This will be added when we merge #2845, right?
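
To illustrate the effect of the missing entry, here is a sketch that assumes the set is used to validate user-requested precisions (the diff does not show how it is consumed). The `dtype` enum is a hypothetical stand-in and `unsupported_precisions` is an illustrative helper, not a torch_tensorrt function.

```python
from enum import Enum, auto


class dtype(Enum):
    """Hypothetical stand-in for the torch_tensorrt dtype enum."""
    f32 = auto()
    f16 = auto()
    i8 = auto()
    f8 = auto()
    bf16 = auto()


# Mirrors the set in the diff above: f8 was added, bf16 was not.
SUPPORTED_KERNEL_PRECISIONS = {dtype.f32, dtype.f16, dtype.i8, dtype.f8}


def unsupported_precisions(requested):
    """Return the names of requested precisions missing from the supported set."""
    return sorted(p.name for p in set(requested) - SUPPORTED_KERNEL_PRECISIONS)


rejected = unsupported_precisions({dtype.f16, dtype.bf16})
print(rejected)
```

Under that assumption, a request for bf16 would be flagged until the entry lands.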

py/torch_tensorrt/dynamo/conversion/aten_ops_converters.py

peri044 added 30 commits March 12, 2024 02:11

chore: Upgrade to TRT 10.0

9ad87ac

chore: updates to trt api

a655c9a

feat: Add save API for torch-trt compiled models

cd86660

feat: Add FP8 support including dtype and converters

31285e5

chore: minor fixes

7c9c646

Merge branch 'main' into trt_10

4eabeb0

Merge branch 'trt_10' into fp8_trt10

a320e56

chore: resolve merge conflicts

3ece71b

chore: Fix save failures

eab0dba

chore: update to 2.3 rc build

b191d62

chore: rebase with release/2.3 branch

ce606fe

chore: minor fixes

8674a3c

chore: remove duplicate bert test case

f4e8fe9

chore: remove comments

4ae6ab9

chore: Upgrade to TRT 10.0

fff1b80

chore: updates to trt api chore: trt 10 fixes chore: more fixes

chore: more fixes

39ca77d

chore: update trt version

5431ee3

chore: more updates

0c03de5

chore: more updates

1ae46e9

chore: rebase with save

ae87fba

chore: Update versions

beb5920

chore: update tensorrt version in CI

f0068c6

chore: more updates

39261b9

chore: more fixes

3753150

Merge branch 'release/2.3' into trt_10

16a191c

chore: remove NvUtils.h

c355766

chore: more updates

2d237dc

chore: change lib64 to lib in rhel BUILD file

e4b4429

chore: more updates

fa4fb9c

peri044 added 5 commits May 15, 2024 23:44

chore: fixes

ff231b5

chore: updates

2f167c6

chore: updates

367eaf0

chore: updates

8cb6b91

chore: updates

4d38368

peri044 requested review from narendasan, gs-olive and zewenli98 May 16, 2024 23:32

peri044 added 5 commits May 16, 2024 17:16

chore: updates

ee54da6

chore: updates

f4ccd62

chore: fixes

681a6d1

chore: updates

44071aa

chore: updates

3f6999d

chore: updates

5de9325

peri044 commented May 20, 2024

examples/int8/training/vgg16/main.py (outdated, resolved)

refactor vgg16 with fp8 and ptq example

c677ef9

zewenli98 force-pushed the fp8_trt10 branch from ea1053f to c677ef9 Compare May 21, 2024 23:20

github-actions bot added the documentation Improvements or additions to documentation label May 21, 2024

peri044 commented May 22, 2024

examples/dynamo/vgg16_fp8_ptq.py (3 threads, outdated, resolved)

zewenli98 and others added 2 commits May 22, 2024 15:27

fix bugs

f0b8d47

chore: rebase

3ce9bed

gs-olive reviewed May 23, 2024

chore: updates

beb888d

narendasan reviewed May 23, 2024

peri044 added 2 commits May 23, 2024 12:14

chore: address review comments

e7989a0

chore: updates

96fd462
