New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Implement FP8 functionality #2763
base: release/2.3
Are you sure you want to change the base?
Conversation
chore: updates to trt api chore: trt 10 fixes chore: more fixes
author Dheeraj Peri <peri.dheeraj@gmail.com> 1711393059 -0700 committer Dheeraj Peri <peri.dheeraj@gmail.com> 1711393072 -0700 chore: minor updates chore: Fix save failures chore: minor fixes chore: remove duplicate bert test case chore: remove comments chore: add load api chore: minor updates chore: minor updates chore: minor updates chore: more updates
@peri044 I remember we've already removed
This causes an error when I import torch-trt:
Can you take a look? |
fixed it now |
Cool thanks! And did you implement the unit test for |
@peri044 Thanks for the comments. I have refactored based on your suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good! Added a few comments
node | ||
for node in gm.graph.nodes | ||
if ( | ||
node.op == "placeholder" | ||
and isinstance(node.type, type) | ||
and issubclass(node.type, torch.SymInt) | ||
and not node.users |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do SymInt
inputs always have 0 users?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope they have users. But there's one no-user sym_int node added by torch.compile workflow when we do torch._dynamo.mark_dynamic
and this pass is removing that specific node.
# bf16 = auto() | ||
|
||
f8 = auto() | ||
bf16 = auto() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure this is aligned with main, otherwise the enum values would change version to version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be added when we merge this #2845 right ?
@@ -27,7 +27,7 @@ | |||
REQUIRE_FULL_COMPILATION = False | |||
DRYRUN = False | |||
HARDWARE_COMPATIBLE = False | |||
SUPPORTED_KERNEL_PRECISIONS = {dtype.f32, dtype.f16, dtype.i8} | |||
SUPPORTED_KERNEL_PRECISIONS = {dtype.f32, dtype.f16, dtype.i8, dtype.f8} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing bf16 here, can you just use a cherrypick of main?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be added when we merge this #2845 right ?
Description
This PR adds FP8 & BF16 datatype support. It also implements converter for FP8 quantized ops.
Type of change
Please delete options that are not relevant and/or add your own.
Checklist: