[Perf] Vectorize more dtype for int4mm #126512
It used to be vectorized only for f16, but there is no reason not to do the same for bf16 or f32.
Spiritual follow-up of #125290
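For orientation, here is a minimal scalar sketch of the kind of computation an int4 matmul kernel performs per output element, parameterized by the activation dtype. It is not the code from this PR. The nibble order, the offset of 8, the `(q - 8) * scale + zero` dequantization formula, and the helper names `round_to`, `unpack_int4` and `int4_dot` are all assumptions made for illustration. The bf16 path truncates instead of rounding to nearest, which is a simplification.

```python
import struct


def round_to(dtype, x):
    """Round a Python float to a value representable in the given dtype."""
    if dtype == "f32":
        return struct.unpack("<f", struct.pack("<f", x))[0]
    if dtype == "f16":
        return struct.unpack("<e", struct.pack("<e", x))[0]
    if dtype == "bf16":
        # bf16 shares the upper 16 bits of the f32 encoding.
        # Truncation is used here for simplicity (an assumption of this sketch).
        bits = struct.unpack("<I", struct.pack("<f", x))[0] & 0xFFFF0000
        return struct.unpack("<f", struct.pack("<I", bits))[0]
    raise ValueError(f"unsupported dtype: {dtype}")


def unpack_int4(packed):
    """Split each byte into two 4-bit values, low nibble first (assumed layout)."""
    out = []
    for byte in packed:
        out.append(byte & 0x0F)
        out.append(byte >> 4)
    return out


def int4_dot(activations, packed_weights, scale, zero, dtype):
    """Dot product of activations with dequantized int4 weights.

    The dequantization formula (q - 8) * scale + zero is hypothetical.
    """
    acc = 0.0
    for a, q in zip(activations, unpack_int4(packed_weights)):
        w = round_to(dtype, (q - 8) * scale + zero)
        acc += round_to(dtype, a) * w
    return round_to(dtype, acc)


if __name__ == "__main__":
    packed = bytes([0x98, 0xA7])  # nibbles 8, 9, 7, 10
    for dt in ("f16", "bf16", "f32"):
        print(dt, int4_dot([1.0, 2.0, 3.0, 4.0], packed, 0.5, 0.0, dt))
```

The inner loop is identical for all three dtypes and only the rounding step differs, which is the property that lets one vectorized code path serve f16, bf16 and f32.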
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126512
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit 5fff689 with merge base e3c5d1b: BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Warning: Unknown label
Please add the new label to .github/pytorch-probot.yml
@pytorchbot merge -f "Lint, Mac test and aarch64 builds are green"
Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).
Learn more about merging in the wiki.
Questions? Feedback? Please reach out to the PyTorch DevX Team
It used to be vectorized only for f16, but no reason not to do the same for bf16 or f32
Spiritual followup of pytorch#125290
Pull Request resolved: pytorch#126512
Approved by: https://github.com/Skylion007
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10