DOT via GEMM TPP #803
General Q (raised by others): this would not benefit from AMX, as the latter needs matrices/tiles to benefit from reuse etc. Is this a correct assumption?
DOT and GEMV are BW-bound ops, so AMX is irrelevant.
Indeed. However, this came up in the context of "batched dot-products", which in turn can be thought of as a matrix multiplication. I still wonder if there is something useful in it and whether I should get the exact use case.
Well, then one is better off reformulating the algorithm/math to use matmul. This is a standard trick in linear algebra…
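To make the "batched dot-products as matmul" reformulation concrete, here is a minimal NumPy sketch (illustrative only, not libxsmm code): stacking the vector pairs row-wise turns N independent dot products into the diagonal of a single GEMM.

```python
import numpy as np

# Hypothetical example: N independent dot products dots[i] = a_i . b_i,
# with the vectors stacked row-wise into matrices A and B of shape (N, K).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 8))
B = rng.standard_normal((4, 8))

# Naive: one dot product per pair.
naive = np.array([np.dot(A[i], B[i]) for i in range(A.shape[0])])

# Reformulated via matmul: the batched dots are the diagonal of A @ B^T,
# i.e. one GEMM covers all N dots (a fused kernel would of course skip
# the wasted off-diagonal work).
via_gemm = np.diag(A @ B.T)
assert np.allclose(naive, via_gemm)
```

The waste in the off-diagonal entries is exactly why the exact use case matters: whether a GEMM-shaped kernel pays off depends on how much reuse the batch actually exposes.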
For the record, this can be related to https://github.com/vondele/Stockfish/tree/amx_v1.
For various LLM and GNN operators, we need a fast vector dot. Right now we only have a very slow A^T GEMM for M=1, or we have to run a sequence of unary/binary TPPs.
Plan for improvement: add a fast A^T GEMM for M=1 that uses an inner-product approach plus a vector reduce at the end.
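The proposed scheme (accumulate into vector lanes during the inner-product loop, reduce horizontally only once at the end) can be sketched in NumPy; `dot_lanes` and `width` are illustrative names, not libxsmm identifiers, and `width` stands in for the vector-register length.

```python
import numpy as np

def dot_lanes(a, b, width=16):
    """Dot product with a width-wide accumulator that is reduced once at
    the end, mimicking a SIMD inner-product kernel (hypothetical sketch,
    not the libxsmm API)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    acc = np.zeros(width)
    main = len(a) - len(a) % width
    for i in range(0, main, width):           # vector body: FMA into the lanes
        acc += a[i:i + width] * b[i:i + width]
    tail = np.dot(a[main:], b[main:])         # scalar remainder loop
    return acc.sum() + tail                   # single horizontal reduce
```

For the M=1 A^T GEMM itself, each output element is one such dot of a column of A with the input vector; deferring the horizontal reduce keeps the hot loop as pure FMAs, which is the point of the inner-product approach.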