It works pretty well for float and double; the performance gain is up to 3x for the full model.
But I cannot use it for BF16; I hit the error at https://github.com/libxsmm/libxsmm/blob/main_stable/src/generator_gemm.c#L344.
In that case we still have to use torch for BF16, and the model is 2x slower than the float version.
I checked: we do not have the BF16 TN case, or at least it is not tested/exercised, so this seems to be a valid issue (besides TN/A-transpose being an unfortunate case in the hot path ;-).
Could you please enable this case in GEMM for BF16?
This is related to optimizations for the DGL project.