-
Just grabbed an A770 16GB off eBay. They are so cheap that the cost per PCIe slot of the host computer is higher than the card itself! So the card might have a decent TFLOPS/$, but what do you plug it into?
-
It does seem to run somewhat, albeit slowly, using the GPU runtime. Tried adding IPEX to the TORCH runtime, but I am running into some test failures and need to take a deeper look. Might be as simple as adding the following after
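Something along these lines, perhaps — a minimal sketch of wiring up IPEX, assuming `torch` and `intel_extension_for_pytorch` are installed with XPU support (a guess at the general shape, not a tested patch):

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device with PyTorch

# Check that the Arc card is actually visible before running anything on it.
assert torch.xpu.is_available(), "no XPU device found"
print(torch.xpu.get_device_name(0))

# A trivial matmul to confirm kernels really execute on the GPU.
x = torch.randn(1024, 1024, device="xpu")
y = (x @ x).sum()
torch.xpu.synchronize()  # kernels launch asynchronously; wait before reading back
print(y.item())
```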
The main benefits of the A770 are the XMX matrix operations, with their 8:1 throughput for FP16/BF16 relative to FP32 (157.28 TFLOPS theoretical at the 2.1 GHz base clock; mine boosts to 2.4 GHz all day without even overclocking, so probably closer to 180 TFLOPS), and the good memory bandwidth for the price, 560 GB/s (definitely better than the recently released RTX 40xx series cards...). The FP32 performance is okay: not amazing, not bad either. Right now, IPEX is more optimized for BF16 than FP16, though.
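A quick way to see that BF16-vs-FP16 gap for yourself is to time the same matmul in both dtypes on the `xpu` device. A rough sketch (the matrix size and iteration count are arbitrary, and `intel_extension_for_pytorch` with XPU support is assumed):

```python
import time
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device

def bench_matmul(dtype, n=4096, iters=50):
    a = torch.randn(n, n, device="xpu", dtype=dtype)
    b = torch.randn(n, n, device="xpu", dtype=dtype)
    for _ in range(3):           # warm-up so kernel selection doesn't skew timing
        a @ b
    torch.xpu.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.xpu.synchronize()      # kernels are async; wait before stopping the clock
    dt = (time.perf_counter() - t0) / iters
    return 2 * n**3 / dt / 1e12  # an n x n matmul does 2*n^3 FLOPs

for dtype in (torch.bfloat16, torch.float16):
    print(dtype, f"{bench_matmul(dtype):.1f} TFLOPS")
```

Comparing the printed numbers against the ~157 TFLOPS theoretical peak gives a feel for how much of the XMX hardware the current IPEX kernels actually reach.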
-
You can use OpenCL via Rusticl.
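If anyone tries that route: as of recent Mesa releases, Rusticl has to be enabled per driver via an environment variable (`RUSTICL_ENABLE=iris` for Intel GPUs, if I recall correctly). A quick sketch to check that the platform shows up, assuming `pyopencl` is installed:

```python
import pyopencl as cl

# List every OpenCL platform and device the ICD loader can see;
# a working Rusticl setup should appear as a "rusticl" platform.
for platform in cl.get_platforms():
    print(platform.name)
    for device in platform.get_devices():
        print("  ", device.name)
```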
-
Hi, I got the benchmark joint_matrix_bfloat16.cpp running on the latest oneAPI ("2024.0"); according to the logs it's running on the Arc. I cannot tell whether this is fast or slow compared to other platforms.
Haven't found out yet why the accuracy check fails.
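One way to judge fast vs. slow is to convert the benchmark's timing into achieved TFLOPS and compare it against the ~157 TFLOPS theoretical XMX peak mentioned above. A trivial sketch (the matrix sizes and timing below are placeholders; substitute whatever joint_matrix_bfloat16.cpp actually uses):

```python
def achieved_tflops(m, n, k, seconds):
    # A GEMM of an (m x k) by a (k x n) matrix does 2*m*n*k floating-point ops.
    return 2 * m * n * k / seconds / 1e12

# Hypothetical numbers, for illustration only.
print(achieved_tflops(4096, 4096, 4096, 0.002))  # ~68.7 TFLOPS
```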
-
Thoughts? There's also a successful PyTorch extension. I know this hardware is in its infancy, but the performance-to-price ratio is insane. Not to mention that with better driver support in the future, performance is only going to improve...