Add support for int8 matmul in afcuda #1656
Comments
I would also like to mention my own interest in having general support for int8 as a feature.
This is a great idea, but we need to be careful to up/down-convert on certain hardware when adding this feature. For example, fp16 performance on compute_61 (non-Tesla Pascal) cards is absolutely abysmal. We will also need to support this on older hardware. I wonder how CUDA provides a fallback in those cases.
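For context, a minimal sketch of how a backend might gate a native fp16 path on compute capability at runtime. This is not ArrayFire's actual dispatch logic; the policy below (skip native fp16 on sm_61) is an illustrative assumption based on the comment above, and only `cudaGetDeviceProperties` is real CUDA API:

```cpp
#include <cuda_runtime.h>

// Sketch: decide whether to run a native fp16 kernel or fall back to
// an fp32 (up-converted) path, based on the device's compute capability.
bool useNativeFp16(int device) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, device);
    int cc = prop.major * 10 + prop.minor;
    if (cc < 53)  return false;  // no native half arithmetic before sm_53
    if (cc == 61) return false;  // sm_61: fp16 throughput is very poor; up/down-convert instead
    return true;
}
```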
AFAIK, there is no support for half-precision floating point numbers in the C and C++ standards, so fp16 is going to be a bit tricky to support in a general-purpose manner. int8 can be supported easily (although it is a bit tedious).
Found the half library. It seems to be well documented. We probably need to run some tests on performance and compatibility with native types.
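For reference, a minimal sketch of using that library (assuming the header-only half.hpp; `half_float::half` overloads the usual arithmetic operators and converts to/from `float`):

```cpp
#include <iostream>
#include "half.hpp"  // header-only "half" library

using half_float::half;

int main() {
    half a(3.5f), b(1.25f);       // constructed from float
    half c = a * b + half(0.5f);  // arithmetic via the library's operator overloads
    std::cout << static_cast<float>(c) << "\n";  // prints 4.875
    return 0;
}
```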
@umar456 Does Boost have anything similar?
Would int8 be easier to implement than fp16, especially int8 in the CUDA 8 Pascal GPU backend?
@WilliamTambellini yes
OK, so we had better split this feature/issue in two: int8 and fp16.
The issue with int8 (char) is that we use it for b8. That is a differentiation we will have to keep in mind.
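To illustrate the point, the dtype enum already reserves the 8-bit slot for booleans, so signed 8-bit integers would need their own enum value rather than reusing b8. An abridged sketch of the 3.x-era enum, with the new value shown as a hypothetical addition:

```cpp
// Abridged from af/defines.h (ArrayFire 3.x headers).
typedef enum {
    f32, c32, f64, c64, b8, s32, u32, u8, s64, u64, s16, u16
    // , s8  // hypothetical: a distinct dtype for signed 8-bit ints
} af_dtype;
```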
@shehzan10 OK, thanks.
I confirm our interest in int8, mainly via CUDA.
@WilliamTambellini This involves changing a lot of files across the 3 backends, so it is not entirely straightforward.
I'd just like to add my voice in support of int8.
@umar456 Would it be possible to do a minimalist implementation in AF in order to call cublasGemm() when the af::array datatype is int8 or int16?
Attaching a minimalist PoC (seen on the NVIDIA forum) of int8 matmul using cublasGemmEx:
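Since the attachment itself is not reproduced in the thread, here is a comparable hedged sketch (my reconstruction, not the original PoC) of an int8 GEMM through cublasGemmEx with int32 accumulation; error checking is omitted for brevity:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    // Column-major GEMM: C(int32) = alpha * A(int8) * B(int8) + beta * C.
    // cuBLAS has historically required m, k and the leading dimensions
    // to be multiples of 4 for CUDA_R_8I inputs.
    const int m = 4, n = 4, k = 4;
    std::vector<int8_t>  hA(m * k, 1), hB(k * n, 2);
    std::vector<int32_t> hC(m * n, 0);

    int8_t *dA, *dB; int32_t *dC;
    cudaMalloc(&dA, hA.size());
    cudaMalloc(&dB, hB.size());
    cudaMalloc(&dC, hC.size() * sizeof(int32_t));
    cudaMemcpy(dA, hA.data(), hA.size(), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), hB.size(), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    const int32_t alpha = 1, beta = 0;  // scaling factors match the compute type (int32)
    cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                 &alpha,
                 dA, CUDA_R_8I, m,     // lda
                 dB, CUDA_R_8I, k,     // ldb
                 &beta,
                 dC, CUDA_R_32I, m,    // ldc
                 CUDA_R_32I,           // 32-bit integer accumulation
                 CUBLAS_GEMM_DFALT);   // spelled CUBLAS_GEMM_DEFAULT in newer toolkits

    cudaMemcpy(hC.data(), dC, hC.size() * sizeof(int32_t), cudaMemcpyDeviceToHost);
    printf("C[0] = %d (expected %d)\n", hC[0], 1 * 2 * k);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```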
@umar456 This one is not needed for 3.7.0 on my side.
Could we please remove it from the 3.7.0 scope?
Up?
I see these new features in CUDA 8:
"Native FP16 and INT8 computation for deep learning and other workloads":
https://devblogs.nvidia.com/parallelforall/cuda-8-features-revealed/
This feature request is to be able to use them through AF, at least for matmul and arithmetic (+ - * /).
Thanks,
WT.
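To make the request concrete, a hypothetical sketch of what the desired AF-level usage might look like; the s8 dtype and the int8 matmul/arithmetic shown here are the requested feature (assumptions), not existing API at the time of this issue:

```cpp
#include <arrayfire.h>

int main() {
    af::setBackend(AF_BACKEND_CUDA);

    // Hypothetical: s8 did not exist when this issue was filed; the
    // .as(s8) casts below illustrate the requested dtype.
    af::array A = (af::randu(64, 64) * 127).as(s8);
    af::array B = (af::randu(64, 64) * 127).as(s8);

    af::array C = af::matmul(A, B);  // would lower to cublasGemmEx on CUDA
    af::array D = A + B - A * B;     // requested elementwise arithmetic

    af_print(C(af::seq(4), af::seq(4)));  // top-left 4x4 block
    return 0;
}
```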