chgemm

chgemm is a symmetric int8 GEMM project that differs slightly from BLAS sgemm:

input matrices are int8_t with values in [-127, +127], and the output matrix is int32_t. PS: watch out for overflow;
with deep learning use cases in mind, the packAB interface is exposed and can be adjusted;
the common design is C = alpha*A*B + beta*C, but this project computes C = A*B, because alpha and beta are of no use in deep learning inference;
matrices are row-major;
the author reports that it is faster than other comparable projects.
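
The contract described in the list above (int8 inputs, int32 output, row-major storage, C = A*B with no alpha or beta) can be written down as a plain reference loop. This is only a minimal sketch for checking results: the names `ref_gemm_s8s8s32` and `max_safe_k` are hypothetical, not part of chgemm, and the loop is not the optimized kernel. The second helper illustrates the overflow note under the assumption that accumulation happens in int32; an optimized kernel that uses narrower intermediate accumulators would have tighter limits.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference row-major int8 GEMM (hypothetical helper, not the chgemm kernel):
 * C (M x N, int32) = A (M x K, int8) * B (K x N, int8). */
static void ref_gemm_s8s8s32(size_t M, size_t N, size_t K,
                             const int8_t *A, const int8_t *B, int32_t *C)
{
    for (size_t i = 0; i < M; ++i) {
        for (size_t j = 0; j < N; ++j) {
            int32_t acc = 0;
            for (size_t k = 0; k < K; ++k) {
                /* Widen before multiplying so the product is computed in 32 bits. */
                acc += (int32_t)A[i * K + k] * (int32_t)B[k * N + j];
            }
            C[i * N + j] = acc;
        }
    }
}

/* Largest K for which an int32 accumulator cannot overflow when every
 * input lies in [-127, 127]: each product has magnitude at most 127 * 127. */
static size_t max_safe_k(void)
{
    return (size_t)(INT32_MAX / (127 * 127));
}
```

A loop like this is slow, but it is handy as a ground truth when adjusting the packing or embedding the optimized kernel in another project.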


Compiled with -O3 and run on a single RK3399 core. The current peak is 18.6 GFLOPS; the orange line marks the single-core fp32 limit (14.3 GFLOPS).

In a single-core test on an AWS A72, it reaches about 23 GFLOPS, which is the limit of this implementation approach (100% of the attainable performance).
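
The README does not state how its throughput figures were computed. A common convention, assumed here, counts one multiply and one add per inner-loop step, giving 2*M*N*K operations per GEMM call. The helper name `gemm_gops` is hypothetical and only illustrates that arithmetic.

```c
#include <stddef.h>

/* Throughput in giga-operations per second for an M x N x K GEMM,
 * assuming 2*M*N*K operations (one multiply plus one add per step).
 * `seconds` is the measured wall-clock time of one call. */
static double gemm_gops(size_t M, size_t N, size_t K, double seconds)
{
    return 2.0 * (double)M * (double)N * (double)K / seconds / 1e9;
}
```

For example, a 1000 x 1000 x 1000 multiplication that takes one second corresponds to 2 giga-operations per second under this convention.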

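See MMult_4x8_21.c for how to call the matrix multiplication, and embed the code in your own project. It can be modified to suit the implementation of your inference library.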

chgemm is also available for ncnn; see gemm_symm_int8.h.
