chgemm

chgemm is a symmetric int8 GEMM project that differs slightly from BLAS sgemm:

Input matrices are int8_t with values restricted to the symmetric range [-127, +127], and the output matrix is int32_t. Note: watch out for overflow.
Because the target is deep learning, the packAB interface is exposed and can be adjusted.
BLAS computes C = alpha*A*B + beta*C, but chgemm computes only C = A*B, because alpha and beta serve no purpose in deep learning inference.
All matrices are row-major.
It is faster than other comparable projects.
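The contract above (int8 in, int32 out, row-major, plain C = A*B) can be pinned down with a small plain-C reference, which is handy as a correctness oracle when embedding the assembly kernels. This is a sketch under assumptions: `ref_gemm_s8s32` and `max_safe_k` are hypothetical names, not chgemm's API. `max_safe_k` makes the overflow warning concrete: each product has magnitude at most 127*127 = 16129, so an int32 accumulator is safe for K up to 2147483647 / 16129 = 133144. As for why -128 is excluded, one plausible reason (an assumption about chgemm's kernels, not confirmed here) is that ARM NEON int8 kernels often add pairs of products in int16 before widening: 127*127 + 127*127 = 32258 fits in int16, while (-128)*(-128) + (-128)*(-128) = 32768 does not.

```c
#include <assert.h>
#include <stdint.h>

/* Largest K for which an int32 accumulator cannot overflow when every
 * input lies in [-127, +127]: each product has magnitude <= 127*127 = 16129. */
static long max_safe_k(void)
{
    return (long)INT32_MAX / (127L * 127L);
}

/* Hypothetical reference routine (not chgemm's API):
 * C = A * B, with A: M x K int8, B: K x N int8, C: M x N int32,
 * all row-major, and no alpha/beta scaling. */
static void ref_gemm_s8s32(int M, int N, int K,
                           const int8_t *A, const int8_t *B, int32_t *C)
{
    assert(K <= max_safe_k());  /* guard against int32 overflow */
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            int32_t acc = 0;
            for (int p = 0; p < K; ++p) {
                /* Inputs are expected in [-127, +127]; excluding -128 is
                 * assumed to protect int16 pair accumulation in NEON kernels. */
                acc += (int32_t)A[i * K + p] * (int32_t)B[p * N + j];
            }
            C[i * N + j] = acc;
        }
    }
}
```

For example, with A = [[1, 2, 3], [-4, 5, -6]] and B = [[7, 8], [9, 10], [11, 12]], calling `ref_gemm_s8s32(2, 2, 3, A, B, C)` fills C with [[58, 64], [-49, -54]], and `max_safe_k()` returns 133144.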


Built with -O3 and run on a single RK3399 core, chgemm currently peaks at 18.6 GFLOPS. The orange line in the benchmark plot marks the single-core fp32 limit (14.3 GFLOPS).

On a single AWS A72 core it reaches about 23 GFLOPS, which is the ceiling of this implementation approach (100% of its achievable performance).
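The README does not state how the GFLOPS figures are counted. A common convention, assumed here, is that C = A*B performs one multiply and one add per (i, j, p) triple, i.e. 2*M*N*K operations (for int8 the same count is sometimes reported as GOPS). The sketch below uses the hypothetical helpers `gemm_gflops` and `now_seconds`; the latter relies on POSIX `clock_gettime`.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Assumed convention: C = A*B does one multiply and one add for every
 * (i, j, p) triple, i.e. 2*M*N*K operations in total. */
static double gemm_gflops(long m, long n, long k, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * (double)m * (double)n * (double)k / seconds / 1e9;
}

/* Monotonic wall-clock time in seconds via POSIX clock_gettime. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}
```

Calling `now_seconds()` before and after a GEMM and passing the difference to `gemm_gflops` gives the rate; for instance, a 1000 x 1000 x 1000 product that takes 0.1 s corresponds to 20 GFLOPS.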

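See MMult_4x8_21.c for how to call the matrix multiplication, then embed the code into your own project. You can adapt it to fit your inference library's implementation.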
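To illustrate what "pack A and B, then run a 4x8 micro-kernel" typically means, here is a plain-C sketch of the general packing technique. It is an assumption-laden illustration, not chgemm's code: `pack_a`, `pack_b`, `kernel_4x8` and `packed_gemm` are hypothetical names, the panel layout is a generic one that chgemm's reorder_a.S / reorder_b.S may not match, and M and N are assumed to be multiples of 4 and 8 (real code also needs remainder kernels).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical panel sizes matching the "4x8" naming; layout is illustrative. */
enum { MR = 4, NR = 8 };

/* Pack A (M x K, row-major) into M/MR panels; inside a panel, for each p
 * the MR values A[i0..i0+MR-1][p] are stored contiguously. */
static void pack_a(int M, int K, const int8_t *A, int8_t *Ap)
{
    assert(M % MR == 0);
    for (int i0 = 0; i0 < M; i0 += MR)
        for (int p = 0; p < K; ++p)
            for (int r = 0; r < MR; ++r)
                *Ap++ = A[(i0 + r) * K + p];
}

/* Pack B (K x N, row-major) into N/NR panels; inside a panel, for each p
 * the NR values B[p][j0..j0+NR-1] are stored contiguously. */
static void pack_b(int K, int N, const int8_t *B, int8_t *Bp)
{
    assert(N % NR == 0);
    for (int j0 = 0; j0 < N; j0 += NR)
        for (int p = 0; p < K; ++p)
            for (int c = 0; c < NR; ++c)
                *Bp++ = B[p * N + j0 + c];
}

/* Plain-C stand-in for an assembly micro-kernel: computes one MR x NR tile
 * of C from one A panel and one B panel, both read sequentially. */
static void kernel_4x8(int K, const int8_t *a, const int8_t *b,
                       int32_t *C, int ldc)
{
    int32_t acc[MR][NR] = {{0}};
    for (int p = 0; p < K; ++p, a += MR, b += NR)
        for (int r = 0; r < MR; ++r)
            for (int c = 0; c < NR; ++c)
                acc[r][c] += (int32_t)a[r] * (int32_t)b[c];
    for (int r = 0; r < MR; ++r)
        for (int c = 0; c < NR; ++c)
            C[r * ldc + c] = acc[r][c];
}

/* C = A * B from packed buffers (Ap holds M*K bytes, Bp holds K*N bytes).
 * Panel i0/MR of A starts at offset i0*K; panel j0/NR of B at j0*K. */
static void packed_gemm(int M, int N, int K, const int8_t *Ap,
                        const int8_t *Bp, int32_t *C)
{
    for (int i0 = 0; i0 < M; i0 += MR)
        for (int j0 = 0; j0 < N; j0 += NR)
            kernel_4x8(K, Ap + (long)i0 * K, Bp + (long)j0 * K,
                       C + (long)i0 * N + j0, N);
}
```

Packing once lets the kernel stream both operands contiguously, which is the usual motivation for exposing the pack step: an inference library can pack constant weights ahead of time and reuse them across calls.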

ncnn support is available; see gemm_symm_int8.h.

Repository files:
android/MyApplication
.travis.yml
0.png
MMult_4x16_18.c
MMult_4x16_19.c
MMult_4x16_20.c
MMult_4x8_21.c
MMult_4x8_22.c
README.md
REF_MMult.c
compare_matrices.c
copy_matrix.c
dclock.c
int8kernel_m1.S
int8kernel_m1_requant.S
int8kernel_m2.S
int8kernel_m2_requant.S
int8kernel_m4.S
int8kernel_m4_requant.S
kernel_m4n4k16.S
makefile
parameters.h
plot.py
print_matrix.c
public.h
random_matrix.c
reorder_a.S
reorder_b.S
reorder_b.h
test_MMult.c

tpoisonooo/chgemm