ConvLIB is a library of convolution kernels for multicore processors with ARM (NEON) or RISC-V architecture
Fast SpMM implementation on GPUs for GNN (IPDPS'23)
row-major matmul optimization
My attempt at making a GEMM kernel...
Implementations of the SGEMM algorithm on NVIDIA GPUs, using different tricks to optimize performance.
Case studies for using Accera, the open-source cross-platform compiler from Microsoft Research, to create high-performance deep learning computations (e.g. GEMM, convolution).
Implementations of the DGEMM algorithm, using different tricks to optimize performance.
Manually optimize the GEMM (GEneral Matrix Multiply) operation. There is a long way to go.
Fast matrix multiplication implementation in the C programming language. This matrix multiplication algorithm is similar to what NumPy uses to compute dot products.
My experiments with convolution
This repository targets performance optimization of the OpenCL GEMM function. It compares several libraries (clBLAS, CLBlast, MIOpenGEMM, Intel MKL on CPU, and cuBLAS on CUDA) across different matrix sizes, vendors' hardware, and operating systems. Ready-to-use x86_64 binaries are provided for MSVC, MinGW, and Linux (CentOS).
phiGEMM: CPU-GPU hybrid matrix-matrix multiplication library