
Introduction

NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix. We use it in our paper.

[Official Docs]      [Official Download]      [Official Github]

Installation

This file provides instructions for installing the cusparseLt 0.2.0 library (the version that worked on my device) and setting it up for use in Python.

  1. Install cusparselt from anaconda:
conda install cusparselt -c conda-forge -y

Alternatively, click Official Download, select cusparseLt 0.2.0, and choose your target platform.

  2. Install spmm:
python setup.py install

  3. Check the hardware (cusparseLt is only supported on NVIDIA Ampere or newer GPUs, e.g., A100, H100):
python cspmm/test.py

If the hardware is supported, it should print "HARDWARE PASSED".

Please note that the cusparselt library is updated frequently, but this guide is still valid for now (Last Update: July 3rd, 2023).

Instructions

We calculate the matrix multiplication $$ C = A * B $$ where $A$ is a sparse matrix.
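cusparseLt accelerates structured (N:M) sparsity, e.g., the 2:4 pattern used in the mask examples below, in which every group of four consecutive values keeps at most two nonzeros. As a minimal pure-Python sketch of this pruning rule (an illustration of the pattern only, not the library's pruneMatrix implementation):

```python
def prune_2to4(row):
    """Keep the 2 largest-magnitude values in each group of 4, zero the rest."""
    out = []
    for g in range(0, len(row), 4):
        group = row[g:g + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(len(group)), key=lambda i: -abs(group[i]))[:2]
        out.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return out

row = [0.9, -0.1, 0.4, -2.0, 0.3, 0.2, 0.1, 0.0]
print(prune_2to4(row))  # two nonzeros survive in each group of four
```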

Always import the spmm module after torch.

Always call spmm functions on tensors that are on CUDA and contiguous.

  • initSpmmNum(int num): Allocate memory slots for the subsequent sparse matrix multiplications.
import torch
import spmm
spmm.initSpmmNum(4)
  • checkCusparseLt(): Check if hardware supports cusparselt.
import torch
import spmm
spmm.checkCusparseLt() # it should return 0
  • initSpmmDescriptor(int index, int num_batches, int num_A_rows, int num_A_cols, int lda, int num_B_rows, int num_B_cols, int ldb, int num_C_rows, int num_C_cols, int ldc): Initialize the sparse matrix descriptor. The function locates the memory slot by index; normally lda equals num_A_cols.
import torch
import spmm
spmm.initSpmmDescriptor(0, 128, 64, 64, 64, 64, 64, 64, 64, 64, 64)
  • pruneMatrix(int index, float* original_matrix, float* prunned_matrix): Prune the dense matrix into a sparse matrix automatically.
import torch
import spmm
A = torch.rand(64, 64).cuda().contiguous()
A_prunned = torch.rand(64, 64).cuda().contiguous()
spmm.pruneMatrix(0, A, A_prunned)
  • checkPrunned(int index, float* A_prunned): Check that the pruned matrix satisfies the structured sparsity pattern.
import torch
import spmm
mask = ... # 1:2 or 2:4
A = ...
A_prunned = (A * mask).contiguous()
spmm.checkPrunned(0, A_prunned)
  • compressMatrix(int index, float* A_prunned): Compress the pruned sparse matrix.
import torch
import spmm
...
spmm.compressMatrix(0, A_prunned)
  • spmm(int index, float* dB, float* dC): Perform the sparse matrix multiplication, where dB is the dense operand and dC holds the result.
import torch
import spmm
...
spmm.spmm(0, B, C)
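The lda, ldb, and ldc arguments passed to initSpmmDescriptor are leading dimensions: the stride, in elements, between consecutive rows of a row-major matrix. A tiny pure-Python illustration (independent of the library) of why lda normally equals num_A_cols for a contiguous matrix:

```python
# Row-major storage: element (i, j) of a matrix with leading
# dimension lda lives at flat offset i * lda + j.
def elem(flat, i, j, lda):
    return flat[i * lda + j]

# A contiguous 2 x 3 matrix: lda == number of columns == 3.
flat = [1.0, 2.0, 3.0,
        4.0, 5.0, 6.0]
assert elem(flat, 0, 2, 3) == 3.0
assert elem(flat, 1, 0, 3) == 4.0
```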

Here is pseudocode for using spmm:

import torch
import spmm
spmm.checkCusparseLt()
spmm.initSpmmNum(n) # suppose you have n nn.Linear layers
for i in range(n):
    spmm.initSpmmDescriptor(i, ...)
    mask = cal_mask(...) # suppose you compute the mask with your own function
    A_prunned = (A * mask).cuda().contiguous() # A is the weight of the i-th layer
    spmm.checkPrunned(i, A_prunned)
    spmm.compressMatrix(i, A_prunned)

while "(receive an input)":
    spmm.spmm(...)
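Mathematically, the pipeline above computes C = (A * mask) B, i.e., an ordinary matrix product with the masked weight. A plain-Python check of that equivalence (illustrative only; the library performs the product on the GPU using the compressed representation):

```python
def matmul(A, B):
    """Naive dense matrix product of A (n x k) and B (k x m)."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

A = [[1.0, 2.0, 3.0, 4.0]]
mask = [[0, 1, 0, 1]]  # a 2:4 pattern: two nonzeros per group of four
A_prunned = [[a * m for a, m in zip(ra, rm)] for ra, rm in zip(A, mask)]
B = [[1.0], [1.0], [1.0], [1.0]]
C = matmul(A_prunned, B)  # C = (A * mask) B
```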