[QST/BUG] Should shared memory usage be checked for multistage pipeline? #1525

wzhcz8902 · 2024-05-07T07:52:20Z

cutlass/include/cutlass/gemm/kernel/gemm.h

Lines 153 to 199 in 033d9ef

```cpp
static Status can_implement(
  cutlass::gemm::GemmCoord const & problem_size,
  typename Mma::IteratorA::TensorRef ref_A,
  typename Mma::IteratorB::TensorRef ref_B,
  typename Epilogue::OutputTileIterator::TensorRef ref_C,
  typename Epilogue::OutputTileIterator::TensorRef ref_D) {

  static int const kAlignmentA = (platform::is_same<typename Mma::IteratorA::Layout,
                                                    layout::ColumnMajorInterleaved<32>>::value)
                                 ? 32
                                 : (platform::is_same<typename Mma::IteratorA::Layout,
                                                      layout::ColumnMajorInterleaved<64>>::value)
                                   ? 64
                                   : Mma::IteratorA::AccessType::kElements;
  static int const kAlignmentB =  (platform::is_same<typename Mma::IteratorB::Layout,
                                                     layout::RowMajorInterleaved<32>>::value)
                                 ? 32
                                 : (platform::is_same<typename Mma::IteratorB::Layout,
                                                      layout::RowMajorInterleaved<64>>::value)
                                   ? 64
                                   : Mma::IteratorB::AccessType::kElements;

  static int const kAlignmentC = (platform::is_same<typename Epilogue::OutputTileIterator::Layout,
                                                    layout::ColumnMajorInterleaved<32>>::value)
                                 ? 32
                                 : (platform::is_same<typename Epilogue::OutputTileIterator::Layout,
                                                      layout::ColumnMajorInterleaved<64>>::value)
                                   ? 64
                                   : Epilogue::OutputTileIterator::kElementsPerAccess;

  if (!TensorRef_aligned(ref_A, kAlignmentA)) {
    return Status::kErrorMisalignedOperand;
  }

  if (!TensorRef_aligned(ref_B, kAlignmentB)) {
    return Status::kErrorMisalignedOperand;
  }

  if (!TensorRef_aligned(ref_C, kAlignmentC)) {
    return Status::kErrorMisalignedOperand;
  }

  if (!TensorRef_aligned(ref_D, kAlignmentC)) {
    return Status::kErrorMisalignedOperand;
  }

  return Status::kSuccess;
}
```
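One likely reason for the alignment checks in `can_implement`: the iterators move several elements per access (the `kElements` / `kElementsPerAccess` values used as alignments), and a vectorized global-memory access, such as a 128-bit load, must start at an address that is a multiple of its size. The following is a standalone sketch under that assumption, not CUTLASS's `TensorRef_aligned`; the helper `is_aligned_for_vector_access` is hypothetical. It requires the base pointer to be aligned to the access size in bytes and the leading dimension to be a multiple of the alignment in elements, so every row or column also starts on an aligned address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not CUTLASS's TensorRef_aligned). `alignment` is in
// elements, matching how kAlignmentA/B/C are expressed in can_implement.
template <typename Element>
bool is_aligned_for_vector_access(Element const* ptr, std::int64_t leading_dim, int alignment) {
  // Bytes moved by one vectorized access, e.g. 8 x 16-bit = 16 bytes (128 bits).
  std::uintptr_t const access_bytes =
      static_cast<std::uintptr_t>(alignment) * sizeof(Element);

  // The base address must be a multiple of the access size.
  if (reinterpret_cast<std::uintptr_t>(ptr) % access_bytes != 0) {
    return false;
  }
  // Each row (or column) starts leading_dim elements after the previous one,
  // so the stride must also be a multiple of the alignment.
  return leading_dim % alignment == 0;
}
```

For example, with 16-bit elements and an alignment of 8, each access moves 16 bytes, so a pointer offset by one element (2 bytes) or a leading dimension of 60 would fail the check.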

For a multistage pipeline, shared memory usage is proportional to the number of stages, so there is a maximum number of stages beyond which running the kernel results in errors. I checked the `can_implement` function, and it seems to check only the alignment of the tensor addresses in global memory. Should shared memory usage be checked as well? And why is it important to make sure the global addresses are aligned?
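A minimal sketch of the shared-memory check the question asks about, assuming each stage holds one A tile (M × K) and one B tile (K × N) and ignoring padding, swizzling, and epilogue storage. `TileShape`, `smem_bytes_per_stage`, `smem_bytes`, `max_stages`, and `stages_fit` are hypothetical helpers, not CUTLASS APIs. Because the footprint grows linearly with the stage count, the largest usable stage count is the shared-memory limit divided by the per-stage size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical threadblock tile description (not a CUTLASS type).
struct TileShape {
  int m;  // rows of the A tile (threadblock M)
  int n;  // columns of the B tile (threadblock N)
  int k;  // depth of one mainloop step (threadblock K)
};

// Shared-memory bytes for one pipeline stage: one M x K tile of A plus one
// K x N tile of B. Padding, swizzling, and epilogue storage are ignored.
constexpr std::size_t smem_bytes_per_stage(TileShape t, std::size_t elem_bytes) {
  return (static_cast<std::size_t>(t.m) * t.k + static_cast<std::size_t>(t.k) * t.n) * elem_bytes;
}

// The mainloop footprint grows linearly with the number of stages.
constexpr std::size_t smem_bytes(TileShape t, std::size_t elem_bytes, int stages) {
  return static_cast<std::size_t>(stages) * smem_bytes_per_stage(t, elem_bytes);
}

// Largest stage count whose footprint fits in smem_limit bytes
// (0 if not even one stage fits). Assumes positive tile dimensions.
constexpr int max_stages(TileShape t, std::size_t elem_bytes, std::size_t smem_limit) {
  return static_cast<int>(smem_limit / smem_bytes_per_stage(t, elem_bytes));
}

// A can_implement-style predicate for the chosen stage count.
constexpr bool stages_fit(TileShape t, std::size_t elem_bytes, int stages,
                          std::size_t smem_limit) {
  return smem_bytes(t, elem_bytes, stages) <= smem_limit;
}

// Compile-time sanity check: a 128x128x32 tile of 2-byte elements.
static_assert(smem_bytes_per_stage(TileShape{128, 128, 32}, 2) == 16384,
              "one stage is 16 KiB");
```

For a 128 × 128 × 32 tile of 2-byte elements, one stage needs (128·32 + 32·128)·2 = 16384 bytes, so 3 stages fit in the default 48 KiB per block while 4 do not. On a real device, the opt-in limit can be queried with `cudaDeviceGetAttribute(&limit, cudaDevAttrMaxSharedMemoryPerBlockOptin, device)`, and a kernel that needs more than 48 KiB of dynamic shared memory must raise `cudaFuncAttributeMaxDynamicSharedMemorySize` with `cudaFuncSetAttribute` before launch.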


github-actions · 2024-06-06T08:06:25Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

wzhcz8902 added the `? - Needs Triage` and `question` labels May 7, 2024

wzhcz8902 changed the title from "[QST] Should shared memory usage be checked for multistage pipeline?" to "[QST/BUG] Should shared memory usage be checked for multistage pipeline?" May 7, 2024

github-actions bot added the inactive-30d label Jun 6, 2024
