I know make_tiled_mma creates a tiled MMA, and that partitioning with it along M, N, and K gives the MMA_M, MMA_N, and MMA_K dimensions. So inside cute::gemm, do we loop over MMA_M, MMA_N, and MMA_K one by one?
Yes. You start with an MMA atom, use it to build a tiled MMA, and then use the tiled MMA to partition the compute tile. For example, if you start with the 16-8-16 MMA atom and set the layout of atoms to (3,4,2), that is equivalent to creating a bigger MMA of size 48x32x32 that uses 32x3x4x2 = 768 threads running in parallel. Finally, if the compute tile has size 96-96-64, you get the extra dimensions (MMA_M, MMA_N, MMA_K) = (96/48, 96/32, 64/32) = (2, 3, 2), and those are iterated sequentially.
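The arithmetic in the example above can be sketched in plain C++. This is only an illustration of the shape bookkeeping, not CuTe code: Shape3, tiled_mma_shape, tiled_mma_threads, rest_modes, and count_sequential_steps are hypothetical names made up for this sketch, and the K-outer loop order is an assumption, not a statement about how cute::gemm orders its loops.

```cpp
#include <array>
#include <cassert>
#include <cstdio>

// Hypothetical helper type for this sketch: an (M, N, K) extent triple.
using Shape3 = std::array<int, 3>;

// Extent covered by one pass of the tiled MMA:
// atom shape multiplied by the atom layout, mode by mode.
inline Shape3 tiled_mma_shape(const Shape3& atom, const Shape3& atom_layout) {
    return {atom[0] * atom_layout[0],
            atom[1] * atom_layout[1],
            atom[2] * atom_layout[2]};
}

// Threads used in parallel: threads per atom times the number of atoms.
inline int tiled_mma_threads(int threads_per_atom, const Shape3& atom_layout) {
    return threads_per_atom * atom_layout[0] * atom_layout[1] * atom_layout[2];
}

// Leftover (MMA_M, MMA_N, MMA_K) extents: compute tile divided by the
// tiled-MMA shape. This sketch assumes the division is exact.
inline Shape3 rest_modes(const Shape3& compute_tile, const Shape3& tiled) {
    for (int i = 0; i < 3; ++i) {
        assert(compute_tile[i] % tiled[i] == 0);
    }
    return {compute_tile[0] / tiled[0],
            compute_tile[1] / tiled[1],
            compute_tile[2] / tiled[2]};
}

// Walk the leftover modes one step at a time and count the steps.
// Each step stands for one invocation of the whole tiled MMA.
// The loop nesting (K outermost) is an assumption of this sketch.
inline int count_sequential_steps(const Shape3& rest) {
    int steps = 0;
    for (int k = 0; k < rest[2]; ++k) {
        for (int m = 0; m < rest[0]; ++m) {
            for (int n = 0; n < rest[1]; ++n) {
                ++steps;
            }
        }
    }
    return steps;
}
```

With the numbers from the answer, tiled_mma_shape({16, 8, 16}, {3, 4, 2}) gives (48, 32, 32), tiled_mma_threads(32, {3, 4, 2}) gives 768, and rest_modes({96, 96, 64}, ...) gives (2, 3, 2), so the tiled MMA is issued 2 x 3 x 2 = 12 times in sequence.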