squarePacked GEMM. #586

Open: wants to merge 13 commits into master.

Commits on Dec 13, 2021

  1. sup zgemm improvement

    1. In zgemm, the mkernel outperforms the nkernel both when m > n and when n > m.
    2. Based on this analysis, the mkernel is forced for zgemm irrespective of the mu and nu sizes.
    
    Change-Id: Iafb7ddb2519c17cf2225da84d6cc74ed985cc21e
    AMD-Internal: [CPUPL-1352]
    madanm3 committed Dec 13, 2021
    Commit: 231a464
  2. gemm_sqp(gemm_squarePacked): 3m_sqp and dgemm_sqp

    1. The squarePacked algorithm targets efficient zgemm/dgemm implementations for square matrix sizes (m = k = n).
    2. A variation of the 3m algorithm (3m_sqp) is implemented so that the kernel loads and stores the C matrix only once.
    3. Currently the method supports only m that is a multiple of 8. Residue cases are to be implemented later.
    4. The real dgemm kernel (dgemm_sqp) is implemented without alpha and beta multiplication,
        since real alpha and beta scaling are handled in the 3m_sqp framework.
    5. gemm_sqp supports dgemm when alpha = +/-1.0 and beta = 1.0.
    
    Change-Id: I49becaf6079da4be29be5b06057ff4e50770a7d8
    AMD-Internal: [CPUPL-1352]
    madanm3 committed Dec 13, 2021
    Commit: 0abc674
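    For context, the textbook 3m method forms a complex product from three real matrix products instead of four: with P1 = Ar·Br, P2 = Ai·Bi and P3 = (Ar + Ai)(Br + Bi), the update is Cr += P1 - P2 and Ci += P3 - P1 - P2. The C sketch below illustrates only that identity, assuming planar (split real/imaginary) column-major inputs, alpha = 1 and beta = 1. `dgemm_naive` and `zgemm_3m_sketch` are hypothetical names, not BLIS functions, and this is not the 3m_sqp kernel itself; it merely mirrors the commit's goal in that C is read and written once, in the final loop.

    ```c
    #include <assert.h>
    #include <stdlib.h>

    /* Hypothetical helper: naive column-major real GEMM, C(m x n) += A(m x k) * B(k x n). */
    static void dgemm_naive(int m, int n, int k,
                            const double *A, const double *B, double *C)
    {
        for (int j = 0; j < n; ++j)
            for (int p = 0; p < k; ++p)
                for (int i = 0; i < m; ++i)
                    C[i + j * m] += A[i + p * m] * B[p + j * k];
    }

    /* Hypothetical 3m sketch: (Cr + i*Ci) += (Ar + i*Ai) * (Br + i*Bi)
     * using three real GEMMs. Returns 0 on success, -1 if allocation fails. */
    static int zgemm_3m_sketch(int m, int n, int k,
                               const double *Ar, const double *Ai,
                               const double *Br, const double *Bi,
                               double *Cr, double *Ci)
    {
        assert(m > 0 && n > 0 && k > 0);
        size_t mk = (size_t)m * k, kn = (size_t)k * n, mn = (size_t)m * n;

        double *As = malloc(mk * sizeof *As);   /* Ar + Ai */
        double *Bs = malloc(kn * sizeof *Bs);   /* Br + Bi */
        double *P  = calloc(3 * mn, sizeof *P); /* P1 | P2 | P3, zero-initialized */
        if (!As || !Bs || !P) { free(As); free(Bs); free(P); return -1; }

        for (size_t t = 0; t < mk; ++t) As[t] = Ar[t] + Ai[t];
        for (size_t t = 0; t < kn; ++t) Bs[t] = Br[t] + Bi[t];

        double *P1 = P, *P2 = P + mn, *P3 = P + 2 * mn;
        dgemm_naive(m, n, k, Ar, Br, P1);   /* P1 = Ar * Br */
        dgemm_naive(m, n, k, Ai, Bi, P2);   /* P2 = Ai * Bi */
        dgemm_naive(m, n, k, As, Bs, P3);   /* P3 = (Ar + Ai) * (Br + Bi) */

        /* Single pass over C: each element is loaded and stored once. */
        for (size_t t = 0; t < mn; ++t) {
            Cr[t] += P1[t] - P2[t];
            Ci[t] += P3[t] - P1[t] - P2[t];
        }
        free(As); free(Bs); free(P);
        return 0;
    }
    ```

    For example, (1 + 2i)(3 + 4i) = -5 + 10i, and the three products give P1 = 3, P2 = 8, P3 = 21, so Cr = 3 - 8 and Ci = 21 - 3 - 8.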
  3. sqp commenting

    1. Added comments.
    
    AMD-Internal: [CPUPL-1429]
    Change-Id: Ie37e24e58cd8bf836038a2258ebd09c3912fab9e
    madanm3 committed Dec 13, 2021
    Commit: 5dc5ffa
  4. 3m_sqp vectorization

    1. bli_malloc replaced with plain malloc, with address alignment handled within 3m_sqp.
    2. Function added to pack the real part, imaginary part, and sum of A.
    3. Function added to pack the real part, imaginary part, and sum of B.
    4. Function added to pack the real and imaginary parts of C, with beta handling.
    5. Sum and subtraction routines vectorized.
    
    AMD-Internal: [CPUPL-1352]
    Change-Id: I514e9efb053d529caef2de413d74d0dac2ceca54
    madanm3 committed Dec 13, 2021
    Commit: 87c123f
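    Packing for a 3m scheme splits an interleaved complex matrix into three real planes: the real part, the imaginary part, and their sum (the sum plane feeds the (Ar + Ai) product). A minimal sketch, assuming BLAS-style interleaved column-major storage with leading dimension lda and contiguous m x k output planes; `pack_a_rs` is a hypothetical name, and the actual packing functions in this PR may use a different layout.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical packing routine: split interleaved complex A (m x k, column-major,
     * leading dimension lda in complex elements) into contiguous real, imaginary,
     * and sum planes, each m x k with leading dimension m. */
    static void pack_a_rs(int m, int k, const double *a, int lda,
                          double *ar, double *ai, double *as)
    {
        assert(m > 0 && k > 0 && lda >= m);
        for (int p = 0; p < k; ++p) {
            for (int i = 0; i < m; ++i) {
                /* (re, im) pair of element (i, p) */
                const double *z = a + 2 * ((size_t)i + (size_t)p * lda);
                size_t dst = (size_t)i + (size_t)p * m;
                ar[dst] = z[0];
                ai[dst] = z[1];
                as[dst] = z[0] + z[1];
            }
        }
    }
    ```

    Because the output planes are contiguous, the writes in the inner loop are unit-stride, which is the access pattern a vectorized version of the sum would rely on.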
  5. disabled zgemm induced and gemm sqp temporarily.

    1. mx1 and mx4 kernels added, with framework modifications.
    2. 8mx6n kernel added.
    3. NULL check added for the malloc in dgemm_sqp.
    4. Memory tracing added.
    5. 3m_sqp restricted to a limited set of matrix sizes.
    6. Induced methods temporarily disabled for debugging.
    
    AMD-Internal: [CPUPL-1352]
    Change-Id: I31671859b32bfbb359687fb7c9056f9eb904c8b2
    madanm3 committed Dec 13, 2021
    Commit: 2bb4e87
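    An 8mx6n kernel computes an 8 x 6 block of C from an 8-row panel of A and a 6-column panel of B. The plain C reference below only shows that data flow, assuming A is packed as k columns of 8 contiguous values, B as k rows of 6 contiguous values, and C is column-major with leading dimension ldc. `ukr_8x6_ref` is a hypothetical name; an optimized kernel would typically keep the 48 accumulators in SIMD registers rather than in a local array.

    ```c
    #include <assert.h>
    #include <stddef.h>

    #define MR 8  /* rows of the C block */
    #define NR 6  /* columns of the C block */

    /* Hypothetical reference micro-kernel: C(8 x 6) += A_panel(8 x k) * B_panel(k x 6).
     * a: packed, a[p*MR + i];  b: packed, b[p*NR + j];  c: column-major, ldc >= MR. */
    static void ukr_8x6_ref(int k, const double *a, const double *b,
                            double *c, int ldc)
    {
        assert(k >= 0 && ldc >= MR);
        double acc[MR * NR] = {0};            /* accumulators, column-major 8 x 6 */

        for (int p = 0; p < k; ++p)           /* one rank-1 update per k step */
            for (int j = 0; j < NR; ++j)
                for (int i = 0; i < MR; ++i)
                    acc[i + j * MR] += a[p * MR + i] * b[p * NR + j];

        for (int j = 0; j < NR; ++j)          /* C is touched once, at the end */
            for (int i = 0; i < MR; ++i)
                c[i + (size_t)j * ldc] += acc[i + j * MR];
    }
    ```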
  6. Enabling 3m_sqp and 3m1 methods

    1. Re-enabling 3m methods for zgemm.
    2. Vectorization of the pack_sum routines re-enabled with a bug fix.
    3. 8mx6n kernel added.
    
    AMD-Internal: [CPUPL-1352]
    Change-Id: Id9f010ba763afc52d268c2e68805f069919b8810
    madanm3 committed Dec 13, 2021
    Commit: acfec6a
  7. squarePacked(sqp) framework and multi-instance handling

    1. kx partitions added to the k loop for dgemm and zgemm.
    2. mx-loop-based threading model added for dgemm as a prototype for zgemm.
    3. nx loop added for 3m_sqp and dgemm_sqp.
    4. Single 3m_sqp workspace allocation with a smaller memory footprint.
    5. sqp framework completed for dgemm and zgemm.
    6. sqp kernels moved to a separate kernel file.
    7. Residue kernel core added to handle mx < 8.
    8. Multi-instance tuning done for 3m_sqp.
    9. Users can set the environment variable "BLIS_MULTI_INSTANCE" to 1 for better multi-instance behavior of 3m_sqp.
    
    AMD-Internal: [CPUPL-1521]
    Change-Id: Ibef50a8a37fe99f164edb4621acb44fc0c86514c
    madanm3 committed Dec 13, 2021
    Commit: 74800cf
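    Partitioning the k loop into kx chunks lets each chunk's panels stay cache-resident while partial products accumulate into C; since the chunks only reorder the additions, the result matches the unpartitioned product up to floating-point rounding. A sketch assuming column-major storage and a caller-chosen chunk size kc; `dgemm_kx_sketch` is a hypothetical name and not the PR's code.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical sketch: C(m x n) += A(m x k) * B(k x n), column-major,
     * with the k dimension processed in partitions of at most kc. */
    static void dgemm_kx_sketch(int m, int n, int k, int kc,
                                const double *A, int lda,
                                const double *B, int ldb,
                                double *C, int ldc)
    {
        assert(kc > 0 && lda >= m && ldb >= k && ldc >= m);
        for (int p0 = 0; p0 < k; p0 += kc) {          /* kx partition loop */
            int kb = (k - p0 < kc) ? k - p0 : kc;     /* last partition may be short */
            for (int j = 0; j < n; ++j)
                for (int p = p0; p < p0 + kb; ++p) {
                    double bpj = B[p + (size_t)j * ldb];
                    for (int i = 0; i < m; ++i)
                        C[i + (size_t)j * ldc] += A[i + (size_t)p * lda] * bpj;
                }
        }
    }
    ```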
  8. 3m_sqp conjugate support added

    1. 3m_sqp support added for the A matrix with conjugate_no_transpose and conjugate_transpose.
    
    AMD-Internal: [CPUPL-1521]
    Change-Id: Ie6e5c49cf86f7d3b95d78705cf445e57f20b3d1f
    madanm3 committed Dec 13, 2021
    Commit: 35ad5d8
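    Conjugating A only flips the sign of its imaginary part, so the 3m products keep their structure: the packed planes become Ar, -Ai and Ar - Ai. For conjugate_transpose the source is also read transposed. A minimal sketch, assuming interleaved column-major input; `pack_a_conj` and its `trans` flag are hypothetical and not this PR's interface.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical packing of op(A) = conj(A) (trans == 0) or conj(A)^T (trans != 0)
     * into contiguous m x k planes: real, imaginary, and real + imaginary.
     * a is interleaved complex, column-major; it is m x k if trans == 0,
     * k x m otherwise, with leading dimension lda in complex elements. */
    static void pack_a_conj(int m, int k, const double *a, int lda, int trans,
                            double *ar, double *ai, double *as)
    {
        assert(m > 0 && k > 0 && lda >= (trans ? k : m));
        for (int p = 0; p < k; ++p)
            for (int i = 0; i < m; ++i) {
                size_t src = trans ? (size_t)p + (size_t)i * lda   /* element (p, i) */
                                   : (size_t)i + (size_t)p * lda;  /* element (i, p) */
                double re = a[2 * src];
                double im = -a[2 * src + 1];      /* conjugation negates the imaginary part */
                size_t dst = (size_t)i + (size_t)p * m;
                ar[dst] = re;
                ai[dst] = im;
                as[dst] = re + im;                /* equals Re(a) - Im(a) */
            }
    }
    ```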
  9. Induced method turned off, fix for beta=0 & C = NAN

    1. Induced method turned off until the path is fully tested for different alpha and beta conditions.
    2. Fixed the beta = 0 case when C contains NaN.
    
    Change-Id: I5a7bd1393ac245c2ebb72f9a634728af4c0d4000
    madanm3 committed Dec 13, 2021
    Commit: 93e3d7a
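    Under IEEE 754 arithmetic 0 × NaN is NaN, so scaling C by beta = 0 leaves NaN values in C, whereas the reference BLAS states that C need not be set on input when beta is zero. The usual remedy is to special-case beta = 0 and overwrite C. A sketch for interleaved complex C, assuming column-major storage; `scale_c_by_beta` is a hypothetical name, not the function changed in this commit.

    ```c
    #include <assert.h>
    #include <math.h>    /* NAN and isnan, used when exercising the sketch */
    #include <stddef.h>

    /* Hypothetical sketch: C := beta * C for interleaved complex C (m x n,
     * column-major, leading dimension ldc in complex elements). */
    static void scale_c_by_beta(int m, int n, double beta_r, double beta_i,
                                double *c, int ldc)
    {
        assert(m >= 0 && n >= 0 && ldc >= m);
        for (int j = 0; j < n; ++j)
            for (int i = 0; i < m; ++i) {
                double *z = c + 2 * ((size_t)i + (size_t)j * ldc);
                if (beta_r == 0.0 && beta_i == 0.0) {
                    z[0] = 0.0;   /* overwrite: never multiply a possible NaN by 0 */
                    z[1] = 0.0;
                } else {
                    double re = beta_r * z[0] - beta_i * z[1];
                    double im = beta_r * z[1] + beta_i * z[0];
                    z[0] = re;
                    z[1] = im;
                }
            }
    }
    ```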

Commits on Dec 15, 2021

  1. compile error fixes

    1. New err_t param in bli_malloc_user added.
    2. AOCL_DTL log removed.
    madanm3 committed Dec 15, 2021
    Commit: 7cd7968
  2. Revert "sup zgemm improvement"

    This reverts commit 231a464.
    madanm3 committed Dec 15, 2021
    Commit: 59029ee
  3. code clean and comments added

    madanm3 committed Dec 15, 2021
    Commit: b3e82ba
  4. Commit: 0f984c5