Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache-Oblivious Algorithms (COA) #965

Open
NimaSarajpoor opened this issue Apr 28, 2024 · 2 comments
Open

Cache-Oblivious Algorithms (COA) #965

NimaSarajpoor opened this issue Apr 28, 2024 · 2 comments

Comments

@NimaSarajpoor
Copy link
Collaborator

NimaSarajpoor commented Apr 28, 2024

In short, Cache-Oblivious Algorithm (COA) is an algorithm that takes advantage of cache without knowing the cache size. This paper proposes COA for rectangular matrix transpose, fast Fourier transform (FFT), and sorting on computers with multiple levels of caching.

It would be nice to have a tutorial on the cache, how it works (visually), and the COA.

@NimaSarajpoor
Copy link
Collaborator Author

NimaSarajpoor commented Apr 28, 2024

Example

In #938, we use the COA approach for transposing x.reshape(n, n), where x is a 1D array with length $n^2$ , and $n = 2^{2k}$ (see below)

@njit
def func(x, n, x_transpose):
    """
    Parameters
    -----------
    x : numpy.ndarray
      A numpy 1D array, with length n^2.
   
   n : int
       square-root of len(x). n is a power of 4. 

    x_transpose : numpy.ndarray
       A numpy 1D array with same length as x.  x_transpose.reshape(n,n) will contain the transpose of x.reshape(n,n)
    """
    blocksize = 32
    blocksize = min(blocksize, n)

    x = x.reshape(n, n)
    x_transpose = x_transpose.reshape(n, n)
    for i in range(0, n, blocksize):
        for j in range(0, n, blocksize):
            x_transpose[i:i + blocksize, j:j + blocksize] = np.transpose(x[j:j + blocksize, i:i + blocksize])
            
    return

The above COA-based function for transposing matrix is faster than the following functions (asymptotically speaking):

@njit
def func1(x, n):
    x[:] = np.transpose(x.reshape(n, n)).ravel()
    return


@njit(fastmath=True)
def func2(x, n):
  for k in range(n):
    for p in range(k + 1, n):
      i = k + p * n
      j = p + k * n
      x[i], x[j] = x[j], x[i]

  return

@NimaSarajpoor
Copy link
Collaborator Author

NimaSarajpoor commented Apr 28, 2024

Some resources for stepping into the world of cache
Why do CPUs Need Caches?
Introduction to Cache Memory

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants