
Support of vectorized count_and()? #195

Open
Tenglon opened this issue May 7, 2023 · 2 comments

Comments

@Tenglon

Tenglon commented May 7, 2023

Thanks for the great contribution to the efficient library.

I was wondering if it is possible to do an operation like numpy's bitwise_and(x, Y),

where the function allows a vector x to be combined bitwise with multiple vectors (stored as rows of Y) simultaneously.

What I observe is that numpy scales up very well for this, but it is actually 10x slower than bitarray when Y is only a single vector.

@ilanschnell
Owner

Thanks for your interest in bitarray! Unlike numpy arrays, bitarrays are always 1-dimensional arrays without broadcasting (which is what makes the numpy example you mention work). I assume you have a bitarray x and a bit matrix Y, and for each row of Y you want the count of x & (row of Y). So the output is an N-vector of integers (where N is the number of rows of Y). Right? How do you store the bit matrix Y (N rows and M columns)? As a single bitarray? If so, what you want is something like:

[count_and(x, Y[i*M:(i+1)*M]) for i in range(N)]

If this operation is too slow, you can calculate elements (in the result vector/list) in parallel using threads.

@Tenglon
Author

Tenglon commented May 8, 2023

Hi Ilan,

Thanks for the fast reply,

First, to clarify my expected output: out = x & (row in Y) would be an array of the same size as Y, namely N rows and M columns, where each row is a bit vector and each column is a bit.

I ran some experiments to show the speed difference.

Here are my timings for 1000 repeats with M = 8192, N = 500:

# x, y are numpy M-bit vectors; Y is a numpy bit matrix with N rows of M bits each.
# a, b are numpy M-bit vectors converted to lists and then to bitarray objects.
# B is a list of N bitarrays of M bits each. Bvec is a single bitarray of N*M bits.

numpy.bitwise_and(x, y)

--- numpy bitwise 0.004984617233276367 seconds ---

a&b

--- bitarray bitwise and 0.0003001689910888672 seconds ---

numpy.bitwise_and(x, Y)

--- numpy matrix bitwise and 0.0505518913269043 seconds ---

for i in range(len(B)):
    a & B[i]

--- bitarray matrix by loop bitwise and 0.3004026412963867 seconds ---

[a & Bvec[j * M: (j + 1) * M] for j in range(len(B))]

--- bitarray matrix comprehension and 0.4826831817626953 seconds ---

A turnover from 10x faster to 10x slower, which confused me a lot.

I guess numpy uses SIMD features of the processor, but I did not verify this in the source code, since it is not written in Python and has complex dependencies on C/C++ sources. In any case, the above speed comparison is for your information.
