
Numpy dotnet Performance Issue #56

Open
Sundarrajan06295 opened this issue Sep 26, 2023 · 11 comments

@Sundarrajan06295

I tried to multiply 2 large arrays with the NumpyDotNet library.
Code : np.multiply(Data1, Data2) -> ~300 ms
I clearly see a performance degradation compared to the Python numpy library, which does the same operation in ~150 ms.
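For reference, a minimal benchmark sketch of the kind of measurement described above (the array shape and the np.arange/reshape construction are assumptions for illustration, not the reporter's actual data or a confirmed NumpyDotNet API surface):

    using System;
    using System.Diagnostics;
    using NumpyDotNet;

    class MultiplyBenchmark
    {
        static void Main()
        {
            // Two large arrays of the same shape and the same (double) dtype;
            // sizes and construction helpers are assumed for illustration.
            var Data1 = np.arange(0.0, 3815.0 * 2800.0).reshape(3815, 2800);
            var Data2 = np.arange(0.0, 3815.0 * 2800.0).reshape(3815, 2800);

            var sw = Stopwatch.StartNew();
            var result = np.multiply(Data1, Data2);   // element-wise product
            sw.Stop();

            Console.WriteLine($"np.multiply took {sw.ElapsedMilliseconds} ms");
        }
    }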

@KevinBaselinesw
Collaborator

KevinBaselinesw commented Sep 26, 2023

Are the data types the same? For example, are they both doubles? Same-type data will perform much better.

Could you provide a more complete example? How big are the arrays? What is the data type? What is the shape of the arrays that you are testing?

@Sundarrajan06295
Author

For the same data type the difference is minimal.
When I do a quantile it takes 150 to 200 ms or more.
Code : np.quantile(array1, array2)
When I do a variance of an image-sized array (3815 * 2800) after a transpose it takes 250 ms.
Code : np.var(data.Transpose(), axis: 0)

@KevinBaselinesw
Collaborator

One big difference between C# and C (python numpy is a C library) is that C allows easier/faster casting between data types. In C#, if you try to cast Int32 to UInt32, I think it will throw an exception, but C will allow it. This forces NumpyDotNet to follow a code path that ultimately uses the "dynamic" data type to allow different data types to be used together. It works great, but it is quite a bit slower.

That is why carefully using the same data types allows the library to run much faster: it can follow templated code paths that don't use the dynamic data type. This also applies to constant values. Something like doubleArray + 1 should be written doubleArray + 1.0 to get maximum performance.
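For illustration, a minimal sketch of the same-type and constant-literal point (the variable names are hypothetical, and np.array over a .NET double[] is an assumption about the API):

    using NumpyDotNet;

    // Both operands share the double dtype, so the library can stay on the
    // fast templated code path instead of falling back to 'dynamic'.
    ndarray doubleArray = np.array(new double[] { 1.0, 2.0, 3.0 });

    var fast = doubleArray + 1.0;   // double literal: same-type operation
    var slow = doubleArray + 1;     // int literal: mixed types, slower path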

I am working on another issue that can cause slower performance. I use try/catch around most of the calculations. This allows me to catch calculation errors that throw exceptions (e.g. divide by zero, overflows, etc.) and set a default value instead, which is what python/numpy does. However, C# try/catch does add a significant CPU overhead. If that is in the middle of 1 million calculations, it can add up to a lot of time. I am working on adding a feature to disable/reroute code to not use try/catch. If you are confident your application will not cause an exception (99% of applications probably don't), then it can speed up the calculations by about 20%.

Would you be willing to demo this feature in your code?
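As a rough, self-contained illustration of the try/catch cost described above (plain C#, not the library's actual code; the measured overhead varies by runtime and workload):

    using System;
    using System.Diagnostics;

    class TryCatchOverhead
    {
        static void Main()
        {
            const int N = 1_000_000;
            var a = new double[N];
            var b = new double[N];
            for (int i = 0; i < N; i++) { a[i] = i + 1.0; b[i] = i + 2.0; }
            var r = new double[N];

            // Guarded loop: one try/catch per element, mimicking the pattern of
            // catching a calculation error and substituting a default value.
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < N; i++)
            {
                try { r[i] = a[i] / b[i]; }
                catch (Exception) { r[i] = 0.0; }
            }
            sw.Stop();
            Console.WriteLine($"with try/catch:    {sw.ElapsedMilliseconds} ms");

            // Unguarded loop: the same arithmetic without the exception handler.
            sw.Restart();
            for (int i = 0; i < N; i++)
            {
                r[i] = a[i] / b[i];
            }
            sw.Stop();
            Console.WriteLine($"without try/catch: {sw.ElapsedMilliseconds} ms");
        }
    }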

@KevinBaselinesw
Collaborator

I will look into the np.quantile and np.var performance issues too.

@Sundarrajan06295
Author

Sundarrajan06295 commented Sep 28, 2023

We have the following interesting observations:

  1. Variance along axis 1 always throws an error.
    Data size : (3815 * 2800)
    Code : var variance = np.var(data, axis: 1)
    Exception :
    Unhandled exception. System.Exception: shape mismatch: objects cannot be broadcast to a single shape
    at NumpyLib.numpyinternal.GenerateBroadcastedDims(NpyArray leftArray, NpyArray rightArray)
    at NumpyLib.numpyinternal.NpyArray_NumericOpArraySelection(NpyArray srcArray, NpyArray operandArray, UFuncOperation operationType)
    at NumpyLib.numpyinternal.NpyArray_PerformNumericOperation(UFuncOperation operationType, NpyArray x1Array, NpyArray x2Array, NpyArray outArray, NpyArray whereFilter)
    at NumpyLib.numpyAPI.NpyArray_PerformNumericOperation(UFuncOperation operationType, NpyArray x1Array, NpyArray x2Array, NpyArray outArray, NpyArray whereFilter)
    at NumpyDotNet.NpyCoreApi.PerformNumericOp(ndarray a, UFuncOperation ops, ndarray b, Boolean UseSrcAsDest)
    at NumpyDotNet.ndarray.op_Subtraction(ndarray a, ndarray b)
    at NumpyDotNet.np.var(Object a, Nullable`1 axis, dtype dtype, Int32 ddof, Boolean keep_dims)

  2. To circumvent this, we transpose the array and do the variance calculation instead.
    However, with an array of size (3815 * 2800), we observed that
    np.var(data.Transpose(), axis: 0) takes around 250-300 milliseconds, compared to ~60 ms for np.var(data, axis: 0); a repro sketch follows this comment.
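For reference, a hedged repro sketch of the two calls compared above (the np.arange/reshape construction is an assumption used only to give the array the stated shape):

    using NumpyDotNet;

    // data has shape (3815, 2800); construction is illustrative only.
    var data = np.arange(0.0, 3815.0 * 2800.0).reshape(3815, 2800);

    // 1. Reported to fail with "shape mismatch: objects cannot be broadcast
    //    to a single shape":
    // var v1 = np.var(data, axis: 1);

    // 2. Workaround: transpose first, then reduce along axis 0
    //    (~250-300 ms, versus ~60 ms for np.var(data, axis: 0)).
    var v2 = np.var(data.Transpose(), axis: 0);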

@Sundarrajan06295
Author

When I tried to debug the NumpyDotNet source, I found the error comes from this line:
"(object)(ndarray1.astype(np.Float32) - ndarray2)"

@KevinBaselinesw
Collaborator

I have a bug fix coming for this today.

@Sundarrajan06295
Author

Okay, how can I use it?

@KevinBaselinesw
Collaborator

If you send me your email address at kmckenna at baselinesw.com, I can send you a new DLL that should work for you.

@KevinBaselinesw
Collaborator

I have researched why np.quantile takes much longer than the python version does. The root cause is that np.quantile ultimately calls the np.partition code to do the heavy work. This code is much slower in C# than in the python C code. The reason is that the python code uses a lot of complex C macros to do the work, which effectively inlines all of the processing. C# does not support macros, so I had to turn the macros into functions. These functions are called very frequently, which greatly adds to the processing overhead. I can't think of any way to speed this up while still keeping the code readable and debuggable.
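To illustrate the macro-versus-function point in general terms (this is not the library's actual partition code; the helper below is hypothetical):

    static class PartitionStyleOverhead
    {
        // In numpy's C source this compare-and-swap step is a macro, so it is
        // expanded inline at every use. In C# it becomes a method, and a call
        // made millions of times inside a partition-style loop adds real
        // overhead unless the JIT happens to inline it.
        static void SwapIfGreater(double[] v, int i, int j)
        {
            if (v[i] > v[j])
            {
                double t = v[i]; v[i] = v[j]; v[j] = t;
            }
        }

        public static void SelectionStep(double[] v)
        {
            // Hot inner loop: each iteration pays the helper-call cost that the
            // original C macro avoided.
            for (int i = 0; i < v.Length - 1; i++)
            {
                SwapIfGreater(v, i, i + 1);
            }
        }
    }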

@KevinBaselinesw
Collaborator

I have researched why np.var takes much longer than the python version does. The root cause is that np.var is really composed of at least 7 math operations on the array. If each one takes a little bit longer than the python/C version does, it adds up to a significant difference. I have made a few small tweaks to the code to make it a little faster.
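For context, a conceptual sketch of how a variance reduction decomposes into separate whole-array passes (not the library's actual implementation; np.mean/np.sum with an axis argument and the operator overloads are assumptions):

    using NumpyDotNet;

    // Each step below is its own pass over the data, which is where the
    // per-operation overhead adds up for a (3815, 2800) array.
    double n = 3815.0;                        // rows reduced over (illustrative)
    var mean     = np.mean(data, axis: 0);    // 1) per-column mean
    var dev      = data - mean;               // 2) broadcasted subtraction
    var sq       = dev * dev;                 // 3) element-wise square
    var sumsq    = np.sum(sq, axis: 0);       // 4) per-column sum of squares
    var variance = sumsq / n;                 // 5) divide by N (ddof = 0)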
