Investigate discrepancy between non-BLAS and BLAS versions of linfa-linear
#277
Comments
Flamegraphs from profiling attached. I haven't studied profiling or flamegraphs enough to build a firm foundation yet, so I can't provide any insights. Each profile was run for 1 min.
Did a quick review. The 10-feature GLM with 100_000 samples is spending most of its CPU time at this step. A screenshot of the graph is attached.
Also, if we use the rayon backend we could look into using `par_azip!` instead of `azip!`; it is the parallel version.
I think calling multiplication less often would likely be the bigger performance boost. I can locally test enabling the matrixmultiply feature. I didn't profile the BLAS version. Is that desirable, or just out of curiosity?
It would be useful to see whether multiplication is also the bottleneck for BLAS, since the BLAS version also calls multiplication in two places.
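Independent of backend, "calling multiplication less often" usually means hoisting a loop-invariant product out of the solver's iteration loop. A std-only sketch of the idea (the matrices and function names here are hypothetical, not linfa's actual solver code):

```rust
// Naive matrix multiply over Vec-of-Vec matrices, used only to
// illustrate the hoisting idea without external dependencies.
fn matmul(a: &[Vec<f64>], b: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let (n, k, m) = (a.len(), b.len(), b[0].len());
    let mut c = vec![vec![0.0; m]; n];
    for i in 0..n {
        for p in 0..k {
            let aip = a[i][p];
            for j in 0..m {
                c[i][j] += aip * b[p][j];
            }
        }
    }
    c
}

fn transpose(a: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let (n, m) = (a.len(), a[0].len());
    let mut t = vec![vec![0.0; n]; m];
    for i in 0..n {
        for j in 0..m {
            t[j][i] = a[i][j];
        }
    }
    t
}

fn main() {
    // X is fixed across solver iterations, so X^T X is loop-invariant.
    let x = vec![vec![1.0, 2.0], vec![3.0, 4.0], vec![5.0, 6.0]];
    // Hoisted: compute the product once, before the loop...
    let xtx = matmul(&transpose(&x), &x);
    for _iter in 0..10 {
        // ...and reuse it on every iteration instead of recomputing it.
        let _ = &xtx;
    }
    assert_eq!(xtx[0][0], 35.0); // 1*1 + 3*3 + 5*5
    println!("ok");
}
```

The same refactor helps both backends, since it removes whole calls rather than speeding up any one call.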
Okay, I'll run 2 more: one with BLAS and one with the matrixmultiply feature.
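For reference, the two runs differ only in Cargo feature flags, roughly like the sketch below. The exact feature names are assumptions and depend on the linfa and ndarray versions being tested:

```toml
# Hypothetical Cargo.toml fragments for the two profiling runs.
# Run 1: BLAS-backed linear algebra (feature name assumed).
[dependencies]
linfa-linear = { version = "*", features = ["blas"] }

# Run 2: pure-Rust backend with threaded matrixmultiply
# ("matrixmultiply-threading" is an ndarray feature).
# ndarray = { version = "*", features = ["matrixmultiply-threading"] }
```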
According to the benchmark results from this comment, `linfa-linear` is faster with BLAS than without. The OLS algorithm isn't much different with 5 features and is only slightly slower with 10 features, but GLM is significantly slower without BLAS. We should profile and investigate the difference between the BLAS and non-BLAS performance.