Pyodide's numpy is built without BLAS, so I tried injecting the tf.dot method into Pyodide. But the code runs several times slower.
I tried np.dot, np.matmul, np.einsum and tf.dot (in JS) on 1024x1024 float32 matrices: code and test page
float32 1024x1024: dot: 1.50s, matmul: 1.69s, einsum: 0.51s, tfdot in Pyodide: 0.05s. So, without BLAS, einsum performs better than dot and matmul, but tf.dot is far faster.
Thanks for looking into this! It makes sense that tfjs dot product is faster than the one we have since, according to their readme, it's accelerated by WebGL.
As far as I know, parts of BLAS Level 1 and 2 including dot product are memory bound, so it's not very surprising that making extra copies / conversions significantly impacts performance.
Though it looks like tensor creation by itself is quite slow in TF.js (https://groups.google.com/a/tensorflow.org/g/tfjs/c/CVYBwBRdUZg), so that might be the issue.
To make sure that's the bottleneck, it would help to have a table with timings for:
- dot product with numpy
- convert numpy array to JS array
- create JS tensor from JS array
- dot product with TF.js

(unless I missed this information in your post).
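A per-stage table like the one requested above could be collected with a small harness along these lines. This is only a sketch: `time_stages`, `format_table` and `numpy_stages` are hypothetical helper names, not part of Pyodide or numpy. Only the numpy stage is filled in; the conversion and TF.js stages would need to be supplied as extra callables from inside Pyodide, since they depend on how the page loads TF.js.

```python
import time
from typing import Any, Callable, Iterable

# A stage is a (label, function) pair; each function receives the
# previous stage's return value and returns the input for the next one.
Stage = tuple[str, Callable[[Any], Any]]


def time_stages(stages: Iterable[Stage], initial: Any = None,
                repeat: int = 3) -> dict[str, float]:
    """Run the stages in order and return the best time (seconds) per stage."""
    stages = list(stages)
    best: dict[str, float] = {}
    for _ in range(repeat):
        value = initial
        for name, fn in stages:
            start = time.perf_counter()
            value = fn(value)
            elapsed = time.perf_counter() - start
            # Keep the minimum over all repeats to reduce noise.
            best[name] = min(elapsed, best.get(name, float("inf")))
    return best


def format_table(timings: dict[str, float]) -> str:
    """Render the timings as one aligned 'label  seconds' line per stage."""
    width = max(len(name) for name in timings)
    return "\n".join(f"{name:<{width}}  {secs:.4f} s"
                     for name, secs in timings.items())


def numpy_stages(n: int = 1024) -> list[Stage]:
    """Hypothetical example stages covering only the numpy side."""
    import numpy as np  # lazy import: the harness itself is stdlib-only

    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    # These stages ignore their input because each one is independent.
    return [
        ("np.dot", lambda _: np.dot(a, b)),
        ("np.matmul", lambda _: np.matmul(a, b)),
        ("np.einsum", lambda _: np.einsum("ij,jk->ik", a, b)),
    ]
```

Calling `print(format_table(time_stages(numpy_stages())))` would give the numpy rows. Chaining the conversion, tensor creation and TF.js product as further stages would show which step dominates.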
Also, on a related subject, getting a better BLAS would help (#227).
Here is a MobileNet model: code and test page
I tried 2 methods:
method 1: call tf's dot from Pyodide
method 2: write a JS function for Pyodide
Just download the code and enable (uncomment) this line:
np.matmul = tfdot
The cost goes up from 0.7 s to 8 s (and method 2 takes about 1 minute).
What is wrong?
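One way to narrow this down might be to wrap the replacement before patching it in, so that the number of calls and the total time spent inside it are recorded. The sketch below is an assumption about how to instrument it, not something taken from the linked code. `timed` is a hypothetical helper, and `tfdot` stands for the function from the snippet above.

```python
import functools
import time


def timed(fn):
    """Wrap fn so its call count and cumulative wall-clock time are recorded."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Updated even if fn raises, so failed calls are counted too.
            wrapper.calls += 1
            wrapper.total += time.perf_counter() - start

    wrapper.calls = 0
    wrapper.total = 0.0
    return wrapper


# Intended use inside Pyodide (assumes np and tfdot are already defined):
#   np.matmul = timed(tfdot)
#   ... run the model ...
#   print(np.matmul.calls, np.matmul.total)
```

If `total` accounts for most of the 8 s across many small calls, per-call conversion overhead would be a likely suspect. If it does not, the time is going somewhere other than the patched function.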
@hoodmane