transform data between numpy and tfjs.tensor cost much time #2232

Open
yxdragon opened this issue Mar 1, 2022 · 1 comment
yxdragon commented Mar 1, 2022

Pyodide's NumPy is built without BLAS, so I tried injecting tfjs's tf.dot into Pyodide. But the resulting code runs several times slower.

  1. I tried np.dot, np.matmul, np.einsum, and tf.dot (in JS) on a 1024x1024 float32 matrix: code and test page
    float32 1024x1024: dot: 1.50 s, matmul: 1.69 s, einsum: 0.51 s, tf.dot in Pyodide: 0.05 s
    So without BLAS, einsum performs better than dot and matmul, but tf.dot is far faster.
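The pure-NumPy side of the comparison above can be reproduced with a short sketch (timings depend heavily on whether your NumPy build links BLAS, so the exact numbers will differ; `optimize=False` forces einsum's own C loop, which approximates what a BLAS-less build runs):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((1024, 1024), dtype=np.float32)
b = rng.random((1024, 1024), dtype=np.float32)

def bench(label, fn):
    # Single-shot wall-clock timing; enough to see order-of-magnitude gaps.
    t0 = time.perf_counter()
    out = fn()
    print(f"{label}: {time.perf_counter() - t0:.3f} s")
    return out

r_dot = bench("dot", lambda: np.dot(a, b))
r_matmul = bench("matmul", lambda: np.matmul(a, b))
# einsum's non-optimized path never calls BLAS, so it is a fair stand-in
# for the Pyodide situation described above.
r_einsum = bench("einsum", lambda: np.einsum("ij,jk->ik", a, b, optimize=False))
```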

Here is a MobileNet model: code and test page.
I tried two methods:

Method 1: call tfjs's dot from Pyodide

# Assumes tf (the tfjs module), to_js (from pyodide.ffi) and numpy as np
# are all in scope.
def dot(a, b):
    # Copy each numpy array into a JS typed array, then wrap it as a tensor.
    ja = tf.tensor(to_js(a.ravel()), to_js(a.shape), str(a.dtype))
    jb = tf.tensor(to_js(b.ravel()), to_js(b.shape), str(b.dtype))
    jr = tf.dot(ja, jb)
    # dataSync() copies the result out of tfjs; to_py() copies it into Python.
    r = np.asarray(jr.reshape(to_js([-1])).dataSync().to_py())
    return r.reshape(jr.shape.to_py())
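Each call above implies several full-buffer copies (ravel on non-contiguous input, to_js, dataSync, to_py). The JS-boundary copies can't be shown outside Pyodide, but the ravel one can — a minimal pure-NumPy sketch:

```python
import numpy as np

a = np.arange(12, dtype=np.float32).reshape(3, 4)

# C-contiguous input: ravel() returns a view, no copy is made.
flat = a.ravel()
print(np.shares_memory(a, flat))    # True

# Non-contiguous input (e.g. a transpose): ravel() must copy.
flat_t = a.T.ravel()
print(np.shares_memory(a, flat_t))  # False
```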

Method 2: write a JS function for Pyodide

// a and b are Pyodide PyProxy wrappers around numpy arrays.
tfdot = function (a, b) {
    const ba = a.getBuffer(), bb = b.getBuffer();
    const ta = tf.tensor(ba.data, ba.shape, ba.dtype);
    const tb = tf.tensor(bb.data, bb.shape, bb.dtype);
    const tab = tf.dot(ta, tb);
    const result = [tab.dataSync(), tab.dtype, tab.shape];
    ba.release(); bb.release();                 // free the buffer views
    ta.dispose(); tb.dispose(); tab.dispose();  // free the tensors
    return result;
}

Just download the code and enable this line:
np.matmul = tfdot
The runtime goes from about 0.7 s to 8 s (and method 2 takes about a minute).

Where am I going wrong?

@hoodmane


rth commented Mar 5, 2022

Thanks for looking into this! It makes sense that the tfjs dot product is faster than the one we have, since, according to their README, it's accelerated by WebGL.

As far as I know, BLAS Level 1 and parts of Level 2, including the dot product, are memory-bound, so it's not surprising that making extra copies/conversions significantly impacts performance.
That said, tensor creation by itself appears to be quite slow in TF.js (https://groups.google.com/a/tensorflow.org/g/tfjs/c/CVYBwBRdUZg), so that might be the issue.

To confirm that's the bottleneck, it would help to have a table with timings for:

  • dot product with numpy
  • convert numpy array to JS array
  • create JS tensor from JS array
  • dot product with TF.js

(unless I missed this information in your post).
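A sketch of such a per-stage harness — the NumPy stage runs anywhere, while the JS-conversion stages (guarded below, and an assumption about the original post's Pyodide setup with tfjs available as `js.tf`) only exist inside Pyodide:

```python
import time
import numpy as np

def timed(label, fn):
    # Time one call and report it in milliseconds.
    t0 = time.perf_counter()
    out = fn()
    print(f"{label}: {(time.perf_counter() - t0) * 1000:.1f} ms")
    return out

a = np.random.rand(1024, 1024).astype(np.float32)
b = np.random.rand(1024, 1024).astype(np.float32)

# Stage 1: dot product with numpy.
r_np = timed("numpy dot", lambda: a @ b)

try:
    # Stages 2-4 assume the Pyodide environment of the original post;
    # outside Pyodide the import fails and they are skipped.
    from pyodide.ffi import to_js
    import js

    ja = timed("numpy -> JS array", lambda: to_js(a.ravel()))
    ta = timed("JS array -> tensor",
               lambda: js.tf.tensor(ja, to_js(a.shape), "float32"))
    tb = js.tf.tensor(to_js(b.ravel()), to_js(b.shape), "float32")
    timed("tf.dot", lambda: js.tf.dot(ta, tb).dataSync())
except ImportError:
    pass
```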

Also, on a related subject, getting a better BLAS would help: #227
