
[Feature Request] GPU support for Pyodide. #1911

Open
dynamicwebpaige opened this issue Oct 26, 2021 · 15 comments

@dynamicwebpaige commented Oct 26, 2021

As an example: TensorFlow.js supports a variety of platforms and backends, including CPU, WebGL, and WASM. It supports both training (fine-tuning existing models) and inference in browser-based contexts. Would it be possible to add in-browser GPU support for Pyodide via WebGPU?

@hoodmane (Member)

WebGPU hasn't shipped yet in any browser, if I understand correctly. We wouldn't want to do this until it has shipped in Chrome, and probably not until it's available in both Chrome and Firefox.

@rth (Member) commented Oct 26, 2021

What packages did you have in mind? None of the core scientific Python packages work with GPUs (related: #511), and for something like TensorFlow, PyTorch, or ONNX, if there are builds with a JS API targeting WebAssembly, it could make sense to use those directly, if need be from Python via the Python->JS type translation API in Pyodide.
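For instance, a rough, untested sketch of what that could look like, assuming onnxruntime-web is loaded on the page as the global ort and the model has a single float32 input named "input" (all names here are illustrative):

# Drive onnxruntime-web from Python via Pyodide's JS interop, instead of
# compiling ONNX Runtime itself. Assumes `ort` is a page-level global.
import js
import numpy as np
from js import ort
from pyodide.ffi import to_js

async def run_onnx(model_url, x):
    # x: a float32 numpy array matching the model's input shape
    session = await ort.InferenceSession.create(model_url)
    tensor = ort.Tensor.new("float32", to_js(x.ravel()), to_js(x.shape))
    feeds = to_js({"input": tensor}, dict_converter=js.Object.fromEntries)
    return await session.run(feeds)   # maps output names to ort.Tensors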

So far we have avoided building TensorFlow (#50) and PyTorch (#1625) in Pyodide: judging by their conda-forge recipes, they are quite complex to build. Also, the added value is not very clear to me if there are already ways to run them in WASM (at the very least via ONNX), possibly with GPU support.

@ryanking13 (Member) commented Oct 27, 2021

Personally, I think there would be huge added value if Pyodide supported GPUs.

I once tried converting my TensorFlow models to TensorFlow.js and running them in the browser. The conversion process was not very straightforward; I struggled for a couple of days to make it run properly. I haven't tried it, but a PyTorch --> ONNX --> TensorFlow --> TensorFlow.js pipeline will definitely be more painful.

I think one of the main use cases of Pyodide is in the education area, for beginners who want to try Python and data science in the browser. In that case, supporting PyTorch or TensorFlow without any conversion process would help beginners a lot, since they are not familiar with ONNX or TensorFlow.js.

The main problem is, as @rth mentioned, that we cannot manage such big libraries as PyTorch or TensorFlow without help from Google or Facebook.

@RandomFractals commented Nov 8, 2021

Some info on WebCodecs, which I'd like to see for creative sound/video TensorFlow projects, and on WebGPU support, is here:

microsoft/vscode#118275 (comment)

That should be possible after this beta in Chromium: https://blog.chromium.org/2021/08/chrome-94-beta-webcodecs-webgpu.html. Per those notes, they are looking to ship some version of WebGPU in Chrome 99.

We are still waiting on the VS Code dev team to make a decision about the audio codec support that many ML engineers are asking for now.

@TheRook commented Nov 11, 2021

Development is a moving target; it sounds like now is the absolute best possible time to start building this feature. WebGPU ships in Chrome version 102, which lands in May, less than 6 months away. It can be enabled for dev purposes before then; here is how to use it:
https://web.dev/gpu-compute/
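For anyone experimenting before it ships, a minimal, untested sketch of feature-detecting WebGPU from inside Pyodide, assuming Pyodide runs on a browser main thread where navigator is available:

# Detect WebGPU from Python: navigator.gpu only exists where WebGPU is
# enabled (e.g. behind a flag in current Chrome).
from js import navigator

async def webgpu_available():
    gpu = getattr(navigator, "gpu", None)
    if gpu is None:
        return False
    adapter = await gpu.requestAdapter()   # None if no usable GPU
    return adapter is not None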

@rth (Member) commented Nov 12, 2021

Using a GPU is certainly worth investigating. Note that in addition to browser-compatibility constraints, it would likely only be useful to the fraction of users who browse on a machine with a powerful enough GPU. For instance, benchmarks done for TensorFlow.js show an improvement of 1x to 3-4x for WebGL over WASM on a 2018 MacBook Pro (which should have a dedicated Radeon Pro 555X or 560X GPU). On a laptop with an integrated GPU, I doubt one would see much improvement.

Other somewhat related things in the backlog would improve performance for all users, both for scientific computing and DL on CPU. Those would likely make using, for instance, https://github.com/ArtificialIntelligenceToolkit/aitk.keras for teaching significantly faster.

@TheRook commented Nov 12, 2021 via email

@rth (Member) commented Nov 12, 2021

This change will have a dramatic impact on 100% of all users.

As mentioned previously, help is definitely welcome in this direction. Note however that neither CPython nor the core scientific Python ecosystem (numpy, pandas, scipy, scikit-learn), which currently accounts for most of our usage, supports GPUs. I'm aware of the RAPIDS AI project, but that requires CUDA/NVIDIA GPUs as far as I can tell. So until TensorFlow or PyTorch are packaged (#50, #1625), having WebGPU will not be very useful. Packaging those would be a first step, and it's a significant amount of work -- we are very much looking for contributors to work on it :)

Also, I would love to see more benchmarks for PyTorch or TensorFlow training (or even inference) on modern integrated Intel or AMD graphics vs. CPU. For instance, if I read this TensorFlow blog post correctly (first figure), on a MacBook Pro with an i7, additionally using the Intel Iris Plus Graphics 645 is faster, but the difference is not major either. With an M1 CPU the situation is entirely different, though.

@ryanking13 (Member) commented Nov 16, 2021

So until TensorFlow or PyTorch are packaged (#50, #1625), having WebGPU will not be very useful. Packaging those would be a first step, and it's a significant amount of work -- we are very much looking for contributors to work on it :)

Maybe we can start by building a CPU-only version of TF or PyTorch first (actually, most beginners using those libraries don't need a GPU at all). I've never tried building TF or PyTorch from source, so I have no idea which one will be easier. We probably need to build lots of dependencies before building those two.

@rth (Member) commented Nov 16, 2021

Maybe we can start by building a CPU-only version of TF or PyTorch first

Yes. Generally, a good start for any complex package is to look at how it is done on conda-forge: PyTorch, TensorFlow (see the corresponding meta.yaml). Based on those, my guess is that PyTorch would be easier to package, but it's hard to be sure until someone tries. If you do, please comment in the corresponding issue.

@yxdragon
I think we just need a pure inference framework. Training needs a lot of interaction, so you would have to switch between Pyodide and JS many times; for that, why not use tfjs directly?

Maybe onnxruntime is easier to compile for the web than TF and Torch? (I tried onnx.js; its compatibility is not very good: it is missing some operators and does not support int64, so many models such as YOLOv4/v5 could not work.)

We need a DL framework in Pyodide because:

  1. Most DL programmers are not familiar with JS. If they just want to publish a pre-trained net and make a simple web app, Pyodide is a good choice.
  2. We can do a lot of post-processing with numpy and scipy.ndimage, and render the results with PIL or matplotlib (see the toy example below).
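As a toy example of point 2 (the arrays here are random stand-ins for a real network's output, so the snippet is only illustrative):

# numpy/scipy post-processing on a (fake) segmentation output, rendered
# with matplotlib; all three packages already ship with Pyodide.
import numpy as np
from scipy import ndimage
import matplotlib.pyplot as plt

probs = np.random.rand(256, 256)             # stand-in for a score map
mask = ndimage.binary_opening(probs > 0.5)   # threshold, drop small specks
labels, count = ndimage.label(mask)          # connected-component labelling
plt.imshow(labels, cmap="tab20")
plt.title(f"{count} regions")
plt.show()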

@yxdragon
By the way, another question about my DL framework, planer. It can translate an ONNX model into a JSON graph plus npy weights, then load and run it in pure numpy (it now supports many models such as ResNet, UNet, YOLOv4/v5, and Vision Transformer). Most models are about 2 times slower than torch (in native numpy with BLAS); not very fast, but acceptable.
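Roughly, the idea is like this sketch (the graph format here is invented purely for illustration; planer's real format differs):

# Toy illustration of "JSON graph + npy weights, executed in pure numpy".
import json
import numpy as np

graph = json.loads('{"layers": [{"op": "dense", "w": "w0", "b": "b0"}, {"op": "relu"}]}')
weights = {"w0": np.random.rand(4, 3).astype("float32"),   # would come from .npy files
           "b0": np.zeros(3, dtype="float32")}

def run(x):
    # walk the graph in order, dispatching on the op name
    for layer in graph["layers"]:
        if layer["op"] == "dense":
            x = x @ weights[layer["w"]] + weights[layer["b"]]
        elif layer["op"] == "relu":
            x = np.maximum(x, 0)
    return x

print(run(np.ones((2, 4), dtype="float32")).shape)   # (2, 3)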

I am really looking forward to numpy with BLAS in Pyodide! For now, I am trying to inject tfjs's dot into Pyodide and replace np.dot with tf.dot.

I tested the dot product of two 1024x1024 float32 matrices: Pyodide's numpy costs 2.5s, while tfjs costs 80ms. But when I inject it, the program runs more slowly. After some testing, I found that the numpy -> TypedArray -> tf.tensor transfer is very slow.

I have tried 2 methods:
Method 1: call tfjs's dot from Pyodide

import numpy as np
from js import tf              # tfjs, loaded on the page
from pyodide.ffi import to_js

def dot(a, b):
    # copy each numpy array into a JS TypedArray and wrap it as a tf.tensor
    ja = tf.tensor(to_js(a.ravel()), to_js(a.shape), str(a.dtype))
    jb = tf.tensor(to_js(b.ravel()), to_js(b.shape), str(b.dtype))
    jr = tf.dot(ja, jb)
    # pull the result back synchronously and rebuild a numpy array
    r = np.asarray(jr.reshape(to_js([-1])).dataSync().to_py())
    return r.reshape(jr.shape.to_py())

Method 2: write a JS function for Pyodide

tfdot = function(a, b){
    // view the numpy data from JS through the PyProxy buffer API
    // (the PyBuffer views should eventually be released with .release())
    a = a.getBuffer()
    b = b.getBuffer()
    ta = tf.tensor(a.data, a.shape, a.dtype)
    tb = tf.tensor(b.data, b.shape, b.dtype)
    tab = tf.dot(ta, tb)
    return [tab.dataSync(), tab.dtype, tab.shape]
}

The net costs 0.6s with numpy, but 7s with method 1 and 50s with method 2.
So what is the fastest way to transfer data between numpy and tf.tensor?
Maybe Pyodide does many checks on the proxy, which costs a lot of time. Is there some way to swap via the virtual memory file system: np.save, then read it from JS?

Or any other advice for a faster dot?
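For the FS idea, a minimal sketch of the Python half (the path is illustrative; the JS side would read the bytes back with pyodide.FS.readFile and still has to skip the .npy header to reach the raw float32 data, so this may not beat a direct TypedArray copy):

# Write into Emscripten's in-memory filesystem (MEMFS) from Python;
# JS can then read the same bytes via pyodide.FS.readFile("/tmp/a.npy").
import numpy as np

a = np.random.rand(1024, 1024).astype("float32")
np.save("/tmp/a.npy", a)   # now visible to JS through pyodide.FS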

@hoodmane (Member) commented Feb 27, 2022

But when I inject it, the program runs more slowly. After some testing, I found that the numpy -> TypedArray -> tf.tensor transfer is very slow.

I have tried 2 methods: ...

I don't understand why method 2 is so slow; I don't believe it can have anything to do with the proxy checks. If you could make a complete example of this, it might be interesting to look into.

But to be honest, I don't know very well how to profile our code, so I'm really not sure how to figure out where the time is being spent. Maybe running in Node and using a Node profiler would be easiest.

Is it spending the time in tf.dot? My first thought is that perhaps tf.dot needs to fall back to some slow path if the two arguments are subarrays of the same backing ArrayBuffer. This hypothesis would be very easy to test, but it doesn't really make that much sense, considering that taking the dot product doesn't mutate its arguments.
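Something along these lines should be enough to check it (an untested sketch, assuming tfjs is loaded on the page as the global tf; note the first tf.dot also includes tfjs warm-up, so each case should be run twice):

# Time tf.dot on two views of one ArrayBuffer vs. two independent
# TypedArrays, to test the shared-backing-buffer hypothesis.
import time
from js import tf, ArrayBuffer, Float32Array
from pyodide.ffi import to_js

n = 1024
shape = to_js([n, n])

shared = ArrayBuffer.new(2 * n * n * 4)
a1 = Float32Array.new(shared, 0, n * n)          # first half of the buffer
b1 = Float32Array.new(shared, n * n * 4, n * n)  # second half, same buffer
a2 = Float32Array.new(n * n)                     # independent buffers
b2 = Float32Array.new(n * n)

def bench(label, a, b):
    t0 = time.time()
    tf.dot(tf.tensor(a, shape), tf.tensor(b, shape)).dataSync()
    print(label, time.time() - t0)

bench("shared buffer:   ", a1, b1)
bench("separate buffers:", a2, b2)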

@hoodmane (Member)

@yxdragon I would like to know what the performance looks like for the following two versions (with extra copies inserted).

tfdot = function(a, b){
    a = a.getBuffer()
    b = b.getBuffer()
    // .slice() copies the TypedArray, so `ta` no longer aliases the WASM heap
    ta = tf.tensor(a.data.slice(), a.shape, a.dtype)
    tb = tf.tensor(b.data, b.shape, b.dtype)
    tab = tf.dot(ta, tb)
    return [tab.dataSync(), tab.dtype, tab.shape]
}

and

tfdot = function(a, b){
    a = a.getBuffer()
    b = b.getBuffer()
    // copy both inputs, so neither tensor aliases the shared WASM heap
    ta = tf.tensor(a.data.slice(), a.shape, a.dtype)
    tb = tf.tensor(b.data.slice(), b.shape, b.dtype)
    tab = tf.dot(ta, tb)
    return [tab.dataSync(), tab.dtype, tab.shape]
}

@yxdragon commented Mar 1, 2022

@hoodmane OK, I will start a new thread about this question.
