
[Feature Request] GPU support for Pyodide. #1911

Open
dynamicwebpaige opened this issue Oct 26, 2021 · 15 comments

@dynamicwebpaige commented Oct 26, 2021

As an example: TensorFlow.js supports a variety of platforms and backends, including CPU, WebGL, and WASM. It supports both training (fine-tuning existing models) and inference in browser-based contexts. Would it be possible to add in-browser GPU support for Pyodide via WebGPU?

@hoodmane (Member)

WebGPU hasn't shipped yet in any browser, if I understand correctly. We wouldn't want to do this until it has shipped in Chrome, and probably not until it's available in both Chrome and Firefox.

@rth (Member) commented Oct 26, 2021

What packages did you have in mind? None of the core scientific Python packages work with GPUs (related: #511), and for something like TensorFlow, PyTorch, or ONNX, if there are builds with a JS API targeting WebAssembly, it could make sense to use those directly, if need be from Python via the Python->JS type translation API in Pyodide.
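For instance, a rough, untested sketch of what that could look like, assuming onnxruntime-web is loaded on the page as the global ort and the model has a single float32 input named "input" (all names here are illustrative):

# Drive onnxruntime-web from Python via Pyodide's JS interop, instead of
# compiling ONNX Runtime itself. Assumes `ort` is a page-level global.
import js
import numpy as np
from js import ort
from pyodide.ffi import to_js

async def run_onnx(model_url, x):
    # x: a float32 numpy array matching the model's input shape
    session = await ort.InferenceSession.create(model_url)
    tensor = ort.Tensor.new("float32", to_js(x.ravel()), to_js(x.shape))
    feeds = to_js({"input": tensor}, dict_converter=js.Object.fromEntries)
    return await session.run(feeds)   # maps output names to ort.Tensors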

So far we have avoided building TensorFlow (#50) and PyTorch (#1625) in Pyodide: judging by their conda-forge recipes, they are quite complex to build. Also, the added value is not very clear to me if there are already ways to run them in WASM (at the very least via ONNX), possibly with GPU support.

@ryanking13 (Member) commented Oct 27, 2021

Personally, I think there would be huge added value if Pyodide supported GPUs.

I once tried converting my TensorFlow models to TensorFlow.js and running them in the browser. The conversion process was not very straightforward; I struggled for a couple of days to make it run properly. I haven't tried it, but a PyTorch --> ONNX --> TensorFlow --> TensorFlow.js pipeline will definitely be more painful.

I think one of the main use cases of Pyodide is in the education area, for beginners who want to try Python and data science in the browser. In that case, supporting PyTorch or TensorFlow without any conversion process would help beginners a lot, since they are not familiar with ONNX or TensorFlow.js.

The main problem is, as @rth mentioned, that we cannot manage such big libraries as PyTorch or TensorFlow without help from Google or Facebook.

@RandomFractals commented Nov 8, 2021

Some info on WebCodecs, which I'd like to see for creative sound/video TensorFlow projects, and on WebGPU support, is here:

microsoft/vscode#118275 (comment)

That should be possible after this beta in Chromium: https://blog.chromium.org/2021/08/chrome-94-beta-webcodecs-webgpu.html. Per those notes, they are looking to ship some version of WebGPU in Chrome 99.

We are still waiting on the VS Code dev team to make a decision about the audio codec support that many ML engineers are asking for now.

@TheRook commented Nov 11, 2021

Development is a moving target; it sounds like now is the absolute best possible time to start building this feature. WebGPU ships in Chrome version 102, which lands in May, less than 6 months away. It can be enabled for dev purposes before then; here is how to use it:
https://web.dev/gpu-compute/
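For anyone experimenting before it ships, a minimal, untested sketch of feature-detecting WebGPU from inside Pyodide, assuming Pyodide runs on a browser main thread where navigator is available:

# Detect WebGPU from Python: navigator.gpu only exists where WebGPU is
# enabled (e.g. behind a flag in current Chrome).
from js import navigator

async def webgpu_available():
    gpu = getattr(navigator, "gpu", None)
    if gpu is None:
        return False
    adapter = await gpu.requestAdapter()   # None if no usable GPU
    return adapter is not None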

@rth (Member) commented Nov 12, 2021

Using a GPU is certainly worth investigating. Note that in addition to browser-compatibility constraints, it would likely only be useful to the fraction of users who browse on a machine with a powerful enough GPU. For instance, benchmarks done for TensorFlow.js show an improvement of 1x to 3-4x for WebGL over WASM on a 2018 MacBook Pro (which should have a dedicated Radeon Pro 555X or 560X GPU). On a laptop with an integrated GPU, I doubt one would see much improvement.

Other somewhat related things in the backlog would improve performance for all users, both for scientific computing and DL on CPU. Those would likely make using, for instance, https://github.com/ArtificialIntelligenceToolkit/aitk.keras for teaching significantly faster.

@TheRook commented Nov 12, 2021 via email

@rth (Member) commented Nov 12, 2021

This change will have a dramatic impact on 100% of all users.

As mentioned previously, help is definitely welcome in this direction. Note however that neither CPython nor the core scientific Python ecosystem (numpy, pandas, scipy, scikit-learn), which currently accounts for most of our usage, supports GPUs. I'm aware of the RAPIDS AI project, but that requires CUDA/NVIDIA GPUs as far as I can tell. So until TensorFlow or PyTorch are packaged (#50, #1625), having WebGPU will not be very useful. Packaging those would be a first step, and it's a significant amount of work -- we are very much looking for contributors to work on it :)

Also, I would love to see more benchmarks for PyTorch or TensorFlow training (or even inference) on modern integrated Intel or AMD graphics vs. CPU. For instance, if I read this TensorFlow blog post correctly (first figure), on a MacBook Pro with an i7, additionally using the Intel Iris Plus Graphics 645 is faster, but the difference is not major either. With an M1 CPU the situation is entirely different, though.

@ryanking13 (Member) commented Nov 16, 2021

So until TensorFlow or PyTorch are packaged (#50, #1625), having WebGPU will not be very useful. Packaging those would be a first step, and it's a significant amount of work -- we are very much looking for contributors to work on it :)

Maybe we can start by building a CPU-only version of TF or PyTorch first (actually, most beginners using those libraries don't need a GPU at all). I've never tried building TF or PyTorch from source, so I have no idea which one will be easier. We probably need to build lots of dependencies before building those two.

@rth (Member) commented Nov 16, 2021

Maybe we can start by building a CPU-only version of TF or PyTorch first

Yes. Generally, a good start for any complex package is to look at how it is done on conda-forge: PyTorch, TensorFlow (see the corresponding meta.yaml). Based on those, my guess is that PyTorch would be easier to package, but it's hard to be sure until someone tries. If you do, please comment in the corresponding issue.

@yxdragon
I think we just need a pure inference framework. Training needs a lot of interaction, so you would have to switch between Pyodide and JS many times; for that, why not use tfjs directly?

Maybe onnxruntime is easier to compile for the web than TF and Torch? (I tried onnx.js; its compatibility is not very good: it is missing some operators and does not support int64, so many models such as YOLOv4/v5 could not work.)

We need a DL framework in Pyodide because:

  1. Most DL programmers are not familiar with JS. If they just want to publish a pre-trained net and make a simple web app, Pyodide is a good choice.
  2. We can do a lot of post-processing with numpy and scipy.ndimage, and render the results with PIL or matplotlib (see the toy example below).
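As a toy example of point 2 (the arrays here are random stand-ins for a real network's output, so the snippet is only illustrative):

# numpy/scipy post-processing on a (fake) segmentation output, rendered
# with matplotlib; all three packages already ship with Pyodide.
import numpy as np
from scipy import ndimage
import matplotlib.pyplot as plt

probs = np.random.rand(256, 256)             # stand-in for a score map
mask = ndimage.binary_opening(probs > 0.5)   # threshold, drop small specks
labels, count = ndimage.label(mask)          # connected-component labelling
plt.imshow(labels, cmap="tab20")
plt.title(f"{count} regions")
plt.show()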

@yxdragon
By the way, another question about my DL framework, planer. It can translate an ONNX model into a JSON graph plus npy weights, then load and run it in pure numpy (it now supports many models such as ResNet, UNet, YOLOv4/v5, and Vision Transformer). Most models are about 2 times slower than torch (in native numpy with BLAS); not very fast, but acceptable.
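Roughly, the idea is like this sketch (the graph format here is invented purely for illustration; planer's real format differs):

# Toy illustration of "JSON graph + npy weights, executed in pure numpy".
import json
import numpy as np

graph = json.loads('{"layers": [{"op": "dense", "w": "w0", "b": "b0"}, {"op": "relu"}]}')
weights = {"w0": np.random.rand(4, 3).astype("float32"),   # would come from .npy files
           "b0": np.zeros(3, dtype="float32")}

def run(x):
    # walk the graph in order, dispatching on the op name
    for layer in graph["layers"]:
        if layer["op"] == "dense":
            x = x @ weights[layer["w"]] + weights[layer["b"]]
        elif layer["op"] == "relu":
            x = np.maximum(x, 0)
    return x

print(run(np.ones((2, 4), dtype="float32")).shape)   # (2, 3)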

I am really looking forward to numpy with BLAS in Pyodide! For now, I am trying to inject tfjs's dot into Pyodide and replace np.dot with tf.dot.

I tested the dot product of two 1024x1024 float32 matrices: Pyodide's numpy costs 2.5s, while tfjs costs 80ms. But when I inject it, the program runs more slowly. After some testing, I found that the numpy -> TypedArray -> tf.tensor transfer is very slow.

I have tried 2 methods:
Method 1: call tfjs's dot from Pyodide

import numpy as np
from js import tf              # tfjs, loaded on the page
from pyodide.ffi import to_js

def dot(a, b):
    # copy each numpy array into a JS TypedArray and wrap it as a tf.tensor
    ja = tf.tensor(to_js(a.ravel()), to_js(a.shape), str(a.dtype))
    jb = tf.tensor(to_js(b.ravel()), to_js(b.shape), str(b.dtype))
    jr = tf.dot(ja, jb)
    # pull the result back synchronously and rebuild a numpy array
    r = np.asarray(jr.reshape(to_js([-1])).dataSync().to_py())
    return r.reshape(jr.shape.to_py())

Method 2: write a JS function for Pyodide

tfdot = function(a, b){
    // view the numpy data from JS through the PyProxy buffer API
    // (the PyBuffer views should eventually be released with .release())
    a = a.getBuffer()
    b = b.getBuffer()
    ta = tf.tensor(a.data, a.shape, a.dtype)
    tb = tf.tensor(b.data, b.shape, b.dtype)
    tab = tf.dot(ta, tb)
    return [tab.dataSync(), tab.dtype, tab.shape]
}

The net costs 0.6s with numpy, but 7s with method 1 and 50s with method 2.
So what is the fastest way to transfer data between numpy and tf.tensor?
Maybe Pyodide does many checks on the proxy, which costs a lot of time. Is there some way to swap via the virtual memory file system: np.save, then read it from JS?

Or any other advice for a faster dot?
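For the FS idea, a minimal sketch of the Python half (the path is illustrative; the JS side would read the bytes back with pyodide.FS.readFile and still has to skip the .npy header to reach the raw float32 data, so this may not beat a direct TypedArray copy):

# Write into Emscripten's in-memory filesystem (MEMFS) from Python;
# JS can then read the same bytes via pyodide.FS.readFile("/tmp/a.npy").
import numpy as np

a = np.random.rand(1024, 1024).astype("float32")
np.save("/tmp/a.npy", a)   # now visible to JS through pyodide.FS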

@hoodmane (Member) commented Feb 27, 2022

But when I inject it, the program runs more slowly. After some testing, I found that the numpy -> TypedArray -> tf.tensor transfer is very slow.

I have tried 2 methods: ...

I don't understand why method 2 is so slow; I don't believe it can have anything to do with the proxy checks. If you could make a complete example of this, it might be interesting to look into.

But to be honest, I don't know very well how to profile our code, so I'm really not sure how to figure out where the time is being spent. Maybe running in Node and using a Node profiler would be easiest.

Is it spending the time in tf.dot? My first thought is that perhaps tf.dot needs to fall back to some slow path if the two arguments are subarrays of the same backing ArrayBuffer. This hypothesis would be very easy to test, but it doesn't really make that much sense, considering that taking the dot product doesn't mutate its arguments.
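Something along these lines should be enough to check it (an untested sketch, assuming tfjs is loaded on the page as the global tf; note the first tf.dot also includes tfjs warm-up, so each case should be run twice):

# Time tf.dot on two views of one ArrayBuffer vs. two independent
# TypedArrays, to test the shared-backing-buffer hypothesis.
import time
from js import tf, ArrayBuffer, Float32Array
from pyodide.ffi import to_js

n = 1024
shape = to_js([n, n])

shared = ArrayBuffer.new(2 * n * n * 4)
a1 = Float32Array.new(shared, 0, n * n)          # first half of the buffer
b1 = Float32Array.new(shared, n * n * 4, n * n)  # second half, same buffer
a2 = Float32Array.new(n * n)                     # independent buffers
b2 = Float32Array.new(n * n)

def bench(label, a, b):
    t0 = time.time()
    tf.dot(tf.tensor(a, shape), tf.tensor(b, shape)).dataSync()
    print(label, time.time() - t0)

bench("shared buffer:   ", a1, b1)
bench("separate buffers:", a2, b2)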

@hoodmane (Member)

@yxdragon I would like to know what the performance looks like for the following two versions (with extra copies inserted).

tfdot = function(a, b){
    a = a.getBuffer()
    b = b.getBuffer()
    // .slice() copies the TypedArray, so `ta` no longer aliases the WASM heap
    ta = tf.tensor(a.data.slice(), a.shape, a.dtype)
    tb = tf.tensor(b.data, b.shape, b.dtype)
    tab = tf.dot(ta, tb)
    return [tab.dataSync(), tab.dtype, tab.shape]
}

and

tfdot = function(a, b){
    a = a.getBuffer()
    b = b.getBuffer()
    // copy both inputs, so neither tensor aliases the shared WASM heap
    ta = tf.tensor(a.data.slice(), a.shape, a.dtype)
    tb = tf.tensor(b.data.slice(), b.shape, b.dtype)
    tab = tf.dot(ta, tb)
    return [tab.dataSync(), tab.dtype, tab.shape]
}

@yxdragon commented Mar 1, 2022

@hoodmane OK, I will start a new thread about this question.
