[Feature Request] GPU support for Pyodide. #1911
Comments
WebGPU hasn't shipped yet in any browser, if I understand correctly. We wouldn't want to do this until it has shipped in Chrome, and probably not until it's available in both Chrome and Firefox.
What packages did you have in mind? None of the core scientific Python packages work with GPU (related #511), and for something like TensorFlow, PyTorch or ONNX, if there are builds with a JS API targeting WebAssembly, it could make sense to use those directly -- if need be from Python, via the Python->JS type translation API in Pyodide. So far we have avoided building TensorFlow (#50) and PyTorch (#1625) in Pyodide, as looking at their conda-forge recipes they are quite complex to build. Also, the added value is not very clear to me if there are already some ways to run them in WASM (at the very least via ONNX), possibly with GPU support.
Personally, I think there is huge added value if Pyodide supports GPU. I once tried converting my TensorFlow models to TensorFlow.js and running them in the browser. I think one of the main use cases of Pyodide is in education, for beginners who want to try Python and data science in the browser. In that case, supporting PyTorch or TensorFlow without any conversion process will help a lot of beginners who are not familiar with ONNX or TensorFlow.js. The main problem is, as @rth mentioned, that we cannot manage libraries as big as PyTorch or TensorFlow without the help of Google or Facebook.
Some info on WebCodecs, which I'd like to see for creative sound/video TensorFlow projects, and on WebGPU support, is here: microsoft/vscode#118275 (comment). That should be possible after this beta in Chromium: https://blog.chromium.org/2021/08/chrome-94-beta-webcodecs-webgpu.html. Per those notes, they are looking to have some version of WebGPU in Chrome 99. We are still waiting on the VS Code dev team to make a decision about the audio codec support that many ML engineers are asking for now.
Development is a moving target; it sounds like now is the best possible time to start building this feature. WebGPU ships in Chrome in version 102, which lands in May, less than 6 months away. It can be enabled for dev purposes before then; here is how to use it:
Using a GPU is certainly worth investigating. Note that in addition to browser compatibility constraints, it would likely only be useful to the fraction of users that browse on a machine with a powerful enough GPU. For instance, benchmarks done for TensorFlow JS (https://www.tensorflow.org/js/guide/platform_environment) show an improvement from 1x to 3-4x between WebGL and WASM on a 2018 MacBook Pro (which should have a dedicated Radeon Pro 555X (or 560X) GPU). On a laptop with an integrated GPU I doubt one would see much improvement.

Other somewhat related things in the backlog:

- use an optimized BLAS (#227), which should speed up matrix multiplication, particularly useful for neural net libraries
- generally improve Python performance in Pyodide (#1677 looks promising)

which would improve performance for all users, both for scientific computing and DL on CPU. Those would likely make using, for instance, https://github.com/ArtificialIntelligenceToolkit/aitk.keras for teaching significantly faster.
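To make the BLAS point concrete outside Pyodide, here is a minimal sketch (assuming a NumPy build linked against an optimized BLAS; sizes are purely illustrative) comparing a naive pure-Python triple loop against the BLAS-backed `@` operator:

```python
import time
import numpy as np

n = 64  # small size so the pure-Python loop finishes quickly
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

def naive_dot(a, b):
    """O(n^3) matmul with no BLAS: the baseline an unoptimized build pays."""
    rows, inner, cols = a.shape[0], a.shape[1], b.shape[1]
    out = np.zeros((rows, cols), dtype=np.float32)
    for i in range(rows):
        for k in range(inner):
            aik = a[i, k]
            for j in range(cols):
                out[i, j] += aik * b[k, j]
    return out

t0 = time.perf_counter(); c_naive = naive_dot(a, b); t_naive = time.perf_counter() - t0
t0 = time.perf_counter(); c_blas = a @ b; t_blas = time.perf_counter() - t0
print(f"naive: {t_naive:.4f}s  BLAS-backed: {t_blas:.6f}s")
```

Even at this tiny size the gap is typically several orders of magnitude, which is why #227 matters for neural-net-style workloads independently of any GPU work.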
All modern machines have powerful GPUs; even cell phones have fantastic integrated chips these days. Let's be clear here: this change will have a dramatic impact on 100% of all users.
As mentioned previously, help is definitely welcome in this direction. Note however that CPython and the core scientific Python ecosystem (numpy, pandas, scipy, scikit-learn), which currently covers most of our users, do not support GPU. I'm aware of the Rapids AI project, but that requires CUDA/NVIDIA GPUs as far as I can tell. So until TensorFlow or PyTorch are packaged (#50, #1625), having WebGPU will not be very useful. Packaging those would be a first step, and it's a significant amount of work -- we are very much looking for contributors to work on it :)

Also, I would love to see more benchmarks for PyTorch or TensorFlow training (or even inference) on modern integrated Intel or AMD graphics vs CPU. For instance, if I read this TensorFlow blog correctly (first figure), on a MacBook Pro with an i7, additionally using the Intel Iris Plus Graphics 645 is faster, but the difference is not major either. With an M1 CPU the situation is entirely different, though.
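As a starting point for the CPU side of such benchmarks, a small harness like the following sketch could establish the baseline (plain NumPy, hypothetical sizes; a GPU counterpart would need tfjs/WebGL and is not shown):

```python
import time
import numpy as np

def matmul_gflops(n, repeats=3):
    """Best-of-`repeats` GFLOP/s for an n x n float32 matmul on CPU."""
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - t0)
    # A square matmul performs ~2*n^3 floating point operations.
    return (2 * n**3) / best / 1e9

for n in (256, 512, 1024):
    print(f"n={n}: {matmul_gflops(n):.1f} GFLOP/s")
```

Comparing these numbers against the same sizes run through a WebGL/WebGPU backend would quantify how much integrated graphics actually buys on a given machine.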
Maybe we can start by building a CPU-only version of TF or PyTorch first (actually, most beginners using those libraries don't need a GPU at all). I've never tried building TF or PyTorch from source, so I have no idea which one will be easier. Probably we need to build lots of dependencies before building those two.
Yes. Generally, a good start for any complex package is to look at how it is done on conda-forge: Pytorch, Tensorflow (see the corresponding
I think we just need a pure inference framework. Training needs a lot of interaction, so you would have to switch between Pyodide and JS many times -- so why not use tfjs directly? Maybe onnxruntime is easier to compile for the web than TF or Torch? (I tried onnx.js; its compatibility is not very good: it is missing some operators and does not support int64, so many models such as YOLOv4/v5 could not work.) We need a DL framework in Pyodide for:
By the way, another question about my DL framework plan. I'm looking forward to numpy with BLAS in Pyodide! For now I am trying to inject tfjs's dot into Pyodide, replacing np.dot with tf.dot. I tested the dot of a 1024x1024 float32 matrix: Pyodide's numpy costs 2.5 s, while tfjs costs 80 ms. But when I inject it, the program runs more slowly. After some testing, I found that the transfer between numpy, typed array, and tf.tensor is very slow. I have tried 2 methods.

Method 1, calling tfjs from Python:

```python
def dot(a, b):
    ja = tf.tensor(to_js(a.ravel()), to_js(a.shape), str(a.dtype))
    jb = tf.tensor(to_js(b.ravel()), to_js(b.shape), str(b.dtype))
    jr = tf.dot(ja, jb)
    r = np.asarray(jr.reshape(to_js([-1])).dataSync().to_py())
    return r.reshape(jr.shape.to_py())
```

Method 2, writing a JS function for Pyodide:

```js
tfdot = function(a, b){
    a = a.getBuffer()
    b = b.getBuffer()
    ta = tf.tensor(a.data, a.shape, a.dtype)
    tb = tf.tensor(b.data, b.shape, b.dtype)
    tab = tf.dot(ta, tb)
    return [tab.dataSync(), tab.dtype, tab.shape]
}
```

The net costs 0.6 s with numpy, but 7 s with method 1 and 50 s with method 2. Or is there some other advice for a faster dot?
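One way to narrow down where the time goes is to measure the bulk-copy legs in isolation. The sketch below (plain NumPy, outside Pyodide, sizes matching the example above) times a serialize/rebuild round trip as a stand-in for the numpy -> typed array -> tensor hops; memcpy-style copies of a 4 MB array are typically milliseconds, which would suggest the observed seconds come from per-element traversal or proxying rather than the copies themselves:

```python
import time
import numpy as np

a = np.random.rand(1024, 1024).astype(np.float32)

# Time the kernel alone for reference.
t0 = time.perf_counter()
a @ a
t_dot = time.perf_counter() - t0

# Simulate the transfer legs: flatten to raw bytes and rebuild.
t0 = time.perf_counter()
buf = a.tobytes()                    # bulk copy out
back = np.frombuffer(buf, dtype=np.float32).reshape(a.shape)  # zero-copy view back
t_copy = time.perf_counter() - t0

print(f"dot: {t_dot*1e3:.1f} ms, serialize round trip: {t_copy*1e3:.1f} ms")
```

If the equivalent round trip inside Pyodide takes seconds instead, the conversion path (not the raw memory traffic) is the thing to profile.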
I don't understand why method 2 is so slow; I don't believe it can have anything to do with the proxy checks. If you could make a complete example of this, it might be interesting to look into. But to be honest, I don't know very well how to profile our code, so I'm really not sure how to figure out where it is spending the time. Maybe running in Node and using a Node profiler would be easiest. Is it spending the time in
@yxdragon I would like to know what the performance looks like for the following two versions (with extra copies inserted):

```js
tfdot = function(a, b){
    a = a.getBuffer()
    b = b.getBuffer()
    // Copy only the first input out of the WASM heap.
    ta = tf.tensor(a.data.slice(), a.shape, a.dtype)
    tb = tf.tensor(b.data, b.shape, b.dtype)
    tab = tf.dot(ta, tb)
    return [tab.dataSync(), tab.dtype, tab.shape]
}
```

and

```js
tfdot = function(a, b){
    a = a.getBuffer()
    b = b.getBuffer()
    // Copy both inputs out of the WASM heap.
    ta = tf.tensor(a.data.slice(), a.shape, a.dtype)
    tb = tf.tensor(b.data.slice(), b.shape, b.dtype)
    tab = tf.dot(ta, tb)
    return [tab.dataSync(), tab.dtype, tab.shape]
}
```
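For context on the `.slice()` calls above: on a TypedArray, `slice()` allocates a fresh buffer and copies the data, so the resulting tensor no longer aliases the WASM heap view that Pyodide's `getBuffer()` returns. A small standalone sketch (runnable in Node, no Pyodide needed):

```javascript
// slice() on a TypedArray copies into a new ArrayBuffer, detaching the
// result from the original view (e.g. a view into the WASM heap).
const heapView = new Float32Array([1, 2, 3, 4]);
const copy = heapView.slice();

heapView[0] = 99;  // mutate the "heap" after copying
console.log(copy[0]);                          // prints 1: the copy is independent
console.log(copy.buffer === heapView.buffer);  // prints false: separate storage
```

That independence is the point of the experiment: if the copied versions are fast, tfjs was paying for reading directly out of (or repeatedly re-entering) the WASM heap.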
@hoodmane OK, I will start a new thread about this question.
As an example: TensorFlow.js supports a variety of platforms and backends, including:
This supports both training (fine-tuning existing models) and inference, in browser-based contexts. Would it be possible to add in-browser GPU support for Pyodide via WebGPU?