Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade numpy version to 1.20+ #1827

Closed
leopsidom opened this issue Sep 11, 2021 · 28 comments
Closed

Upgrade numpy version to 1.20+ #1827

leopsidom opened this issue Sep 11, 2021 · 28 comments

Comments

@leopsidom
Copy link
Contributor

leopsidom commented Sep 11, 2021

Currently pyodide has numpy at version 1.17.5 compared to latest numpy release @ 1.21.2. Pandas at version 1.0.5 while the latest release is 1.3.2. Should we bump the versions for these packages ?

PS: I tried to have a build on the latest numpy. It seems we are able to remove 2 patches with the upgrade due to upstream fix:

patches/add-emscripten-cpu.patch
patches/rm-duplicate-symbols-link.patch

I was able to fix some patches, and got it start building. But still getting some issues:

emcc: error: Passing any of -msse, -msse2, -msse3, -mssse3, -msse4.1, -msse4.2, -mavx, -mfpu=neon flags also requires passing -msimd128!

@hoodmane
Copy link
Member

Thanks for looking into this @leopsidom!

Should we bump the versions for these packages?

Yeah, if you do manage to update any of these a PR is quite welcome. See also:
#1464
It would maybe be helpful to update that issue.

@leopsidom
Copy link
Contributor Author

I'd be glad to take a look. But I have minimum knowledge of numpy source code and emscrypten, so not sure how I should estimate the effort.

@leopsidom
Copy link
Contributor Author

leopsidom commented Sep 19, 2021

Just to add some update, I did some bisect analysis on different numpy versions. The latest numpy that can build is 1.18.5. Since 1.19.0, we had some issues. For 1.20+ build, I'm currently getting this issue:

numpy/core/src/common/simd/avx512/avx512.h:54:18: error: unknown type name '__m512'; did you mean '__m128'?
typedef struct { __m512 val[3]; } npyv_f32x3;

I think it has something to do with numpy adding support for avx512 architecture since 1.19.0: numpy/numpy@5562a8c but not sure how to resolve it yet.

I cut an issue to numpy here: numpy/numpy#19892

@dynamicwebpaige
Copy link

Adding a +1 for updating the pandas version in Pyodide!

There are several nice new features in v1.2 and v1.3 (including table visualizations and exporting to LaTeX).

@rth
Copy link
Member

rth commented Nov 2, 2021

Pandas was updated in #1922: just bumping the version worked .

The numpy update is indeed a bit more challenging. @leopsidom As far as I understand we could indeed build SIMD code with emscripten however it's currently still not supported by Safari, so I think we would need to a wait a bit more before enabling it. Meanwhile, I think we would could skip the numpy SIMD backend. In particular, we probably shouldn't build anything under

numpy/core/src/common/simd/

and use the baseline implementation instead. As far as I understand it's possible to disable those at build time via the NPY_DISABLE_CPU_FEATURES env variable numpy/numpy#18901 (comment) though it's not very well documented numpy/numpy#18926

@rth rth changed the title Upgrade numpy and pandas versions Upgrade numpy version Nov 2, 2021
@rth rth changed the title Upgrade numpy version Upgrade numpy version to 1.20+ Nov 2, 2021
@rth
Copy link
Member

rth commented Nov 2, 2021

build time via the NPY_DISABLE_CPU_FEATURES env

Hmm, no that's at runtime. For build time the option is --cpu-baseline as you have been doing @leopsidom . Have you tried --cpu-baseline=""? For emcc we can skip unknown params, if necessary, similarly to what was done here (see #835 for an example PR)

@leopsidom
Copy link
Contributor Author

leopsidom commented Nov 3, 2021

As far as I understand we could indeed build SIMD code with emscripten however it's currently still not supported by Safari

@rth Yeah emscripten only supports up to 128-bit AVX, which means we can build up to AVX from the compile option doc here: https://numpy.org/devdocs/reference/simd/simd-optimizations.html#build-options-for-compilation. However currently numpy SIMD supports up to avx 512.

I think we would could skip the numpy SIMD backend

That makes sense. I thought we were always building numpy with SIMD enabled, so I was trying to find way to get it working. I got function signature mismatch when trying to build with cpu baseline as avx: numpy/numpy#20079 (comment). It seems some lapack-lite type definition is not correct here.

Have you tried --cpu-baseline=""

I tried cpu-baseline="avx" by making a patch with this line: https://github.com/numpy/numpy/blob/623bc1fae1d47df24e7f1e29321d0c0ba2771ce0/numpy/distutils/command/build.py#L38 which gives me: numpy/numpy#20079 (comment)

For emcc we can skip unknown params

Cool, great to know that. It might be better to make the change there instead of making another numpy patch.

@leopsidom
Copy link
Contributor Author

leopsidom commented Nov 6, 2021

Ahh, even if I disable optimization here. I still got the following error:

wasm-ld: error: function signature mismatch: dorgqr_64_

defined as (i32, i32, i32, i32, i32, i32, i32, i32, i32) -> i64 in build/temp.linux-x86_64-3.9/numpy/linalg/lapack_litemodule.o
defined as (i32, i32, i32, i32, i32, i32, i32, i32, i32) -> i32 in build/temp.linux-x86_64-3.9/numpy/linalg/lapack_lite/f2c_d_lapack.o

So it seems this error is not related to SIMD but some changes between 0.18.5 and 0.19.0.

Wonder if there's a way we can build on a specific commit so we can do bisect analysis and find out which commit caused the issue.

@rth
Copy link
Member

rth commented Nov 6, 2021

wasm-ld: error: function signature mismatch: dorgqr_64_

We do get some number of those in numpy and scipy. It seems that there are some mismatched signatures in general (particularly when f2c is involved). When dynamically linking it would still work for x86_64 apparently, however that fails in WASM with emscripten which is more strict. So the solution is often to patch the incorrect signature (also worth checking if there is anything similar in existing numpy patches).

Wonder if there's a way we can build on a specific commit

Yes, you should be able to indicate https://github.com/numpy/numpy/archive/7c211013d559db238daf76fdb90af1d2f37a341e.zip (or the appropriate commit hash) as the download link (it's the link under "Code->Download zip" on the GH project page, which gets updated if you view source for a particular commit). Otherwise it's also possible to check git blame on GH to see when those file signatures changed.

@leopsidom
Copy link
Contributor Author

leopsidom commented Nov 6, 2021

also worth checking if there is anything similar in existing numpy patches

Good idea. Let me double check the existing patches.

you should be able to indicate https://github.com/numpy/numpy/archive/7c211013d559db238daf76fdb90af1d2f37a341e.zip (or the appropriate commit hash) as the download link

Nice. Let me try that later today. I tried to manually check all the commits between the two versions. But there are too many of them, and it's hard to pinpoint which commit caused the issue.

@rth
Copy link
Member

rth commented Nov 6, 2021

I tried to manually check all the commits between the two versions.

Using git bisect could be faster if you want to find it by testing.

@leopsidom
Copy link
Contributor Author

Nice, great to know git bisect. Haven't used it before. But looks like it's only useful if we can build from a local directory of numpy in pyodide-build, not sure if it's possible ?

@leopsidom
Copy link
Contributor Author

Ahh, found the commit that caused the problem: numpy/numpy@0159b84. It's trying to build lapack-lite in 64 bit integer mode based on the system information.

@leopsidom
Copy link
Contributor Author

But now I'm getting this error, hmm:

clang-13: error: argument unused during compilation: '-mpopcnt' [-Werror,-Wunused-command-line-argument]

@hoodmane
Copy link
Member

hoodmane commented Nov 7, 2021

You could try filtering out -mpopcnt in pywasmcross.

@leopsidom
Copy link
Contributor Author

@hoodmane Wow, nice that worked. The latest numpy builds! Let me create a PR for this. Currently I'm still building with --cpu-baseline as AVX to enable minimum SIMD. But I guess we can discuss if we wanted that in the PR.

@leopsidom leopsidom mentioned this issue Nov 7, 2021
3 tasks
@leopsidom
Copy link
Contributor Author

Created a PR here: #1934

@Skylion007
Copy link

As far as I understand we could indeed build SIMD code with emscripten however it's currently still not supported by Safari

Couldn't we just dynamically pick which module to load depending on whether or not the browser supports WASM-SIMD? Numpy is a pretty core library and the lack of the SIMD support is one of the primary slowdowns I have encountered between Pyodide and CPython.

@rth
Copy link
Member

rth commented Jan 20, 2022

It should be possible. Numpy already has some runtime SIMD detection so it would be a matter of compiling numpy SIMD modules with Emscripten SIMD support then patch numpy runtime detection to take into account the browser.

Though a more impactful change IMO would be to use a better BLAS (#227) for numpy. That's what determines the performance of all linear algebra in numpy including the dot product. Related benchmarks in https://twitter.com/ethanhs/status/1381500487334162432

@yxdragon
Copy link

yxdragon commented Mar 7, 2022

As far as I understand we could indeed build SIMD code with emscripten however it's currently still not supported by Safari

I only want to make WeChat App (an chat app with webkit x5 embaded), Is that mean I can compile an simd release?
would it be difficult? and can np.dot and np.matmul become faster after simd-compiling?

@HarikrishnanBalagopal
Copy link

It should be possible. Numpy already has some runtime SIMD detection so it would be a matter of compiling numpy SIMD modules with Emscripten SIMD support then patch numpy runtime detection to take into account the browser.

Though a more impactful change IMO would be to use a better BLAS (#227) for numpy. That's what determines the performance of all linear algebra in numpy including the dot product. Related benchmarks in https://twitter.com/ethanhs/status/1381500487334162432

Can you please clarify what the current situation is with Numpy and SIMD? If I import and use the builtin numpy package https://pyodide.org/en/stable/usage/packages-in-pyodide.html and run Pyodide v0.22.1 https://www.npmjs.com/package/pyodide in NodeJS does that mean it will use the WASM SIMD instructions?

We are trying to write an app that uses SIMD for computing the Mandelbrot set.

@hoodmane
Copy link
Member

hoodmane commented Mar 3, 2023

The current situation is that simd is disabled. Our current minimum Safari version is 14.2 (I think, it would be good if we tested our target minimum browser versions). Per this chart, Safari added support for SIMD in version 16.4 released two weeks ago. So as @rth said we would need to build two different numpy wheels and then pick which one at runtime by feature detecting simd. This is entirely doable but someone would need to do it.

@hoodmane
Copy link
Member

hoodmane commented Mar 3, 2023

In the meantime, you can build your own numpy wheel with simd enabled.

@HarikrishnanBalagopal
Copy link

In the meantime, you can build your own numpy wheel with simd enabled.

Thanks for the reply, if possible can you point me in the right direction?

  • How do you build the wheel for WASM and Pyodide?
  • How do you provide it to Pyodide? I am running it in NodeJS

@hoodmane
Copy link
Member

hoodmane commented Mar 3, 2023

How do you build the wheel for WASM and Pyodide?

https://github.com/numpy/numpy/blob/main/.github/workflows/emscripten.yml

How do you provide it to Pyodide? I am running it in NodeJS.

In Node you can load it with await pyodide.loadPackage("path/to/wheel");. Make sure you load it before any numpy deps because loading a numpy dependency will load our standard numpy wheel.

@HarikrishnanBalagopal
Copy link

HarikrishnanBalagopal commented Mar 3, 2023

In Node you can load it with await pyodide.loadPackage("path/to/wheel");. Make sure you load it before any numpy deps because loading a numpy dependency will load our standard numpy wheel.

Thanks,

Make sure you load it before any numpy deps because loading a numpy dependency will load our standard numpy wheel.

I am not sure what this means, in my python script I only have import numpy as np no other dependencies.
In the NodeJS code I have this and no other dependencies

    const pyodide = await loadPyodide({});
    console.log('loading the numpy package');
    await pyodide.loadPackage("numpy");

@hoodmane
Copy link
Member

hoodmane commented Mar 3, 2023

I just mean that if you do e.g.,:

await pyodide.loadPackage(scipy);
await pyodide.loadPackage("path/to/custom/numpy");

it won't work.

@HarikrishnanBalagopal
Copy link

HarikrishnanBalagopal commented Mar 3, 2023

How do you build the wheel for WASM and Pyodide?

https://github.com/numpy/numpy/blob/main/.github/workflows/emscripten.yml

How do you provide it to Pyodide? I am running it in NodeJS.

In Node you can load it with await pyodide.loadPackage("path/to/wheel");. Make sure you load it before any numpy deps because loading a numpy dependency will load our standard numpy wheel.

Sorry, don't mean to bother you but running that workflow to build and test seems to fail https://github.com/HarikrishnanBalagopal/numpy/actions/runs/4323405620/jobs/7546973464#step:9:50
and I noticed that some runs on the main repo have failed as well but for other reasons
https://github.com/numpy/numpy/actions/runs/4235697499/jobs/7359641955#step:6:1


Update

So using the wheel from the Github workflow, we are able to use it to generate the Mandelbrot set.

    await pyodide.loadPackage('./numpy-0.1.0_alpha.1.mybuild-cp310-cp310-emscripten_3_1_27_wasm32.whl');

@hoodmane Is there some way to verify if the wheel actually has any SIMD instructions?
How do we check if Pyodide is actually running any SIMD instructions?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants