Error on Azure CI (Windows instance) with numpy 1.19.0 #16913
Comments
Does it fail consistently or only once in a while? Do you have any windows developers who can try to build the project on a local machine? |
Hi, it failed consistently, many times. At that point I thought about asking the Azure developers (my initial guess was that perhaps something had changed in their VM setup). This link has the discussion I had with a Microsoft developer who spotted that the problem could be numpy: https://developercommunity.visualstudio.com/content/problem/1102472/azure-pipeline-error-with-windows-vm.html?childToView=1119179#comment-1119179 Unfortunately I do not have anyone who can try building the project on a local Windows machine :( |
Then we will need a clear set of steps to reproduce |
Would the azure-pipelines.yml work? Here is what we use (https://github.com/equinor/pylops/blob/master/azure-pipelines.yml), commented out at the moment. You can see that it is a pretty standard setup: using Python 3.7, installing the dependencies in the requirements-dev.txt file (https://github.com/equinor/pylops/blob/master/requirements-dev.txt) and then running the tests. As I mentioned already, if I comment this out and force numpy 1.18.5 everything runs, so it seems that the new 1.19 is what breaks it |
What are the Windows major and minor versions of the image running on Azure? i.e., what does |
I could find the details of the Azure VMs used in Azure Pipelines here: https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/hosted?view=azure-devops&tabs=yaml and the link to installed software https://github.com/actions/virtual-environments/blob/master/images/win/Windows2019-Readme.md I am not sure how to run |
It runs from the command line and dumps the output to terminal, so you can add it to your run as a command. |
You could do this in a PR that runs on CI to see what it says. I am asking since there have been issues with the 19041 build of Windows and pip NumPy. |
The answer was in the second link: OS Version: 10.0.17763 Build 1282 |
So my idea bears no fruit. |
You say you know there are some issues with the latest pip wheels for Windows. Could this be connected to that? |
It is actually (probably) a Windows bug introduced in 19041. But you are on a much older version so this is not the issue. It doesn't affect Conda NumPy, only pip NumPy because it seems to be some issue with Windows and OpenBlas. |
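The build number being compared here can also be read from inside a CI job. A minimal sketch, assuming the version string has the `major.minor.build` form quoted above (`10.0.17763`); the helper names `windows_build` and `report` and the use of `platform.version()` are illustrative choices, not something used in this thread:

```python
import platform

# Build in which, per this thread, the suspected Windows bug was introduced.
SUSPECT_BUILD = 19041


def windows_build(version):
    """Return the build number from a 'major.minor.build' string, e.g. '10.0.17763'."""
    parts = version.split(".")
    if len(parts) < 3 or not parts[2].isdigit():
        raise ValueError(f"unexpected version string: {version!r}")
    return int(parts[2])


def report():
    """Print the OS build and how it compares to the suspect build (Windows only)."""
    if platform.system() != "Windows":
        print("not running on Windows:", platform.system())
        return
    build = windows_build(platform.version())
    relation = "at or after" if build >= SUSPECT_BUILD else "before"
    print(f"Windows build {build} is {relation} {SUSPECT_BUILD}")
```

Calling `report()` as a pipeline step would print the comparison in the job log; for the image discussed here (`10.0.17763`) the build is before 19041.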
I see :) I got an email that 1.19.1 has been released. I am going to try to retrigger the Azure pipeline, which would now install the latest version, and see if that works. Will let you know |
A bug in OpenBlas. Here is a reproducing example:
import numpy as np
nr = 12000
v = np.random.randn(nr) + 1j * np.random.randn(nr)
np.vdot(v, v)
# also access violations
v @ v
# also access violations
The no symbols debugging information is:
Note that the array has to be pretty big (10k passes, 12k does not) to trigger the bug. |
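Because an access violation kills the interpreter, it can help to run the reproducer in a child process so the surrounding test run survives and the exit code can be inspected. A minimal sketch of that idea; `run_isolated` and `crashed` are hypothetical helper names, and the reproducer string simply repeats the snippet from this comment:

```python
import subprocess
import sys

# The reproducer from this comment, as source text for a child interpreter.
REPRODUCER = """
import numpy as np
nr = 12000
v = np.random.randn(nr) + 1j * np.random.randn(nr)
np.vdot(v, v)
"""


def run_isolated(source):
    """Run `source` in a fresh Python process and return its exit code."""
    result = subprocess.run([sys.executable, "-c", source])
    return result.returncode


def crashed(returncode):
    """A non-zero exit code means the child did not finish normally."""
    return returncode != 0
```

With numpy installed, `crashed(run_isolated(REPRODUCER))` would report whether the child died. On Windows an access violation usually appears as exit code 3221225477 (0xC0000005), though that should be treated as an expectation rather than a guarantee.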
Quick check:
$env:OPENBLAS_VERBOSE=2
$env:OPENBLAS_CORETYPE=Prescott
passes but the default kernel (
Maybe worth checking that numpy HEAD, which uses a newer OpenBLAS 0.3.10, also fails. Or maybe you already did? |
@mattip no I had not tried this yet. You mean installing numpy directly from master with |
And to your question @bashtage (Do the failing tests use numba at all? numba 0.50 has a bug on some versions of windows where it incorrectly makes use of an unavailable intrinsic. This caused crashes for me in another project.) which I got via email but can't seem to see in this thread... the test that crashes uses both |
I just tried installing the HEAD of NumPy directly from the GitHub repository and the Windows build runs to completion, with no sudden crash: https://dev.azure.com/matteoravasi/PyLops/_build/results?buildId=54&view=logs&j=011e1ec8-6569-5e69-4f06-baf193d1351e&t=bf6cf4cf-6432-59cf-d384-6b3bcf32ede2 Interestingly, some libraries that have NumPy as a dependency don’t seem to install properly (not sure why) and some tests fail for all OSes, but at least it’s not a complete crash as before... |
No error using nightly:
|
This doesn't have OpenBLAS unless you explicitly build it in. By default you get a slow, generic BLAS with a |
Looks like we may want to upgrade OpenBLAS for 1.19.2, so marking this. |
I think I might be experiencing the same issue on latest
The line in question is just:
If it helps, I could try saving the arrays to disk and getting them out via an Azure artifact. |
That looks like it. You are using the same pre that I had working correctly. You might want to add
or
to your template to know which kernel is being used. |
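One way to check the selected kernel from a script, sketched with the two environment variables mentioned earlier in this thread (`OPENBLAS_VERBOSE` and `OPENBLAS_CORETYPE`). This is not the snippet suggested in this comment; the helper names are hypothetical, and it is an assumption that OpenBLAS writes its core information to stderr when `OPENBLAS_VERBOSE=2`:

```python
import os
import subprocess
import sys


def openblas_env(coretype=None, verbose=2):
    """Copy the current environment and add the OpenBLAS variables from this thread."""
    env = dict(os.environ)
    env["OPENBLAS_VERBOSE"] = str(verbose)
    if coretype is not None:
        # Forces a specific kernel, e.g. "Prescott" as in the quick check above.
        env["OPENBLAS_CORETYPE"] = coretype
    return env


def import_numpy_with(env):
    """Import numpy in a child process under `env`; return (exit code, stderr text)."""
    result = subprocess.run(
        [sys.executable, "-c", "import numpy"],
        env=env,
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr
```

Printing the stderr text returned by `import_numpy_with(openblas_env())` in a CI step should show whatever OpenBLAS reports about the core in use, and passing `coretype="Prescott"` repeats the check with the kernel forced.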
It would probably be enough to know the dtypes and dimensions. |
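A small helper can log exactly that information just before the failing call. A minimal sketch, assuming the operands expose the usual NumPy array attributes (`dtype`, `shape`, `flags`); the name `describe` is hypothetical:

```python
def describe(name, array):
    """Summarize an array by dtype, shape and C-contiguity for a CI log."""
    flags = getattr(array, "flags", None)
    contiguous = getattr(flags, "c_contiguous", None)
    return f"{name}: dtype={array.dtype} shape={tuple(array.shape)} c_contiguous={contiguous}"
```

For example, `print(describe("a", a), describe("b", b))` placed right before the matrix multiplication would record the operands in the job output without uploading the arrays themselves.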
Okay, reproduced on a single run of just the failing test with just numpy+scipy+matplotlib+pytest (and deps) that writes the matrices being multiplied and then uploads the artifacts, here is the artifacts tab: The last
Working on getting the |
I reported the error in OpenMathLib/OpenBLAS#2732 and they suggested it might be fixed in master, see OpenMathLib/OpenBLAS#2728 . No idea the best way to test this, though. |
@mattip Do we know this is closed by MacPython/openblas-libs#35 ? Don't we need to wait until the next weekly is out? |
@charris I think this issue is still open, and a backport will likely be needed. |
Could someone with a reproducer try to build numpy with this commit to get the latest OpenBLAS binaries? So something like (maybe with typos)
You should install gfortran with |
... this is for windows |
Is this only required for 32-bit? https://github.com/numpy/numpy/blob/master/azure-steps-windows.yml#L29-L31 I'll try what you suggest above with a |
Yes, you need gfortran. The Azure machines have mingw 64-bit installed. If you are on 32-bit, the invocation is a bit different. You also need to set |
I just verbatim copied most of https://github.com/numpy/numpy/blob/master/azure-steps-windows.yml using I then switched to |
... looks like there is no 32-bit OpenBLAS for 64-bit Windows in: https://anaconda.org/multibuild-wheels-staging/openblas-libs/files I guess I could add the tag to get it to use 64-bit OpenBLAS? |
2 are there and 1 is still being built. Should be up within the hour. |
In the meantime I added:
And it built just fine. No longer segfaults! I'll re-run it a few times just to be sure. Feel free to ping me when the 32-bit OpenBLAS Win64 libs are up and I can easily remove these lines and re-test. |
Any chance you could run the full test suite? :-)
|
Looks like the 32 bit ones are up, and that also works. I'll give the full test suite a run now |
It gets a weird test collection error involving |
You shouldn't waste any more time on this other issue - I can wait until next week and test the weekly which will hopefully have the BLAS. |
Note that we can run the nightly builds at any time by pushing a commit to the master branch. |
Ok, I'll wait until I see a new one to see if the issue with Windows 10 2004 is fixed. |
@bashtage Any update on this? |
OpenBLAS is still broken on the most recent release of Windows. It is very difficult to even get good debugging information because of the mixed toolchain, at least for me. |
FYI with OpenBLAS 0.3.16 it seems like the |
Hello,
I have recently started experiencing problems when running tests for my project on Azure Pipelines with a Windows instance (vmImage: 'windows-2019'). Digging a little bit deeper (see this conversation https://developercommunity.visualstudio.com/content/problem/1102472/azure-pipeline-error-with-windows-vm.html?childToView=1119179#comment-1119179) we realised that the problem originated when we install numpy 1.19.0 instead of numpy 1.18.5. I can see that numpy 1.19.0 was put on PyPI on June 20 and this is around the time when our tests started to fail. Forcing the environment to install numpy 1.18.5, as in previously successful builds, seems to solve the problem.
I just wanted to report this as I assume this is something others may have started observing (but it is quite hard to pinpoint that numpy is the issue... or at least looks like it is).
Looking forward to hearing from you,
and happy to do any change to my azure pipeline setup if that can help troubleshooting the problem.
Error message:
This build works fine with numpy 1.18.5: https://dev.azure.com/matteoravasi/PyLops/_build/results?buildId=46&view=logs&j=011e1ec8-6569-5e69-4f06-baf193d1351e
A build on the same commit with numpy 1.19.0 fails: https://dev.azure.com/matteoravasi/PyLops/_build/results?buildId=43&view=results
The error is very cryptic, what I explained above is more relevant I think. Here it is anyways: