
1.19.3 regression: Memory Error on un-pickle of large arrays #17825

Closed
yaav opened this issue Nov 22, 2020 · 15 comments
@yaav

yaav commented Nov 22, 2020

This looks like a numpy 1.19.3 regression: the same code works with numpy 1.19.2 and all other packages unchanged.

Environment:

C:\Python39>python -m pip list
Package         Version
--------------- ---------
certifi         2020.11.8
chardet         3.0.4
idna            2.10
lxml            4.6.1
numpy           1.19.3
pandas          1.1.4
pip             20.2.4
python-dateutil 2.8.1
pytz            2020.4
requests        2.25.0
setuptools      50.3.2
six             1.15.0
tqdm            4.53.0
urllib3         1.26.2
wheel           0.35.1

C:\Python39>python --version
Python 3.9.0

C:\Python39>systeminfo
OS Name:                  Windows 10 Pro
OS Version:                10.0.19042 N/A Build 19042
OS Manufacturer:           Microsoft Corporation
OS Configuration:          Standalone Workstation
OS Build Type:             Multiprocessor Free
System Manufacturer:       LENOVO
System Model:              20MF000WRT
System Type:               x64-based PC
Processor(s):              1 Processor(s) Installed.
                           [01]: Intel64 Family 6 Model 158 Stepping 10 GenuineIntel ~1405 Mhz
BIOS Version:              LENOVO N2EET50W (1.32 ), 23.09.2020
Total Physical Memory:     65 073 MB

Sample code to trigger the issue

Be careful! More than 40 GB of RAM is required!

import pandas as pd
import multiprocessing as mp
import numpy as np


def test(index):
    size = 13000000
    d = {
        "arr1": np.ndarray((size,)),
        "arr2": np.ndarray((size,)),
        "arr3": np.ndarray((size,)),
        "arr4": np.full((size,), index),
        "arr5": np.full((size,), True),
        "arr6": np.full((size,), "This is a test string")
    }
    return pd.DataFrame.from_dict(d)


def main():
    with mp.Pool() as pool:
        res = pool.map(test, range(10))
    print(res[0].index.size)


if __name__ == '__main__':
    main()

Expected output:

13000000

Actual output:

Traceback (most recent call last):
  File "C:\Users\ya-a-\PycharmProjects\MyScripts\test.py", line 26, in <module>
    main()
  File "C:\Users\ya-a-\PycharmProjects\MyScripts\test.py", line 21, in main
    res = pool.map(test, range(10))
  File "C:\Python39\lib\multiprocessing\pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Python39\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[          arr1  arr2  arr3  arr4  arr5                   arr6
0          0.0   0.0   0.0     7  True  This is a test string
1          0.0   0.0   0.0     7  True  This is a test string
2          0.0   0.0   0.0     7  True  This is a test string
3          0.0   0.0   0.0     7  True  This is a test string
4          0.0   0.0   0.0     7  True  This is a test string
...        ...   ...   ...   ...   ...                    ...
12999995   0.0   0.0   0.0     7  True  This is a test string
12999996   0.0   0.0   0.0     7  True  This is a test string
12999997   0.0   0.0   0.0     7  True  This is a test string
12999998   0.0   0.0   0.0     7  True  This is a test string
12999999   0.0   0.0   0.0     7  True  This is a test string

[13000000 rows x 6 columns]]'. Reason: 'MemoryError()'

Process finished with exit code 1
@yaav yaav changed the title 1.17.3 regression: Memory Error on un-pickle of large arrays 1.19.3 regression: Memory Error on un-pickle of large arrays Nov 23, 2020
@mattip mattip modified the milestones: 1.20.0 release, 1.19.5 release Nov 23, 2020
@mattip
Member

mattip commented Nov 23, 2020

Maybe connected with int32/int64 confusion and Windows?
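For context on that hypothesis, the default NumPy integer dtype in that era was platform-dependent; a minimal sketch to check it on any machine (the printed values are what differs between Windows and Linux/macOS, so they are shown as comments, not asserted):

```python
import numpy as np

# On Windows (NumPy < 2.0), np.int_ maps to the 32-bit C long, while on
# Linux/macOS it is 64-bit; size computations that silently use the
# default integer are a classic source of Windows-only overflow bugs.
default_bits = np.dtype(np.int_).itemsize * 8
index_bits = np.dtype(np.intp).itemsize * 8  # pointer-sized index type

print(default_bits)  # 32 on Windows (pre-NumPy 2.0), 64 on Linux/macOS
print(index_bits)    # 64 on any 64-bit OS
```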

@seberg
Member

seberg commented Dec 11, 2020

Just noticed this was tagged for 1.19.5. There were very few changes between the two versions and right now I am not sure where to look for a regression.
@yaav could you confirm for us that nothing else is different? Just to be sure that not only pandas is the same, but also the Python version and operating system (or even the computer)?

@yaav
Author

yaav commented Dec 11, 2020

@seberg, yes, everything else is the same: computer, Python, all Python libraries. If I just install numpy==1.19.2, the issue disappears.

@seberg
Member

seberg commented Dec 11, 2020

@yaav would you be able to quickly also check 1.19.4, just to be sure it wasn't some very strange thing around OpenBLAS? I will try to reproduce it today (or at least check out if I can see a memory bloat difference, I guess).

@yaav
Author

yaav commented Dec 12, 2020

@seberg, unfortunately 1.19.4 doesn't work for me at all:

C:\Python39\python.exe test.py
 ** On entry to DGEBAL parameter number  3 had an illegal value
 ** On entry to DGEHRD  parameter number  2 had an illegal value
 ** On entry to DORGHR DORGQR parameter number  2 had an illegal value
 ** On entry to DHSEQR parameter number  4 had an illegal value
Traceback (most recent call last):
  File "C:\Users\ya-a-\PycharmProjects\MyScripts\test.py", line 1, in <module>
    import pandas as pd
  File "C:\Python39\lib\site-packages\pandas\__init__.py", line 11, in <module>
    __import__(dependency)
  File "C:\Python39\lib\site-packages\numpy\__init__.py", line 305, in <module>
    _win_os_check()
  File "C:\Python39\lib\site-packages\numpy\__init__.py", line 302, in _win_os_check
    raise RuntimeError(msg.format(__file__)) from None
RuntimeError: The current Numpy installation ('C:\\Python39\\lib\\site-packages\\numpy\\__init__.py') fails to pass a sanity check due to a bug in the windows runtime. See this issue for more information: https://tinyurl.com/y3dm3h86

@charris
Member

charris commented Dec 12, 2020

@yaav 1.19.4 and 1.19.3 are the same except for the OpenBLAS library.

@seberg
Member

seberg commented Dec 12, 2020

Well, the only serious changes between 1.19.2 and 1.19.3 are OpenBLAS and the buffer-info fix. And I don't see how the buffer info fix can be incorrect (and doubt that would go unnoticed). Yes, buffers are used in pickle, but the contiguous flag is not touched, and the stride sanitation doesn't seem to have a bug, so I honestly don't see what could have changed between these two versions...

@charris
Member

charris commented Dec 12, 2020

Looks like you are running Windows version 2004. Out of curiosity, when did you upgrade?
Note that there is a pickling fix between 1.19.2 and 1.19.3, see #17059 BUG: fix pickling of arrays larger than 2GiB.

Might be worth trying 1.20.0rc1 to see if the problem is still there.
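For background, the fix in #17059 routes large contiguous arrays through pickle protocol 5 out-of-band buffers; a minimal sketch of that mechanism (using a small array as a stand-in, not the >2 GiB case the fix actually targets):

```python
import pickle
import numpy as np

arr = np.zeros(1_000_000)  # small stand-in for the multi-GiB arrays above

# With protocol 5 and a buffer_callback, NumPy exports the array payload
# as an out-of-band PickleBuffer instead of copying it into the stream.
buffers = []
data = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

restored = pickle.loads(data, buffers=buffers)
assert np.array_equal(arr, restored)
print(len(buffers))  # the contiguous array is exported as one buffer
```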

@yaav
Author

yaav commented Dec 12, 2020

@charris, the Windows upgrade date is June 23rd, 2020.
numpy==1.20.0rc1 doesn't cause the RuntimeError, but the MemoryError is still there.
BTW, the memory issue is intermittent but quite frequent: it is almost unreproducible if I set the number of processes to 5 in the reproducer I provided, and it happens in ~80% of runs with the default number (10).

@seberg
Member

seberg commented Dec 12, 2020

Just to be clear. I am officially out of ideas. The only idea I had was that the pickle5 buffer export fails for some reason, so it uses a more bloated way to export the buffer. I cannot reproduce that, checking explicitly for whether that buffer export failed.

For the moment, I assume that this is random behaviour, and the memory peak is just randomly higher on 1.19.3 and above. I do not know why, nor what the maximum realistic memory usage per thread is (to estimate what the maximum memory usage could be in the worst case).

@h-vetinari
Contributor

Not sure if this is a reasonable candidate, but since @yaav is on Windows version 2004, maybe just maybe, there are some other windows runtime issues like the register corruption from #16744? (Don't think there should be any fmod-operations on array-init, except perhaps - very creatively - to recalculate striding?)

In any case, if @yaav were able to upgrade past build 20270, we could test that hypothesis (though I realise this is a non-trivial effort since that build is currently only in the dev channel, not beta or preview yet).

@charris
Member

charris commented Jan 29, 2021

I pushed this off to the 1.20.1 release, as I expect the Windows update will be out before then.

@mattip
Member

mattip commented Feb 3, 2021

Windows update is out. @yaav can you update to get the fix and try to reproduce?

@yaav
Author

yaav commented Feb 4, 2021

@mattip, just checked numpy==1.20.0 with both Python 3.8 and 3.9 after today's Windows patches: the issue is gone.

@charris charris closed this as completed Feb 4, 2021
@charris
Member

charris commented Feb 4, 2021

@yaav Thanks for the update. I'll close this now. Feel free to reopen if the problem returns.

@charris charris removed this from the 1.20.1 release milestone Feb 4, 2021