Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

numpy load function with evil data will cause command execution #12759

Closed
nanshihui opened this issue Jan 16, 2019 · 32 comments
Closed

numpy load function with evil data will cause command execution #12759

nanshihui opened this issue Jan 16, 2019 · 32 comments

Comments

@nanshihui
Copy link

numpy load function with evil data will cause command execution,if attack share evil data on internet,
when user load it , it will cause command execution.

Reproducing code example:

import numpy
from numpy import __version__
print __version__
import os
import  pickle
class Test(object):
    def __init__(self):
        self.a = 1

    def __reduce__(self):
        return (os.system,('ls',))
tmpdaa = Test()
with open("a-file.pickle",'wb') as f:
    pickle.dump(tmpdaa,f)
numpy.load('a-file.pickle')

Numpy/Python version information:

1.14.6

@nanshihui
Copy link
Author

the version <=1.16.0 , worked

@seberg
Copy link
Member

seberg commented Jan 16, 2019

Yes, which is why np.load(allow_pickle=True) was added, now I guess we could make a move to switch over to defaulting to False and give a well readable message " use allow_pickle="True" if you trust this file".

I agree that this would be the better default, so I am open to pushing that deprecation, even if it is unfortunately a bit noisy e.g. for all those scientists just sharing some data in the lab (or just saving/reloading themselves).

@seberg
Copy link
Member

seberg commented Jan 16, 2019

So allow_pickle was added in april 2015, so it would seem that is should have existed since numpy 1.10. So I think that move does get more realistic now, since I doubt many using/supporting 1.17 will also still support 1.10 (removing the pain of supporting the kwarg or not supporting it). Although for the moment it seems scipy at least still supports 1.8 in version 1.

@nanshihui
Copy link
Author

it seems it will be last for a long time

@Plazmaz
Copy link

Plazmaz commented Jan 17, 2019

I would suggest logging a deprecation warning and giving a date if you want a smooth transition.

@seberg
Copy link
Member

seberg commented Jan 17, 2019

@Plazmaz of course, I would go with a VisibleDeprecationWarning, if we want casual users to stop doing it. Then deprecate after one or two releases. The thing is that it is annoying to work around if you have to and the kwarg does not exist in some older versions. Because then you have to do if np.__version__ > ...: use kwarg else do not use kwarg to avoid the warning and support both.

Anyway, I think there is a good chance you can get it into 1.17. So if you feel open a PR, but we may want to ping the mailing list to see if someone complains.

@limburgher
Copy link
Contributor

Hi, Fedora numpy RPM maintainer. What's a good way to mitigate this in distro packaging?

@seberg
Copy link
Member

seberg commented Jan 22, 2019

I do not know of a nice way. Depending on the concern level, I would be up to adding a warning very soon, so that it is definitely there in 1.17. If someone is extremely concerned, we could discuss backporting it or moving quicker, but that would depend a lot on whether or not downstream depends on it.

@jhanwar
Copy link

jhanwar commented Jan 23, 2019

I am working on this.

@tylerjereddy
Copy link
Contributor

cc @jeanqasaur re: security / vulnerability expertise

@eric-wieser
Copy link
Member

eric-wieser commented Jan 31, 2019

Hi, Fedora numpy RPM maintainer. What's a good way to mitigate this in distro packaging?

@limburgher: what does fedora do about the exact same functionality built into Python? It's not clear that this is something that needs mitigating.

While I'm not opposed to changing the default, it seems wrong to declare this a vulnerability. It's working as documented and designed.

@njsmith
Copy link
Member

njsmith commented Jan 31, 2019

Unfortunately the rule is that once a CVE number is assigned, it no longer matters whether there is any bug or not, the distros have to try to do something to prove to their customers that they are Providing Value. Not sure what that would be here, but companies and ops people are always struggling to manage the ongoing flood of vulnerabilities, and the tools they use to do this don't have a lot of room for communicating nuance, so that's the way the pressure goes. We don't have customers though, so we shouldn't necessarily take that into account outselves.

We can tell during save and load whether a particular file uses pickle or not, right? It probably is good to migrate to allow_pickle=False in both cases, with an intermediate period where we issue some kind of deprecation warning exactly in the cases where save or load actually does need to use pickle and allow_pickle wasn't specified.

@eric-wieser The difference from the stdlib pickle is that load/save actually can avoid using pickle in most cases (e.g. simple arrays of primitive types); pickle only gets used in more exotic cases like object arrays or IIRC certain complicated dtypes. This makes it possible for folks who are mostly using the safe case to miss that the unsafe case exists, if they don't read the docs closely enough. And anyway, given that we have both a "safe mode" and an "unsafe mode", it's better for the "safe mode" to be the default. For stdlib pickle OTOH, it's always 100% unsafe 100% of the time so there's no point in worry about defaults.

@limburgher
Copy link
Contributor

Honestly, if it's documented, intentional functionality, I can close the BZ in good conscience, especially if safe is the default. I don't know how we handle Python's functionality. I'll look.

@limburgher
Copy link
Contributor

From my examination of the spec, I don't think we alter anything from upstream in that regard.

@Plazmaz
Copy link

Plazmaz commented Jan 31, 2019

Has the CVE been disputed? That might make the scenario clearer to maintainers.

@pv
Copy link
Member

pv commented Feb 1, 2019

The CVE appears largely bogus. That numpy.load can execute arbitrary code is well known and documented, and it is necessary for loading serialized Python object arrays. The user can forbid loading object arrays by passing allow_pickle=False to this library function.

It would have been better if the default had been to load object arrays only when explicitly asked, but it is as it is for historical reasons. The transition has been suggested also before, and the discussion above is about how to make it in a way that does not uncontrollably break backward compatibility.

Careless use of numpy.load, similarly as of Python pickling, can however lead to vulnerabilities in downstream applications.

@adeak
Copy link
Contributor

adeak commented Feb 1, 2019

That numpy.load can execute arbitrary code is well known and documented, and it is necessary for loading serialized Python object arrays.

I would rather only say that it is documented. I've been using numpy for a few years, and while I'm not a frequent user of numpy.save/numpy.load it wasn't obvious to me at all that numpy.load suffers from the same vulnerability as pickle does. Of course I didn't know that numpy.load might use pickle under the hood (I only use numpy-native arrays and never gave it a thought, exactly the scenario that @njsmith mentioned).

The fact that pickle is vulnerable is well-known, and its documentation has a big red warning on top saying

Warning: The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.

In comparison, the docs of numpy.load seems to mention the whole security aspect as an aside in the description of its allow_pickle keyword:

allow_pickle: bool, optional
Allow loading pickled object arrays stored in npy files. Reasons for disallowing pickles include security, as loading pickled data can execute arbitrary code. If pickles are disallowed, loading object arrays will fail. Default: True

I wouldn't hate it if we could put a Big Red Warning into the documentation of numpy.load, at least until allow_pickle=False becomes the default. Until that change is made seeing numpy.load should raise the same red flags in one's mind that pickle.load raises.

@mattip
Copy link
Member

mattip commented Feb 2, 2019

Documentation PR welcome for numpy.load

adeak added a commit to adeak/numpy that referenced this issue Feb 2, 2019
Load uses pickle under the hood for object arrays, this is made
more visible in the documentation using a warning.
See also numpygh-12759
@mattip
Copy link
Member

mattip commented Feb 3, 2019

Documentation now has a warning about pickle

@mcepl
Copy link

mcepl commented Feb 4, 2019

Unfortunately the rule is that once a CVE number is assigned, it no longer matters whether there is any bug or not, the distros have to try to do something to prove to their customers that they are Providing Value. Not sure what that would be here, but companies and ops people are always struggling to manage the ongoing flood of vulnerabilities, and the tools they use to do this don't have a lot of room for communicating nuance, so that's the way the pressure goes.

@njsmith it is not so bad: we will make numpy.load default to allow_pickle to False, which is actually not completely stupid idea.

mattip pushed a commit to mattip/numpy that referenced this issue Apr 16, 2019
charris pushed a commit to ivanov/numpy that referenced this issue Apr 17, 2019
charris pushed a commit to charris/numpy that referenced this issue Apr 17, 2019
@Manjunath07
Copy link

Could you please close this issue as it is referenced in https://nvd.nist.gov/vuln/detail/CVE-2019-6446 because of which nexus iq still consider it has Vulnerable

@rgommers
Copy link
Member

thanks @Manjunath07

NexediGitlab pushed a commit to Nexedi/wendelin that referenced this issue Mar 2, 2023
…at to numpy

Base_wendelinTextToNumpy just takes a string and pass it to numpy.load,
but numpy.load will load pickles (problem from numpy/numpy#12759 ).
Therefore we disallow loading pickles here until this becomes the
default in newer numpy versions.

Co-authored-by: Jérome Perrin <jerome@nexedi.com>
NexediGitlab pushed a commit to Nexedi/wendelin that referenced this issue Mar 3, 2023
…at to numpy

Base_wendelinTextToNumpy just takes a string and pass it to numpy.load,
but numpy.load will load pickles (problem from numpy/numpy#12759 ).
Therefore we disallow loading pickles here until this becomes the
default in newer numpy versions.

Co-authored-by: Jérome Perrin <jerome@nexedi.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests