Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Make attaching docstrings to __array_function__ funcs more robust #16209

Closed
wants to merge 1 commit into from

Conversation

rgommers
Copy link
Member

Closes gh-14384

add_docstring(
implement_array_function,
# We can't use add_newdoc (which is more robust than add_docstring) here to
# attach these docstrings (cyclic imports), and add_docstring() gives a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we resolve these cyclic imports instead?

Or alternatively, just move these docstrings to add_newdocs with all the others?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried for 15 minutes, it's nontrivial. The other option is to move the docstring into the C code. I don't think add_docstring should be used ever.

I'm going for a minimal fix for 1.19.x though, this is a painful issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or alternatively, just move these docstrings to add_newdocs with all the others?

Tried that too, doesn't work - add_newdoc cannot attached things to _multiarray_umath.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried that too, doesn't work - add_newdoc cannot attached things to _multiarray_umath.

I don't know what you mean by that - isn't that exactly what https://github.com/numpy/numpy/blob/master/numpy/core/_add_newdocs.py does already?

I think you'd probably use 'numpy.core.overrides' in that file, but it still should work just fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: This is about reloading and rerunning the same code twice. How about fixing this in add_docstring itself...

        doc_attr = PyObject_GetAttrString(obj, "__doc__");
        if (doc_attr != NULL && doc_attr != Py_None) {
            PyErr_Format(PyExc_RuntimeError, "object %s", msg);
            return NULL;
        }

replace that with a check that the old and new docstring are identical. And if they are simply do nothing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We won't notice if we set it twice by accident, but since it will still error if there is a disagreement, that seems not a huge issue and feels more reasonable then a try/except here that may hide errors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't really follow why this file does its own docstring attachment - can't we do it just like we do here:

add_newdoc('numpy.core.multiarray', 'datetime_data',

In both cases, it's attaching a docstring to an extension method from multiarray_umath.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: This is about reloading and rerunning the same code twice. How about fixing this in add_docstring itself...

        doc_attr = PyObject_GetAttrString(obj, "__doc__");
        if (doc_attr != NULL && doc_attr != Py_None) {
            PyErr_Format(PyExc_RuntimeError, "object %s", msg);
            return NULL;
        }

replace that with a check that the old and new docstring are identical. And if they are simply do nothing.

yes this would be really nice. another quick and hacky option maybe just filter "already has a docstring" exceptions and raise everything else ? this will avoid hiding other exceptions, but still mask errors with same msg caused by something else.

@rgommers
Copy link
Member Author

The one CI failure is a Mingw32 install problem.

@rgommers rgommers added this to the 1.19.0 release milestone May 11, 2020
@seberg
Copy link
Member

seberg commented May 11, 2020

@rgommers I assume there is an easy way to try this out locally inside spyder or IPython?

@seberg
Copy link
Member

seberg commented May 11, 2020

OK, this works:

import numpy as np
for k in list(sys.modules.keys()): 
   if "numpy" in k: 
        del sys.modules[k] 
import numpy as np

but as I suspected, all this does is move the error to the next function which is empty_like... Moving this to C is the only reasonable thing I suspect.
Even then, I still wonder if there isn't something that can be fixed up-stream to see where this usage actually comes from (or happens more often). It still feels like a generally unsafe thing to me.

@rgommers
Copy link
Member Author

@rgommers I assume there is an easy way to try this out locally inside spyder or IPython?

Not super easy, something like this isn't enough to trigger it:

import sys

import numpy
x = numpy.arange(3)
rng = numpy.random.RandomState(42)

del x
del numpy
del sys.modules['numpy']

import numpy
x = numpy.arange(3)
rng = numpy.random.RandomState(42)

print(sys.getrefcount(numpy))

from importlib import reload

reload(numpy)

I'll ask someone over in the other issue to test.

@rgommers
Copy link
Member Author

Ah thanks for the reproducer - yes need to delete recursively in sys.modules.

@rgommers
Copy link
Member Author

That makes for a good regression test.

@seberg
Copy link
Member

seberg commented May 11, 2020

Well, I don't know... Maybe my snippet is way too aggressive compared to what people actually do...

@rgommers
Copy link
Member Author

@seberg
Copy link
Member

seberg commented May 11, 2020

@rgommers but the fundamental problem remains... For example:

In [3]: import numpy as np 
   ...: print(id(np._NoValue)) 
   ...: for k in list(sys.modules.keys()):  
   ...:    if "numpy" in k:  
   ...:         del sys.modules[k]  
   ...: import numpy as np 
   ...: print(id(np._NoValue))                                                                                          
139846708487024
139846355005152

Thus, all C-functions which use np._NoValue as singleton can fail, if they were already executed once. Reloading should at least come with a big warning that this is very much unsafe!

@seberg
Copy link
Member

seberg commented May 11, 2020

But I will agree that this, unlike sub-interpreters, is very much fixable and probably only reasonably difficult (although easy to miss a case here and there).

@rgommers
Copy link
Member Author

Reloading should at least come with a big warning that this is very much unsafe!

That makes sense. How/where to raise it though?

@seberg
Copy link
Member

seberg commented May 11, 2020

Probably make it as specific as possible, in multiarraymodule.c:

PyMODINIT_FUNC PyInit__multiarray_umath(void) {

Just add somethign along the lines of:

static int initialized = 0;

if (initialized != 0) {
     /* Give warning */
}
intialized = 1;

EDIT: This may not work because spyder/pycharm may be smart enough to not reload C-extension modules, but still agressive enough to reload the dependencies of those C-modules, leaving us in the same inconsistent state. One option could be to make a C-function which returns and caches np._NoValue and the main __init__.py can query that at the end, and check that np._NoValue is np.core.multiarray._multiarray_umath._get_novalue().

@charris
Copy link
Member

charris commented May 12, 2020

close/reopen

@charris charris closed this May 12, 2020
@charris charris reopened this May 12, 2020
@charris charris changed the title BUG: make attaching docstrings to __array_function__ funcs more robust BUG: Make attaching docstrings to __array_function__ funcs more robust May 13, 2020
Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I might prefer the C-level or even just checking if implementation.__doc__ == dispatcher.__doc__ here (moving it to C "solves" the issue of dealing with the duplication here – at least leaves this code untouched).

As is now, if there is some other error, for any weird reason we won't notice. And if we do actually add it twice, we could be using a second incorrect doc version and also not notice.

@@ -29,3 +32,13 @@ def test_novalue():
assert_equal(repr(np._NoValue), '<no value>')
assert_(pickle.loads(pickle.dumps(np._NoValue,
protocol=proto)) is np._NoValue)

def test_reimport():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I am a bit worried about this test. Because it definitely messes up internal/cached. So if for example we have one place before and after this relying on that state (testing certain exceptions, using np._NoValue, or checking np.matrix on the C-side), this will lead to cryptic test failures.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about running it in a subprocess similar to some of the tests in test_io.py

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Somehow I completely forgot about that possibility!

@rgommers
Copy link
Member Author

I'll close this, continued in gh-16239

@rgommers rgommers closed this May 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RuntimeError: implement_array_function method already has a docstring
5 participants