BUG: np.vectorize() functions operate twice on the first element (as seen when the function modifies a mutable object) #8758
Comments
I just edited my post with a new test that seems to confirm that checking the type of the first item IS what causes the problem.
Simpler testcase:
import numpy as np
a = np.array([1, 2, 3])
def f(x):
    print('got', x)
    return x
fv = np.vectorize(f)
y = fv(a)
Gives:
"If otypes is not specified, then a call to the function with the first argument will be used to determine the number of outputs. The results of this call will be cached if cache is True to prevent calling the function twice. However, to implement the cache, the original function must be wrapped which will slow down subsequent calls, so only do this if your function is expensive." So it's well-documented behavior, but it does seem counterintuitive. Maybe it's more of an enhancement than a bug fix.
Ah, clearly I didn't read all of the docs well enough. That does fix the issue of double calls, but at the cost of slower execution. So I guess there is no way to prevent this double-calling while still maintaining performance?
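A minimal sketch for comparing the three call patterns discussed above (default, otypes given, cache=True) by recording every call. The `run` helper and the `calls` list are illustrative names, and the behavior shown assumes np.vectorize still probes the first element when otypes is omitted, as the quoted documentation describes:

```python
import numpy as np

calls = []  # records every value the wrapped function is called with

def f(x):
    calls.append(int(x))
    return x

a = np.array([1, 2, 3])

def run(**kwargs):
    """Illustrative helper: vectorize f with the given options, return the call log."""
    calls.clear()
    np.vectorize(f, **kwargs)(a)
    return list(calls)

default_calls = run()              # probe call on the first element, then one call per element
otypes_calls = run(otypes=[int])   # output type given up front, so no probe call
cached_calls = run(cache=True)     # probe result is reused for the first element

print(default_calls, otypes_calls, cached_calls)
```

Only the default configuration calls the function an extra time on the first element; the other two differ in how they avoid it (skipping the probe versus wrapping the function to reuse its result).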
for example, in
inputs = [arg.flat[0] for arg in args]
outputs = func(*inputs)
to:
#earlier
import copy
...
inputs = copy.deepcopy([arg.flat[0] for arg in args])
outputs = func(*inputs)
And then you would not have to use the caching or specify otypes, but I think you would avoid hitting the actual mutable elements twice. I don't know how much of a performance hit that would be compared to the caching, though. I'm just getting the sense that the caching behavior was designed with expensive function execution time in mind, not the case of a function modifying a mutable object. I would think it's possible to accommodate mutable object modification without the performance hit of caching, which had long function execution times in mind.
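The deep-copy idea can be tried from user code without touching NumPy itself. The sketch below is a hypothetical wrapper (`vectorize_with_copied_probe` is not a NumPy function) that runs the probe call on a deep copy of the first inputs and then passes the inferred type to np.vectorize through otypes. It assumes the function has a single output:

```python
import copy
import numpy as np

def vectorize_with_copied_probe(func):
    """Hypothetical helper: infer the output type from a deep copy of the first inputs."""
    def wrapper(*args):
        arrays = [np.asarray(arg) for arg in args]
        # Side effects of the probe call land on copies, not on the real elements.
        probe_inputs = copy.deepcopy([arr.flat[0] for arr in arrays])
        probe_output = func(*probe_inputs)
        # Assumption: func returns a single value, so one otype is enough.
        otype = np.asarray(probe_output).dtype
        return np.vectorize(func, otypes=[otype])(*args)
    return wrapper

def bump(d):
    d["n"] += 1      # mutates the dictionary in place
    return d["n"]

records = [{"n": 0}, {"n": 0}, {"n": 0}]
result = vectorize_with_copied_probe(bump)(records)
print([d["n"] for d in records], result)
```

Each dictionary is modified exactly once here, at the price of one deep copy and one extra function call per invocation.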
I think that for vectorized operations on arrays of very large objects, it might actually be more computationally intensive to make a deep copy of the first element than it would cost to apply the function. So it might be a good idea to have that functionality if the user doesn't specify
The only way to fix this would be to rewrite the core of
I have a list of dictionaries. I'm trying to use np.vectorize to apply a function that modifies dictionary elements for each dictionary in the list. The results seem to show that vectorize is acting twice on the first element. Is this a bug that can be fixed? (Perhaps it is related to the fact that vectorize checks the type on the first element?) Below are some example cases and output:
A simple test case with no dictionary modifications:
output:
Now modify the dictionary and see that the function is applied twice to the first element:
output:
Try a different modification to check consistency of the bug:
output:
You can do the same thing without actually providing a return value (which is how I'm trying to use it in my use case):
output:
And by the way, there is nothing special about a list of length 3, you can change that and see the same behavior of only the first element being double-modified.
I've confirmed the behavior using both NumPy versions 1.11.3 and 1.12.0.
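A minimal sketch of the behavior described above, with a function that modifies each dictionary in place and returns nothing. The names `add_one` and `count` are illustrative, not the ones from the cases above:

```python
import numpy as np

def add_one(d):
    d["count"] += 1  # modify the dictionary in place, no return value

records = [{"count": 0} for _ in range(3)]

# No otypes given, so vectorize calls add_one on the first element
# once to inspect the output, then once more with the rest.
np.vectorize(add_one)(records)

counts = [d["count"] for d in records]
print(counts)  # [2, 1, 1]
```

Changing the length of `records` should leave the pattern the same: only the first dictionary is incremented twice.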
EDIT:
I found a workaround that also confirms it's a "testing the type on the first element" issue. If you specify the otypes argument, the first element doesn't get hit twice:
output: