BUG: np.vectorize() functions operate twice on the first element (as seen when the function modifies a mutable object) #8758

danjump · 2017-03-08T21:52:21Z

I have a list of dictionaries. I'm trying to use np.vectorize to apply a function that modifies dictionary elements for each dictionary in the list. The results seem to show that vectorize is acting twice on the first element. Is this a bug that can be fixed?(perhaps related to the fact that vectorize checks type on the first element?) Below are some example cases and output:

A simple test case with no dictionary modifications:

def fcn1(x):
    return x['b']
a = [{'b': 1} for _ in range(3) ]
print(a)
print(np.vectorize(fcn1)(a))
print(a, '\n\n')

output:

[{'b': 1}, {'b': 1}, {'b': 1}]
[1 1 1]
[{'b': 1}, {'b': 1}, {'b': 1}]

Now modify the dictionary and see that the function is applied twice to the first element:

def fcn2(x):
    x['b'] += 1
    return x['b']
a = [{'b': 1} for _ in range(3) ]
print(a)
print(np.vectorize(fcn2)(a))
print(a, '\n\n')

output:

[{'b': 1}, {'b': 1}, {'b': 1}]
[3 2 2]
[{'b': 3}, {'b': 2}, {'b': 2}]

Try a different modification to check consistency of the bug:

def fcn3(x):
    x['b'] *= 2
    return x['b']
a = [{'b': 1} for _ in range(3) ]
print(a)
print(np.vectorize(fcn3)(a))
print(a, '\n\n')

output:

[{'b': 1}, {'b': 1}, {'b': 1}]
[4 2 2]
[{'b': 4}, {'b': 2}, {'b': 2}]

You can do the same thing without actually providing a return value (which is how i'm trying to use it in my use case):

def fcn4(x):
    x['b'] += 1
a = [{'b': 1} for _ in range(3) ]
print(a)
np.vectorize(fcn4)(a)
print(a, '\n\n')

output:

[{'b': 1}, {'b': 1}, {'b': 1}]
[{'b': 3}, {'b': 2}, {'b': 2}]

And by the way, there is nothing special about a list of length 3, you can change that and see the same behavior of only the first element being double-modified.

I'm confirmed the behavior using both Numpy version 1.11.3 and 1.12.0

EDIT:
I found a work-around that also confirms its a "testing the type on the first element" issue. If you specify the otypes argument, the first element doesn't get hit twice:

def fcn(x):
    x['b'] += 1
    return x['b']
a = [{'b': 1} for _ in range(3)]
print a
print np.vectorize(fcn, otypes=[dict])(a)
print a, '\n\n'

output:

[{'b': 1}, {'b': 1}, {'b': 1}]
[2 2 2]
[{'b': 2}, {'b': 2}, {'b': 2}]

The text was updated successfully, but these errors were encountered:

danjump · 2017-03-08T22:07:15Z

I just edited my post with a new test that seems to confirm checking the first item type IS what causes the problem

eric-wieser · 2017-03-08T22:09:01Z

Simpler testcase:

a = np.array([1, 2, 3])
def f(x):
    print('got', x)
    return x
fv = np.vectorize(f)
y = fv(a)

Gives:

got 1
got 1
got 2
got 3

warut-vijit · 2017-03-08T22:10:43Z

"If otypes is not specified, then a call to the function with the first argument will be used to determine the number of outputs. The results of this call will be cached if cache is True to prevent calling the function twice. However, to implement the cache, the original function must be wrapped which will slow down subsequent calls, so only do this if your function is expensive."
https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html

So it's well-documented behavior, but does seem counterintuitive. So maybe its more of an enhancement than a bug fix.

danjump · 2017-03-08T22:13:53Z

ah, clearly I didn't read all of the docs well enough. That does fix the issue of double calls but at the cost of slower execution. So I guess there is no way to prevent this double-calling while still maintaining performance?

danjump · 2017-03-08T22:32:10Z

for example, in numpy/lib/function_base.py in the vectorize class function _get_ufunc_and_otypes(), I would naively think you could modify these lines:

inputs = [arg.flat[0] for arg in args]
outputs = func(*inputs)

to:

#earlier
import copy

...

inputs = copy.deepcopy([arg.flat[0] for arg in args])
outputs = func(*inputs)

And then you would not have to use the cacheing or specify otypes but I think you avoid hitting the actual mutable elements twice. But I'm ignorant to how much of a performance hit that would give compared to the cacheing.

I'm just getting the sense that the caching behavior was designed with expensive function execution time in mind, not thinking about the case of a function modifying a mutable object. I would think it's potentially possible to accommodate mutable object modification without the performance hit of caching that had a long function execution time in mind.

warut-vijit · 2017-03-08T22:46:11Z

I think that for vectorized operations on arrays of very large objects, it might actually be more computationally intensive to make a deep copy of the first element than it would cost to apply the function. So it might be a good idea to have that functionality if the user doesn't specify otype, but then say in the documentation that performance may take a hit unless otype is specified for arrays of large objects. @eric-wieser what do you think?

shoyer · 2017-03-08T23:44:21Z

copy.deepcopy() is not safe on many inputs, so unfortunately that's not a viable option.

The only way to fix this would be to rewrite the core of vectorize, which currently uses numpy.frompyfunc to create an actually numpy ufunc. An alternate inner loop would need to be created, similar to what we use for np.apply_along_axis. Unfortunately, to avoid performance degradation, I think we would need to do the loop in C (like frompyfunc does in the ufunc it constructs).

eric-wieser added the 00 - Bug label Apr 21, 2017

jeremysalwen mentioned this issue Nov 17, 2020

Soundness bug with connected components sambayless/monosat#29

Closed

ivanzvonkov mentioned this issue Aug 21, 2022

Stable 0.1.3 nasaharvest/openmapflow#82

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: np.vectorize() functions operate twice on the first element (as seen when the function modifies a mutable object) #8758

BUG: np.vectorize() functions operate twice on the first element (as seen when the function modifies a mutable object) #8758

danjump commented Mar 8, 2017 •

edited

danjump commented Mar 8, 2017

eric-wieser commented Mar 8, 2017 •

edited

warut-vijit commented Mar 8, 2017 •

edited

danjump commented Mar 8, 2017

danjump commented Mar 8, 2017 •

edited

warut-vijit commented Mar 8, 2017 •

edited

shoyer commented Mar 8, 2017

BUG: np.vectorize() functions operate twice on the first element (as seen when the function modifies a mutable object) #8758

BUG: np.vectorize() functions operate twice on the first element (as seen when the function modifies a mutable object) #8758

Comments

danjump commented Mar 8, 2017 • edited

danjump commented Mar 8, 2017

eric-wieser commented Mar 8, 2017 • edited

warut-vijit commented Mar 8, 2017 • edited

danjump commented Mar 8, 2017

danjump commented Mar 8, 2017 • edited

warut-vijit commented Mar 8, 2017 • edited

shoyer commented Mar 8, 2017

danjump commented Mar 8, 2017 •

edited

eric-wieser commented Mar 8, 2017 •

edited

warut-vijit commented Mar 8, 2017 •

edited

danjump commented Mar 8, 2017 •

edited

warut-vijit commented Mar 8, 2017 •

edited