make internal broadcast and unbroadcast both primitives #292

Open · mattjj wants to merge 4 commits into dev-1.2

Conversation

@mattjj (Contributor) commented Sep 12, 2017

At @dougalm's suggestion, I took a stab at making our internal broadcast and unbroadcast functions into primitives. They seem to form a nice pair!

This might prevent graph expansion (the non-primitive versions get traced into many graph nodes) and be a bit faster, though I haven't actually run the asv benchmarks on this change yet.

Any thoughts on this first pass, @j-towns?
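
For context, a minimal sketch of the pattern being proposed (not the PR's actual diff; the signatures and the use of the autograd.extend API are assumptions):

import numpy as onp
from autograd.extend import primitive, defvjp

@primitive
def broadcast(x, target_shape):
    # Zero-copy broadcast of x up to target_shape.
    return onp.broadcast_to(x, target_shape)

@primitive
def unbroadcast(g, target_shape):
    # Sum g back down to target_shape, undoing a broadcast.
    while onp.ndim(g) > len(target_shape):
        g = onp.sum(g, axis=0)
    for axis, size in enumerate(target_shape):
        if size == 1:
            g = onp.sum(g, axis=axis, keepdims=True)
    return g

# The pair is closed under differentiation: each primitive's VJP is
# (essentially) the other one, so reverse passes stay on these two
# nodes instead of tracing through helper code.
defvjp(broadcast,
       lambda ans, x, target_shape: lambda g: unbroadcast(g, onp.shape(x)))
defvjp(unbroadcast,
       lambda ans, g, target_shape: lambda out: broadcast(out, onp.shape(g)))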

@j-towns (Collaborator) commented Sep 13, 2017

Yeah, this looks like a good idea. I'd been meaning to switch to using numpy's broadcast_to inside our broadcast for a while.
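
For reference, numpy's broadcast_to returns a read-only view rather than allocating a new array, which is what makes it attractive here:

import numpy as np

x = np.ones((3, 1))
y = np.broadcast_to(x, (3, 4))
assert np.shares_memory(x, y)   # a view onto x, not a copy
assert not y.flags.writeable    # read-only by default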

@j-towns (Collaborator) commented Sep 13, 2017

Also, I feel like we should get the optional vspace checking set up as a matter of priority, so that we can rigorously test that these functions are outputting the correct thing (dtype in particular).
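
A hedged sketch of what such a test-time check could look like (assumes the dev-1.2 extend API, and that autograd's array vspaces expose shape and dtype; the helper name is hypothetical):

from autograd.extend import vspace

def assert_same_vspace(result, example):  # hypothetical test helper
    vs_r, vs_x = vspace(result), vspace(example)
    # dtype is the case called out above; shape catches broadcast bugs.
    assert vs_r.shape == vs_x.shape, (vs_r.shape, vs_x.shape)
    assert vs_r.dtype == vs_x.dtype, (vs_r.dtype, vs_x.dtype)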

Review thread on this hunk of the diff:

target_shape, target_ndim, _, target_iscomplex = target_meta
x_shape = onp.shape(x)
while onp.ndim(x) > target_ndim:
    x = onp.sum(x, axis=broadcast_idx)

@j-towns (Collaborator) commented Sep 13, 2017

I was wondering if we should replace the above two lines with:

x = onp.sum(x, axis=tuple(range(broadcast_idx, broadcast_idx + onp.ndim(x) - target_ndim)))

or similar. Am I right that only calling sum once might lead to better performance, basically because only one output array has to be allocated?

@mattjj (Contributor, author)

I briefly tried something like that (though if broadcast_idx is -1, which is the only nonzero use case I noticed in the code, then I think we want something different) and it didn't seem to make a speed difference, so I dropped it. Now is a good time to make sure it's performant, though!
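
For concreteness, a sketch of the single-call version that treats broadcast_idx == -1 by summing trailing rather than leading axes (one reading of "something different" above; the exact semantics of broadcast_idx in the internals are an assumption):

extra = onp.ndim(x) - target_ndim
if extra > 0:
    if broadcast_idx == 0:
        axes = tuple(range(extra))       # sum the leading broadcast axes
    else:                                # broadcast_idx == -1
        axes = tuple(range(-extra, 0))   # sum the trailing broadcast axes
    x = onp.sum(x, axis=axes)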

@j-towns (Collaborator)

Doing a few timings, it looks like there is a benefit for small arrays, but it's not massive:

In [15]: a = np.ones((5, 5, 5))

In [16]: %timeit np.sum(a, axis=(0, 1))
5.38 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [17]: %timeit x = np.sum(a, axis=0); x = np.sum(x, axis=0)
8.62 µs ± 124 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

and for slightly bigger arrays it's the other way round (maybe I've made some mistake?):

In [18]: a = np.ones((50, 50, 50))

In [19]: %timeit np.sum(a, axis=(0, 1))
118 µs ± 930 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [20]: %timeit x = np.sum(a, axis=0); x = np.sum(x, axis=0)
81.6 µs ± 1.54 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

@mattjj (Contributor, author)

Wow, I got similar timings. That seems weird for the bigger arrays...

Review thread on this hunk of the diff:

if anp.iscomplexobj(x) and not target_iscomplex:
    x = anp.real(x)
if size == 1: # TODO(mattjj): bug here w/ passing through scalars?
    x = onp.sum(x, axis=axis, keepdims=True)

@j-towns (Collaborator) commented Sep 13, 2017

You could do a similar thing for this one.
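
That is, something like the following sketch, collapsing the per-axis keepdims sums into one call (names follow the diff above; the exact integration is an assumption):

# One sum over every axis whose target size is 1, skipping axes that
# are already size 1 in x, where summing would be a no-op.
axes = tuple(axis for axis, size in enumerate(target_shape)
             if size == 1 and onp.shape(x)[axis] != 1)
if axes:
    x = onp.sum(x, axis=axes, keepdims=True)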

@mattjj (Contributor, author) commented Sep 13, 2017

Btw I think this change was inspired by your improvements to the dot VJPs.

Re: vspaces, I generally agree, though I'm thinking that if vspaces are primarily for testing, we should use them extensively in our testing code but not incur the costs at runtime for every grad eval.
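
One way to get that split, sketched with a hypothetical module-level flag so the checks cost nothing unless a test enables them:

CHECK_VSPACES = False  # hypothetical flag; tests would flip this to True

def checked(result, example):
    if CHECK_VSPACES:
        assert_same_vspace(result, example)  # test helper sketched earlier
    return result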

@j-towns (Collaborator) commented Sep 13, 2017 via email

j-towns added a commit to j-towns/autograd that referenced this pull request Oct 18, 2017
@j-towns (Collaborator) commented Oct 26, 2017

I've effectively incorporated the changes in this PR into #312.
