
Define ufunc JO and JTO simultaneously #312

Open · wants to merge 39 commits into base: master
Conversation

@j-towns (Collaborator) commented Oct 16, 2017

To possibly do:

Summary of the changes in this PR

  1. New helper function def_ufunc_jps for defining the jvp and vjp of a ufunc in one shot. The lines
    defjvp(anp.sin, lambda g, ans, x : g * anp.cos(x))
    defvjp(anp.sin, lambda ans, x : lambda g: g * anp.cos(x))
    can be replaced with a single line
    #                      ('derivative'             , linear operator to apply)
    def_ufunc_jps(anp.sin, (lambda ans, x: anp.cos(x), 'mul'                   ))
    I've added a docstring to def_ufunc_jps explaining how to use it.
  2. Convert all numpy and scipy ufuncs and ufunc-like functions to the new format, enabling forward mode for many scipy primitives. Enable forward mode tests for the newly supported primitives.
  3. Make broadcast_to into a primitive, define its adjoint in numpy_wrapper.py and set up derivatives. This is roughly the same as #292 (make internal broadcast and unbroadcast both primitives).
  4. New helper function def_ufunc_jps_inv_pair for defining the jps of an inverse pair of ufuncs in one shot. So for example the four defs
    defvjp(anp.square,  lambda ans, x : lambda g: g * 2 * x)
    defvjp(anp.sqrt,    lambda ans, x : lambda g: g * 0.5 * x**-0.5)

    defjvp(anp.square,  lambda g, ans, x : g * 2 * x)
    defjvp(anp.sqrt,    lambda g, ans, x : g * 0.5 * x**-0.5)
    become
    def_ufunc_jps_inv_pair(anp.square, anp.sqrt, lambda ans, x: 2 * x)
    Implement this for the 10 or so inverse pairs I spotted. This could also make implementing the grads of inverse CDF functions (which exist for most scipy.stats distributions) very straightforward.
  5. Move match_complex, unbroadcast and unbroadcast_f into newly created autograd.numpy.util alongside the new helper functions (I think this rearrangement makes sense).
  6. (Bonus) I've added the derivatives and tests for scipy.special.rel_entr.
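The one-shot pattern behind item 1 can be illustrated with a standalone sketch. This is a hypothetical simplification, not the PR's actual implementation: a single 'derivative' function, combined with the 'mul' linear operator, yields both a jvp and a vjp for a unary ufunc.

```python
import math

# Hypothetical sketch of the def_ufunc_jps idea for the 'mul' operator:
# one derivative function produces both a jvp and a vjp.
def make_mul_jps(deriv):
    """Build (jvp, vjp) from a single derivative function."""
    def jvp(g, ans, x):
        # Forward mode: multiply the tangent g by the local derivative.
        return g * deriv(ans, x)

    def vjp(ans, x):
        # Reverse mode: evaluate the derivative eagerly on the forward
        # pass, so ans and x need not be kept alive for the backward pass.
        d = deriv(ans, x)
        return lambda g: g * d

    return jvp, vjp

# The sin example from item 1, in this simplified setting.
sin_jvp, sin_vjp = make_mul_jps(lambda ans, x: math.cos(x))
```

Defining both operators from the same derivative expression is what removes the duplication between the separate defjvp and defvjp calls.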

Notes

We could reduce the number of lines of code for other primitive defs in this way. In particular I've got my eye on the reductions (sum, mean, var, std) to potentially do next. I also think, at least in the case of ufuncs, that this style is clearer. I guess we'd better check that there's little or no harm to performance. I think any overhead introduced could potentially be optimized away by carefully handling different special cases in def_ufunc_jps.

During higher order derivatives, the computation of what I've called 'derivative' in the snippet above could be cached and reused. This computation is currently being re-done, because the same primitive's jvp/vjp is being called at each layer of unboxing, with numerically the same ans and *args inputs (although g will be different for each layer). I might write a little blog post about this as I suspect this is an optimization that may not yet be implemented in e.g. Theano or Tensorflow. Implementing this in Autograd might require something similar to what was discussed in #188.
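The caching idea could be sketched as follows (hypothetical, not implemented in this PR): memoize the derivative on the primal inputs, so repeated jvp calls at different layers of unboxing, which share ans and args but differ in g, reuse one evaluation.

```python
# Hypothetical sketch of caching the 'derivative' across layers of
# higher-order differentiation: deriv runs once per (ans, x), then
# jvp calls with different tangents g reuse the cached value.
calls = {"n": 0}

def deriv(ans, x):
    calls["n"] += 1  # count how often the derivative is actually computed
    return 2 * x

_cache = {}

def cached_deriv(ans, x):
    key = (ans, x)
    if key not in _cache:
        _cache[key] = deriv(ans, x)
    return _cache[key]

def jvp(g, ans, x):
    return g * cached_deriv(ans, x)

# Two 'layers' with different tangents: the second hits the cache.
jvp(1.0, 9.0, 3.0)
jvp(0.5, 9.0, 3.0)
assert calls["n"] == 1
```

A real implementation would need a cache keyed on object identity with a sensible eviction policy, which is where the machinery discussed in #188 would come in.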

@j-towns j-towns changed the title [Experiment][WIP] Define JO and JTO simultaneously [Experiment][WIP] Define ufunc JO and JTO simultaneously Oct 17, 2017
@j-towns j-towns changed the title [Experiment][WIP] Define ufunc JO and JTO simultaneously Define ufunc JO and JTO simultaneously Oct 18, 2017
This should minimize memory overhead.

Diff excerpt under review:

                unbroadcast_f(args[argnum], lambda g: -g)),
    'mul': (lambda argnum, deriv: lambda g, ans, *args: g * deriv(ans, *args),
            lambda argnum, deriv: lambda ans, *args:
                unbroadcast_f(args[argnum], lambda g, d=deriv(ans, *args): g * d)),
@j-towns (Collaborator, Author) commented Oct 18, 2017
For the vjps I've used this slightly weird d=deriv(ans, *args) default argument syntax to ensure that deriv is evaluated during the forward pass, allowing *args and ans to potentially be garbage collected.

Any objections? I could also have done this using a kind of helper closure to evaluate deriv, which would have been a bit more explicit.
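The evaluation timing that makes the trick work can be checked with a tiny standalone example (hypothetical deriv, independent of autograd): a default argument is evaluated once, when the lambda is created, not when it is called.

```python
# Demonstration of the default-argument trick: deriv runs during
# construction of the vjp (the 'forward pass'), not when the returned
# function is later applied to a cotangent g.
calls = []

def deriv(ans, x):
    calls.append(x)  # record when the derivative is computed
    return 2 * x

def make_vjp(ans, x):
    # d=deriv(ans, x) is evaluated here, eagerly, so ans and x can be
    # garbage collected before the backward pass runs.
    return lambda g, d=deriv(ans, x): g * d

f = make_vjp(9.0, 3.0)   # derivative computed now
assert calls == [3.0]
assert f(0.5) == 3.0     # calling f does not recompute deriv
assert calls == [3.0]
```

The alternative mentioned above, a small helper closure that evaluates deriv and returns the multiplier, would behave identically but make the eager evaluation explicit.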

@j-towns j-towns changed the title Define ufunc JO and JTO simultaneously [WIP] Define ufunc JO and JTO simultaneously Oct 26, 2017
@j-towns j-towns changed the base branch from dev-1.2 to master October 30, 2017 17:29
@j-towns j-towns changed the title [WIP] Define ufunc JO and JTO simultaneously Define ufunc JO and JTO simultaneously Nov 2, 2017
@j-towns (Collaborator, Author) commented Nov 2, 2017

@mattjj / @dougalm can you review this?

@j-towns (Collaborator, Author) commented Nov 6, 2017

Have added a couple of benchmarks and run a bench compare. Almost everything is the same, but there are some differences from the computation which I've shifted from the backward pass to the forward pass:

    before     after       ratio
  [1b96990a] [98bedda2]
+  541.78μs   725.65μs      1.34  bench_core.time_long_forward_pass
+   34.93μs    44.31μs      1.27  bench_core.time_short_forward_pass
+  697.56ms   879.13ms      1.26  bench_core.time_fan_out_fan_in_grad
-   30.90ms    21.06ms      0.68  bench_numpy_vjps.time_tanh_0
-   17.07μs    10.21μs      0.60  bench_core.time_short_backward_pass
-  305.48μs   120.86μs      0.40  bench_core.time_long_backward_pass

Also something's going wrong in the needless nodes case, will check that out. Edit: fixed the garbage collection of needless nodes and the vjp of add taking longer.

@j-towns (Collaborator, Author) commented Nov 13, 2017

Ping @mattjj + @dougalm! Can you take a look at this PR?

@j-towns (Collaborator, Author) commented Nov 20, 2017

I'm wondering whether, for consistency, all of the extra numpy-ish primitives that we define (things like the dot and tensordot adjoints) should be in numpy_wrapper, alongside things like make_diagonal and broadcast_to_adjoint.

They can be viewed as extra primitives that we want to add to numpy (primitives which happen to be useful for calculating derivatives), so perhaps it makes more sense for them to be there.

@j-towns mentioned this pull request Nov 29, 2017