This repository has been archived by the owner on Dec 2, 2023. It is now read-only.

Support Numpy Duck Arrays #74

Open
mrocklin opened this issue May 21, 2018 · 6 comments

@mrocklin
Contributor

Tangent provides source-to-source automatic differentiation of functions containing Numpy syntax

In [1]: import numpy as np

In [2]: def f(x):
   ...:     return np.sum(np.exp(x)) + 1

In [3]: x = np.arange(5)
In [4]: f(x)
Out[4]: 86.7910248837216

In [5]: import tangent
In [6]: df = tangent.grad(f)
In [7]: df(x)
Out[7]: array([ 1.        ,  2.71828183,  7.3890561 , 20.08553692, 54.59815003])

It currently has a pluggable mechanism to support both numpy arrays and tensorflow arrays explicitly. However, it would be nice if it also supported other numpy-like arrays using duck typing. Currently this appears not to be the case.

In [8]: import dask.array as da
In [9]: x = da.arange(5, chunks=(2,))
In [10]: f(x)
Out[10]: dask.array<add, shape=(), dtype=float64, chunksize=()>

In [11]: _.compute()
Out[11]: 86.7910248837216

In [12]: df(x)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-12-31ac6e885892> in <module>()
----> 1 df(x)

/tmp/tmp3sxcen8j/tangent_b64e.py in dfdx(x, b_return)
      3     np_sum_np_exp_x = np.sum(np_exp_x)
      4     _return = np_sum_np_exp_x + 1
----> 5     assert tangent.shapes_match(_return, b_return
      6         ), 'Shape mismatch between return value (%s) and seed derivative (%s)' % (
      7         numpy.shape(_return), numpy.shape(b_return))

~/workspace/tangent/tangent/utils.py in shapes_match(a, b)
    627     return match
    628   else:
--> 629     shape_checker = shape_checkers[(type(a), type(b))]
    630     return shape_checker(a, b)
    631 

KeyError: (<class 'dask.array.core.Array'>, <class 'float'>)

It would be convenient if tangent could be used with other objects that "quack like a numpy.ndarray", of which there are a few today (numpy, sparse, dask.array, cupy).

cc @njsmith @shoyer @ericmjl @hameerabbasi

@mdanatg

mdanatg commented May 21, 2018

I haven't tried it, but this should be doable by registering the existing numpy handler with the respective types. If they are ndarray-like, this should be sufficient.

For example, modifying https://github.com/google/tangent/blob/master/tangent/utils.py#L633 to include dask.array would be one step:

register_all_shape_checker(
    array_shapes_match, (numpy.ndarray, Number, float, int, numpy.float32,
                         numpy.float64, numpy.int32, numpy.int64,
                         dask.array.Array),
    ignore_existing=True)

Something similar would need to be done for register_add_grad, register_init_grad, register_unbroadcast, and register_unreduce. See https://github.com/google/tangent/blob/master/tangent/tf_extensions.py for guidance; it contains all the configuration calls required specifically for tensorflow arrays.
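To make the mechanism concrete, here is a simplified, self-contained sketch of that type-pair registry. The names register_all_shape_checkers and DuckArray, and both implementations, are illustrative stand-ins rather than tangent's actual code; the point is only to show why registering the existing numpy checker under a new type makes shapes_match accept it:

```python
import numpy as np

# Handlers are keyed by (type(a), type(b)) tuples, so supporting a new array
# type means registering the existing numpy handler under the new pairs.
shape_checkers = {}

def array_shapes_match(a, b):
    # Scalars broadcast against anything; otherwise shapes must agree.
    return np.shape(a) == np.shape(b) or np.shape(a) == () or np.shape(b) == ()

def register_all_shape_checkers(checker, types):
    # Register the checker for every ordered pair of the given types.
    for t1 in types:
        for t2 in types:
            shape_checkers[(t1, t2)] = checker

def shapes_match(a, b):
    # Constant-time dispatch on the exact type pair, as in tangent's utils.py.
    return shape_checkers[(type(a), type(b))](a, b)

# A stand-in for an ndarray-like class such as dask.array.Array.
class DuckArray:
    def __init__(self, data):
        self.data = np.asarray(data)

    @property
    def shape(self):
        return self.data.shape

register_all_shape_checkers(
    array_shapes_match, (np.ndarray, float, int, DuckArray))
```

After registration, shapes_match(DuckArray([1, 2, 3]), np.arange(3)) dispatches to the numpy checker instead of raising the KeyError shown in the traceback above.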

@hameerabbasi

Would it be possible to (for example) do some metaprogramming and detect __array_ufunc__ in a class? I certainly remember seeing something like this in the code of the future module.

@mrocklin
Contributor Author

Yeah, so many projects have plugin mechanisms like this. This results in an n by m interaction matrix. There is an effort underway to instead have projects respect protocols rather than do explicit type checks. That way if a new ndarray project comes online (as they seem to be doing today) it doesn't need to register itself with every downstream computational library.

I might suggest that you do a check like the following:

def is_numpy_like(x):
    return hasattr(x, 'shape') and hasattr(x, 'dtype') and hasattr(x, '__array_ufunc__')

Though this check will likely evolve in the future (and hopefully be upstreamed into numpy).
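For reference, here is how that predicate behaves in practice (the definition is restated so the snippet is self-contained):

```python
import numpy as np

def is_numpy_like(x):
    # Duck-typed check: an object is "numpy-like" if it exposes the core
    # ndarray surface rather than inheriting from numpy.ndarray.
    return (hasattr(x, 'shape') and hasattr(x, 'dtype')
            and hasattr(x, '__array_ufunc__'))

# ndarrays pass; plain Python scalars and lists do not.
print(is_numpy_like(np.arange(5)))  # True
print(is_numpy_like(3.0))           # False
print(is_numpy_like([1, 2, 3]))     # False
```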

@hameerabbasi

hameerabbasi commented May 21, 2018

I might be wrong here, but it seems like this project relies heavily on type checks. I'll see if I can check the code of the future module, do some metaprogramming, and put that check into a metaclass. I'll submit a PR if I'm successful.
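As a rough sketch of that idea, a metaclass can override __instancecheck__ so that isinstance() duck-types by attribute probing instead of walking the inheritance chain. NumpyLikeMeta and NumpyLike are hypothetical names, and the attribute check is the one suggested above, not anything tangent currently ships:

```python
import numpy as np

class NumpyLikeMeta(type):
    """Metaclass whose isinstance() check duck-types instead of using the MRO."""
    def __instancecheck__(cls, obj):
        return (hasattr(obj, 'shape') and hasattr(obj, 'dtype')
                and hasattr(obj, '__array_ufunc__'))

class NumpyLike(metaclass=NumpyLikeMeta):
    """Virtual base: isinstance(x, NumpyLike) holds for any ndarray-like x."""

print(isinstance(np.arange(3), NumpyLike))  # True
print(isinstance(3.0, NumpyLike))           # False
```

With such a virtual base, existing isinstance-style type checks could accept any ndarray-like object without each array project registering itself.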

@mdanatg

mdanatg commented May 21, 2018

For well-known types like ndarray, we can add direct checks. Have a look, for instance, at shapes_match in utils.py. It has special handling for list/dict/tuple. One can imagine inserting another branch that calls is_numpy_like.
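A sketch of what that extra branch might look like. This is a simplified stand-in for shapes_match rather than tangent's actual implementation (the real one also handles dicts and falls back to the registered type-pair checkers), and is_numpy_like is the hypothetical duck-typing check from the earlier comment:

```python
import numpy as np

def is_numpy_like(x):
    return (hasattr(x, 'shape') and hasattr(x, 'dtype')
            and hasattr(x, '__array_ufunc__'))

def shapes_match(a, b):
    # Containers are matched element-wise, as in tangent's utils.py.
    if isinstance(a, (tuple, list)) and isinstance(b, (tuple, list)):
        return (len(a) == len(b)
                and all(shapes_match(ia, ib) for ia, ib in zip(a, b)))
    # Hypothetical duck-typed branch: anything ndarray-like is compared by
    # shape, with scalars allowed to broadcast.
    if is_numpy_like(a) or is_numpy_like(b):
        sa, sb = np.shape(a), np.shape(b)
        return sa == sb or sa == () or sb == ()
    # The real function would dispatch to the registered
    # (type(a), type(b)) shape checkers here.
    return np.shape(a) == np.shape(b)
```

With this branch, the dask.array example above would hit the duck-typed path instead of the type-pair lookup that raised the KeyError.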

The interaction matrix was designed as a more generic mechanism for cases when such a protocol does not exist. It's used, for example, with tensorflow arrays, which are not array-like. It has constant access time and is cleaner than the alternative of calling is_foo in a loop.

@mrocklin
Contributor Author

> The interaction matrix was designed as a more generic mechanism for cases when such a protocol does not exist

Yes, and I think that this is the correct choice for these sorts of situations. My hope is that all numpy-like arrays can be a single row in this matrix.

To be clear, when I'm talking about a matrix of interactions I'm not talking about a matrix like what is in tangent with axes of ndarray-container and operation, I'm talking about a higher level matrix that has axes projects-that-implement-ndarray-containers and projects-that-consume-ndarray-containers. In that matrix tangent is one column.
