Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tf.constant should support pandas Series and DataFrame as input #2318

Closed
shoyer opened this issue May 11, 2016 · 6 comments · Fixed by #8638
Closed

tf.constant should support pandas Series and DataFrame as input #2318

shoyer opened this issue May 11, 2016 · 6 comments · Fixed by #8638
Labels
stat:contribution welcome Status - Contributions welcome type:feature Feature requests

Comments

@shoyer
Copy link
Contributor

shoyer commented May 11, 2016

Currently, it does not.

I would suggest implementing this by checking for a __array__ method, which is the standard API used for indicating that an object is coercible into NumPy arrays, and is implemented by nearly every library that implements "array like" objects in the Python ecosystem. This includes pandas, as well as many other widely used libraries, including xarray and dask.array.

@mrry
Copy link
Contributor

mrry commented May 13, 2016

This should be easy to add, after my fix to #2328 goes in.

@rdadolf: Was there any reason for your thumbs down? If this change is easy to make, is there any reason not to do it?

@rdadolf
Copy link
Contributor

rdadolf commented May 13, 2016

So this caught my eye because of the pandas tag, but I suppose if I am to be consistent, it should apply to #2328 as well.

My feeling is that this kind of everything-should-map-to-everything coercion is harmful to a programming environment, mostly because it undermines the type system. Type errors turn into logical errors, which are far harder to debug. This is the reason I avoid pandas (sorry, @shoyer!), and I'd rather it not be in TensorFlow either.

From your comments in the other thread and the existence of this, it seems likely I'm in the minority here, so the feature should probably move ahead. Still, I thought I'd weigh in.

@shoyer
Copy link
Contributor Author

shoyer commented May 13, 2016

My feeling is that this kind of everything-should-map-to-everything coercion is harmful to a programming environment, mostly because it undermines the type system. Type errors turn into logical errors, which are far harder to debug.

I am in total agreement with you, but I think "numpy array likes" are a well defined duck type -- they're anything that implements the __array__ method.

Importantly, I do not propose to test this by seeing if coercing something to an array with np.array works. NumPy is far too willing to convert anything into 0d object arrays, though I think most NumPy devs agree that this is a bad idea.

This is the reason I avoid pandas (sorry, @shoyer!), and I'd rather it not be in TensorFlow either.

Funny you should mention this. Recently, most of my contributions to pandas seem to be fighting a losing battle for type safety.

On a related note, whatever choice we make here for coercing arrays in tf.constant should also hold for values put into feed_dict for Session.run.

@rdadolf
Copy link
Contributor

rdadolf commented May 13, 2016

I am in total agreement with you, but I think "numpy array likes" are a well defined duck type -- they're anything that implements the __array__ method.

Importantly, I do not propose to test this by seeing if coercing something to an array with np.array works. NumPy is far too willing to convert anything into 0d object arrays, though I think most NumPy devs agree that this is a bad idea.

This seems convincing. It avoids ambiguity by explicitly foisting the decisions on the __array__ method author, which is a plus.

I probably should have directed my emoji-flavored vitriol elsewhere. Or perhaps, you know, actually taken the time to write a coherent response.

On a related note, whatever choice we make here for coercing arrays in tf.constant should also hold for values put into feed_dict for Session.run.

This doesn't seem compatible with session's existing method, which is apparently distinct (?) from the one used for normal conversion, which is also incompatible. Both use direct type comparisons, not duck-typing checks. It seems like both locations would need to change. Maybe @mrry has already dealt with this.

@girving
Copy link
Contributor

girving commented Jun 8, 2016

We'd be happy to accept a PR for __array__ support.

@yongtang
Copy link
Member

I added a PR #8638 to support __array__. Please take a look and let me know if there are any issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stat:contribution welcome Status - Contributions welcome type:feature Feature requests
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants