More Python 3 fixes #457

astrofrog · 2014-10-02T20:33:51Z

This is a work in progress, and I am aiming to get the test suite to pass in Python 3.

Related issues:

~~imread returns different results on Python 2 and Python 3 scikit-image/scikit-image#1187~~
np.unique no longer works with fixed float and string arrays in Python 3 numpy/numpy#5145
get_readable_fileobj behaves differently for file object input in Python 3 astropy/astropy#3013

(jeez, this is turning out to be quite the bug hunt)

astrofrog · 2014-10-02T22:50:06Z

@ChrisBeaumont - just for info, for the categorical data, you currently convert to an object array, which np.unique can't deal with in Python 3 (numpy/numpy#5145 and numpy/numpy#641). I'll try and figure out a workaround but if you also have any ideas, let me know. It would be easy if we just had np.unique(values), since one can use list(set()) to do the filtering, but you also use return_inverse so we need to figure out how to emulate that.
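One way to emulate return_inverse without sorting is sketched below. This is only an illustration in plain Python, not the workaround adopted in this PR; the helper name unique_with_inverse is hypothetical, and unlike np.unique it returns the unique values in first-seen order rather than sorted.

```python
def unique_with_inverse(values):
    """Return (uniques, inverse) such that uniques[inverse[i]] == values[i].

    Hypothetical helper: it relies on hashing rather than sorting, so mixed
    str/int inputs do not hit Python 3's "unorderable types" error.
    Unique values come back in first-seen order, not sorted.
    """
    index_of = {}   # maps each distinct value to its position in `uniques`
    uniques = []
    inverse = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(uniques)
            uniques.append(value)
        inverse.append(index_of[value])
    return uniques, inverse


uniques, inverse = unique_with_inverse([1, 1, 3, 'hi', 'hi', 3])
print(uniques)  # [1, 3, 'hi']
print(inverse)  # [0, 0, 1, 2, 2, 1]
```

The trade-off is that every value must be hashable, and any code that assumes sorted categories would need to be adjusted.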

astrofrog · 2014-10-02T22:51:28Z

Just to check, is there a reason that we need to allow mixed type arrays for categorical data? What is the use case for this? (as opposed to just strings)

astrofrog · 2014-10-02T22:56:13Z

@ChrisBeaumont - another question - in the image loader you have:

def img_loader(file_name):
    """Load an image to a numpy array, using either PIL or skimage

    :param file_name: Path of file to load
    :rtype: Numpy array
    """
    try:
        from skimage.io import imread
        return np.asarray(imread(file_name))
    except ImportError:
        pass

    try:
        from PIL import Image
        return np.asarray(Image.open(file_name))
    except ImportError:
        raise ImportError("Reading %s requires PIL or scikit-image" %
                          file_name)

scikit-image is just interfacing to matplotlib or PIL. If it uses matplotlib, then it actually returns the wrong result (it returns a floating point array normalized to 1). Can we simply remove the first try...except and use just PIL/pillow? (since scikit-image is just using PIL)

ChrisBeaumont · 2014-10-02T23:00:55Z

Requiring string arrays for categorical data might be fine -- @JudoWill developed this feature, so he should confirm. But I think that should be general enough.

astrofrog · 2014-10-02T23:02:41Z

@ChrisBeaumont - ok, and we can also cast to string in case there are any non-string values in the data, to be a little more flexible.

Regarding the scikit-image, the relevant issue where I found out about the different plugins is here: scikit-image/scikit-image#1187

Basically I don't think there is any benefit to using scikit-image because it doesn't actually implement image reading itself, it's just a wrapper around matplotlib and PIL.

astrofrog · 2014-10-02T23:04:34Z

For scikit-image I can also simply check the output type and if the output is float then transform it to the uint representation for images (as described here: http://scikit-image.org/docs/0.10.x/user_guide/data_types.html)

ChrisBeaumont · 2014-10-02T23:05:14Z

Regarding the skimage try/except block: I suspect I thought skimage was able to parse more formats than PIL can. In principle this is true, since it has a plugin system akin to astropy. But I see the only default plugins are PIL and matplotlib, so skimage isn't adding anything in the normal case. No strong opinions from me

astrofrog · 2014-10-02T23:11:03Z

Ok, I'll see what I can do tomorrow. I might leave it but add a call to img_as_ubyte to force it to uint8.

astrofrog · 2014-10-03T09:03:04Z

@JudoWill - I've updated the CategoricalData class to convert categories to strings, but to recognize nan and None as missing values and convert those to ''. Can you review this change and let me know if it sounds ok to you?
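The conversion described above might look roughly like the following. This is a sketch under my own assumptions, not the actual CategoricalData code; the function name to_category_labels is made up for illustration.

```python
import math


def to_category_labels(values):
    """Convert values to strings, mapping None and NaN to ''.

    Hypothetical helper illustrating the missing-value rule only.
    """
    labels = []
    for value in values:
        is_nan = isinstance(value, float) and math.isnan(value)
        if value is None or is_nan:
            labels.append('')          # missing value: no category
        else:
            labels.append(str(value))  # everything else becomes a string
    return labels


print(to_category_labels(['a', 1, None, float('nan'), 'b']))
# ['a', '1', '', '', 'b']
```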

astrofrog · 2014-10-03T10:10:08Z

@ChrisBeaumont - I have all the tests passing in Python 3 except those in glue/qt so in the spirit of moving forward and avoiding regressions, I've updated Travis to test everything except glue/qt on Python 3. This means that at least we won't introduce Python 3 regressions in most places. I will address glue/qt in a separate PR. The things to check before merging are:

that the treatment of categorical data is ok
the question regarding the inequality in a9ffe34
the change to the plotly exporter and test - basically this was relying on dictionary order which is much more random in Python 3 so it uncovered the error (I think it was just lucky that it never failed in Python 2).
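I am not reproducing the plotly exporter here. As a generic illustration of how a test can avoid depending on dictionary order, the sketch below sorts keys when serializing and compares parsed data rather than raw strings. The function export_settings is a hypothetical stand-in, not part of glue.

```python
import json


def export_settings(settings):
    """Serialize a dict to JSON with sorted keys so the output is deterministic."""
    return json.dumps(settings, sort_keys=True)


a = export_settings({'x': 1, 'y': 2})
b = export_settings({'y': 2, 'x': 1})
print(a == b)  # True

# In a test, comparing parsed data avoids depending on key order at all
assert json.loads(a) == {'y': 2, 'x': 1}
```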

Other than that, this is good to go.

JudoWill · 2014-10-03T12:21:43Z

@astrofrog Originally I implemented categorical variables as 'any' object (in some cases one might want to treat an integer as a category) but as I use it more and more, I find that I only ever use strings. If it's a blocker for Py3 compatibility I'm more than willing to give up the "generic" nature of categories.

Can you put a type-check in the __init__ (or really anywhere) to make sure someone only ever passes strings? I much prefer this over silent conversion. Maybe raising a NotImplementedError or a simple AssertionError would be sufficient.

astrofrog · 2014-10-03T13:05:48Z

@JudoWill - thanks for the quick reply! I'll implement this. In future, the Numpy bug should be fixed of course, so I'll add a TODO to remind us to change it back if we ever need to. Having said that, to plot categorical data we need to convert to strings anyway, so maybe requiring strings is not much of an issue.

ChrisBeaumont · 2014-10-03T13:06:08Z

glue/qt/data_collection_model.py

@@ -131,7 +138,7 @@ def __init__(self, dc, parent):

    @memoize
    def child(self, row):
-        if row < self.dc.subset_groups:


This is a bug in the original code -- the test should be if row < len(self.dc.subset_groups) (integer to integer comparison)

Ah, that makes more sense!
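To illustrate why this only surfaced on Python 3: comparing an integer with a list raises TypeError there, whereas Python 2 quietly returned a result based on a fixed ordering between types. A minimal sketch, where the list contents and the helper name is_subset_row are placeholders rather than glue code:

```python
subset_groups = ['group 1', 'group 2']  # stand-in for self.dc.subset_groups
row = 1

try:
    row < subset_groups          # int compared with a list
    outcome = 'no error'
except TypeError:
    outcome = 'TypeError'        # what Python 3 does


def is_subset_row(row, subset_groups):
    """Corrected test: compare the row index with the number of groups."""
    return row < len(subset_groups)


print(outcome)
print(is_subset_row(1, subset_groups))  # True
print(is_subset_row(2, subset_groups))  # False
```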

astrofrog · 2014-10-03T13:26:35Z

@JudoWill @ChrisBeaumont - actually we don't have to be so strict, we just can't allow mixed types. But we can have integer or boolean categorical data. I've updated the code to do the following:

If the input is an array, check that it is not of type object
If the input is another iterable, check that all objects are the same type
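The second rule could be sketched as below. This is an assumption about the shape of the check, not the code in the PR; check_uniform_type is a hypothetical name, and the array case (rejecting object dtype) is not covered here.

```python
def check_uniform_type(values):
    """Raise TypeError unless every item in `values` has the same type."""
    types = {type(value) for value in values}
    if len(types) > 1:
        names = sorted(t.__name__ for t in types)
        raise TypeError("Mixed types not supported: " + ", ".join(names))


check_uniform_type(['a', 'b', 'c'])   # passes silently
check_uniform_type([1, 2, 3])         # integers are fine too

try:
    check_uniform_type(['a', 1])
except TypeError as exc:
    print(exc)  # Mixed types not supported: int, str
```

Note that this compares exact types, so a list mixing int and float would also be rejected; whether that is desirable is a design choice.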

@ChrisBeaumont - all that's left now is to check my question about the inequality and the plotly change

astrofrog · 2014-10-03T13:31:11Z

@ChrisBeaumont - I've addressed your comments. As mentioned above, pyside doesn't install with Python 3.4 so I'm installing pyside on Python 2, and pyqt on Python 3 - does that work for you?

ChrisBeaumont · 2014-10-03T13:47:09Z

glue/core/data.py

+        # Check that categorical data is of uniform type
+        if isinstance(categorical_data, np.ndarray):
+            if categorical_data.dtype.kind == "O":
+                raise TypeError("Numpy object type not supported")


I'm not sure if I agree with @JudoWill about raising an error here. If a user passes a pandas series of strings into glue, it will be of dtype=object. Raising an error for that case is annoying.

Actually, if the only issue with mixed types is the call to numpy.unique, it looks like pandas.factorize handles mixed types in python3:

In [19]: x = np.array([1, 1, 3, 'hi', 'hi', 3], dtype=object)

In [20]: pd.factorize(x)
Out[20]: (array([0, 0, 1, 2, 2, 1]), array([1, 3, 'hi'], dtype=object))

In [21]: np.unique(x)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-9a478241c930> in <module>()
----> 1 np.unique(x)

/Users/beaumont/anaconda/envs/astropy3/lib/python3.3/site-packages/numpy/lib/arraysetops.py in unique(ar, return_index, return_inverse, return_counts)
    193             aux = ar[perm]
    194         else:
--> 195             ar.sort()
    196             aux = ar
    197         flag = np.concatenate(([True], aux[1:] != aux[:-1]))

TypeError: unorderable types: str() > int()

Since pandas is a required dependency of Glue anyways, I now think it might be best to change the _update_categories method, and keep mixed types unless there's another roadblock.

I can make this change if you want (we probably need to use pandas instead of np.searchsorted in _update_data as well)

@ChrisBeaumont - I'm not too familiar with Pandas so feel free to make the change. If possible I think it would probably make sense to have a compatibility 'unique' function in the utilities that can be called from here, to limit the number of places where we are dependent on pandas (and in future we can switch the compatibility layer back to using Numpy once they have fixed it).

@ChrisBeaumont - Actually, are you sure it doesn't already work as you would like? If you pass a pandas object it will currently not validate as a numpy array so it will iterate over them to check that they are all strings, then make a string array out of them. In fact, I think there are already some tests that use pandas.

It might, I didn't check. But I opened a PR on your copy that should deal with the problems of mixed types more directly.

Use pandas to deal with categorical lookups on mixed-type arrays

astrofrog · 2014-10-03T15:29:24Z

@ChrisBeaumont - I didn't have travis set up on my fork so didn't see there were failures. I should be able to address them.

ChrisBeaumont · 2014-10-03T15:52:36Z

Ah, sorry about that

astrofrog · 2014-10-03T16:04:58Z

@ChrisBeaumont - the build is passing now! Feel free to merge :) I'll then rebase #458.

ChrisBeaumont · 2014-10-03T20:11:30Z

Great, thanks!

astrofrog added 2 commits October 2, 2014 22:32

More Python 3 fixes

0fd0d18

More fixes

7027363

astrofrog added 7 commits October 3, 2014 09:31

Use string type for categorical data because Python 3 cannot sort list with mixed float and string types

d142471

Added workaround for Astropy bug in Python 3

5f14fd8

Fixed more tests

50f0ffc

Fix img_data factory when PIL is not present

d5d25a3

Fix object comparison in Python 3

a9ffe34

Fix comparison

3400077

Fix categorical data so that it uses strings but can use NaN or None to indicate no category

5355d39

astrofrog added 8 commits October 3, 2014 11:06

Fix test failure in d3po exporter

a6b3c7a

Test everything except glue/qt in Python 3.4

3589051

Fix PATH

d094d0b

Fix Python version

3249e82

pyside is not available in anaconda for Python 3.x

a7f6165

Fix syntax

f6efad2

Fix syntax

8ba9c05

Fix plotly test

84b1352

astrofrog mentioned this pull request Oct 3, 2014

Final round of Python 3 fixes (requires #457 to be merged first) #458

Merged

astrofrog changed the title ~~WIP: More Python 3 fixes~~ More Python 3 fixes Oct 3, 2014

astrofrog added the Python3 label Oct 3, 2014

ChrisBeaumont reviewed Oct 3, 2014

Check that categorical data is of uniform type

9c6b0b7

astrofrog added 2 commits October 3, 2014 15:30

Proper fix for inequality - was originally a bug

66379d2

Explicitly set attribute names

cc91733

ChrisBeaumont reviewed Oct 3, 2014

Chris Beaumont and others added 2 commits October 3, 2014 11:02

Use pandas to deal with categorical lookups on mixed-type arrays

de0f081

Merge pull request #3 from ChrisBeaumont/more-python3

7a55774

Use pandas to deal with categorical lookups on mixed-type arrays

astrofrog added 2 commits October 3, 2014 17:34

Fix failure in test_high_cardinatility_timing

f51690b

Fix failure with using pandas as Numpy index

959b212

astrofrog force-pushed the more-python3 branch from 0e55311 to 959b212 Compare October 3, 2014 15:48

ChrisBeaumont added a commit that referenced this pull request Oct 3, 2014

Merge pull request #457 from astrofrog/more-python3

062fcb0

More Python 3 fixes

ChrisBeaumont merged commit 062fcb0 into glue-viz:master Oct 3, 2014

astrofrog added a commit that referenced this pull request Oct 4, 2014

Merge pull request #458 from astrofrog/fix-qt-python3

12fb717

Final round of Python 3 fixes (requires #457 to be merged first)
