Python 2/3 string handling #1731

mvdbeek · 2016-02-15T09:10:02Z

To make the string handling compatible with both Python 2 and Python 3, I:

removed all instances of unicode and replaced them with six's text_type (which is unicode on Python 2, str on Python 3).
replaced basestring with six's string_types.
replaced isinstance(var, str) with either isinstance(var, string_types) or isinstance(var, string_types) and not isinstance(var, text_type), depending on the context.
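
As a minimal sketch of the two isinstance patterns listed above: the helper names is_any_string and is_byte_string are hypothetical and not part of this PR, and the fallback definitions assume Python 3 when six is not installed.

```python
try:
    from six import string_types, text_type
except ImportError:
    # Fallback mirroring what six defines on Python 3.
    string_types, text_type = (str,), str


def is_any_string(var):
    """True for any string type (str and unicode on Python 2, str on Python 3)."""
    return isinstance(var, string_types)


def is_byte_string(var):
    """True only for Python 2 byte strings (str but not unicode).

    On Python 3 this is always False, because string_types is just (str,).
    """
    return isinstance(var, string_types) and not isinstance(var, text_type)
```

Which of the two checks applies depends on whether the call site needs "any string" or specifically "a string that still has to be decoded".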

jmchilton · 2016-02-15T15:44:59Z

lib/galaxy/tools/parameters/basic.py

@@ -163,9 +163,6 @@ def value_to_basic( self, value, app ):
        return self.to_string( value, app )

    def value_from_basic( self, value, app, ignore_errors=False ):
-        # HACK: Some things don't deal with unicode well, psycopg problem?


@mvdbeek Can you comment on this? Can you be confident this hack is no longer needed?

I am (working on) testing this now on a copy of our production server (which is running postgresql-9.4).
Hard to be sure what happens on old postgresql versions though.
The alternative would be

In [12]: u"tést".encode('ascii', 'replace')
Out[12]: 't?st'

or

In [4]: print(u"tést".encode('UTF-8'))
tést

?
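
For reference, a small sketch of how the two alternatives differ. It is written for Python 3, where encode() returns bytes; the variable names are illustrative only.

```python
text = u"t\u00e9st"  # "tést"

# 'replace' substitutes "?" for every character ASCII cannot represent (lossy).
ascii_lossy = text.encode("ascii", "replace")

# UTF-8 can represent every code point, so decoding restores the original text.
utf8_bytes = text.encode("utf-8")

assert ascii_lossy == b"t?st"
assert utf8_bytes.decode("utf-8") == text
```

The ASCII variant keeps the old hack's guarantee of plain ASCII output at the cost of losing data, while the UTF-8 variant preserves the text.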

Doing a few standard things (running workflows, deleting datasets) it seems to work fine with postgresql-9.3 and postgresql-9.4.
psycopg explicitly supports unicode as well, so ... I'm not sure what to do. I guess there is no chance to see what exactly the reason for the hack was :/

This was introduced in commit 7bfc238, but that hardly helps.

It is old enough that we can assume library versions are fairly different. I'd say just drop the hack for now.

jmchilton · 2016-02-15T15:55:21Z

👍 Very nice.

xref #1715

nsoranzo · 2016-02-15T16:15:36Z

lib/galaxy/datatypes/data.py

                    out.append( '<tr><td>%s</td></tr>' % escape( line ) )
                else:
-                    out.append( '<tr><td>%s</td></tr>' % escape( unicode( line, 'utf-8' ) ) )
+                    out.append( '<tr><td>%s</td></tr>' % escape( text_type( line, 'utf-8' ) ) )


This does not work in Python 3 because text_type() is just replaced with str():

TypeError: decoding str is not supported

Actually it works when line is binary:

>>> text_type(b'ciao \xc3\xa8', 'utf-8')
'ciao è'

So probably it's correct.

I guess we would make the fewest assumptions by using text_type(var.encode('utf-8')).
This works in python 2 and 3.
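
A hedged sketch of the behaviour discussed in this thread: passing an encoding to text_type() is only valid when the argument is a byte string, so one option is to decode conditionally. The helper name to_text is hypothetical, and the fallback definitions assume Python 3 when six is not installed.

```python
try:
    from six import binary_type, text_type
except ImportError:
    # Fallback mirroring what six defines on Python 3.
    binary_type, text_type = bytes, str


def to_text(line, encoding="utf-8"):
    """Decode byte strings with the given encoding; return text unchanged."""
    if isinstance(line, text_type):
        # Already text: calling text_type(line, encoding) here would raise
        # "TypeError: decoding str is not supported" on Python 3.
        return line
    if isinstance(line, binary_type):
        return text_type(line, encoding)
    raise TypeError("expected bytes or text, got %s" % type(line).__name__)


assert to_text(b"ciao \xc3\xa8") == u"ciao \u00e8"
assert to_text(u"ciao \u00e8") == u"ciao \u00e8"
```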

nsoranzo · 2016-02-15T19:55:47Z

Since unicodify() now uses six, it may make sense to add the following change to this PR:

diff --git a/lib/pulsar/client/interface.py b/lib/pulsar/client/interface.py
index 9d625f7..9b3e2a0 100644
--- a/lib/pulsar/client/interface.py
+++ b/lib/pulsar/client/interface.py
@@ -3,10 +3,7 @@ from abc import abstractmethod
 from string import Template

 from six import BytesIO
-try:
-    from six import text_type
-except ImportError:
-    from galaxy.util import unicodify as text_type
+from six import text_type
 try:
     from urllib import urlencode
 except ImportError:

nsoranzo · 2016-02-15T20:42:17Z

lib/galaxy/util/__init__.py

+        return text_type( value, encoding, error )
+    except Exception as e:
+        log.debug("value %s could not be coerced to unicode" % value)
+        log.debug(e)
        return default


This implementation returns default if value is e.g. a list, which is wrong. I think the following implementation is more correct:

if value is None:
    return None
try:
    if not isinstance(value, string_types):
        value = str(value)
    # At this point value is of type str, which in Python 2 needs to be converted to unicode
    if not isinstance(value, text_type):
        value = text_type(value, encoding, error)
except Exception:
    log.exception("value %s could not be coerced to unicode" % value)
    return default
return value

right, that makes sense. thanks!

I've updated the above code to have also the call to text_type() inside the try.
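
For readers who want to try the proposal, here is a self-contained sketch of it wrapped in a function. The default argument values are assumptions rather than the values used in galaxy.util, and the explicit bytes branch is an addition of this sketch: without it, Python 3 bytes would go through str(value) and come back as their repr. The six fallback assumes Python 3.

```python
import logging

try:
    from six import binary_type, string_types, text_type
except ImportError:
    # Fallback mirroring what six defines on Python 3.
    binary_type, string_types, text_type = bytes, (str,), str

log = logging.getLogger(__name__)


def unicodify(value, encoding="utf-8", error="replace", default=None):
    """Return value as text, or default if it cannot be coerced."""
    if value is None:
        return None
    try:
        if isinstance(value, binary_type) and not isinstance(value, text_type):
            # Assumption added in this sketch: decode byte strings directly.
            return value.decode(encoding, error)
        if not isinstance(value, string_types):
            value = str(value)
        # On Python 2, str(value) yields a byte string that still needs decoding.
        if not isinstance(value, text_type):
            value = text_type(value, encoding, error)
    except Exception:
        log.exception("value %s could not be coerced to unicode" % value)
        return default
    return value
```

With this shape, a list such as [1, 2] is converted to its string form instead of falling through to default.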

mvdbeek · 2016-02-16T13:44:12Z

@nsoranzo thanks a lot for the suggestions and the thorough review. I hope I didn't miss anything.
In addition I switched to using text_type(var.encode('utf-8')) instead of text_type(var, 'utf-8') when var is supposed to be a string, let me know if that's a bad idea.

nsoranzo · 2016-02-16T15:19:21Z

@mvdbeek Thanks for updating the PR! I think text_type(var.encode('utf-8')) is not correct in Python 2:

>>> from __future__ import print_function
>>> from six import text_type
>>> line = 'ciao è'
>>> uline = unicode(line, 'utf-8')
>>> print("uline is '%s' of type %s" % (uline, type(uline)))
uline is 'ciao è' of type <type 'unicode'>
>>> uline = text_type(line.encode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)

If we slightly modify unicodify() as in https://gist.github.com/nsoranzo/6ab474f2ae74c04b0c0c, then we can use this function instead:

>>> uline = unicodify(line)
>>> print("uline is '%s' of type %s" % (uline, type(uline)))
uline is 'ciao è' of type <type 'unicode'>
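
To illustrate the same pitfall on Python 3, here is a small sketch with illustrative variable names. There the call does not raise, but str() without an encoding returns the repr of the bytes object rather than the decoded text.

```python
line = u"ciao \u00e8"
encoded = line.encode("utf-8")

# Without an encoding argument, str() does not decode bytes on Python 3;
# it returns their repr, including the b'' prefix and escape sequences.
not_decoded = str(encoded)

# An explicit decode recovers the original text.
decoded = encoded.decode("utf-8")

assert not_decoded != line
assert decoded == line
```

So the encode-then-text_type pattern fails loudly on Python 2 and silently on Python 3, which is an argument for routing these conversions through a single helper such as unicodify().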

nsoranzo · 2016-02-16T16:49:33Z

lib/galaxy/web/framework/helpers/__init__.py

-        return unicode( a_string, 'utf-8' )
-    elif a_string_type is unicode:
-        return a_string
+    return unicodify( a_string )


This should specify the 'utf-8' encoding:

return unicodify( a_string, 'utf-8' )

good catch, thanks.

nsoranzo · 2016-02-17T18:23:28Z

lib/galaxy/tools/__init__.py

    else:
-        return val
+        return unicodify(val, "utf8" )


The original code was changing unicode to binary string, now it's doing the opposite! I think this should be:

    elif isinstance( val, text_type ):
        return val.encode( "utf8" )
    else:
        return val

right, I was thinking it might be advantageous to use unicode here and here
It did prevent issue #1702 for me (in a different context, of course), ...
Let me know what you think about this, I can change it back to the original behaviour.
Also, I guess we can remove json_fix from here and import it from galaxy.util.json, right?

+1 to remove json_fix from here and import it from galaxy.util.json.

I think the fix for #1702 should be different, I'll comment there.

nsoranzo · 2016-02-18T16:01:02Z

lib/galaxy/web/framework/helpers/grids.py

-                            if not isinstance( filter, basestring ):
-                                filter = unicode( filter ).encode("utf-8")
+                            if not isinstance( filter, string_types ):
+                                text_type( filter ).encode("utf-8")


filter = text_type( filter ).encode("utf-8")

@nsoranzo
ahh, of course.
but ... what is this code actually doing?
I don't see filter being re-used anywhere ?!
Should this be:

column_filter_encoded = [ ]
for filter in column_filter:
    if not isinstance( filter, string_types ):
        filter = text_type( filter ).encode("utf-8")
    column_filter_encoded.append( filter )
extra_url_args[ "f-" + column.key ] = dumps( column_filter_encoded )

?

@mvdbeek You're right! But I would rewrite it as:

column_filter = [text_type(_).encode('utf-8') for _ in column_filter if not isinstance(_, string_types)]
extra_url_args[ "f-" + column.key ] = dumps( column_filter )

list comprehension was my first thought too, but in your version we're dropping filter if it is of string_type.
how about this?
column_filter = [text_type(_).encode('utf-8') if not isinstance(_, string_types) else _ for _ in column_filter]

Oops, you're right!
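
The difference between the two comprehensions can be seen in a simplified sketch. It uses plain str in place of six's types and omits the encode step, and the sample values are made up.

```python
column_filter = ["a", 1, "b", 2.5]

# A trailing "if" clause filters: items that are already strings are dropped.
dropped = [str(f) for f in column_filter if not isinstance(f, str)]

# A conditional expression transforms selectively: every item is kept.
kept = [str(f) if not isinstance(f, str) else f for f in column_filter]

assert dropped == ["1", "2.5"]
assert kept == ["a", "1", "b", "2.5"]
```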

nsoranzo · 2016-02-19T19:13:24Z

@mvdbeek 👍 but the branch has conflicts, can you resolve them?

mvdbeek · 2016-02-20T11:54:16Z

@nsoranzo should be OK now.


nsoranzo · 2016-02-22T12:34:43Z

Thanks @mvdbeek!

mvdbeek · 2016-02-22T12:36:06Z

Sure, and thanks for your help @nsoranzo !

jmchilton reviewed Feb 15, 2016

jmchilton added kind/enhancement status/review labels Feb 15, 2016

jmchilton added this to the 16.04 milestone Feb 15, 2016

nsoranzo added the area/framework label Feb 15, 2016

nsoranzo reviewed Feb 15, 2016

mvdbeek changed the title ~~Python 2/3 string handling~~ WIP: Python 2/3 string handling Feb 16, 2016

nsoranzo added status/WIP and removed status/review labels Feb 16, 2016

mvdbeek changed the title ~~WIP: Python 2/3 string handling~~ Python 2/3 string handling Feb 16, 2016

jmchilton added status/review and removed status/WIP labels Feb 16, 2016

mvdbeek force-pushed the python_2_3_string_handling branch from ea649ce to b593c1e Compare February 16, 2016 16:46

nsoranzo reviewed Feb 16, 2016

mvdbeek mentioned this pull request Feb 17, 2016

Convert tool_shed response to unicode #1703

Closed

nsoranzo reviewed Feb 17, 2016

mvdbeek force-pushed the python_2_3_string_handling branch 2 times, most recently from b76bd5d to f8caa3e Compare February 18, 2016 10:35

nsoranzo reviewed Feb 18, 2016

mvdbeek force-pushed the python_2_3_string_handling branch from adce0b4 to b55ee9a Compare February 19, 2016 10:30

mvdbeek force-pushed the python_2_3_string_handling branch from b55ee9a to 92fe009 Compare February 20, 2016 11:53

mvdbeek added 9 commits February 20, 2016 13:01

For python2/3 compatibility, replace basestring with six's string_types, unicode with six's text_type.

a22af16

use six's text_type

fbc40f6

Use unicodify

da9cd12

need binary type

0028ac2

Do not test if text_type instance before calling unicodify()

8ee38f6

remove unncessary if statements

6653bcb

Indentation fixes, import json_fix from galaxy.util.json

66bce36

properly encode column_filter

f706ed7

fix bad rebase

92fe009

nsoranzo added a commit that referenced this pull request Feb 22, 2016

Merge pull request #1731 from mvdbeek/python_2_3_string_handling

1b25944

Python 2/3 string handling

nsoranzo merged commit 1b25944 into galaxyproject:dev Feb 22, 2016

mvdbeek deleted the python_2_3_string_handling branch April 8, 2016 14:57

dannon mentioned this pull request Apr 19, 2016

[16.04] Pages parser fix #2197

Merged

nsoranzo added the area/python3 Specific to Python 3 label Jul 7, 2020
