Using Python 3, when creating a GFF FeatureNode from scratch, setting its "source" field with the fn.set_source() method stores a string that is possibly encoded twice.
If the user requests the source back with the fn.get_source() method, the result is a proper str that merely resembles a "bytes" object. Attempting to decode it raises an AttributeError, because a str has no "decode" attribute.
The object is already a string, but its content retains the form b'text', which looks like a bytes object. (See code below)
I assume the source logic is attempting to be transparent to both python2 and python3. I did not test with python2.
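For readers unfamiliar with the symptom, here is a minimal pure-Python sketch of one way such a value can arise. It assumes (this is not confirmed from the gtpython source) that somewhere a bytes value is passed through str() instead of being decoded. It does not use GenomeTools at all, and the variable names are purely illustrative.

```python
# Assumption: the binding receives bytes from the C layer and converts
# them with str() rather than .decode(). This is a guess at the cause,
# not the actual gtpython code.
raw = b"BAR"

wrong = str(raw)             # repr-style conversion, keeps the b'...' wrapper
right = raw.decode("utf-8")  # real decoding

print(type(wrong).__name__, wrong)  # str b'BAR'
print(type(right).__name__, right)  # str BAR
print(hasattr(wrong, "decode"))     # False, hence the AttributeError
```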
example python3.8 code "test_newfeat_source.py" follows
Output from the above code under Ubuntu's system-installed Python 3.8:
$ python3 test_newfeat_source.py
['/home/testy/Documents/work', '/home/testy/Documents/source/gt/gtpython', '/usr/lib/python38.zip', '/usr/lib/python3.8', '/usr/lib/python3.8/lib-dynload', '/usr/local/lib/python3.8/dist-packages', '/usr/lib/python3/dist-packages']
FeatureNode(start=100, end=900, seqid="foo") b'BAR' gene
FeatureNode(start=100, end=200, seqid="foo") b'BAR' exon
FeatureNode(start=800, end=900, seqid="foo") b'BAR' exon
..After reformating source gff3 field...
The type of obj "source" is <class 'str'>
FeatureNode(start=100, end=900, seqid="foo") BAR gene
The type of obj "source" is <class 'str'>
FeatureNode(start=100, end=200, seqid="foo") BAR exon
The type of obj "source" is <class 'str'>
FeatureNode(start=800, end=900, seqid="foo") BAR exon
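The "reformatting" step in the output above could be done along the following lines. This is only a sketch of a possible user-side workaround, not the code from the original test script: normalize_source is a hypothetical helper name, and it assumes the value returned by get_source() is either real bytes or a str of the form b'text'.

```python
import ast


def normalize_source(value):
    """Return plain text for bytes, or for a str that looks like b'text'.

    Hypothetical helper; not part of gtpython.
    """
    if isinstance(value, bytes):
        return value.decode("utf-8")
    looks_like_bytes_repr = (
        isinstance(value, str)
        and len(value) >= 3
        and value[0] == "b"
        and value[1] in "'\""
        and value[-1] == value[1]
    )
    if looks_like_bytes_repr:
        try:
            # Safely parse the bytes literal, then decode it.
            parsed = ast.literal_eval(value)
            if isinstance(parsed, bytes):
                return parsed.decode("utf-8")
        except (ValueError, SyntaxError, UnicodeDecodeError):
            pass  # not a valid bytes literal after all; leave it untouched
    return value


print(normalize_source("b'BAR'"))  # BAR
print(normalize_source("BAR"))     # BAR
```

One caveat: a source name that genuinely has the form b'...' would be rewritten too, so this is a stopgap rather than a fix for the binding itself.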
What GenomeTools version are you reporting an issue for (as output by gt -version)?
I am using GenomeTools 1.6.1 installed by downloading precompiled binary and python libs from GenomeTools.org (single tar.gz package)
$ python3 -V
Python 3.8.5
$ which python3
/usr/bin/python3
$ echo $PYTHONPATH
/home/testy/Documents/source/gt/gtpython
$ /home/testy/Documents/source/gt/bin/gt --version
/home/testy/Documents/source/gt/bin/gt (GenomeTools) 1.6.1
Copyright (c) 2003-2016 G. Gremme, S. Steinbiss, S. Kurtz, and CONTRIBUTORS
Copyright (c) 2003-2016 Center for Bioinformatics, University of Hamburg
See LICENSE file or http://genometools.org/license.html for license details.
Did you compile GenomeTools from source? If so, please state the make parameters used.
Same downloaded distro reports:
Used compiler: cc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Compile flags: -g -Wall -Wunused-parameter -pipe -fPIC -Wpointer-arith -Wno-unknown-pragmas -O3 -m32 -Werror
What operating system (e.g. Ubuntu, Mac OS X), OS version (e.g. 15.10, 10.11) and platform (e.g. x86_64) are you using?
Ubuntu 20.04 LTS
The UTF8 decoding/encoding everywhere in the code is indeed a result of me trying to connect the C GtStrs with the correct representations in Python 2 and 3. Please keep in mind that this was written when Python 3 was just beginning to appear on the horizon in reality. I would be happy to get some pointers regarding this from someone who writes Python more regularly than I do ;)
We also have the option of removing Python 2 compatibility altogether, as it's been deprecated for quite a while now and Debian, for instance, doesn't even ship it anymore. Any opinions on that? I wouldn't mind simplifying things this way.
I'd prefer to work on this with a bit more time, as I don't really have a lot of experience with the UTF encoding implications in the Python versions and I would like to get 1.6.2 with some other bugfixes out first.
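As a pointer for the str/bytes boundary discussed above, here is a minimal sketch of how the conversion could look if Python 2 support were dropped. It assumes the C strings are exchanged as UTF-8 char pointers via ctypes; to_c_bytes and from_c_bytes are hypothetical names, not existing gtpython functions, and the example uses a plain c_char_p rather than a real GtStr.

```python
import ctypes


def to_c_bytes(text):
    """Encode exactly once on the way into C (hypothetical helper)."""
    if isinstance(text, bytes):
        return text
    return text.encode("utf-8")


def from_c_bytes(raw):
    """Decode exactly once on the way back out of C (hypothetical helper)."""
    if raw is None:
        return None
    if isinstance(raw, bytes):
        return raw.decode("utf-8")
    return raw


# Stand-in for a value stored in and read back from the C layer.
c_value = ctypes.c_char_p(to_c_bytes("BAR"))
round_tripped = from_c_bytes(c_value.value)

print(type(round_tripped).__name__, round_tripped)  # str BAR
```

Keeping the encode and decode steps in one pair of helpers makes it harder to convert a value twice, which is the kind of double handling the report describes.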