UnicodeDecodeError on Windows system #1018

Open
Mickychen00 opened this issue Apr 17, 2023 · 7 comments
Labels: bug (Something isn't working), Windows


@Mickychen00

Mickychen00 commented Apr 17, 2023

I encountered a UnicodeDecodeError when running R plotting code through rpy2 on my Windows PC. The error message read "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 81: invalid continuation byte". Strangely, the same code worked fine on my Mac laptop.
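The message means rpy2 received bytes that are not valid UTF-8. A plausible cause, assumed here rather than confirmed in this thread, is that R on a Simplified Chinese Windows system emits text in GBK (code page 936) while rpy2 decodes it as UTF-8. The following sketch reproduces the same class of error in plain Python, without R; the sample string is illustrative only:

```python
# Assumption: R produced its message in GBK (code page 936), the ANSI code
# page on Simplified Chinese Windows, while the reader expects UTF-8.
gbk_bytes = "中文".encode("gbk")  # "Chinese" as a GBK-locale program would emit it

decode_error = None
try:
    gbk_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    # The first GBK byte is read as the start of a multi-byte UTF-8 sequence,
    # and the byte after it is rejected as an invalid continuation byte.
    decode_error = err
    print(err)

# Decoding with the encoding that was actually used recovers the text.
print(gbk_bytes.decode("gbk"))
```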

Luckily, I found a potential solution in a blog post written in Simplified Chinese. The author pointed to a bug in the rpy2\rinterface_lib\conversion.py file and provided a revised version of the code that solves the issue.

Original code:

def _cchar_to_str(c, encoding: str) -> str:
    # TODO: use isString and installTrChar
    s = ffi.string(c).decode(encoding)
    return s


def _cchar_to_str_with_maxlen(c, maxlen: int, encoding: str) -> str:
    # TODO: use isString and installTrChar
    s = ffi.string(c, maxlen).decode(encoding)
    return s

Revised code:

def _cchar_to_str(c, encoding: str) -> str:
    # TODO: use isString and installTrChar
    try:
        s = ffi.string(c).decode(encoding)
    except UnicodeDecodeError:
        # Fall back to GBK (code page 936), the ANSI code page on Simplified Chinese Windows.
        s = ffi.string(c).decode('GBK')
    return s


def _cchar_to_str_with_maxlen(c, maxlen: int, encoding: str) -> str:
    # TODO: use isString and installTrChar
    try:
        s = ffi.string(c, maxlen).decode(encoding)
    except UnicodeDecodeError:
        # Fall back to GBK (code page 936), the ANSI code page on Simplified Chinese Windows.
        s = ffi.string(c, maxlen).decode("GBK")
    return s

Once I implemented the author's revised code, the UnicodeDecodeError disappeared and the code worked perfectly. However, I don't fully understand why this solution works. Can anyone suggest a better solution? Thank you!
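The revised code works because it first tries the encoding rpy2 expects and, when that raises UnicodeDecodeError, retries with GBK, presumably the encoding R used on that machine. A slightly more general sketch takes the raw bytes (for example the result of ffi.string(c)) and tries a list of candidate encodings, ending with a lossy decode so it never raises. The helper name decode_with_fallback is hypothetical and not part of rpy2:

```python
from typing import Iterable


def decode_with_fallback(raw: bytes, encoding: str,
                         fallbacks: Iterable[str] = ()) -> str:
    """Decode raw bytes, trying `encoding` first and then each fallback.

    Hypothetical helper, not part of rpy2. If every candidate fails, the
    bytes are decoded with `encoding` and undecodable bytes are replaced
    with U+FFFD instead of raising.
    """
    for candidate in (encoding, *fallbacks):
        try:
            return raw.decode(candidate)
        except UnicodeDecodeError:
            # Only decoding failures trigger a retry; other errors propagate.
            continue
    return raw.decode(encoding, errors="replace")


# Example: GBK bytes that fail as UTF-8 are recovered by the GBK fallback.
text = decode_with_fallback("中文".encode("gbk"), "utf-8", fallbacks=("gbk",))
print(text)
```

Order matters: single-byte encodings such as latin-1 accept any byte sequence, so placing one early in the list would silently produce mojibake instead of an error. Using locale.getpreferredencoding(False) as the fallback instead of hard-coding GBK is another option, under the assumption that R and Python run with the same locale.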

Mickychen00 added the bug (Something isn't working) label Apr 17, 2023
@lgautier
Member

Thanks. Can you post the Python traceback shown when the error occurs?

@lgautier
Member

lgautier commented Apr 21, 2023

Also, what do you get when running

import sys
print(sys.getdefaultencoding())

?
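On Python 3, sys.getdefaultencoding() always returns 'utf-8', so on its own it cannot reveal a locale mismatch. The sketch below collects a few other settings that are more likely to differ between a Mac and a Chinese-locale Windows machine; which of them actually matters for rpy2 is an assumption, not something established in this thread. Running Sys.getlocale() in R on the same machine shows the locale R itself uses. The function name report_encodings is hypothetical:

```python
import locale
import sys


def report_encodings() -> dict:
    """Collect encoding-related settings of the running Python process."""
    return {
        # Always "utf-8" on Python 3.
        "default": sys.getdefaultencoding(),
        # Encoding used for file names and other OS-level strings.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding implied by the current locale, e.g. "cp936" on a
        # Simplified Chinese Windows system (unless UTF-8 mode is enabled).
        "locale_preferred": locale.getpreferredencoding(False),
    }


for name, value in report_encodings().items():
    print(f"{name}: {value}")
```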

@lgautier
Member

Alternatively, can you check whether PR #1020 solves the issue?

lgautier self-assigned this Apr 22, 2023
@Mickychen00
Author

Hi, thanks for your hints. I am currently traveling abroad, so I will try it in two days and report back.

@Mickychen00
Author

Hi. sys.getdefaultencoding() currently returns "utf-8". Strangely, when I revert to the original version of "conversion.py", my PC still seems to work. How can I test this PR on my local machine?

@lgautier
Member

lgautier commented May 6, 2023

@Mickychen00 - You will have to install from source, using a clone of the PR branch.

There is plenty of git documentation available about cloning a branch. The exact procedure might depend on your git client. With the git CLI it should look like:

git clone -b consistent_r2py_str_encoding-issue1018 --single-branch git@github.com:rpy2/rpy2.git

You'll then have to install rpy2 from that cloned repository (e.g., pip install rpy2/, since git clone names the directory after the repository by default).
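After reinstalling, it can help to confirm which copy of rpy2 Python actually imports, for example when reverting a local edit appears to have no effect. A minimal standard-library sketch; the helper name installed_location is hypothetical, and comparing its output before and after the reinstall is only a sanity check, not a guarantee that the PR code is in use:

```python
import importlib.metadata
import importlib.util
from typing import Optional, Tuple


def installed_location(package: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (distribution version, import path) for a package.

    Either element is None when it cannot be determined.
    """
    spec = importlib.util.find_spec(package)
    if spec is None:
        # The package cannot be imported at all.
        return None, None
    try:
        version = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        # Importable, but no installed distribution metadata (e.g. stdlib).
        version = None
    return version, spec.origin


print(installed_location("rpy2"))
```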

@wbvguo

wbvguo commented Jan 24, 2024

The issue is not Windows-specific. I got the same error on Ubuntu 20.04.4 LTS with rpy2 version 3.5.15.
