UnicodeEncodeError: 'latin-1' codec can't encode characters #1822

xjsender · 2013-12-20T02:56:35Z

I'm using the latest version of Requests.
When I try to POST data that contains Chinese characters, this exception is thrown:

Traceback (most recent call last):
  File "X/threading.py", line 639, in _bootstrap_inner
  File "X/threading.py", line 596, in run
  File "C:\Users\Administrator\Dropbox\Sublime3056\Data\Packages\SublimeApex\salesforce\api.py", line 546, in execute_anonymous
    headers=headers)
  File "C:\Users\Administrator\Dropbox\Sublime3056\Data\Packages\SublimeApex\requests\api.py", line 88, in post
    return request('post', url, data=data, **kwargs)
  File "C:\Users\Administrator\Dropbox\Sublime3056\Data\Packages\SublimeApex\requests\api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\Administrator\Dropbox\Sublime3056\Data\Packages\SublimeApex\requests\sessions.py", line 338, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\Administrator\Dropbox\Sublime3056\Data\Packages\SublimeApex\requests\sessions.py", line 441, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\Administrator\Dropbox\Sublime3056\Data\Packages\SublimeApex\requests\adapters.py", line 292, in send
    timeout=timeout
  File "C:\Users\Administrator\Dropbox\Sublime3056\Data\Packages\SublimeApex\requests\packages\urllib3\connectionpool.py", line 428, in urlopen
    body=body, headers=headers)
  File "C:\Users\Administrator\Dropbox\Sublime3056\Data\Packages\SublimeApex\requests\packages\urllib3\connectionpool.py", line 280, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "X/http/client.py", line 1049, in request
  File "X/http/client.py", line 1086, in _send_request
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 1632-1633: ordinal not in range(256)


sigmavirus24 · 2013-12-20T03:30:39Z

File "X/http/client.py"

Did you write X because that's a path to a local file? If so, your directory structure may be confusing urllib3. If not, then you should probably raise this on bugs.python.org, since this is not something I think Requests should be handling. This looks like it's arising from httplib (or http on Python 3, which I'm guessing you're using).

xjsender · 2013-12-20T05:42:41Z

@sigmavirus24 ,

I'm using Requests in a Sublime Text plugin. If soap_body in the statement below doesn't contain any Chinese characters, no exception is raised.

response = requests.post(self.apex_url, soap_body, verify=False, headers=headers)

Lukasa · 2013-12-20T07:02:25Z

Firstly, unless you're using a different version of Sublime Apex to the one in their public repository, Requests is not the latest version, it's version 1.2.3. Can you tell me what version of Sublime Text you're using?

xjsender · 2013-12-20T08:06:37Z

It's Sublime Text 3056.

Lukasa · 2013-12-20T08:45:48Z

So, ST 3, but not the most recent revision. Ok, that gives us something. Specifically, Sublime Text 3 uses Python 3.3, not Python 2.7 (which Sublime Text 2 used). This means all the default strings in Sublime Apex are unicode strings.

If you open up the Python 3.3 http.client file, you'll find that the _send_request() function looks like this:

# Honor explicitly requested Host: and Accept-Encoding: headers.
header_names = dict.fromkeys([k.lower() for k in headers])
skips = {}
if 'host' in header_names:
    skips['skip_host'] = 1
if 'accept-encoding' in header_names:
    skips['skip_accept_encoding'] = 1

self.putrequest(method, url, **skips)

if body is not None and ('content-length' not in header_names):
    self._set_content_length(body)
for hdr, value in headers.items():
    self.putheader(hdr, value)
if isinstance(body, str):
    # RFC 2616 Section 3.7.1 says that text default has a
    # default charset of iso-8859-1.
    body = body.encode('iso-8859-1')
self.endheaders(body)

Now, ISO-8859-1 is an alias for Latin-1, which is the codec we're having trouble with. The problem we've got is that Sublime Apex is providing a unicode string body to Requests, which httplib needs to encode into bytes. Taking the default from RFC 2616, it concludes you want Latin-1, which doesn't include any Chinese characters. Clearly then, encoding fails, and you get the exception in question.
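
To see that step in isolation, here is a minimal sketch that needs neither Requests nor a network connection. The helper name encode_like_http_client is hypothetical; it copies only the str-to-bytes branch quoted above, not the rest of _send_request().

```python
import codecs


def encode_like_http_client(body):
    # Hypothetical helper: mirrors only the str-to-bytes branch quoted above.
    if isinstance(body, str):
        body = body.encode("iso-8859-1")
    return body


# Both names resolve to the same codec in Python's registry.
same_codec = codecs.lookup("iso-8859-1").name == codecs.lookup("latin-1").name

# U+00E9 (e with acute accent) is inside Latin-1, so this encodes to one byte.
latin_ok = encode_like_http_client("caf\u00e9")

# U+4E2D U+6587 are Chinese characters, outside Latin-1, so this raises.
try:
    encode_like_http_client("\u4e2d\u6587")
    chinese_failed = False
except UnicodeEncodeError:
    chinese_failed = True

print(same_codec, latin_ok, chinese_failed)
```

Bodies that are already bytes pass through the helper untouched, which is why encoding before calling Requests avoids the error.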

Since the headers Sublime Apex sends claim the data is UTF-8 encoded (which is currently untrue), Sublime Apex should encode the data as UTF-8 before sending it. This means any line sending data (in this case line 545 of salesforce/api.py) should read like this:

response = requests.post(self.apex_url, soap_body.encode('utf-8'), verify=False, headers=headers)
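
If several call sites send data, one option is to keep the encoding and the declared charset together in a small helper. The sketch below is an illustration only: prepare_body is a hypothetical name, and the text/xml content type is an assumption, since the thread does not show the headers Sublime Apex actually sends.

```python
def prepare_body(body, headers, charset="utf-8"):
    """Hypothetical helper: return (bytes_body, headers) that agree on a charset."""
    if isinstance(body, str):
        # Encode explicitly so http.client never falls back to ISO-8859-1.
        body = body.encode(charset)
    headers = dict(headers)  # copy, so the caller's dict is not mutated
    # Only add a Content-Type if the caller has not set one (case-insensitive).
    if not any(name.lower() == "content-type" for name in headers):
        # Assumed content type for a SOAP payload; adjust to the real API.
        headers["Content-Type"] = "text/xml; charset=" + charset
    return body, headers


body, headers = prepare_body("<note>\u4e2d\u6587</note>", {"SOAPAction": '""'})
print(type(body).__name__, headers["Content-Type"])
```

The returned pair could then be passed along as requests.post(url, data=body, headers=headers).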

For the sake of anyone else who wants to confirm my diagnosis, here's a quick bit of sample code that confirms the problem:

import requests

a = "\u13E0\u19E0\u1320"
a.encode('latin1')  # Raises UnicodeEncodeError, proves that this can't be expressed in ISO-8859-1.
a.encode('utf-8')  # Totally fine.
r = requests.post('http://httpbin.org/post', data=a)  # Using unicode string, raises UnicodeEncodeError blaming Latin1.
r = requests.post('http://httpbin.org/post', data=a.encode('utf-8'))  # Works fine.
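
When the body is large, as in the original traceback (position 1632-1633), it can help to find out exactly which characters the codec rejected. A minimal sketch, assuming only the standard start, end and object attributes of UnicodeEncodeError; the function name unencodable_span is hypothetical.

```python
def unencodable_span(text, codec="latin-1"):
    """Return (start, end, substring) for the first run the codec rejects, or None."""
    try:
        text.encode(codec)
    except UnicodeEncodeError as exc:
        # exc.start is inclusive and exc.end is exclusive, so the message
        # "position 1632-1633" corresponds to start=1632, end=1634.
        return exc.start, exc.end, text[exc.start:exc.end]
    return None


sample = "ab\u4e2d\u6587cd"  # two Chinese characters between ASCII letters
print(unencodable_span(sample))
print(unencodable_span("plain ascii"))
```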

Thanks for raising this with us, but this is not a Requests bug. =)

xjsender · 2013-12-23T02:16:43Z

Thanks.

wuminmin · 2018-04-19T07:40:20Z

r = requests.post('http://httpbin.org/post', data=a.encode('utf-8'))
Very useful, thank you!

Lukasa closed this as completed Dec 20, 2013

b3b mentioned this issue Dec 1, 2018

UnicodeEncodeError: 'latin-1' codec can't encode characters b3b/ipython-restmagic#2

Closed

kobayashi mentioned this issue Apr 29, 2020

Webhooks UTF-8 trouble netbox-community/netbox#4549

Closed

angelajt mentioned this issue Jan 30, 2021

fix UnicodeEncodeError and some formatting inconsistencies JEF1056/Jade_T5#1

Merged

ChoiByungWook mentioned this issue Mar 4, 2021

Batch transform fails with character encoding error in local mode aws/sagemaker-python-sdk#2165

Closed

github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021
