Unusual URL string crashes in py3 #4168

Closed
ReimarBauer opened this issue Jun 19, 2017 · 4 comments

@ReimarBauer

I installed the current master (pip install from the zip file) in a recent Python 3 conda environment.

import requests

base_url = 'http://............127.0.0.1:8082'
requests.get(base_url)

crashes and ends with a UnicodeError:

python3.6/encodings/idna.py", line 165, in encode
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

Maybe you can catch this?

@Lukasa
Member

Lukasa commented Jun 19, 2017

For posterity, the complete traceback is this:

>>> requests.get(base_url)
Traceback (most recent call last):
  File "/Users/cory/.pyenv/versions/3.6.0/lib/python3.6/encodings/idna.py", line 165, in encode
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/cory/Documents/Python/requests_org/requests/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/Users/cory/Documents/Python/requests_org/requests/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/cory/Documents/Python/requests_org/requests/requests/sessions.py", line 493, in request
    prep.url, proxies, stream, verify, cert
  File "/Users/cory/Documents/Python/requests_org/requests/requests/sessions.py", line 666, in merge_environment_settings
    env_proxies = get_environ_proxies(url, no_proxy=no_proxy)
  File "/Users/cory/Documents/Python/requests_org/requests/requests/utils.py", line 692, in get_environ_proxies
    if should_bypass_proxies(url, no_proxy=no_proxy):
  File "/Users/cory/Documents/Python/requests_org/requests/requests/utils.py", line 676, in should_bypass_proxies
    bypass = proxy_bypass(netloc)
  File "/Users/cory/.pyenv/versions/3.6.0/lib/python3.6/urllib/request.py", line 2616, in proxy_bypass
    return proxy_bypass_macosx_sysconf(host)
  File "/Users/cory/.pyenv/versions/3.6.0/lib/python3.6/urllib/request.py", line 2593, in proxy_bypass_macosx_sysconf
    return _proxy_bypass_macosx_sysconf(host, proxy_settings)
  File "/Users/cory/.pyenv/versions/3.6.0/lib/python3.6/urllib/request.py", line 2566, in _proxy_bypass_macosx_sysconf
    hostIP = socket.gethostbyname(hostonly)
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long)

I don't think there's much we can do about this. The error is coming out of the standard library (specifically, in the urllib proxy_bypass function). It's present only on Python 3, which feels the need to call socket.gethostbyname. This function will automatically IDNA-encode a unicode hostname, even in situations like this where it's simply not necessary, and its IDNA encoder correctly rejects this.
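
The failure can be reproduced without requests or any network access by calling the stdlib "idna" codec directly, which is the codec named in the final line of the traceback. A minimal sketch, assuming the codec's label check is the only thing that matters here; idna_ok is a hypothetical helper, not part of requests or the standard library:

```python
# Exercise Python's built-in "idna" codec directly; socket.gethostbyname
# applies this codec to str hostnames, as the traceback above shows.
host = "............127.0.0.1"

# Consecutive dots produce empty labels.
print(host.split(".")[:3])  # ['', '', '']


def idna_ok(hostname):
    """Hypothetical helper: True if the stdlib idna codec accepts hostname."""
    try:
        hostname.encode("idna")
    except UnicodeError:
        # Newer Pythons may raise the UnicodeEncodeError subclass instead;
        # catching UnicodeError covers both.
        return False
    return True


print(idna_ok(host))         # False: empty labels are rejected
print(idna_ok("127.0.0.1"))  # True
```

A caller who wants a clearer error than the traceback above could run the same kind of check on a URL's host before handing it to requests.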

The only way we can fix this is by moving to a much smarter URL handling implementation that normalizes URLs in some form. The best candidate is hyperlink, but hyperlink also barfs on this for a similar reason (it tries to IDNA-encode and fails).

This means that at best we could fix this by extending hyperlink with a URL host normalizer and then handling it. However, the WHATWG URL specification also appears to forbid this form of URL. If it does, I'm not sure why, as Chrome normalizes it (though Safari does not).

Given the amount of work required to do this, I don't see any reason to tolerate it. The URL is just spectacularly far away from anything that can reasonably be expected to work, so I'm inclined to just close this as a won't fix.

@Lukasa Lukasa closed this as completed Jun 19, 2017
@johnpaulhayes

I'm encountering this for a URL of the format:

https://key:secret@example.com/path/file.json

with a length of 132 characters.

@ablack-jpl

@johnpaulhayes That's still not an issue with the requests library, but as I'm also running into it I figure I'll drop an update.

It's not the total length of the URL that seems to trigger it, just a section of it. The idna encoder seems to break on URLs when the first part of the host name (the text before the first dot) is 64 characters or longer. For whatever reason, it's including the key and secret in there as well. So either avoid Python 3 or avoid long "key:secret@example" strings (likely by avoiding long API keys) until the underlying functions are fixed. I submitted a bug for it to the Python tracker yesterday.
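
The traceback earlier in the thread shows requests calling proxy_bypass(netloc), and a parsed URL's netloc still contains the userinfo. The sketch below is an illustration under that assumption; it does not call requests. It shows the codec's 63-character label limit and how placeholder credentials end up inside the first label. The credential strings and the helper idna_accepts are hypothetical:

```python
from urllib.parse import urlsplit


def idna_accepts(name):
    """Hypothetical helper: True if Python's built-in idna codec can encode name."""
    try:
        name.encode("idna")
    except UnicodeError:
        return False
    return True


# The codec rejects any dot-separated label of 64 or more characters.
print(idna_accepts("a" * 63 + ".example.com"))  # True
print(idna_accepts("a" * 64 + ".example.com"))  # False

# Placeholder credentials: 40 + 1 + 40 characters of userinfo.
url = "https://" + "k" * 40 + ":" + "s" * 40 + "@example.com/path/file.json"
parts = urlsplit(url)

# netloc keeps the userinfo, so its first label is "kkk...:sss...@example".
print(len(parts.netloc.split(".")[0]))  # 89
print(idna_accepts(parts.netloc))       # False
print(idna_accepts(parts.hostname))     # True: hostname drops the credentials
```

This is consistent with the key and secret counting toward the label length, even though they are not part of the host name itself.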

vrajmohan pushed a commit to fecgov/fec-eregs that referenced this issue Apr 12, 2018
Due to psf/requests#4168, using https://user:password@url makes the
URL too long and results in a UnicodeError: "label empty or too long".

The workaround is to avoid specifying it in the URL and to use an
alternate mechanism of supplying credentials e.g. .netrc.
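
A related workaround is to strip the user:password@ part out of the URL before making the request, so the host stays short, and to pass the credentials separately. A minimal sketch using only the standard library; split_credentials is a hypothetical helper, not part of requests:

```python
from urllib.parse import unquote, urlsplit, urlunsplit


def split_credentials(url):
    """Hypothetical helper: return (url_without_userinfo, (user, password) or None).

    Moves "user:password@" out of the URL so only host[:port] remains in the
    netloc; percent-escapes in the credentials are decoded.
    """
    parts = urlsplit(url)
    # Split on the last "@", mirroring how urlsplit's .hostname finds the host.
    userinfo, sep, hostport = parts.netloc.rpartition("@")
    if not sep:
        return url, None
    username, _, password = userinfo.partition(":")
    clean_url = urlunsplit(
        (parts.scheme, hostport, parts.path, parts.query, parts.fragment)
    )
    return clean_url, (unquote(username), unquote(password))


clean_url, creds = split_credentials("https://key:secret@example.com/path/file.json")
print(clean_url)  # https://example.com/path/file.json
print(creds)      # ('key', 'secret')
```

With requests, the pair can then be sent as HTTP Basic auth via requests.get(clean_url, auth=creds). Alternatively, leaving auth unset lets requests fall back to ~/.netrc for that host, which is the route the commit above took.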
@brunsgaard

For those interested in the issue on the Python side:
https://bugs.python.org/issue32958

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021