WSGI strings should be decoded with ISO-8859-1 on Python 3 #138

Closed
voroninman opened this issue Aug 3, 2018 · 5 comments

@voroninman
voroninman commented Aug 3, 2018

Problem

When I run my Django app with bjoern and navigate to a URL containing characters outside the ISO-8859-1 charset, it fails with this error:

Traceback (most recent call last):
  File "/webapps/django-interfax-app/env/lib/python3.5/site-packages/django/core/handlers/wsgi.py", line 145, in __call__
    request = self.request_class(environ)
  File "/webapps/django-interfax-app/env/lib/python3.5/site-packages/django/core/handlers/wsgi.py", line 69, in __init__
    path_info = get_path_info(environ)
  File "/webapps/django-interfax-app/env/lib/python3.5/site-packages/django/core/handlers/wsgi.py", line 162, in get_path_info
    path_info = get_bytes_from_wsgi(environ, 'PATH_INFO', '/')
  File "/webapps/django-interfax-app/env/lib/python3.5/site-packages/django/core/handlers/wsgi.py", line 210, in get_bytes_from_wsgi
    return value.encode('iso-8859-1')
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0457' in position 1: ordinal not in range(256)
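
The failing call can be reproduced outside Django. This is a minimal sketch, assuming `PATH_INFO` reaches the application as a `str` containing U+0457, as the traceback reports; the variable names are illustrative only.

```python
# Reproduce the UnicodeEncodeError from the traceback without Django.
# Assumption: the server passed a PATH_INFO containing U+0457, a code point
# that ISO-8859-1 cannot represent.
path_info = "/\u0457"

try:
    path_info.encode("iso-8859-1")
except UnicodeEncodeError as exc:
    error = exc
    print("encode failed at position", exc.start)
else:
    error = None
```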

Probably the root cause

On Python platforms where the str or StringType type is in fact Unicode-based (e.g. Jython, IronPython, Python 3, etc.), all "strings" referred to in this specification must contain only code points representable in ISO-8859-1 encoding (\u0000 through \u00FF, inclusive). [1]

[1] https://www.python.org/dev/peps/pep-3333/#unicode-issues
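
The rule quoted above can be sketched as a pair of conversions: the server exposes raw bytes as an ISO-8859-1 string, and the application reverses that step before decoding with the real encoding. The helper names `to_wsgi_str` and `from_wsgi_str` are hypothetical, and UTF-8 as the real encoding is an assumption.

```python
def to_wsgi_str(raw: bytes) -> str:
    """Server side: expose raw request bytes as a 'native' WSGI string."""
    # ISO-8859-1 maps each byte 0x00-0xFF to the code point of the same value,
    # so this step never fails and never loses information.
    return raw.decode("iso-8859-1")


def from_wsgi_str(value: str, encoding: str = "utf-8") -> str:
    """Application side: recover the bytes, then decode with the real encoding."""
    return value.encode("iso-8859-1").decode(encoding)


# Bytes of the path as they look after percent-decoding (assumed UTF-8).
raw_path = "/\u0457".encode("utf-8")
wsgi_path = to_wsgi_str(raw_path)

# Every code point is within \u0000-\u00FF, as PEP 3333 requires.
assert all(ord(ch) <= 0xFF for ch in wsgi_path)
# The application can still recover the original text.
assert from_wsgi_str(wsgi_path) == "/\u0457"
```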

Expected

The same behavior as wsgiref:

def application(environ, start_response):
    start_response('200 OK', [])
    print('PATH_INFO:', environ['PATH_INFO'])
    yield b'OK'

from wsgiref.simple_server import make_server
httpd = make_server('0.0.0.0', 8080, application)
httpd.serve_forever()

curl localhost:8080/%C3%A5 prints PATH_INFO: /Ã¥.

Actual

def application(environ, start_response):
    start_response('200 OK', [])
    print('PATH_INFO:', environ['PATH_INFO'])
    yield b'OK'

import bjoern
bjoern.run(application, '0.0.0.0', 8080)

curl localhost:8080/%C3%A5 prints PATH_INFO: /å.
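
The two outputs are consistent with the same request bytes being decoded in two different ways. A small sketch of that difference, assuming bjoern decoded the path as UTF-8 (which the output suggests but the thread does not confirm):

```python
from urllib.parse import unquote_to_bytes

# Percent-decode the request path into raw bytes.
raw = unquote_to_bytes("/%C3%A5")

# Decoding byte-for-byte as ISO-8859-1 matches the wsgiref output.
as_latin1 = raw.decode("iso-8859-1")

# Decoding as UTF-8 matches the bjoern output.
as_utf8 = raw.decode("utf-8")

print(repr(as_latin1), repr(as_utf8))
```

Only the ISO-8859-1 form can be turned back into the original bytes by `value.encode('iso-8859-1')` for every possible path, which is the step Django performs in `get_bytes_from_wsgi`.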

@jonashaag
Owner

Thanks for this great bug report!

Do you also happen to know how this should be fixed?

@voroninman
Author

voroninman commented Aug 3, 2018

I'm not a C expert, but I can have a look. So far I've ended up writing a WSGI middleware to temporarily address the issue:

from wsgi import application
import bjoern

class FixBjoernEncoding:

    def __init__(self, app):
        self._app = app

    def __call__(self, environ, start_response):
        environ['PATH_INFO'] = environ.get('PATH_INFO', '/')\
            .encode('utf8').decode('latin-1')
        return self._app(environ, start_response)

bjoern.run(FixBjoernEncoding(application), '0.0.0.0', 8080)
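
The workaround can be exercised without starting a server. The sketch below re-declares an equivalent middleware under a hypothetical name (`FixPathInfoEncoding`) and wraps a hypothetical recording app, so it runs on its own; the input `PATH_INFO` is an assumed example of what the affected server passes in.

```python
class FixPathInfoEncoding:
    """Stand-in for the workaround above (hypothetical name)."""

    def __init__(self, app):
        self._app = app

    def __call__(self, environ, start_response):
        # Re-express the already-decoded text as an ISO-8859-1 WSGI string.
        path = environ.get("PATH_INFO", "/")
        environ["PATH_INFO"] = path.encode("utf-8").decode("iso-8859-1")
        return self._app(environ, start_response)


seen = {}


def recording_app(environ, start_response):
    # Record what the wrapped application actually receives.
    seen["PATH_INFO"] = environ["PATH_INFO"]
    start_response("200 OK", [])
    return [b"OK"]


def fake_start_response(status, headers):
    # No-op replacement for the server's start_response callable.
    pass


wrapped = FixPathInfoEncoding(recording_app)
body = wrapped({"PATH_INFO": "/\u0457"}, fake_start_response)

# The same encode step Django performs now succeeds and yields the UTF-8 bytes.
recovered = seen["PATH_INFO"].encode("iso-8859-1").decode("utf-8")
```

A middleware like this has to be removed once the server emits ISO-8859-1 strings itself, because applying it to an already correct value encodes the path a second time and corrupts it.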

@jonashaag
Owner

Thanks, I'll fix this in the next few days.

@jonashaag
Owner

Please have a look!

@voroninman
Author

Works like a charm. 🎉🎉🎉

Thanks a lot! 🙌
