Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace get_encoding_from_headers default with None #5084

Open
jvanasco opened this issue May 10, 2019 · 0 comments
Open

replace get_encoding_from_headers default with None #5084

jvanasco opened this issue May 10, 2019 · 0 comments

Comments

@jvanasco
Copy link
Contributor

jvanasco commented May 10, 2019

This issue a suggestion for handling partial elements of issues #1737 and #2086 (#1774 and others too). I'm bringing this up in a separate ticket because it is a specific fix for multiple tickets and I am willing to write a PR for it in requests3 and requests2. (i would have just gone and generated a PR, but I can't figure out what the actual master for requests3 actually is)

my proposal is to do the following (some of which have already been discussed as possibilities):

alter get_encoding_from_headers()

replace the default value of get_encoding_from_headers with None when "text" is in the content_type header. currently this is "ISO-8859-1"

https://github.com/kennethreitz/requests/blob/c501ec986daa4961cd9dee370b5d45ff2e524b37/requests/utils.py#L492-L493

add Response.detected_encoding attribute

extend build_response and Response to additionally stash the header decoding in response.detected_encoding. this will allow the detected value to remain on the response object if response.encoding is changed by the consumer.

add Response.text_encoding property

add a text_encoding property to Response , which offloads some of the logic now in text, and will return "ISO-8859-1"

https://github.com/kennethreitz/requests/blob/c501ec986daa4961cd9dee370b5d45ff2e524b37/requests/models.py#L835-L858

alter Response.text to use the new .text_encoding attribute

update the text property to use the new .text_encoding attribute. this just ties together all the above.

Goals

The goal of these changes is to positively identify situations where no response encoding was declared by the server. The backwards incompatibility is that None will now be returned when inspecting response.encoding when there is no encoding for the response - however the effective encoding used to generate .text will be available in response.text_encoding .

This still maintains compliance with RFC 2616, because .text will still interpret the lack of a declared encoding as having the required default charset of ISO-8859-1. This simply allows developers to know when ISO-8859-1 is being used in a default or explicit manner.

This approach doesn't address all aspects of those issues or close those tickets, but it offers a solution to many of the needs and concerns with some minor changes.

@jvanasco jvanasco changed the title replace get_encoding_from_headers default with None replace get_encoding_from_headers default with None May 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant