Set encoding of the response string, if specified by server. #278
The encoding of the response string is always reported as 'ASCII-8BIT', even when the underlying byte sequence is valid UTF-8 and the server advertises it as UTF-8 in the Content-Type header.

With this commit, the Content-Type header in the server response is checked for a charset definition and the response string encoding is set accordingly. If there is a mismatch between the advertised and the actual encoding, the response encoding is left as 'ASCII-8BIT'.

The following irb session illustrates the problem:

```
irb(main):001:0> require 'restclient'
=> true
irb(main):002:0> response = RestClient.get 'http://www.goldenerunkelruebe.de'; response.code
=> 200
irb(main):003:0> response.encoding
=> #<Encoding:ASCII-8BIT>
irb(main):004:0> response.headers[:content_type]
=> "text/html; charset=UTF-8"
```
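For readers who want to see the idea in isolation, here is a minimal sketch of the behaviour described above. It is not the code from this commit: `apply_charset` is a hypothetical helper name, and the regular expression used to pull the charset out of the header is an assumption that only covers the common `charset=...` form.

```ruby
# Hypothetical helper illustrating the approach; not the code in this pull request.
# Returns the body tagged with the advertised charset, or the body unchanged
# when no charset is given, the name is unknown, or the bytes do not match it.
def apply_charset(body, content_type)
  # Pull the charset parameter out of a header such as "text/html; charset=UTF-8".
  charset = content_type.to_s[/charset\s*=\s*"?([^\s;"]+)/i, 1]
  return body unless charset

  begin
    encoding = Encoding.find(charset)
  rescue ArgumentError
    return body # unknown charset name: leave the string as it is
  end
  return body unless encoding

  # Retag a copy of the bytes; force_encoding does not transcode anything.
  candidate = body.dup.force_encoding(encoding)

  # Only keep the new tag if the bytes are actually valid in that encoding.
  candidate.valid_encoding? ? candidate : body
end

raw = "Runkelr\xC3\xBCbe".b # binary (ASCII-8BIT) bytes, as a socket read would give
puts apply_charset(raw, "text/html; charset=UTF-8").encoding
```

Working on a copy and checking `valid_encoding?` before returning it is what keeps the mismatch case at 'ASCII-8BIT' instead of handing callers a string with a wrong encoding tag.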
This looks pretty good, thanks! I think it should probably wait for the next minor release for backwards-compatibility, though.
Cool! Will I have to do anything for that or will you just merge it when it is time for the next minor release?
I haven't had a chance to look too closely, but I think it should just be good to go. I've tagged it 1.8.0 so hopefully I won't forget about it, but feel free to bump as 1.8.0 gets closer. Right now I'm focusing on just getting a 1.7.0 release out to get some new momentum.
Okay, thanks for the heads-up!
@chrismo did some additional research into how other gems do this here: lostisland/faraday#139 For reference:
👍 |