response.body is ASCII-8BIT when Content-Type is text/html; charset=UTF-8 #580

Open
pirj opened this issue Oct 30, 2017 · 5 comments

@pirj

pirj commented Oct 30, 2017

> response = Typhoeus::Request.get('http://ridingtheclutch.com/post/113889750070/on-the-virtues-of-contracting', follow_location: true)
> puts response.headers['Content-Type'], response.body.encoding
text/html; charset=UTF-8
ASCII-8BIT

Tried with 1.1.2, 1.3.0, and edge master.

There were similar issues with other HTTP libraries as well, see this comment.

Does this relate to ethon or typhoeus itself?

@philsturgeon

Hey, I'm running into this issue at work too. I'm not 100% sure whether this is technically easier to deal with in Typhoeus or Ethon, but as Typhoeus is the higher-level library, it seems like the appropriate place.

Like many issues, this was first noticed in Faraday, who very fairly shrugged and passed the buck onto the adapters. Some adapters handle this, and some do not. Typhoeus does not.

Patron handles this quite elegantly IMO.

em-http-request has some good tests for their equivalent functionality.

I really want to continue using Typhoeus because of the amazing Hydra, but this is messing things up for one of our teams and they're having to switch. Luckily, nobody who is running into this issue would also benefit from async...

Anyway, if you want this in Typhoeus, give me a thumbs up and I'll get to work on it.

@Alexxfrolov

Is there any news on this issue?

@andremedeiros
Member

@Alexxfrolov are you seeing this? Is there any chance you could get a failing test case that displays this behavior? That way we could work towards implementing a fix.

@VladYermakov

VladYermakov commented Jan 21, 2020

Hi guys, we've got a very weird error because of this issue.
In short, we're passing the result of a Typhoeus request to Feedjira, and the latter fails to parse the feed because of the wrong body encoding:
Encoding::CompatibilityError (incompatible character encodings: UTF-8 and ASCII-8BIT)

I confirmed the cause by forcing the body encoding to UTF-8, after which everything worked fine, so I'm sure it was because of this issue.
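
For anyone hitting the same thing, both the failure and the workaround can be reproduced without Typhoeus or Feedjira. This is only a minimal sketch with plain strings: `sample_binary_body` stands in for a `response_body` tagged ASCII-8BIT, `as_utf8` and `concat_error` are hypothetical helper names, and forcing UTF-8 assumes the server really sent UTF-8 bytes.

```ruby
# Returns a copy of the body re-tagged as UTF-8. No bytes are changed; this
# ASSUMES the payload really is UTF-8 (nothing here verifies the header).
def as_utf8(body)
  body.dup.force_encoding(Encoding::UTF_8)
end

# Stand-in for response_body: valid UTF-8 bytes, but tagged as binary.
def sample_binary_body
  "caf\u00E9".b
end

# Returns the exception raised by concatenating the two strings, or nil.
def concat_error(left, right)
  left + right
  nil
rescue Encoding::CompatibilityError => e
  e
end

utf8_text = "r\u00E9sum\u00E9: " # genuine UTF-8 text with non-ASCII characters
body = sample_binary_body

puts body.encoding                        # the binary (ASCII-8BIT) encoding
puts concat_error(utf8_text, body).class  # the error class from the report above
combined = utf8_text + as_utf8(body)      # works once the body is tagged UTF-8
puts combined.valid_encoding?
```

Note that `force_encoding` only relabels the bytes, so a body in some other charset would end up as an invalid UTF-8 string rather than raising.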

Can you please take a look into this error and invest some time to fix it?

P.S. Here is an example of code that fails:

request = Typhoeus.get('https://www.historynet.com/feed', headers: { 'User-Agent' => 'Ruby' })
rss = Feedjira.parse(request.response_body)

Please pay no attention to the changed User-Agent; it was done because the site blocks the Typhoeus User-Agent for an unknown reason.

P.P.S. This issue is a bit different from the original one, because our response is not text/html but application/rss+xml.

jasonschroeder-sfdc added a commit to jasonschroeder-sfdc/CocoaPods-Core that referenced this issue May 13, 2020
There is a bug in Typhoeus. See
typhoeus/typhoeus#580
This will force the response_body to always be UTF-8.
jasonschroeder-sfdc added a commit to jasonschroeder-sfdc/CocoaPods-Core that referenced this issue May 14, 2020
There is a bug in Typhoeus. See
typhoeus/typhoeus#580
This will force the response_body to always be UTF-8.

Some response_body are coming back with encoding=ASCII-8BIT but are not writable to
disk, with errors like this:

```
[!] CDN: trunk Repo update failed - 31 error(s):
"\xE2" from ASCII-8BIT to UTF-8
```
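
The error text quoted above matches what Ruby raises when a binary string containing non-ASCII bytes is transcoded to UTF-8. A minimal sketch with a hand-made string follows; the bytes are an illustration, not data from the CDN, and `sample_body`, `transcode_error`, and `retag_utf8` are hypothetical helper names.

```ruby
# "\xE2\x80\x99" is the UTF-8 byte sequence for U+2019 (right single quote).
# Stand-in for a response_body that arrived tagged as ASCII-8BIT.
def sample_body
  "it\xE2\x80\x99s".b
end

# Returns the exception raised by transcoding to UTF-8, or nil on success.
def transcode_error(str)
  str.encode(Encoding::UTF_8)
  nil
rescue Encoding::UndefinedConversionError => e
  e
end

# Workaround in the spirit of the commit message: re-tag the bytes as UTF-8.
# This ASSUMES the payload is UTF-8; no bytes are converted.
def retag_utf8(str)
  str.dup.force_encoding(Encoding::UTF_8)
end

puts transcode_error(sample_body).class
puts transcode_error(retag_utf8(sample_body)).inspect # nil: nothing to convert
```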
guillett added a commit to betagouv/rdv-service-public that referenced this issue Oct 7, 2020
There is a bug in Typhoeus. See
typhoeus/typhoeus#580
This will force the response_body to always be UTF-8.

Some response_body are coming back with encoding=ASCII-8BIT but are not writable to
sentry, with errors like this:
adipasquale added a commit to betagouv/rdv-service-public that referenced this issue Oct 7, 2020
* Ensure HTTP errors in web hook logic are reported (#913)

* Force encoding to UTF-8 in webhook response bodies

There is a bug in Typhoeus. See
typhoeus/typhoeus#580
This will force the response_body to always be UTF-8.

Some response_body are coming back with encoding=ASCII-8BIT but are not writable to
sentry, with errors like this:

Co-authored-by: Thomas Guillet <guillet.thomas@gmail.com>
@semaperepelitsa

Example of how it is implemented for Net HTTP adapter: lostisland/faraday-net_http#6
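
This is not the code from the linked PR, just a rough sketch of the general idea requested in this issue: read `charset` from the Content-Type header and tag the body accordingly. `apply_charset` is a hypothetical helper name, not a Typhoeus, Ethon, or Faraday API. It takes plain strings, so it could presumably be fed `response.headers['Content-Type']` and `response.body`.

```ruby
# Hypothetical helper, not part of any library mentioned in this thread.
# Returns a copy of the body tagged with the charset named in the
# Content-Type header, or the body unchanged if no usable charset is found.
def apply_charset(body, content_type)
  # Pull out the charset parameter, tolerating quotes and odd spacing.
  charset = content_type.to_s[/charset\s*=\s*"?([^\s;"]+)/i, 1]
  return body unless charset

  encoding = begin
    Encoding.find(charset)
  rescue ArgumentError
    nil # unknown charset name: leave the body as it is
  end
  return body unless encoding

  # Only relabels the bytes; it does not transcode or validate them.
  body.dup.force_encoding(encoding)
end

tagged = apply_charset("caf\u00E9".b, "text/html; charset=UTF-8")
puts tagged.encoding
puts tagged.valid_encoding?
```

Falling back to the untouched binary body when the header has no charset, or names one Ruby does not know, keeps the current behavior for those responses; callers may still want to check `valid_encoding?` since servers sometimes declare the wrong charset.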
