accented characters? � � � #118

Closed
phstc opened this issue Nov 26, 2011 · 17 comments

phstc commented Nov 26, 2011

Hi!

I'm trying to scrape a web page with accented characters (á é ó ú ê ã etc.). I tried `encoding: 'utf-8'`, but I'm still getting these ��� characters in the result.

```js
const request = require('request');

request.get({
  uri: url,
  encoding: 'utf-8'
  // ...
});
```

thejh (Contributor) commented Nov 30, 2011

Well, what's the encoding the page uses? You can't just throw a utf8 parser at ISO-whatever.
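To make that concrete, here is a minimal Node.js sketch (not from this thread; it uses only the built-in `Buffer`) that takes the ISO-8859-1 bytes for the Portuguese word "ação" and decodes them once as UTF-8 and once as Latin-1:

```javascript
// "ação" encoded as ISO-8859-1: one byte per character.
// 0x61 = 'a', 0xE7 = 'ç', 0xE3 = 'ã', 0x6F = 'o'
const latin1Bytes = Buffer.from([0x61, 0xe7, 0xe3, 0x6f]);

// Wrong: treating the bytes as UTF-8. 0xE7 and 0xE3 are UTF-8 lead bytes
// that are not followed by valid continuation bytes, so the decoder
// replaces each of them with U+FFFD, the "�" replacement character.
const asUtf8 = latin1Bytes.toString('utf8');

// Right: decode with the charset the page actually uses.
const asLatin1 = latin1Bytes.toString('latin1');

console.log(asUtf8);   // a��o
console.log(asLatin1); // ação
```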

thejh (Contributor) commented Nov 30, 2011

phstc (Author) commented Nov 30, 2011

@thejh The page encoding is ISO-8859-1. I've also tried:

```js
request.get({
  uri: url,
  encoding: 'iso-8859-1'
  // ...
});
```

and I got:

Error: Unknown encoding

But then I read issue #27 and http://nodejs.org/docs/v0.6.0/api/http.html#request.setEncoding:

> Set the encoding for the request body. Either 'utf8' or 'binary'. Defaults to null, which means that the 'data' event will emit a Buffer object.

So I tried `encoding: 'binary'`, and it worked.

thejh (Contributor) commented Nov 30, 2011

Have a look at the iconv library.

phstc (Author) commented Nov 30, 2011

Okay... but do you know why binary worked?

phstc closed this as completed Nov 30, 2011
phstc reopened this Nov 30, 2011
thejh (Contributor) commented Nov 30, 2011

Because it just takes the raw buffer's data. Also, the resulting string still isn't UTF-8, so don't do it.

phstc (Author) commented Nov 30, 2011

But in this case, what is the proper value for encoding?

thejh (Contributor) commented Nov 30, 2011

No encoding. Take it as a buffer, then stuff it into iconv.
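A minimal sketch of that approach, with some assumptions: it uses the WHATWG `TextDecoder` built into modern Node.js as a stand-in for the iconv module, and `charsetFromContentType` and `decodeBody` are hypothetical helper names. With request, passing `encoding: null` is what makes the body arrive as a Buffer.

```javascript
// Hypothetical helper: pull the charset out of a Content-Type header,
// e.g. "text/html; charset=ISO-8859-1" -> "iso-8859-1".
// Falls back to UTF-8 when no charset parameter is present.
function charsetFromContentType(contentType) {
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || '');
  return match ? match[1].toLowerCase() : 'utf-8';
}

// Hypothetical helper: decode a raw response body (a Buffer, as you get
// with `encoding: null`) using the page's declared charset.
function decodeBody(bodyBuffer, contentType) {
  const decoder = new TextDecoder(charsetFromContentType(contentType));
  return decoder.decode(bodyBuffer);
}

// Example: "ação" as ISO-8859-1 bytes, as a server might send it.
const body = Buffer.from([0x61, 0xe7, 0xe3, 0x6f]);
console.log(decodeBody(body, 'text/html; charset=ISO-8859-1')); // ação
```

`TextDecoder` throws a `RangeError` for labels it doesn't recognize, and which labels it recognizes depends on the ICU data Node was built with (full ICU is the default in current releases). Per the WHATWG Encoding Standard, the `iso-8859-1` label actually selects windows-1252, which differs from ISO-8859-1 only in the 0x80 to 0x9F range.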

mikeal (Member) commented Nov 30, 2011

the confusion appears to be over "binary" and Buffer, which is also binary.

"binary" is, mostly, a legacy encoding from the node 0.1.x days when we encoded all binary data into strings.

in node.js 0.2 we got a Buffer object, which is a raw allocation of memory outside of v8's heap. the object is not a string, and it can hold raw binary data that you read from one file descriptor and write to another without it ever being converted to a string.
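In current Node.js terms, the docs list `'binary'` as an alias for `'latin1'`: a string encoding that maps each byte to the character with the same code point, whereas a Buffer simply holds the bytes. A small sketch using only the built-in `Buffer`:

```javascript
// A Buffer holding every possible byte value, 0x00 through 0xFF.
const allBytes = Buffer.from(Array.from({ length: 256 }, (_, i) => i));

// 'binary' is documented as an alias for 'latin1': each byte becomes the
// character with the same code point (U+0000 to U+00FF).
const viaBinary = allBytes.toString('binary');
const viaLatin1 = allBytes.toString('latin1');

console.log(viaBinary === viaLatin1);             // true
console.log(viaBinary.length);                    // 256, one character per byte
console.log(viaBinary.charCodeAt(0xe7) === 0xe7); // true: byte 0xE7 is 'ç'

// The mapping is reversible, so the original bytes can be recovered.
console.log(Buffer.from(viaBinary, 'binary').equals(allBytes)); // true
```

That one-byte-per-character mapping is why `encoding: 'binary'` happened to display an ISO-8859-1 page correctly. For other single-byte charsets such as ISO-8859-2 or windows-1252, at least some bytes above 0x7F would come out as the wrong characters.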

in request, you can pipe() a request object to any stream and all the buffers will be sent to the destination stream. if all you're doing is taking binary data from an http request and sending it to a file, socket, or http response, you should just use pipe().
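Because a request object can be piped like any other readable stream, the pattern looks the same with Node's built-in streams. This sketch uses an in-memory `Readable` as a stand-in for the HTTP response (an assumption, so it runs without network access) and shows that the chunks reach the destination as untouched Buffers:

```javascript
const { Readable, Writable } = require('stream');

// Stand-in for an HTTP response: raw ISO-8859-1 bytes split across two chunks.
const source = Readable.from([Buffer.from([0x61, 0xe7]), Buffer.from([0xe3, 0x6f])]);

// Destination that just records what it receives. Any Writable works here:
// an http.ServerResponse, fs.createWriteStream(...), a socket, etc.
const received = [];
const destination = new Writable({
  write(chunk, _encoding, callback) {
    received.push(chunk); // chunk arrives as a Buffer; nothing decodes it
    callback();
  },
});

destination.on('finish', () => {
  console.log(Buffer.concat(received)); // <Buffer 61 e7 e3 6f>
});

source.pipe(destination);
```

With request itself, the equivalent is along the lines of `request(url).pipe(fs.createWriteStream('page.html'))`, where `page.html` is only an example file name.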

phstc (Author) commented Dec 3, 2011

How can I use pipe with the request module?

mikeal (Member) commented Dec 3, 2011

phstc (Author) commented Dec 3, 2011

@mikeal Awesome!

I need to scrape more than one URL while handling a single HTTP request (it's a webapp) and then send all of that data in the response.

I can't send it like this:

```js
request.get({
  uri: url1
}).pipe(res);

request.get({
  uri: url2
}).pipe(res);
```

Is there any other way to do it instead of

```js
const fs = require('fs');

const writeStream = fs.createWriteStream('./output');
request.get({
  uri: url1
}).pipe(writeStream);

request.get({
  uri: url2
}).pipe(writeStream);

// after all pipes finish I send writeStream content to the response
```

?
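Both snippets hit the same problem: by default `pipe()` calls `end()` on the destination when its source ends, so the first request to finish ends `res` (or `writeStream`) while the second is still writing to it, and chunks from the two responses can interleave in the meantime. Node's `readable.pipe(destination, { end: false })` option leaves the destination open. Below is a sketch of one way to use it; `pipeSequentially` is a hypothetical helper, and it assumes each request object behaves like an ordinary readable stream. It takes functions that create the sources, so each request is only started once the previous one has finished.

```javascript
const { Readable, PassThrough } = require('stream');

// Hypothetical helper: pipes several sources into one destination, one after
// another. Each entry is a function that creates its source stream, so a
// source is only started once the previous one has ended.
// `{ end: false }` stops pipe() from ending `dest` when a single source ends.
function pipeSequentially(sourceFactories, dest) {
  return new Promise((resolve, reject) => {
    let index = 0;
    const pipeNext = () => {
      if (index === sourceFactories.length) {
        dest.end(); // every source is done: now close the destination
        resolve();
        return;
      }
      const source = sourceFactories[index++]();
      source.once('error', reject);
      source.once('end', pipeNext); // move on when this source is drained
      source.pipe(dest, { end: false });
    };
    pipeNext();
  });
}

// Demo with in-memory sources. In the webapp, the factories would be
// something like () => request.get({ uri: url1 }) and dest would be `res`.
const combined = new PassThrough();
const pieces = [];
combined.on('data', (chunk) => pieces.push(chunk));
combined.on('end', () => console.log(Buffer.concat(pieces).toString())); // first second

pipeSequentially(
  [() => Readable.from([Buffer.from('first ')]), () => Readable.from([Buffer.from('second')])],
  combined,
);
```

Fetching sequentially trades some latency for a simple, ordered response; starting both requests at once would need each body buffered until its turn.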

phstc (Author) commented Dec 5, 2011

Which stream can I use with pipe?

mikeal (Member) commented Feb 18, 2012

you can use any Stream :)

an HTTP server response, another request object (as its body), a file write stream. anything :)

vkygil commented Feb 10, 2018

```js
request({ url: "http://www.example.com", encoding: "latin1" }, function (error, response, html) {
  console.log('error:', error);
  // ...
});
```

edujr1 commented Feb 23, 2018

@vickygill69 Thanks, your answer resolved my problem.

ricardovf commented

setting encoding to null and then using the response buffer with iconv worked for me. Thanks!
