Character set issues with non ASCII titles of papers #10

jkonert · 2018-06-07T13:38:49Z

German umlauts and ß (äüöß) come back broken in the result set.
As there is no parameter for the charset, I assumed the google-scholar module would use UTF-8, but it seems not.
I attached a sample screenshot of some of the broken titles.

hcientist · 2018-06-07T13:41:06Z

weird. thanks for pointing it out. it should be fairly simple to fix. PRs welcome (-;

hcientist · 2018-06-07T13:46:56Z

Even weirder: the module we're using to make the request to Google (https://www.npmjs.com/package/request) says:

encoding - encoding to be used on setEncoding of response data. If null, the body is returned as a Buffer. Anything else (including the default value of undefined) will be passed as the encoding parameter to toString() (meaning this is effectively utf8 by default). (Note: if you expect binary data, you should set encoding: null.)
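One possible explanation, offered as a guess and not a confirmed diagnosis, is that the response bytes are not UTF-8 at all, so the default `toString()` decoding mangles them. The sketch below simulates that case without any network call: it assumes the body arrives as ISO-8859-1 bytes (as it would if fetched with `encoding: null`) and decodes it according to the charset named in a `Content-Type` header. `charsetFromContentType` and `decodeBody` are hypothetical helpers, not part of `request` or google-scholar, and only encodings that Node's `Buffer` supports natively are handled.

```javascript
// Hypothetical helper: extract the charset from a Content-Type header value.
// Falls back to utf-8 when no charset is declared.
function charsetFromContentType(contentType) {
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || '');
  return match ? match[1].toLowerCase() : 'utf-8';
}

// Hypothetical helper: decode a raw response Buffer using the declared charset.
// Only latin1 and utf8 are handled here; other charsets would need an
// external conversion library.
function decodeBody(buffer, contentType) {
  const charset = charsetFromContentType(contentType);
  if (charset === 'iso-8859-1' || charset === 'latin1') {
    return buffer.toString('latin1');
  }
  return buffer.toString('utf8');
}

// Simulated response body: the bytes of "äüöß" in ISO-8859-1.
// That the real response uses this charset is an assumption.
const body = Buffer.from('äüöß', 'latin1');

// Decoding those bytes as UTF-8 yields replacement characters.
const broken = body.toString('utf8');

// Decoding with the declared charset recovers the original text.
const fixed = decodeBody(body, 'text/html; charset=ISO-8859-1');

console.log(broken);
console.log(fixed);
```

With `request`, the same idea would mean passing `encoding: null` so the callback receives a Buffer, then calling something like `decodeBody(body, response.headers['content-type'])`. Whether the served pages actually declare a non-UTF-8 charset would need to be checked against a real response.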

hcientist added the help wanted label Jun 7, 2018
