Character set issues with non ASCII titles of papers #10

Open
jkonert opened this issue Jun 7, 2018 · 2 comments

Comments

@jkonert

jkonert commented Jun 7, 2018

German umlauts and the sharp s (ä, ü, ö, ß) come back broken in the result set.
Since there is no parameter for the charset, I assumed the google-scholar module would use UTF-8, but that does not seem to be the case.
I have attached a sample screenshot of some of the broken titles.
[Screenshot: 2018-06-07 15_35_38-scholarsampleextractor]

@hcientist
Member

weird. thanks for pointing it out. it should be fairly simple to fix. PRs welcome (-;

@hcientist
Member

Even weirder: the module we're using to make the request to Google (https://www.npmjs.com/package/request) says:

encoding - encoding to be used on setEncoding of response data. If null, the body is returned as a Buffer. Anything else (including the default value of undefined) will be passed as the encoding parameter to toString() (meaning this is effectively utf8 by default). (Note: if you expect binary data, you should set encoding: null.)
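
One possible explanation, not confirmed anywhere in this thread, is that the response bytes are not UTF-8 in the first place, so the implicit utf8 `toString()` described above mangles them. The sketch below only illustrates that failure mode with plain Node.js Buffers. It assumes the server sent ISO-8859-1 ("latin1") bytes; `decodeBody` and the sample title are hypothetical and not part of google-scholar or request.

```javascript
// Hypothetical helper, not part of google-scholar or request.
// Decodes a raw response body with an explicitly chosen Buffer encoding.
function decodeBody(rawBody, charset) {
  return Buffer.isBuffer(rawBody) ? rawBody.toString(charset) : String(rawBody);
}

// ASSUMPTION: the server sent the title as ISO-8859-1 ("latin1") bytes.
// The title itself is made up for illustration.
const sampleTitle = 'Einführung in die Größenmessung';
const rawBody = Buffer.from(sampleTitle, 'latin1');

// Decoding those bytes as utf8 (the effective default quoted above)
// turns the bytes for ü, ö and ß into U+FFFD replacement characters.
const brokenTitle = decodeBody(rawBody, 'utf8');

// Decoding with the assumed charset restores the original characters.
const fixedTitle = decodeBody(rawBody, 'latin1');

console.log(brokenTitle);
console.log(fixedTitle);
```

If this is what is happening, the quoted documentation suggests a way to get at the raw bytes: pass `encoding: null` to request so the body arrives as a Buffer, then decode it with whatever charset the response actually declares. Whether Google's response really uses a non-UTF-8 charset would still need to be checked against the `Content-Type` header.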
