Maremma seems to be suffering from this UTF-8 bug: lostisland/faraday#139
Basically, Excon does not properly mark the response string as UTF-8. As a result, the string is treated as ASCII in the parse_response method, and its non-ASCII characters are replaced with "?".
Example:
<?xml version="1.0" encoding="UTF-8"?>
<crossref_result xmlns="http://www.crossref.org/qrschema/3.0" version="3.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.crossref.org/qrschema/3.0 http://www.crossref.org/schemas/crossref_query_output3.0.xsd">
<query_result>
<head>
<doi_batch_id>none</doi_batch_id>
</head>
<body>
<query status="resolved">
<doi type="journal_article">10.1038/nature14474</doi>
<crm-item name="publisher-name" type="string">Springer Science and Business Media LLC</crm-item>
<crm-item name="prefix-name" type="string">Springer Science and Business Media LLC</crm-item>
<crm-item name="member-id" type="number">297</crm-item>
<crm-item name="citation-id" type="number">75327788</crm-item>
<crm-item name="journal-id" type="number">3415</crm-item>
<crm-item name="deposit-timestamp" type="number">20191101103854578</crm-item>
<crm-item name="owner-prefix" type="string">10.1038</crm-item>
<crm-item name="last-update" type="date">2019-11-01T11:11:21Z</crm-item>
<crm-item name="created" type="date">2015-05-12T15:48:08Z</crm-item>
<crm-item name="citedby-count" type="number">290</crm-item>
<doi_record>
<crossref xmlns="http://www.crossref.org/xschema/1.1" xsi:schemaLocation="http://www.crossref.org/xschema/1.1 http://doi.crossref.org/schemas/unixref1.1.xsd">
<journal>
<journal_metadata language="en">
<full_title>Nature</full_title>
<abbrev_title>Nature</abbrev_title>
<issn media_type="print">0028-0836</issn>
<issn media_type="electronic">1476-4687</issn>
</journal_metadata>
<journal_issue>
<publication_date media_type="print">
<month>6</month>
<year>2015</year>
</publication_date>
<journal_volume>
<volume>522</volume>
</journal_volume>
<issue>7554</issue>
</journal_issue>
<journal_article publication_type="full_text">
<titles>
<title>Observation of the rare Bs0 ?????+????? decay from the combined analysis of CMS and LHCb data</title>
</titles>
...
I think there are lots of ways to solve this, but here are two suggestions:
Force the encoding
Maremma.class_eval do
  def self.parse_response(string, options = {})
    string = string.dup
    string =
      if options[:skip_encoding]
        string
      else
        string.force_encoding('utf-8').encode(
          Encoding.find("UTF-8"),
          invalid: :replace,
          undef: :replace,
          replace: "?"
        )
      end
    return string if options[:raw]
    from_json(string) || from_xml(string) || from_string(string)
  end
end
Note the addition of force_encoding('utf-8') before the encode call.
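To show why the extra force_encoding call matters, here is a minimal standalone sketch that does not use Maremma or Excon at all. It assumes the response body arrives labeled as binary (ASCII-8BIT) even though its bytes are valid UTF-8. The sample string and the helper names transcode_only and relabel_then_transcode are made up for illustration.

```ruby
# Assumption: the HTTP adapter hands back a body labeled ASCII-8BIT (binary),
# even though the bytes themselves are valid UTF-8.
raw = "\u2192\u03BC+\u03BC\u2212".b # sample non-ASCII text, relabeled as binary

REPLACE_OPTS = { invalid: :replace, undef: :replace, replace: "?" }.freeze

# Hypothetical helper: transcode from the current (binary) label straight to UTF-8.
# Every byte above 0x7F has no mapping from binary, so each one becomes "?".
def transcode_only(bytes)
  bytes.dup.encode(Encoding::UTF_8, **REPLACE_OPTS)
end

# Hypothetical helper: first relabel the bytes as UTF-8, then encode.
# The bytes are already valid UTF-8, so nothing needs to be replaced.
def relabel_then_transcode(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).encode(Encoding::UTF_8, **REPLACE_OPTS)
end

puts transcode_only(raw)         # the multi-byte characters are lost
puts relabel_then_transcode(raw) # the original characters survive
```

The trade-off is that force_encoding trusts the bytes to be UTF-8 regardless of what the server actually sent, which is why it is the blunter of the two suggestions.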
faraday-encoding middleware
Another option would be to use the faraday-encoding middleware. That's probably a less blunt solution, but I didn't try implementing it. https://github.com/ma2gedev/faraday-encoding
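I have not checked how faraday-encoding is implemented, so the following is only a plain-Ruby sketch of the general idea behind a less blunt fix: label the body with the charset the server declares in its Content-Type header, and leave it alone otherwise. The helper name apply_declared_charset and the sample header values are assumptions, not part of Faraday, Excon, or the middleware.

```ruby
# Hypothetical helper: relabel a response body using the charset declared
# in the Content-Type header. Returns the body unchanged when no usable
# charset is declared.
def apply_declared_charset(body, content_type)
  # Pull "UTF-8" out of e.g. "text/xml; charset=UTF-8" (quotes optional).
  charset = content_type.to_s[/charset=["']?([^\s;"']+)/i, 1]
  return body if charset.nil?

  encoding = begin
    Encoding.find(charset)
  rescue ArgumentError
    nil # unknown charset name: do not guess
  end
  return body if encoding.nil?

  # Only the label changes; the bytes stay exactly as received.
  body.dup.force_encoding(encoding)
end

body = "\u00E9".b # valid UTF-8 bytes, labeled as binary
labeled = apply_declared_charset(body, "text/xml; charset=UTF-8")
puts labeled.encoding
puts apply_declared_charset(body, "text/xml").encoding
```

Unlike an unconditional force_encoding('utf-8'), this leaves responses in other declared charsets correctly labeled, at the cost of relying on the server to send an accurate header.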