Search for exact title with different encoding fails #6059

Closed
MignonBelongie opened this issue Jan 19, 2022 · 2 comments · Fixed by #7445
Labels
Lead: @cdrini - Issues overseen by Drini (Staff: Team Lead & Solr, Library Explorer, i18n) [managed]
Module: Solr - Issues related to the configuration or use of the Solr subsystem. [managed]
Priority: 2 - Important, as time permits. [managed]
Type: Bug - Something isn't working. [managed]

Comments

@MignonBelongie

If the title of a work contains characters with diacritics, searching for it with a string that is exactly the same title but in a different Unicode representation fails.
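As a minimal sketch of what "different Unicode representation" means here (this is not taken from the linked notebook; the variable names are only illustrative), the same title can be stored with precomposed characters (NFC) or with a base letter plus a combining accent (NFD). The two strings render identically but do not compare equal:

```python
import unicodedata

# The title written with escape sequences so that no editor or browser
# can silently re-encode it. U+00E9 is the precomposed "e with acute".
title = "Des filles bien \u00e9lev\u00e9es"

nfc = unicodedata.normalize("NFC", title)  # precomposed form
nfd = unicodedata.normalize("NFD", title)  # "e" followed by U+0301

print(nfc == nfd)          # the two forms are different strings
print(len(nfc), len(nfd))  # NFD is longer: one extra code point per accent

# Normalizing both sides to the same form makes them comparable again.
print(unicodedata.normalize("NFC", nfd) == nfc)
```

A byte-for-byte or code-point-for-code-point comparison therefore treats the two forms as different titles unless one side is normalized first.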

Evidence / Screenshot (if possible)

This is very hard to demo, because it involves strings which appear identical, and whose encoding can be (and often is) changed by browsers, editors, etc. Therefore, I have created a Colab notebook that demonstrates the error: https://colab.research.google.com/drive/1NiKOD0Md_nR7bPbHXGW4lUnyCbc0sbJ5

Since @cdrini is already aware of the issue, I hope this is sufficient.

Stakeholders

@cdrini

@MignonBelongie added the Needs: Lead, Needs: Triage, and Type: Bug labels on Jan 19, 2022
@tfmorris
Contributor

To save others having to go and read Python code, the two URLs are:

https://openlibrary.org/search.json?title=Des+filles+bien+élevées (single accented character)
https://openlibrary.org/search.json?title=Des+filles+bien+élevées (non-spacing combining accent)

but even worse

https://openlibrary.org/search.json?title=Des+filles+bien+elevees

returns 0 results.
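Because the first two URLs look identical on screen, it may help to see them percent-encoded. The following is a rough sketch, assuming the standard library's `urllib.parse.urlencode` is used to build the query string; the `search_url` helper is hypothetical and only the base URL comes from the examples above:

```python
from urllib.parse import urlencode

BASE = "https://openlibrary.org/search.json"

def search_url(title: str) -> str:
    """Build a title search URL (hypothetical helper for illustration)."""
    return f"{BASE}?{urlencode({'title': title})}"

precomposed = "Des filles bien \u00e9lev\u00e9es"    # single accented character
combining = "Des filles bien e\u0301leve\u0301es"    # non-spacing combining accent
unaccented = "Des filles bien elevees"               # no accents

for label, title in [("precomposed", precomposed),
                     ("combining", combining),
                     ("unaccented", unaccented)]:
    print(label, search_url(title))
```

In the encoded URLs the precomposed character appears as `%C3%A9`, while the combining form appears as a plain `e` followed by `%CC%81`, which makes the difference visible even when the rendered text is not.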

This is a long-standing problem (over a decade old) for which there is a well-known solution.

Here are some of the related tickets: #11, #149, #178, https://bugs.launchpad.net/openlibrary/+bug/598204

@jimchamp added the Lead: @cdrini label and removed the Needs: Lead label on Jan 24, 2022
@cdrini added the Module: Solr and Priority: 2 labels and removed the Needs: Triage label on Apr 11, 2022
@tfmorris
Contributor

On testing.openlibrary.org, all three examples above return the same three works, so this is hopefully close to being resolved by the fix for #7040.

https://testing.openlibrary.org/search.json?title=Des+filles+bien+élevées (single accented character)
https://testing.openlibrary.org/search.json?title=Des+filles+bien+élevées (non-spacing combining accent)
https://testing.openlibrary.org/search.json?title=Des+filles+bien+elevees (no accents)
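For readers wondering how all three variants can end up matching the same works, one common approach in search systems is to fold accents on both the indexed text and the query. The sketch below illustrates that general idea in Python; it is an assumption for illustration only, and this thread does not say whether the fix for #7040 works this way. The `fold_accents` function is hypothetical:

```python
import unicodedata

def fold_accents(text: str) -> str:
    """Reduce a string to an accent-insensitive, case-insensitive key."""
    # Decompose so every accent becomes a separate combining character.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining characters, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

queries = [
    "Des filles bien \u00e9lev\u00e9es",    # single accented character
    "Des filles bien e\u0301leve\u0301es",  # non-spacing combining accent
    "Des filles bien elevees",              # no accents
]

keys = {fold_accents(q) for q in queries}
print(keys)  # a single shared key for all three spellings
```

If the same folding is applied when titles are indexed and when queries are parsed, the three searches above would look up the same key.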

4 participants