Solr should normalize ISBNs #609
Comments
There are ISBN and LCCN normalization filters for SOLR here: |
@salmman-bhai
|
@LeadSongDog You said both dupe and not dupe at different times. I agree that #49 is a separate issue (data updates vs search). This issue is very specific - indexing and searching for ISBNs. Let's not extend it to include all things ISBN related. Fixing the original issue, as framed by @cdrini, would provide useful benefit for the users. |
Is the fix going to be to (a) create a proper ISBN normalization function (one that accommodates ISBNs with text in them, etc., as noted in #1194), (b) on setting an ISBN, normalize it before storing it, and (c) on search, normalize it before searching? @cdrini - I'd like to change the title of this issue to "Properly normalize incoming isbn prior to update or search" and mark #49 as a duplicate. Acceptable? |
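For illustration, steps (a) to (c) could look roughly like the sketch below. This is not Open Library's actual code: `normalize_isbn` and `IsbnIndex` are hypothetical names, the in-memory dict only stands in for the real store/index, and check digits are deliberately not validated here.

```python
import re
from typing import Optional

# A digit, then digits/hyphens/spaces, then a final digit or X.
# This tolerates surrounding text such as "0-19-852663-6 (pbk.)".
_ISBN_CANDIDATE = re.compile(r"\d[\d\- ]{8,15}[\dXx]")


def normalize_isbn(raw: str) -> Optional[str]:
    """(a) Return the bare 10- or 13-character ISBN found in raw, or None."""
    for match in _ISBN_CANDIDATE.finditer(raw):
        bare = re.sub(r"[\- ]", "", match.group()).upper()
        if len(bare) == 13 and bare.isdigit():
            return bare
        if len(bare) == 10 and bare[:9].isdigit():
            return bare
    return None


class IsbnIndex:
    """Hypothetical stand-in for the store: normalizes on write and on read."""

    def __init__(self) -> None:
        self._by_isbn = {}

    def add(self, raw_isbn: str, record_key: str) -> None:
        # (b) normalize before storing
        isbn = normalize_isbn(raw_isbn)
        if isbn is None:
            raise ValueError(f"not an ISBN: {raw_isbn!r}")
        self._by_isbn[isbn] = record_key

    def find(self, raw_isbn: str) -> Optional[str]:
        # (c) normalize before searching
        isbn = normalize_isbn(raw_isbn)
        return self._by_isbn.get(isbn) if isbn else None


index = IsbnIndex()
index.add("9788400047252", "OL3849512M")
print(index.find("978-84-00-04725-2"))  # hyphenated query hits the same key
```

Because both paths go through the same function, a hyphenated or annotated value matches however it was originally entered.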
The root cause of #49 might be this, but I have no clue what LC Bot is/does, so not certain. The fix would basically be to update our solr config and do a full reindex, but we don't have a flow for a full reindex at the moment (that's what #1843 is working on). I'll rename it to "Solr should normalize ISBNs". |
@cdrini Is this still an issue? I didn't explicitly try to fix it, but I think some of the recent work and refactoring around ISBN lookups, and the search improvements that apply ISBN conversions consistently, may have fixed it. Now when I do a category ALL search for https://openlibrary.org/search?q=isbn:978-84-00-04725-2 I am redirected straight to https://openlibrary.org/books/OL3849512M/Hombres_y_documentos_de_la_filosofi%CC%81a_espan%CC%83ola which shows that both hyphen stripping and ISBN-13 <-> ISBN-10 conversion are happening on search. |
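For readers unfamiliar with the conversion mentioned above, here is a minimal sketch of hyphen stripping plus ISBN-13 <-> ISBN-10 conversion using the standard check-digit rules. The function names are made up for this example and are not the helpers Open Library actually uses; only "978"-prefixed ISBN-13s have an ISBN-10 form.

```python
from typing import Optional, Set


def isbn13_check_digit(first12: str) -> str:
    # Weights alternate 1, 3, 1, 3, ... over the first 12 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_check_digit(first9: str) -> str:
    # Weights run 10 down to 2; a check value of 10 is written as "X".
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn10_to_isbn13(isbn10: str) -> str:
    core = "978" + isbn10[:9]
    return core + isbn13_check_digit(core)


def isbn13_to_isbn10(isbn13: str) -> Optional[str]:
    if not isbn13.startswith("978"):
        return None  # e.g. 979-prefixed ISBNs have no ISBN-10 equivalent
    core = isbn13[3:12]
    return core + isbn10_check_digit(core)


def isbn_variants(raw: str) -> Set[str]:
    """Strip hyphens/spaces and return every equivalent form to search for."""
    bare = raw.replace("-", "").replace(" ", "").upper()
    variants = {bare}
    if len(bare) == 10:
        variants.add(isbn10_to_isbn13(bare))
    elif len(bare) == 13:
        other = isbn13_to_isbn10(bare)
        if other is not None:
            variants.add(other)
    return variants


print(sorted(isbn_variants("978-84-00-04725-2")))
# ['8400047257', '9788400047252']
```

Searching for any of the returned variants is one way a query for the hyphenated ISBN-13 can land on a record that only stores the ISBN-10.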
Oh sweet! That's awesome! Hmmm; so when is the normalization process happening when you query for |
https://openlibrary.org/search?q=isbn%3A%22978-84-00-04725-2%22&mode=everything works! Looks done to me. |
(Note: technically this should be done in Solr directly (not in the Python code); Solr actually has processors for ISBNs that handle all this stuff. But we can deal with it later; I'm content closing this.) |
Currently, searches like `isbn:"978-84-00-04725-2"` yield no results because of the '-'. `isbn` is defined in the schema.xml file to be of type `string`, which means they are indexed and queried verbatim. They should probably have their own field type which handles these.