Solr should normalize ISBNs #609

Closed
cdrini opened this issue Oct 26, 2017 · 11 comments
Labels
Module: Solr Issues related to the configuration or use of the Solr subsystem. [managed] Needs: Review This issue/PR needs to be reviewed in order to be closed or merged (see comments). [managed] Theme: Identifiers Issues related to ISBN's or other identifiers in metadata. [managed] Type: Bug Something isn't working. [managed]

Comments

@cdrini
Collaborator

cdrini commented Oct 26, 2017

Currently, searches like isbn:"978-84-00-04725-2" yield no results because of the hyphens. The isbn field is defined in schema.xml as type string, which means its values are indexed and queried verbatim. ISBNs should probably have their own field type that handles this.
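To illustrate why a verbatim `string` field misses hyphenated queries, here is a toy Python model (not Solr or Open Library code). `ExactMatchIndex` and `normalize_isbn` are hypothetical names: the index only supports exact lookups, and running the same normalization at index time and at query time is what lets the hyphenated and unhyphenated forms meet. A dedicated Solr field type with an analysis chain would play that role inside Solr; this sketch only models the idea.

```python
import re


def normalize_isbn(raw: str) -> str:
    """Hypothetical helper: drop everything except digits and 'X'."""
    return re.sub(r"[^0-9X]", "", raw.upper())


class ExactMatchIndex:
    """Toy stand-in for a field that only supports exact matches."""

    def __init__(self, normalize=None):
        # normalize=None stores and looks up values verbatim,
        # which is how a Solr `string` field behaves.
        self.normalize = normalize or (lambda value: value)
        self.docs = {}

    def add(self, isbn: str, doc_id: str) -> None:
        # The same normalization runs at index time...
        self.docs.setdefault(self.normalize(isbn), []).append(doc_id)

    def query(self, isbn: str) -> list:
        # ...and at query time, so both sides agree on one canonical form.
        return self.docs.get(self.normalize(isbn), [])


verbatim = ExactMatchIndex()
normalized = ExactMatchIndex(normalize_isbn)
for index in (verbatim, normalized):
    index.add("9788400047252", "OL3849512M")

print(verbatim.query("978-84-00-04725-2"))    # → []
print(normalized.query("978-84-00-04725-2"))  # → ['OL3849512M']
```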

@LeadSongDog

Duplicates #49, and is related to #27 and #142.

@tfmorris
Copy link
Contributor

There are ISBN and LCCN normalization filters for Solr here:
https://github.com/mlibrary/umich_solr_library_filters

@tfmorris tfmorris added the Module: Solr Issues related to the configuration or use of the Solr subsystem. [managed] label Mar 10, 2018
@LeadSongDog

LeadSongDog commented Mar 22, 2018

@salmman-bhai
This overlaps #49 but doesn't duplicate it.

  1. Yes, we should correct the data type so hyphens won't matter, but we should
  2. also make ISBNs more useful for unifying publishers by making them block-sequential/sortable,
  3. add checks for existing uses of an ISBN:
    3a. before creating new editions, and
    3b. while deduplicating existing ones, and
  4. vet that the proposed ISBN has an extant external target, e.g. via a query to WorldCat.

@tfmorris
Contributor

@LeadSongDog You have called this both a duplicate and not a duplicate at different times. I agree that #49 is a separate issue (data updates vs. search).

This issue is very specific: indexing and searching for ISBNs. Let's not extend it to include everything ISBN-related. Fixing the original issue, as framed by @cdrini, would provide a useful benefit to users.

@mekarpeles mekarpeles added the Theme: Identifiers Issues related to ISBN's or other identifiers in metadata. [managed] label Apr 2, 2019
@brad2014
Collaborator

brad2014 commented May 3, 2019

Is the fix going to be to (a) create a proper ISBN normalization function (one that accommodates ISBNs with text in them, etc., as noted in #1194), (b) normalize an ISBN before storing it, and (c) normalize a query ISBN before searching?

@cdrini - I'd like to change the title of this issue to "Properly normalize incoming isbn prior to update or search" and mark #49 as a duplicate. Acceptable?
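The (a)/(b)/(c) plan in this comment could look roughly like the following Python sketch. It is an illustration under stated assumptions, not Open Library's implementation: `normalize_isbn` and the regular expression are hypothetical, only hyphens are treated as separators (space-separated ISBNs are out of scope here), and a candidate is accepted only if its ISBN-10 or ISBN-13 check digit is valid.

```python
import re
from typing import Optional

# Hypothetical candidate pattern: starts with a digit, ends with a digit or X,
# with only digits and hyphens in between (10 to 18 characters in total).
_CANDIDATE = re.compile(r"[0-9][0-9-]{8,16}[0-9Xx]")


def _valid_isbn10(s: str) -> bool:
    # Weights 10..1 on the ten characters; 'X' (value 10) only as the check digit.
    total = 0
    for i, ch in enumerate(s):
        if ch == "X":
            if i != 9:
                return False
            value = 10
        else:
            value = int(ch)
        total += (10 - i) * value
    return total % 11 == 0


def _valid_isbn13(s: str) -> bool:
    # Alternating weights 1, 3, 1, 3, ... must sum to a multiple of 10.
    if not s.isdigit():
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(s))
    return total % 10 == 0


def normalize_isbn(text: str) -> Optional[str]:
    """Return the first ISBN-10/13 with a valid check digit found in text, else None."""
    for match in _CANDIDATE.finditer(text):
        candidate = match.group().replace("-", "").upper()
        if len(candidate) == 10 and _valid_isbn10(candidate):
            return candidate
        if len(candidate) == 13 and _valid_isbn13(candidate):
            return candidate
    return None
```

For example, `normalize_isbn("0-306-40615-2 (pbk.)")` returns `"0306406152"`. Per (b) and (c), the same function would be called both before storing an ISBN and before searching for one, so stored values and queries always share one canonical form.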

@brad2014 brad2014 added Type: Bug Something isn't working. [managed] isbn labels May 3, 2019
@cdrini
Collaborator Author

cdrini commented May 3, 2019

This might be the root cause of #49, but I have no idea what LC Bot is or does, so I'm not certain.

The fix would basically be to update our solr config and do a full reindex, but we don't have a flow for a full reindex at the moment (that's what #1843 is working on). I'll rename it to "Solr should normalize ISBNs".

@cdrini cdrini changed the title Solr should index and query ISBNs without '-' Solr should normalize ISBNs May 3, 2019
@brad2014 brad2014 added Theme: Identifiers Issues related to ISBN's or other identifiers in metadata. [managed] and removed Theme: Identifiers Issues related to ISBN's or other identifiers in metadata. [managed] labels May 10, 2019
@hornc
Collaborator

hornc commented Jul 25, 2019

@cdrini Is this still an issue? I didn't explicitly try to fix it, but I think some of the recent work and refactoring around ISBN lookups, and the search improvements that apply ISBN conversions consistently, may have fixed it.

Now when I do a category ALL search for isbn:978-84-00-04725-2

https://openlibrary.org/search?q=isbn:978-84-00-04725-2

I am redirected straight to https://openlibrary.org/books/OL3849512M/Hombres_y_documentos_de_la_filosofi%CC%81a_espan%CC%83ola which shows that both hyphen stripping and isbn13<->isbn10 conversion are happening on search.
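The isbn13<->isbn10 conversion mentioned here is standard ISBN check-digit arithmetic. The Python sketch below illustrates it; the function names are hypothetical and this is not necessarily how Open Library implements it. It assumes its input has already been normalized to bare digits (plus a possible trailing X) and does not re-validate the incoming check digit.

```python
from typing import Optional


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a normalized ISBN-10 (digits, optional trailing X) to ISBN-13."""
    # Drop the old check digit, prefix '978', and recompute the EAN-13 check digit.
    body = "978" + isbn10[:9]
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(body))
    return body + str((10 - total % 10) % 10)


def isbn13_to_isbn10(isbn13: str) -> Optional[str]:
    """Convert a normalized ISBN-13 to ISBN-10, or return None if there is none."""
    # Only '978'-prefixed ISBN-13s have an ISBN-10 equivalent ('979' ones do not).
    if not isbn13.startswith("978"):
        return None
    body = isbn13[3:12]
    # Weights 10..2 on the nine body digits; the check digit makes the sum 0 mod 11.
    total = sum(int(d) * (10 - i) for i, d in enumerate(body))
    check = (11 - total % 11) % 11
    return body + ("X" if check == 10 else str(check))


print(isbn13_to_isbn10("9788400047252"))  # → 8400047257
print(isbn10_to_isbn13("8400047257"))     # → 9788400047252
```

Converting both ways lets a search for either form of the ISBN in this thread resolve to the same edition.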

@hornc hornc added the Needs: Review This issue/PR needs to be reviewed in order to be closed or merged (see comments). [managed] label Jul 25, 2019
@cdrini
Copy link
Collaborator Author

cdrini commented Jul 26, 2019

Oh sweet! That's awesome! Hmm, so where does the normalization happen when you query for isbn:978-84-00-04725-2? In work_search?

@xayhewalo xayhewalo added this to Un-Triaged in Triage Oct 20, 2019
@hornc
Collaborator

hornc commented Nov 12, 2019

@cdrini do you agree we can close this issue? I think the "fix" for this is what broke #2623 :)

@cdrini
Copy link
Collaborator Author

cdrini commented Nov 12, 2019

@cdrini cdrini closed this as completed Nov 12, 2019
Triage automation moved this from Un-Triaged to Closed Nov 12, 2019
@cdrini
Copy link
Collaborator Author

cdrini commented Nov 12, 2019

(Note: technically this should be done in Solr directly, not in the Python code; Solr actually has processors for ISBNs that handle all of this. But we can deal with that later; I'm content closing this.)

Projects
No open projects
Triage
  
Closed
Development

No branches or pull requests

6 participants