Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow dash - for entering ISBN-13 when submitting an edition of a book #4343

Closed
serv opened this issue Jan 1, 2021 · 17 comments · Fixed by #4403
Closed

Allow dash - for entering ISBN-13 when submitting an edition of a book #4343

serv opened this issue Jan 1, 2021 · 17 comments · Fixed by #4403
Labels
Needs: Triage This issue needs triage. The team needs to decide who should own it, what to do, by when. [managed]

Comments

@serv
Copy link

serv commented Jan 1, 2021

Describe the problem that you'd like solved

When entering ISBN-13 when submitting an edition of a book, can you allow dashes? Currently only numbers allowed.
numbersonly

When I try to copy over the value from Amazon which has a dash in the value, the UI throws an error.
amazon

Proposal & Constraints

Allow dashes

Additional context

It seems like dashes are also allowed by https://www.isbn-13.info/

Stakeholders

@serv serv added the Needs: Triage This issue needs triage. The team needs to decide who should own it, what to do, by when. [managed] label Jan 1, 2021
@Sabreen-Parveen
Copy link
Collaborator

@serv I would like to work on this issue

@BrittanyBunk
Copy link
Contributor

that would be awesome and then it can automatically convert it to a number without the hyphen

@Sabreen-Parveen
Copy link
Collaborator

@serv instead of adding only dashes, I think we should add a proper validation function to check whether it satisfies the condition of isbn 13 or not as shown in the following snippet:

def is_isbn13(n):
  n = n.replace('-','').replace(' ', '')
  if len(n) != 13:
      return False
  product = (sum(int(ch) for ch in n[::2]) 
            + sum(int(ch) * 3 for ch in n[1::2]))
  return product % 10 == 0

@BrittanyBunk
Copy link
Contributor

@Sabreen-Parveen that looks great - adding the validation check after the hyphen/spaces are gone. I would actually recommend adding a comma and period to the list of characters to remove, as sometimes when copying/pasting, people may accidentally highlight those. Also you could add in letters too (except for X) as ones to remove, as maybe someone is putting that into the copy/paste too. The rest of the characters are fine to give a warning (unless I missed one).

@LeadSongDog
Copy link

It’s great to test that it is a numerically-allowed isbn13, but not every such number corresponds to a published book. That can only be tested by searching for libraries holding it or vendors with available inventory.

@BrittanyBunk
Copy link
Contributor

@LeadSongDog I would also say it would be ridiculous for someone to wait while the computer verifies if their ISBN is valid.

What really is better is if someone puts in an ISBN (10 or 13), that the one they don't have magically appears. The reason is that it's so hard to look for a book with just one ISBN - whether it's the searcher or the search engine. To me, that's way more important than caring if the ISBN is correct or not - as that can be corrected. I may be wrong here, but from the user side - having more is better than less - as you can always remove, but being without during that time is hard.

I'm not saying that your issue isn't valid, I'm saying the first priority is having 2 ISBNs at the same time and then the second priority is to have a crawler double check the ISBNs. I personally use BookFinder for all my stuff.

@cclauss
Copy link
Collaborator

cclauss commented Jan 5, 2021

@Sabreen-Parveen My sense is that we have two operations.

  1. n = n.replace('-','').replace(' ', '')
  2. the remaining ISBN validation logic.

Please put these into two separate PRs because:

  1. is trivial but will make life easier for users so we should land it quickly.
  2. is cool but might overlap with validation logic that we import from https://pypi.org/project/isbnlib/ so this might take longer to get reviewed and land.

@BrittanyBunk
Copy link
Contributor

@cclauss good idea, as bookfinder doesn't have a way to download all their isbn's.

@BrittanyBunk
Copy link
Contributor

I just wanted to say I'm really glad people are thinking about and are working on this issue. It's a common issue that I bet a lot of people face.

@Sabreen-Parveen
Copy link
Collaborator

I have noticed in the add books page there is no checking whether it has 13 digits or not so, we can simply add the isbn and the input element accepts the value. Should I add validation function in add books page also? The function will only check if the isbn has 13 digits or not and check for the dashes too.

@BrittanyBunk
Copy link
Contributor

@Sabreen-Parveen everywhere is preferable - provided the dashes don't appear in what the public sees - it's important to keep the data consistent and concise for the data dumps - all those extra dashes are going to only increase the gb's there, which will make it much harder to use and store. It also costs more money.

@tfmorris
Copy link
Contributor

I don't think LeadSongDog was suggesting that the lookup actually be done, but before calling things "ridiculous," it would probably be worth weighing the benefit vs the cost. If an invalid ISBN could be flagged in 100 msec, that'd be totally worth the wait.

This whole thing seems to have gone very far astray though. Dashes are actually good not only because they allow for cut-and-paste of real world data, but because they present the information in the form that users are familiar with. In my opinion, they should be preserved. The only place they need to be removed (or normalized) is in the normalized search field -- and there only needs to be one of those, you don't need both 10 & 13 digit forms since they're equivalent.

So, the protocol should be:

  • accept 10 digit or 13 digit ISBN with or without dashes and store it as entered
  • warn on, but don't disallow (since they're sometimes printed wrong), ISBNs with invalid check digits
  • build the search index based on a normalized ISBN 13 without dashes
  • when searching convert the user entered ISBN to the normalized form for search

And just for clarity, 50M ISBNs with a few dashes each is <0.2 GB and fractions of a penny.

This work should be directed by someone that understands ISBNs, their usage in publishing, and search technology.

@cclauss
Copy link
Collaborator

cclauss commented Jan 13, 2021

If an invalid ISBN could be flagged in 100 msec, that'd be totally worth the wait.

Agreed.

This work should be directed by someone that understands ISBNs, their usage in publishing, and search technology.

Who would you recommend?

@tfmorris
Copy link
Contributor

@seabelis understands ISBNs. I'm not sure who you'd look to for search, but 95% of what you need is in my 2017 comment #609 (comment)

Between the ol-tech mailing list, Github issues, etc there's plenty of institutional knowledge about how to do this correctly. It's just not present in this thread, so I'm trying to keep things from being hijacked.

@LeadSongDog
Copy link

@tfmorris There are several reasonable options for external validation of ISBNs. It may be worth comparing response times. Possibilities include Google, Baidu, and other generic search engine giants, but we should consider more specific options too:
https://isbnsearch.org/isbn/9788400047252
https://www.worldcat.org/search?q=bn%3A9788400047252

@BrittanyBunk
Copy link
Contributor

BrittanyBunk commented Jan 13, 2021

@tfmorris I think there's some confusion, but I'm referring to the end-product, that the dashes don't appear in the data sets (unrelated to the website). I didn't bring up everything (as I don't want to derail, as I could, but I won't), but what I said can be problematic making everyone aware of every angle while working on this is supposed to help us stay on track.

@seabelis
Copy link
Collaborator

Strongly agree with @tfmorris "warn on, but don't disallow (since they're sometimes printed wrong), ISBNs with invalid check digits".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs: Triage This issue needs triage. The team needs to decide who should own it, what to do, by when. [managed]
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants