Remove dates for books with ISBNs with publish date < 1400 #9270

Closed
mekarpeles opened this issue May 14, 2024 · 1 comment
Labels
Affects: Librarians (issues related to features that librarians particularly need)
Lead: @scottbarnes (issues overseen by Scott, Community Imports)
metadata
Needs: Detail (submitter needs to provide more detail for this issue to be assessed; see comments)
Priority: 2 (important, as time permits)
Theme: Identifiers (issues related to ISBNs or other identifiers in metadata)
Type: Bug (something isn't working)

Comments

@mekarpeles (Member) commented May 14, 2024

Problem

Some editions that have ISBNs carry very old publish dates, which suggests inaccurate metadata. We should find a heuristic for removing dates that are clearly wrong (e.g., the publish year 1 in this result set is suspect).

Relevant URL(s)

https://openlibrary.org/search?q=publish_year%3A%5B*+TO+1400%5D+AND+isbn%3A*
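The link above encodes a Solr-style query. As a minimal sketch (the helper `build_search_url` is hypothetical, not part of Open Library), the same URL can be rebuilt with Python's standard library so the year bound can be varied. Note that in Lucene/Solr range syntax the square brackets make the range inclusive, so this query also matches 1400 itself, whereas the issue title says "< 1400".

```python
from urllib.parse import urlencode

SEARCH_URL = "https://openlibrary.org/search"


def build_search_url(max_year: int = 1400) -> str:
    """Hypothetical helper: editions with any ISBN and a publish year up to max_year."""
    query = f"publish_year:[* TO {max_year}] AND isbn:*"
    # Keep "*" unescaped so the result matches the URL quoted in the issue.
    return f"{SEARCH_URL}?{urlencode({'q': query}, safe='*')}"


print(build_search_url())
# https://openlibrary.org/search?q=publish_year%3A%5B*+TO+1400%5D+AND+isbn%3A*
```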

Notes from this Issue's Lead

@judec and @hornc should have more details.

Proposal & constraints

NB: Some works were first published before 1400 and later republished as modern editions with ISBNs, so we may need a way to handle this case.
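A minimal sketch of the proposed heuristic, assuming edition records are dicts shaped like Open Library edition JSON (`publish_date` as free text, `isbn_10`/`isbn_13` as lists). The helper names and the year-parsing rule (take the first four-digit number, otherwise accept a bare number) are illustrative assumptions. Because the sketch inspects only the edition's own `publish_date`, a modern ISBN edition of a pre-1400 work is flagged only if the edition record itself carries the old date.

```python
import re
from typing import Optional

CUTOFF_YEAR = 1400  # threshold from the issue title


def extract_year(publish_date: str) -> Optional[int]:
    """Hypothetical parser for free-text dates such as "1", "May 14, 2024", "2024-05-14".

    Prefers the first four-digit number; otherwise accepts a string made only of digits.
    Anything else (e.g. "c. 350") yields None, so unparseable dates are left alone.
    """
    text = (publish_date or "").strip()
    four_digit = re.findall(r"\b\d{4}\b", text)
    if four_digit:
        return int(four_digit[0])
    return int(text) if text.isdecimal() else None


def has_isbn(edition: dict) -> bool:
    """True if the edition lists at least one ISBN-10 or ISBN-13 (assumed field names)."""
    return bool(edition.get("isbn_10") or edition.get("isbn_13"))


def is_suspect(edition: dict, cutoff: int = CUTOFF_YEAR) -> bool:
    """Flag ISBN-bearing editions whose own publish year is earlier than `cutoff`."""
    if not has_isbn(edition):
        return False
    year = extract_year(edition.get("publish_date", ""))
    return year is not None and year < cutoff


# Usage with placeholder ISBNs: only the first record is flagged.
editions = [
    {"isbn_13": ["9780000000000"], "publish_date": "1"},
    {"publish_date": "1"},  # no ISBN, so not flagged
    {"isbn_10": ["0000000000"], "publish_date": "May 14, 2024"},
]
print([is_suspect(e) for e in editions])  # [True, False, False]
```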

Related files

Stakeholders

mekarpeles added the labels Type: Bug, Needs: Triage, Needs: Lead, and Priority: 2 on May 14, 2024.
mekarpeles changed the title from "Remove dates for books with ISBNs with publish date < 1500" to "Remove dates for books with ISBNs with publish date < 1400" on May 14, 2024.
mekarpeles added the labels Lead: @scottbarnes and Needs: Detail, and removed Needs: Triage and Needs: Lead, on May 14, 2024.
mekarpeles added this to the Sprint 2024-05 milestone on May 14, 2024.
mekarpeles added the labels metadata, Theme: Identifiers, and Affects: Librarians on May 14, 2024.
@scottbarnes (Collaborator) commented

4619 editions with pre-1400 publication years had their dates removed:
pre-1400-publish-year-with-publisher--date-popped.jsonl.gz

The criteria used were a publication year before 1400 and a listed publisher. Because no such publishers existed at that time, these editions likely have incorrect dates. 123 editions were identified as Arabic, either because they had /languages/ara or because at least 50% of the characters in their title field were Arabic Unicode characters. These should be examined to see whether their dates were entered using the Solar Hijri calendar.
arabic_records.jsonl.gz
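The criteria described in this comment can be sketched as follows. This is a hypothetical reconstruction, not the script that produced the attached files: the field names (`publishers`, `languages`, `title`), the Unicode blocks counted as Arabic, and ignoring whitespace when computing the 50% ratio are all assumptions. Each removed date is kept alongside its edition so that the Arabic subset can later be reviewed for Solar Hijri dates.

```python
import re
from typing import List, Optional, Tuple

CUTOFF_YEAR = 1400  # threshold from the issue title

# Assumption: these blocks count as "Arabic Unicode" (Arabic, Arabic Supplement,
# Arabic Extended-A, Arabic Presentation Forms-A and -B).
ARABIC_RANGES = [
    (0x0600, 0x06FF),
    (0x0750, 0x077F),
    (0x08A0, 0x08FF),
    (0xFB50, 0xFDFF),
    (0xFE70, 0xFEFF),
]


def extract_year(publish_date: str) -> Optional[int]:
    """Hypothetical parser: first four-digit number, else a string of digits only."""
    text = (publish_date or "").strip()
    four_digit = re.findall(r"\b\d{4}\b", text)
    if four_digit:
        return int(four_digit[0])
    return int(text) if text.isdecimal() else None


def arabic_ratio(title: str) -> float:
    """Share of the title's non-whitespace characters that fall in ARABIC_RANGES."""
    chars = [c for c in title if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(any(lo <= ord(c) <= hi for lo, hi in ARABIC_RANGES) for c in chars)
    return hits / len(chars)


def is_arabic(edition: dict) -> bool:
    """Arabic if tagged /languages/ara or if at least 50% of the title is Arabic script."""
    keys = {lang.get("key") for lang in edition.get("languages", [])}
    return "/languages/ara" in keys or arabic_ratio(edition.get("title", "")) >= 0.5


def pop_suspect_dates(
    editions: List[dict],
) -> Tuple[List[Tuple[dict, str]], List[Tuple[dict, str]]]:
    """Pop publish_date where the year is < CUTOFF_YEAR and a publisher is listed.

    Returns (popped, arabic): (edition, removed_date) pairs and their Arabic subset.
    """
    popped, arabic = [], []
    for edition in editions:
        year = extract_year(edition.get("publish_date", ""))
        if year is None or year >= CUTOFF_YEAR or not edition.get("publishers"):
            continue
        pair = (edition, edition.pop("publish_date"))
        popped.append(pair)
        if is_arabic(edition):
            arabic.append(pair)
    return popped, arabic
```

Keeping the removed value in each pair, rather than discarding it, means an Arabic record whose year turns out to be a valid Solar Hijri date could be restored or converted instead of lost.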
