Roadmapping 2018 Q2 #845

Closed
mekarpeles opened this issue Mar 12, 2018 · 19 comments

Comments

@mekarpeles
Member

mekarpeles commented Mar 12, 2018

@LeadSongDog, @hornc, @skylerbunny, @cdrini, @salman-bhai, @bfalling, @whatisgalen

If you can add your top 5-10 issues or features you believe should be prioritized in Q2 (preferably w/ github issue #'s)...
https://docs.google.com/spreadsheets/d/10Cp2xwcLY4NqN5gvDz6Jf8dcsz_eI5cF8lF6ZzfiskI/edit#gid=0

We'll be discussing these points this upcoming Tuesday on Zoom @ 11:30am PT during our weekly community call: https://zoom.us/j/369477551

P.S. @anandology, @EdwardBetts, @rajbot, @mouse-reeve, et al -- if you have features (or votes) you'd like to push for, please feel free to add them to the spreadsheet!

Next Tuesday, we'll take the results of the spreadsheet and fill out https://github.com/internetarchive/openlibrary/projects/6

@LeadSongDog

Several issues are more or less blockers for author cleanup and deduping. These should be getting more attention. Only after authority is deduped will it be feasible to dedupe works and editions.

@mekarpeles
Member Author

@LeadSongDog do you know which issues those are?

@GerardMeijssen

GerardMeijssen commented Mar 12, 2018

  • The Biodiversity Heritage Library has its content in the Internet Archive. I expect that IA is aware of the BHL identifiers. We are adding these identifiers to Wikidata and are disambiguating them in the process. Many of these authors also have books in the Open Library, so there are OL identifiers as well. The request is to share identifiers so that the disambiguation work on both sides is optimized.

  • It will become possible to import the OL identifiers from Freebase into Wikidata in one batch. When this happens, please run the process so that we can have the latest OL identifiers and have them disambiguated.

  • For basic author data like date of birth and date of death, please update the information to what Wikidata holds. It is a service to your readers, and we will celebrate your use of our data.

  • Can we please have a list of all the identifiers for books by authors who have a Wikidata identifier? Included should be a name, an authorID, a LoC identifier or ISBN. Objective is to load all of them into Wikidata and seek a wider audience.
    Thanks,
    GerardM

@mekarpeles
Member Author

mekarpeles commented Mar 12, 2018

@GerardMeijssen I'm not sure exactly what is required for point #1 -- it would be most helpful and get the most exposure if each of these were opened as separate issues which we can add to our triage spreadsheet.

Who is coordinating this Freebase batch import? When/how will we learn when this happens?

re: author info, when you say, "please update the information to what Wikidata holds" are you talking about synchronizing the keys? Or pulling in wikidata values into OL?

re: "Can we please have a list of all the identifiers for books by authors who have a Wikidata identifier?", is the request for a 1-time data dump (e.g. our existing monthly authors dump)? Or an API to retrieve all authors with wikidata IDs?

@GerardMeijssen

The first thing is to expose BHL identifiers in combination with IA / OL identifiers.
You are disambiguating and so is Wikidata.

  • When we add a BHL identifier it would be good to have a method to add your (IA or OL) identifiers.

  • When there are multiple links for the same author, further processing is our standard disambiguation process (in place for OL).

  • It is for the Biodiversity Heritage Library to consider what their policy is for disambiguation. In the meantime we do the donkey work. We can and will invite people to help when these processes are in place.

@GerardMeijssen

The Freebase import is on my radar. When this is done, I will ask (Charles ?) to run the update functionality.

@GerardMeijssen

GerardMeijssen commented Mar 12, 2018

As to updating the information at OL, I ask for you to import information like date of birth and date of death. Particularly when Wikidata has info and you don't it will be an improvement for the OL readers.

When you have information where we do not or where there is a difference, we appreciate a list so that we can curate Wikidata.
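
A minimal sketch of that comparison, assuming both sides have already been matched on a shared author identifier. The record shape, the field names `birth_date` and `death_date`, and the function name `compare_dates` are hypothetical and are not taken from either project's schema. Values are compared as exact strings, so differently formatted dates would show up as conflicts.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical shape: {shared_author_id: {"birth_date": ..., "death_date": ...}}
Records = Dict[str, Dict[str, Optional[str]]]
FIELDS = ("birth_date", "death_date")


def compare_dates(ol: Records, wd: Records) -> Dict[str, List[Tuple]]:
    """Split date differences into fill-ins for each side and true conflicts."""
    report: Dict[str, List[Tuple]] = {"fill_ol": [], "fill_wd": [], "conflicts": []}
    # Only authors present on both sides can be compared.
    for author_id in sorted(ol.keys() & wd.keys()):
        for field in FIELDS:
            # Treat missing keys and empty strings the same as None.
            ol_val = ol[author_id].get(field) or None
            wd_val = wd[author_id].get(field) or None
            if ol_val == wd_val:
                continue
            if ol_val is None:
                # Wikidata has a value that OL lacks: candidate import into OL.
                report["fill_ol"].append((author_id, field, wd_val))
            elif wd_val is None:
                # OL has a value that Wikidata lacks: candidate for curation there.
                report["fill_wd"].append((author_id, field, ol_val))
            else:
                # Both sides have a value and they differ: needs human review.
                report["conflicts"].append((author_id, field, ol_val, wd_val))
    return report
```

The `fill_ol` list corresponds to the import into OL, while `fill_wd` and `conflicts` together form the list for Wikidata curation.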

@GerardMeijssen

By importing the books for the authors we have in common, at Wikidata we will have the information to enable people to read books from the OL. We do not necessarily need a dump; what I can do is get authorisation for running a bot. Having you run a bot would make the collaboration even more prominent.

@tfmorris
Contributor

tfmorris commented Mar 12, 2018

I've moved mine from a comment into the spreadsheet now that I have edit rights. I'll update them with issue numbers, etc. later, although they align with what @LeadSongDog said.

  • BHL - I'm opposed to promoting them until they give at least minimal credit to IA/OL, which is the source of 99% of their data. The only reason that BHL identifiers even exist is that they mint them to make the connection to IA opaque.

  • Freebase - Any OL related data from that source needs to be used with great care, because it dates from a period before any author dedupe had been done, so the OLIDs can point to redirects, deleted records, etc (I say this as someone who's worked with Freebase since early 2010).

  • Duplicating data in general - I think we should have a more general discussion about this before we start copying data to and fro. Having multiple, duplicate, editable data stores makes the reconciliation problem very difficult. I'd be much more tempted to push as much of this as possible to Wikidata and just pull things from there when available. This includes identifiers, biographical info like birth/death dates, and a whole host of other data.
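
To illustrate the Freebase point about stale OLIDs, here is a minimal sketch of checking an identifier against current records before using it. It assumes the records are already loaded into an in-memory dict keyed by OL key, that redirects are marked with type `/type/redirect` plus a `location` field, and that deleted records are marked with type `/type/delete`. The function name `resolve_key` and the `max_hops` limit are hypothetical.

```python
from typing import Dict, Optional

REDIRECT = "/type/redirect"  # assumed marker for a redirected record
DELETE = "/type/delete"      # assumed marker for a deleted record


def resolve_key(key: str, records: Dict[str, dict], max_hops: int = 10) -> Optional[str]:
    """Follow redirects from `key` to a live record; return None if there is none."""
    seen = set()
    while key not in seen and len(seen) < max_hops:
        seen.add(key)
        record = records.get(key)
        if record is None:
            return None  # unknown key, or a redirect with no usable target
        type_key = record.get("type", {}).get("key")
        if type_key == DELETE:
            return None  # the record was deleted
        if type_key != REDIRECT:
            return key  # a live record: this is the identifier to keep
        key = record.get("location")
    return None  # redirect loop or too many hops
```

An old identifier is only worth importing if this returns a key, and the returned key (not the original one) is the one to store.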

@GerardMeijssen

GerardMeijssen commented Mar 12, 2018 via email

@LeadSongDog

LeadSongDog commented Mar 12, 2018

Issues with authority:
#790, #757, #756, #714, #699, #669, #667, #604, #513, #498, #486, #366, #352, #351, #349, #178, #149, #145, #89, #77

@tfmorris
Contributor

Issues with authority:

Those look like they're mostly issues with author records and/or author search. "authority" is an archaic librarian's term rooted in their belief that they're in charge of everything. (Not that I have an issue with authority. :-) ) I added # signs to all the issue numbers so they'll act as hot links.

@LeadSongDog

Well, I think of authority simply as answers to "Who authored what?", but https://www.loc.gov/standards/mads/mads-doc.html says:

The <authority> element is a container that includes a standardized "authoritative" form of an agent (person or organization), an event, a title, or a term (topic, genre, geographic). The authority container may only be repeated to give multiple authoritative forms in different languages or scripts.

The geographicSubdivision attribute can be used with the <authority> element to indicate whether or not a concept can append a geographic facet, such as the name of a country or other jurisdiction, region, or geographic feature. This information is important to some controlled vocabularies, such as LCSH. For vocabularies to which this does not apply, the attribute would not be used. The geographicSubdivision attribute is comparable to MARC Authority 008/06, and can carry the following values:

none - no geographic facet applies
direct - a geographic facet may be applied without its larger geographic entity
indirect - a geographic facet may be applied with the name of its larger geographic entity
not applicable - a geographic facet is not appropriate

Then, https://www.loc.gov/standards/sourcelist/name-title.html says a bunch more...
Personally I have no issues with authority either, so long as I'm the authority :-)

@mekarpeles
Member Author

mekarpeles commented Mar 13, 2018

@here -- reminder: at this week's Tuesday community call @ 11:30am PT we'll be doing 2018 Q2 planning.

Join the call
https://zoom.us/j/369477551

Please nominate issues for Q2
https://github.com/internetarchive/openlibrary/projects/7

Browse open issues
https://github.com/internetarchive/openlibrary/issues

Last quarter's goals (Q1)
https://github.com/internetarchive/openlibrary/projects/3

Evolving project board for Q2
https://github.com/internetarchive/openlibrary/projects/6

@mekarpeles
Member Author

mekarpeles commented Mar 13, 2018

@tfmorris sorry to make life difficult. Instead of using the gdoc spreadsheet, I've moved all our Q2 nominations to this board: https://github.com/internetarchive/openlibrary/projects/7

Your list is currently
Search alternate names
Search (dedupe req.)
Search I18N (UX & dedupe req)
Data quality (author dedupe to start)
Improved UX

Where applicable, can you please add existing issue cards to your name on that board? And/or create issues for them where necessary? Thank you!! I've already added author search dupes (848) to your list. Note, I've also added internationalization (i18n) elsewhere on the board, but issues apparently can only belong to a single column (a heads-up, just so we don't re-create those existing issues).

@GerardMeijssen if you can do the same, that would be a huge help. You mentioned 3 or so points above -- if each of these can be turned into an issue with the correct context, we can add it to the Q2 planning board

@LeadSongDog I've already added / migrated all the issues you nominated -- thank you!

@hornc + @cdrini + @bfalling if you can also update the Q2 planning board w/ your issue nominations, it would be a big help!

https://github.com/internetarchive/openlibrary/projects/7

@sbshah97
Contributor

Also @mekarpeles @hornc

  1. Improve Display of Wikidata Identifier on Edition Page #811
  2. Update wikipedia citation output #817

I think you can remove these from the Board. They're almost done! I'm just awaiting Merge for the two for now!

@mekarpeles
Member Author

@GerardMeijssen I'll try to add the issues you listed in this thread

@mekarpeles
Member Author

Here's a 1st draft consolidated/prioritized list of Q2 goals
https://github.com/internetarchive/openlibrary/projects/7

We'll continue to integrate author + search related issues as we have a better idea where to spray the firehose

Next Tuesday we'll follow up to discuss the final prioritized list. Closing this issue for now! Thanks everyone for adding your issues to the board.


5 participants