Roadmapping 2018 Q2 #845

mekarpeles · 2018-03-12T00:09:39Z

@LeadSongDog, @hornc, @skylerbunny, @cdrini, @salman-bhai, @bfalling, @whatisgalen

If you can add your top 5-10 issues or features you believe should be prioritized in Q2 (preferably w/ github issue #'s)...
https://docs.google.com/spreadsheets/d/10Cp2xwcLY4NqN5gvDz6Jf8dcsz_eI5cF8lF6ZzfiskI/edit#gid=0

We'll be discussing these points this upcoming Tuesday on zoom @ 11:30am PT during our weekly community call: https://zoom.us/j/369477551

P.S. @anandology, @EdwardBetts, @rajbot, @mouse-reeve, et al -- if you have features (or votes) you'd like to push for, please feel free to add them to the spreadsheet!

Next Tuesday, we'll take the results of the spreadsheet and fill out https://github.com/internetarchive/openlibrary/projects/6

LeadSongDog · 2018-03-12T05:16:13Z

Several issues are more or less blockers for author cleanup and deduping. These should be getting more attention. Only after authority is deduped will it be feasible to dedupe works and editions.

mekarpeles · 2018-03-12T05:29:09Z

@LeadSongDog do you know which issues those are?

GerardMeijssen · 2018-03-12T06:15:39Z

The Biodiversity Heritage Library has its content in the Internet Archive. I expect that IA is aware of the BHL identifiers. We are adding these identifiers to Wikidata and are disambiguating them in the process. Many of these authors have books in the Open Library as well, so there are OL identifiers. The request is to share identifiers so that the disambiguation work on both sides is optimized.
It is becoming possible to import the OL identifiers from Freebase into Wikidata in one batch. When this happens, please run the process so that we have the latest OL identifiers and can disambiguate them.
For basic author data like date of birth and date of death, please update the information to what Wikidata holds. It is a service to your readers, and we will celebrate your use of our data.
Can we please have a list of all the identifiers for books by authors who have a Wikidata identifier? Included should be a name, an authorID, and a LoC identifier or ISBN. The objective is to load all of them into Wikidata and seek a wider audience.
Thanks,
GerardM

mekarpeles · 2018-03-12T08:20:54Z

@GerardMeijssen I'm not sure exactly what is required for point #1 -- it would be most helpful and get the most exposure if each of these were opened as separate issues which we can add to our triage spreadsheet.

Who is coordinating this Freebase batch import? When/how will we learn when this happens?

re: author info, when you say, "please update the information to what Wikidata holds" are you talking about synchronizing the keys? Or pulling in wikidata values into OL?

re: "Can we please have a list of all the identifiers for books by authors who have a Wikidata identifier?", is the request for a 1-time data dump (e.g. our existing monthly authors dump)? Or an API to retrieve all authors with wikidata IDs?
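
For the one-time-dump option, the filter could be quite small. A minimal sketch, assuming the authors dump is tab-separated with five columns (type, key, revision, last modified, JSON) and that a linked author record keeps its Wikidata ID under `remote_ids.wikidata`. Both of those are assumptions rather than anything confirmed in this thread, and the function name is hypothetical:

```python
import json
from typing import Iterable, Iterator, Tuple


def authors_with_wikidata(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (author_key, name, wikidata_id) for rows that carry a Wikidata ID.

    Assumed row layout: type, key, revision, last_modified, JSON record.
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t", 4)
        if len(parts) != 5:
            continue  # skip malformed rows
        record_type, key, _revision, _modified, raw_json = parts
        if record_type != "/type/author":
            continue  # only author records are of interest here
        try:
            record = json.loads(raw_json)
        except json.JSONDecodeError:
            continue  # skip rows whose JSON column does not parse
        if not isinstance(record, dict):
            continue
        remote_ids = record.get("remote_ids")
        if not isinstance(remote_ids, dict):
            continue  # author has no linked identifiers
        wikidata_id = remote_ids.get("wikidata")
        if wikidata_id:
            yield key, record.get("name", ""), wikidata_id
```

It takes any iterable of lines, so it can be pointed at an open dump file or a decompressed stream without loading the whole dump into memory.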

GerardMeijssen · 2018-03-12T09:33:10Z

The first thing is to expose BHL identifiers in combination with IA / OL identifiers.
You are disambiguating and so is Wikidata.

When we add a BHL identifier it would be good to have a method to add your (IA or OL) identifiers.
When there are multiple links for the same author, further processing is our standard disambiguation process (in place for OL).
It is for the Biodiversity Heritage Library to consider what their policy is for disambiguation. In the meantime we do the donkey work. We can and will invite people to help when these processes are in place.

GerardMeijssen · 2018-03-12T09:34:14Z

The Freebase import is on my radar. When this is done, I will ask (Charles ?) to run the update functionality.

GerardMeijssen · 2018-03-12T09:39:54Z

As to updating the information at OL, I ask you to import information like date of birth and date of death. Particularly when Wikidata has info and you don't, it will be an improvement for the OL readers.

When you have information where we do not or where there is a difference, we appreciate a list so that we can curate Wikidata.
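
To make the two cases concrete, here is a rough sketch of the comparison for a single author. It assumes both sides can be reduced to plain dictionaries with `birth_date` and `death_date` keys holding strings; the function name and that record shape are hypothetical, and the comparison is a plain string match, so a difference in precision (a year versus a full date) is reported as a difference:

```python
from typing import Dict, List, Optional, Tuple

DATE_FIELDS = ("birth_date", "death_date")

Record = Dict[str, Optional[str]]


def reconcile_dates(ol: Record, wd: Record) -> Tuple[Dict[str, str], List[Tuple[str, str, Optional[str]]]]:
    """Compare one author's dates in OL and Wikidata.

    Returns (fill_ins, review_list):
      fill_ins    - values OL lacks and Wikidata has (candidates to import)
      review_list - (field, ol_value, wd_value) where OL has a value that
                    Wikidata lacks or disagrees with (candidates to curate)
    """
    fill_ins: Dict[str, str] = {}
    review_list: List[Tuple[str, str, Optional[str]]] = []
    for field in DATE_FIELDS:
        ol_value = ol.get(field)
        wd_value = wd.get(field)
        if wd_value and not ol_value:
            fill_ins[field] = wd_value
        elif ol_value and wd_value != ol_value:
            review_list.append((field, ol_value, wd_value))
    return fill_ins, review_list
```

Keeping the two outputs separate means nothing is overwritten automatically: one list is for OL to import, the other is for Wikidata curators to check.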

GerardMeijssen · 2018-03-12T12:40:07Z

By importing the books for the authors we have in common, Wikidata will have the information to enable people to read books from the OL. We do not necessarily need a dump; what I can do is get authorisation for running a bot. Having you run a bot would make the collaboration even more prominent.

tfmorris · 2018-03-12T14:43:00Z

I've moved mine from a comment into the spreadsheet now that I have edit rights. I'll update them with issue numbers, etc. later, although they align with what @LeadSongDog said.

BHL - I'm opposed to promoting them until they give at least minimal credit to IA/OL, which is the source of 99% of their data. The only reason that BHL identifiers even exist is that they mint them to make the connection to IA opaque.
Freebase - Any OL related data from that source needs to be used with great care, because it dates from a period before any author dedupe had been done, so the OLIDs can point to redirects, deleted records, etc (I say this as someone who's worked with Freebase since early 2010).
Duplicating data in general - I think we should have a more general discussion about this before we start copying data to and fro. Having multiple, duplicate, editable data stores makes the reconciliation problem very difficult. I'd be much more tempted to push as much of this as possible to Wikidata and just pull things from there when available. This includes identifiers, biographical info like birth/death dates, and a whole host of other data.

GerardMeijssen · 2018-03-12T15:16:33Z

Hoi,

When we link Wikidata OL / IA for BHL, we will gain a lot of friends in many libraries worldwide. The BHL is already very happy that we are including data into Wikidata and they will be supremely happy when we together provide them with an even better service.

Yes, the data of Freebase is stale. It is exactly why the inclusion will be synchronised with Charles, because we will want to update Wikidata afterwards with the latest and greatest information of OL / IA. This is easier than the manual process that is happening now. Once this is done, there will be no new stale information in Wikidata. This makes this a win for everyone involved.

The only really important thing in what we do is linking identifiers. This is key, not the associated data. I am very happy when the IA and OL find use for the Wikidata data. However, it is for them to decide what to do. Our data is freely available. For me it is key that we collaborate and share a mission of bringing more and better information; getting people to read is (for me personally) a dream come true.

Thanks,
GerardM

LeadSongDog · 2018-03-12T17:37:45Z

Issues with authority:
#790, #757, #756, #714, #699, #669, #667, #604, #513, #498, #486, #366, #352, #351, #349, #178, #149, #145, #89, #77

tfmorris · 2018-03-12T17:49:28Z

Issues with authority:

Those look like they're mostly issues with author records and/or author search. "authority" is an archaic librarian's term rooted in their belief that they're in charge of everything. (Not that I have an issue with authority. :-) ) I added # signs to all the issue numbers so they'll act as hot links.

LeadSongDog · 2018-03-12T20:29:21Z

Well, I think of authority simply as answers to "Who authored what?", but https://www.loc.gov/standards/mads/mads-doc.html says:

The <authority> element is a container that includes a standardized "authoritative" form of an agent (person or organization), an event, a title, or a term (topic, genre, geographic). The authority container may only be repeated to give multiple authoritative forms in different languages or scripts.

The geographicSubdivision attribute can be used with the <authority> element to indicate whether or not a concept can append a geographic facet, such as the name of a country or other jurisdiction, region, or geographic feature. This information is important to some controlled vocabularies, such as LCSH. For vocabularies to which this does not apply, the attribute would not be used. The geographicSubdivision attribute is comparable to MARC Authority 008/06, and can carry the following values:

none - no geographic facet applies
direct - a geographic facet may be applied without its larger geographic entity
indirect - a geographic facet may be applied with the name of its larger geographic entity
not applicable - a geographic facet is not appropriate

Then, https://www.loc.gov/standards/sourcelist/name-title.html says a bunch more...
Personally I have no issues with authority either, so long as I'm the authority :-)

mekarpeles · 2018-03-13T07:00:44Z

@here -- reminder: at this week's Tuesday community call @ 11:30am PT, we'll be doing 2018 Q2 planning.

Join the call
https://zoom.us/j/369477551

Please nominate issues for Q2
https://github.com/internetarchive/openlibrary/projects/7

Browse open issues
https://github.com/internetarchive/openlibrary/issues

Last quarter's goals (Q1)
https://github.com/internetarchive/openlibrary/projects/3

Evolving project board for Q2
https://github.com/internetarchive/openlibrary/projects/6

mekarpeles · 2018-03-13T07:54:53Z

@tfmorris sorry to make life difficult. Instead of using the gdoc spreadsheet, I've moved all our Q2 nominations to this board: https://github.com/internetarchive/openlibrary/projects/7

Your list is currently
Search alternate names
Search (dedupe req.)
Search I18N (UX & dedupe req)
Data quality (author dedupe to start)
Improved UX

Where applicable, can you please add existing issue cards to your name on that board? And/or create issues for them where necessary? Thank you!! I've already added author search dupes (848) to your list. Note, I've also added internationalization (i18n) elsewhere on the board, but issues apparently can only belong to a single column (just a heads-up, so we don't re-create those existing issues).

@GerardMeijssen if you can do the same, that would be a huge help. You mentioned 3 or so points above -- if each of these can be turned into an issue with the correct context, we can add it to the Q2 planning board

@LeadSongDog I've already added / migrated all the issues you nominated -- thank you!

@hornc + @cdrini + @bfalling if you three can also update the Q2 planning board w/ your issue nominations, it would be a big help!

https://github.com/internetarchive/openlibrary/projects/7

sbshah97 · 2018-03-13T16:24:45Z

salman-bhai [9:53 PM]
I'd like to add these Issues to my Board

Export public reading logs (public and private) as json,csv #830
Graceful Error should occur when user w/ MAX LOANS attempts borrow #439
Remove deprecated backticks from Python code #846
Add ability to import reading log / bookshelves from Goodreads #835 (slightly challenging)

In addition to these, kindly add these Issues as well, if they are not to be closed (these pertain to Recaptcha v2)

In addition to this I'd like to work on Docker with Charles. Not sure an Issue has been created for that as of now!

sbshah97 · 2018-03-13T16:26:09Z

Also @mekarpeles @hornc

I think you can remove these from the Board. They're almost done! I'm just awaiting Merge for the two for now!

mekarpeles · 2018-03-14T00:53:23Z

@GerardMeijssen I'll try to add the issues you listed in this thread

mekarpeles · 2018-03-14T00:55:30Z

Here's a 1st draft consolidated/prioritized list of Q2 goals
https://github.com/internetarchive/openlibrary/projects/7

We'll continue to integrate author + search related issues as we have a better idea where to spray the firehose

Next Tuesday we'll follow up to discuss the final prioritized list. Closing this issue for now! Thanks everyone for adding your issues to the board.

mekarpeles closed this as completed Mar 14, 2018

mekarpeles mentioned this issue Mar 14, 2018

Normalize Unicode #149

Closed

tfmorris mentioned this issue Feb 18, 2019

Scrape enhanced metadata from BHL #1902

Open
