Roadmapping 2018 Q2 #845
Several issues are more or less blockers for author cleanup and deduping. These should be getting more attention. Only after authority is deduped will it be feasible to dedupe works and editions.
@LeadSongDog do you know which issues those are?
@GerardMeijssen I'm not sure exactly what is required for point #1 -- it would be most helpful, and get the most exposure, if each of these were opened as a separate issue that we can add to our triage spreadsheet.

Who is coordinating this Freebase batch import? When/how will we learn when this happens?

re: author info, when you say, "please update the information to what Wikidata holds", are you talking about synchronizing the keys? Or pulling Wikidata values into OL?

re: "Can we please have a list of all the identifiers for books by authors who have a Wikidata identifier?", is the request for a one-time data dump (e.g. our existing monthly authors dump)? Or an API to retrieve all authors with Wikidata IDs?
The first thing is to expose BHL identifiers in combination with IA / OL identifiers.
The Freebase import is on my radar. When this is done, I will ask (Charles?) to run the update functionality.
As to updating the information at OL, I am asking you to import information such as dates of birth and death. Particularly where Wikidata has this information and you don't, it will be an improvement for OL readers. Where you have information that we do not, or where our data differs, we would appreciate a list so that we can curate Wikidata.
By importing the books for the authors we have in common, we at Wikidata will have the information needed to let people read books from OL. We do not necessarily need a dump; I can get authorisation to run a bot. Having you run the bot would make the collaboration even more prominent.
I've moved mine from a comment into the spreadsheet now that I have edit rights. I'll update them with issue numbers, etc. later, although they align with what @LeadSongDog listed.
Hi,
When we link Wikidata with OL / IA for BHL, we will gain a lot of friends in many libraries worldwide. BHL is already very happy that we are including their data in Wikidata, and they will be supremely happy when together we provide them with an even better service.

Yes, the Freebase data is stale. That is exactly why the import will be synchronised with Charles: we want to update Wikidata afterwards with the latest and greatest information from OL / IA. This is easier than the manual process that is happening now. Once this is done, no new stale information will enter Wikidata, which makes this a win for everyone involved.

The only really important thing in what we do is linking identifiers. This is key, not the associated data. I am very happy when IA and OL find a use for Wikidata's data; however, it is for them to decide what to do with it. Our data is freely available. For me it is key that we collaborate and share a mission of bringing more and better information; getting people to read is, for me personally, a dream come true.

Thanks,
GerardM
On 12 March 2018 at 15:43, Tom Morris wrote:
- BHL: I'm opposed to promoting them until they give at least minimal credit to IA/OL, which is the source of 99% of their data. The only reason BHL identifiers even exist is that they mint them to make the connection to IA opaque.
- Freebase: Any OL-related data from that source needs to be used with great care, because it dates from a period before *any* author dedupe had been done, so the OLIDs can point to redirects, deleted records, etc. (I say this as someone who's worked with Freebase since early 2010.)
- Duplicating data in general: I think we should have a more general discussion about this before we start copying data to and fro. Having multiple, duplicate, editable data stores makes the reconciliation problem very difficult. I'd be much more tempted to push as much of this as possible to Wikidata and just pull things from there when available. This includes identifiers, biographical info like birth/death dates, and a whole host of other data.
Those look like they're mostly issues with author records and/or author search. "Authority" is an archaic librarian's term rooted in their belief that they're in charge of everything. (Not that I have an issue with authority. :-) ) I added # signs to all the issue numbers so they'll act as hot links.
Well, I think of authority simply as answers to "Who authored what?", but https://www.loc.gov/standards/mads/mads-doc.html says: "The <authority> element is a container that includes a standardized "authoritative" form of an agent (person or organization), an event, a title, or a term (topic, genre, geographic). The authority container may only be repeated to give multiple authoritative forms in different languages or scripts."
Then, https://www.loc.gov/standards/sourcelist/name-title.html says a bunch more...
@here -- reminder: on this week's Tuesday community call @ 11:30am PT, we'll be doing 2018 Q2 planning.
- Join the call
- Please nominate issues for Q2
- Browse open issues
- Last quarter's goals (Q1)
- Evolving project board for Q2
@tfmorris sorry to make life difficult. Instead of using the gdoc spreadsheet, I've moved all our Q2 nominations to this board: https://github.com/internetarchive/openlibrary/projects/7

Your list is currently

Where applicable, can you please add existing issue cards to your name on that board, and/or create issues where necessary? Thank you!! I've already added author search dupes (#848) to your list. Note: I've also added internationalization (i18n) elsewhere on the board, but issues apparently can only belong to a single column (a heads-up, just so we don't re-create those existing issues).

@GerardMeijssen if you can do the same, that would be a huge help. You mentioned 3 or so points above -- if each of these can be turned into an issue with the correct context, we can add it to the Q2 planning board.

@LeadSongDog I've already added / migrated all the issues you nominated -- thank you!

@hornc + @cdrini + @bfalling if you can also update the Q2 planning board w/ your issue nominations, it would be a big help!
Also, @mekarpeles @hornc, I think you can remove these from the board. They're almost done! I'm just awaiting merge for the two for now!
@GerardMeijssen I'll try to add the issues you listed in this thread.
Here's a first-draft consolidated/prioritized list of Q2 goals. We'll continue to integrate author + search related issues as we get a better idea of where to spray the firehose. Next Tuesday we'll follow up to discuss the final prioritized list. Closing this issue for now! Thanks, everyone, for adding your issues to the board.
@LeadSongDog, @hornc, @skylerbunny, @cdrini, @salman-bhai, @bfalling, @whatisgalen
Please add the top 5-10 issues or features you believe should be prioritized in Q2 (preferably w/ GitHub issue #s) here:
https://docs.google.com/spreadsheets/d/10Cp2xwcLY4NqN5gvDz6Jf8dcsz_eI5cF8lF6ZzfiskI/edit#gid=0
We'll be discussing these points this upcoming Tuesday on Zoom @ 11:30am PT during our weekly community call: https://zoom.us/j/369477551
P.S. @anandology, @EdwardBetts, @rajbot, @mouse-reeve, et al -- if you have features (or votes) you'd like to push for, please feel free to add them to the spreadsheet!
Next Tuesday, we'll take the results of the spreadsheet and fill out https://github.com/internetarchive/openlibrary/projects/6